ReClaim API Reference
This reference is written for Read the Docs and mirrors the current code in src/reclaim. It outlines the public surface area and the most relevant internal helpers so new contributors can understand how text is decomposed, aligned, and annotated.
Top-Level Package
- reclaim.extract_claims(text: str, model: str = 'gpt-4o') → List[Claim]
Extract atomic claims from plain text using the LLM prompt flow in doc2sentences. Returns Claim objects with only claim_text populated; alignment fields are left empty.
- reclaim.extract_and_align_claims(text, tokens, tokenizer, openai_model: str = 'gpt-4o', progress_bar: bool = True, n_threads: int = 1)
Extract claims from generated text and align each claim back to the token ids that produced it. Internally builds a ClaimsExtractor wired to an OpenAIChat instance.
- reclaim.batch_extract_and_align_claims(texts: List[str], tokens: List[List[int]], tokenizer, openai_model: str = 'gpt-4o', progress_bar: bool = True, n_threads: int = 1) → List[List[Claim]]
Batch wrapper over ClaimsExtractor.batch_claims_from_texts that processes multiple model outputs in parallel. Returns one list of aligned claims per input text.
- reclaim.annotate_claims(claims: List[str], contexts: List[str], openai_model: str = 'gpt-4o', progress_bar: bool = True, n_threads: int = 1)
Annotate claims for faithfulness and factuality relative to their contexts by delegating to ClaimsAnnotator.
Data Models
- class reclaim.extract_claims.Claim(claim_text: str, decoded_claim: str, sentence: str, aligned_token_ids: List[int])
Dataclass representing a single claim. claim_text holds the normalized claim, decoded_claim carries the tokenizer-decoded span, sentence stores the sentence that supported the claim, and aligned_token_ids lists token indices aligned to the claim.
- class reclaim.extract_claims.ClaimModel
Pydantic model with claims: List[str], used to parse claim lists from LLM responses.
- class reclaim.extract_claims.ClaimSentence
Pydantic model with sentence and related_words fields, used when aligning claims to text spans.
- class reclaim.extract_claims.ClaimSentences
Wrapper Pydantic model with sentences: List[ClaimSentence].
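As a rough illustration of the Claim data model in this section, the sketch below restates it as a plain dataclass. The field names and types follow the documented signature; the empty defaults are an assumption made here so that a claim with only claim_text populated (the shape extract_claims is described as returning) can be constructed. The real class may define its fields differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    """Illustrative stand-in for reclaim.extract_claims.Claim."""

    claim_text: str
    # The defaults below are assumptions for this sketch, not the library's.
    decoded_claim: str = ""
    sentence: str = ""
    aligned_token_ids: List[int] = field(default_factory=list)


# Shape of an extraction-only result: alignment fields stay empty.
unaligned = Claim(claim_text="Paris is the capital of France.")

# Shape of an aligned result: indices point into the generated token list.
aligned = Claim(
    claim_text="Paris is the capital of France.",
    decoded_claim="Paris is the capital of France",
    sentence="Paris is the capital of France, and it lies on the Seine.",
    aligned_token_ids=[0, 1, 2, 3, 4, 5],
)
```

The example strings and token indices are made up for illustration only.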
Extraction and Alignment
- class reclaim.extract_claims.ClaimsExtractor(openai_chat: OpenAIChat, sent_separators: str = '.?!。?!\\n', language: str = 'en', progress_bar: bool = False, extraction_prompts: Dict[str, str] = CLAIM_EXTRACTION_PROMPTS, matching_prompts: Dict[str, str] = MATCHING_PROMPTS, n_threads: int = 1)
High-level orchestrator for claim extraction and alignment.
Initializes with prompt templates, language, and threading options.
Uses OpenAIChat to call OpenAI for both claim extraction and matching.
- batch_claims_from_texts(texts: List[str], tokens: List[List[int]], tokenizer) → Dict[str, List]
Runs claims_from_text() across many texts concurrently via a ThreadPoolExecutor. Returns aligned claims per input text, preserving order.
- claims_from_text(text: str, tokens: List[int], tokenizer) → List[Claim]
Extract claims from a single generated text, identify supporting sentences, and align matched spans back to token indices. Filters out some known low-quality claims before alignment. Produces Claim objects with sentence provenance and token alignment.
- _claims_from_sentence(sent: str, sent_tokens: List[int], tokenizer) → List[Claim]
Internal helper to extract claims from an individual sentence and align them within that sentence using prompt-driven matching.
- _match_string(sent: str, match_words: List[str]) → str | None
Build a caret mask over sent marking the positions of ordered match_words when they appear at word boundaries. Raises if not all words are found.
- _match_string_zh(sent: str, match_words: List[str]) → str | None
Chinese-specific matcher that marks contiguous spans of characters corresponding to each match word; returns None if any span is missing.
- _align(sent: str, match_str: str, sent_tokens: List[int], tokenizer) → List[int]
Convert a character-level match mask into token indices by decoding tokens and checking for overlap with carets in match_str.
- match_claim(text: str, claim: str, max_parsed_words: int)
Query the LLM for sentences and related words supporting claim, construct a full-text caret mask, and pick the sentence with the richest match. Returns the mask and the best supporting sentence.
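The caret-mask and alignment steps described for _match_string and _align can be pictured with a small self-contained sketch. It is not the library's implementation: match_string and align are hypothetical stand-ins, align takes already-decoded token strings (token_pieces) instead of token ids plus a tokenizer, and it assumes the decoded pieces concatenate back to the sentence exactly. This sketch returns None when a word is missing, whereas the real helper is documented as raising.

```python
import re
from typing import List, Optional


def match_string(sent: str, match_words: List[str]) -> Optional[str]:
    """Return a mask as long as sent, with '^' under each matched word."""
    mask = [" "] * len(sent)
    pos = 0
    for word in match_words:
        # \b keeps "cat" from matching inside "concatenate".
        pattern = re.compile(r"\b" + re.escape(word) + r"\b")
        found = pattern.search(sent, pos)
        if found is None:
            return None  # assumption: signal a missing word with None
        for i in range(found.start(), found.end()):
            mask[i] = "^"
        pos = found.end()  # later words must appear after this one
    return "".join(mask)


def align(match_str: str, token_pieces: List[str]) -> List[int]:
    """Return indices of tokens whose character span overlaps a caret."""
    aligned: List[int] = []
    cursor = 0
    for idx, piece in enumerate(token_pieces):
        if "^" in match_str[cursor:cursor + len(piece)]:
            aligned.append(idx)
        cursor += len(piece)
    return aligned


sent = "Paris is the capital of France."
pieces = ["Paris", " is", " the", " capital", " of", " France", "."]
mask = match_string(sent, ["Paris", "capital", "France"])
token_ids = align(mask, pieces)
print(sent)
print(mask)
print(token_ids)  # [0, 3, 5]
```

Printing the sentence and the mask on consecutive lines makes it easy to check by eye which characters, and therefore which tokens, a claim was matched against.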
Annotation
- class reclaim.annotate_claims.ClaimsAnnotator(openai_model: str = 'gpt-4o', progress_bar: bool = True, n_threads: int = 1)
Thin wrapper that labels claims for faithfulness and factuality.
- annotate_claims(claims: list[str], contexts: list[str], language: str = 'en')
Annotate many claims in parallel against their paired contexts, returning model responses (typically boolean label pairs).
- _annotate_claim(claim: str, context: str, language: str = 'en')
Internal worker invoked in the thread pool; sends the annotation prompt and returns the parsed result.
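The parallel annotation pattern (one worker call per claim/context pair, run in a thread pool) might look roughly like the sketch below. The names annotate_in_parallel and toy_annotator are hypothetical; the toy worker replaces the LLM call with a substring check, and the (faithful, factual) tuple is only one plausible reading of "boolean label pairs".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Labels = Tuple[bool, bool]  # assumed layout: (faithful, factual)


def annotate_in_parallel(
    claims: List[str],
    contexts: List[str],
    annotate_one: Callable[[str, str], Labels],
    n_threads: int = 1,
) -> List[Labels]:
    """Run annotate_one over paired claims and contexts in a thread pool."""
    if len(claims) != len(contexts):
        raise ValueError("claims and contexts must be paired one-to-one")
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Executor.map yields results in input order, even when a later
        # pair finishes before an earlier one.
        return list(pool.map(annotate_one, claims, contexts))


def toy_annotator(claim: str, context: str) -> Labels:
    """Hypothetical stand-in for the LLM call: a substring check only."""
    supported = claim.lower() in context.lower()
    return supported, supported


labels = annotate_in_parallel(
    ["the sky is blue", "the moon is green"],
    ["Everyone agrees the sky is blue.", "The moon is grey."],
    toy_annotator,
    n_threads=2,
)
print(labels)  # [(True, True), (False, False)]
```

Because network-bound LLM calls spend most of their time waiting, threads are usually sufficient for this kind of fan-out.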
Text Decomposition
- reclaim.decompose.doc2sentences(doc: str, mode: str = 'independent_sentences', model: str = 'gpt-4o', system_role: str = 'You are good at decomposing and decontextualizing text.', num_retries: int = 5, schema: BaseModel | None = None) → List[str]
Core utility that asks an LLM to decompose a document. The mode selects the appropriate prompt (plain sentences, independent sentences, claims, or atomic claims). Retries failed requests up to num_retries and optionally parses structured outputs via a Pydantic schema.
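A minimal sketch of the mode-to-prompt selection and retry behaviour described for doc2sentences follows. Everything in it is illustrative: the PROMPTS table, its keys other than 'independent_sentences', the decompose function, and the FlakyLLM stub are hypothetical, and num_retries is treated here as the total number of attempts, which the real function may count differently.

```python
from typing import Callable, List, Optional

# Hypothetical prompt table; the real prompts live in reclaim.decompose.
PROMPTS = {
    "sentences": "Split the text into sentences:\n{doc}",
    "independent_sentences": "Split the text into self-contained sentences:\n{doc}",
    "claims": "List the claims made in the text:\n{doc}",
    "atomic_claims": "List the atomic claims made in the text:\n{doc}",
}


def decompose(
    doc: str,
    mode: str,
    call_llm: Callable[[str], str],
    num_retries: int = 5,
) -> List[str]:
    """Pick a prompt by mode, call the LLM, and retry on failure."""
    if mode not in PROMPTS:
        raise ValueError(f"unknown mode: {mode}")
    prompt = PROMPTS[mode].format(doc=doc)
    last_error: Optional[Exception] = None
    for _ in range(num_retries):
        try:
            reply = call_llm(prompt)
        except RuntimeError as err:
            last_error = err
            continue
        # One item per non-empty line of the reply.
        return [line.strip() for line in reply.splitlines() if line.strip()]
    raise RuntimeError(f"gave up after {num_retries} attempts") from last_error


class FlakyLLM:
    """Stub that fails a fixed number of times before answering."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("transient failure")
        return "The sky is blue.\nGrass is green.\n"


llm = FlakyLLM(failures=2)
sentences = decompose("The sky is blue and grass is green.", "independent_sentences", llm)
print(sentences, llm.calls)
```

Passing the LLM call in as a function keeps the retry logic testable without network access.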
OpenAI Client Utilities
- class reclaim.openai_client.OpenAIChat(openai_model: str = 'gpt-4o', base_url: str | None = None, cache_path: str = '~/.cache', timeout: int = 600, max_tokens: int | None = None, rewrite_cache: bool = False)
Lightweight chat client used across extraction and annotation.
Reads OPENAI_API_KEY from the environment.
Stores cache path metadata (no cache implementation included here).
Supports optional base URL, timeout, and completion length limits.
- ask(message: str, schema: BaseModel | None = None)
Send a prompt to OpenAI with a configurable system role. If schema is provided, uses responses.parse; otherwise returns the string content while filtering common boilerplate refusals.
- _send_request(messages, schema: BaseModel | None = None)
Internal request dispatcher with progressive backoff across several wait intervals.
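Progressive backoff of the kind attributed to _send_request could be sketched as follows. This is an assumption-laden illustration, not the client's code: send_with_backoff is a hypothetical name, the wait intervals are invented, and catching every Exception is a simplification. The sleep function is injectable so the example runs without actually waiting.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def send_with_backoff(
    send: Callable[[], T],
    wait_intervals: Sequence[float] = (1.0, 5.0, 15.0),  # hypothetical values
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(), waiting progressively longer after each failure."""
    for wait in wait_intervals:
        try:
            return send()
        except Exception:
            # Simplification: a real client would catch only retryable errors.
            sleep(wait)
    # Final attempt after the last wait; any error now propagates.
    return send()


attempts = {"n": 0}


def flaky_request() -> str:
    """Stub request that fails twice, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("temporary outage")
    return "ok"


waits: List[float] = []
result = send_with_backoff(flaky_request, wait_intervals=(1, 2, 4), sleep=waits.append)
print(result, waits)  # ok [1, 2]
```

With n wait intervals this pattern makes at most n + 1 attempts, and the total worst-case delay is the sum of the intervals.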