ReClaim API Reference

This reference is written for Read the Docs and mirrors the current code in src/reclaim. It outlines the public surface area and the most relevant internal helpers so new contributors can understand how text is decomposed, aligned, and annotated.

Top-Level Package

reclaim.extract_claims(text: str, model: str = 'gpt-4o') → List[Claim]: Extract atomic claims from plain text using the LLM prompt flow in doc2sentences. Returns Claim objects with only claim_text populated; alignment fields are left empty.

reclaim.extract_and_align_claims(text, tokens, tokenizer, openai_model: str = 'gpt-4o', progress_bar: bool = True, n_threads: int = 1): Extract claims from generated text and align each claim back to the token ids that produced it. Internally builds a ClaimsExtractor wired to an OpenAIChat instance.

reclaim.batch_extract_and_align_claims(texts: List[str], tokens: List[List[int]], tokenizer, openai_model: str = 'gpt-4o', progress_bar: bool = True, n_threads: int = 1) → List[List[Claim]]: Batch wrapper over ClaimsExtractor.batch_claims_from_texts to process multiple model outputs in parallel. Returns a list of claim lists aligned to each input text.

reclaim.annotate_claims(claims: List[str], contexts: List[str], openai_model: str = 'gpt-4o', progress_bar: bool = True, n_threads: int = 1): Annotate claims for faithfulness and factuality relative to their contexts by delegating to ClaimsAnnotator.

Data Models

class reclaim.extract_claims.Claim(claim_text: str, decoded_claim: str, sentence: str, aligned_token_ids: List[int]): Dataclass representing a single claim. claim_text holds the normalized claim, decoded_claim carries the tokenizer-decoded span, sentence stores the sentence that supported the claim, and aligned_token_ids lists token indices aligned to the claim.
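To make the two shapes of a Claim concrete, the following is a minimal stand-in with the four documented fields. It is not copied from src/reclaim; representing "empty" alignment fields as empty strings and an empty list is an assumption, and the example values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Claim:
    """Illustrative stand-in with the four documented fields."""

    claim_text: str               # normalized claim
    decoded_claim: str            # tokenizer-decoded span
    sentence: str                 # sentence that supported the claim
    aligned_token_ids: List[int]  # token indices aligned to the claim


# Shape described for extract_claims: only claim_text is populated.
# Using "" and [] for the empty fields is an assumption.
unaligned = Claim("Paris is the capital of France.", "", "", [])

# Shape described for the aligning entry points (values are invented).
aligned = Claim(
    claim_text="Paris is the capital of France.",
    decoded_claim="Paris is the capital of France",
    sentence="Paris is the capital of France, and it lies on the Seine.",
    aligned_token_ids=[0, 1, 2, 3, 4, 5],
)
```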

class reclaim.extract_claims.ClaimModel: Pydantic model with claims: List[str] used to parse claim lists from LLM responses.

class reclaim.extract_claims.ClaimSentence: Pydantic model with sentence and related_words used when aligning claims to text spans.

class reclaim.extract_claims.ClaimSentences: Wrapper Pydantic model with sentences: List[ClaimSentence].
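The three response models nest as sketched below. The field names come from the descriptions above; the field types of ClaimSentence (str and List[str]) are assumptions, since this reference does not state them. The example needs pydantic installed.

```python
from typing import List

from pydantic import BaseModel


class ClaimModel(BaseModel):
    claims: List[str]


class ClaimSentence(BaseModel):
    sentence: str              # assumed type
    related_words: List[str]   # assumed type


class ClaimSentences(BaseModel):
    sentences: List[ClaimSentence]


# Pydantic coerces the nested dict into a ClaimSentence instance.
parsed = ClaimSentences(
    sentences=[
        {
            "sentence": "Paris is the capital of France.",
            "related_words": ["Paris", "capital", "France"],
        }
    ]
)
first = parsed.sentences[0]
```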

Extraction and Alignment

class reclaim.extract_claims.ClaimsExtractor(openai_chat: OpenAIChat, sent_separators: str = '.?!。？！\\n', language: str = 'en', progress_bar: bool = False, extraction_prompts: Dict[str, str] = CLAIM_EXTRACTION_PROMPTS, matching_prompts: Dict[str, str] = MATCHING_PROMPTS, n_threads: int = 1)

High-level orchestrator for claim extraction and alignment.

Initializes with prompt templates, language, and threading options.
Uses OpenAIChat to call OpenAI for both claim extraction and matching.

batch_claims_from_texts(texts: List[str], tokens: List[List[int]], tokenizer) → Dict[str, List]: Runs claims_from_text() across many texts concurrently via a ThreadPoolExecutor. Returns aligned claims per input text, preserving order.
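One common way to run a per-text worker concurrently while keeping results in input order is ThreadPoolExecutor.map, sketched here. The helper map_in_order and the stub worker are hypothetical names; whether ReClaim uses map or submits futures individually is not stated in this reference, and the sketch returns a plain list rather than the documented container.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def map_in_order(
    worker: Callable[[str, List[int]], List[str]],
    texts: List[str],
    tokens: List[List[int]],
    n_threads: int = 1,
) -> List[List[str]]:
    """Run worker(text, token_ids) per input; results follow input order."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Executor.map yields results in the order the inputs were given,
        # regardless of which thread finishes first.
        return list(pool.map(worker, texts, tokens))


def fake_worker(text: str, token_ids: List[int]) -> List[str]:
    # Stand-in for claims_from_text; no LLM call is made.
    return [f"{text}:{len(token_ids)}"]


results = map_in_order(
    fake_worker, ["a", "b", "c"], [[1], [1, 2], [1, 2, 3]], n_threads=2
)
```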

claims_from_text(text: str, tokens: List[int], tokenizer) → List[Claim]: Extract claims from a single generated text, identify supporting sentences, and align matched spans back to token indices. Filters out some known low-quality claims before alignment. Produces Claim objects with sentence provenance and token alignment.

_claims_from_sentence(sent: str, sent_tokens: List[int], tokenizer) → List[Claim]: Internal helper to extract claims from an individual sentence and align them within that sentence using prompt-driven matching.

_match_string(sent: str, match_words: List[str]) → str | None: Build a caret mask over sent that marks each of the match_words, searched in order, where it appears at a word boundary. Returns None if not all words are found.
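A caret mask is a string as long as the sentence, with "^" under matched characters and spaces elsewhere. The sketch below is one plausible reading of that description and is not the code in src/reclaim: caret_mask is a hypothetical name, the regex-based boundary test is an assumption, and a missing word yields None here, following the documented return type.

```python
import re
from typing import List, Optional


def caret_mask(sent: str, match_words: List[str]) -> Optional[str]:
    """Mark each word, searched left to right, with carets."""
    mask = [" "] * len(sent)
    cursor = 0  # each word must start at or after the previous match
    for word in match_words:
        # Require non-word characters (or string edges) on both sides.
        pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")
        found = pattern.search(sent, cursor)
        if found is None:
            return None
        for i in range(found.start(), found.end()):
            mask[i] = "^"
        cursor = found.end()
    return "".join(mask)


sent = "The cat sat on the mat."
mask = caret_mask(sent, ["cat", "mat"])
print(sent)
print(mask)
```

Because the search cursor only moves forward, passing the same words out of order (["mat", "cat"]) fails in this sketch.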

_match_string_zh(sent: str, match_words: List[str]) → str | None: Chinese-specific matcher that marks contiguous spans of characters corresponding to each match word; returns None if any span is missing.
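Chinese text has no spaces between words, so a matcher cannot rely on word boundaries and can instead look for each word as a contiguous substring. The following sketch illustrates that idea only; caret_mask_zh is a hypothetical name, and searching the words in left-to-right order is an assumption.

```python
from typing import List, Optional


def caret_mask_zh(sent: str, match_words: List[str]) -> Optional[str]:
    """Mark each word as a contiguous character span; no boundary check."""
    mask = [" "] * len(sent)
    cursor = 0
    for word in match_words:
        start = sent.find(word, cursor)
        if start == -1:
            return None  # a span is missing
        end = start + len(word)
        mask[start:end] = ["^"] * len(word)
        cursor = end
    return "".join(mask)


sent_zh = "巴黎是法国的首都。"
mask_zh = caret_mask_zh(sent_zh, ["巴黎", "首都"])
```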

_align(sent: str, match_str: str, sent_tokens: List[int], tokenizer) → List[int]: Convert a character-level match mask into token indices by decoding tokens and checking for overlap with carets in match_str.
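The mask-to-token step can be sketched by decoding tokens one at a time, tracking the character offset each decoded piece occupies, and keeping the tokens whose span overlaps a caret. ToyTokenizer and align_mask_to_tokens are hypothetical stand-ins. The sketch assumes the decoded pieces concatenate exactly to the sentence, which real tokenizers do not always guarantee, and it returns positions within sent_tokens rather than vocabulary ids.

```python
from typing import Dict, List


class ToyTokenizer:
    """Hypothetical tokenizer exposing only decode(); for illustration."""

    def __init__(self, vocab: Dict[int, str]) -> None:
        self.vocab = vocab

    def decode(self, token_ids: List[int]) -> str:
        return "".join(self.vocab[i] for i in token_ids)


def align_mask_to_tokens(
    match_str: str, sent_tokens: List[int], tokenizer: ToyTokenizer
) -> List[int]:
    """Return positions in sent_tokens whose decoded text overlaps a caret."""
    aligned: List[int] = []
    offset = 0
    for position, token_id in enumerate(sent_tokens):
        piece = tokenizer.decode([token_id])
        if "^" in match_str[offset:offset + len(piece)]:
            aligned.append(position)
        offset += len(piece)
    return aligned


tokenizer = ToyTokenizer(
    {10: "The", 11: " cat", 12: " sat", 13: " on", 14: " the", 15: " mat", 16: "."}
)
sent_tokens = [10, 11, 12, 13, 14, 15, 16]
sent = "The cat sat on the mat."
# Carets under "cat" (characters 4-6) and "mat" (characters 19-21).
match_str = " " * 4 + "^^^" + " " * 12 + "^^^" + " "
aligned = align_mask_to_tokens(match_str, sent_tokens, tokenizer)
```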

match_claim(text: str, claim: str, max_parsed_words: int): Query the LLM for sentences and related words supporting claim, construct a full-text caret mask, and pick the sentence with the richest match. Returns the mask and the best supporting sentence.
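This reference does not define "richest match". As an illustration only, the sketch below scores each candidate sentence by the number of carets in its mask and keeps the highest scorer. The scoring rule and the name pick_richest are assumptions, not ReClaim's documented behaviour.

```python
from typing import List, Optional, Tuple


def pick_richest(
    candidates: List[Tuple[str, Optional[str]]]
) -> Optional[Tuple[str, str]]:
    """Return the (sentence, mask) pair with the most carets, if any."""
    best: Optional[Tuple[str, str]] = None
    best_score = 0
    for sentence, mask in candidates:
        if mask is None:
            continue  # matching failed for this sentence
        score = mask.count("^")  # assumed scoring rule
        if score > best_score:
            best, best_score = (sentence, mask), score
    return best


candidates = [
    ("A cat.", "  ^^^ "),
    ("The cat sat.", "    ^^^ ^^^ "),
    ("No match.", None),
]
best = pick_richest(candidates)
```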

Annotation

class reclaim.annotate_claims.ClaimsAnnotator(openai_model: str = 'gpt-4o', progress_bar: bool = True, n_threads: int = 1)

Thin wrapper that labels claims for faithfulness and factuality.

annotate_claims(claims: list[str], contexts: list[str], language: str = 'en'): Annotate many claims in parallel against their paired contexts, returning model responses (typically boolean label pairs).

_annotate_claim(claim: str, context: str, language: str = 'en'): Internal worker invoked in the thread pool; sends the annotation prompt and returns the parsed result.

Text Decomposition

reclaim.decompose.doc2sentences(doc: str, mode: str = 'independent_sentences', model: str = 'gpt-4o', system_role: str = 'You are good at decomposing and decontextualizing text.', num_retries: int = 5, schema: BaseModel | None = None) → List[str]: Core utility that asks an LLM to decompose a document. The mode selects the appropriate prompt (plain sentences, independent sentences, claims, or atomic claims). Retries failed requests up to num_retries and optionally parses structured outputs via a Pydantic schema.
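The retry behaviour can be pictured as a bounded loop around the request. The sketch below is a generic pattern, not the code in reclaim.decompose: call_with_retries is a hypothetical helper, and both the choice to catch every exception and the choice to raise once attempts run out are assumptions.

```python
from typing import Callable, List, Optional


def call_with_retries(
    request: Callable[[], List[str]], num_retries: int = 5
) -> List[str]:
    """Call request() up to num_retries times; return the first success."""
    last_error: Optional[Exception] = None
    for _attempt in range(num_retries):
        try:
            return request()
        except Exception as error:  # broad on purpose for the sketch
            last_error = error
    raise RuntimeError(f"all {num_retries} attempts failed") from last_error


attempts = {"count": 0}


def flaky_request() -> List[str]:
    # Stand-in for an LLM call that fails twice before succeeding.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return ["Sentence one.", "Sentence two."]


sentences = call_with_retries(flaky_request, num_retries=5)
```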

OpenAI Client Utilities

class reclaim.openai_client.OpenAIChat(openai_model: str = 'gpt-4o', base_url: str | None = None, cache_path: str = '~/.cache', timeout: int = 600, max_tokens: int | None = None, rewrite_cache: bool = False)

Lightweight chat client used across extraction and annotation.

Reads OPENAI_API_KEY from the environment.
Stores the cache path as metadata only; no cache implementation is included.
Supports optional base URL, timeout, and completion length limits.

ask(message: str, schema: BaseModel | None = None): Send a prompt to OpenAI with a configurable system role. If schema is provided, uses responses.parse; otherwise returns the string content while filtering common boilerplate refusals.

_send_request(messages, schema: BaseModel | None = None): Internal request dispatcher that retries failed requests with progressive backoff across several wait intervals.
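Progressive backoff means waiting longer after each failed attempt before trying again. The sketch below shows the general pattern. The interval values, the name send_with_backoff, and the injectable sleep parameter are all hypothetical; the actual intervals used by _send_request are not listed in this reference.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def send_with_backoff(
    send: Callable[[], T],
    wait_intervals: Sequence[float] = (1.0, 5.0, 15.0),  # made-up values
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try send(); after each failure wait the next, longer interval."""
    for wait in wait_intervals:
        try:
            return send()
        except Exception:  # broad on purpose for the sketch
            sleep(wait)
    # Final attempt after the last wait; any error propagates to the caller.
    return send()


waits = []
calls = {"n": 0}


def flaky_send() -> str:
    # Stand-in for a request that times out twice, then succeeds.
    calls["n"] += 1
    if calls["n"] <= 2:
        raise TimeoutError("simulated timeout")
    return "ok"


# Recording the waits instead of sleeping keeps the example fast.
result = send_with_backoff(flaky_send, sleep=waits.append)
```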