ReClaim API Reference
=====================

This reference is written for Read the Docs and mirrors the current code in ``src/reclaim``. It covers the public API and the most relevant internal helpers, so that new contributors can see how text is decomposed into claims, how claims are aligned to tokens, and how they are annotated.

Top-Level Package
-----------------

.. module:: reclaim

.. py:function:: extract_claims(text: str, model: str = "gpt-4o") -> List[Claim]

   Extract atomic claims from plain text using the LLM prompt flow in ``doc2sentences``. Returns ``Claim`` objects with only ``claim_text`` populated; alignment fields are left empty.

.. py:function:: extract_and_align_claims(text, tokens, tokenizer, openai_model: str = "gpt-4o", progress_bar: bool = True, n_threads: int = 1)

   Extract claims from generated text and align each claim back to the token ids that produced it. Internally builds a :class:`~reclaim.extract_claims.ClaimsExtractor` wired to an :class:`~reclaim.openai_client.OpenAIChat` instance.

.. py:function:: batch_extract_and_align_claims(texts: List[str], tokens: List[List[int]], tokenizer, openai_model: str = "gpt-4o", progress_bar: bool = True, n_threads: int = 1) -> List[List[Claim]]

   Batch wrapper over ``ClaimsExtractor.batch_claims_from_texts`` that processes multiple model outputs in parallel. Returns one list of claims per input text, in input order.
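
The order-preserving parallel pattern behind the batch call can be illustrated with the standard library alone. This is a minimal sketch, not the package's implementation: ``batch_in_order`` and ``toy_worker`` are hypothetical names, and the worker is a stand-in that makes no LLM calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def batch_in_order(
    worker: Callable[[str, List[int]], List[str]],
    texts: Sequence[str],
    tokens: Sequence[List[int]],
    n_threads: int = 1,
) -> List[List[str]]:
    """Apply ``worker`` to each (text, token list) pair, keeping input order."""
    if len(texts) != len(tokens):
        raise ValueError("texts and tokens must have the same length")
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Executor.map yields results in submission order, not completion order.
        return list(pool.map(worker, texts, tokens))


def toy_worker(text: str, token_ids: List[int]) -> List[str]:
    """Hypothetical stand-in for per-text extraction: one 'claim' per sentence."""
    # Shorter inputs sleep longer, so completion order differs from input order.
    time.sleep(0.01 * (3 - min(len(token_ids), 3)))
    return [part.strip() for part in text.split(".") if part.strip()]


results = batch_in_order(
    toy_worker,
    ["Paris is in France. It is large.", "Rome is in Italy."],
    [[1, 2, 3, 4], [5]],
    n_threads=2,
)
```

Because ``Executor.map`` returns results in submission order, ``results[i]`` always belongs to ``texts[i]`` even when a later input finishes first.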

.. py:function:: annotate_claims(claims: List[str], contexts: List[str], openai_model: str = "gpt-4o", progress_bar: bool = True, n_threads: int = 1)

   Annotate claims for faithfulness and factuality relative to their contexts by delegating to :class:`~reclaim.annotate_claims.ClaimsAnnotator`.

Data Models
-----------

.. module:: reclaim.extract_claims

.. py:class:: Claim(claim_text: str, decoded_claim: str, sentence: str, aligned_token_ids: List[int])

   Dataclass representing a single claim. ``claim_text`` holds the normalized claim, ``decoded_claim`` carries the tokenizer-decoded span, ``sentence`` stores the sentence that supported the claim, and ``aligned_token_ids`` lists token indices aligned to the claim.
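
To make the four fields concrete, the sketch below defines a local stand-in with the same field names and builds one aligned and one unaligned claim. It is an illustration only: the stand-in is not imported from ``reclaim``, the token ids are invented, and representing "empty" alignment fields as ``""`` and ``[]`` is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Claim:
    """Local stand-in mirroring the documented fields; not the reclaim class."""

    claim_text: str
    decoded_claim: str
    sentence: str
    aligned_token_ids: List[int]


# Hypothetical token ids for the sentence "Paris is the capital of France."
sent_tokens = [101, 202, 303, 404, 505]

aligned = Claim(
    claim_text="Paris is the capital of France.",
    decoded_claim="Paris capital France",
    sentence="Paris is the capital of France.",
    aligned_token_ids=[0, 3, 4],
)

# Assumed shape of a claim whose alignment fields were never populated.
unaligned = Claim(
    claim_text="Paris is the capital of France.",
    decoded_claim="",
    sentence="",
    aligned_token_ids=[],
)

# The aligned values are positions in the token sequence, so the underlying
# token ids are recovered by indexing into it.
aligned_ids = [sent_tokens[i] for i in aligned.aligned_token_ids]
```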

.. py:class:: ClaimModel

   Pydantic model with ``claims: List[str]`` used to parse claim lists from LLM responses.

.. py:class:: ClaimSentence

   Pydantic model with ``sentence`` and ``related_words`` used when aligning claims to text spans.

.. py:class:: ClaimSentences

   Wrapper Pydantic model with ``sentences: List[ClaimSentence]``.

Extraction and Alignment
------------------------

.. py:class:: ClaimsExtractor(openai_chat: OpenAIChat, sent_separators: str = ".?!。？！\\n", language: str = "en", progress_bar: bool = False, extraction_prompts: Dict[str, str] = CLAIM_EXTRACTION_PROMPTS, matching_prompts: Dict[str, str] = MATCHING_PROMPTS, n_threads: int = 1)

   High-level orchestrator for claim extraction and alignment.

   - Initializes with prompt templates, language, and threading options.
   - Uses :class:`~reclaim.openai_client.OpenAIChat` to call OpenAI for both claim extraction and matching.

   .. py:method:: batch_claims_from_texts(texts: List[str], tokens: List[List[int]], tokenizer) -> Dict[str, List]

      Runs :meth:`claims_from_text` across many texts concurrently via a ``ThreadPoolExecutor``. Returns aligned claims per input text, preserving order.

   .. py:method:: claims_from_text(text: str, tokens: List[int], tokenizer) -> List[Claim]

      Extract claims from a single generated text, identify supporting sentences, and align matched spans back to token indices. Filters out some known low-quality claims before alignment. Produces ``Claim`` objects with sentence provenance and token alignment.

   .. py:method:: _claims_from_sentence(sent: str, sent_tokens: List[int], tokenizer) -> List[Claim]

      Internal helper to extract claims from an individual sentence and align them within that sentence using prompt-driven matching.

   .. py:method:: _match_string(sent: str, match_words: List[str]) -> Optional[str]

      Build a caret mask over ``sent`` that marks each of the ``match_words``, matched in order and only at word boundaries. Raises an error if any word cannot be found.

   .. py:method:: _match_string_zh(sent: str, match_words: List[str]) -> Optional[str]

      Chinese-specific matcher that marks contiguous spans of characters corresponding to each match word; returns ``None`` if any span is missing.

   .. py:method:: _align(sent: str, match_str: str, sent_tokens: List[int], tokenizer) -> List[int]

      Convert a character-level match mask into token indices by decoding tokens and checking for overlap with carets in ``match_str``.

   .. py:method:: match_claim(text: str, claim: str, max_parsed_words: int)

      Query the LLM for sentences and related words supporting ``claim``, construct a full-text caret mask, and pick the sentence with the richest match. Returns the mask and the best supporting sentence.
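
The caret-mask matching and mask-to-token alignment described for the private helpers above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in ``src/reclaim``: the function names are hypothetical, the missing-word case raises ``ValueError`` (the source only says it raises), the Chinese variant is assumed to search in order, and the decoded token pieces are assumed to concatenate back to the sentence exactly.

```python
import re
from typing import Callable, List, Optional


def caret_mask(sent: str, match_words: List[str]) -> str:
    """Mark each word, found in order at word boundaries, with '^'."""
    mask = [" "] * len(sent)
    pos = 0
    for word in match_words:
        found = re.compile(r"\b" + re.escape(word) + r"\b").search(sent, pos)
        if found is None:
            raise ValueError(f"match word not found in order: {word!r}")
        for i in range(found.start(), found.end()):
            mask[i] = "^"
        pos = found.end()
    return "".join(mask)


def caret_mask_zh(sent: str, match_words: List[str]) -> Optional[str]:
    """Mark contiguous character spans; return None if any span is missing."""
    mask = [" "] * len(sent)
    pos = 0
    for word in match_words:
        start = sent.find(word, pos)
        if start < 0:
            return None
        mask[start:start + len(word)] = "^" * len(word)
        pos = start + len(word)
    return "".join(mask)


def align_mask_to_tokens(
    mask: str, sent_tokens: List[int], decode: Callable[[List[int]], str]
) -> List[int]:
    """Return indices of tokens whose decoded span overlaps a caret."""
    aligned = []
    cursor = 0
    for index, token_id in enumerate(sent_tokens):
        piece = decode([token_id])
        if "^" in mask[cursor:cursor + len(piece)]:
            aligned.append(index)
        cursor += len(piece)
    return aligned


# Toy tokenizer: each token id indexes one text piece (an assumption).
VOCAB = ["Paris", " is", " the", " capital", " of", " France", "."]


def toy_decode(ids: List[int]) -> str:
    return "".join(VOCAB[i] for i in ids)


sentence = "Paris is the capital of France."
mask = caret_mask(sentence, ["Paris", "capital", "France"])
token_indices = align_mask_to_tokens(mask, list(range(len(VOCAB))), toy_decode)
zh_mask = caret_mask_zh("巴黎是法国的首都", ["巴黎", "首都"])
```

The mask has the same length as the sentence, so a token counts as aligned whenever any character in its decoded span sits under a caret. Real tokenizers do not always decode piece by piece into the exact original string, so a production version would need to handle offsets more carefully.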

Annotation
----------

.. module:: reclaim.annotate_claims

.. py:class:: ClaimsAnnotator(openai_model: str = "gpt-4o", progress_bar: bool = True, n_threads: int = 1)

   Thin wrapper that labels claims for faithfulness and factuality.

   .. py:method:: annotate_claims(claims: List[str], contexts: List[str], language: str = "en")

      Annotate many claims in parallel, each against its paired context. Returns the model responses, typically pairs of boolean labels.

   .. py:method:: _annotate_claim(claim: str, context: str, language: str = "en")

      Internal worker invoked in the thread pool; sends the annotation prompt and returns the parsed result.
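
One way to run a per-claim worker in a thread pool while still returning results in input order is to collect futures as they complete and write each result back to its original slot. The sketch below assumes a ``(faithful, factual)`` label pair and uses a hypothetical ``toy_judge`` in place of the LLM call; neither the names nor the label order come from the package.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Optional, Sequence, Tuple

Labels = Tuple[bool, bool]  # assumed shape: (faithful, factual)


def annotate_pairs(
    judge: Callable[[str, str], Labels],
    claims: Sequence[str],
    contexts: Sequence[str],
    n_threads: int = 1,
) -> List[Optional[Labels]]:
    """Judge each (claim, context) pair in a thread pool, keeping input order."""
    if len(claims) != len(contexts):
        raise ValueError("claims and contexts must have the same length")
    results: List[Optional[Labels]] = [None] * len(claims)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = {
            pool.submit(judge, claim, context): index
            for index, (claim, context) in enumerate(zip(claims, contexts))
        }
        # Futures finish in arbitrary order; the stored index restores it.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


KNOWN_FACTS = {"Paris is the capital of France."}  # toy knowledge base


def toy_judge(claim: str, context: str) -> Labels:
    """Hypothetical stand-in for the LLM annotation prompt."""
    return (claim in context, claim in KNOWN_FACTS)


labels = annotate_pairs(
    toy_judge,
    ["Paris is the capital of France.", "Rome is in Spain."],
    ["Paris is the capital of France. It is large.", "Rome is in Spain."],
    n_threads=2,
)
```

Iterating with ``as_completed`` also gives a natural place to advance a progress bar after each finished claim.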

Text Decomposition
------------------

.. module:: reclaim.decompose

.. py:function:: doc2sentences(doc: str, mode: str = "independent_sentences", model: str = "gpt-4o", system_role: str = "You are good at decomposing and decontextualizing text.", num_retries: int = 5, schema: Optional[BaseModel] = None) -> List[str]

   Core utility that asks an LLM to decompose a document. ``mode`` selects the prompt (plain sentences, independent sentences, claims, or atomic claims). Failed requests are retried up to ``num_retries`` times. If ``schema`` is given, the response is parsed into that Pydantic schema.
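
The retry behaviour can be pictured as a bounded loop around the request. The sketch below is an illustration, not the function's actual body: ``call_with_retries`` and ``flaky_request`` are hypothetical, and treating ``num_retries`` as the total number of attempts is an assumption.

```python
from typing import Callable, List, Optional


def call_with_retries(request: Callable[[], List[str]], num_retries: int = 5) -> List[str]:
    """Call ``request`` until it succeeds, at most ``num_retries`` times."""
    last_error: Optional[Exception] = None
    for _ in range(num_retries):
        try:
            return request()
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
    raise RuntimeError(f"request failed after {num_retries} attempts") from last_error


attempts = {"n": 0}


def flaky_request() -> List[str]:
    """Hypothetical request that fails twice before succeeding."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient failure")
    return ["Paris is the capital of France."]


sentences = call_with_retries(flaky_request, num_retries=5)
```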

OpenAI Client Utilities
-----------------------

.. module:: reclaim.openai_client

.. py:class:: OpenAIChat(openai_model: str = "gpt-4o", base_url: Optional[str] = None, cache_path: str = "~/.cache", timeout: int = 600, max_tokens: Optional[int] = None, rewrite_cache: bool = False)

   Lightweight chat client used across extraction and annotation.

   - Reads ``OPENAI_API_KEY`` from the environment.
   - Stores ``cache_path`` for reference only; no cache implementation is included here.
   - Supports optional base URL, timeout, and completion length limits.

   .. py:method:: ask(message: str, schema: Optional[BaseModel] = None)

      Send a prompt to OpenAI with a configurable system role. If ``schema`` is provided, uses ``responses.parse``; otherwise returns the string content while filtering common boilerplate refusals.

   .. py:method:: _send_request(messages, schema: Optional[BaseModel] = None)

      Internal request dispatcher that retries failed requests with progressive backoff, waiting through a series of increasing intervals between attempts.
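
Progressive backoff of this kind can be sketched as a loop over a fixed list of wait intervals. The example is a minimal illustration with assumed values: the intervals ``(1.0, 2.0, 4.0)``, the function name ``send_with_backoff``, and the injectable ``sleep`` argument are not taken from the package.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def send_with_backoff(
    send: Callable[[], T],
    wait_intervals: Sequence[float] = (1.0, 2.0, 4.0),  # assumed values
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Retry ``send``, sleeping for the next interval after each failure."""
    for wait in wait_intervals:
        try:
            return send()
        except Exception:  # a real client would catch narrower errors
            sleep(wait)
    # Final attempt after the last wait; any error now propagates.
    return send()


waits: List[float] = []
calls = {"n": 0}


def flaky_send() -> str:
    """Hypothetical request that times out twice before succeeding."""
    calls["n"] += 1
    if calls["n"] <= 2:
        raise TimeoutError("simulated timeout")
    return "ok"


# Record the waits instead of sleeping so the example runs instantly.
reply = send_with_backoff(flaky_send, sleep=waits.append)
```

Passing ``sleep`` as an argument keeps the waiting policy testable without real delays.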