batching

Batch text chunks together to fit within a model's context window.

Functions:

    batch_chunks_by_token_limit
        Batch text chunks together to fit within a model's context window.

batch_chunks_by_token_limit

batch_chunks_by_token_limit(chunks: Iterable[str], model: Model, buffer_percentage: float = 0.2, joiner: str = '\n\n---\n\n') -> Iterator[str]

Batch text chunks together to fit within a model's context window.

This function takes an iterable of text chunks and combines them into larger batches that will fit within the specified model's context window. It uses token estimation to determine how many chunks can be combined safely.

Chunks are combined with a joiner (defined by the joiner parameter) to help the model distinguish between different chunks in the same batch. The joiner is surrounded by double newlines for better visual separation.

This batching optimizes API usage by reducing the number of calls to the LLM provider, which can help avoid rate limiting errors for repositories with many commits or other text chunks.

The size of each chunk is deliberately overestimated to be conservative, so that the chunk is very likely to fit within the context window. The size of each batch is estimated by summing the estimated sizes of its chunks, which is itself an approximation.
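For orientation, here is a minimal usage sketch. The ExampleModel class below is a hypothetical stand-in that only mimics the context_window_size attribute this function reads; in real code you would pass an actual brag Model instance.

from dataclasses import dataclass

from brag.batching import batch_chunks_by_token_limit


@dataclass
class ExampleModel:
    """Hypothetical stand-in exposing only the attribute this function reads."""

    context_window_size: int


chunks = [f"commit message {i}: ..." for i in range(100)]
model = ExampleModel(context_window_size=128_000)  # assumed window size, for illustration only

for batch in batch_chunks_by_token_limit(chunks, model, buffer_percentage=0.2):
    # Each batch is a single string of joined chunks that fits the token budget.
    print(len(batch))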

Parameters:

    chunks (Iterable[str], required)
        An iterable of strings, where each string is a chunk of text.

    model (Model, required)
        The model to use, which determines the context window size.

    buffer_percentage (float, default 0.2)
        A percentage (0.0 to 1.0) of the context window to reserve as buffer.
        This helps prevent exceeding the context window due to estimation errors.
        Higher values (e.g., 0.3) are more conservative, while lower values
        (e.g., 0.1) allow more chunks per batch but increase the risk of
        exceeding the limit. See the worked example after this parameter list.

    joiner (str, default '\n\n---\n\n')
        The string to use for joining chunks when batching them together.
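As a quick worked example of how buffer_percentage interacts with the context window (the 128,000-token window below is a hypothetical figure, not a value taken from any particular model):

context_window_size = 128_000  # hypothetical context window, in tokens
buffer_percentage = 0.2        # the default

# Mirrors the computation inside batch_chunks_by_token_limit:
max_tokens_per_batch = int(context_window_size * (1 - buffer_percentage))
print(max_tokens_per_batch)  # 102400 -> each batch targets at most ~102k estimated tokens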

Yields:

    str
        Batches of text chunks, each fitting within the model's context window.

Source code in src/brag/batching.py
def batch_chunks_by_token_limit(
    chunks: Iterable[str],
    model: Model,
    buffer_percentage: float = 0.2,
    joiner: str = "\n\n---\n\n",
) -> Iterator[str]:
    """Batch text chunks together to fit within a model's context window.

    This function takes an iterable of text chunks and combines them into larger
    batches that will fit within the specified model's context window. It uses
    token estimation to determine how many chunks can be combined safely.

    Chunks are combined with a joiner (defined by the `joiner` parameter) to help
    the model distinguish between different chunks in the same batch. The joiner is
    surrounded by double newlines for better visual separation.

    This batching optimizes API usage by reducing the number of calls to the LLM provider,
    which can help avoid rate limiting errors for repositories with many commits or
    other text chunks.

    The size of each chunk is overestimated to be conservative, that is, to ensure that the chunk
    will fit within the context window. The size of each batch is estimated by summing the sizes of
    its chunks, which is also an approximation.

    Args:
        chunks: An iterable of strings, where each string is a chunk of text.
        model: The model to use, which determines the context window size.
        buffer_percentage: A percentage (0.0 to 1.0) of the context window to reserve as buffer.
            This helps prevent exceeding the context window due to estimation errors.
            Higher values (e.g., 0.3) are more conservative, while lower values (e.g., 0.1)
            allow for more chunks per batch but increase the risk of exceeding the limit.
        joiner: The string to use for joining chunks when batching them together.

    Yields:
        Batches of text chunks, each fitting within the model's context window.
    """
    # Reserve buffer_percentage of the model's context window as a safety margin
    max_tokens_per_batch = int(model.context_window_size * (1 - buffer_percentage))

    current_batch: str = ""
    current_batch_tokens = 0

    for chunk in chunks:
        # Use the overestimate strategy to be conservative, that is, to ensure that the chunk
        # will fit within the context window.
        chunk_tokens = estimate_token_count(chunk, approximation_mode="overestimate")

        # If adding this chunk would exceed the limit, yield the current batch and start a new one
        if current_batch and (
            current_batch_tokens + chunk_tokens > max_tokens_per_batch
        ):
            yield current_batch
            current_batch = chunk
            current_batch_tokens = chunk_tokens
        else:
            current_batch = promptify(current_batch, chunk, joiner=joiner)
            current_batch_tokens += chunk_tokens

    # Add the last batch if it's not empty
    if current_batch:
        yield current_batch
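
To make the control flow above easier to follow in isolation, here is a self-contained, simplified sketch of the same greedy accumulation strategy. It swaps the brag helpers for crude stand-ins (a characters-divided-by-four token estimate and plain string joining), so it is illustrative only, not the library's implementation.

from collections.abc import Iterable, Iterator


def rough_token_count(text: str) -> int:
    """Crude stand-in for a token estimator: roughly one token per four characters."""
    return max(1, len(text) // 4)


def batch_by_budget(
    chunks: Iterable[str],
    max_tokens_per_batch: int,
    joiner: str = "\n\n---\n\n",
) -> Iterator[str]:
    """Greedily pack chunks into batches that stay under a token budget."""
    current_batch = ""
    current_tokens = 0

    for chunk in chunks:
        chunk_tokens = rough_token_count(chunk)
        if current_batch and current_tokens + chunk_tokens > max_tokens_per_batch:
            # Budget exceeded: emit the accumulated batch and start over with this chunk.
            yield current_batch
            current_batch = chunk
            current_tokens = chunk_tokens
        else:
            current_batch = chunk if not current_batch else current_batch + joiner + chunk
            current_tokens += chunk_tokens

    if current_batch:
        yield current_batch


# Tiny demonstration with an artificially small budget:
batches = list(batch_by_budget(["a" * 40, "b" * 40, "c" * 40], max_tokens_per_batch=25))
print(len(batches))  # 2 -- the first two chunks fit together, the third starts a new batch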