tokens

Estimate the number of tokens in a text.

Functions:

| Name | Description |
| --- | --- |
| `estimate_token_count` | Estimate the number of tokens in a text. |

estimate_token_count

estimate_token_count(text: str, approximation_mode: Literal['underestimate', 'overestimate'] = 'overestimate') -> int

Estimate the number of tokens in a text.

This is a character-count heuristic, not a real tokenizer, so use it only for rough comparisons.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `text` | `str` | The text to estimate the token count of. | *required* |
| `approximation_mode` | `Literal['underestimate', 'overestimate']` | How to approximate. `"underestimate"` assumes about 5 characters per token and rounds down, giving a low estimate; useful, e.g., to flag text that is likely too long even in the best case. `"overestimate"` assumes about 3 characters per token and rounds up, giving a high estimate; useful to stay conservatively below the context window limit or to budget the maximum number of tokens that may be used. Neither is a guaranteed bound. | `'overestimate'` |
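To make the two modes concrete, the sketch below restates the heuristic from the source (about 5 characters per token rounded down, versus about 3 characters per token rounded up) as a local function so the snippet runs on its own. In a project you would presumably import the real function instead, e.g. `from brag.tokens import estimate_token_count`; that import path is an assumption inferred from the source file location.

```python
import math
from typing import Literal


def estimate_token_count(
    text: str,
    approximation_mode: Literal["underestimate", "overestimate"] = "overestimate",
) -> int:
    # Local copy of the documented heuristic, only so this example is self-contained.
    if approximation_mode == "underestimate":
        return math.floor(len(text) / 5)  # ~5 characters per token, rounded down
    return math.ceil(len(text) / 3)  # ~3 characters per token, rounded up


text = "The quick brown fox jumps over the lazy dog."
print(len(text))                                    # → 44
print(estimate_token_count(text, "underestimate"))  # → 8
print(estimate_token_count(text))                   # → 15 (default: "overestimate")
```

For this 44-character sentence the two modes bracket a range of 8 to 15 tokens; the real count depends on the model's tokenizer.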

Source code in src/brag/tokens.py

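```python
import math
from typing import Literal, assert_never  # assert_never requires Python 3.11+


def estimate_token_count(
    text: str,
    approximation_mode: Literal["underestimate", "overestimate"] = "overestimate",
) -> int:
    """Estimate the number of tokens in a text.

    This is a character-count heuristic, not a real tokenizer, so use it
    only for rough comparisons.

    Args:
        text: The text to estimate the token count of.
        approximation_mode: How to approximate.
            - "underestimate": Assume ~5 characters per token and round down.
              Useful as a low estimate, e.g. to flag text that is likely too
              long even in the best case.
            - "overestimate": Assume ~3 characters per token and round up.
              Useful as a high estimate, e.g. to stay conservatively below
              the context window limit.

    Returns:
        The estimated number of tokens.
    """
    match approximation_mode:
        case "underestimate":
            # ~5 characters per token; floor keeps the estimate low.
            return math.floor(len(text) / 5)
        case "overestimate":
            # ~3 characters per token; ceil keeps the estimate high.
            return math.ceil(len(text) / 3)
        case never:
            # Exhaustiveness check: type checkers flag unhandled modes,
            # and at runtime assert_never raises for invalid input.
            assert_never(never)
```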
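One way the `"overestimate"` mode might be used is to pack text chunks into a fixed token budget. The helper `select_chunks_within_budget` below is hypothetical (it is not part of `brag.tokens`), and the estimator is again a local copy of the heuristic so the snippet is self-contained:

```python
import math
from typing import Literal


def estimate_token_count(
    text: str,
    approximation_mode: Literal["underestimate", "overestimate"] = "overestimate",
) -> int:
    # Local copy of the heuristic above, only so this example runs on its own.
    if approximation_mode == "underestimate":
        return math.floor(len(text) / 5)
    return math.ceil(len(text) / 3)


def select_chunks_within_budget(chunks: list[str], max_tokens: int) -> list[str]:
    """Hypothetical helper: keep chunks in order until the budget would be exceeded.

    Uses the "overestimate" mode so the selected text is unlikely to exceed
    max_tokens once it is actually tokenized.
    """
    selected: list[str] = []
    used = 0
    for chunk in chunks:
        cost = estimate_token_count(chunk, "overestimate")
        if used + cost > max_tokens:
            break  # stop at the first chunk that does not fit
        selected.append(chunk)
        used += cost
    return selected


chunks = ["a" * 30, "b" * 30, "c" * 30]  # each chunk: ceil(30 / 3) = 10 tokens
print(len(select_chunks_within_budget(chunks, max_tokens=25)))  # → 2
```

Summing per-chunk estimates rounds up once per chunk, so it never undercounts relative to estimating the concatenated text. Any separators added when the chunks are later joined are not counted here.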
