Description
Required prerequisites
- I have searched the Issue Tracker and Discussions to confirm this hasn't already been reported. (+1 or comment there if it has.)
- Consider asking first in a Discussion.
Motivation
The current token counting implementation using BaseTokenCounter and its subclasses (OpenAITokenCounter,
AnthropicTokenCounter, LiteLLMTokenCounter, MistralTokenCounter) presents several significant challenges:
- Accuracy Issues: Manual token counting via tiktoken and other tokenizers is prone to inaccuracies, especially with:
  - Different model-specific tokenization rules (GPT-3.5, GPT-4, and O1 models each use different tokens_per_message and tokens_per_name values; see the sketch after this list)
  - Image token calculations for vision models, which require complex logic
  - Model-specific edge cases and special tokens
- Streaming Mode Limitations: Token counting in streaming mode is particularly problematic because:
  - The full response isn't available until streaming completes
  - Manual accumulation of streamed chunks is error-prone
  - OpenAI now supports stream_options: {"include_usage": true} to get accurate usage in the final chunk
- Maintenance Burden: Supporting all models requires:
  - Model-specific token counter implementations for each provider
  - Keeping up with changes in tokenization rules
  - Complex logic for different content types (text, images, structured outputs)
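For context, the sketch below shows the kind of model-specific counting logic that has to be maintained by hand. The constants follow OpenAI's published guidance for GPT-3.5/GPT-4-style chat models; the function is illustrative only, not CAMEL's actual implementation.

```python
import tiktoken


def count_chat_tokens(messages, model="gpt-4"):
    """Rough, model-specific estimate of prompt tokens (illustrative sketch)."""
    encoding = tiktoken.encoding_for_model(model)
    tokens_per_message = 3  # differs per model family (e.g. 4 for gpt-3.5-turbo-0301)
    tokens_per_name = 1     # also model-specific
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with an assistant message
    return num_tokens
```

Every new model family potentially changes these constants, which is exactly the maintenance burden described above.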
Proposed Solution
Deprecate BaseTokenCounter and its implementations in favor of using the native usage data from LLM responses:
- OpenAI/Compatible APIs: Use response.usage, which provides accurate prompt_tokens, completion_tokens, and total_tokens
- Streaming: Leverage stream_options: {"include_usage": true} to get usage data in the final streamed chunk (see the sketch after this list)
- Other providers: Each provider's SDK returns usage information in its response objects
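A minimal sketch of both paths using the OpenAI Python SDK (the model name and prompt are placeholders; other providers expose equivalent usage fields):

```python
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Hello"}]

# Non-streaming: usage is attached directly to the response object.
response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
usage = response.usage
print(usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)

# Streaming: request usage in the final chunk via stream_options.
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    stream=True,
    stream_options={"include_usage": True},
)
final_usage = None
for chunk in stream:
    if chunk.usage is not None:  # only the last chunk carries usage data
        final_usage = chunk.usage
print(final_usage.total_tokens)
```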
Benefits
- Accuracy: Usage data comes directly from the model provider, so counts match exactly what the provider reports and bills for
- Simplicity: Eliminates over 500 lines of complex token counting code
- Maintainability: No need to update tokenization logic when providers change their models
- Streaming support: Native support for token usage in streaming responses
- Universal compatibility: All major LLM providers include usage data in their responses
Migration Path
- Update model implementations to extract and return usage data from native responses
- Provide a deprecation warning for BaseTokenCounter usage (see the sketch after this list)
- Update documentation and examples to use the new approach
- Remove BaseTokenCounter and related code in a future major version
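A hypothetical sketch of the deprecation-warning step, assuming the warning is raised from the class constructor (this is not the actual CAMEL code):

```python
import warnings


class BaseTokenCounter:  # illustrative stand-in for the existing class
    """Deprecated: token usage now comes from provider response usage data."""

    def __init__(self) -> None:
        warnings.warn(
            "BaseTokenCounter is deprecated and will be removed in a future "
            "major version; read token usage from the model response instead.",
            DeprecationWarning,
            stacklevel=2,
        )
```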
Code References
- Token counting implementation: camel/utils/token_counting.py:77-544
- Usage data already captured in some models: camel/models/litellm_model.py:217
- Streaming with usage example: examples/agents/chatagent_stream.py:44
Solution
No response
Alternatives
No response
Additional context
No response