Update perplexity cost tracking #15556
This fixes Claude's models via the Converse API, which should also fix Claude Code.
* fix(router): update model_name_to_deployment_indices on deployment removal

  When a deployment is deleted, the model_name_to_deployment_indices map was not being updated, causing stale index references. This could lead to incorrect routing behavior when deployments with the same model_name were dynamically removed.

  Changes:
  - Update _update_deployment_indices_after_removal to maintain model_name_to_deployment_indices mapping
  - Remove deleted indices and decrement indices greater than removed index
  - Clean up empty entries when no deployments remain for a model name
  - Update test to verify proper index shifting and cleanup behavior

* fix(router): remove redundant index building during initialization

  Remove duplicate index building operations that were causing unnecessary work during router initialization:
  1. Removed redundant `_build_model_id_to_deployment_index_map` call in __init__ - `set_model_list` already builds all indices from scratch
  2. Removed redundant `_build_model_name_index` call at end of `set_model_list` - the index is already built incrementally via `_create_deployment` -> `_add_model_to_list_and_index_map`

  Both indices (model_id_to_deployment_index_map and model_name_to_deployment_indices) are properly maintained as lookup indexes through existing helper methods. This change eliminates O(N) duplicate work during initialization without any behavioral changes. The indices continue to be correctly synchronized with model_list on all operations (add/remove/upsert).
…environment (#14929) Co-authored-by: sotazhang <[email protected]>
merge main
Add tiered pricing and cost calculation for xai
…_block_repair Add support for thinking blocks and redacted thinking blocks in Anthropic v1/messages API
(feat) Add voyage model integration in sagemaker
Add support for extended thinking in Anthropic's models via Bedrock's Converse API
* docs: fix doc
* docs(index.md): bump rc
* [Fix] GEMINI - CLI - add google_routes to llm_api_routes (#15500)
* fix: add google_routes to llm_api_routes
* test: test_virtual_key_llm_api_routes_allows_google_routes
* build: bump version
* bump: version 1.78.0 → 1.78.1
* add application level encryption in SQS
* add application level encryption in SQS

Co-authored-by: Krrish Dholakia <[email protected]>
Co-authored-by: Ishaan Jaff <[email protected]>
Co-authored-by: deepanshu <[email protected]>
…t/completions API with LiteLLM (#15509)

* docs: fix doc
* docs(index.md): bump rc
* [Fix] GEMINI - CLI - add google_routes to llm_api_routes (#15500)
* fix: add google_routes to llm_api_routes
* test: test_virtual_key_llm_api_routes_allows_google_routes
* add AnthropicCitation
* fix async_post_call_success_deployment_hook
* fix add vector_store_custom_logger to global callbacks
* test_e2e_bedrock_knowledgebase_retrieval_with_llm_api_call
* async_post_call_success_deployment_hook
* add async_post_call_streaming_deployment_hook
* async def test_e2e_bedrock_knowledgebase_retrieval_with_llm_api_call_streaming(setup_vector_store_registry):
* fix _call_post_streaming_deployment_hook
* fix async_post_call_streaming_deployment_hook
* test update
* docs: Accessing Search Results
* docs KB
* fix chatUI
* fix searchResults
* fix onSearchResults
* fix kb

Co-authored-by: Krrish Dholakia <[email protected]>
* docs: fix doc
* docs(index.md): bump rc
* [Fix] GEMINI - CLI - add google_routes to llm_api_routes (#15500)
* fix: add google_routes to llm_api_routes
* test: test_virtual_key_llm_api_routes_allows_google_routes
* build: bump version
* bump: version 1.78.0 → 1.78.1
* fix: KeyRequestBase
* fix rpm_limit_type
* fix dynamic rate limits
* fix use dynamic limits here
* fix _should_enforce_rate_limit
* fix _should_enforce_rate_limit
* fix counter
* test_dynamic_rate_limiting_v3
* use _create_rate_limit_descriptors

Co-authored-by: Krrish Dholakia <[email protected]>
Litellm fix mypy ruff errors1
def _get_openai_compatible_provider_info(
    self, api_base: Optional[str], api_key: Optional[str]
) -> Tuple[Optional[str], Optional[str]]:
    api_base = api_base or get_secret_str("PERPLEXITY_API_BASE") or "https://api.perplexity.ai"  # type: ignore
this won't work on python 3.8, please don't change this
encoding: Any,
api_key: Optional[str] = None,
json_mode: Optional[bool] = None,
encoding: Any,  # noqa: ANN401
same point ^
resolved
# Store cost in hidden params for the cost calculator to use
if not hasattr(model_response, "_hidden_params"):
    model_response._hidden_params = {}  # noqa: SLF001
if "additional_headers" not in model_response._hidden_params:  # noqa: SLF001
can we not avoid all these linting rules
resolved
api_key: Optional[str] = None,
json_mode: Optional[bool] = None,
encoding: Any,
api_key: str | None = None,
revert this - it will fail on python 3.8
Fixed
789eec0 to 5a637cc
please rebase with main
Closing this due to many conflicts. New one: #15743
Title
Add Perplexity cost extraction from API response
Relevant issues
Fixes #15547
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- Testing added in the tests/litellm/ directory. Adding at least 1 test is a hard requirement - see details
- Unit tests pass on make test-unit
Type
Bug Fix
Changes
This PR adds cost extraction functionality to Perplexity's chat transformation, allowing Perplexity to override LiteLLM's cost calculation by providing its own cost information in the API response.
I have moved the Perplexity provider to base_llm_http_handler with its own transformation methods. The cost is now overridden with the provider's value, because Perplexity returns the cost in the response itself.
This change ensures Perplexity users get accurate cost tracking that reflects the provider's actual pricing, improving cost transparency and accuracy for Perplexity API usage.
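For readers who want to see the shape of the override, here is a minimal sketch rather than the code in this PR. It assumes the response carries the cost under `usage.cost.total_cost`, and it uses a stand-in response class plus a hypothetical key name (`x-provider-response-cost`); neither is taken from LiteLLM. The provider-reported value is stored in the hidden params and preferred over a locally computed cost when present.

```python
from typing import Any, Dict, Optional

# Hypothetical key name, chosen for this sketch only.
COST_KEY = "x-provider-response-cost"


class FakeModelResponse:
    """Stand-in for a response object; not a LiteLLM class."""


def extract_provider_cost(raw_response: Dict[str, Any]) -> Optional[float]:
    """Return the provider-reported total cost, or None if it is absent.

    Assumes the cost lives at usage.cost.total_cost.
    """
    usage = raw_response.get("usage") or {}
    cost = usage.get("cost") or {}
    total = cost.get("total_cost")
    if isinstance(total, (int, float)) and not isinstance(total, bool):
        return float(total)
    return None


def store_cost(model_response: Any, cost: Optional[float]) -> None:
    """Store the cost in hidden params so a cost calculator can read it."""
    if cost is None:
        return
    if not hasattr(model_response, "_hidden_params"):
        model_response._hidden_params = {}
    headers = model_response._hidden_params.setdefault("additional_headers", {})
    headers[COST_KEY] = cost


def resolve_cost(model_response: Any, computed_cost: float) -> float:
    """Prefer the provider-reported cost; fall back to the computed one."""
    hidden = getattr(model_response, "_hidden_params", {})
    reported = hidden.get("additional_headers", {}).get(COST_KEY)
    return reported if reported is not None else computed_cost
```

When the response has no cost field, nothing is stored and `resolve_cost` returns the locally computed value, so providers that do not report cost are unaffected.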
Change in Perplexity API response: