What happened?
Description:
Usage information and billing details are not recorded for the gpt-4o-transcribe series, which is built on an LLM and charged per input audio token.
Conditions and context:
- Docker image: ghcr.io/berriai/litellm-database:main-latest (pulled 11-02)
- The `input_cost_per_audio_token` parameter is set in the configuration file, though it should be optional (right?).
- The gpt-4o-transcribe and gpt-4o-mini-transcribe series are speech transcription models built on an LLM. Pricing is not based on traditional audio duration; it is calculated from input audio tokens and output tokens, similar to the billing model of an LLM.
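For context, the expected token-based cost for the request below can be sketched as follows. The function name is illustrative, and the rates are taken from the model-map values that appear in the spend-tracking log (`input_cost_per_audio_token=3e-06`, `output_cost_per_token=5e-06`):

```python
# Illustrative sketch of token-based transcription pricing.
# Rates mirror the model-map values shown in the spend-tracking log;
# the function itself is hypothetical, not a litellm API.
def transcription_cost(input_audio_tokens: int, output_tokens: int,
                       input_cost_per_audio_token: float = 3e-06,
                       output_cost_per_token: float = 5e-06) -> float:
    """Cost = audio input tokens * audio rate + output tokens * text rate."""
    return (input_audio_tokens * input_cost_per_audio_token
            + output_tokens * output_cost_per_token)

# Using the token counts from the raw response below (784 in, 235 out):
print(round(transcription_cost(784, 235), 6))  # 0.003527
```

With usage tracked correctly, the `spend` field in the payload below would be expected to hold a value of this order instead of 0.0.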
Test result details:
- No usage information is present in the final result.
- No billing or usage information appears in the LiteLLM database; all values are zero.
- The raw_response output in the LiteLLM log can be confirmed to contain usage information.
Investigation result:
It appears that LiteLLM does not handle usage information and billing for audio transcription.
File: litellm/litellm_core_utils/llm_response_utils/convert_dict_to_response.py
Function: convert_to_model_response_object
Branch: `elif response_type == "audio_transcription" and ...`
Processing logic: unlike the other branches, this one does not extract usage information.
```python
elif response_type == "audio_transcription" and (
    model_response_object is None
    or isinstance(model_response_object, TranscriptionResponse)
):
    if response_object is None:
        raise Exception("Error in response object format")

    if model_response_object is None:
        model_response_object = TranscriptionResponse()

    if "text" in response_object:
        model_response_object.text = response_object["text"]

    optional_keys = ["language", "task", "duration", "words", "segments"]
    for key in optional_keys:  # not guaranteed to be in response
        if key in response_object:
            setattr(model_response_object, key, response_object[key])

    if hidden_params is not None:
        model_response_object._hidden_params = hidden_params

    if _response_headers is not None:
        model_response_object._response_headers = _response_headers

    return model_response_object
```
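A minimal sketch of the kind of usage extraction the other branches perform, applied to this one. This is an illustration under assumptions, not the actual fix: the `TranscriptionResponse` stand-in and the `attach_usage` helper below are hypothetical, and where exactly litellm should store the usage object is a design decision for the maintainers:

```python
# Hedged sketch: how the audio_transcription branch could also surface usage,
# mirroring what the chat-completion branches do.
# TranscriptionResponse here is a plain stand-in class for illustration only.
class TranscriptionResponse:
    def __init__(self):
        self.text = ""
        self._hidden_params = {}

def attach_usage(model_response_object, response_object):
    # OpenAI transcription responses report usage as
    # {"input_tokens": ..., "output_tokens": ..., "total_tokens": ...,
    #  "type": "tokens", ...}
    usage = response_object.get("usage")
    if usage and usage.get("type") == "tokens":
        model_response_object._hidden_params["usage"] = {
            "prompt_tokens": usage.get("input_tokens", 0),
            "completion_tokens": usage.get("output_tokens", 0),
            "total_tokens": usage.get("total_tokens", 0),
        }
    return model_response_object

# Using the usage object from the raw response in the log below:
resp = {"text": "hi", "usage": {"input_tokens": 784, "output_tokens": 235,
                                "total_tokens": 1019, "type": "tokens"}}
obj = attach_usage(TranscriptionResponse(), resp)
print(obj._hidden_params["usage"])
```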
Relevant log output
# RAW RESPONSE:
10:24:03 - LiteLLM:DEBUG: litellm_logging.py:1034 - RAW RESPONSE:
{"text": "===Omit here===", "logprobs": null, "usage": {"input_tokens": 784, "output_tokens": 235, "total_tokens": 1019, "type": "tokens", "input_token_details": {"audio_tokens": 784, "text_tokens": 0}}}
# spend_tracking info:
10:24:03 - LiteLLM Proxy:DEBUG: spend_tracking_utils.py:416 - SpendTable: created payload - payload: {
"request_id": "b570d307-127b-431b-999b-732f646bd156",
"call_type": "atranscription",
"api_key": "==omit here==",
"cache_hit": "None",
"startTime": "2025-11-05 10:23:48.110804+00:00",
"endTime": "2025-11-05 10:24:03.413610+00:00",
"completionStartTime": "2025-11-05 10:24:03.413610+00:00",
"model": "gpt-4o-mini-transcribe",
"user": "default_user_id",
"team_id": "",
"metadata": "{\"user_api_key\": \"==omit here==\", \"user_api_key_alias\": null, \"user_api_key_team_id\": null, \"user_api_key_org_id\": null, \"user_api_key_user_id\": \"default_user_id\", \"user_api_key_team_alias\": null, \"requester_ip_address\": \"\", \"applied_guardrails\": [], \"batch_models\": null, \"mcp_tool_call_metadata\": null, \"vector_store_request_metadata\": null, \"guardrail_information\": null, \"usage_object\": {\"completion_tokens\": 0, \"prompt_tokens\": 0, \"total_tokens\": 0, \"completion_tokens_details\": null, \"prompt_tokens_details\": null}, \"model_map_information\": {\"model_map_key\": \"gpt-4o-mini-transcribe\", \"model_map_value\": {\"key\": \"azure/gpt-4o-mini-transcribe\", \"max_tokens\": null, \"max_input_tokens\": 16000, \"max_output_tokens\": 2000, \"input_cost_per_token\": 1.25e-06, \"input_cost_per_token_flex\": null, \"input_cost_per_token_priority\": null, \"cache_creation_input_token_cost\": null, \"cache_read_input_token_cost\": null, \"cache_read_input_token_cost_flex\": null, \"cache_read_input_token_cost_priority\": null, \"cache_creation_input_token_cost_above_1hr\": null, \"input_cost_per_character\": null, \"input_cost_per_token_above_128k_tokens\": null, \"input_cost_per_token_above_200k_tokens\": null, \"input_cost_per_query\": null, \"input_cost_per_second\": null, \"input_cost_per_audio_token\": 3e-06, \"input_cost_per_token_batches\": null, \"output_cost_per_token_batches\": null, \"output_cost_per_token\": 5e-06, \"output_cost_per_token_flex\": null, \"output_cost_per_token_priority\": null, \"output_cost_per_audio_token\": null, \"output_cost_per_character\": null, \"output_cost_per_reasoning_token\": null, \"output_cost_per_token_above_128k_tokens\": null, \"output_cost_per_character_above_128k_tokens\": null, \"output_cost_per_token_above_200k_tokens\": null, \"output_cost_per_second\": null, \"output_cost_per_video_per_second\": null, \"output_cost_per_image\": null, \"output_vector_size\": null, 
\"citation_cost_per_token\": null, \"tiered_pricing\": null, \"litellm_provider\": \"azure\", \"mode\": \"audio_transcription\", \"supports_system_messages\": null, \"supports_response_schema\": null, \"supports_vision\": null, \"supports_function_calling\": null, \"supports_tool_choice\": null, \"supports_assistant_prefill\": null, \"supports_prompt_caching\": null, \"supports_audio_input\": null, \"supports_audio_output\": null, \"supports_pdf_input\": null, \"supports_embedding_image_input\": null, \"supports_native_streaming\": null, \"supports_web_search\": null, \"supports_url_context\": null, \"supports_reasoning\": null, \"supports_computer_use\": null, \"search_context_cost_per_query\": null, \"tpm\": null, \"rpm\": null, \"ocr_cost_per_page\": null, \"annotation_cost_per_page\": null, \"supported_openai_params\": [\"temperature\", \"n\", \"stream\", \"stream_options\", \"stop\", \"max_tokens\", \"max_completion_tokens\", \"tools\", \"tool_choice\", \"presence_penalty\", \"frequency_penalty\", \"logit_bias\", \"user\", \"function_call\", \"functions\", \"tools\", \"tool_choice\", \"top_p\", \"logprobs\", \"top_logprobs\", \"response_format\", \"seed\", \"extra_headers\", \"parallel_tool_calls\", \"prediction\", \"modalities\", \"audio\", \"web_search_options\"]}}, \"cold_storage_object_key\": null, \"additional_usage_values\": {}}",
"cache_key": "Cache OFF",
"spend": 0.0,
"total_tokens": 0,
"prompt_tokens": 0,
"completion_tokens": 0,
"request_tags": "[\"User-Agent: python-requests\", \"User-Agent: python-requests/2.32.3\"]",
"end_user": "==omit here==",
"api_base": "https://==omit here==/openai/",
"model_group": "azure/gpt-4o-mini-transcribe",
"model_id": "d78bb8ace1a641d937acbc010e3174c9ff935f17adb313a5f888747bd28844ed",
"mcp_namespaced_tool_name": null,
"requester_ip_address": "",
"custom_llm_provider": "azure",
"messages": "{}",
"response": "{\"text\": \"\"}",
"proxy_server_request": "{\"model\": \"azure/gpt-4o-mini-transcribe\"}",
"session_id": "af7d9117-5326-402c-8cfa-fe4479bff2eb",
"status": "success"
}
Are you a ML Ops Team?
No
What LiteLLM version are you on ?
v1.79.1.rc.2
Twitter / LinkedIn details
No response