
No Output was Generated Flask OpenAI Endpoint #1852

@RobertEichner

Description


Bug description

I have an endpoint written in Flask that mimics an OpenAI endpoint. When I try to use this endpoint in Hugging Face Chat UI, I get an error:

(screenshot of the Chat UI error)

Looking in the debugger, the response content is empty compared to other requests, even though my Flask reply looks like one from a working LLM.

Steps to reproduce

The Flask endpoint:

from flask import Flask, request, jsonify
from openai import OpenAI

app = Flask(__name__)

@app.route('/v1/chat/completions', methods=['POST'])
def chat_completion():
    try:
        # Get the JSON data from the request
        data = request.json
        print(data)

        # Here you would typically process the request and interact with your LLM
        # For this example, we'll just echo back a simple response
        client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
        response = ai_response("test", "", client)

        # Serialize the OpenAI response object into a plain dict
        tmp = openai_object_to_dict(response)

        return jsonify({
            "id": tmp["id"],
            "choices": [
                {
                    "finish_reason": tmp["choices"][0]["finish_reason"],
                    "logprobs": None,
                    "index": tmp["choices"][0]["index"],
                    "message": {
                        "content": tmp["choices"][0]["message"]["content"],
                        "role": tmp["choices"][0]["message"]["role"],
                    },
                }
            ],
            "created": tmp["created"],
            "model": tmp["model"],
            "object": tmp["object"],
            "system_fingerprint": tmp["system_fingerprint"],
            "usage": {
                "completion_tokens": tmp["usage"]["completion_tokens"],
                "prompt_tokens": tmp["usage"]["prompt_tokens"],
                "total_tokens": tmp["usage"]["total_tokens"],
            },
        }), 200
    except Exception as e:
        return jsonify({"error": str(e)}), 500
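One difference between Postman and Chat UI worth noting (this is an assumption on my part, not something I have confirmed): Chat UI may send "stream": true, in which case it expects a server-sent-events stream of chat.completion.chunk objects rather than a single JSON body, and a plain JSON reply can show up as empty content. A minimal sketch of handling both cases in Flask could look like this (the sse_chunk helper and the "hello" stand-in reply are hypothetical):

```python
import json
from flask import Flask, Response, request, jsonify

app = Flask(__name__)

def sse_chunk(delta, finish_reason=None):
    # One server-sent-events chunk in the OpenAI streaming format
    payload = {
        "id": "chatcmpl-demo",
        "object": "chat.completion.chunk",
        "created": 0,
        "model": "gemma-3-12b-it",
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    return f"data: {json.dumps(payload)}\n\n"

@app.route('/v1/chat/completions', methods=['POST'])
def chat_completion():
    data = request.json
    text = "hello"  # stand-in for the real model reply
    if data.get("stream"):
        def generate():
            yield sse_chunk({"role": "assistant"})
            yield sse_chunk({"content": text})
            yield sse_chunk({}, finish_reason="stop")
            yield "data: [DONE]\n\n"
        return Response(generate(), mimetype="text/event-stream")
    # Non-streaming fallback: a single JSON body, as in the endpoint above
    return jsonify({"choices": [{"index": 0, "finish_reason": "stop",
                                 "message": {"role": "assistant",
                                             "content": text}}]}), 200
```

Whether Chat UI actually requests streaming here would need to be verified by printing data.get("stream") in the endpoint.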

The code that serializes the OpenAI chat completion:

def openai_object_to_dict(obj, visited=None):
    """
    Recursively converts an OpenAI response/model object into a JSON-serializable dictionary
    by calling __dict__ on each nested object, if available.

    :param obj: The initial object (e.g., a ChatCompletion response from OpenAI).
    :param visited: A set to track visited objects and avoid infinite recursion for circular references.
    :return: A dictionary (or basic Python object) that can be serialized to JSON.
    """
    # Base types can be returned as-is. Check them before cycle tracking:
    # CPython caches small ints and interns some strings, so two equal
    # values can share an id, and the second occurrence would otherwise
    # be wrongly replaced by None.
    if isinstance(obj, (str, int, float, bool, type(None))):
        return obj

    if visited is None:
        visited = set()

    # Avoid infinite recursion in case of circular references
    obj_id = id(obj)
    if obj_id in visited:
        return None
    visited.add(obj_id)
    
    # If it's a list or tuple, process each item
    if isinstance(obj, (list, tuple)):
        return [openai_object_to_dict(item, visited) for item in obj]
    
    # If it's a dictionary, process each key/value
    if isinstance(obj, dict):
        return {k: openai_object_to_dict(v, visited) for k, v in obj.items()}

    # If the object has __dict__, convert it recursively
    if hasattr(obj, '__dict__'):
        # Here we convert all attributes, skipping those starting with '__' by convention
        return {
            k: openai_object_to_dict(v, visited)
            for k, v in obj.__dict__.items()
            if not k.startswith('__')
        }

    # Fallback: convert to string if we have no better way
    return str(obj)
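As a quick, self-contained sanity check of the serializer, it can be run on a small mock object (the Message and Choice classes below are hypothetical stand-ins for the OpenAI response types, and the serializer is repeated in condensed form so the snippet runs on its own):

```python
def openai_object_to_dict(obj, visited=None):
    # Condensed copy of the serializer above, for a self-contained demo
    if isinstance(obj, (str, int, float, bool, type(None))):
        return obj
    if visited is None:
        visited = set()
    if id(obj) in visited:
        return None
    visited.add(id(obj))
    if isinstance(obj, (list, tuple)):
        return [openai_object_to_dict(i, visited) for i in obj]
    if isinstance(obj, dict):
        return {k: openai_object_to_dict(v, visited) for k, v in obj.items()}
    if hasattr(obj, '__dict__'):
        return {k: openai_object_to_dict(v, visited)
                for k, v in obj.__dict__.items() if not k.startswith('__')}
    return str(obj)

class Message:
    def __init__(self, role, content):
        self.role, self.content = role, content

class Choice:
    def __init__(self, index, message):
        self.index, self.message, self.finish_reason = index, message, "stop"

result = openai_object_to_dict(Choice(0, Message("assistant", "hello")))
print(result)
# {'index': 0, 'message': {'role': 'assistant', 'content': 'hello'}, 'finish_reason': 'stop'}
```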

When I access the endpoint with Postman, I get this result (the content is not empty):

{
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "Ich kann auf Ihre Anfrage leider keine qualifizierte Antwort liefern. Versuchen Sie eine andere Anfrage. Vielen Dank!",
                "role": "assistant"
            }
        }
    ],
    "created": 1749719897,
    "id": "chatcmpl-9n3g3ffhyean0ha9wf94k",
    "model": "gemma-3-12b-it",
    "object": "chat.completion",
    "system_fingerprint": "gemma-3-12b-it",
    "usage": {
        "completion_tokens": 23,
        "prompt_tokens": 396,
        "total_tokens": 419
    }
}

As some background, I host gemma-3-12b-it on LM Studio. Accessing this model directly from the LM Studio server works with Chat UI.
However, I want to implement a simple RAG pipeline, and for this I need to wrap the model in a custom endpoint via Flask.
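The wrapping I have in mind is roughly this (a minimal sketch; the naive keyword-overlap retriever is a stand-in for a real vector store, and build_rag_messages is a hypothetical helper, not code from my endpoint):

```python
def build_rag_messages(question, documents, top_k=2):
    """Pick the documents most relevant to the question and prepend them
    as context, producing the messages list to forward to the model."""
    # Naive keyword-overlap scoring -- a stand-in for real vector search
    q_words = set(question.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    context = "\n".join(ranked[:top_k])
    return [
        {"role": "system", "content": "Answer using this context:\n" + context},
        {"role": "user", "content": question},
    ]
```

The Flask endpoint would call this on the incoming user message and pass the resulting messages list on to the LM Studio server.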

I access both endpoints the same way in Chat UI's .env.local:

{
      "name": "Local Gemma 12b",
      "description": "LLM",
      "promptExamples": [
        {
          "title": "What is a LLM?",
          "prompt": "What is a LLM?"
        }
      ],
      "endpoints": [
        {
         "type": "openai",
         "model": "gemma-3-12b-it",
         "baseURL": "http://localhost:1234/v1",
        }
      ],
    },
    {
      "name": "RAG",
      "description": "RAG",
      "promptExamples": [
        {
          "title": "What is a LLM?",
          "prompt": "What is a LLM?"
        }
      ],
      "endpoints": [
        {
         "type": "openai",
         "model": "gemma-3-12b-it",
         "baseURL": "http://localhost:5001/v1",
        }
      ],
} ...

Metadata

Labels: bug (Something isn't working)