
Refactor + Add support for /v1/chat/completions #2

Merged
dvcrn merged 17 commits into main from refactor on Oct 8, 2025
Conversation

@dvcrn (Owner) commented Oct 8, 2025

No description provided.

- Introduce StreamGenerateContent to forward raw SSE lines to a provided
  channel. Handle 401 by refreshing the token and retrying. Set Accept:
  text/event-stream and increase the scanner buffer for large events. The
  function does not close the output channel.
- Change StreamGenerateContent to take a context and use
  NewRequestWithContext; set Accept to text/event-stream and log scanner
  errors. Add SSE keepalives and client-cancel handling in the stream
  handler.
- Parse the request as GeminiInternalRequest; introduce typed content,
  tool, and generation-config structs. Add the /v1/chat/completions
  route. Update deps: uuid, testify, and indirect test utilities.
- Implement an SSE handler bridging to Gemini and normalizing models. Add
  OpenAI types for chat, messages, and tools. Convert OpenAI messages to
  Gemini contents and config. Stream CloudCode SSE through an
  OpenAI-compatible transformer. Add unit tests for the OpenAI→Gemini
  transform.
- Add FunctionCall and FunctionResponse to ContentPart. Convert OpenAI
  tools to Gemini tools in the request transform. Log detailed request,
  model-normalization, and streaming events. Emit tool-call chunks for
  function calls in the SSE stream.
- Resolve tool response names via tool_call_id or an explicit name. Map
  assistant tool_calls to functionCall and tool outputs to
  functionResponse. Extend openai.Message with ToolCalls, ToolCallID, and
  Name. Fix INVALID_ARGUMENT by ensuring functionResponse.name is always
  set. Add a test covering tool response name resolution.
- Implement non-streaming /v1/chat/completions with OpenAI-style JSON.
  Add a /v1/models endpoint returning Gemini model metadata. Refactor
  chatCompletionsHandler to support both streaming and non-streaming
  requests. Add a Gemini-to-OpenAI response transformer. Improve logging
  for tool calls and responses.
- Add a test to ensure tool_calls and tool responses are matched and
  aggregated into a single user turn, satisfying CloudCode's function
  call/response parity requirement.
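The name-resolution rule described in the commits above (explicit name first, then a tool_call_id lookup against the assistant's earlier tool_calls) can be sketched roughly as follows. The types and field names here are simplified stand-ins for illustration, not the project's real internal/openai definitions:

```go
package main

import "fmt"

// Hypothetical, simplified shapes standing in for the real OpenAI
// message types; field names are assumptions for illustration.
type ToolFunction struct {
	Name string
}

type ToolCall struct {
	ID       string
	Function ToolFunction
}

type Message struct {
	Role       string
	Name       string
	ToolCallID string
	ToolCalls  []ToolCall
}

// resolveToolName picks the function name for a tool-result message:
// prefer an explicit Name, otherwise look up the assistant's earlier
// tool_calls by ToolCallID. CloudCode rejects a functionResponse
// without a name (INVALID_ARGUMENT), so the transform must always
// resolve one.
func resolveToolName(msg Message, history []Message) string {
	if msg.Name != "" {
		return msg.Name
	}
	for _, m := range history {
		if m.Role != "assistant" {
			continue
		}
		for _, tc := range m.ToolCalls {
			if tc.ID == msg.ToolCallID {
				return tc.Function.Name
			}
		}
	}
	return ""
}

func main() {
	history := []Message{
		{Role: "user"},
		{Role: "assistant", ToolCalls: []ToolCall{
			{ID: "call_1", Function: ToolFunction{Name: "get_weather"}},
		}},
	}
	toolMsg := Message{Role: "tool", ToolCallID: "call_1"}
	fmt.Println(resolveToolName(toolMsg, history)) // get_weather
}
```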
@gemini-code-assist

Summary of Changes

Hello @dvcrn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive support for the OpenAI /v1/chat/completions API endpoint, enabling OpenAI-compatible clients to seamlessly integrate with the Gemini Code Assist API. Key changes include the addition of new packages for OpenAI types and transformations, refactoring of the core Gemini client with token refresh logic, and a restructured server to handle new routes and delegate requests. The build system was also updated from just to mise, and documentation was revised accordingly.

Highlights

  • Intent: This pull request primarily aims to add support for the OpenAI /v1/chat/completions endpoint, enabling OpenAI-compatible clients to interact with the Gemini Code Assist API through this proxy. It also includes significant refactoring to improve code structure and maintainability.
  • Key Changes: The core changes involve introducing a new internal/openai package to define OpenAI API types and handle streaming transformations from Gemini's SSE format to OpenAI's SSE format. A new internal/transform package was added to manage the conversion logic between OpenAI and Gemini request/response structures, including complex mapping for messages, system instructions, and tool calls/responses. The internal/gemini client was enhanced with GenerateContent and StreamGenerateContent methods, incorporating automatic token refresh. The server now includes dedicated handlers for /v1/chat/completions (streaming and non-streaming) and /v1/models (providing a static list of supported models). The build system was migrated from just to mise, and documentation (README.md, CLAUDE.md) was updated to reflect these changes and new API support.
  • Refactoring: The HTTP client implementation was moved from internal/server to a new internal/http package for better modularity. The main server routes were reorganized into dedicated handler files (chatCompletionsHandler.go, modelsHandler.go, streamGenerateContentHandler.go) to separate concerns and improve readability.
  • New Features: Full support for the OpenAI /v1/chat/completions endpoint, including streaming and non-streaming requests, message role mapping (user, assistant, tool), system instructions, and tool call/response handling. A /v1/models endpoint was added to list available Gemini models in an OpenAI-compatible format.
  • Documentation and Build: Updated README.md and CLAUDE.md to reflect the new OpenAI API support, installation instructions, and migration from just to mise for task management. New mise.toml file replaces the old justfile.
  • Impact: These changes significantly expand the proxy's compatibility, allowing a wider range of AI clients (those built for OpenAI's API) to leverage the Gemini Code Assist backend. The refactoring improves the maintainability and extensibility of the codebase.

@dvcrn dvcrn merged commit ad55c5d into main Oct 8, 2025
1 check failed

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces significant and valuable changes, including a major refactor and the addition of an OpenAI-compatible endpoint. The new features are well-implemented with thorough testing.

I've identified a few key areas for improvement:

  • Critical Bug: There's a bug in the token refresh retry logic that will cause requests to fail after a token expires. I've provided a detailed comment and suggestion to fix this.
  • Code Duplication: The token refresh logic is duplicated in several places. Centralizing this would improve maintainability and prevent bugs like the one mentioned above.
  • Documentation: The README.md has been simplified, but in the process, some very useful documentation about the proxy's transformation logic was removed. I'd recommend restoring it or moving it to a separate file.

Overall, this is a great contribution. Addressing the critical issues will make the proxy much more robust.

Comment on lines +71 to +93
// Check for 401 Unauthorized and attempt a token refresh
if resp.StatusCode == http.StatusUnauthorized {
    resp.Body.Close() // Close the first response body

    if err := c.provider.RefreshToken(); err != nil {
        return nil, fmt.Errorf("failed to refresh token: %w", err)
    }

    // Reload credentials after refresh
    refreshedCreds, err := c.provider.GetCredentials()
    if err != nil {
        return nil, fmt.Errorf("failed to reload credentials after refresh: %w", err)
    }

    // Re-create the request with the new token
    req.Header.Set("Authorization", "Bearer "+refreshedCreds.AccessToken)

    // Retry the request
    resp, err = c.httpClient.Do(req)
    if err != nil {
        return nil, fmt.Errorf("request execution error after refresh: %w", err)
    }
}


critical

This token refresh and retry logic is duplicated in LoadCodeAssist, GenerateContent, and StreamGenerateContent. This code duplication is problematic for a few reasons:

  1. Maintainability: Any changes to this logic must be manually applied in multiple places, which is error-prone.
  2. Bugs: It has led to a critical bug in this implementation.

Critical Bug
The request body is not being reset before the retry. After the first c.httpClient.Do(req) call, req.Body has been read and is at EOF. The retry will therefore send an empty request body, causing the upstream API call to fail silently or with a validation error. The implementation in internal/project/discover.go correctly handles this by recreating the body reader.

Refactoring Suggestion
To fix both the bug and the duplication, this logic should be centralized. The most idiomatic Go approach would be to create a custom http.RoundTripper that wraps the default transport. This RoundTripper would handle the 401 response, token refresh, and retry transparently for all HTTP calls made through the client.

Immediate Fix
If a full refactor is out of scope, you should at least extract this logic into a private helper method on *Client and fix the body reset issue. The suggestion below fixes the bug for this function. Please apply a similar fix to GenerateContent and StreamGenerateContent (noting the request variable is httpReq in those functions).

    // Check for 401 Unauthorized and attempt a token refresh
    if resp.StatusCode == http.StatusUnauthorized {
        resp.Body.Close() // Close the first response body

        if err := c.provider.RefreshToken(); err != nil {
            return nil, fmt.Errorf("failed to refresh token: %w", err)
        }

        // Reload credentials after refresh
        refreshedCreds, err := c.provider.GetCredentials()
        if err != nil {
            return nil, fmt.Errorf("failed to reload credentials after refresh: %w", err)
        }

        // Re-create the request with the new token and reset the body
        req.Header.Set("Authorization", "Bearer "+refreshedCreds.AccessToken)
        // CRITICAL: The request body must be reset for the retry.
        req.Body = io.NopCloser(bytes.NewReader(bodyBytes))

        // Retry the request
        resp, err = c.httpClient.Do(req)
        if err != nil {
            return nil, fmt.Errorf("request execution error after refresh: %w", err)
        }
    }


// StreamGenerateContent performs a streaming request and sends each raw SSE line to the provided channel.
// It does not transform or interpret SSE content; lines are forwarded as-is.
// The caller owns the lifecycle of the 'out' channel; this function will not close it.


critical

This comment is dangerously incorrect and contradicts the implementation. The comment states that the caller owns the channel lifecycle, but the function's implementation at line 258 (defer close(out)) correctly closes the channel.

If a caller follows this comment and also tries to close the channel, it will cause a panic (double close). The function should be responsible for closing the channel to signal to the receiver that the stream is complete. The implementation is correct, but the comment must be fixed to prevent misuse.

Suggested change
// The caller owns the lifecycle of the 'out' channel; this function will not close it.
// This function is responsible for closing the 'out' channel upon completion.

- `gemini-1.5-flash` → `gemini-2.5-flash`
- `gemini-pro-latest` → `gemini-2.5-pro`

### 3. Request Body Transformation


medium

The refactoring of the README has removed several sections that were very helpful for understanding how the proxy works, specifically the Request Body Transformation, Response Transformation, and Usage Examples sections.

While the new README is more concise, this information is crucial for users and contributors to understand the proxy's core functionality. Without it, it's much harder to know how to structure requests or what to expect in responses.

Suggestion:
Please consider re-adding these sections. If the goal is to keep the main README.md lean, this detailed information could be moved to a separate file like ARCHITECTURE.md or DETAILS.md and linked from here.

"testing"
)

func TestCreateOpenAIStreamTransformer_BasicText(t *testing.T) {


medium

For consistency with other new tests in this PR (e.g., openai_to_gemini_test.go), it would be better to use the testify library for assertions (assert and require). This makes tests more readable and provides better failure messages.

Comment on lines +402 to +417
openAIResp := map[string]interface{}{
    "id":      "chatcmpl",
    "object":  "chat.completion",
    "created": created,
    "model":   req.Model,
    "choices": []map[string]interface{}{
        {
            "index": 0,
            "message": map[string]interface{}{
                "role":    "assistant",
                "content": contentText,
            },
            "finish_reason": "stop",
        },
    },
}


medium

Instead of manually constructing the response as a map[string]interface{}, it's better to use the openai.ChatCompletionResponse struct you've already defined. This provides type safety, makes the code cleaner, and reduces the chance of typos in JSON keys.

Additionally, the current implementation uses a hardcoded, non-unique id ("chatcmpl"). OpenAI API responses have unique IDs. You should generate a unique ID for each response, for example using the uuid package which is already a dependency in this project.

Note: You will need to add fmt and github.com/google/uuid to the imports for this file.

Suggested change
openAIResp := map[string]interface{}{
    "id":      "chatcmpl",
    "object":  "chat.completion",
    "created": created,
    "model":   req.Model,
    "choices": []map[string]interface{}{
        {
            "index": 0,
            "message": map[string]interface{}{
                "role":    "assistant",
                "content": contentText,
            },
            "finish_reason": "stop",
        },
    },
}
openAIResp := openai.ChatCompletionResponse{
    ID:      fmt.Sprintf("chatcmpl-%s", uuid.New().String()),
    Object:  "chat.completion",
    Created: created,
    Model:   req.Model,
    Choices: []openai.Choice{
        {
            Index: 0,
            Message: openai.Message{
                Role:    "assistant",
                Content: contentText,
            },
            FinishReason: "stop",
        },
    },
}


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +159 to +163
// Re-create the request with the new token
httpReq.Header.Set("Authorization", "Bearer "+refreshedCreds.AccessToken)

// Retry the request
resp, err = c.httpClient.Do(httpReq)


P1: Retry after 401 reuses consumed request body

When the first generateContent call returns 401, the code updates the Authorization header and retries the same httpReq. However the request was constructed with a bytes.NewReader and after the first Do the body has been consumed and closed, so the second Do will either send an empty payload or fail with http: Request Body errors. Any expired token will therefore cause non‑streaming completions to fail even after refresh. Rebuild the http.Request (or reset the body via GetBody) before retrying.


Comment on lines +235 to +239
// Re-create the request with the new token
httpReq.Header.Set("Authorization", "Bearer "+refreshedCreds.AccessToken)

// Retry the request
resp, err = c.httpClient.Do(httpReq)


P1: Streaming retry drops request payload after token refresh

The streaming path handles a 401 by refreshing the token and reusing the same httpReq. Because the request body was already read during the first Do, the retry sends an empty body and the upstream SSE request fails, meaning token refresh does not work for streaming calls. Similar to the non‑streaming method, the retry must rebuild the request or reset its body before issuing the second Do.

