chore(providers): bump Gemini defaults to current GA models (#370)
Conversation
Bundles two upstream PRs into one chore — both are blocking real users today and both are simple default-string bumps with no API-contract change.

**LLM default** (was PR #368, @yut304)

- `gemini-2.0-flash` is deprecated in Google's Gemini API and returns 429 rate-limit errors under load. Replace the default with `gemini-flash-latest`.
- Users on a pinned `GEMINI_MODEL` in `~/.agentmemory/.env` are unaffected.

**Embedding default** (was PR #246, @AmmarSaleh50)

- `text-embedding-004` is deprecated (shutdown Jan 14 2026). Replace with `gemini-embedding-001` (GA): 100+ languages, MRL dims (768 / 1536 / 3072), 2048-token input.
- URL path changes from `:batchEmbedContent` to `:batchEmbedContents` (plural — the new model's batch endpoint).
- Each request now sends `outputDimensionality: 768` so the returned vectors match the existing index dim guard from #248 — no reindex needed.
- L2-normalize each returned vector before pushing to the result array. `gemini-embedding-001` does not normalize by default, unlike `text-embedding-004`. Without this, the cosine-similarity math elsewhere in the search pipeline (which assumes unit-length vectors) collapses.

**Verified**

- `npm test` clean: 903 / 903.
- `npm run build` clean.

Closes #368, closes #246.
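Taken together, the embedding changes amount to: build a `:batchEmbedContents` payload with an explicit output dimensionality, then unit-normalize whatever comes back. A minimal TypeScript sketch of that shape; the `MODEL` and `DIMS` constants and helper names here are illustrative assumptions, not the PR's exact code:

```typescript
// Illustrative sketch of the request/response handling described above.
// MODEL, DIMS, and the helper names are assumptions, not the shipped code.
const MODEL = "models/gemini-embedding-001";
const DIMS = 768;

// Payload for POST .../models/gemini-embedding-001:batchEmbedContents
function buildBatchRequest(chunk: string[]) {
  return {
    requests: chunk.map((t) => ({
      model: MODEL,
      content: { parts: [{ text: t }] },
      // Match the existing 768-dim index guard; the model defaults to 3072.
      outputDimensionality: DIMS,
    })),
  };
}

// gemini-embedding-001 does not return unit-length vectors by default,
// so each embedding is L2-normalized before entering the index.
function l2Normalize(vec: number[]): number[] {
  const norm = Math.sqrt(vec.reduce((s, x) => s + x * x, 0));
  return norm === 0 ? vec : vec.map((x) => x / norm);
}
```

The normalization step is what keeps the downstream cosine-similarity math valid, since that pipeline assumes unit-length vectors.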
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/providers/embedding/gemini.ts (1)
**58-65:** ⚡ Quick win: Consider logging or throwing on zero-norm vectors.

The function silently returns the unnormalized vector when `norm === 0` (line 62). A zero-norm embedding from the API would indicate a problem upstream, but this implementation swallows it. Consider logging a warning or throwing an error to surface the issue rather than injecting an unnormalized (zero) vector into results that are expected to be unit-length for cosine similarity.

🔍 Proposed enhancement

```diff
 function l2Normalize(vec: Float32Array): Float32Array {
   let sum = 0;
   for (let i = 0; i < vec.length; i++) sum += vec[i]! * vec[i]!;
   const norm = Math.sqrt(sum);
-  if (norm === 0) return vec;
+  if (norm === 0) {
+    throw new Error("Cannot normalize zero-length embedding vector");
+  }
   for (let i = 0; i < vec.length; i++) vec[i] = vec[i]! / norm;
   return vec;
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/providers/embedding/gemini.ts` around lines 58 - 65, The l2Normalize function currently returns the original array when norm === 0, silently allowing zero-length embeddings; update l2Normalize to surface this upstream error by either throwing a descriptive Error (e.g., "zero-norm embedding returned from upstream") or logging a warning with context before failing, and ensure callers can handle the exception; refer to the function name l2Normalize and modify its norm === 0 branch to throw or log (and return a safe fallback only if explicitly wanted), including details such as the embedding length or source to aid debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/config.ts`:
- Line 79: The default model string used for the config key "model" (fallback
when env["GEMINI_MODEL"] is unset) should not use the auto-updating alias
"gemini-flash-latest"; change the fallback to a stable pinned identifier such as
"gemini-2.5-flash" so production behavior is deterministic, i.e., update the
expression that sets model (the `model: env["GEMINI_MODEL"] ||
"gemini-flash-latest",` assignment) to use a stable model name as the default.
In `@src/providers/embedding/gemini.ts`:
- Around line 32-36: The requests payload is using camelCase key
outputDimensionality which Gemini expects as snake_case output_dimensionality;
update the chunk.map(...) object so the property is output_dimensionality:
this.dimensions (instead of outputDimensionality) wherever you build requests
for MODEL in the embedding/gemini provider, ensuring any other occurrences of
outputDimensionality are renamed to output_dimensionality so the API receives
the intended 768-dimension vector setting.
---
Nitpick comments:
In `@src/providers/embedding/gemini.ts`:
- Around line 58-65: The l2Normalize function currently returns the original
array when norm === 0, silently allowing zero-length embeddings; update
l2Normalize to surface this upstream error by either throwing a descriptive
Error (e.g., "zero-norm embedding returned from upstream") or logging a warning
with context before failing, and ensure callers can handle the exception; refer
to the function name l2Normalize and modify its norm === 0 branch to throw or
log (and return a safe fallback only if explicitly wanted), including details
such as the embedding length or source to aid debugging.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 69176075-be9c-4340-a799-ed5ea3ec84c2
📒 Files selected for processing (3)
README.md, src/config.ts, src/providers/embedding/gemini.ts
```diff
 requests: chunk.map((t) => ({
-  model: "models/text-embedding-004",
+  model: MODEL,
   content: { parts: [{ text: t }] },
+  outputDimensionality: this.dimensions,
 })),
```
🧩 Analysis chain
🌐 Web query:
Gemini gemini-embedding-001 outputDimensionality parameter reduce dimensions
💡 Result:
For the Gemini embedding model gemini-embedding-001, the parameter you use to reduce the embedding dimensionality is `output_dimensionality` / `outputDimensionality`.

1) What it does
- By default, gemini-embedding-001 returns 3072-dimensional embeddings. [1]
- You can pass output_dimensionality (or outputDimensionality) to control the size of the output embedding vector; smaller values truncate the embedding to the requested size (to save storage and improve downstream efficiency). [1][2]

2) Parameter name(s) by API surface
- Gemini API (ai.google.dev docs): use output_dimensionality in the embed content request. [3]
- Vertex AI Generative AI API (cloud.google.com docs): use output_dimensionality in EmbedContentConfig (shown as output_dimensionality=... in examples). [1]
- The Vertex AI model reference also describes outputDimensionality as an optional int used to specify the output embedding size; it truncates if set. [2]

3) Practical example (what to change)
- Request without dimensionality: defaults to 3072 dims. [1]
- Request with dimensionality: set output_dimensionality/outputDimensionality to e.g. 768, 1536, or 3072 (Google recommends 768/1536/3072 for quality/storage tradeoffs). [3][4]

4) Common gotcha
- Ensure your client/library actually sends the parameter; otherwise you'll get the default 3072-dimension vectors, which can break downstream vector databases expecting the reduced dimension. [5]

So, to "reduce dimensions" for gemini-embedding-001, set output_dimensionality (Gemini API) or outputDimensionality/output_dimensionality (Vertex AI) in the embedContent request config; e.g., 768 instead of the default 3072. [1][2][3]
Citations:
- 1: https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings#:~:text=For%20gemini%2Dembedding%2D001%20%2C,of%20the%20output%20embedding%20vector.
- 2: https://docs.cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api
- 3: https://ai.google.dev/gemini-api/docs/embeddings
- 4: https://developers.googleblog.com/gemini-embedding-available-gemini-api/
- 5: Google Gemini embedding model outputDimensionality not working vercel/ai#8033
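The MRL truncation behavior described in the query result can be mimicked client-side. This sketch is an illustration of the technique (keep the leading dims, then renormalize), not code from the PR or from any Google client library:

```typescript
// Illustrative: client-side equivalent of requesting a smaller
// outputDimensionality from an MRL-trained embedding model.
// Keep the leading dims, then L2-renormalize so downstream cosine
// math still sees unit-length vectors.
function truncateEmbedding(vec: number[], dims: number): number[] {
  const head = vec.slice(0, dims);
  const norm = Math.sqrt(head.reduce((s, x) => s + x * x, 0));
  return norm === 0 ? head : head.map((x) => x / norm);
}
```

In practice you would let the API do this by sending the dimensionality parameter, as the query result recommends; the sketch only shows why truncation plus renormalization preserves usable vectors.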
🏁 Script executed:
```shell
cat -n src/providers/embedding/gemini.ts | head -50
```

Repository: rohitg00/agentmemory
Length of output: 2110
Change outputDimensionality to output_dimensionality (snake_case).
The code targets the Gemini API (ai.google.dev), which expects the parameter as output_dimensionality in snake_case, not outputDimensionality in camelCase (which is the Vertex AI convention). Using the wrong parameter name will cause the API to ignore it and return default 3072-dimensional vectors instead of the intended 768 dimensions, breaking compatibility with existing indexes.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/providers/embedding/gemini.ts` around lines 32 - 36, The requests payload
is using camelCase key outputDimensionality which Gemini expects as snake_case
output_dimensionality; update the chunk.map(...) object so the property is
output_dimensionality: this.dimensions (instead of outputDimensionality)
wherever you build requests for MODEL in the embedding/gemini provider, ensuring
any other occurrences of outputDimensionality are renamed to
output_dimensionality so the API receives the intended 768-dimension vector
setting.
…norm

Addresses CodeRabbit findings on PR #370.

1. Pin Gemini LLM default to gemini-2.5-flash.

   `gemini-flash-latest` is a moving alias that points to whatever Google promotes next. Production behaviour should be deterministic from a release perspective — users who upgrade agentmemory should not also get a Gemini model rotation in the same step. Switch the default to the current stable GA model `gemini-2.5-flash`. Users who want the moving alias keep getting it via `GEMINI_MODEL=gemini-flash-latest` in `~/.agentmemory/.env`.

2. Warn-once on zero-norm embedding in l2Normalize.

   `gemini-embedding-001` can return a zero-norm vector for degenerate input. The previous code silently returned the zero vector — downstream cosine-similarity math then divides by zero and the call site sees `NaN` scores with no signal as to why. Emit a one-time stderr warning naming the model + vector length so operators can correlate index quality dips with upstream embedding regressions. Behaviour otherwise unchanged: return the zero vector and let BM25 carry the search signal. Throwing was the other option — rejected because a single bad embedding in a 100-item batch would abort the whole batch and surface as an indexing pipeline halt. Soft-fail + warn matches the rest of the embedding provider's error handling.

Skipped finding:

- `outputDimensionality` → `output_dimensionality` snake_case rename. CodeRabbit asserts the REST API expects snake_case. The Gemini REST API actually uses camelCase on the wire — confirmed against ai.google.dev/api/embeddings (field labelled `outputDimensionality` in the REST schema; the Python SDK alone uses snake_case and translates internally). Current code is correct as-shipped; the snake_case rename would silently break the dim override.

Verified: 903 / 903 tests pass; build clean.
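The warn-once soft-fail in point 2 can be sketched as follows; the module-level flag, message wording, and signature are assumptions for illustration, not the shipped code:

```typescript
// Warn once on the first zero-norm vector, then soft-fail: return the
// zero vector unchanged so one bad embedding cannot abort a whole batch.
// Flag name and message wording are assumptions, not the shipped code.
let warnedZeroNorm = false;

function l2Normalize(vec: number[], model = "gemini-embedding-001"): number[] {
  const norm = Math.sqrt(vec.reduce((s, x) => s + x * x, 0));
  if (norm === 0) {
    if (!warnedZeroNorm) {
      warnedZeroNorm = true;
      console.error(
        `warn: ${model} returned a zero-norm embedding (length ${vec.length}); ` +
          "leaving it unnormalized, BM25 carries the search signal",
      );
    }
    return vec;
  }
  return vec.map((x) => x / norm);
}
```

This matches the trade-off in the commit message: a warning surfaces the upstream problem without halting the indexing pipeline the way a throw would.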
…loy templates + Gemini GA bumps (#383)

* chore(release): v0.9.13 — env-example discovery + CJK tokenizer + load harness + deploy templates + Gemini GA bumps + 14 advisories closed

Six PRs landed since v0.9.12:
- #372 .env.example discovery (this commit) — repo-root template + `init` CLI command + CI sync-checker
- #362 CJK BM25 tokenizer (`@node-rs/jieba` + tiny-segmenter + Hangul)
- #363 `benchmark/load-100k.ts` harness with p50/p90/p99 + per-release results dir
- #361 one-click deploy templates for fly.io / Railway / Render / Coolify (multi-stage Dockerfile, `iiidev/iii` base, `gosu` privilege drop, first-boot HMAC, verified end-to-end on fly.io)
- #364 Python ecosystem via `iii-sdk` example (replaces closed PR #360)
- #370 Gemini GA bumps (LLM default → gemini-2.5-flash, embedding → gemini-embedding-001 + L2-norm + 768 dims)

Plus 14 open Dependabot advisories closed in PR #348 via Next.js → 16.2.6 and PostCSS → 8.5.10 overrides.

Bumped:
- src/version.ts: VERSION 0.9.12 → 0.9.13
- package.json: 0.9.12 → 0.9.13, files += ".env.example", build script copies .env.example into dist/
- packages/mcp/package.json: 0.9.12 → 0.9.13 (lockstep with main)
- plugin/.claude-plugin/plugin.json, plugin/.codex-plugin/plugin.json: 0.9.12 → 0.9.13
- src/types.ts: ExportData.version union extended with "0.9.13"
- src/functions/export-import.ts: supportedVersions Set extended
- test/export-import.test.ts: expected version updated

New surface:
- .env.example at repo root — every env var read by src/ documented in one place, grouped by surface (LLM, embedding, auth, search tuning, behaviour flags, CLI runtime, ports, iii engine pin, Claude Code bridge, Obsidian export). Every line commented out by default so the file is a template.
- agentmemory init — copies bundled .env.example to ~/.agentmemory/.env if absent, refuses to overwrite, prints a diff command. Wired into CLI dispatch + help block.
- scripts/check-env-example.mjs — walks src/ for env-read patterns, fails CI on drift in either direction. Plugged into ci.yml after npm test. Initial bootstrap: 60 keys in sync.

Verified: npm test 903/903, npm run build clean, init smoke pass (creates ~/.agentmemory/.env on first run, refuses overwrite on second).

* fix(init): atomic copy via COPYFILE_EXCL; address CodeRabbit review

Two valid findings from the CodeRabbit pass on PR #383.

1. `runInit` race between existsSync(target) + copyFile(template, target).

   A parallel `agentmemory init` (or any other process touching ~/.agentmemory/.env between the two calls) would silently overwrite the config the operator just wrote. Switch to a single atomic `copyFile(template, target, fsConstants.COPYFILE_EXCL)` and treat the EEXIST error as the "already configured" signal — same warning + diff hint as before, but the check + copy now happen in one syscall so they cannot race. Other failure paths still surface as process exit 1.

2. Comment on `scripts/check-env-example.mjs::walk` claimed it matched ".ts / .mts / .mjs" but the regex also matched ".js". Rewrote the comment to match the regex (".ts / .mts / .mjs / .js"). Same comment pass: noted that test/ never enters because the walk is rooted at src/, not because of an explicit skip.

Skipped findings:
- WHAT-style comment on `findEnvExample` — kept a one-liner explaining the package-vs-source priority since both paths are real; reduced the block from 4 lines to 2 instead of removing it entirely.
- "Add trailing newline to .env.example" — file already ends with `\n` (verified `tail -c 5` shows `tion\n`).

Verified locally:
- `npm run build` clean.
- `npm test` 903 / 903 pass.
- First `agentmemory init` against a clean HOME creates the file.
- Second init against the same HOME hits EEXIST and prints the "leaving it untouched" warning + diff hint without overwriting.
- `node scripts/check-env-example.mjs` — in sync (60 keys).
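The atomic-copy pattern from finding 1 can be sketched as follows; the function name, return values, and paths are illustrative, not agentmemory's exact `runInit`:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// One copyFile call with COPYFILE_EXCL replaces the racy
// existsSync(target) + copyFile(template, target) pair: the copy itself
// fails with EEXIST when the target already exists, so check and copy
// cannot be interleaved with a parallel init.
function initConfig(template: string, target: string): "created" | "already-configured" {
  try {
    fs.copyFileSync(template, target, fs.constants.COPYFILE_EXCL);
    return "created";
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return "already-configured";
    throw err; // any other failure still surfaces to the caller
  }
}
```

The EEXIST branch is the "already configured" signal described in the commit: the operator's existing file is left untouched.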
Summary
Bundles two upstream PRs into one chore — both block real users today and both are default-string bumps with zero API-contract change.
LLM default
`gemini-2.0-flash` is deprecated in Google's Gemini API and returns 429 rate-limit errors under load. Default switches to `gemini-flash-latest`.

Users on a pinned `GEMINI_MODEL` in `~/.agentmemory/.env` are unaffected — defaults only.

Embedding default

`text-embedding-004` is deprecated (shutdown Jan 14 2026). Default switches to `gemini-embedding-001` (GA): 100+ languages, MRL dims (768 / 1536 / 3072), 2048-token input.

Three implementation details that go with the model swap:

- `:batchEmbedContent` → `:batchEmbedContents` (plural; the new model's batch endpoint).
- `outputDimensionality: 768` — sent on every request so returned vectors match `GeminiEmbeddingProvider.dimensions = 768` and the index-restore dim guard from PR "fix(embedding): guard provider responses against dimension mismatches" #248 — no reindex needed for existing users.
- L2-normalize each returned vector. Unlike `text-embedding-004`, `gemini-embedding-001` does not normalize by default — without this the cosine-similarity math elsewhere in the search pipeline (which assumes unit-length vectors) silently collapses recall.

Closes

Closes #368 — @yut304's bump
Closes #246 — @AmmarSaleh50's bump

Test plan

- `npm test` passes — 903 / 903.
- `npm run build` clean.
- With `GEMINI_API_KEY=...` set, `npx agentmemory doctor` reports provider = llm, model = `gemini-flash-latest`.
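The recall-collapse claim above follows from how a normalized index is usually scored: cosine similarity is computed as a plain dot product, which is only a true cosine when both vectors are unit-length. A minimal illustration of that assumption (not the PR's search code):

```typescript
// For unit vectors, dot(a, b) equals cos(angle between a and b).
// An unnormalized vector scales the score by its magnitude, so its
// ranking against properly normalized index vectors is inflated.
function dot(a: number[], b: number[]): number {
  return a.reduce((s, x, i) => s + x * (b[i] ?? 0), 0);
}

const unit = [0.6, 0.8]; // unit length
const raw = [3, 4];      // same direction, length 5

// dot(unit, unit) is ~1 (a true cosine), while dot(unit, raw) is ~5,
// inflated by the magnitude of the unnormalized vector.
```

This is why the provider L2-normalizes every `gemini-embedding-001` vector before it enters the index.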