Conversation

@chrisraygill (Collaborator)

No description provided.

…erence

- Remove enterprise tip section to streamline content
- Update example prompts to be more educational and neutral
- Add detailed model capabilities table for Gemini series
- Document all available models across Gemini 2.5, 2.0, 1.5, and Gemma 3 series
- Include embedding and image model references
- Switch embedding example to use gemini-embedding-001 model
- Improve section organization and clarity
- Remove parentheses from response.text() and response.media() methods
- Upgrade default Imagen model from 3.0 to 4.0-generate-001
- Add Imagen 4 series models with quality variants
- Update gemini-2.0-flash capability from unavailable to experimental
- Simplify context caching documentation to focus on automatic caching
- Remove explicit caching examples for clarity
Restructure the Google GenAI documentation to better organize available models by category (text generation, embedding, image generation, video generation, TTS, and music generation). Add support for new Veo video models and Lyria music generation models, improving readability and discoverability of different model capabilities.
Remove '-preview' suffix from gemini-2.5-flash-image model name in documentation examples, reflecting the model's transition from preview to general availability.
Remove documentation and examples for Lyria RealTime music generation model, including usage examples, configuration options, and API
- Remove outdated Gemini 1.5 and 2.0 models from capabilities table
- Update Gemini 2.0 series description to indicate legacy status
- Add note about Google AI Files API documentation for file operations
- Keep focus on currently supported Gemini 2.5 models
…gle GenAI

Adds comprehensive documentation for accessing safety ratings from Gemini API responses and understanding grounding metadata when using Google Search grounding. Includes code examples and detailed explanations of metadata structure for better developer experience.
Update documentation to show accessing safety ratings from `response.custom` instead of `response.raw`. This reflects changes in how provider-specific metadata is accessed in the API response structure.
- Restructure safety ratings section with dedicated header
- Expand Google Search Grounding documentation with detailed workflow explanation
- Add comprehensive examples for accessing grounding metadata
- Improve organization and readability of integration guide
…e examples

- Replace GoogleAIFileManager with GoogleGenAI client usage
- Add complete file lifecycle examples (upload, get, list, delete)
- Include usage limits and auto-deletion details in notes section
- Update code examples to use modern API patterns
- Expand from basic example to comprehensive guide
Add comprehensive documentation for Google Maps grounding feature including basic usage, metadata access, widget integration, and combining with other tools. Provides TypeScript examples and explains how to leverage Google Maps data for location-aware AI responses.
Replace brief pass-through tools section with comprehensive URL Context documentation including usage examples, metadata access patterns, and tool combination strategies
Add comprehensive documentation for Gemini video processing capabilities including timestamp referencing, transcription with visual descriptions, custom video processing options, and best practices for video analysis workflows.
Add detailed documentation for Google GenAI authentication methods including environment variables, plugin configuration, and per-request API key override. Include code example and use cases for per-request API key option such as multi-tenant applications and cost tracking.

The following table shows the capabilities of popular Gemini models:

| Model | Image Input | Object Generation | Tool Usage | Tool Streaming | Thinking | Code Execution | Google Search |


image generation is missing here.


### File Inputs and YouTube URLs

Gemini models support various file types including PDFs and can process YouTube videos directly:
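For illustration, a rough sketch of what the accompanying example presumably looks like (import paths assume the `googleAI` export from the google-genai package discussed in this thread; the model name and URL are placeholders):

```typescript
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({ plugins: [googleAI()] });

// Media parts ride alongside the text prompt; one YouTube URL per request.
const response = await ai.generate({
  model: googleAI.model('gemini-2.5-flash'),
  prompt: [
    { text: 'Summarize the key points of this video.' },
    {
      media: {
        url: 'https://www.youtube.com/watch?v=EXAMPLE',
        contentType: 'video/mp4',
      },
    },
  ],
});
console.log(response.text);
```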


Gemini models support receiving different file types as inputs, including PDFs, images, videos, and YouTube URLs.

YouTube URLs (public or unlisted videos) are supported directly. You can specify one YouTube video URL per request.

### Video Understanding


Is this maybe duplicative of the "Main file input" section? It doesn't feel like it needs its own callout.

```typescript
const model = googleAI.model('imagen-4.0-generate-001');
```
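A possible usage sketch to go with this (assuming generated images come back on the `response.media` accessor, which the commit notes reference):

```typescript
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({ plugins: [googleAI()] });

const response = await ai.generate({
  model: googleAI.model('imagen-4.0-generate-001'),
  prompt: 'A watercolor painting of a lighthouse at dawn',
});

// The generated image is returned as a media part (typically a data URL).
console.log(response.media?.url);
```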

## Provider Options


I'd recommend maybe removing this as an overall header - since your headers are so nested you are losing visibility on the sidebar for the options you'd want to highlight. Perhaps this was intentional not to crowd the sidebar?

- `HARM_CATEGORY_HARASSMENT`
- `HARM_CATEGORY_SEXUALLY_EXPLICIT`

Available thresholds:


I think these deserve a brief description of what they mean.
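For reference, a hedged sketch of how these category/threshold pairs are usually passed (the `safetySettings` config field and enum strings mirror the Gemini API's safety settings; verify against the plugin's config type):

```typescript
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({ plugins: [googleAI()] });

const response = await ai.generate({
  model: googleAI.model('gemini-2.5-flash'),
  prompt: 'Tell me a story about a dragon.',
  config: {
    // Thresholds range from BLOCK_LOW_AND_ABOVE (strictest) through
    // BLOCK_MEDIUM_AND_ABOVE and BLOCK_ONLY_HIGH to BLOCK_NONE (off).
    safetySettings: [
      { category: 'HARM_CATEGORY_HARASSMENT', threshold: 'BLOCK_MEDIUM_AND_ABOVE' },
      { category: 'HARM_CATEGORY_SEXUALLY_EXPLICIT', threshold: 'BLOCK_ONLY_HIGH' },
    ],
  },
});
```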

### Context Caching

The following features are available through the `googleAI` plugin.
Gemini 2.5 models automatically cache common content prefixes, providing a 75% token discount on cached tokens:
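Because the caching is automatic, there is nothing to configure; the benefit comes from prompt structure. A sketch under that assumption (the file name and questions are placeholders):

```typescript
import { readFile } from 'node:fs/promises';
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({ plugins: [googleAI()] });

// Put the large, stable content first so repeated requests share a common
// prefix; cached prefix tokens are billed at the discounted rate.
const bigDocument = await readFile('contract.txt', 'utf8');

const summary = await ai.generate({
  model: googleAI.model('gemini-2.5-flash'),
  prompt: [{ text: bigDocument }, { text: 'Summarize section 2.' }],
});

const clauses = await ai.generate({
  model: googleAI.model('gemini-2.5-flash'),
  prompt: [{ text: bigDocument }, { text: 'List all termination clauses.' }],
});
```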


Maybe explain a bit more about why this is helpful here.


### Code Execution

Certain models can generate and execute Python code for calculations and problem-solving. Enable code execution in the model configuration:
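A minimal sketch of what that configuration might look like (the `codeExecution` flag is assumed from the surrounding doc; confirm the exact key in the plugin's config schema):

```typescript
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({ plugins: [googleAI()] });

const response = await ai.generate({
  model: googleAI.model('gemini-2.5-flash'),
  prompt: 'What is the sum of the first 50 prime numbers?',
  // Assumed flag: lets the model write and run Python to derive the answer.
  config: { codeExecution: true },
});
console.log(response.text);
```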


Suggested change
Certain models can generate and execute Python code for calculations and problem-solving. Enable code execution in the model configuration:
Certain models can generate and execute Python code for calculations and problem-solving. To enable code execution, update the model configuration:


### Google Search Grounding

Enable Google Search to provide answers with current information and verifiable sources. When enabled, the model automatically determines if a search is needed and executes queries to ground its responses in real-time web content.
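A sketch of enabling it (the `googleSearchRetrieval` key below is how the previous googleAI plugin exposed search grounding; double-check the key used by the google-genai package):

```typescript
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({ plugins: [googleAI()] });

const response = await ai.generate({
  model: googleAI.model('gemini-2.5-flash'),
  prompt: 'Who won the most recent Formula 1 race?',
  // Assumed config key; the model decides whether a search is needed.
  config: { googleSearchRetrieval: {} },
});
console.log(response.text);
```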


This is a great example of explaining "why" this is helpful. Doing the same for line 406, for example, would be great.


You can create models that call the Google Generative AI API. The models support tool calls and some have multi-modal capabilities.
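For context, basic usage looks roughly like this (a sketch; the model name is illustrative):

```typescript
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({ plugins: [googleAI()] });

const response = await ai.generate({
  model: googleAI.model('gemini-2.5-flash'),
  prompt: 'Explain quantum entanglement in one paragraph.',
});
console.log(response.text);
```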

### Model Capabilities


Looking at this section and then the one right after for the type of models, I wonder if it might make sense to use the same style that Firebase has for "supported inputs/outputs" and "supported capabilities" - and have 2 tables on there and consolidate this stuff.

https://firebase.google.com/docs/ai-logic/models#input-output-comparison


### Image Generation


I think you mentioned needing the response modalities for using the gemini models for image generation - we should have an example in here for it.
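Something along these lines, presumably (assuming the Gemini image models take a `responseModalities` config as in the underlying GenAI API; the model name follows the rename noted earlier in this PR):

```typescript
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({ plugins: [googleAI()] });

const response = await ai.generate({
  model: googleAI.model('gemini-2.5-flash-image'),
  prompt: 'Generate an image of a cat wearing a top hat',
  // Assumed config: ask for both text and image parts in the response.
  config: { responseModalities: ['TEXT', 'IMAGE'] },
});
console.log(response.media?.url);
```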

@ifielker (Collaborator) left a comment


I haven't seen all of it yet, but here's a start


<LanguageContent lang="js">

The Google GenAI plugin provides a unified interface to connect with Google's generative AI models through the **Gemini Developer API** using API key authentication. It is a replacement for the previous `googleAI` plugin.

I think we should still be clear here that it's the googleAI plugin from the google-genai package. There is no Google GenAI plugin.


:::tip[Enterprise Users]
For enterprise features like grounding, context caching, Vector Search, and Model Garden access, see the [Vertex AI plugin](/docs/integrations/vertex-ai).

The vector search plugin and Model Garden plugin are still in the old vertexai package... not to be confused with the new vertexai plugin from the google-genai package...


Yes! This was confusing to us as well at first.


## Language Models

You can create models that call the Google Generative AI API. The models support tool calls and some have multi-modal capabilities.

Maybe rephrase: the user is not creating models, just calling them, or at most creating a reference to them.


The following table shows the capabilities of popular Gemini models:

| Model | Image Input | Object Generation | Tool Usage | Tool Streaming | Thinking | Code Execution | Google Search |

What is "Object Generation"? Does that mean JSON responses that follow a certain schema?
There are also the speech generation (TTS) models (gemini-2.5-flash-preview-tts and gemini-2.5-pro-preview-tts) and the image generation model (nano banana, gemini-2.5-flash-image).

- `veo-3.0-generate-001` - Latest Veo model with improved quality
- `veo-3.0-fast-generate-001` - Fast video generation
- `veo-2.0-generate-001` - Previous generation video model
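For completeness, a rough sketch of calling one of these models, assuming video generation surfaces as a long-running operation polled with `ai.checkOperation` (verify this against the plugin docs; the prompt and poll interval are placeholders):

```typescript
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({ plugins: [googleAI()] });

// Kick off generation; the response carries an operation handle rather
// than the finished video.
let { operation } = await ai.generate({
  model: googleAI.model('veo-3.0-fast-generate-001'),
  prompt: 'A timelapse of clouds rolling over a mountain ridge',
});

// Poll until the video is ready (assumed API).
while (operation && !operation.done) {
  await new Promise((resolve) => setTimeout(resolve, 5000));
  operation = await ai.checkOperation(operation);
}
```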


Music Generation (Lyria) also works... I'm not sure if there was a reason not to mention it... I feel like something is incomplete, but I don't remember what...

```typescript
});
```

Available harm categories:

I think there are more categories now...

Gemini can transcribe audio and provide visual descriptions by processing both the audio track and visual frames (sampled at 1 frame per second):

```typescript
// Using an uploaded file from the Files API
```
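Fleshed out, the example presumably continues along these lines (the `fileUri` would come from a prior upload through the separate Files API discussed in the comment below; all names here are illustrative):

```typescript
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({ plugins: [googleAI()] });

// URI of a previously uploaded video (illustrative placeholder).
const fileUri = 'https://generativelanguage.googleapis.com/v1beta/files/abc123';

const response = await ai.generate({
  model: googleAI.model('gemini-2.5-flash'),
  prompt: [
    { text: 'Transcribe the audio and describe what is shown on screen.' },
    { media: { url: fileUri, contentType: 'video/mp4' } },
  ],
});
console.log(response.text);
```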

The Files API is part of the Google GenAI SDK, which is a separate download and installation. The google-genai package does not use the Google GenAI SDK and does not provide its own file upload.
