diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md new file mode 100644 index 0000000..0b86843 --- /dev/null +++ b/.github/CONTRIBUTING.md @@ -0,0 +1,37 @@ +# Contributing + +Thank you for taking the time to contribute! + +## Tech Stack + +Gemini AI is built with the following tools. Ensure you have `bun` [installed](https://bun.sh/). + +- [**Bun**](https://bun.sh/) as the package manager +- [**Biome**](https://biomejs.dev/) as the linter and formatter +- [**tsup**](https://tsup.egoist.dev/) as the build tool + +Follow the [GitHub flow](https://docs.github.com/en/get-started/using-github/github-flow) to clone the repo and create a branch. + +It is recommended to install the Biome integration for your IDE to format your code. + +## Scripts + +### `bun run build` + +Use this to build your project, should you need to test it locally. + +It uses `tsup` under the hood. + +### `bun run test` + +Use this to test your project with existing unit tests. These tests will also be run on your PR, so ensure they are passing! + +It uses `vitest` under the hood. + +You can also use `bun run coverage` to check the coverage of your tests. + +### `bun run check` + +Use this to check if your code follows our formatting standards. + +It uses Biome under the hood. diff --git a/.github/README.md b/.github/README.md index 1c09379..d04ee1d 100644 --- a/.github/README.md +++ b/.github/README.md @@ -1,41 +1,79 @@ - - - - - Gemini AI Banner - +Gemini AI Banner


Docs | GitHub | FAQ

-> [!NOTE] -> With the release of Gemini AI 1.1, there is now **streaming support**! Check it out [here](#streaming). - ## Features -- 🌎 [**Multimodal**](#auto-model-selection): Interact with text and images—Native to the model. +- 🌎 [**Multimodal**](#optimized-file-uploads): Interact with images, videos, audio, and more, built into the model. - 🌐 [**Contextual Conversations**](#geminicreatechat): Chat with Gemini, built in. -- 🧪 [**Parameter**](#method-patterns): Easily modify `temperature`, `topP`, and more +- 🧪 [**Simple Parameters**](#method-patterns): Easily modify `temperature`, `topP`, and more. - ⛓️ [**Streaming**](#streaming): Get AI output the second it's available. +- 🔒 [**Typesafe**](#types): Types built in. Gemini AI is written in TypeScript. + +## Why Gemini AI + +> Why should I use this, instead of Google's [own API](https://www.npmjs.com/package/@google/generative-ai)? + +It's all about simplicity. Gemini AI lets you make requests to Gemini with just about a quarter of the code that Google's API requires. + +
+Don't believe me? Take a look. + +
+ +Google's own API (CommonJS): + +```javascript +const { GoogleGenerativeAI } = require("@google/generative-ai"); + +const genAI = new GoogleGenerativeAI(API_KEY); + +async function run() { + const model = genAI.getGenerativeModel({ model: "gemini-1.5-pro-latest" }); + + const prompt = "Hi!"; + + const result = await model.generateContent(prompt); + const response = await result.response; + const text = response.text(); + console.log(text); +} + +run(); +``` + +Gemini AI (ES6 Modules): + +```javascript +import Gemini from "gemini-ai"; + +const gemini = new Gemini(API_KEY); +console.log(await gemini.ask("Hi!")); +``` + +That's nearly 4 times less code! + +
-## Highlights +And there are no sacrifices either. Gemini AI uses Google's REST API under the hood, so you get simplicity without compromise. -Gemini AI v1.0 compared to Google's [own API](https://www.npmjs.com/package/@google/generative-ai) +And there's more! -- ⚡ [**Native REST API**](#inititalization): Have simplicity without compromises -- 🚀 [**Easy**](#feature-highlight-auto-model-selection): Auto model selection based on context -- 🎯 [**Concise**](#why-gemini-ai): _**4x**_ less code needed +- 📝 [**Optimized File Uploads**](#optimized-file-uploads): Automatically uses Google's File API when necessary +- 📁 [**Automatic File Type Detection**](#optimized-file-uploads): Gemini AI will detect MIME types of files automatically +- 🧩 **Automatic Request Creation**: Auto-formats your requests—so you don't have to. ## Table of Contents @@ -54,16 +92,16 @@ Gemini AI v1.0 compared to Google's [own API](https://www.npmjs.com/package/@goo ## Getting an API Key -1. Go to [Google Makersuite](https://makersuite.google.com) -2. Click "Get API key" at the top, and follow the steps to get your key +1. Go to [Google AI Studio's API keys tab](https://aistudio.google.com/app/apikey) +2. Follow the steps to get an API key 3. Copy this key, and use it below when `API_KEY` is mentioned. -> [!CAUTION] +> [!WARNING] > Do not share this key with other people! It is recommended to store it in a `.env` file. ## Quickstart -Make a text request (`gemini-pro`): +Make a text request: ```javascript import Gemini from "gemini-ai"; @@ -73,7 +111,7 @@ const gemini = new Gemini(API_KEY); console.log(await gemini.ask("Hi!")); ``` -Make a streaming text request (`gemini-pro`): +Make a streaming text request: ```javascript import Gemini from "gemini-ai"; @@ -85,10 +123,10 @@ gemini.ask("Hi!", { }); ``` -Chat with Gemini (`gemini-pro`): +Chat with Gemini: ```javascript -import fs from "fs"; +import Gemini from "gemini-ai"; const gemini = new Gemini(API_KEY); const chat = gemini.createChat(); @@ -100,7 +138,7 @@ console.log(await chat.ask("What's the last thing I said?")); ### Other useful features
-Make a text request with images (gemini-pro-vision): +Make a text request with images:
```javascript @@ -110,16 +148,14 @@ import Gemini from "gemini-ai"; const gemini = new Gemini(API_KEY); console.log( - gemini.ask("What's this show?", { - data: [fs.readFileSync("./test.png")], - }) + await gemini.ask(["What do you see?", fs.readFileSync("./cat.png")]) ); ```
-Make a text request with custom parameters (gemini-pro): +Make a text request with custom parameters:
```javascript @@ -128,10 +164,9 @@ import Gemini from "gemini-ai"; const gemini = new Gemini(API_KEY); console.log( - gemini.ask("Hello!", { + await gemini.ask("Hello!", { temperature: 0.5, topP: 1, - topK: 10, }) ); ``` @@ -139,7 +174,7 @@ console.log(
-Embed Text (`embedding-001`): +Embed text:
```javascript @@ -147,17 +182,13 @@ import fs from "fs"; const gemini = new Gemini(API_KEY); -gemini.embed("Hi!"); +console.log(await gemini.embed("Hi!")); ```
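You can also count the tokens in a message before you send it. This is a minimal sketch of the `count()` method (it wraps the `countTokens` command, as shown in `src/index.ts`):

```javascript
import Gemini from "gemini-ai";

const gemini = new Gemini(API_KEY);

// Resolves to the total number of tokens Gemini sees for this prompt
console.log(await gemini.count("Hi!"));
```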
## Special Features -### Auto Model Selection - -Google has released two models this time for Gemini—`gemini-pro`, and `gemini-pro-vision`. The former is text-specific, while the latter is for multimodal use. Gemini AI has been designed so that it will automatically select which model it will use! - ### Streaming Here's a quick demo: @@ -176,9 +207,25 @@ Let's walk through what this code is doing. Like always, we first initialize `Ge Note that this automatically switches to the `streamGenerateContent` command... you don't have to worry about that! -> [!NOTE] +> [!TIP] > You don't need to call `ask` asynchronously if you're handling stream management on your own. If you want to tap the final answer, it is still returned by the method, and you can `await` it as normal. +### Types + +Gemini AI v2 is completely written in TypeScript, which means that all parameters, and more importantly the configuration, have type hints. + +Furthermore, return types are also conditional based on what `format` you place in the configuration to guarantee great DX. + +### Optimized File Uploads + +Google requires large files to be sent through their dedicated File API, instead of being included directly in the `POST` request. + +With Gemini AI v2, large files like videos and audio will automatically be detected and sent through the File API, while smaller images are still included inline—without you having to worry about any of that going on. + +This gives you the fastest file upload experience, while ensuring all your files are safely included. + +Gemini also automatically detects the MIME type of your file to pass to the server, so you don't need to worry about it. + ### Proxy Support Use a proxy when fetching from Gemini. To keep package size down and adhere to the [SRP](https://en.wikipedia.org/wiki/Single_responsibility_principle), the actual proxy handling is delegated to the [undici library](https://undici.nodejs.org/#/). @@ -194,12 +241,12 @@ npm i undici Initialize it with Gemini AI: ```javascript -import { ProxyAgent } from 'undici' -import Gemini from 'gemini-ai' +import { ProxyAgent } from "undici"; +import Gemini from "gemini-ai"; let gemini = new Gemini(API_KEY, { - dispatcher: new ProxyAgent(PROXY_URL) -}) + dispatcher: new ProxyAgent(PROXY_URL), +}); ``` And use as normal! @@ -232,31 +279,92 @@ await gemini.ask("Hi!", { // Config temperature: 0.5, topP: 1, - topK: 10, }); ``` -> [!NOTE] -> All methods are async! This means you should call them something like this: `await gemini.ask(...)` +> [!TIP] +> All methods (_except_ `Gemini.createChat()`) are async! This means you should call them something like this: `await gemini.ask(...)` + +#### JSON Output + +You have the option to set `format` to `Gemini.JSON`: + +```javascript +await gemini.ask("Hi!", { + format: Gemini.JSON, +}); +``` + +This gives you the full response from Gemini's REST API. Note that the output to `Gemini.JSON` varies depending on the model and command, and is not documented here in detail because it is unnecessary in most scenarios. You can find more information about the REST API's raw output [here](https://ai.google.dev/tutorials/rest_quickstart). +If you are using TypeScript, you get type annotations for all the responses, so autocomplete away. + ### `Gemini.ask()` This method uses the `generateContent` command to get Gemini's response to your input.
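As a quick sketch of how the `format` option (described in the Types and JSON Output sections above) shapes the return value:

```javascript
import Gemini from "gemini-ai";

const gemini = new Gemini(API_KEY);

// The default format is Gemini.TEXT, which resolves to a plain string
const text = await gemini.ask("Hi!");

// Gemini.JSON resolves to the raw REST response instead, so you can
// inspect fields such as candidates[0].finishReason yourself
const raw = await gemini.ask("Hi!", { format: Gemini.JSON });
console.log(raw.candidates[0].content.parts[0].text);
```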
-Config available: -| Field Name | Description | Default Value | -| ------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------- | -| `format` | Whether to return the detailed, raw JSON output. Typically not recommended, unless you are an expert. Can either be `Gemini.JSON` or `Gemini.TEXT` | `Gemini.TEXT` | -| `topP` | See [Google's parameter explanations](https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/api-quickstart#parameter_definitions) | `0.8` | -| `topK` | See [Google's parameter explanations](https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/api-quickstart#parameter_definitions) | `10` | -| `temperature` | See [Google's parameter explanations](https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/api-quickstart#parameter_definitions) | `1` | -| `model` | Which model to use. Can be any model Google has available, but certain features are not available on some models. Currently: `gemini-pro` and `gemini-pro-vision` | Automatic based on Context | -| `maxOutputTokens` | Max tokens to output | `800` | -| `messages` | Array of `[userInput, modelOutput]` pairs to show how the bot is supposed to behave | `[]` | -| `data` | An array of `Buffer`s to input to the model. Automatically toggles model to `gemini-pro-vision` | `[]` | -| `stream` | A function that is called with every new chunk of JSON or Text (depending on the format) that the model receives. [Learn more](#feature-highlight-streaming)| `undefined` | +#### Uploading Media + +The first parameter of the `ask()` method can take three different forms: + +##### String Form: + +This is simply a text query to Gemini. + +_Example:_ + +```javascript +await gemini.ask("Hi!"); +``` + +##### Array Form: + +In this array, which represents ordered "parts", you can put strings, or Buffers (these are what you get directly from `fs.readFileSync()`!). These will be fed, in order, to Gemini. + +Gemini accepts most major file formats, so you shouldn't have to worry about what format you give it. However, check out a [comprehensive list here](https://ai.google.dev/gemini-api/docs/prompting_with_media?lang=javascript#supported_file_formats). + +There's a whole ton of optimizations under the hood for file uploads too, but you don't have to worry about them! [Learn more here...](#optimized-file-uploads) + +_Example:_ + +```javascript +import fs from "fs"; + +await gemini.ask([ + "Between these two cookies, which one appears to be home-made, and which one looks store-bought? Cookie 1:", + fs.readFileSync("./cookie1.png"), + "Cookie 2:", + fs.readFileSync("./cookie2.png"), +]); +``` + +> [!NOTE] +> You can also place buffers in the `data` field in the config (this is the v1 method, but it still works). These buffers will be placed, in order, directly after the content in the main message. + +##### Message Form: + +This is the raw message format. It is not meant to be used directly, but can be useful when needing raw control over file uploads, and it is also used internally by the `Chat` class. + +Please check `src/types.ts` for more information about what is accepted in the `Message` field. + +#### Config Available: + +> [!NOTE] +> These are Google REST API defaults. 
+ +| Field Name | Description | Default Value | +| ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------- | +| `format` | Whether to return the detailed, raw JSON output. Typically not recommended, unless you are an expert. Can either be `Gemini.JSON` or `Gemini.TEXT` | `Gemini.TEXT` | +| `topP` | See [Google's parameter explanations](https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/api-quickstart#parameter_definitions) | `0.94` | +| `topK` | See [Google's parameter explanations](https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/api-quickstart#parameter_definitions). Note that this field is **not** available on v1.5 models. | `32` | +| `temperature` | See [Google's parameter explanations](https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/api-quickstart#parameter_definitions) | `1` | +| `model` | Which model to use. Can be any model Google has available. | `gemini-1.5-flash-latest` | +| `maxOutputTokens` | Max tokens to output | `2048` | +| `messages` | Array of `[userInput, modelOutput]` pairs to show how the bot is supposed to behave | `[]` | +| `data` | An array of `Buffer`s to input to the model. It is recommended that you [directly pass data through the message in v2](#uploading-media). | `[]` | +| `stream` | A function that is called with every new chunk of JSON or Text (depending on the format) that the model receives. [Learn more](#streaming) | `undefined` | Example Usage: @@ -269,7 +377,6 @@ console.log( await gemini.ask("Hello!", { temperature: 0.5, topP: 1, - topK: 10, }) ); ``` @@ -314,22 +421,19 @@ console.log(await gemini.embed("Hello!")); ### `Gemini.createChat()` -`Gemini.createChat()` is a unique method. For one, it isn't asynchronously called. Additionally, it returns a brand new `Chat` object. The `Chat` object only has one method, which is `Chat.ask()`, which has the _exact same syntax_ as the `Gemini.ask()` method, documented [above](#geminiask). The only small difference is that most parameters are passed into the `Chat` through `createChat()`, and cannot be overriden by the `ask()` method. The only parameters that can be overridden is `format`, `stream`, and `data` (**As of 12/13/2023, `data` is not supported yet**). - -> [!IMPORTANT] -> Google has not yet allowed the use of the `gemini-pro-vision` model in continued chats yet—The feature is already implemented, to a certain degree, but cannot be used due to Google's API limitations. +`Gemini.createChat()` is a unique method. For one, it isn't asynchronously called. Additionally, it returns a brand new `Chat` object. The `Chat` object only has one method, which is `Chat.ask()`, which has the _exact same syntax_ as the `Gemini.ask()` method, documented [above](#geminiask). The only small difference is that most parameters are passed into the `Chat` through `createChat()`, and cannot be overridden by the `ask()` method. The only parameters that can be overridden are `format`, `stream`, and `data`. All important data in the `Chat` object is stored in the `Chat.messages` variable, and can be used to create a new `Chat` that "continues" the conversation, as demonstrated in the example usage section.
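For instance, here is a minimal sketch of overriding one of those parameters for a single message:

```javascript
import Gemini from "gemini-ai";

const gemini = new Gemini(API_KEY);
const chat = gemini.createChat();

// Uses the defaults passed to createChat()
console.log(await chat.ask("Hi!"));

// Overrides format for this message only; the chat history is kept either way
console.log(await chat.ask("What's the last thing I said?", { format: Gemini.JSON }));
```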
Config available for `createChat`: | Field Name | Description | Default Value | | ------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------- | -| `topP` | See [Google's parameter explanations](https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/api-quickstart#parameter_definitions) | `0.8` | -| `topK` | See [Google's parameter explanations](https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/api-quickstart#parameter_definitions) | `10` | +| `topP` | See [Google's parameter explanations](https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/api-quickstart#parameter_definitions) | `0.94` | +| `topK` | See [Google's parameter explanations](https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/api-quickstart#parameter_definitions). Note that this field is **not** available on v1.5 models. | `10` | | `temperature` | See [Google's parameter explanations](https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/api-quickstart#parameter_definitions) | `1` | -| `model` | Which model to use. Can be any model Google has available, but certain features are not available on some models. Currently: `gemini-pro` and `gemini-pro-vision` | Automatic based on Context | -| `maxOutputTokens` | Max tokens to output | `800` | -| `messages` | Array of `[userInput, modelOutput]` pairs to show how the bot is supposed to behave | `[]` | +| `model` | Which model to use. Can be any model Google has available. | `gemini-1.5-flash-latest` | +| `maxOutputTokens` | Max tokens to output | `2048` | +| `messages` | Array of `[userInput, modelOutput]` pairs to show how the bot is supposed to behave (or to continue a conversation) | `[]` | Example Usage: @@ -343,6 +447,8 @@ const gemini = new Gemini(API_KEY); const chat = gemini.createChat(); console.log(await chat.ask("Hi!")); + +// Now, you can start a conversation console.log(await chat.ask("What's the last thing I said?")); ``` @@ -357,6 +463,8 @@ const chat = gemini.createChat(); console.log(await chat.ask("Hi!")); +// Creating a new chat, with existing messages + const newChat = gemini.createChat({ messages: chat.messages, }); @@ -366,49 +474,87 @@ console.log(await newChat.ask("What's the last thing I said?")); ## FAQ -### Why Gemini AI? +### What's the difference between `data` and directly passing buffers in the message? -Well, simply put, it makes using Gemini just that much easier... see the code necessary to make a request using Google's own API, compared to Gemini AI: +`data` was the old way to pass media data. It is no longer recommended, but is kept for backwards compatibility. The new method is to simply pass an array of strings/buffers into the first parameter of `ask()`. The major benefit is that you can now include strings between buffers, which you couldn't do before. Here's a quick demo of how to migrate:
-See the comparison -
With `data`: + +```javascript +import fs from "fs"; + +await gemini.ask( + "Between these two cookies, which one appears to be home-made, and which one looks store-bought?", + { + data: [fs.readFileSync("./cookie1.png"), fs.readFileSync("./cookie2.png")], + } +); +``` + +New Version: ```javascript -const { GoogleGenerativeAI } = require("@google/generative-ai"); +import fs from "fs"; -const genAI = new GoogleGenerativeAI(API_KEY); +await gemini.ask([ + "Between these two cookies, which one appears to be home-made, and which one looks store-bought?", + fs.readFileSync("./cookie1.png"), + fs.readFileSync("./cookie2.png"), +]); +``` -async function run() { - const model = genAI.getGenerativeModel({ model: "gemini-pro" }); +Learn more in the [dedicated section](#uploading-media). - const prompt = "Hi!"; +### What do I need to do for v2? - const result = await model.generateContent(prompt); - const response = await result.response; - const text = response.text(); - console.log(text); -} +> Does everything still work? -run(); -``` +Yes! Gemini AI v2 should be completely backward-compatible. Most changes are under-the-hood, so your DX should be much smoother, [especially for TS developers](#types)! -Gemini AI (ES6 Modules): +The only thing that you can consider changing is using the new array message format instead of the old buffer format. See the [dedicated question](#whats-the-difference-between-data-and-directly-passing-buffers-in-the-message) to learn more. + +### What is the default model? + +> And, by extension, why is it the default model? + +By default, Gemini AI uses `gemini-1.5-flash-latest`, Google's leading efficiency-based model. It is the default for two main reasons, both regarding DX: + +1. 📈 **Higher Rate Limits**: Gemini 1.5 Pro is limited to 2 requests per minute, versus the 15 for Flash, so we choose the one with the higher rate limit, which is especially useful for development. +2. ⚡ **Faster Response Time**: Gemini 1.5 Pro is significantly slower, so we use the faster model by default. + +But, of course, should you need to change the model, it's as easy as passing it into the configuration of your request. For example: ```javascript import Gemini from "gemini-ai"; const gemini = new Gemini(API_KEY); -console.log(await gemini.ask("Hi!")); + +console.log( + await gemini.ask("Hello!", { + model: "gemini-1.5-pro-latest", + }) +); ``` -That's nearly 4 times less code! -
+ +### Changing the API Version + +> What if I want to use a deprecated command? + +When initializing `Gemini`, you can pass in an API version. This feature mainly exists to future-proof the library, as the current recommended API version (and the one used by default) is `v1beta`. Note that some modern models (including the default Gemini 1.5 Flash) may not work on other API versions. + +Here's how you can change it to, say, v1: + +```javascript +import Gemini from "gemini-ai"; + +const gemini = new Gemini(API_KEY, { + apiVersion: "v1", +}); +``` + +### How to Polyfill Fetch -### I'm in a browser environment! What do I do? +> I'm in a browser environment! What do I do? Everything is optimized so it works for both browsers and Node.js—files are passed as Buffers, so you decide how to get them, and adding a fetch polyfill is as easy as: diff --git a/.gitignore b/.gitignore index f05f220..65bf0c0 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,4 @@ /node_modules .DS_Store -/coverage \ No newline at end of file +/coverage +/dist \ No newline at end of file diff --git a/.npmignore b/.npmignore index 535b9fd..338c4f2 100644 --- a/.npmignore +++ b/.npmignore @@ -2,4 +2,8 @@ /coverage /.github /test -/bun.lockb \ No newline at end of file +/bun.lockb +/src +tsconfig.json +tsup.config.ts +biome.json diff --git a/README.md b/README.md index a2ddf21..26f8a13 100644 --- a/README.md +++ b/README.md @@ -5,10 +5,10 @@ - + - - + +

@@ -17,7 +17,8 @@ ## Quickstart -Make a text request (`gemini-pro`): + +Make a text request: ```javascript import Gemini from "gemini-ai"; @@ -27,7 +28,7 @@ const gemini = new Gemini(API_KEY); console.log(await gemini.ask("Hi!")); ``` -Make a streaming text request (`gemini-pro`): +Make a streaming text request: ```javascript import Gemini from "gemini-ai"; @@ -39,7 +40,7 @@ gemini.ask("Hi!", { }); ``` -Chat with Gemini (`gemini-pro`): +Chat with Gemini: ```javascript import Gemini from "gemini-ai"; diff --git a/assets/banner.png b/assets/banner.png new file mode 100644 index 0000000..4ef6e87 Binary files /dev/null and b/assets/banner.png differ diff --git a/assets/banner@dark.svg b/assets/banner@dark.svg deleted file mode 100644 index 62526d8..0000000 --- a/assets/banner@dark.svg +++ /dev/null @@ -1,84 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/assets/banner@light.svg b/assets/banner@light.svg deleted file mode 100644 index ecda0a9..0000000 --- a/assets/banner@light.svg +++ /dev/null @@ -1,84 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/biome.json b/biome.json index 59c2ef1..8ec7d8e 100644 --- a/biome.json +++ b/biome.json @@ -10,6 +10,6 @@ } }, "files": { - "ignore": ["coverage"] + "ignore": ["coverage", "dist"] } } diff --git a/bun.lockb b/bun.lockb index c812b0b..707cdcf 100755 Binary files a/bun.lockb and b/bun.lockb differ diff --git a/index.js b/index.js deleted file mode 100644 index 3e77dbc..0000000 --- a/index.js +++ /dev/null @@ -1,415 +0,0 @@ -const fileTypeFromBuffer = (arrayBuffer) => { - const uint8arr = new Uint8Array(arrayBuffer); - - const len = 4; - if (uint8arr.length >= len) { - const signatureArr = new Array(len); - for (let i = 0; i < len; i++) - signatureArr[i] = new Uint8Array(arrayBuffer)[i].toString(16); - const signature = signatureArr.join("").toUpperCase(); - switch (signature) { - case "89504E47": - return "image/png"; - case "47494638": - return "image/gif"; - case "FFD8FFDB": - case "FFD8FFE0": - return "image/jpeg"; - default: - throw new Error( - "Unknown file type. Please provide a .png, .gif, or .jpeg/.jpg file.", - ); - } - } - throw new Error( - "Unknown file type. Please provide a .png, .gif, or .jpeg/.jpg file.", - ); -}; - -const answerPairToParameter = (message) => { - if (message.length !== 2) { - throw new Error( - "Message format must be an array of [user, model] pairs. See docs for more information.", - ); - } - return [ - { - parts: [{ text: message[0] }], - role: "user", - }, - { - parts: [{ text: message[1] }], - role: "model", - }, - ]; -}; - -export default class Gemini { - #fetch; - #dispatcher; - - static JSON = "json"; - static TEXT = "markdown"; - - constructor(key, rawConfig = {}) { - let defaultFetch; - - try { - defaultFetch = fetch; - } catch {} - - const config = this.#parseConfig(rawConfig, { - fetch: defaultFetch, - dispatcher: undefined, - }); - - if (!config.fetch) - throw new Error( - "Fetch was not found in environment, and no polyfill was provided. 
Please install a polyfill, and put it in the `fetch` property of the Gemini configuration.", - ); - - this.#fetch = config.fetch; - this.key = key; - this.#dispatcher = config.dispatcher; - } - - #parseConfig(raw = {}, defaults = {}) { - const extras = Object.keys(raw).filter( - (item) => !Object.keys(defaults).includes(item), - ); - if (extras.length) - throw new Error( - `These following configurations are not available on this function: ${extras.join( - ", ", - )}`, - ); - return { ...defaults, ...raw }; - } - - #switchFormat(format, response) { - switch (format) { - case Gemini.TEXT: - return response.candidates[0].content.parts[0].text; - case Gemini.JSON: - return response; - default: - throw new Error( - `${config.format} is not a valid format. Use Gemini.TEXT or Gemini.JSON.`, - ); - } - } - - async #query(model, command, body) { - const opts = { - method: "POST", - headers: { - "Content-Type": "application/json", - }, - body: JSON.stringify(body), - dispatcher: this.#dispatcher, - }; - - const response = await this.#fetch( - `https://generativelanguage.googleapis.com/v1beta/models/${model}:${command}?key=${this.key}`, - opts, - ); - - if (!response.ok) { - throw new Error( - `There was an HTTP error when fetching Gemini. HTTP status: ${response.status}`, - ); - } - - return response; - } - - async #queryJSON(model, command, body) { - const response = await this.#query(model, command, body); - - const json = await response.json(); - if (!response.ok) - throw new Error( - `An error occurred when fetching Gemini: \n${json.error.message}`, - ); - - return json; - } - - async #queryStream(model, command, body, callback) { - const response = await this.#query(model, command, body); - - const reader = response.body.getReader(); - const decoder = new TextDecoder("utf-8"); - - let jsonString = ""; - let json; - - await reader.read().then(function processText({ done, value }) { - if (done) { - return; - } - - jsonString += decoder.decode(value, { stream: true }); - - try { - const parsedJSON = JSON.parse(`${jsonString}]`); - json = { ...json, ...parsedJSON[parsedJSON.length - 1] }; - callback(json); - } catch {} - - return reader.read().then(processText); - }); - } - - async ask(message, rawConfig = {}) { - const config = this.#parseConfig(rawConfig, { - temperature: 1, - topP: 0.8, - topK: 10, - format: Gemini.TEXT, - maxOutputTokens: 800, - model: undefined, - data: [], - messages: [], - stream: undefined, - }); - - const body = { - contents: [ - ...config.messages.flatMap(answerPairToParameter), - { - parts: [{ text: message }], - role: "user", - }, - ], - generationConfig: { - temperature: config.temperature, - maxOutputTokens: config.maxOutputTokens, - topP: config.topP, - topK: config.topK, - }, - }; - - if (config.data.length) { - for (const data of config.data) { - body.contents.at(-1).parts.push({ - inline_data: { - mime_type: fileTypeFromBuffer(data), - data: data.toString("base64"), - }, - }); - } - } - - if (config.stream) { - let finalJSON = undefined; - - await this.#queryStream( - config.model || - (config.data.length ? "gemini-pro-vision" : "gemini-pro"), - "streamGenerateContent", - body, - (streamContent) => { - if (!finalJSON) finalJSON = streamContent; - else - finalJSON.candidates[0].content.parts[0].text += - streamContent.candidates[0].content.parts[0].text; - - if (streamContent.promptFeedback.blockReason) { - throw new Error( - `Your prompt was blocked by Google. 
Here is Gemini's feedback: \n${JSON.stringify( - response.promptFeedback, - null, - 4, - )}`, - ); - } - - config.stream(this.#switchFormat(config.format, streamContent)); - }, - ); - - return this.#switchFormat(config.format, finalJSON); - } - - const response = await this.#queryJSON( - config.model || (config.data.length ? "gemini-pro-vision" : "gemini-pro"), - "generateContent", - body, - ); - - if (response.promptFeedback.blockReason) { - throw new Error( - `Your prompt was blocked by Google. Here is Gemini's feedback: \n${JSON.stringify( - response.promptFeedback, - null, - 4, - )}`, - ); - } - - return this.#switchFormat(config.format, response); - } - - async count(message, rawConfig = {}) { - const config = this.#parseConfig(rawConfig, { - model: "gemini-pro", - }); - - const body = { - contents: [ - { - parts: [{ text: message }], - role: "user", - }, - ], - }; - - const response = await this.#queryJSON(config.model, "countTokens", body); - - return response.totalTokens; - } - - async embed(message, rawConfig = {}) { - const config = this.#parseConfig(rawConfig, { - model: "embedding-001", - }); - - const body = { - model: `models/${config.model}`, - content: { - parts: [{ text: message }], - role: "user", - }, - }; - - const response = await this.#queryJSON(config.model, "embedContent", body); - - return response.embedding.values; - } - - createChat(rawChatConfig) { - class Chat { - constructor(gemini, rawConfig = {}) { - this.gemini = gemini; - this.config = this.gemini.#parseConfig(rawConfig, { - messages: [], - temperature: 1, - topP: 0.8, - topK: 10, - model: "gemini-pro", - maxOutputTokens: 800, - }); - this.messages = this.config.messages.flatMap(answerPairToParameter); - } - - async ask(message, rawConfig) { - const config = { - ...this.config, - ...this.gemini.#parseConfig(rawConfig, { - format: Gemini.TEXT, - data: [], - stream: undefined, - }), - }; - - if (this.messages.at(-1)?.role === "user") { - throw new Error( - "Please ensure you are running chat commands asynchronously. You cannot send 2 messages at the same time in the same chat. Use standard Gemini.ask() for this.", - ); - } - - const currentMessage = { - parts: [{ text: message }], - role: "user", - }; - - if (config.data.length) { - try { - this.config.model = "gemini-pro-vision"; - for (const data of config.data) { - currentMessage.parts.push({ - inline_data: { - mime_type: fileTypeFromBuffer(data).mime, - data: data.toString("base64"), - }, - }); - } - } catch { - console.error( - "It is currently not supported by Google to use non-text data with the chat function.", - ); - } - } - - this.messages.push(currentMessage); - - const body = { - contents: [this.messages], - generationConfig: { - temperature: config.temperature, - maxOutputTokens: config.maxOutputTokens, - topP: config.topP, - topK: config.topK, - }, - }; - - if (config.stream) { - let finalJSON = {}; - - await this.gemini.#queryStream( - config.model || - (config.data.length ? "gemini-pro-vision" : "gemini-pro"), - "streamGenerateContent", - body, - (streamContent) => { - finalJSON = streamContent; - - if (streamContent.promptFeedback?.blockReason) { - this.messages.pop(); - throw new Error( - `Your prompt was blocked by Google. 
Here is Gemini's feedback: \n${JSON.stringify( - response.promptFeedback, - null, - 4, - )}`, - ); - } - - config.stream( - this.gemini.#switchFormat(config.format, streamContent), - ); - }, - ); - - this.messages.push(finalJSON.candidates[0].content); - - return this.gemini.#switchFormat(config.format, finalJSON); - } - const response = await this.gemini.#queryJSON( - config.model || - (config.data.length ? "gemini-pro-vision" : "gemini-pro"), - "generateContent", - body, - ); - - if (response.promptFeedback?.blockReason) { - this.messages.pop(); - throw new Error( - `Your prompt was blocked by Google. Here is Gemini's feedback: \n${JSON.stringify( - response.promptFeedback, - null, - 4, - )}`, - ); - } - - this.messages.push(response.candidates[0].content); - - return this.gemini.#switchFormat(config.format, response); - } - } - - return new Chat(this, rawChatConfig); - } -} diff --git a/package.json b/package.json index e9d10bb..ceaf490 100644 --- a/package.json +++ b/package.json @@ -1,29 +1,35 @@ { - "name": "gemini-ai", - "version": "1.1.0", - "author": "EvanZhouDev", - "main": "index.js", - "description": "The easiest way to use the powerful Google Gemini model.", - "license": "GPL-3.0", - "repository": { - "type": "git", - "url": "https://github.com/EvanZhouDev/gemini-ai.git" - }, - "homepage": "https://github.com/EvanZhouDev/gemini-ai", - "scripts": { - "test": "vitest", - "coverage": "vitest --coverage", - "check": "bunx @biomejs/biome check --apply ." - }, - "keywords": [ - "google", - "ai", - "gemini" - ], - "type": "module", - "devDependencies": { - "@biomejs/biome": "1.4.1", - "@vitest/coverage-v8": "^1.0.4", - "vitest": "^1.0.4" - } + "name": "gemini-ai", + "version": "2.0.0", + "author": "EvanZhouDev", + "description": "The easiest way to use the powerful Google Gemini model.", + "license": "GPL-3.0", + "repository": { + "type": "git", + "url": "git+https://github.com/EvanZhouDev/gemini-ai.git" + }, + "homepage": "https://github.com/EvanZhouDev/gemini-ai", + "scripts": { + "build": "tsup", + "test": "vitest", + "coverage": "vitest --coverage", + "check": "bunx @biomejs/biome check --apply ." 
+ }, + "keywords": [ + "google", + "ai", + "gemini" + ], + "devDependencies": { + "@biomejs/biome": "1.4.1", + "@vitest/coverage-v8": "^1.0.4", + "tsup": "^8.0.2", + "vitest": "^1.0.4" + }, + "dependencies": { + "file-type": "^19.0.0", + "undici": "^6.18.1" + }, + "main": "dist/index.mjs", + "typings": "dist/index.d.mts" } diff --git a/src/index.ts b/src/index.ts new file mode 100644 index 0000000..7661c84 --- /dev/null +++ b/src/index.ts @@ -0,0 +1,470 @@ +import { Command } from "./types"; + +import type { + ChatAskOptions, + ChatOptions, + CommandOptionMap, + CommandResponseMap, + Format, + FormatType, + GeminiOptions, + GeminiResponse, + Message, + Part, + QueryBodyMap, + QueryResponseMap, +} from "./types"; + +import { getFileType, handleReader, pairToMessage } from "./utils"; + +const uploadFile = async ({ + file, + mimeType, + gemini, +}: { + file: Uint8Array | ArrayBuffer; + mimeType: string; + gemini: Gemini; +}) => { + const BASE_URL = "https://generativelanguage.googleapis.com"; + + function generateBoundary() { + let str = ""; + for (let i = 0; i < 2; i++) { + str = str + Math.random().toString().slice(2); + } + return str; + } + + const boundary = generateBoundary(); + + const generateBlob = ( + boundary: string, + file: Uint8Array | ArrayBuffer, + mime: string, + ) => + new Blob([ + `--${boundary}\r\nContent-Type: application/json; charset=utf-8\r\n\r\n${JSON.stringify( + { + file: { + mimeType: mime, + }, + }, + )}\r\n--${boundary}\r\nContent-Type: ${mime}\r\n\r\n`, + file, + `\r\n--${boundary}--`, + ]); + + const fileSendDataRaw = await gemini + .fetch(`${BASE_URL}/upload/${gemini.apiVersion}/files?key=${gemini.key}`, { + method: "POST", + headers: { + "Content-Type": `multipart/related; boundary=${boundary}`, + "X-Goog-Upload-Protocol": "multipart", + }, + body: generateBlob(boundary, file, mimeType), + }) + .then((res: Response) => res.json()); + + console.log(fileSendDataRaw); + + const fileSendData = fileSendDataRaw.file; + + let waitTime = 250; // Initial wait time in milliseconds + const MAX_BACKOFF = 5000; // Maximum backoff time in milliseconds + + // Keep polling until the file state is "ACTIVE" + while (true) { + try { + const url = `${BASE_URL}/${gemini.apiVersion}/${fileSendData.name}?key=${gemini.key}`; + + const response = await gemini.fetch(url, { method: "GET" }); + const data = await response.json(); + + if (data.error) { + throw new Error(data.error.message); + } + + if (data.state === "ACTIVE") break; + + await new Promise((resolve) => setTimeout(resolve, waitTime)); + + waitTime = Math.min(waitTime * 1.5, MAX_BACKOFF); + } catch (error) { + console.error(`An error occurred: ${error.message}`); + break; + } + } + + return fileSendData.uri; +}; + +export const messageToParts = async ( + messages: (Uint8Array | ArrayBuffer | string)[], + gemini: Gemini, +): Promise => { + const parts = []; + + for (const msg of messages) { + if (typeof msg === "string") { + parts.push({ text: msg }); + } else if (msg instanceof ArrayBuffer || msg instanceof Uint8Array) { + const mimeType = await getFileType(msg); + if (!mimeType.startsWith("image")) { + const fileURI = await uploadFile({ + file: msg, + mimeType: mimeType, + gemini: gemini, + }); + parts.push({ + fileData: { + mime_type: mimeType, + fileUri: fileURI, + }, + }); + } else { + parts.push({ + inline_data: { + mime_type: await getFileType(msg), + data: Buffer.from(msg).toString("base64"), + }, + }); + } + } + } + + return parts; +}; + +class Gemini { + readonly key: string; + readonly apiVersion: string; + 
readonly fetch: typeof fetch; + + static TEXT = "text" as const; + static JSON = "json" as const; + + constructor(key: string, options: Partial = {}) { + const parsedOptions: GeminiOptions = { + ...{ + apiVersion: "v1beta", + fetch: fetch, + }, + ...options, + }; + + this.key = key; + this.fetch = parsedOptions.fetch; + this.apiVersion = parsedOptions.apiVersion; + } + + async query( + model: string, + command: C, + body: QueryBodyMap[C], + ): Promise { + const opts = { + method: "POST", + headers: { + "Content-Type": "application/json", + }, + body: JSON.stringify(body), + }; + + const url = new URL( + `https://generativelanguage.googleapis.com/${this.apiVersion}/models/${model}:${command}`, + ); + + url.searchParams.append("key", this.key); + if (command === Command.StreamGenerate) + url.searchParams.append("alt", "sse"); + + const response = await this.fetch(url.toString(), opts); + + if (!response.ok) { + throw new Error( + `There was an error when fetching Gemini.\n${await response.text()}`, + ); + } + + return response; + } + + async count( + message: string, + options: Partial = {}, + ): Promise { + const parsedOptions: CommandOptionMap[Command.Count] = { + ...{ + model: "gemini-1.5-flash-latest", + }, + ...options, + }; + + const body: QueryBodyMap[Command.Count] = { + contents: [ + { + parts: [{ text: message }], + role: "user", + }, + ], + }; + + const response: Response = await this.query( + parsedOptions.model, + Command.Count, + body, + ); + + const output: QueryResponseMap[Command.Count] = await response.json(); + return output.totalTokens; + } + + async embed( + message: string, + options: Partial = {}, + ) { + const parsedOptions: CommandOptionMap[Command.Embed] = { + ...{ + model: "embedding-001", + }, + ...options, + }; + + const body: QueryBodyMap[Command.Embed] = { + model: `models/${parsedOptions.model}`, + content: { + parts: [{ text: message }], + role: "user", + }, + }; + + const response: Response = await this.query( + parsedOptions.model, + Command.Embed, + body, + ); + + const output: QueryResponseMap[Command.Embed] = await response.json(); + return output.embedding.values; + } + + private getTextObject = (response: GeminiResponse) => + response.candidates[0].content.parts[0]; + + private switchFormat = + (format: F = Gemini.TEXT as F) => + (response: GeminiResponse): FormatType => { + if (response.candidates[0].finishReason === "SAFETY") { + throw new Error( + `Your prompt was blocked by Google. 
Here are the Harm Categories: \n${JSON.stringify( + response.candidates[0].safetyRatings, + null, + 4, + )}`, + ); + } + + switch (format) { + case Gemini.TEXT: + return this.getTextObject(response).text as FormatType; + case Gemini.JSON: + return response as FormatType; + } + }; + + private getText = this.switchFormat(Gemini.TEXT); + + private handleStream = async ( + response: Response, + format: F, + cb: (response: FormatType) => void, + ) => { + const formatter: (response: GeminiResponse) => FormatType = + this.switchFormat(format); + + let res: GeminiResponse; + let text = ""; + + await handleReader(response, (value: GeminiResponse) => { + res = value; + text += this.getText(value); + + cb(formatter(value)); + }); + + this.getTextObject(res).text = text; + + return formatter(res); + }; + + async ask( + message: string | (string | Uint8Array | ArrayBuffer)[] | Message, + options: Partial[Command.Generate]> = {}, + ): Promise[Command.Generate]> { + const parsedOptions: CommandOptionMap[Command.Generate] = { + ...{ + model: "gemini-1.5-flash-latest", + temperature: 1, + topP: 0.94, + topK: 32, + format: Gemini.TEXT as F, + maxOutputTokens: 2048, + data: [], + messages: [], + }, + ...options, + }; + + const command = parsedOptions.stream + ? Command.StreamGenerate + : Command.Generate; + + const contents = [ + ...parsedOptions.messages.flatMap( + (msg: [string, string] | Message) => { + if (Array.isArray(msg)) { + return pairToMessage(msg); + } + return msg; + }, + ), + ]; + + if (!Array.isArray(message) && typeof message !== "string") { + if (message.role === "model") + throw new Error("Please prompt with role as 'user'"); + contents.push(message); + } else { + const messageParts = [message, parsedOptions.data].flat(); + const parts = await messageToParts(messageParts, this); + + contents.push({ + parts: parts, + role: "user", + }); + } + + const body: QueryBodyMap[typeof command] = { + contents, + generationConfig: { + temperature: parsedOptions.temperature, + maxOutputTokens: parsedOptions.maxOutputTokens, + topP: parsedOptions.topP, + topK: parsedOptions.topK, + }, + }; + + const response: Response = await this.query( + parsedOptions.model, + command, + body, + ); + + if (parsedOptions.stream) { + return this.handleStream( + response, + parsedOptions.format, + parsedOptions.stream, + ); + } + + return this.switchFormat(parsedOptions.format)(await response.json()); + } + + createChat(options: Partial = {}) { + return new Chat(this, options); + } +} + +class Chat { + gemini: Gemini; + options: ChatOptions; + messages: Message[]; + + constructor(gemini: Gemini, options?: Partial) { + const parsedOptions: ChatOptions = { + ...{ + messages: [], + temperature: 1, + topP: 0.94, + topK: 1, + model: "gemini-1.5-flash-latest", + maxOutputTokens: 2048, + }, + ...options, + }; + + this.gemini = gemini; + this.options = parsedOptions; + + this.messages = parsedOptions.messages.flatMap(pairToMessage); + } + + async ask( + message: string | (string | Uint8Array | ArrayBuffer)[], // make this support Message + options: Partial> = {}, + ): Promise[Command.Generate]> { + const parsedConfig: CommandOptionMap[Command.Generate] = { + ...this.options, + ...{ + data: [], + format: Gemini.TEXT as F, + }, + ...options, + }; + + if (this.messages.at(-1)?.role === "user") { + throw new Error( + "Gemini has not yet responded to your last message. 
Please ensure you are running chat commands asynchronously.", + ); + } + + try { + const parsedMessage: Message = { + parts: await messageToParts([message].flat(), this.gemini), + role: "user", + }; + + const response = await this.gemini.ask(parsedMessage, { + ...parsedConfig, + format: Gemini.JSON, + messages: this.messages, + stream: parsedConfig.stream + ? (res) => + parsedConfig.stream( + options.format === Gemini.JSON + ? (res as FormatType) + : (res.candidates[0].content.parts[0].text as FormatType), + ) + : undefined, + }); + + this.messages.push(parsedMessage); + this.messages.push({ + parts: response.candidates[0].content.parts, + role: "model", + }); + + return options.format === Gemini.JSON + ? (response as FormatType) + : (response.candidates[0].content.parts[0].text as FormatType); + } catch (e) { + throw new Error(e); + } + } +} + +export default Gemini; + +export type { + Format, + Message, + Part, + GeminiResponse, + CommandResponseMap, + CommandOptionMap, + GeminiOptions, + ChatOptions, + ChatAskOptions, +}; diff --git a/src/polyfillTextDecoderStream.ts b/src/polyfillTextDecoderStream.ts new file mode 100644 index 0000000..7b64abc --- /dev/null +++ b/src/polyfillTextDecoderStream.ts @@ -0,0 +1,42 @@ +// TextDecoderStream Polyfill for Bun. +// https://github.com/oven-sh/bun/issues/5648#issuecomment-1824093837 + +export class PolyfillTextDecoderStream extends TransformStream< + Uint8Array, + string +> { + readonly encoding: string; + readonly fatal: boolean; + readonly ignoreBOM: boolean; + + constructor( + encoding = "utf-8", + { + fatal = false, + ignoreBOM = false, + }: ConstructorParameters[1] = {}, + ) { + const decoder = new TextDecoder(encoding, { fatal, ignoreBOM }); + super({ + transform( + chunk: Uint8Array, + controller: TransformStreamDefaultController, + ) { + const decoded = decoder.decode(chunk); + if (decoded.length > 0) { + controller.enqueue(decoded); + } + }, + flush(controller: TransformStreamDefaultController) { + const output = decoder.decode(); + if (output.length > 0) { + controller.enqueue(output); + } + }, + }); + + this.encoding = encoding; + this.fatal = fatal; + this.ignoreBOM = ignoreBOM; + } +} diff --git a/src/types.ts b/src/types.ts new file mode 100644 index 0000000..fbba7b7 --- /dev/null +++ b/src/types.ts @@ -0,0 +1,172 @@ +import { ProxyAgent } from "undici"; + +type FileType = + | "image/png" + | "image/jpeg" + | "image/webp" + | "image/heic" + | "image/heif" + | "audio/wav" + | "audio/mp3" + | "audio/aiff" + | "audio/aac" + | "audio/ogg" + | "audio/flac" + | "video/mp4" + | "video/mpeg" + | "video/mov" + | "video/avi" + | "video/x-flv" + | "video/mpg" + | "video/webm" + | "video/wmv" + | "video/3gpp"; + +type RemoteFilePart = { fileData: { mime_type: FileType; fileUri: string } }; + +type InlineFilePart = { inline_data: { mime_type: FileType; data: string } }; + +type TextPart = { text: string }; + +export type Part = TextPart | RemoteFilePart | InlineFilePart; + +type Role = "user" | "model"; + +type SafetyRating = { category: string; probability: string }; + +export type Message = { parts: Part[]; role: Role }; + +export type PromptFeedback = { + blockReason?: string; + safetyRatings: SafetyRating[]; +}; + +export type Candidate = { + content: { parts: TextPart[]; role: Role }; + finishReason: string; + index: number; + safetyRatings: SafetyRating[]; +}; + +export type GeminiResponse = { + candidates: Candidate[]; + promptFeedback: PromptFeedback; +}; + +export enum Command { + StreamGenerate = "streamGenerateContent", + Generate = 
"generateContent", + Embed = "embedContent", + Count = "countTokens", +} + +/** + * The body used for the API call to generateContent or streamGenerateContent + */ +type GenerateContentBody = { + contents: Message[]; + generationConfig: { + maxOutputTokens: number; + temperature: number; + topP: number; + topK: number; + }; +}; + +/** + * The response from the REST API to generateContent or streamGenerateContent + */ +type GenerateContentQueryOutput = { + candidates: Candidate[]; + promptFeedback: PromptFeedback; +}; + +/** + * The body used for the API call for each command + */ +export type QueryBodyMap = { + [Command.StreamGenerate]: GenerateContentBody; + [Command.Generate]: GenerateContentBody; + [Command.Count]: { contents: Message[] }; + [Command.Embed]: { model: string; content: Message }; +}; + +/** + * The response from the REST API for each command + */ +export type QueryResponseMap = { + [Command.StreamGenerate]: GenerateContentQueryOutput; + [Command.Generate]: GenerateContentQueryOutput; + [Command.Embed]: { + embedding: { values: number[] }; + }; + [Command.Count]: { + totalTokens: number; + }; +}; + +// These types are also directly used, as a string, in the Gemini class static properties +// If you are to change these types, ensure to modify the statics in the Gemini class as well. +export type TextFormat = "text"; +export type JSONFormat = "json"; +export type Format = TextFormat | JSONFormat; + +/** + * The output format for each command. + */ +export type CommandResponseMap = { + [Command.StreamGenerate]: F extends JSONFormat + ? QueryResponseMap[Command.StreamGenerate] + : string; + [Command.Generate]: F extends JSONFormat + ? QueryResponseMap[Command.Generate] + : string; + [Command.Embed]: number[]; + [Command.Count]: number; +}; + +export type GeminiOptions = { + fetch?: typeof fetch; + apiVersion?: string; + dispatcher?: ProxyAgent; +}; + +/** + * The option format for each command. + */ +export type CommandOptionMap = { + [Command.Generate]: { + temperature: number; + topP: number; + topK: number; + format: F; + maxOutputTokens: number; + model: string; + data: Buffer[]; + messages: ([string, string] | Message)[]; + stream?(stream: CommandResponseMap[Command.StreamGenerate]): void; + }; + [Command.Embed]: { + model: string; + }; + [Command.Count]: { + model: string; + }; +}; + +export type FormatType = T extends JSONFormat ? 
GeminiResponse : string; + +export type ChatOptions = { + messages: [string, string][]; + temperature: number; + topP: number; + topK: number; + model: string; + maxOutputTokens: number; +}; + +export type ChatAskOptions = { + format: F; + data: []; + stream?(stream: CommandResponseMap[Command.StreamGenerate]): void; +}; diff --git a/src/utils.ts b/src/utils.ts new file mode 100644 index 0000000..3ac9e64 --- /dev/null +++ b/src/utils.ts @@ -0,0 +1,76 @@ +import { FileTypeResult, fileTypeFromBuffer } from "file-type"; +import { PolyfillTextDecoderStream } from "./polyfillTextDecoderStream"; +import type { GeminiResponse, Message } from "./types"; + +export const getFileType = async (buffer: Uint8Array | ArrayBuffer) => { + const fileType: FileTypeResult | undefined = await fileTypeFromBuffer(buffer); + + const validMediaFormats = [ + "image/png", + "image/jpeg", + "image/webp", + "image/heic", + "image/heif", + "audio/wav", + "audio/mp3", + "audio/mpeg", + "audio/aiff", + "audio/aac", + "audio/ogg", + "audio/flac", + "video/mp4", + "video/mpeg", + "video/mov", + "video/avi", + "video/x-flv", + "video/mpg", + "video/webm", + "video/wmv", + "video/3gpp", + ]; + + const formatMap = { + "audio/mpeg": "audio/mp3", + }; + + const format = formatMap[fileType?.mime as string] || fileType?.mime; + + if (format === undefined || !validMediaFormats.includes(format)) + throw new Error( + "Please provide a valid file format that is accepted by Gemini. Learn more about valid formats here: https://ai.google.dev/gemini-api/docs/prompting_with_media?lang=node#supported_file_formats" + ); + + return format; +}; + +export const handleReader = async ( + response: Response, + cb: (response: GeminiResponse) => void +) => { + if (!response.body) throw new Error(await response.text()); + + const reader = response.body + .pipeThrough(new PolyfillTextDecoderStream()) + .getReader(); + + await reader.read().then(function processText({ done, value }) { + if (done) return; + + cb(JSON.parse(value.replace(/^data: /, ""))); + + return reader.read().then(processText); + }); +}; + +export const pairToMessage = (message: [string, string]): Message[] => { + return [ + { + parts: [{ text: message[0] }], + role: "user", + }, + { + parts: [{ text: message[1] }], + role: "model", + }, + ]; +}; diff --git a/test/fullCoverage.test.js b/test/fullCoverage.test.ts similarity index 76% rename from test/fullCoverage.test.js rename to test/fullCoverage.test.ts index 4dbee0e..7eb1873 100644 --- a/test/fullCoverage.test.js +++ b/test/fullCoverage.test.ts @@ -1,14 +1,15 @@ -import fs from "fs"; +import * as fs from "fs"; import { expect, test, vi } from "vitest"; -import Gemini from "../index"; +import type { Mock } from "vitest"; +import Gemini from "../src/index"; +import { Command, QueryResponseMap } from "../src/types"; const API_KEY = "demo-key"; -function createFetchResponse(data) { - return Promise.resolve({ ok: true, status: 200, json: () => data }); -} +const createFetchResponse = (data: T) => + Promise.resolve({ ok: true, status: 200, json: () => data }); -const generateContentResponse = { +const generateContentResponse: QueryResponseMap[Command.Generate] = { candidates: [ { content: { @@ -63,7 +64,7 @@ const generateContentResponse = { }, }; -const embedResponse = { +const embedResponse: QueryResponseMap[Command.Embed] = { embedding: { values: [ 0.014044438, -0.011704044, -0.018803535, -0.048892725, 0.022579819, @@ -90,21 +91,25 @@ const embedResponse = { }, }; -const countResponse = { +const countResponse: 
QueryResponseMap[Command.Count] = { totalTokens: 2, }; test("Gemini.ask()", async () => { - global.fetch = vi.fn(); - fetch.mockReturnValueOnce(createFetchResponse(generateContentResponse)); + global.fetch = vi.fn() as Mock; + (fetch as Mock).mockReturnValueOnce( + createFetchResponse(generateContentResponse) + ); const gemini = new Gemini(API_KEY); expect(await gemini.ask("Hello!")).toBe("Hi!"); }); test("Gemini.ask() with Previous Messages", async () => { - global.fetch = vi.fn(); - fetch.mockReturnValueOnce(createFetchResponse(generateContentResponse)); + global.fetch = vi.fn() as Mock; + (fetch as Mock).mockReturnValueOnce( + createFetchResponse(generateContentResponse) + ); const gemini = new Gemini(API_KEY); const fileBuffer = fs.readFileSync(`${__dirname}/assets/test.jpg`); @@ -114,13 +119,15 @@ test("Gemini.ask() with Previous Messages", async () => { format: Gemini.JSON, data: [fileBuffer], messages: [["Hi", "Sup?"]], - }), + }) ).toBe(generateContentResponse); }); test("Gemini.ask() with Data", async () => { - global.fetch = vi.fn(); - fetch.mockReturnValueOnce(createFetchResponse(generateContentResponse)); + global.fetch = vi.fn() as Mock; + (fetch as Mock).mockReturnValueOnce( + createFetchResponse(generateContentResponse) + ); const gemini = new Gemini(API_KEY); const fileBuffer = fs.readFileSync(`${__dirname}/assets/test.jpg`); @@ -129,41 +136,29 @@ test("Gemini.ask() with Data", async () => { await gemini.ask("What does this show?", { format: Gemini.JSON, data: [fileBuffer], - }), + }) ).toBe(generateContentResponse); }); test("Gemini.ask() with JSON Response", async () => { - global.fetch = vi.fn(); - fetch.mockReturnValueOnce(createFetchResponse(generateContentResponse)); + global.fetch = vi.fn() as Mock; + (fetch as Mock).mockReturnValueOnce( + createFetchResponse(generateContentResponse) + ); const gemini = new Gemini(API_KEY); expect( await gemini.ask("Hello!", { format: Gemini.JSON, - }), + }) ).toBe(generateContentResponse); }); -test("Gemini.ask() with Incorrect Config", async () => { - const gemini = new Gemini(API_KEY); - - expect( - (async () => - await gemini.ask("Hello!", { - format: Gemini.JSON, - nonExistantProperty: "hi", - }))(), - ).rejects.toThrowError( - "These following configurations are not available on this function: nonExistantProperty", - ); -}); - test("Fetch Polyfill", async () => { const fetchPolyfill = vi.fn(); fetchPolyfill.mockReturnValueOnce( - createFetchResponse(generateContentResponse), + createFetchResponse(generateContentResponse) ); const gemini = new Gemini(API_KEY, { @@ -173,7 +168,8 @@ test("Fetch Polyfill", async () => { }); test("Gemini.embed()", async () => { - fetch.mockReturnValueOnce(createFetchResponse(embedResponse)); + global.fetch = vi.fn() as Mock; + (fetch as Mock).mockReturnValueOnce(createFetchResponse(embedResponse)); const gemini = new Gemini(API_KEY); @@ -181,7 +177,8 @@ test("Gemini.embed()", async () => { }); test("Gemini.count()", async () => { - fetch.mockReturnValueOnce(createFetchResponse(countResponse)); + global.fetch = vi.fn() as Mock; + (fetch as Mock).mockReturnValueOnce(createFetchResponse(countResponse)); const gemini = new Gemini(API_KEY); @@ -189,7 +186,10 @@ test("Gemini.count()", async () => { }); test("Gemini.createChat()", async () => { - fetch.mockReturnValueOnce(createFetchResponse(generateContentResponse)); + global.fetch = vi.fn() as Mock; + (fetch as Mock).mockReturnValueOnce( + createFetchResponse(generateContentResponse) + ); const gemini = new Gemini(API_KEY); const chat = 
gemini.createChat(); @@ -217,7 +217,10 @@ test("Gemini.createChat()", async () => { }); test("Gemini.createChat() with JSON", async () => { - fetch.mockReturnValueOnce(createFetchResponse(generateContentResponse)); + global.fetch = vi.fn() as Mock; + (fetch as Mock).mockReturnValueOnce( + createFetchResponse(generateContentResponse) + ); const gemini = new Gemini(API_KEY); const chat = gemini.createChat(); @@ -225,6 +228,6 @@ test("Gemini.createChat() with JSON", async () => { expect( await chat.ask("Hello!", { format: Gemini.JSON, - }), + }) ).toBe(generateContentResponse); }); diff --git a/tsconfig.json b/tsconfig.json new file mode 100644 index 0000000..61a2593 --- /dev/null +++ b/tsconfig.json @@ -0,0 +1,14 @@ +{ + "compilerOptions": { + "target": "es2017", + "module": "es2022", + "esModuleInterop": true, + "moduleResolution": "node" + }, + "include": ["/**/*.ts"], + "exclude": ["node_modules"], + "ts-node": { + "esm": true, + "experimentalSpecifierResolution": "node" + } +} diff --git a/tsup.config.ts b/tsup.config.ts new file mode 100644 index 0000000..a17457a --- /dev/null +++ b/tsup.config.ts @@ -0,0 +1,10 @@ +import { defineConfig } from "tsup"; + +export default defineConfig({ + entry: ["src/index.ts"], + format: ["esm"], + dts: true, + splitting: false, + sourcemap: true, + clean: true, +});