diff --git a/docs/reference/advanced-config.md b/docs/reference/advanced-config.md
index 435a09c1d..b9c8b916c 100644
--- a/docs/reference/advanced-config.md
+++ b/docs/reference/advanced-config.md
@@ -72,6 +72,8 @@ This class is responsible for the serialization of every request, it offers the
 * `deserialize(json: string): any;` deserializes response strings.
 * `ndserialize(array: any[]): string;` serializes bulk request objects.
 * `qserialize(object: any): string;` serializes request query parameters.
+* `encodeFloat32Vector(floats: number[]): string;` encodes a float array to base64 for efficient vector ingestion.
+* `decodeFloat32Vector(base64: string): number[];` decodes a base64 string back to a float array.
 
 ```js
 const { Client, Serializer } = require('@elastic/elasticsearch')
diff --git a/docs/reference/bulk_examples.md b/docs/reference/bulk_examples.md
index 51b53e6c7..1530ae5af 100644
--- a/docs/reference/bulk_examples.md
+++ b/docs/reference/bulk_examples.md
@@ -97,3 +97,55 @@ async function run () {
 run().catch(console.log)
 ```
 
+## Bulk ingestion with base64-encoded vectors [bulk_vectors]
+
+When ingesting dense vectors, you can encode float arrays as base64 strings for more efficient transfer. The client's `serializer` provides `encodeFloat32Vector` and `decodeFloat32Vector` methods that encode IEEE-754 float32 values in big-endian byte order.
+
+*Note: Support for ingesting base64-encoded float arrays is available starting in Elasticsearch 9.3.*
+
+```js
+'use strict'
+
+const { Client } = require('@elastic/elasticsearch')
+const client = new Client({
+  cloud: { id: '' },
+  auth: { apiKey: 'base64EncodedKey' }
+})
+
+async function run () {
+  await client.indices.create({
+    index: 'my-vectors',
+    mappings: {
+      properties: {
+        title: { type: 'text' },
+        embedding: { type: 'dense_vector', dims: 3 }
+      }
+    }
+  }, { ignore: [400] })
+
+  const documents = [
+    { title: 'Document 1', embedding: [0.1, 0.2, 0.3] },
+    { title: 'Document 2', embedding: [0.4, 0.5, 0.6] },
+    { title: 'Document 3', embedding: [0.7, 0.8, 0.9] }
+  ]
+
+  const operations = documents.flatMap(doc => [
+    { index: { _index: 'my-vectors' } },
+    {
+      title: doc.title,
+      embedding: client.serializer.encodeFloat32Vector(doc.embedding)
+    }
+  ])
+
+  const bulkResponse = await client.bulk({ refresh: true, operations })
+
+  if (bulkResponse.errors) {
+    console.log('Bulk ingestion had errors')
+  } else {
+    console.log(`Indexed ${documents.length} documents with encoded vectors`)
+  }
+}
+
+run().catch(console.log)
+```
+
diff --git a/docs/reference/examples.md b/docs/reference/examples.md
index d307341d1..f75b2ee65 100644
--- a/docs/reference/examples.md
+++ b/docs/reference/examples.md
@@ -9,6 +9,7 @@ Following you can find some examples on how to use the client.
 
 * Use of the [asStream](/reference/as_stream_examples.md) parameter;
 * Executing a [bulk](/reference/bulk_examples.md) request;
+* Executing a [bulk request with base64-encoded vectors](/reference/bulk_examples.md#bulk_vectors);
 * Executing a [exists](/reference/exists_examples.md) request;
 * Executing a [get](/reference/get_examples.md) request;
 * Executing a [sql.query](/reference/sql_query_examples.md) request;
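
The `bulk_examples.md` hunk describes the encoding as IEEE-754 float32 values in big-endian byte order, serialized as base64. As a rough illustration of what that format implies, here is a minimal sketch built only on Node's `Buffer`. The function names `encodeFloat32VectorSketch` and `decodeFloat32VectorSketch` are hypothetical stand-ins; this is not the client's actual implementation, which may differ in validation and performance.

```js
'use strict'

// Hypothetical stand-in for the documented encodeFloat32Vector:
// packs each number as a 4-byte big-endian IEEE-754 float32,
// then base64-encodes the resulting bytes.
function encodeFloat32VectorSketch (floats) {
  const buf = Buffer.alloc(floats.length * 4)
  floats.forEach((value, i) => buf.writeFloatBE(value, i * 4))
  return buf.toString('base64')
}

// Hypothetical stand-in for the documented decodeFloat32Vector:
// reverses the steps above, reading one float32 every 4 bytes.
function decodeFloat32VectorSketch (base64) {
  const buf = Buffer.from(base64, 'base64')
  const floats = []
  for (let offset = 0; offset + 4 <= buf.length; offset += 4) {
    floats.push(buf.readFloatBE(offset))
  }
  return floats
}

const encoded = encodeFloat32VectorSketch([0.5, -2, 1.5])
console.log(encoded)
console.log(decodeFloat32VectorSketch(encoded))
```

Under this scheme a round trip is lossy for values that are not exactly representable in 32 bits: a value such as `0.1` would come back as `Math.fround(0.1)` rather than the original double. That seems worth keeping in mind when comparing decoded vectors against their source arrays.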