[DRAFT] Documentation for client v2 how to read/write #3350

Draft
wants to merge 2 commits into
base: main
141 changes: 141 additions & 0 deletions docs/integrations/language-clients/java/client-v2.md
@@ -111,6 +111,25 @@ Please use tools like [openssl](https://docs.openssl.org/master/man1/openssl/) t
:::


## Writing Data
This section describes common scenarios for writing data to ClickHouse. The client provides different API methods for different use cases:
- `insert(String tableName, InputStream data, ClickHouseFormat format, InsertSettings settings)` - use this method to write data in a text format. The input stream passed as `data` will be compressed according to the settings. Encoding the data is the application's responsibility.
- `insert(String tableName, List<?> data, InsertSettings settings)` - use this method to write a list of POJOs (or DTOs). The client encodes the data as RowBinary and handles serialization according to the schema of the table `tableName`. The stream is compressed according to the settings. This method can be used for big datasets.
- `insert(String tableName, DataStreamWriter writer, ClickHouseFormat format, InsertSettings settings)` - a more advanced version of the first API method. It accepts a functional interface implementation that controls how data is written to the server. This method is useful when transcoding data into a byte stream is not desirable, or when data is read from a queue. It also lets the application send data that is already compressed, for example as LZ4 frames. However, there are some limitations.

The speed of a write operation is determined, first of all, by how fast the server processes the data. It can be slow if the data requires a lot of parsing, so performance depends heavily on the data format. Please read our [blog post](https://clickhouse.com/blog/clickhouse-input-format-matchup-which-is-fastest-most-efficient) about input formats. The client uses RowBinary by default, but users may choose a format that suits their data better.
Configuration is the next place where performance can be improved. The client builder method `setClientNetworkBufferSize(int size)` configures the size of the buffer that sits between the socket and the client. This setting is important because it determines how many socket I/O operations are needed to send the data. When the buffer is too small, it causes many calls to the OS, which is a potential bottleneck. When the buffer is too big compared to the socket buffers, system performance, and available memory, it causes slowness because copying data from the heap to an OS buffer is a very expensive operation. A bigger buffer also means more memory is needed per request.
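
The relation between buffer size and the number of I/O calls can be illustrated without the client. The following is only a sketch under stated assumptions: `CountingSink` is a hypothetical stand-in for a socket, and the copy loop is a generic one, not the client's internal implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class NetworkBufferSketch {

    /** Hypothetical stand-in for a socket: it only counts the write calls it receives. */
    static final class CountingSink extends OutputStream {
        int calls;

        @Override
        public void write(int b) {
            calls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
        }
    }

    /** Copies the input to the output through an intermediate buffer of the given size. */
    static void copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }

    /** Returns how many write calls are needed to send payloadSize bytes. */
    static int writeCalls(int payloadSize, int bufferSize) {
        CountingSink sink = new CountingSink();
        try {
            copy(new ByteArrayInputStream(new byte[payloadSize]), sink, bufferSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    public static void main(String[] args) {
        int payload = 100_000; // bytes
        System.out.println("1 KiB buffer:  " + writeCalls(payload, 1024) + " write calls");
        System.out.println("64 KiB buffer: " + writeCalls(payload, 64 * 1024) + " write calls");
    }
}
```

For a 100,000-byte payload this loop needs 98 write calls with a 1 KiB buffer and 2 with a 64 KiB buffer. The sketch does not model the cost of copying a large buffer to the OS, so the best value for a real workload still has to be measured.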


## Reading Data
This section describes common scenarios for reading data from ClickHouse. The client provides different API methods for different use cases:
- `query(String sqlQuery, QuerySettings settings)` and `query(String sqlQuery, Map<String, Object> queryParams, QuerySettings settings)` - base methods for most query requests. These two methods accept a raw SQL query and return a response object with a raw input stream of bytes. They should be used for big queries because access to the raw byte stream allows reading data in the most performant way. Their main benefit is that they allow data to be read as a stream, which avoids allocating a lot of memory.
- `queryAll(String sqlQuery, QuerySettings settings)` and `queryAll(String sqlQuery, Map<String, Object> params, QuerySettings settings)` - methods designed to simplify fetching a small amount of data as an iterable collection of records. They should be used only to fetch a small number of rows, because they read the whole result from the server. This may cause a significant peak in memory usage, especially in highly concurrent applications.
- `queryAll(String sqlQuery, Class<T> clazz, TableSchema schema, Supplier<T> allocator)` - reads a result set directly into plain Java objects (DTOs). The method is suitable for a result of any size. It uses precompiled serializers to minimize overhead, and it doesn't hold the connection after reading.

Data can be fetched in any output format that ClickHouse supports. The client will try to use RowBinaryWithNamesAndTypes by default because this format carries metadata in its header and is more compact for network transfer.
As mentioned in the "Writing Data" section, the client builder method `setClientNetworkBufferSize(int size)` works for reads in the same way as for writes.
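
The memory benefit of reading from a raw stream can be sketched with plain Java I/O. The example below is an illustration only: it assumes a tab-separated text response, and a `ByteArrayInputStream` stands in for the input stream that a query response would expose. No client API is called.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class StreamingReadSketch {

    /** Sums the second column of a tab-separated stream while holding one row at a time. */
    static long sumSecondColumn(InputStream rows) {
        long sum = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(rows, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip a trailing empty line, if any
                }
                String[] columns = line.split("\t", -1);
                sum += Long.parseLong(columns[1]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Stand-in for a response body; a real one would come from the server.
        byte[] response = "foo\t1\nbar\t2\nbaz\t3\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(sumSecondColumn(new ByteArrayInputStream(response)));
    }
}
```

Memory use stays proportional to a single row regardless of the result size, which is the property that makes the stream-based methods preferable for big queries.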

## Configuration {#configuration}

All settings are defined by instance methods (a.k.a configuration methods) that make the scope and context of each value clear.
@@ -352,6 +371,128 @@ try (InsertResponse response = client.insert(TABLE_NAME, events).get()) {
}
```

### insert(String tableName, DataStreamWriter writer, ClickHouseFormat format, InsertSettings settings)
**Beta**

This API method accepts a writer object that encodes data directly into an output stream. By default, the data is compressed by the client.
The `InsertSettings` option `appCompressedData` turns off client compression and lets the application send an already compressed stream.
The examples below show the major use cases this API was designed for.

`com.clickhouse.client.api.DataStreamWriter` is a functional interface with a method `onOutput`, which the client calls when the output stream is ready for data to be written. The interface has another method, `onRetry`, with a default implementation. It is called when retry logic is triggered and is mainly used to reset the data source, if applicable.


**Signatures**
```java
CompletableFuture<InsertResponse> insert(String tableName,        // name of destination table
                                         DataStreamWriter writer, // data writer instance
                                         ClickHouseFormat format, // data format in which the writer encodes data
                                         InsertSettings settings) // operation settings
```

**Parameters**

`tableName` - name of the target table.

`writer` - data writer instance.

`format` - data format in which the writer encodes data.

`settings` - request settings.

**Return value**

Future of `InsertResponse` type - the result of the operation and additional information like server side metrics.

**Examples**

Writing a collection of JSON objects encoded as string values using `JSONEachRow` format:
```java showLineNumbers

// `client` is an already configured Client instance.
final int EXECUTE_CMD_TIMEOUT = 10; // seconds
final String tableName = "events";
final String tableCreate = "CREATE TABLE \"" + tableName + "\" " +
        " (name String, " +
        "  v1 Float32, " +
        "  v2 Float32, " +
        "  attrs Nullable(String), " +
        "  corrected_time DateTime('UTC') DEFAULT now()," +
        "  special_attr Nullable(Int8) DEFAULT -1)" +
        "  Engine = MergeTree ORDER BY ()";

client.execute("DROP TABLE IF EXISTS " + tableName).get(EXECUTE_CMD_TIMEOUT, TimeUnit.SECONDS);
client.execute(tableCreate).get(EXECUTE_CMD_TIMEOUT, TimeUnit.SECONDS);

String correctedTime = Instant.now().atZone(ZoneId.of("UTC")).format(DataTypeUtils.DATETIME_FORMATTER);
// Columns omitted from a row are filled with their DEFAULT values by the server.
String[] rows = new String[] {
        "{ \"name\": \"foo1\", \"v1\": 0.3, \"v2\": 0.6, \"attrs\": \"a=1,b=2,c=5\", \"corrected_time\": \"" + correctedTime + "\", \"special_attr\": 10}",
        "{ \"name\": \"foo1\", \"v1\": 0.3, \"v2\": 0.6, \"attrs\": \"a=1,b=2,c=5\", \"corrected_time\": \"" + correctedTime + "\"}",
        "{ \"name\": \"foo1\", \"v1\": 0.3, \"v2\": 0.6, \"attrs\": \"a=1,b=2,c=5\" }",
        "{ \"name\": \"foo1\", \"v1\": 0.3, \"v2\": 0.6 }",
};

try (InsertResponse response = client.insert(tableName, out -> {
    // writing raw bytes
    for (String row : rows) {
        out.write(row.getBytes(StandardCharsets.UTF_8));
    }
}, ClickHouseFormat.JSONEachRow, new InsertSettings()).get()) {
    System.out.println("Rows written: " + response.getWrittenRows());
}

```
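
The role of `onRetry` can be sketched without a server. In the example below, `SketchWriter` is a hypothetical local interface that only mirrors the shape described in this section (it is not the client's `DataStreamWriter`), and `sendWithOneRetry` is a simulation that assumes the writer is asked for its output again after `onRetry`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class WriterRetrySketch {

    /** Hypothetical stand-in with the shape described above; not the client's interface. */
    @FunctionalInterface
    interface SketchWriter {
        void onOutput(OutputStream out) throws IOException;

        default void onRetry() throws IOException {
            // nothing to reset by default
        }
    }

    /** Writer over a re-readable source: onRetry rewinds to the first row. */
    static final class ListWriter implements SketchWriter {
        private final List<String> rows;
        private Iterator<String> cursor;

        ListWriter(List<String> rows) {
            this.rows = rows;
            this.cursor = rows.iterator();
        }

        @Override
        public void onOutput(OutputStream out) throws IOException {
            while (cursor.hasNext()) {
                out.write((cursor.next() + "\n").getBytes(StandardCharsets.UTF_8));
            }
        }

        @Override
        public void onRetry() {
            cursor = rows.iterator(); // start again from the first row
        }
    }

    /** Simulated send: the first attempt is discarded, the second one is returned. */
    static String sendWithOneRetry(SketchWriter writer) {
        try {
            writer.onOutput(new ByteArrayOutputStream()); // pretend this attempt failed
            writer.onRetry();                             // give the writer a chance to rewind
            ByteArrayOutputStream secondAttempt = new ByteArrayOutputStream();
            writer.onOutput(secondAttempt);
            return new String(secondAttempt.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(sendWithOneRetry(new ListWriter(Arrays.asList("row1", "row2"))));
    }
}
```

A writer that consumes a one-shot source, such as a queue or a plain iterator, and does not override `onRetry` would produce nothing on the second attempt in this simulation. That is the situation the reset hook is meant to handle.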

Writing already compressed data:
```java showLineNumbers
// `client` is an already configured Client instance.
final int EXECUTE_CMD_TIMEOUT = 10; // seconds
String tableName = "very_long_table_name_with_uuid_" + UUID.randomUUID().toString().replace('-', '_');
String tableCreate = "CREATE TABLE \"" + tableName + "\" " +
        " (name String, " +
        "  v1 Float32, " +
        "  v2 Float32, " +
        "  attrs Nullable(String), " +
        "  corrected_time DateTime('UTC') DEFAULT now()," +
        "  special_attr Nullable(Int8) DEFAULT -1)" +
        "  Engine = MergeTree ORDER BY ()";

client.execute("DROP TABLE IF EXISTS " + tableName).get(EXECUTE_CMD_TIMEOUT, TimeUnit.SECONDS);
client.execute(tableCreate).get(EXECUTE_CMD_TIMEOUT, TimeUnit.SECONDS);

String correctedTime = Instant.now().atZone(ZoneId.of("UTC")).format(DataTypeUtils.DATETIME_FORMATTER);
String[] data = new String[] {
        "{ \"name\": \"foo1\", \"v1\": 0.3, \"v2\": 0.6, \"attrs\": \"a=1,b=2,c=5\", \"corrected_time\": \"" + correctedTime + "\", \"special_attr\": 10}",
        "{ \"name\": \"foo1\", \"v1\": 0.3, \"v2\": 0.6, \"attrs\": \"a=1,b=2,c=5\", \"corrected_time\": \"" + correctedTime + "\"}",
        "{ \"name\": \"foo1\", \"v1\": 0.3, \"v2\": 0.6, \"attrs\": \"a=1,b=2,c=5\" }",
        "{ \"name\": \"foo1\", \"v1\": 0.3, \"v2\": 0.6 }",
};

// This step is only for showcase. A real application would already have compressed data.
byte[][] compressedData = new byte[data.length][];
for (int i = 0; i < data.length; i++) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    // Closing the gzip stream writes the trailer, so each row becomes a complete gzip member.
    try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
        gz.write(data[i].getBytes(StandardCharsets.UTF_8));
    }
    compressedData[i] = baos.toByteArray();
}

InsertSettings insertSettings = new InsertSettings()
        .appCompressedData(true, "gzip"); // defining compression algorithm (sent via HTTP headers)

try (InsertResponse response = client.insert(tableName, out -> {
    // Writing data
    for (byte[] row : compressedData) {
        out.write(row);
    }
}, ClickHouseFormat.JSONEachRow, insertSettings).get()) {
    System.out.println("Rows written: " + response.getWrittenRows());
}

```
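
The example above compresses every row separately and writes the results back to back. The sketch below, which uses only `java.util.zip`, shows why such a stream is still readable: independently compressed gzip members placed one after another decompress as a single stream. It checks this with Java's `GZIPInputStream` only; that the server treats the request body the same way is an assumption the example above relies on. The class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipMembersSketch {

    /** Compresses one chunk of text into a complete gzip member. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Concatenates independently compressed chunks and decompresses them as one stream. */
    static String roundTrip(String... chunks) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        for (String chunk : chunks) {
            byte[] member = gzip(chunk);
            wire.write(member, 0, member.length);
        }
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(wire.toByteArray()))) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                plain.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(plain.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(roundTrip("{\"name\": \"foo1\"}\n", "{\"name\": \"foo2\"}\n"));
    }
}
```

If the data is available as a whole, compressing it as a single gzip stream is simpler and usually compresses better than many small members.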

### InsertSettings {#insertsettings}

Configuration options for insert operations.