Skip to content

sql_select: Validate ClickHouse data types #4192

@benwebber

Description

@benwebber

Internally, Bloblang numbers are JSON numbers. Connect passes them to the ClickHouse driver as strings by default. When using the ClickHouse driver, you nearly always need to explicitly cast integers to sized types using methods like int64().

We encountered a footgun related to this. One of our pipelines queries ClickHouse to see if a record exists like so:

SELECT id FROM tbl WHERE id = ?

Because we forgot to cast the id Bloblang value to an int64(), ClickHouse actually executed:

SELECT id FROM tbl WHERE id = '12345'

This type mismatch causes ClickHouse to scan the table instead of using the primary key.


The sql_select processor could support a validate_column_types field to validate incoming args_mapping values match the target data types.

sql_select already accepts a table name and column names. Connect could inspect the table when it starts, or when the processor is used for the first time, to determine the column data types. With validate_column_types: true, sql_select would validate incoming data against the detected data types.

The validate_column_types field could also control whether Connect inspects the target table in the first place.

Initially, this field would only affect the ClickHouse output.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions