feat: Add the ability to request a schema from a statement #1514

@paleolimbot

There are some situations (e.g., #1513) where the mapping of a database type to an Arrow type is not canonical. SQLite is the extreme case: every mapping from a database result to an Arrow type is approximate (and not necessarily stable between queries).

When I rewrote the typing part of the PostgreSQL driver, I intentionally separated the "guess Arrow type from Postgres type" and "convert Postgres data to Arrow data" components. Given an Arrow type, it's reasonably straightforward to write the conversion from a Postgres type. The hard (and imprecise) part is the guessing.
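A rough sketch of that separation, in Python for brevity. The type names and both tables are illustrative assumptions, not the driver's actual mappings: `guess_arrow_type()` is the imprecise part, while `convert_value()` only needs to know a (Postgres type, Arrow type) pair.

```python
from decimal import Decimal

# Hypothetical default guesses (assumed for illustration only).
DEFAULT_ARROW_TYPE = {
    "int4": "int32",
    "float8": "double",
    "text": "string",
    "numeric": "string",  # assumed lossless fallback when no better guess exists
}

# Conversions keyed by (postgres_type, arrow_type). Several Arrow targets
# can be supported for the same Postgres type.
CONVERTERS = {
    ("int4", "int32"): int,
    ("int4", "int64"): int,
    ("float8", "double"): float,
    ("text", "string"): str,
    ("numeric", "string"): str,
    ("numeric", "double"): float,
    ("numeric", "decimal128"): Decimal,
}


def guess_arrow_type(pg_type):
    """The imprecise step: pick one default Arrow type for a Postgres type."""
    return DEFAULT_ARROW_TYPE[pg_type]


def convert_value(pg_type, arrow_type, text_value):
    """The straightforward step: convert once the target type is known."""
    try:
        converter = CONVERTERS[(pg_type, arrow_type)]
    except KeyError:
        raise NotImplementedError(
            f"no conversion from {pg_type} to {arrow_type}"
        ) from None
    return converter(text_value)
```

With this split, a caller who already knows the target type can skip the guess entirely, e.g. `convert_value("numeric", "double", "1.5")`.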

Instead of providing a possibly ever-accumulating pile of options along the lines of "adbc.postgresql.statement.numeric_as_double" = "true", I wonder if we could add AdbcStatementRequestSchema(struct AdbcStatement*, struct ArrowSchema*). Often the query author already knows which column types they want (or is using a SQL generation tool that knows what column types to expect). In more dynamic wrappers, one could inspect the result of AdbcStatementExecuteSchema(), look for specific types, and request replacements for them. This model also fits nicely with how the Python __arrow_c_stream__(requested_schema=xxxx) protocol is parameterized.
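To make the dynamic-wrapper case concrete, here is a minimal sketch that models a schema as a list of (name, type) pairs. `build_requested_schema()` is a hypothetical helper: its input stands in for what AdbcStatementExecuteSchema() would report, and its output for what would be passed to the proposed AdbcStatementRequestSchema().

```python
def build_requested_schema(inferred, overrides):
    """Swap specific inferred types for preferred ones; keep the rest.

    inferred:  list of (column_name, type_name) pairs, standing in for the
               schema the driver would infer on its own.
    overrides: dict mapping an inferred type name to the type the wrapper
               would rather receive.
    """
    return [(name, overrides.get(type_name, type_name))
            for name, type_name in inferred]


# Example: a wrapper that always wants numeric-as-string columns as doubles.
inferred = [("id", "int32"), ("price", "string"), ("label", "large_string")]
requested = build_requested_schema(inferred, {"string": "double"})
```

Only the "price" column changes here; columns whose inferred type has no override pass through untouched, so the wrapper never needs a driver-specific option name.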

I'm not sure whether the request should be best-effort or error-if-cannot-be-satisfied (or whether the caller should be able to choose). Even without the ability to pass an ArrowSchema*, it's not very difficult to work around this: you could provide an IPC-serialized schema to AdbcStatementSetOptionBytes().
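The two possible semantics could look roughly like this. It is a sketch only: `resolve_schema()`, the `strict` flag, and the `supported` set are hypothetical names for illustration, not part of any existing ADBC API.

```python
def resolve_schema(inferred, requested, supported, strict):
    """Decide the output type of each column given a requested schema.

    inferred:  list of (column_name, type_name) the driver would produce.
    requested: dict of column_name -> requested type_name.
    supported: set of (inferred_type, requested_type) conversions the
               driver can actually perform.
    strict:    True  -> raise if any request cannot be satisfied;
               False -> best effort, falling back to the inferred type.
    """
    resolved = []
    for name, inferred_type in inferred:
        wanted = requested.get(name, inferred_type)
        if wanted == inferred_type or (inferred_type, wanted) in supported:
            resolved.append((name, wanted))
        elif strict:
            raise ValueError(
                f"column {name!r}: cannot produce {wanted} from {inferred_type}"
            )
        else:
            resolved.append((name, inferred_type))
    return resolved
```

Letting the caller pick `strict` would cover both cases: SQL generation tools that need exact types can fail fast, while exploratory wrappers can accept whatever the driver manages.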
