Description
There are some situations (e.g., #1513) where the mapping of a database type to an Arrow type is not canonical. SQLite is an extreme case: every mapping of a result column to an Arrow type is approximate (and not necessarily stable between queries).
When I rewrote the type-handling part of the PostgreSQL driver, I intentionally separated the "guess an Arrow type from a Postgres type" and "convert Postgres data to Arrow data" components. Given a target Arrow type, writing the conversion from a Postgres type is reasonably straightforward. The hard (and imprecise) part is the guessing.
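A minimal sketch of that separation, assuming a simplified enum in place of real Arrow type ids and a handful of Postgres type names. The names `ArrowTypeId`, `guess_arrow_type()` and `can_convert()` are hypothetical and are not the driver's actual code; the point is that the converter side only needs a target type, no matter who picked it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for Arrow type ids (not the nanoarrow enum). */
typedef enum { ARROW_INT32, ARROW_INT64, ARROW_DOUBLE, ARROW_STRING, ARROW_BINARY } ArrowTypeId;

/* Stage 1 ("guess"): pick a default Arrow type for a Postgres type name.
   This is the imprecise part: numeric has no single lossless fixed-width
   target, so this sketch falls back to string. */
static ArrowTypeId guess_arrow_type(const char *pg_type) {
  assert(pg_type != NULL);
  if (strcmp(pg_type, "int4") == 0) return ARROW_INT32;
  if (strcmp(pg_type, "int8") == 0) return ARROW_INT64;
  if (strcmp(pg_type, "float8") == 0) return ARROW_DOUBLE;
  if (strcmp(pg_type, "numeric") == 0 || strcmp(pg_type, "text") == 0) return ARROW_STRING;
  return ARROW_BINARY; /* unknown types: pass raw bytes through */
}

/* Stage 2 ("convert"): given a target Arrow type chosen by anyone (the guesser
   or a caller), report whether a converter from this Postgres type exists. */
static bool can_convert(const char *pg_type, ArrowTypeId target) {
  assert(pg_type != NULL);
  switch (target) {
    case ARROW_STRING:
    case ARROW_BINARY:
      return true; /* in this sketch, any value can be emitted as text or raw bytes */
    case ARROW_INT32:
      return strcmp(pg_type, "int4") == 0;
    case ARROW_INT64: /* widening int4 -> int64 is lossless */
      return strcmp(pg_type, "int4") == 0 || strcmp(pg_type, "int8") == 0;
    case ARROW_DOUBLE: /* numeric -> double is allowed but may lose precision */
      return strcmp(pg_type, "float8") == 0 || strcmp(pg_type, "numeric") == 0;
  }
  return false;
}
```

With this split, honouring a caller's preference only means skipping `guess_arrow_type()` for that column and checking `can_convert()`; no new converter code is needed.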
Instead of accumulating an ever-growing pile of options along the lines of "adbc.postgresql.statement.numeric_as_double" = "true", I wonder if we could add AdbcStatementRequestSchema(struct AdbcStatement*, struct ArrowSchema*). Often the query author already knows which column types they want (or is using a SQL generation tool that knows what column types to expect). In more dynamic wrappers, one could inspect the result of AdbcStatementExecuteSchema(), look for specific types, and request replacements for them. This model also fits nicely with how the Python __arrow_c_stream__(requested_schema=xxxx) protocol is parameterized.
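As a rough illustration of what a caller might hand to the proposed AdbcStatementRequestSchema() (which does not exist yet), the sketch below builds a requested schema by hand using the `struct ArrowSchema` definition from the Arrow C Data Interface. The column names, their types and the helper `make_requested_schema()` are hypothetical; real code would more likely use a library such as nanoarrow to build the schema.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#ifndef ARROW_C_DATA_INTERFACE
#define ARROW_C_DATA_INTERFACE
#define ARROW_FLAG_NULLABLE 2
/* Struct definition copied from the Arrow C Data Interface specification. */
struct ArrowSchema {
  const char *format;
  const char *name;
  const char *metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema **children;
  struct ArrowSchema *dictionary;
  void (*release)(struct ArrowSchema *);
  void *private_data;
};
#endif

/* Children own no extra memory; the parent frees the child structs. */
static void release_child(struct ArrowSchema *schema) { schema->release = NULL; }

static void release_parent(struct ArrowSchema *schema) {
  for (int64_t i = 0; i < schema->n_children; i++) {
    struct ArrowSchema *child = schema->children[i];
    if (child->release != NULL) child->release(child);
    free(child);
  }
  free(schema->children);
  schema->release = NULL; /* mark as released, per the spec */
}

/* Hypothetical request: id as int64 ("l"), amount as float64 ("g"),
   label as utf8 ("u"), wrapped in a struct ("+s"). Returns 0 on success. */
static int make_requested_schema(struct ArrowSchema *out) {
  static const char *names[] = {"id", "amount", "label"};
  static const char *formats[] = {"l", "g", "u"};
  const int64_t n = 3;
  assert(out != NULL);

  struct ArrowSchema **children = calloc((size_t)n, sizeof *children);
  if (children == NULL) return -1;
  for (int64_t i = 0; i < n; i++) {
    children[i] = calloc(1, sizeof **children);
    if (children[i] == NULL) {
      for (int64_t j = 0; j < i; j++) free(children[j]);
      free(children);
      return -1;
    }
    children[i]->format = formats[i];
    children[i]->name = names[i];
    children[i]->flags = ARROW_FLAG_NULLABLE;
    children[i]->release = release_child;
  }

  memset(out, 0, sizeof *out);
  out->format = "+s";
  out->name = "";
  out->n_children = n;
  out->children = children;
  out->release = release_parent;
  return 0;
}
```

A query author who knows that `amount` is a Postgres numeric column could request `"g"` here instead of relying on a driver-specific option such as numeric_as_double.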
I'm not sure whether the request should be best-effort or error-if-cannot-be-satisfied (or whether the caller should be able to choose). But without a way to pass an ArrowSchema* directly, working around this is awkward: the closest option would be to pass an IPC-serialized schema to AdbcStatementSetOptionBytes().
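To make the two policies concrete, here is a small sketch of how a driver might reconcile its guessed column types with a requested schema under either mode. Everything here is hypothetical: `RequestMode` stands in for a caller-chosen option, `ColType` is a simplified type id, and `can_convert()` encodes an assumed set of supported casts (any type to string, int64 to double).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TYPE_INT64, TYPE_DOUBLE, TYPE_STRING } ColType;      /* simplified stand-in */
typedef enum { REQUEST_BEST_EFFORT, REQUEST_STRICT } RequestMode;  /* hypothetical option */

/* Assumed cast rules for this sketch only. */
static bool can_convert(ColType from, ColType to) {
  if (from == to || to == TYPE_STRING) return true;
  return from == TYPE_INT64 && to == TYPE_DOUBLE;
}

/* Fills out[i] with the type the driver would actually produce.
   Returns 0 on success, or -1 in strict mode when some column cannot be
   satisfied (out may then be partially written). */
static int resolve_schema(const ColType *guessed, const ColType *requested, size_t n,
                          RequestMode mode, ColType *out) {
  assert(guessed != NULL && requested != NULL && out != NULL);
  for (size_t i = 0; i < n; i++) {
    if (can_convert(guessed[i], requested[i])) {
      out[i] = requested[i];
    } else if (mode == REQUEST_BEST_EFFORT) {
      out[i] = guessed[i]; /* keep the driver's guess for this column */
    } else {
      return -1; /* strict: report that the request cannot be honoured */
    }
  }
  return 0;
}
```

In best-effort mode the caller still has to check the schema of the resulting stream, since some columns may not match the request; strict mode moves that check into the driver at the cost of failing the whole query.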