feat(c/include/arrow-adbc): Add AdbcStatementRequestSchema #3623

paleolimbot · 2025-10-27T15:34:10Z

This PR proposes a 1.2 spec addition, AdbcStatementRequestSchema(). There are a few decisions here, and while I put specific language in, I think any version of these is still helpful:

- I wrote this up as best-effort (like the PyCapsule requested_schema). I think this is better for negotiation (e.g., the caller calls execute schema, requests a schema with all string/binary/list columns as views, and the driver produces string/binary views but not list views because it doesn't implement those). A strict version is also useful and might generate better errors.
- Maybe struct ArrowSchema* should be const struct ArrowSchema*. I'm not sure this matters (mostly the C struct will be populated specifically for this call, and these structs are easy to copy).
- No opinions on RequestSchema as the name if there's a better one.
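The best-effort behaviour described in the first point can be sketched per column using Arrow C Data Interface format strings ("u" for utf8, "vu" for utf8 view, "z"/"vz" for binary and binary view, "+l"/"+vl" for list and list view). This is only an illustration under stated assumptions: the capability table and the `toy_` function names are hypothetical and are not part of this PR or of any driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capability table: Arrow C Data Interface format strings this
 * imaginary driver can produce. "+vl" (list view) is deliberately absent. */
static const char* const kToySupported[] = {"l", "g",  "u",  "U", "vu",
                                            "z", "Z", "vz", "+l"};

/* Returns 1 if the imaginary driver can produce `format`, else 0. */
static int toy_driver_supports(const char* format) {
  size_t n = sizeof(kToySupported) / sizeof(kToySupported[0]);
  for (size_t i = 0; i < n; i++) {
    if (strcmp(kToySupported[i], format) == 0) return 1;
  }
  return 0;
}

/* Best-effort resolution for one column: honor the requested format when the
 * driver supports it, otherwise fall back to what the driver inferred.
 * A strict variant would return an error instead of falling back. */
const char* toy_resolve_format(const char* inferred, const char* requested) {
  assert(inferred != NULL);
  if (requested != NULL && toy_driver_supports(requested)) return requested;
  return inferred;
}
```

With this table, a request for "vu" or "vz" is honored, while a request for "+vl" falls back to the inferred "+l". The caller still has to read the schema of the result stream to learn what was actually produced.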

The capability to do this is already built into the Postgres driver for some types (the COPY reader was designed to always accept an arbitrary ArrowSchema, which we currently just infer before constructing the reader).

Other 1.2 spec additions are in #3607 (which also contains the infrastructure for a new ADBC minor version).

Closes #1514.

lidavidm · 2025-10-27T23:45:12Z

Thanks for this!

I wonder if we also want to somehow allow things like "read 'field_name' as string view, but leave other fields alone" or "read all 'database_type' as string view, but leave other fields alone"

paleolimbot · 2025-10-28T03:29:44Z

> I wonder if we also want to somehow allow things like "read 'field_name' as string view, but leave other fields alone" or "read all 'database_type' as string view, but leave other fields alone"

Sure, or some version of an option get/set with a list of supported or preferred types. Happy to take out the negotiation bit if that's confusing (the main purpose here is to help the driver make better choices for loose or row-based typing).

amoeba · 2025-11-11T17:31:12Z

This looks pretty good. On one hand, it's not the most ergonomic way to handle this, but on the other it's the most consistent with the existing API. Am I right that the most convenient way to use this API may be to first call AdbcStatementExecuteSchema (and hope it's implemented), then modify the returned schema to my liking, call AdbcStatementRequestSchema with the modified schema, and then execute my statement?

Another question, should it be spelled out that the schema can be partial or not?
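The call sequence asked about above (get the schema, edit it, request it, execute) can be sketched with toy stand-ins. Everything here is an assumption made for illustration: the `Toy` types and `toy_` functions are hypothetical simplifications that reduce a schema to one format string per column, they are not the real ADBC structs or signatures, and the sketch assumes a full (not partial) schema because the partial case is still an open question in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_COLS 4

/* Toy schema: one Arrow format string per column (NOT struct ArrowSchema). */
typedef struct {
  int n_cols;
  const char* formats[TOY_MAX_COLS];
} ToySchema;

/* Toy statement: what the driver would infer, plus an optional request. */
typedef struct {
  ToySchema inferred;
  ToySchema requested;
  int has_request;
} ToyStatement;

/* Stand-in for AdbcStatementExecuteSchema: report the schema, do not execute. */
int toy_execute_schema(const ToyStatement* stmt, ToySchema* out) {
  *out = stmt->inferred;
  return 0;
}

/* Stand-in for the proposed AdbcStatementRequestSchema: remember the request. */
int toy_request_schema(ToyStatement* stmt, const ToySchema* requested) {
  stmt->requested = *requested;
  stmt->has_request = 1;
  return 0;
}

/* Stand-in for AdbcStatementExecuteQuery: yields the result stream's schema.
 * This toy driver always honors the request. */
int toy_execute_query(const ToyStatement* stmt, ToySchema* result_schema) {
  *result_schema = stmt->has_request ? stmt->requested : stmt->inferred;
  return 0;
}

/* The workflow: ask for the schema, turn utf8 columns into utf8 views,
 * request the edited schema, then execute. */
int toy_run_with_string_views(ToyStatement* stmt, ToySchema* result_schema) {
  assert(stmt != NULL && result_schema != NULL);
  ToySchema schema;
  if (toy_execute_schema(stmt, &schema) != 0) return 1;
  for (int i = 0; i < schema.n_cols; i++) {
    if (strcmp(schema.formats[i], "u") == 0) schema.formats[i] = "vu";
  }
  if (toy_request_schema(stmt, &schema) != 0) return 1;
  return toy_execute_query(stmt, result_schema);
}
```

In a real driver the first step may be unavailable (AdbcStatementExecuteSchema is optional), in which case the caller would need to build the requested schema from prior knowledge instead.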

paleolimbot · 2025-11-11T18:30:01Z

I think the first reason for adding this would be for a user who executed and collected a query but got an error along the lines of "can't append string to column of type int64", which is what would happen today if someone issued a query with the SQLite driver and the first few thousand values were null.
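The failure mode in this first case can be illustrated with a toy model of sample-based type inference. This is a hypothetical sketch, not the SQLite driver's actual algorithm: the `ToyKind` enum, the `toy_` functions, and the int64 default for an all-null sample are assumptions chosen to match the error message quoted above.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_NULL, TOY_INT64, TOY_STRING } ToyKind;

/* Guess a column type by looking only at the first `sample` values.
 * Assumption: an all-null sample defaults to int64. */
ToyKind toy_infer(const ToyKind* values, size_t n, size_t sample) {
  assert(values != NULL || n == 0);
  size_t limit = n < sample ? n : sample;
  for (size_t i = 0; i < limit; i++) {
    if (values[i] != TOY_NULL) return values[i];
  }
  return TOY_INT64;
}

/* Returns the index of the first non-null value that cannot be appended to a
 * column of `column_type`, or -1 if every value fits. */
long toy_first_mismatch(const ToyKind* values, size_t n, ToyKind column_type) {
  for (size_t i = 0; i < n; i++) {
    if (values[i] != TOY_NULL && values[i] != column_type) return (long)i;
  }
  return -1;
}
```

For a column whose sampled prefix is all null and whose later values are strings, `toy_infer` picks int64 and `toy_first_mismatch` then reports the first string. Supplying the string type up front, which is the role a requested schema would play, avoids the mismatch.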

The second reason is for people who already know the arrow type they want, perhaps because they're using a tool like ibis that already knows or because they executed the query and the driver guessed something they didn't like or that was inconsistent with a previous query against the same data.

I should probably take out the negotiation bit since it seems to be distracting from the intent...it's orthogonal to options, which would only affect the default guessing (and would still offer no way to specify type parameters or to target specific problematic columns).

(These are all roughly the same reasons why a requested schema is provided for a CSV reader.)

zeroshade · 2025-11-11T21:35:55Z

Two questions:

1. Is the expectation that the driver itself would cast/convert the types as they come back from the source?
2. Since we're also working on and discussing handling multiple result sets, how would this work/handle the case of multiple result sets? Would we need an entirely new function once we add support for multiple result sets to handle this for that case? Is there any way we can adapt this function so that it can work in both cases somehow instead of needing to create an entirely new function afterwards?

lidavidm · 2025-11-12T00:31:10Z

Wouldn't you just call this in between NextResultSetSchema and NextResultSet? We'd just generalize NextResultSetSchema.

zeroshade · 2025-11-12T04:33:04Z

Mostly I'm just trying to cut down the number of extra functions to create. If the idea is that we wouldn't need to create a new function to handle multiple result sets, then I'm okay with that. That just leaves my first question.

paleolimbot · 2025-11-12T13:16:12Z

> Is the expectation that the driver itself would cast/convert the types as they come back from the source?

Yes, it's intended as guidance where a driver would otherwise have to guess. I wrote it up here as best-effort (same as PyCapsule requested_schema)...the other choice would be that the driver is required to produce that schema. I don't happen to think that's particularly useful (whatever schema is produced by the output stream is always the schema that should be used to interpret the data), but for the immediate use cases (SQLite and Postgres drivers) it would also not be a problem.

lidavidm · 2025-11-20T00:57:25Z

Well, maybe we want to cover cases like explicitly choosing between string/large_string/string_view (in principle the driver would have picked one, but maybe the user wants one of the other types and this can avoid a later conversion). Or is the thought that we need to add driver-specific flags for all of that? @CurtHagenlocher had commented before that with JDBC/ODBC, you get to pick the type you want when you get the data - that's obviously too much flexibility for an Arrow stream, but I don't see why we should limit this to only cases where it is ambiguous to the driver. (In the first place, the underlying data isn't Arrow much of the time; even if it is, ideally the request would be pushed back to the backend.)

The driver could certainly ignore requests to cast, of course, especially if it's actually just going to be a cast internally.

paleolimbot · 2025-11-20T02:26:18Z

I can add that bit back in too...whatever is less confusing! We can also experiment with schema negotiation afterward (I don't think it affects the behaviour of the function but I could be missing something). I think options are orthogonal...options affect the guessing of the schema in the absence of a request and we could have those, too.

request schema

91e8ccb

paleolimbot mentioned this pull request Oct 27, 2025

feat: async and multi-result set APIs WIP #3607

Open

add more language

9c8c7fb

lidavidm requested review from amoeba and zeroshade November 11, 2025 01:16

remove negotiation garbage

b405cba
