@serramatutu
Contributor

@serramatutu serramatutu commented Oct 21, 2025

Motivation

The Type metadata key has two limitations that stem from BigQuery's API:

  1. it says fields of type ARRAY<T> are just T with Repeated=true
  2. it says STRUCT<...> fields are simply RECORD, and erases any information about the inner fields.

These limitations can cause problems when trying to parse the Type key or when using it verbatim against the warehouse in a statement, e.g. a CREATE TABLE statement or an AS T cast.

Summary

This PR adds a new BIGQUERY:type metadata key that holds the field's full SQL type string, formatted as BigQuery specifies it.

Most types remain unchanged as they come from gobigquery, and in those cases this key will contain the same value as Type.

However, arrays and structs get transformed to match the richer type string.
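
As a rough illustration of the transformation described above, here is a minimal sketch. It is not the code from this PR: the `field` struct is a hypothetical, trimmed-down stand-in for the client library's field schema, `richType` is an invented name, and the `name TYPE, ...` layout inside `STRUCT<...>` is assumed from BigQuery's SQL type syntax rather than taken from the driver.

```go
package main

import "strings"

// field is a hypothetical stand-in for a BigQuery field schema.
// It keeps only what is needed to rebuild the SQL type string.
type field struct {
	Name     string
	Type     string   // raw API type, e.g. "INTEGER" or "RECORD"
	Repeated bool     // true when the column is really ARRAY<T>
	Fields   []*field // inner fields, only populated for RECORD
}

// richType rebuilds the nested type string that the raw Type value erases.
func richType(f *field) string {
	t := f.Type
	if f.Type == "RECORD" {
		// Recurse so that nested structs and arrays keep their inner types.
		parts := make([]string, 0, len(f.Fields))
		for _, inner := range f.Fields {
			parts = append(parts, inner.Name+" "+richType(inner))
		}
		t = "STRUCT<" + strings.Join(parts, ", ") + ">"
	}
	if f.Repeated {
		// The API reports arrays as the element type plus Repeated=true.
		t = "ARRAY<" + t + ">"
	}
	return t
}
```

Under these assumptions, a repeated INTEGER field yields ARRAY<INTEGER>, and a non-repeated, non-RECORD field comes back unchanged, matching the behavior described in the summary.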

Testing

I ran a CREATE TABLE AS query against BigQuery. Here are the results for fields of different types:

[1] Regular non-nested types are simply copied over from the value of Type


[2] An array of integers becomes ARRAY<INTEGER>, while Type remains INTEGER


[3] An array of structs becomes ARRAY<STRUCT<...>>


[4] In a struct of arrays, the inner types are ARRAY<...>


[5] A deeply nested struct also has the correct inner types


Related issues

@github-actions github-actions bot added this to the ADBC Libraries 21 milestone Oct 21, 2025
@serramatutu serramatutu changed the title from "feat(adbc/go/driver/bigquery): add BIGQUERY:type field metadata" to "feat(go/adbc/driver/bigquery): add BIGQUERY:type field metadata" Oct 21, 2025
@serramatutu
Contributor Author

Hmmm seeing some Python failures in CI, not sure how they're related?

metadata["Repeated"] = strconv.FormatBool(schema.Repeated)
metadata["Required"] = strconv.FormatBool(schema.Required)
field.Nullable = !schema.Required
metadata["Type"] = string(schema.Type)
Member

Do we want to keep this? Should we rename it to something like BIGQUERY:simple_type?

Contributor Author

I was wondering the same thing. I thought of keeping it as-is to avoid breaking changes to end users. But if the project maintainers are fine with a breaking change I can do it!

Member

@lidavidm lidavidm Oct 22, 2025

I would personally prefer we namespace all the properties now that we want to introduce this convention. Possibly we can keep the existing properties under their current name with a deprecation notice.

Contributor Author

@lidavidm I pushed a new commit that does that. I called it BIGQUERY:raw_type since it's the "raw" unprocessed thing coming directly from the API. I think this is a bit more descriptive than simple_type.
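
For readers following the thread, the "namespaced key plus deprecated alias" idea could look roughly like the sketch below. This is an assumption-laden illustration, not the commit that was pushed: the helper `setRawType` and the constant names are hypothetical, and only the key strings `Type` and `BIGQUERY:raw_type` come from this discussion.

```go
package main

const (
	// keyRawType is the namespaced key proposed in this thread.
	keyRawType = "BIGQUERY:raw_type"
	// keyTypeDeprecated is the existing, un-namespaced key, kept so that
	// current consumers do not break (hypothetical deprecation policy).
	keyTypeDeprecated = "Type"
)

// setRawType writes the raw API type under both the new namespaced key
// and the old key, so existing readers keep working during a deprecation
// window.
func setRawType(metadata map[string]string, rawType string) {
	metadata[keyRawType] = rawType
	metadata[keyTypeDeprecated] = rawType // Deprecated: prefer keyRawType.
}
```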

Contributor Author

@serramatutu serramatutu Oct 22, 2025

I am not sure if this should be split across two PRs though. IMO these should be two separate changelog entries: one for standardizing the keys under BIGQUERY:... and another for adding the new rich type key.

If that's the case I can merge the last commit first in a separate PR, then rebase this one on top of that.

Contributor

@lidavidm I think the convention in the BigQuery driver is that all the fields in the JSON response are copied as-is into the Arrow field metadata.

Member

That's not a law of physics, though. We can change things.

If you want to defer this to a separate PR, that's fine by me. But I think they should be consistent.

Contributor

@serramatutu can you move the last commit to a separate PR? So we can merge this one with just the BIGQUERY:type addition.

Contributor Author

I just removed the last commit from this branch. I have it on a separate branch and I can open a followup PR after this one to standardize all keys.

@serramatutu serramatutu force-pushed the serramatutu/upstream/bigquery-rich-type-string branch from c87220a to bb7440d Compare October 27, 2025 16:08
@felipecrv
Contributor

Merging because the failing checks are Meson+PG and CMake specific.

@felipecrv felipecrv merged commit 6a82d7b into apache:main Oct 30, 2025
77 of 90 checks passed
@felipecrv felipecrv deleted the serramatutu/upstream/bigquery-rich-type-string branch October 30, 2025 02:09