Commit 6a82d7b

feat(go/adbc/driver/bigquery): add BIGQUERY:type field metadata (#3604)
## Motivation

The `Type` metadata key has two limitations that stem from BigQuery's API:

1. It reports fields of type `ARRAY<T>` as just `T` with `Repeated=true`.
2. It reports `STRUCT<...>` fields as simply `RECORD`, erasing any information about the inner fields.

These limitations can cause problems when parsing the `Type` key or when using it verbatim against the warehouse in a statement, e.g. a `CREATE TABLE` statement or an `AS T` cast.

## Summary

This PR adds a new `BIGQUERY:type` key that formats the original SQL type string as [specified by BigQuery](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#declaring_a_struct_type). Most types remain unchanged as they come from `gobigquery`, and in those cases this key contains the same value as `Type`. Arrays and structs, however, are transformed into the richer type string.

## Testing

I ran a `CREATE TABLE AS` query against BigQuery. Here are the results for fields of different types:

[1] Regular non-nested types are simply copied over from the value of `Type`.

<details> <summary>1</summary> <img width="331" height="1071" alt="image" src="https://github.com/user-attachments/assets/ccd2ce17-37d8-4630-bef5-a503ed450c2a" /> </details>

[2] An array of integers becomes `ARRAY<INTEGER>`, while `Type` remains `INTEGER`.

<details> <summary>2</summary> <img width="319" height="369" alt="image" src="https://github.com/user-attachments/assets/e588d7ac-c7ca-40fb-ab51-9795e566d240" /> </details>

[3] An array of structs becomes `ARRAY<STRUCT<...>>`.

<details> <summary>3</summary> <img width="551" height="816" alt="image" src="https://github.com/user-attachments/assets/bb946ebc-747a-4529-88a8-68636f94e44e" /> </details>

[4] The inner types of a struct of arrays are `ARRAY<...>`.

<details> <summary>4</summary> <img width="610" height="922" alt="image" src="https://github.com/user-attachments/assets/932a3554-ea56-4b1f-8642-801ee91c4f63" /> </details>

[5] A deeply nested struct also has the correct inner types.

<details> <summary>5</summary> <img width="1327" height="1307" alt="image" src="https://github.com/user-attachments/assets/3185651b-8809-42b0-adc4-ec956eaf9e87" /> </details>

## Related issues

- #3449
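
The formatting rule described in the Summary and exercised in cases [1] to [5] can be sketched in isolation. The following is a minimal illustration, not the driver's code: `fieldSchema` is a hypothetical, simplified stand-in for `bigquery.FieldSchema`, and `richSQLType` is a hypothetical helper name. It assumes that records are wrapped as `STRUCT<...>` first and that `Repeated` then wraps the result in `ARRAY<...>`.

```go
package main

import (
	"fmt"
	"strings"
)

// fieldSchema is a hypothetical, simplified stand-in for bigquery.FieldSchema.
type fieldSchema struct {
	Name     string
	Type     string        // e.g. "INTEGER", "STRING", "RECORD"
	Repeated bool          // true when the field is an array
	Schema   []fieldSchema // nested fields, used only when Type == "RECORD"
}

// richSQLType returns the richer type string for a field: records become
// STRUCT<...> with their inner fields spelled out, and repeated fields are
// wrapped in ARRAY<...>. All other types are returned unchanged.
func richSQLType(f fieldSchema) string {
	t := f.Type
	if f.Type == "RECORD" {
		parts := make([]string, len(f.Schema))
		for i, nested := range f.Schema {
			// Each inner field is rendered as a backtick-quoted name and its type.
			parts[i] = fmt.Sprintf("`%s` %s", nested.Name, richSQLType(nested))
		}
		t = fmt.Sprintf("STRUCT<%s>", strings.Join(parts, ", "))
	}
	if f.Repeated {
		t = fmt.Sprintf("ARRAY<%s>", t)
	}
	return t
}

func main() {
	// Case [2]: an array of integers.
	fmt.Println(richSQLType(fieldSchema{Name: "ids", Type: "INTEGER", Repeated: true}))
	// prints: ARRAY<INTEGER>

	// Cases [3] and [4] combined: an array of structs that contains an array.
	nested := fieldSchema{
		Name:     "items",
		Type:     "RECORD",
		Repeated: true,
		Schema: []fieldSchema{
			{Name: "a", Type: "INTEGER"},
			{Name: "b", Type: "STRING", Repeated: true},
		},
	}
	fmt.Println(richSQLType(nested))
	// prints: ARRAY<STRUCT<`a` INTEGER, `b` ARRAY<STRING>>>
}
```

Because the helper recurses before wrapping, deeply nested structs (case [5]) fall out of the same two rules without special handling.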
1 parent f6f8e49 commit 6a82d7b

1 file changed: +13 -0 lines changed

go/adbc/driver/bigquery/connection.go

Lines changed: 13 additions & 0 deletions
@@ -795,6 +795,8 @@ func buildField(schema *bigquery.FieldSchema, level uint) (arrow.Field, error) {
 	field.Nullable = !schema.Required
 	metadata["Type"] = string(schema.Type)
 
+	richSqlType := string(schema.Type)
+
 	if schema.PolicyTags != nil {
 		policyTagList, err := json.Marshal(schema.PolicyTags)
 		if err != nil {
@@ -823,14 +825,21 @@ func buildField(schema *bigquery.FieldSchema, level uint) (arrow.Field, error) {
 	case bigquery.RecordFieldType:
 		// create an Arrow struct for BigQuery Record fields
 		nestedFields := make([]arrow.Field, len(schema.Schema))
+		nestedRichSqlTypes := make([]string, len(schema.Schema))
 		for i, nestedFieldSchema := range schema.Schema {
 			f, err := buildField(nestedFieldSchema, level+1)
 			if err != nil {
 				return arrow.Field{}, err
 			}
 			nestedFields[i] = f
+
+			fieldRichSqlType, found := f.Metadata.GetValue("BIGQUERY:type")
+			if found {
+				nestedRichSqlTypes[i] = fmt.Sprintf("`%s` %s", f.Name, fieldRichSqlType)
+			}
 		}
 		structType := arrow.StructOf(nestedFields...)
+		richSqlType = fmt.Sprintf("STRUCT<%s>", strings.Join(nestedRichSqlTypes, ", "))
 		if structType == nil {
 			return arrow.Field{}, adbc.Error{
 				Code: adbc.StatusInvalidArgument,
@@ -888,8 +897,12 @@ func buildField(schema *bigquery.FieldSchema, level uint) (arrow.Field, error) {
 	// if the field is repeated, then it's a list of the type we just built
 	if schema.Repeated {
 		field.Type = arrow.ListOf(field.Type)
+		richSqlType = fmt.Sprintf("ARRAY<%s>", richSqlType)
 	}
 
+	// derive the standard type string from the field
+	metadata["BIGQUERY:type"] = richSqlType
+
 	if level == 0 {
 		metadata["DefaultValueExpression"] = schema.DefaultValueExpression
 	}
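
On the consuming side, a caller might use the new key when generating DDL, which is the use case the commit message names. The sketch below is an assumption-laden illustration rather than part of the change: `column` and `createTableSQL` are hypothetical names, the metadata is modeled as a plain `map[string]string` instead of `arrow.Metadata`, and the table name `dataset.tbl` is made up. Whether every type name BigQuery returns is accepted verbatim in DDL is not verified here.

```go
package main

import (
	"fmt"
	"strings"
)

// column is a hypothetical pairing of a field name with its metadata.
// In the driver, this information lives on arrow.Field.Metadata instead.
type column struct {
	Name     string
	Metadata map[string]string
}

// createTableSQL builds a CREATE TABLE statement from the columns. It prefers
// the richer "BIGQUERY:type" key and falls back to "Type" when the key is
// absent, e.g. for metadata produced before this change.
func createTableSQL(table string, cols []column) string {
	defs := make([]string, 0, len(cols))
	for _, c := range cols {
		sqlType, ok := c.Metadata["BIGQUERY:type"]
		if !ok {
			sqlType = c.Metadata["Type"]
		}
		defs = append(defs, fmt.Sprintf("`%s` %s", c.Name, sqlType))
	}
	return fmt.Sprintf("CREATE TABLE `%s` (%s)", table, strings.Join(defs, ", "))
}

func main() {
	cols := []column{
		{Name: "id", Metadata: map[string]string{"Type": "INTEGER", "BIGQUERY:type": "INTEGER"}},
		{Name: "tags", Metadata: map[string]string{"Type": "STRING", "BIGQUERY:type": "ARRAY<STRING>"}},
	}
	fmt.Println(createTableSQL("dataset.tbl", cols))
}
```

With only the `Type` key, the `tags` column above would be declared as `STRING` and lose its array-ness, which is the problem the commit's Motivation section describes.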
