BigQueryWriteClient/JsonStreamWriter: missing values for duplicated, nested, proto-incompatible fields #2575

@pondzix

Environment details

  • OS type and version: macOS Ventura 13.6
  • Java version: 11
  • version(s): 3.7.0, 2.47.0

Steps to reproduce

  1. Create table in BigQuery:
CREATE TABLE test_dataset.test_table ( 
  parent1 STRUCT< `1nested` STRING >,
  parent2 STRUCT< `1nested` STRING >
)

This gives two parent STRUCT fields, each with a single field named 1nested. The leading 1 is important here: it makes the field name proto-incompatible, based on this class
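For reference, a minimal sketch of what "proto incompatible" means for a field name like 1nested. It assumes the usual protobuf identifier rule (a letter or underscore first, then letters, digits or underscores). `ProtoNameCheck` and `isProtoCompatible` are hypothetical names used for illustration; this is not the library's own implementation.

```scala
// Assumption: a proto-compatible name starts with a letter or underscore and
// continues with letters, digits or underscores. Simplified illustration only.
object ProtoNameCheck {
  private val ProtoCompatible = "[A-Za-z_][A-Za-z0-9_]*".r

  // True when the whole name matches the identifier pattern above.
  def isProtoCompatible(name: String): Boolean =
    ProtoCompatible.pattern.matcher(name).matches()
}
```

Under that assumption, `ProtoNameCheck.isProtoCompatible("parent1")` is true while `ProtoNameCheck.isProtoCompatible("1nested")` is false, because the name starts with a digit.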

  2. Write data to the table using JsonStreamWriter. Here is the gist with a sample scala-cli script

The problem: the input contains values for both parent1.1nested and parent2.1nested, but in BQ only the first one has the correct value. The second one is null, so it looks like the value is lost somewhere:

[Screenshot 2024-07-23 at 09 32 49]

Code example

From the gist ☝️:

//> using dep com.google.cloud:google-cloud-bigquerystorage:3.7.0
//> using dep com.google.cloud:google-cloud-bigquery:2.41.0

import com.google.cloud.bigquery.TableId
import com.google.cloud.bigquery.storage.v1.{BigQueryWriteClient, JsonStreamWriter}
import org.json.JSONArray
import scala.jdk.CollectionConverters._

/**
CREATE TABLE
  test_dataset.test_table ( 
    parent1 STRUCT< `1nested` STRING >,
    parent2 STRUCT< `1nested` STRING >
  )
*/ 

val input = Map(
  "parent1" -> Map("1nested" -> "value").asJava, // "value" in BQ
  "parent2" -> Map("1nested" -> "value").asJava, // null in BQ
)

write(input)

def write(input: Map[String, AnyRef]): Unit = {
  val client = BigQueryWriteClient.create()
  val streamId = TableId.of("...project..", "...dataset..", "..table..").getIAMResourceName + "/streams/_default"
  val writer = JsonStreamWriter
    .newBuilder(streamId, client)
    .build

  writer.append(new JSONArray(List(input.asJava).asJava)).get
  writer.close()
  client.close()
  ()
}

Any subsequent attempt to write data to parent1.1nested succeeds and the data is not null. Any subsequent attempt to write data to parent2.1nested always results in null in BQ.

In general, the first proto-incompatible field "wins", and any other proto-incompatible field with the same name that lives as a nested field somewhere else "loses".
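To spot rows that match this pattern before they are written, something like the following sketch could help. It walks a row and reports proto-incompatible field names that occur under more than one path. `DuplicateNestedNames` and `atRisk` are hypothetical helpers, the name rule is the same assumed identifier pattern (letter or underscore first), and the row is assumed to be built from plain Scala Maps, i.e. checked before the `.asJava` conversion.

```scala
object DuplicateNestedNames {
  // Assumption: simplified proto identifier rule, not the library's own check.
  private val ProtoCompatible = "[A-Za-z_][A-Za-z0-9_]*".r

  // Collect (fieldName, dottedPath) for every proto-incompatible key found
  // in nested Scala Maps.
  private def incompatible(value: Any, path: List[String]): List[(String, String)] =
    value match {
      case m: Map[_, _] =>
        m.toList.flatMap { case (k, v) =>
          val name = k.toString
          val here = path :+ name
          val self =
            if (ProtoCompatible.pattern.matcher(name).matches()) Nil
            else List(name -> here.mkString("."))
          self ++ incompatible(v, here)
        }
      case _ => Nil
    }

  // Proto-incompatible names that appear at more than one path in the row.
  def atRisk(row: Map[String, Any]): Map[String, List[String]] =
    incompatible(row, Nil)
      .groupBy(_._1)
      .map { case (name, hits) => name -> hits.map(_._2).sorted }
      .filter { case (_, paths) => paths.size > 1 }
}
```

For the input from this issue (as Scala Maps), `atRisk` reports 1nested with the paths parent1.1nested and parent2.1nested. A row with only parent1 gives an empty result.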

Labels: api: bigquerystorage, priority: p2, type: bug