
BigQuery: storage write API connector#4220

Open
squiidz wants to merge 4 commits into main from bq-write-api

squiidz (Contributor) commented Apr 7, 2026

No description provided.

New enterprise-only output that streams data into BigQuery using the
Storage Write API. Supports JSON and Protobuf message formats, the default
stream type, connection multiplexing via a managed stream cache, and
IAM/credential-based authentication.
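A rough picture of the default stream and the stream cache described above, as a stdlib-only Go sketch. It assumes the Storage Write API's default-stream resource name (`projects/.../datasets/.../tables/.../streams/_default`); `defaultStreamName`, `streamCache` and `stream` are hypothetical names, and the placeholder `stream` struct stands in for whatever managed stream handle the connector actually caches.

```go
package main

import (
	"fmt"
	"sync"
)

// defaultStreamName builds the resource name of a table's default stream,
// which the Storage Write API addresses with the "_default" suffix.
func defaultStreamName(project, dataset, table string) string {
	return fmt.Sprintf("projects/%s/datasets/%s/tables/%s/streams/_default", project, dataset, table)
}

// stream is a hypothetical placeholder for a managed stream handle.
type stream struct {
	name string
}

// streamCache hands out one stream per destination so batches for the
// same table reuse a single connection instead of opening a new one.
type streamCache struct {
	mu      sync.Mutex
	streams map[string]*stream
	opened  int // number of streams actually created
}

func newStreamCache() *streamCache {
	return &streamCache{streams: make(map[string]*stream)}
}

// get returns the cached stream for name, creating it on first use.
func (c *streamCache) get(name string) *stream {
	c.mu.Lock()
	defer c.mu.Unlock()
	if s, ok := c.streams[name]; ok {
		return s
	}
	s := &stream{name: name}
	c.streams[name] = s
	c.opened++
	return s
}
```

Keying the cache by stream name means every batch for the same table reuses one handle, which is one way to get the connection reuse the commit message describes.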
  Add basic operational metrics (rows sent/failed, batches, latency,
  retries), classify gRPC errors as transient or permanent for smarter
  retry behavior, and support service account impersonation via
  target_principal and delegates config fields.
Comment on lines +174 to +182
const (
	// streamIdleTimeout is how long a cached stream can remain unused before
	// being eligible for eviction by the idle sweep.
	streamIdleTimeout = 5 * time.Minute

	// streamSweepInterval is how often the background goroutine checks for
	// idle streams.
	streamSweepInterval = 1 * time.Minute
)

Hardcoded time durations must be YAML-configurable. Per project Go patterns: "Every time-related value (timeouts, backoffs, intervals, retry delays) must be exposed as a YAML-configurable field. Do not hardcode durations."

streamIdleTimeout and streamSweepInterval should be added as config fields (e.g., stream_idle_timeout and stream_sweep_interval) with these values as defaults, following the same pattern as other duration fields in the codebase.

  Add basic operational metrics (rows sent/failed, batches, latency,
  retries). Classify gRPC errors as transient or permanent for smarter
  retry behavior. Support service account impersonation via
  target_principal and delegates config fields. Make stream idle timeout
  and sweep interval YAML-configurable. Regenerate docs.
claude bot commented Apr 9, 2026

Commits

  1. Commit 2c077b5 has a leading space in its headline: " bq: add metrics, error classification, and SA impersonation" should be "bq: add metrics, error classification, and SA impersonation" (no leading space).
  2. Commits 2c077b5 and 8e3ff83 have near-identical headlines and describe overlapping work (metrics, error classification, SA impersonation). Commit 8e3ff83 extends 2c077b5 with additional changes (configurable stream idle timeout/sweep interval, docs regeneration). These should be squashed into a single commit since they are not distinct self-contained logical changes.

Review
New enterprise BigQuery Storage Write API output with solid implementation: proper component registration, field name constants, config spec, ParsedConfig extraction with named returns, error wrapping with gerund form, context propagation, mutex-protected concurrency, idempotent Close with ordered lock acquisition, and stream cache with idle sweep. Unit and integration tests cover config parsing, JSON-to-proto conversion, gRPC error classification, edge cases, and end-to-end flow with a BigQuery emulator container.

LGTM
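The "idempotent Close with ordered lock acquisition" noted in the review can be illustrated like this. The two mutexes, the field names and the `closes` counter are hypothetical; the point is that every path takes `connMu` before `cacheMu`, and that a second `Close` (or a `Close` on an output that never connected) is a no-op.

```go
package main

import "sync"

// output is a hypothetical stand-in with two locks: connMu guards the
// client and cacheMu guards the stream cache. Any path needing both takes
// connMu first, then cacheMu, so opposite-order deadlocks cannot occur.
type output struct {
	connMu  sync.Mutex
	cacheMu sync.Mutex
	client  *struct{}
	streams map[string]struct{}
	closes  int // how many times resources were actually released
}

// Close releases resources once; later calls find nil fields and return.
func (o *output) Close() error {
	o.connMu.Lock()
	defer o.connMu.Unlock()
	o.cacheMu.Lock()
	defer o.cacheMu.Unlock()

	if o.client == nil && o.streams == nil {
		return nil // already closed, or never connected
	}
	o.streams = nil
	o.client = nil
	o.closes++
	return nil
}
```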

@redpanda-data redpanda-data deleted 6 comments from claude bot Apr 9, 2026
mmatczuk (Contributor) commented Apr 9, 2026

The top 2 commits can be squashed, I think.

mmatczuk (Contributor) commented Apr 9, 2026

mmatczuk (Contributor) commented Apr 9, 2026

It would be nice to rewrite the test comments as given/when/then logs.

mmatczuk (Contributor) commented Apr 9, 2026

Can we have a constructor for tests instead of

&bigQueryWriteAPIOutput{
	conf: bigQueryWriteAPIConfig{
		ProjectID: "my-project",
		DatasetID: "my_dataset",
	},
}
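One possible shape for such a constructor, using functional options so each test only states what differs from the defaults. `newTestOutput` is the helper name used in the follow-up commit, but its signature, `testOption` and `withTable` are assumptions, and the structs are cut down to the fields in the snippet plus a hypothetical `TableID`.

```go
package main

// Minimal stand-ins for the connector's types.
type bigQueryWriteAPIConfig struct {
	ProjectID string
	DatasetID string
	TableID   string // hypothetical extra field
}

type bigQueryWriteAPIOutput struct {
	conf bigQueryWriteAPIConfig
}

// testOption customises the config built by newTestOutput.
type testOption func(*bigQueryWriteAPIConfig)

func withTable(table string) testOption {
	return func(c *bigQueryWriteAPIConfig) { c.TableID = table }
}

// newTestOutput returns an output with test defaults that callers can
// override through options.
func newTestOutput(opts ...testOption) *bigQueryWriteAPIOutput {
	conf := bigQueryWriteAPIConfig{
		ProjectID: "my-project",
		DatasetID: "my_dataset",
	}
	for _, opt := range opts {
		opt(&conf)
	}
	return &bigQueryWriteAPIOutput{conf: conf}
}
```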

Comment on lines +61 to +63
if err = license.CheckRunningEnterprise(mgr); err != nil {
	return
}

Check the license in the constructor, not here.

Comment on lines +64 to +69
if maxInFlight, err = conf.FieldMaxInFlight(); err != nil {
	return
}
if batchPolicy, err = conf.FieldBatchPolicy(bqwaFieldBatching); err != nil {
	return
}

Try preserving order where it makes sense


When batching is enabled the table name is resolved from the first message in
each batch; all messages in the same batch are written to that table.
`).

nit: for maintainability, it's best to use one sentence per line.
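The documented batching rule (the table is resolved once, from the first message, and the whole batch is written there) amounts to the following sketch. `message`, `writeBatch` and the `resolve`/`write` callbacks are hypothetical.

```go
package main

// message is a hypothetical message carrying a body and the table its
// interpolated table name would resolve to.
type message struct {
	body  string
	table string
}

// writeBatch resolves the table from the first message only and writes
// every row in the batch to that table.
func writeBatch(batch []message, resolve func(message) string, write func(table string, rows []string) error) error {
	if len(batch) == 0 {
		return nil // nothing to write
	}
	table := resolve(batch[0]) // resolved once, from the first message
	rows := make([]string, 0, len(batch))
	for _, m := range batch {
		rows = append(rows, m.body)
	}
	return write(table, rows)
}
```

A consequence is that a message resolving to a different table still lands in the first message's table, so mixed-table batches have to be avoided upstream.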

mmatczuk (Contributor) commented Apr 9, 2026

It would be handy to support credentials_json.
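If `credentials_json` were added, choosing a credential source might follow a precedence like the one below. The `credentials_file` field, the mutual-exclusion rule and the fallback to Application Default Credentials are assumptions for illustration; the only validation shown is that the value is JSON with the `type` field that Google credential files carry.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type credentialSource int

const (
	sourceDefault credentialSource = iota // Application Default Credentials
	sourceFile                            // hypothetical credentials_file
	sourceJSON                            // credentials_json
)

// pickCredentials decides which credential source to use.
func pickCredentials(credentialsJSON, credentialsFile string) (credentialSource, error) {
	switch {
	case credentialsJSON != "" && credentialsFile != "":
		return 0, errors.New("credentials_json and credentials_file are mutually exclusive")
	case credentialsJSON != "":
		var key struct {
			Type string `json:"type"`
		}
		if err := json.Unmarshal([]byte(credentialsJSON), &key); err != nil {
			return 0, fmt.Errorf("parsing credentials_json: %w", err)
		}
		if key.Type == "" {
			return 0, errors.New(`parsing credentials_json: missing "type" field`)
		}
		return sourceJSON, nil
	case credentialsFile != "":
		return sourceFile, nil
	default:
		return sourceDefault, nil
	}
}
```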

  Move license check from init() to constructor.
  Reorder config fields logically (core fields first, advanced last).
  Use sentence-per-line in config descriptions for easier diffs.
  Add newTestOutput helper for tests.
  Rewrite test comments as given/when/then.
  Inject test license via license.InjectTestService.
  Regenerate docs.

// When we write a batch.
batch := service.MessageBatch{service.NewMessage([]byte(`{"foo":"bar"}`))}
err := out.WriteBatch(context.Background(), batch)

Test pattern violation: use t.Context() instead of context.Background() in test functions.

Per the project test patterns:

Use t.Context() for test contexts. Exception: in t.Cleanup() functions, use context.Background() because t.Context() is already canceled during cleanup.

These are non-cleanup test functions, so t.Context() should be used. This also applies to TestWriteBatchEmptyBatch (line 187) and TestCloseNilClients (line 203).

claude bot commented Apr 9, 2026

Commits

  1. Commit 2c077b5 (" bq: add metrics, error classification, and SA impersonation") has a leading space before "bq:", which violates the commit headline format.

Review
Well-structured new enterprise BigQuery Storage Write API output with good test coverage, proper license headers, correct bundle registration, and idiomatic error handling. One minor test pattern violation found.

  1. Unit tests TestWriteBatchNotConnected, TestWriteBatchEmptyBatch, and TestCloseNilClients use context.Background() instead of t.Context() in non-cleanup test code — project test pattern violation. (comment)
