doris: add stream load output#4218

Open

xylaaaaa wants to merge 1 commit into redpanda-data:main from xylaaaaa:chenjunwei/doris-stream-load-sink

@xylaaaaa xylaaaaa commented Apr 7, 2026

Summary

  • add a new doris_stream_load output
  • wire the component into community/default builds
  • support JSON and CSV payloads, FE -> BE 307 redirect handling, Expect: 100-continue, query-port based connection tests, and FE failover via fe_urls
  • promote common Doris Stream Load options into first-class config fields
  • add unit coverage for encoding, redirect/failover behavior, promoted header mapping, and connection tests

Validation

  • go test ./internal/impl/doris
  • go build -o target/redpanda-connect ./cmd/redpanda-connect

Notes

  • local demo configs and local docs were intentionally left out of this PR

Copilot AI review requested due to automatic review settings April 7, 2026 11:49
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a new Apache Doris Stream Load output (doris_stream_load) and wires it into the community component bundle so it’s available in standard builds.

Changes:

  • Adds a new doris_stream_load batch output implementation with FE→BE redirect handling, JSON/CSV encoding, and FE failover support.
  • Registers the Doris component in public/components and the plugin catalog.
  • Adds unit tests covering encoding, redirects/failover, header mapping, and connection tests.
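
To make the JSON/CSV encoding point concrete, here is a minimal sketch of how a batch of rows could be turned into either payload shape. It is an assumption about the general approach, not the encoder in this PR: `encodeJSON`, `encodeCSV`, and the choice to render missing or nil values as empty CSV fields are all hypothetical.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/json"
	"fmt"
)

// encodeJSON renders the whole batch as a single JSON array of objects.
func encodeJSON(rows []map[string]any) ([]byte, error) {
	return json.Marshal(rows)
}

// encodeCSV renders one CSV line per row, taking values in the given column
// order. Missing or nil values become empty fields (an assumption made here).
func encodeCSV(columns []string, rows []map[string]any) ([]byte, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	for _, row := range rows {
		rec := make([]string, len(columns))
		for i, col := range columns {
			if v, ok := row[col]; ok && v != nil {
				rec[i] = fmt.Sprint(v)
			}
		}
		if err := w.Write(rec); err != nil {
			return nil, err
		}
	}
	w.Flush()
	return buf.Bytes(), w.Error()
}

func main() {
	rows := []map[string]any{{"id": 1, "name": "a,b"}, {"id": 2}}

	j, _ := encodeJSON(rows)
	fmt.Println(string(j))

	c, _ := encodeCSV([]string{"id", "name"}, rows)
	fmt.Print(string(c))
}
```

A JSON array payload like this is normally paired with the Stream Load `strip_outer_array` header, and a CSV payload with `column_separator`; whether the output sets them automatically is not stated in this conversation.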

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| public/components/doris/package.go | Adds the public component package that side-effect imports the internal Doris implementation. |
| public/components/community/package.go | Wires the new Doris public component into the community build import set. |
| internal/plugins/info.csv | Registers doris_stream_load in the plugin inventory. |
| internal/impl/doris/output_stream_load.go | Implements the Doris Stream Load output, config parsing/spec, request building, redirect/failover logic, and connection tests. |
| internal/impl/doris/output_stream_load_test.go | Adds unit tests for encoding, redirects, failover, connection tests, and promoted header behavior. |

	BackoffMaxRetries: 3,
	Retry: httpclient.DefaultRetryConfig(),
	MetricPrefix: "doris_stream_load_http",
}
Copilot AI Apr 7, 2026

The HTTP client uses httpclient.DefaultTransportConfig(), which sets ExpectContinueTimeout=1s. Since requests always set Expect: 100-continue when a body is present, servers that don’t send a 100 response (common) will cause up to ~1s of extra latency before the body is transmitted. Consider explicitly setting cfg.Transport.ExpectContinueTimeout to 0 (send immediately) or a much smaller value tuned for Doris, while still keeping the Expect header required by Doris.

Suggested change
}
}
cfg.Transport.ExpectContinueTimeout = 0

Comment on lines +583 to +596
	resp, err := d.client.Do(req)
	if err != nil {
		lastErr = fmt.Errorf("connecting to Doris FE %s: %w", feURL, err)
		continue
	}
	resp.Body.Close()
	if d.conf.QueryPort > 0 {
		if err := d.connectionCheck(ctx, feURL, d.conf.QueryPort); err != nil {
			lastErr = err
			continue
		}
	}
	return service.ConnectionTestSucceeded().AsList()
}
Copilot AI Apr 7, 2026

ConnectionTest currently treats any HTTP response as success (it doesn’t check resp.StatusCode). This can incorrectly report success for cases like 401/403 (bad credentials) or 404, even though the output won’t actually work. Consider requiring a 2xx (or at least <400) status code and returning ConnectionTestFailed when the FE responds with an error status (optionally including the status/body in the error).

