Description
Summary
When building complex connectors with multi-level substream hierarchies, it's useful to write stream definitions that are used only internally, as parent streams for other streams, without exposing them as top-level streams. This pattern is currently undocumented but is actively used in production connectors.
Problem
The current YAML Reference documentation explains that only entries in the top-level streams: array are exposed as runnable streams, but it doesn't explicitly document the pattern of:
- Defining a full stream definition in definitions that is NOT listed in streams:
- Using that definition solely as a parent_stream_config for another stream
- The naming convention some connectors use (e.g., a __ prefix) to signal "internal helper"
This pattern is particularly useful for 3-level nested substream hierarchies where an intermediate stream is needed to provide partition keys but shouldn't be exposed to users.
Example Implementation: Jira Connector
The Jira connector uses this pattern extensively. Here are code permalinks:
Internal/Private Stream Definitions (in definitions, NOT in streams:)
- __issue_property_keys_substream - Used as parent for issue_properties_stream
- __custom_issue_fields_substream - Used as parent for issue_custom_field_contexts
- __issue_custom_field_contexts_substream - Used as parent for issue_custom_field_options
- __boards_substream - Used as parent for board-related streams
- __story_points_issue_fields_substream - Used for story points configuration
How It's Used (3-level hierarchy example)
The issue_properties_stream references the internal __issue_property_keys_substream as its parent:
    issue_properties_stream:
      # ...
      retriever:
        # ...
        partition_router:
          type: SubstreamPartitionRouter
          parent_stream_configs:
            - type: ParentStreamConfig
              stream: "#/definitions/__issue_property_keys_substream"  # <-- Internal stream reference

This creates a 3-level hierarchy:
- issues_stream (grandparent - exposed)
- __issue_property_keys_substream (parent - internal, NOT exposed)
- issue_properties_stream (child - exposed)
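To make the hierarchy concrete, here is a minimal Python sketch that walks the parent references from a child definition up to its root. It assumes the manifest has already been loaded into a plain dict, that references are still unresolved "#/definitions/..." strings, that partition_router is a single mapping, and that each stream has at most one parent. The trimmed MANIFEST and the helper names parent_of and lineage are hypothetical, not part of the Airbyte CDK.

```python
from typing import Any, Dict, List, Optional

REF_PREFIX = "#/definitions/"


def _router(parent_ref: str) -> Dict[str, Any]:
    """Build a trimmed retriever block that points at one parent definition."""
    return {
        "partition_router": {
            "type": "SubstreamPartitionRouter",
            "parent_stream_configs": [
                {"type": "ParentStreamConfig", "stream": parent_ref}
            ],
        }
    }


# Hypothetical, heavily trimmed manifest; real definitions carry many more fields.
MANIFEST: Dict[str, Any] = {
    "definitions": {
        "issues_stream": {"type": "DeclarativeStream", "retriever": {}},
        "__issue_property_keys_substream": {
            "type": "DeclarativeStream",
            "retriever": _router("#/definitions/issues_stream"),
        },
        "issue_properties_stream": {
            "type": "DeclarativeStream",
            "retriever": _router("#/definitions/__issue_property_keys_substream"),
        },
    },
}


def parent_of(definition: Dict[str, Any]) -> Optional[str]:
    """Return the definition key of the first parent stream, or None."""
    router = definition.get("retriever", {}).get("partition_router", {})
    configs = router.get("parent_stream_configs", [])
    if not configs:
        return None
    ref = configs[0].get("stream")
    if isinstance(ref, dict):  # tolerate the {"$ref": "..."} spelling too
        ref = ref.get("$ref")
    if isinstance(ref, str) and ref.startswith(REF_PREFIX):
        return ref[len(REF_PREFIX):]
    return None


def lineage(manifest: Dict[str, Any], name: str) -> List[str]:
    """List definition keys from the root ancestor down to `name`."""
    definitions = manifest.get("definitions", {})
    chain: List[str] = []
    seen = set()
    current: Optional[str] = name
    while current is not None and current not in seen:  # guard against cycles
        seen.add(current)
        chain.append(current)
        current = parent_of(definitions.get(current, {}))
    return list(reversed(chain))


print(" -> ".join(lineage(MANIFEST, "issue_properties_stream")))
```

Walking the references this way shows which intermediate definitions a child depends on, regardless of whether those intermediates are exposed.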
Top-level streams: Section
The streams section only lists the streams that should be exposed to users - the __-prefixed definitions are intentionally omitted.
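One way to see which definitions are internal is to compare the keys under definitions with the references under streams:. The sketch below is illustrative only: it assumes the manifest is already loaded into a dict, that stream definitions are marked with type: DeclarativeStream, and that entries under streams: are either "#/definitions/..." strings or {"$ref": ...} mappings. The trimmed MANIFEST, the name values in it, and the helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

REF_PREFIX = "#/definitions/"

# Hypothetical, heavily trimmed manifest mirroring the YAML layout.
MANIFEST: Dict[str, Any] = {
    "definitions": {
        "issues_stream": {"type": "DeclarativeStream", "name": "issues"},
        "__issue_property_keys_substream": {
            "type": "DeclarativeStream",
            "name": "issue_property_keys",
        },
        "issue_properties_stream": {
            "type": "DeclarativeStream",
            "name": "issue_properties",
        },
    },
    # Only these two definitions are exposed as runnable streams.
    "streams": [
        {"$ref": "#/definitions/issues_stream"},
        {"$ref": "#/definitions/issue_properties_stream"},
    ],
}


def ref_name(entry: Any) -> Optional[str]:
    """Extract <name> from a '#/definitions/<name>' reference, if present."""
    ref = entry.get("$ref") if isinstance(entry, dict) else entry
    if isinstance(ref, str) and ref.startswith(REF_PREFIX):
        return ref[len(REF_PREFIX):]
    return None


def split_definitions(manifest: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (exposed, internal) stream definition keys, each sorted."""
    referenced = {ref_name(entry) for entry in manifest.get("streams", [])}
    stream_defs = {
        key
        for key, body in manifest.get("definitions", {}).items()
        if isinstance(body, dict) and body.get("type") == "DeclarativeStream"
    }
    return sorted(stream_defs & referenced), sorted(stream_defs - referenced)


exposed, internal = split_definitions(MANIFEST)
print("exposed:", exposed)
print("internal:", internal)
```

The check keys off membership in streams: rather than the __ prefix, since the prefix is only a naming convention.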
Suggested Documentation
Add a section to the YAML Reference or a new "Advanced Patterns" page that documents:
- Pattern: Using stream definitions as internal parent streams
- Use case: Multi-level substream hierarchies where intermediate streams shouldn't be exposed
- Naming convention: The __ prefix convention (optional but recommended for clarity)
- Behavior: Streams not listed in streams: will not be exposed by source.streams(config) - attempting to sync them will silently no-op
- Testing implications: When writing mock server tests, always verify stream names against the streams: section to avoid testing non-existent streams
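For the testing point, a small guard can turn the silent no-op into a loud failure before any mock server setup runs. The following is a sketch under stated assumptions, not the CDK's own API: the manifest is assumed to be a dict with unresolved "#/definitions/..." references, each stream definition is assumed to carry a name field, and the helpers exposed_stream_names and require_exposed, as well as the trimmed MANIFEST, are hypothetical.

```python
from typing import Any, Dict, List

REF_PREFIX = "#/definitions/"

# Hypothetical, heavily trimmed manifest; the name values are illustrative.
MANIFEST: Dict[str, Any] = {
    "definitions": {
        "issues_stream": {"type": "DeclarativeStream", "name": "issues"},
        "__issue_property_keys_substream": {
            "type": "DeclarativeStream",
            "name": "issue_property_keys",
        },
        "issue_properties_stream": {
            "type": "DeclarativeStream",
            "name": "issue_properties",
        },
    },
    "streams": [
        {"$ref": "#/definitions/issues_stream"},
        {"$ref": "#/definitions/issue_properties_stream"},
    ],
}


def exposed_stream_names(manifest: Dict[str, Any]) -> List[str]:
    """Collect the name of every stream listed under streams:, in order."""
    definitions = manifest.get("definitions", {})
    names: List[str] = []
    for entry in manifest.get("streams", []):
        ref = entry.get("$ref") if isinstance(entry, dict) else entry
        if isinstance(ref, str) and ref.startswith(REF_PREFIX):
            # Follow the reference to the full definition.
            entry = definitions.get(ref[len(REF_PREFIX):], {})
        if isinstance(entry, dict) and "name" in entry:
            names.append(entry["name"])
    return names


def require_exposed(manifest: Dict[str, Any], stream_name: str) -> None:
    """Raise ValueError if stream_name is not an exposed top-level stream."""
    names = exposed_stream_names(manifest)
    if stream_name not in names:
        raise ValueError(
            f"{stream_name!r} is not listed under streams:; "
            f"exposed streams are {sorted(names)}"
        )


require_exposed(MANIFEST, "issue_properties")  # passes silently
try:
    require_exposed(MANIFEST, "issue_property_keys")
except ValueError as err:
    print(err)
```

Calling a guard like this at the top of a test module would have flagged issue_property_keys as internal-only before any test was written against it.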
Context
This issue was discovered while creating comprehensive mock server tests for the Jira connector (airbytehq/airbyte#70884). The pattern caused confusion when attempting to test issue_property_keys as a stream, only to discover it's an internal-only definition.
Requested by: AJ Steers (@aaronsteers)
Related PR: airbytehq/airbyte#70884
Devin session: https://app.devin.ai/sessions/f152f435f9d146688e476611ff864c30