docs: Document pattern for using non-exposed stream definitions as parent streams #866

@devin-ai-integration

Summary

When building complex connectors with multi-level substream hierarchies, it's useful to write stream definitions that serve only as parent streams for other streams and are never exposed as top-level streams. This pattern is currently undocumented but is actively used in production connectors.

Problem

The current YAML Reference documentation explains that only entries in the top-level streams: array are exposed as runnable streams, but it doesn't explicitly document the pattern of:

  1. Defining a full stream definition in definitions that is NOT listed in streams:
  2. Using that definition solely as the parent stream in another stream's parent_stream_configs
  3. Signaling "internal helper" through a naming convention (e.g., a __ prefix), as some connectors do

This pattern is particularly useful for 3-level nested substream hierarchies where an intermediate stream is needed to provide partition keys but shouldn't be exposed to users.

Example Implementation: Jira Connector

The Jira connector uses this pattern extensively. Here are code permalinks:

Internal/Private Stream Definitions (in definitions, NOT in streams:)

How It's Used (3-level hierarchy example)

The issue_properties_stream references the internal __issue_property_keys_substream as its parent:

issue_properties_stream:
  # ...
  retriever:
    # ...
    partition_router:
      type: SubstreamPartitionRouter
      parent_stream_configs:
        - type: ParentStreamConfig
          stream: "#/definitions/__issue_property_keys_substream"  # <-- Internal stream reference

This creates a 3-level hierarchy:

  1. issues_stream (grandparent - exposed)
  2. __issue_property_keys_substream (parent - internal, NOT exposed)
  3. issue_properties_stream (child - exposed)
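
For the hierarchy to have three levels, the internal definition itself has to be a substream of issues_stream. A minimal sketch of what that middle level might look like follows; the name, parent_key, and partition_field values are assumptions for illustration and are not copied from the Jira manifest:

```yaml
definitions:
  # Hypothetical sketch of the intermediate (internal) stream.
  # It is a complete stream definition, but it is never listed under `streams:`.
  __issue_property_keys_substream:
    type: DeclarativeStream
    name: issue_property_keys        # assumed name
    # ...
    retriever:
      type: SimpleRetriever
      # ...
      partition_router:
        type: SubstreamPartitionRouter
        parent_stream_configs:
          - type: ParentStreamConfig
            stream: "#/definitions/issues_stream"  # grandparent, exposed
            parent_key: key                        # assumed field on issue records
            partition_field: issue_key             # assumed partition name
```

Each issue record would then yield one partition for the internal stream, and each of its records would in turn yield one partition for issue_properties_stream.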

Top-level streams: Section

The streams section only lists the streams that should be exposed to users - the __-prefixed definitions are intentionally omitted.
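
As a rough illustration, the exposed list might be shaped like the sketch below. It is reduced to the two exposed streams named in this issue; the real manifest lists many more, and the exact entry syntax there may differ:

```yaml
streams:
  - "#/definitions/issues_stream"
  - "#/definitions/issue_properties_stream"
  # "#/definitions/__issue_property_keys_substream" is deliberately absent:
  # it is only reachable through parent_stream_configs.
```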

Suggested Documentation

Add a section to the YAML Reference or a new "Advanced Patterns" page that documents:

  1. Pattern: Using stream definitions as internal parent streams
  2. Use case: Multi-level substream hierarchies where intermediate streams shouldn't be exposed
  3. Naming convention: The __ prefix convention (optional but recommended for clarity)
  4. Behavior: Streams not listed in streams: will not be exposed by source.streams(config) - attempting to sync them will silently no-op
  5. Testing implications: When writing mock server tests, always verify stream names against the streams: section to avoid testing non-existent streams

Context

This issue was discovered while creating comprehensive mock server tests for the Jira connector (airbytehq/airbyte#70884). The pattern caused confusion when attempting to test issue_property_keys as a stream, only to discover it's an internal-only definition.


Requested by: AJ Steers (@aaronsteers)
Related PR: airbytehq/airbyte#70884
Devin session: https://app.devin.ai/sessions/f152f435f9d146688e476611ff864c30
