Implement BEP-009 streaming primitives#3215

Open
antoniosarosi wants to merge 12 commits into canary from bep009-streaming-implementation
Conversation

@antoniosarosi antoniosarosi commented Mar 6, 2026

Summary

  • Implements BEP-009 phases 1-8: decomposes LLM streaming into three composable primitives (SSE connection, batched event retrieval, provider-aware accumulator) wired together by Baml-level orchestration sharing retry/fallback/round-robin logic with non-streaming calls
  • Adds incremental W3C SSE parser (sse_parser.rs), provider-aware delta extraction for OpenAI and Anthropic (stream_accumulator.rs), and new resource types (SseStream, StreamAccumulator)
  • New Baml orchestration functions: stream_primitive, execute_client_stream, stream_llm_function
  • Python bridge: new CallContext pyclass bundles tracing, collectors, cancellation, and streaming callbacks
  • Refactors llm.baml to use null guard + early return instead of match on nullable retry field, leveraging type narrowing

New sys ops

| Sys Op | Purpose |
| --- | --- |
| fetch_sse | Opens SSE connection, spawns background consumer |
| SseStream.next/close | Batched event retrieval / cleanup |
| PrimitiveClient.new_stream_accumulator | Creates provider-aware accumulator |
| StreamAccumulator.add_events/content/is_done | Accumulator operations |
| PrimitiveClient.build_request_stream | Adds "stream": true to request body |
| PrimitiveClient.partial_parse | String-only partial parsing (SAP deferred) |
| emit_partial/emit_tick | Stream callback dispatch with deduplication |

Bug found: continue not supported as catch arm expression

While refactoring llm.baml to eliminate the ExecutionResult { ok, value } wrapper using throw/catch, we discovered that continue (and likely break) cannot be used inside catch arms:

// This fails with: [parse] Error: Expected expression, found continue
let value = some_call() catch (e) {
    _ => continue
};

The intended pattern was to catch errors in retry/fallback loops and skip to the next iteration. Since catch arms require an expression and continue is parsed as a statement (not an expression), this pattern is rejected at parse time.

Workaround: ExecutionResult { ok: bool, value: unknown } is kept for internal retry/fallback orchestration where failures are expected and handled. The top-level functions (call_llm_function, stream_llm_function) use throw at the boundary.

Suggested fix: Treat continue, break, return, and throw uniformly as diverging expressions valid in any expression position (they already work this way for type narrowing purposes).
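For comparison, Rust already treats `continue` as a diverging expression (its type is the never type `!`), which is the behavior the suggested fix asks for. The sketch below is only an illustration in Rust, not Baml; `attempt` is a hypothetical stand-in for `some_call()`.

```rust
/// Hypothetical stand-in for `some_call()`: fails for odd client indices.
fn attempt(client: usize) -> Result<String, String> {
    if client % 2 == 1 {
        Err(format!("client {client} failed"))
    } else {
        Ok(format!("response from client {client}"))
    }
}

/// Tries each client in order and returns the first success.
/// `continue` sits in expression position inside the `match` arm; because it
/// diverges, it unifies with the `String` produced by the other arm.
fn first_success(clients: &[usize]) -> Option<String> {
    for &client in clients {
        let value = match attempt(client) {
            Ok(v) => v,
            Err(_) => continue, // skip to the next client
        };
        return Some(value);
    }
    None
}
```

With this shape the retry loop needs no `{ ok, value }` wrapper: the error arm simply moves on to the next iteration.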

CodeRabbit review fixes

  • SSE error sets buf.done = true — prevents caller hang when retrying after stream error (background task exited without marking stream done)
  • stream_primitive checks accumulator.is_done() — breaks out of loop early when provider signals completion ([DONE]/finish_reason), and returns ok: false if stream was truncated
  • new_accumulator rejects unsupported providers — returns error for google-ai, aws-bedrock, etc. instead of silently ignoring their events in extract_delta
  • Poisoned mutex recovery: emit_partial deduplication uses unwrap_or_else(PoisonError::into_inner) instead of panicking
  • Redundant event_type.clear() — removed after mem::take which already leaves empty string
  • futures/serde_json optional — gated behind bundle-http feature to reduce compile time without SSE
  • PerCallContext struct — replaces 4 loose parameters (call_id, cancel, stream_callback, tick_callback) in run_event_loop_with_epoch and execute_sys_op, removing the #[allow(clippy::too_many_arguments)]
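The poisoned-mutex item can be illustrated with a minimal sketch. It assumes the dedup state is a `Mutex<Option<String>>` holding the last emitted partial; the `PartialDedup` type and the `poison` helper are hypothetical and exist only for this example.

```rust
use std::sync::{Arc, Mutex, PoisonError};

/// Hypothetical dedup state: the last partial handed to the user callback.
#[derive(Default)]
struct PartialDedup {
    last_emitted: Mutex<Option<String>>,
}

impl PartialDedup {
    /// Returns true if `partial` differs from the previous one and should be emitted.
    fn should_emit(&self, partial: &str) -> bool {
        // A poisoned lock still guards valid data here, so recover the guard
        // instead of panicking with `.unwrap()`.
        let mut last = self
            .last_emitted
            .lock()
            .unwrap_or_else(PoisonError::into_inner);
        if last.as_deref() == Some(partial) {
            return false;
        }
        *last = Some(partial.to_owned());
        true
    }
}

/// Poisons the mutex by panicking in another thread while holding the lock.
fn poison(dedup: &Arc<PartialDedup>) {
    let d = Arc::clone(dedup);
    let _ = std::thread::spawn(move || {
        let _guard = d.last_emitted.lock().unwrap();
        panic!("simulated panic while holding the lock");
    })
    .join();
}
```

After `poison` runs, `should_emit` keeps working and still suppresses duplicates, whereas a plain `.lock().unwrap()` would panic on every later call.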

Follow-up compiler fixes

  • Builtin method throw analysis for path-style method calls — multi-segment builtin method callees now persist resolved targets into TIR, so catch exhaustiveness sees declared throws for calls like primitive.partial_parse(...)
  • Builtin orchestration throws contracts aligned: call_llm_function and stream_llm_function now declare InvalidArgument, matching get_jinja_template and get_client
  • Diagnostics cleanup — removes the spurious Unreachable catch arm warning and E0096 InvalidArgument throws-contract errors that were being emitted from builtin llm.baml across diagnostics snapshots

Test plan

  • All existing workspace tests pass (553 tests, including LSP snapshot tests)
  • Clippy clean with --workspace --all-targets --all-features -- -D warnings
  • Insta snapshots updated for new Baml orchestration functions
  • Diagnostics snapshots refreshed after the builtin throw-analysis fix
  • Integration test with mock SSE server (follow-up)
  • End-to-end Python streaming test (follow-up)

Summary by CodeRabbit

  • New Features

    • Real-time SSE streaming for LLM calls with partial results, tick events, and a stream accumulator.
  • Improvements

    • Unified retry/backoff loop for LLM calls and more robust, provider-agnostic streaming parsing.
  • SDK / Python

    • Python runtime accepts a single per-call context that can include streaming callbacks (stream/tick).
  • Tests

    • New end-to-end and primitive tests covering SSE streaming, partials, monotonic growth, and error cases.

Decomposes LLM streaming into three composable primitives — generic SSE
connection, batched event retrieval, and provider-aware accumulator —
wired together by Baml-level orchestration that shares retry/fallback/
round-robin logic with non-streaming calls.

New crate modules:
- sys_native/src/sse_parser.rs: incremental W3C SSE parser
- sys_llm/src/stream_accumulator.rs: provider-aware delta extraction
  (OpenAI choices[0].delta.content, Anthropic content_block_delta)
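A rough sketch of the provider-aware accumulation idea, under several assumptions: JSON decoding is omitted and replaced by a hypothetical `Decoded` enum, the provider strings `"openai"` and `"anthropic"` are guesses, and the Anthropic terminal event is taken to be `message_stop`. It is not the code in stream_accumulator.rs.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provider {
    OpenAi,
    Anthropic,
}

/// Hypothetical, already-decoded provider payloads (JSON parsing omitted).
enum Decoded {
    /// OpenAI chunk: `choices[0].delta.content` plus an optional `finish_reason`.
    OpenAiChunk { delta_content: Option<String>, finish_reason: Option<String> },
    /// OpenAI terminal sentinel `[DONE]`.
    OpenAiDone,
    /// Text carried by an Anthropic `content_block_delta` event.
    AnthropicTextDelta(String),
    /// Anthropic end-of-message event (assumed to be `message_stop`).
    AnthropicMessageStop,
}

struct StreamAccumulator {
    provider: Provider,
    content: String,
    done: bool,
}

/// Rejects providers without delta extraction instead of silently ignoring events.
fn new_accumulator(provider: &str) -> Result<StreamAccumulator, String> {
    let provider = match provider {
        "openai" => Provider::OpenAi,
        "anthropic" => Provider::Anthropic,
        other => return Err(format!("streaming is not supported for provider `{other}`")),
    };
    Ok(StreamAccumulator { provider, content: String::new(), done: false })
}

impl StreamAccumulator {
    fn add_events(&mut self, events: &[Decoded]) {
        for ev in events {
            match (self.provider, ev) {
                (Provider::OpenAi, Decoded::OpenAiChunk { delta_content, finish_reason }) => {
                    if let Some(text) = delta_content {
                        self.content.push_str(text);
                    }
                    if finish_reason.is_some() {
                        self.done = true;
                    }
                }
                (Provider::OpenAi, Decoded::OpenAiDone) => self.done = true,
                (Provider::Anthropic, Decoded::AnthropicTextDelta(text)) => {
                    self.content.push_str(text);
                }
                (Provider::Anthropic, Decoded::AnthropicMessageStop) => self.done = true,
                _ => {} // payloads belonging to the other provider are ignored
            }
        }
    }

    fn content(&self) -> &str {
        &self.content
    }

    fn is_done(&self) -> bool {
        self.done
    }
}
```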

New resource types: SseStream, StreamAccumulator

New sys ops: fetch_sse, sse_stream_next/close, new_stream_accumulator,
add_events, content, is_done, build_request_stream, partial_parse,
emit_partial, emit_tick

New Baml orchestration: stream_primitive, execute_client_stream,
execute_client_once_stream, stream_llm_function

Python bridge: CallContext pyclass bundles tracing, collectors,
cancellation, and streaming callbacks into a single context object
passed to call_function/call_function_sync.

vercel bot commented Mar 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| beps | Ready | Preview, Comment | Mar 6, 2026 8:15pm |
| promptfiddle | Ready | Preview, Comment | Mar 6, 2026 8:15pm |

coderabbitai bot commented Mar 6, 2026

Walkthrough

Adds end-to-end SSE streaming for LLM calls: new streaming orchestration and runtime APIs, SSE HTTP ops and incremental parser, provider-aware stream accumulators, propagation of stream/tick callbacks through engine/VM/Python bridge, and sys-op/native plumbing for emitting partials and ticks.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Builtins & orchestration: baml_language/crates/baml_builtins/baml/llm.baml, baml_language/crates/baml_builtins/src/lib.rs | Added streaming orchestration functions (stream_primitive, execute_client_stream, execute_client_once_stream, stream_llm_function) and LLM streaming primitive APIs/types (SseStream, StreamAccumulator, fetch_sse, build_request_stream, partial_parse, stream emit helpers). |
| Engine & per-call context: baml_language/crates/bex_engine/src/function_call_context.rs, baml_language/crates/bex_engine/src/lib.rs | Threaded stream_callback/tick_callback through FunctionCallContext/PerCallContext and engine execution paths; added last_emitted_partial dedupe state and switched cancellation to PerCallContext. |
| Python bridge: baml_language/crates/bridge_python/src/runtime.rs, baml_language/crates/bridge_python/src/lib.rs | Added Python CallContext binding and refactored runtime to accept it; build_call_context maps Python callbacks to FunctionCallContext with GIL-safe invocation. |
| Native SSE & HTTP ops: baml_language/crates/sys_native/src/lib.rs, baml_language/crates/sys_native/src/ops/http.rs, baml_language/crates/sys_native/src/sse_parser.rs, baml_language/crates/sys_native/Cargo.toml | Implemented incremental SSE parser, async SSE fetch with background parsing, sse_stream_next/close ops, feature-gated reqwest "stream" usage and dependency updates. |
| Registry & resources: baml_language/crates/sys_native/src/registry.rs, baml_language/crates/bex_resource_types/src/lib.rs, baml_language/crates/bex_external_types/src/bex_external_value.rs, baml_language/crates/bridge_ctypes/src/handle_table.rs | Added SseStream and StreamAccumulator registry entries, accessors, resource-display/external mappings, and handle-table coverage; replaced some PoisonError unwraps with safe extraction. |
| Stream accumulator & sys_llm: baml_language/crates/sys_llm/src/stream_accumulator.rs, baml_language/crates/sys_llm/src/lib.rs | New provider-aware stream accumulator implementation and APIs (new_accumulator, add_events, get_content, is_done) plus helpers to build stream HTTP requests and partial-parse responses. |
| Sys types & traits: baml_language/crates/sys_types/src/lib.rs | Extended SysOpContext with stream_callback, tick_callback, last_emitted_partial; added with_streaming() helper, SysOpStream wiring, and new LLM streaming sysop trait methods. |
| Tests & test harness: baml_language/crates/baml_tests/src/engine.rs, baml_language/crates/baml_tests/tests/streaming.rs | Added streaming test harness to capture partials/ticks and comprehensive SSE/streaming orchestration tests with mocked servers and cases for success, partial emission, monotonic growth, and server errors. |

Sequence Diagram(s)

sequenceDiagram
    participant Client as Caller
    participant Runtime as BamlRuntime
    participant Engine as BexEngine
    participant SysOps as SysOpLayer
    participant HTTP as Native HTTP SSE
    participant Acc as StreamAccumulator
    participant Callback as UserCallbacks

    Client->>Runtime: call_function(name,args, CallContext{stream_cb,tick_cb})
    Runtime->>Engine: call_function(FunctionCallContext{stream_callback,tick_callback})
    Engine->>SysOps: execute_sys_op(request with {"stream":true})
    SysOps->>HTTP: send_sse_async(request)
    HTTP->>HTTP: background task parses SSE bytes -> SseEvent[] 
    HTTP->>Acc: register new accumulator / add_events(events_json)
    loop per batch
        Acc->>Engine: partial_content
        Engine->>Callback: stream_callback(partial) (deduped via last_emitted_partial)
        Engine->>Callback: tick_callback(raw_events)
    end
    Acc->>Engine: is_done() -> true
    Engine->>Acc: get_content() -> final_value
    Engine-->>Runtime: ExecutionResult(final_value or error)
    Runtime-->>Client: return final value / throw on failure
sequenceDiagram
    participant Parser as SseParser
    participant Buffer as SseBuffer
    Parser->>Parser: new()
    loop incoming bytes
        Parser->>Parser: feed(chunk)
        Parser->>Parser: parse lines, aggregate fields
        alt blank line completes event
            Parser->>Buffer: emit SseEvent{event,data,id}
        end
    end
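
The parser flow in the diagram above can be sketched roughly as follows. This is a simplified, std-only illustration and not the contents of sse_parser.rs: the field names are assumptions, the `retry` field is ignored, and bare `\r` line endings are not handled (only `\n` and `\r\n`).

```rust
use std::mem;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SseEvent {
    pub event: String,
    pub data: String,
    pub id: Option<String>,
}

#[derive(Default)]
pub struct SseParser {
    buf: Vec<u8>, // bytes of a line that has not been terminated yet
    event_type: String,
    data: String,
    has_data: bool,
    last_id: Option<String>,
}

impl SseParser {
    pub fn new() -> Self {
        Self::default()
    }

    /// Feeds one network chunk and returns every event completed by it.
    pub fn feed(&mut self, chunk: &[u8]) -> Vec<SseEvent> {
        let mut events = Vec::new();
        self.buf.extend_from_slice(chunk);
        while let Some(pos) = self.buf.iter().position(|&b| b == b'\n') {
            let mut line: Vec<u8> = self.buf.drain(..=pos).collect();
            line.pop(); // drop '\n'
            if line.last() == Some(&b'\r') {
                line.pop(); // tolerate "\r\n"
            }
            let line = String::from_utf8_lossy(&line).into_owned();
            if let Some(ev) = self.process_line(&line) {
                events.push(ev);
            }
        }
        events
    }

    fn process_line(&mut self, line: &str) -> Option<SseEvent> {
        if line.is_empty() {
            return self.dispatch(); // a blank line completes the event
        }
        if line.starts_with(':') {
            return None; // comment line, often used as keep-alive
        }
        let (field, value) = match line.split_once(':') {
            Some((f, v)) => (f, v.strip_prefix(' ').unwrap_or(v)),
            None => (line, ""),
        };
        match field {
            "event" => self.event_type = value.to_owned(),
            "data" => {
                if self.has_data {
                    self.data.push('\n'); // multiple data lines are joined by '\n'
                }
                self.data.push_str(value);
                self.has_data = true;
            }
            "id" => self.last_id = Some(value.to_owned()),
            _ => {} // unknown fields are ignored
        }
        None
    }

    fn dispatch(&mut self) -> Option<SseEvent> {
        // `mem::take` leaves an empty String behind, so no extra `clear()` is needed.
        let event = mem::take(&mut self.event_type);
        let data = mem::take(&mut self.data);
        if !mem::take(&mut self.has_data) {
            return None; // no data field seen: nothing to dispatch
        }
        Some(SseEvent {
            event: if event.is_empty() { "message".to_owned() } else { event },
            data,
            id: self.last_id.clone(),
        })
    }
}
```

Buffering raw bytes until a newline arrives means a chunk boundary may fall in the middle of a field name or a multi-byte UTF-8 character without corrupting the event.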

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

rust, feature (small)

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: implementing BEP-009 streaming primitives, which is the primary focus across all modified files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


codspeed-hq bot commented Mar 6, 2026

Merging this PR will degrade performance by 33.14%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

❌ 13 regressed benchmarks
✅ 2 untouched benchmarks
⏩ 91 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| WallTime | bench_scale_100_functions | 2 ms | 2.4 ms | -18.01% |
| WallTime | bench_single_simple_file | 996.8 µs | 1,441.1 µs | -30.83% |
| WallTime | bench_incremental_modify_function | 171.8 µs | 220.7 µs | -22.15% |
| WallTime | bench_scale_deep_nesting | 1.5 ms | 1.9 ms | -22.63% |
| WallTime | bench_incremental_rename_type | 1.2 ms | 1.7 ms | -25.64% |
| WallTime | bench_incremental_add_string_char | 985.3 µs | 1,424.2 µs | -30.82% |
| WallTime | bench_incremental_add_user_field | 1.1 ms | 1.5 ms | -28.57% |
| WallTime | bench_empty_project | 905.8 µs | 1,354.8 µs | -33.14% |
| WallTime | bench_incremental_add_field | 172.2 µs | 220.2 µs | -21.79% |
| WallTime | bench_incremental_add_new_file | 164.6 µs | 217.3 µs | -24.24% |
| WallTime | bench_incremental_add_attribute | 985.5 µs | 1,424.2 µs | -30.8% |
| WallTime | bench_incremental_no_change | 115.3 µs | 164.7 µs | -29.98% |
| WallTime | bench_incremental_close_string | 990.1 µs | 1,427.6 µs | -30.65% |

Comparing bep009-streaming-implementation (34ca454) with canary (f32fa20)

Footnotes

  1. 91 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

github-actions bot commented Mar 6, 2026

Binary size checks passed

7 passed

| Artifact | Platform | Gzip | Baseline | Delta | Status |
| --- | --- | --- | --- | --- | --- |
| bridge_cffi | Linux | 4.4 MB | 4.4 MB | +29.0 KB (+0.7%) | OK |
| bridge_cffi-stripped | Linux | 2.9 MB | 2.9 MB | +19.2 KB (+0.7%) | OK |
| bridge_cffi | macOS | 3.6 MB | 3.6 MB | +22.0 KB (+0.6%) | OK |
| bridge_cffi-stripped | macOS | 2.3 MB | 2.3 MB | +13.6 KB (+0.6%) | OK |
| bridge_cffi | Windows | 3.6 MB | 3.6 MB | +24.4 KB (+0.7%) | OK |
| bridge_cffi-stripped | Windows | 2.4 MB | 2.4 MB | +16.1 KB (+0.7%) | OK |
| bridge_wasm | WASM | 2.2 MB | 2.2 MB | +13.4 KB (+0.6%) | OK |

Generated by cargo size-gate · workflow run

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 14


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7c320720-0f3f-41ec-a595-a2ded4447dfd

📥 Commits

Reviewing files that changed from the base of the PR and between b55d814 and 7fcd5f3.

⛔ Files ignored due to path filters (9)
  • baml_language/Cargo.lock is excluded by !**/*.lock
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____01_lexer__llm.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____02_parser__llm.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded_unoptimized.snap is excluded by !**/*.snap
📒 Files selected for processing (18)
  • baml_language/crates/baml_builtins/baml/llm.baml
  • baml_language/crates/baml_builtins/src/lib.rs
  • baml_language/crates/bex_engine/src/function_call_context.rs
  • baml_language/crates/bex_engine/src/lib.rs
  • baml_language/crates/bex_external_types/src/bex_external_value.rs
  • baml_language/crates/bex_resource_types/src/lib.rs
  • baml_language/crates/bridge_ctypes/src/handle_table.rs
  • baml_language/crates/bridge_python/src/lib.rs
  • baml_language/crates/bridge_python/src/runtime.rs
  • baml_language/crates/sys_llm/Cargo.toml
  • baml_language/crates/sys_llm/src/lib.rs
  • baml_language/crates/sys_llm/src/stream_accumulator.rs
  • baml_language/crates/sys_native/Cargo.toml
  • baml_language/crates/sys_native/src/lib.rs
  • baml_language/crates/sys_native/src/ops/http.rs
  • baml_language/crates/sys_native/src/registry.rs
  • baml_language/crates/sys_native/src/sse_parser.rs
  • baml_language/crates/sys_types/src/lib.rs

…return

Use type narrowing (null guard + early return) to replace `match (llm_client.retry)`
patterns in build_plan_with_state, execute_client, and execute_client_stream.
This reduces nesting and reads more naturally.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4ded66d1-bd47-4276-a5d0-1780fc24e15f

📥 Commits

Reviewing files that changed from the base of the PR and between 7fcd5f3 and c2daed0.

⛔ Files ignored due to path filters (5)
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____01_lexer__llm.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____02_parser__llm.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap
📒 Files selected for processing (1)
  • baml_language/crates/baml_builtins/baml/llm.baml

Review fixes:
- SSE error now sets buf.done=true to prevent caller hang on retry
- stream_primitive checks accumulator.is_done() for early exit and
  returns failure on truncated streams
- new_accumulator rejects unsupported providers (google-ai, aws-bedrock
  etc.) instead of silently ignoring their events
- Poisoned mutex recovery in emit_partial deduplication
- Remove redundant event_type.clear() after mem::take in SSE parser
- Make futures/serde_json optional behind bundle-http feature gate

Refactor:
- Introduce PerCallContext struct bundling call_id, cancel,
  stream_callback, tick_callback — replaces loose parameters in
  run_event_loop_with_epoch and execute_sys_op, removing the
  #[allow(clippy::too_many_arguments)] annotation
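
The parameter-bundling refactor could look roughly like the sketch below. All field types are assumptions made for illustration (a `u64` id, an `AtomicBool` cancel flag, boxed closures for the callbacks), and `dispatch_callbacks` is a hypothetical helper rather than the real `execute_sys_op`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical callback shape: receives a serialized partial or raw event batch.
type Callback = Arc<dyn Fn(&str) + Send + Sync>;

/// One struct instead of four loose parameters.
#[derive(Clone)]
struct PerCallContext {
    call_id: u64,
    cancel: Arc<AtomicBool>,
    stream_callback: Option<Callback>,
    tick_callback: Option<Callback>,
}

/// Hypothetical consumer: a single `&PerCallContext` argument keeps the
/// signature short as more per-call state is added later.
fn dispatch_callbacks(ctx: &PerCallContext, partial: &str, raw_events: &str) -> Result<(), String> {
    if ctx.cancel.load(Ordering::Relaxed) {
        return Err(format!("call {} was cancelled", ctx.call_id));
    }
    if let Some(cb) = &ctx.tick_callback {
        cb(raw_events);
    }
    if let Some(cb) = &ctx.stream_callback {
        cb(partial);
    }
    Ok(())
}
```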
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
baml_language/crates/baml_builtins/baml/llm.baml (1)

321-355: ⚠️ Potential issue | 🟠 Major

Streaming failures still escape the retry/fallback envelope.

Any throw from fetch_sse()/next(), new_stream_accumulator(), add_events(), emit_tick(), emit_partial(), partial_parse(), or final parse() bypasses ExecutionResult { ok: false }, so execute_client_stream() never gets a chance to retry or move to the next fallback client. Once the SSE resource has been opened, that same path can also skip sse.close(). Please wrap this body in the same internal failure envelope used by the non-streaming orchestrator, and only throw from stream_llm_function() after all attempts fail.
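
A minimal Rust sketch of the failure envelope being requested, under the assumption that each client attempt can be modeled as a function returning `Result`. The names `stream_with_fallback`, `flaky_client`, and `healthy_client` are hypothetical; the actual orchestration lives in llm.baml.

```rust
/// Hypothetical streaming attempt against one client.
type Attempt = fn() -> Result<String, String>;

/// Failures stay inside the loop as values, so every client gets its retries
/// and the caller only sees an error after all attempts have failed.
fn stream_with_fallback(clients: &[Attempt], max_retries: usize) -> Result<String, Vec<String>> {
    let mut failures = Vec::new();
    for attempt in clients {
        for _ in 0..=max_retries {
            match attempt() {
                Ok(value) => return Ok(value),
                Err(e) => failures.push(e), // recorded, not propagated
            }
        }
    }
    Err(failures)
}

/// Hypothetical client whose SSE connection always fails.
fn flaky_client() -> Result<String, String> {
    Err("sse connection reset".to_owned())
}

/// Hypothetical client that streams successfully.
fn healthy_client() -> Result<String, String> {
    Ok("Hello".to_owned())
}
```

Calling `stream_with_fallback(&[flaky_client as Attempt, healthy_client], 1)` exhausts two attempts on the first client and then succeeds on the second, which is the fallback behavior a throw from inside the attempt would skip.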


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d0998c1b-3d7a-4016-8f85-44dda632801e

📥 Commits

Reviewing files that changed from the base of the PR and between c2daed0 and 170f4cb.

⛔ Files ignored due to path filters (5)
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____01_lexer__llm.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____02_parser__llm.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap
📒 Files selected for processing (8)
  • baml_language/crates/baml_builtins/baml/llm.baml
  • baml_language/crates/bex_engine/src/function_call_context.rs
  • baml_language/crates/bex_engine/src/lib.rs
  • baml_language/crates/sys_llm/src/stream_accumulator.rs
  • baml_language/crates/sys_native/Cargo.toml
  • baml_language/crates/sys_native/src/lib.rs
  • baml_language/crates/sys_native/src/ops/http.rs
  • baml_language/crates/sys_native/src/sse_parser.rs

… poisoning

- Replace serde_json unwrap with proper error in sse_stream_next
- Return LlmOpError instead of panic in execute_partial_parse for non-string types
- Remove early loop exit on accumulator.is_done() to fully drain SSE events
- Add SseDropGuard to prevent consumer hangs on task cancellation
- Extract token usage from OpenAI and Anthropic streaming events
- Use unwrap_or_else(PoisonError::into_inner) consistently for lock acquisition
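
The drop-guard idea can be sketched with std threads and a condition variable. This is an assumption-heavy illustration: the real background consumer is described as an async task, and `SseBuffer`, `produce`, and `next_batch` here are simplified hypothetical stand-ins.

```rust
use std::sync::{Arc, Condvar, Mutex, PoisonError};

#[derive(Default)]
struct SseBuffer {
    events: Vec<String>,
    done: bool,
}

type Shared = Arc<(Mutex<SseBuffer>, Condvar)>;

/// Marks the buffer as done when dropped, on every exit path of the producer.
struct SseDropGuard(Shared);

impl Drop for SseDropGuard {
    fn drop(&mut self) {
        let (lock, cvar) = &*self.0;
        lock.lock().unwrap_or_else(PoisonError::into_inner).done = true;
        cvar.notify_all(); // wake any consumer blocked in `next_batch`
    }
}

/// Hypothetical producer; `cancel_at` simulates the task being cancelled early.
fn produce(shared: Shared, lines: Vec<String>, cancel_at: Option<usize>) {
    let _guard = SseDropGuard(Arc::clone(&shared));
    for (i, line) in lines.into_iter().enumerate() {
        if cancel_at == Some(i) {
            return; // early exit: the guard still sets `done`
        }
        let (lock, cvar) = &*shared;
        lock.lock().unwrap_or_else(PoisonError::into_inner).events.push(line);
        cvar.notify_all();
    }
}

/// Blocks until events arrive or the stream is done; `None` means finished.
fn next_batch(shared: &Shared) -> Option<Vec<String>> {
    let (lock, cvar) = &**shared;
    let mut buf = lock.lock().unwrap_or_else(PoisonError::into_inner);
    while buf.events.is_empty() && !buf.done {
        buf = cvar.wait(buf).unwrap_or_else(PoisonError::into_inner);
    }
    if buf.events.is_empty() {
        None
    } else {
        Some(std::mem::take(&mut buf.events))
    }
}
```

Without the guard, a producer that exits early would leave `done` false and the consumer would wait on the condition variable indefinitely.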
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
baml_language/crates/sys_types/src/lib.rs (1)

997-1009: 🧹 Nitpick | 🔵 Trivial

Consider adding unit tests for the new streaming methods.

The new SysOpLlm streaming methods (lines 845-915) lack direct unit test coverage. While the existing tests validate the test context setup, consider adding tests that verify error propagation for the accumulator methods, potentially by mocking sys_llm::stream_accumulator functions.

As per coding guidelines: "Prefer writing Rust unit tests over integration tests where possible."

baml_language/crates/sys_native/src/registry.rs (1)

326-330: ⚠️ Potential issue | 🟡 Minor

Same inconsistency in ResourceRegistryRef::remove.

Line 328 uses .unwrap() instead of the .unwrap_or_else(std::sync::PoisonError::into_inner) pattern used elsewhere.

🛠️ Proposed fix
 impl ResourceRegistryRef for ResourceRegistry {
     fn remove(&self, key: usize) {
-        self.entries.write().unwrap().remove(&key);
+        self.entries.write().unwrap_or_else(std::sync::PoisonError::into_inner).remove(&key);
     }
 }
♻️ Duplicate comments (1)
baml_language/crates/baml_builtins/baml/llm.baml (1)

332-347: ⚠️ Potential issue | 🟠 Major

Break the SSE loop as soon as accumulator.is_done() flips true.

The post-loop check at Lines 345-347 avoids truncated success, but the loop still waits for sse.next() == null. If a provider emits its terminal event and keeps the socket open, this attempt blocks longer than necessary.

Suggested fix
     while (true) {
         let events = sse.next();
         if (events == null) { break; }
         accumulator.add_events(events);
         baml.stream.emit_tick(events);

         let content = accumulator.content();
         let parsed = primitive.partial_parse(content, return_type);
         baml.stream.emit_partial(parsed);
+
+        if (accumulator.is_done()) {
+            break;
+        }
     }

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4376c445-0dc6-40d7-b1d1-b8883a2dd324

📥 Commits

Reviewing files that changed from the base of the PR and between 170f4cb and fbffefd.

⛔ Files ignored due to path filters (5)
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____01_lexer__llm.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____02_parser__llm.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap
📒 Files selected for processing (6)
  • baml_language/crates/baml_builtins/baml/llm.baml
  • baml_language/crates/sys_llm/src/lib.rs
  • baml_language/crates/sys_llm/src/stream_accumulator.rs
  • baml_language/crates/sys_native/src/ops/http.rs
  • baml_language/crates/sys_native/src/registry.rs
  • baml_language/crates/sys_types/src/lib.rs

… final value

Use partial_parse instead of parse for the final value in stream_primitive,
since the accumulator provides raw extracted content rather than the
provider's JSON envelope. Add 8 streaming integration tests covering SSE
primitives and full OpenAI streaming orchestration with wiremock.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (4)
baml_language/crates/baml_builtins/baml/llm.baml (4)

332-347: ⚠️ Potential issue | 🟠 Major

Stop the loop on logical completion, not only on socket EOF.

This still waits for sse.next() == null before leaving the loop. If the accumulator reaches done before the provider closes the connection, streaming can hang until transport EOF instead of finishing immediately.

Suggested fix
     while (true) {
         let events = sse.next();
         if (events == null) { break; }
         accumulator.add_events(events);
         baml.stream.emit_tick(events);

         let content = accumulator.content();
         let parsed = primitive.partial_parse(content, return_type);
         baml.stream.emit_partial(parsed);
+
+        if (accumulator.is_done()) {
+            break;
+        }
     }

     sse.close();

     if (accumulator.is_done() == false) {

328-343: ⚠️ Potential issue | 🟠 Major

Guarantee sse.close() on exceptional exits too.

close() only runs on the happy path. A throw from sse.next(), add_events, emit_tick, or partial_parse exits this function before cleanup, leaving the SSE resource alive across retries or fallback attempts.


330-352: ⚠️ Potential issue | 🟠 Major

Fail fast for return types that partial_parse() cannot handle.

This path now uses partial_parse(...) for both incremental updates and the final value, but the streaming contract in this PR limits partial parsing to string outputs. Any streamed function with a non-string return type will fail after the SSE request has already started.

Either validate return_type before opening the stream, or keep streaming disabled for unsupported return types. Based on learnings: New language features for BAML require coordinated updates across Parser (parser-database), IR/validation (baml-core), Compiler (baml-compiler), and VM (baml-vm)


444-450: ⚠️ Potential issue | 🔴 Critical

Guard empty round-robin streaming clients before % length().

This branch still divides by zero when sub_clients is empty. build_attempt_with_state() already treats that shape as empty on Lines 89-102; the executor should return ok: false here instead of throwing.

Suggested fix
         baml.llm.ClientType.RoundRobin => {
-            let idx = baml.llm.round_robin_next(llm_client.name) % llm_client.sub_clients.length();
-            baml.llm.execute_client_stream(
-                llm_client.sub_clients.at(idx),
-                context,
-                active_delay_ms,
-            )
+            if (llm_client.sub_clients.length() == 0) {
+                baml.llm.ExecutionResult { ok: false, value: null }
+            } else {
+                let idx = baml.llm.round_robin_next(llm_client.name) % llm_client.sub_clients.length();
+                baml.llm.execute_client_stream(
+                    llm_client.sub_clients.at(idx),
+                    context,
+                    active_delay_ms,
+                )
+            }
         }
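
The same guard, expressed as a small Rust sketch for illustration. The counter and the generic `round_robin_pick` helper are hypothetical; the point is only that the emptiness check must come before the modulo, since `% 0` panics in Rust just as it fails in the Baml version.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Picks the next sub-client in rotation, or `None` when there are none.
fn round_robin_pick<'a, T>(counter: &AtomicUsize, sub_clients: &'a [T]) -> Option<&'a T> {
    if sub_clients.is_empty() {
        return None; // avoids `% 0`; the caller can map this to `ok: false`
    }
    let idx = counter.fetch_add(1, Ordering::Relaxed) % sub_clients.len();
    sub_clients.get(idx)
}
```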

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cebcc780-d15b-4e84-83df-b7fb230b63c8

📥 Commits

Reviewing files that changed from the base of the PR and between fbffefd and a997436.

⛔ Files ignored due to path filters (5)
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____01_lexer__llm.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____02_parser__llm.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap
📒 Files selected for processing (3)
  • baml_language/crates/baml_builtins/baml/llm.baml
  • baml_language/crates/baml_tests/src/engine.rs
  • baml_language/crates/baml_tests/tests/streaming.rs

- Wrap partial_parse in catch so mid-stream parse failures are silently
  skipped, matching legacy behavior for incomplete structured content.
- Add accumulator metadata accessors: model(), finish_reason(),
  input_tokens(), output_tokens() as new sys ops.
- Add final_parse sys op on PrimitiveClient for strict parsing of the
  final accumulated content (separate from permissive partial_parse).
- Update stream_primitive to use final_parse for the completed value.
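
The split between permissive mid-stream parsing and strict final parsing might be pictured as follows. This is a toy sketch: an integer return type stands in for a structured output, and `partial_parse`, `final_parse`, and `stream_int` are hypothetical Rust functions, not the sys ops themselves.

```rust
/// Permissive mid-stream parse: `None` means "not parseable yet, skip this tick".
fn partial_parse(content: &str) -> Option<i64> {
    content.trim().parse::<i64>().ok()
}

/// Strict parse of the completed content: failures are reported to the caller.
fn final_parse(content: &str) -> Result<i64, String> {
    content
        .trim()
        .parse::<i64>()
        .map_err(|e| format!("final parse of {content:?} failed: {e}"))
}

/// Accumulates chunks, collecting the partials that parsed, then parses strictly.
fn stream_int(chunks: &[&str]) -> (Vec<i64>, Result<i64, String>) {
    let mut content = String::new();
    let mut partials = Vec::new();
    for chunk in chunks {
        content.push_str(chunk);
        // Mid-stream failures are skipped rather than aborting the stream.
        if let Some(value) = partial_parse(&content) {
            partials.push(value);
        }
    }
    (partials, final_parse(&content))
}
```

For the chunks `"-"`, `"4"`, `"2"`, the first tick produces no partial because `"-"` is not yet a number, while the later ticks and the final parse succeed.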
@antoniosarosi antoniosarosi added this pull request to the merge queue Mar 10, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 10, 2026