Implement BEP-009 streaming primitives by antoniosarosi · Pull Request #3215 · BoundaryML/baml

antoniosarosi · 2026-03-06T14:04:58Z

Summary

Implements BEP-009 phases 1-8: decomposes LLM streaming into three composable primitives (SSE connection, batched event retrieval, provider-aware accumulator) wired together by Baml-level orchestration sharing retry/fallback/round-robin logic with non-streaming calls
Adds incremental W3C SSE parser (sse_parser.rs), provider-aware delta extraction for OpenAI and Anthropic (stream_accumulator.rs), and new resource types (SseStream, StreamAccumulator)
New Baml orchestration functions: stream_primitive, execute_client_stream, stream_llm_function
Python bridge: new CallContext pyclass bundles tracing, collectors, cancellation, and streaming callbacks
Refactors llm.baml to use null guard + early return instead of match on nullable retry field, leveraging type narrowing

New sys ops

| Sys Op | Purpose |
|---|---|
| `fetch_sse` | Opens SSE connection, spawns background consumer |
| `SseStream.next/close` | Batched event retrieval / cleanup |
| `PrimitiveClient.new_stream_accumulator` | Creates provider-aware accumulator |
| `StreamAccumulator.add_events/content/is_done` | Accumulator operations |
| `PrimitiveClient.build_request_stream` | Adds `"stream": true` to request body |
| `PrimitiveClient.partial_parse` | String-only partial parsing (SAP deferred) |
| `emit_partial/emit_tick` | Stream callback dispatch with deduplication |

Bug found: `continue` not supported as catch arm expression

While refactoring llm.baml to eliminate the ExecutionResult { ok, value } wrapper using throw/catch, we discovered that continue (and likely break) cannot be used inside catch arms:

```baml
// This fails with: [parse] Error: Expected expression, found continue
let value = some_call() catch (e) {
    _ => continue
};
```

The intended pattern was to catch errors in retry/fallback loops and skip to the next iteration. Since catch arms require an expression and continue is parsed as a statement (not an expression), this pattern is rejected at parse time.

Workaround: ExecutionResult { ok: bool, value: unknown } is kept for internal retry/fallback orchestration where failures are expected and handled. The top-level functions (call_llm_function, stream_llm_function) use throw at the boundary.

Suggested fix: Treat continue, break, return, and throw uniformly as diverging expressions valid in any expression position (they already work this way for type narrowing purposes).
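
For comparison, Rust already treats `continue` as a diverging expression of the never type, so it can sit in a match arm inside a retry loop. The sketch below is a Rust analogue of the intended pattern, not Baml code; `first_success` and its input shape are hypothetical names chosen for illustration.

```rust
/// Hypothetical retry helper: returns the value of the first successful attempt.
fn first_success(attempts: &[Result<i32, String>]) -> Option<i32> {
    for attempt in attempts {
        // `continue` has the never type `!`, which coerces to `i32`,
        // so it is accepted in expression position inside the match arm.
        let value: i32 = match attempt {
            Ok(v) => *v,
            Err(_) => continue,
        };
        return Some(value);
    }
    // Every attempt failed.
    None
}
```

Calling `first_success(&[Err("timeout".to_string()), Ok(7)])` skips the failed attempt and yields `Some(7)`, which is the control flow the catch-arm version was meant to express.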

CodeRabbit review fixes

SSE error sets buf.done = true — prevents caller hang when retrying after stream error (background task exited without marking stream done)
stream_primitive checks accumulator.is_done() — breaks out of loop early when provider signals completion ([DONE]/finish_reason), and returns ok: false if stream was truncated
new_accumulator rejects unsupported providers — returns error for google-ai, aws-bedrock, etc. instead of silently ignoring their events in extract_delta
Poisoned mutex recovery — emit_partial deduplication uses unwrap_or_else(PoisonError::into_inner) instead of panicking
Redundant event_type.clear() — removed after mem::take which already leaves empty string
futures/serde_json optional — gated behind bundle-http feature to reduce compile time without SSE
PerCallContext struct — replaces 4 loose parameters (call_id, cancel, stream_callback, tick_callback) in run_event_loop_with_epoch and execute_sys_op, removing the #[allow(clippy::too_many_arguments)]
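
Two of the fixes above (rejecting unsupported providers and reporting truncated streams as failures) can be sketched roughly as follows. This is a simplified illustration, not the code in stream_accumulator.rs: the `Provider` and `Delta` types, the provider strings, and `finish` are assumptions, and events arrive pre-decoded here rather than as provider JSON.

```rust
/// Providers this sketch can decode (illustrative names only).
#[derive(Debug, Clone, Copy)]
enum Provider {
    OpenAi,
    Anthropic,
}

/// A pre-decoded stream event; the real code extracts these from provider JSON.
#[derive(Debug)]
enum Delta {
    Text(String),
    Done,
}

#[derive(Debug)]
struct Accumulator {
    provider: Provider,
    content: String,
    done: bool,
}

/// Fails up front for providers whose events would otherwise be silently ignored.
fn new_accumulator(provider: &str) -> Result<Accumulator, String> {
    let provider = match provider {
        "openai" => Provider::OpenAi,
        "anthropic" => Provider::Anthropic,
        other => return Err(format!("streaming is not supported for provider `{other}`")),
    };
    Ok(Accumulator { provider, content: String::new(), done: false })
}

impl Accumulator {
    fn add_events(&mut self, events: Vec<Delta>) {
        for event in events {
            match event {
                Delta::Text(text) => self.content.push_str(&text),
                Delta::Done => self.done = true,
            }
        }
    }

    fn is_done(&self) -> bool {
        self.done
    }

    /// A stream that ended without a terminal event is treated as truncated.
    fn finish(self) -> Result<String, String> {
        if self.done {
            Ok(self.content)
        } else {
            Err(format!("stream from {:?} was truncated", self.provider))
        }
    }
}
```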

Follow-up compiler fixes

Builtin method throw analysis for path-style method calls — multi-segment builtin method callees now persist resolved targets into TIR, so catch exhaustiveness sees declared throws for calls like primitive.partial_parse(...)
Builtin orchestration throws contracts aligned — call_llm_function and stream_llm_function now declare InvalidArgument, matching get_jinja_template and get_client
Diagnostics cleanup — removes the spurious Unreachable catch arm warning and E0096 InvalidArgument throws-contract errors that were being emitted from builtin llm.baml across diagnostics snapshots

Test plan

All existing workspace tests pass (553 tests, including LSP snapshot tests)
Clippy clean with --workspace --all-targets --all-features -- -D warnings
Insta snapshots updated for new Baml orchestration functions
Diagnostics snapshots refreshed after the builtin throw-analysis fix
Integration test with mock SSE server (follow-up)
End-to-end Python streaming test (follow-up)

Summary by CodeRabbit

New Features
- Real-time SSE streaming for LLM calls with partial results, tick events, and a stream accumulator.
Improvements
- Unified retry/backoff loop for LLM calls and more robust, provider-agnostic streaming parsing.
SDK / Python
- Python runtime accepts a single per-call context that can include streaming callbacks (stream/tick).
Tests
- New end-to-end and primitive tests covering SSE streaming, partials, monotonic growth, and error cases.

Decomposes LLM streaming into three composable primitives (generic SSE connection, batched event retrieval, and provider-aware accumulator) wired together by Baml-level orchestration that shares retry/fallback/round-robin logic with non-streaming calls.

New crate modules:
- sys_native/src/sse_parser.rs: incremental W3C SSE parser
- sys_llm/src/stream_accumulator.rs: provider-aware delta extraction (OpenAI choices[0].delta.content, Anthropic content_block_delta)

New resource types: SseStream, StreamAccumulator

New sys ops: fetch_sse, sse_stream_next/close, new_stream_accumulator, add_events, content, is_done, build_request_stream, partial_parse, emit_partial, emit_tick

New Baml orchestration: stream_primitive, execute_client_stream, execute_client_once_stream, stream_llm_function

Python bridge: CallContext pyclass bundles tracing, collectors, cancellation, and streaming callbacks into a single context object passed to call_function/call_function_sync.

vercel · 2026-03-06T14:05:04Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
beps	Ready	Preview, Comment	Mar 6, 2026 8:15pm
promptfiddle	Ready	Preview, Comment	Mar 6, 2026 8:15pm

coderabbitai · 2026-03-06T14:05:24Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.


Walkthrough

Adds end-to-end SSE streaming for LLM calls: new streaming orchestration and runtime APIs, SSE HTTP ops and incremental parser, provider-aware stream accumulators, propagation of stream/tick callbacks through engine/VM/Python bridge, and sys-op/native plumbing for emitting partials and ticks.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Builtins & orchestration: `baml_language/crates/baml_builtins/baml/llm.baml`, `baml_language/crates/baml_builtins/src/lib.rs` | Added streaming orchestration functions (`stream_primitive`, `execute_client_stream`, `execute_client_once_stream`, `stream_llm_function`) and LLM streaming primitive APIs/types (`SseStream`, `StreamAccumulator`, `fetch_sse`, `build_request_stream`, `partial_parse`, stream emit helpers). |
| Engine & per-call context: `baml_language/crates/bex_engine/src/function_call_context.rs`, `baml_language/crates/bex_engine/src/lib.rs` | Threaded `stream_callback`/`tick_callback` through FunctionCallContext/PerCallContext and engine execution paths; added `last_emitted_partial` dedupe state and switched cancellation to PerCallContext. |
| Python bridge: `baml_language/crates/bridge_python/src/runtime.rs`, `baml_language/crates/bridge_python/src/lib.rs` | Added Python `CallContext` binding and refactored runtime to accept it; `build_call_context` maps Python callbacks to FunctionCallContext with GIL-safe invocation. |
| Native SSE & HTTP ops: `baml_language/crates/sys_native/src/lib.rs`, `baml_language/crates/sys_native/src/ops/http.rs`, `baml_language/crates/sys_native/src/sse_parser.rs`, `baml_language/crates/sys_native/Cargo.toml` | Implemented incremental SSE parser, async SSE fetch with background parsing, sse_stream_next/close ops, feature-gated reqwest "stream" usage and dependency updates. |
| Registry & resources: `baml_language/crates/sys_native/src/registry.rs`, `baml_language/crates/bex_resource_types/src/lib.rs`, `baml_language/crates/bex_external_types/src/bex_external_value.rs`, `baml_language/crates/bridge_ctypes/src/handle_table.rs` | Added `SseStream` and `StreamAccumulator` registry entries, accessors, resource-display/external mappings, and handle-table coverage; replaced some PoisonError unwraps with safe extraction. |
| Stream accumulator & sys_llm: `baml_language/crates/sys_llm/src/stream_accumulator.rs`, `baml_language/crates/sys_llm/src/lib.rs` | New provider-aware stream accumulator implementation and APIs (`new_accumulator`, `add_events`, `get_content`, `is_done`) plus helpers to build stream HTTP requests and partial-parse responses. |
| Sys types & traits: `baml_language/crates/sys_types/src/lib.rs` | Extended `SysOpContext` with `stream_callback`, `tick_callback`, `last_emitted_partial`; added `with_streaming()` helper, SysOpStream wiring, and new LLM streaming sysop trait methods. |
| Tests & test harness: `baml_language/crates/baml_tests/src/engine.rs`, `baml_language/crates/baml_tests/tests/streaming.rs` | Added streaming test harness to capture partials/ticks and comprehensive SSE/streaming orchestration tests with mocked servers and cases for success, partial emission, monotonic growth, and server errors. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Caller
    participant Runtime as BamlRuntime
    participant Engine as BexEngine
    participant SysOps as SysOpLayer
    participant HTTP as Native HTTP SSE
    participant Acc as StreamAccumulator
    participant Callback as UserCallbacks

    Client->>Runtime: call_function(name,args, CallContext{stream_cb,tick_cb})
    Runtime->>Engine: call_function(FunctionCallContext{stream_callback,tick_callback})
    Engine->>SysOps: execute_sys_op(request with {"stream":true})
    SysOps->>HTTP: send_sse_async(request)
    HTTP->>HTTP: background task parses SSE bytes -> SseEvent[]
    HTTP->>Acc: register new accumulator / add_events(events_json)
    loop per batch
        Acc->>Engine: partial_content
        Engine->>Callback: stream_callback(partial) (deduped via last_emitted_partial)
        Engine->>Callback: tick_callback(raw_events)
    end
    Acc->>Engine: is_done() -> true
    Engine->>Acc: get_content() -> final_value
    Engine-->>Runtime: ExecutionResult(final_value or error)
    Runtime-->>Client: return final value / throw on failure
```

```mermaid
sequenceDiagram
    participant Parser as SseParser
    participant Buffer as SseBuffer
    Parser->>Parser: new()
    loop incoming bytes
        Parser->>Parser: feed(chunk)
        Parser->>Parser: parse lines, aggregate fields
        alt blank line completes event
            Parser->>Buffer: emit SseEvent{event,data,id}
        end
    end
```
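
The parser flow in the diagram can be sketched as a small incremental parser. This is a minimal illustration using only the standard library, not the contents of sse_parser.rs: the field names and the `feed` signature are assumptions, and it handles LF and CRLF line endings only (the SSE specification also allows a bare CR).

```rust
use std::mem;

#[derive(Debug, PartialEq)]
struct SseEvent {
    event: String,
    data: String,
    id: String,
}

#[derive(Default)]
struct SseParser {
    buf: Vec<u8>,            // bytes of the current, still incomplete line
    event_type: String,      // value of the last `event:` field
    data_lines: Vec<String>, // `data:` fields collected for the pending event
    last_id: String,         // last seen `id:`; persists across events
}

impl SseParser {
    /// Feeds one network chunk and returns every event completed by it.
    fn feed(&mut self, chunk: &[u8]) -> Vec<SseEvent> {
        self.buf.extend_from_slice(chunk);
        let mut events = Vec::new();
        // Consume complete lines; a trailing partial line stays buffered.
        while let Some(pos) = self.buf.iter().position(|&b| b == b'\n') {
            let mut line: Vec<u8> = self.buf.drain(..=pos).collect();
            line.pop(); // drop '\n'
            if line.last() == Some(&b'\r') {
                line.pop(); // tolerate CRLF
            }
            let line = String::from_utf8_lossy(&line).into_owned();
            if let Some(event) = self.process_line(&line) {
                events.push(event);
            }
        }
        events
    }

    fn process_line(&mut self, line: &str) -> Option<SseEvent> {
        if line.is_empty() {
            return self.dispatch(); // a blank line completes the event
        }
        if line.starts_with(':') {
            return None; // comment line, often used as keep-alive
        }
        let (field, value) = match line.split_once(':') {
            // A single leading space after the colon is not part of the value.
            Some((f, v)) => (f, v.strip_prefix(' ').unwrap_or(v)),
            None => (line, ""),
        };
        match field {
            "event" => self.event_type = value.to_string(),
            "data" => self.data_lines.push(value.to_string()),
            "id" if !value.contains('\0') => self.last_id = value.to_string(),
            _ => {} // `retry` and unknown fields are ignored in this sketch
        }
        None
    }

    fn dispatch(&mut self) -> Option<SseEvent> {
        // `mem::take` leaves empty values behind, so no extra clear() is needed.
        let event_type = mem::take(&mut self.event_type);
        let data_lines = mem::take(&mut self.data_lines);
        if data_lines.is_empty() {
            return None; // nothing to emit without data
        }
        Some(SseEvent {
            event: if event_type.is_empty() { "message".to_string() } else { event_type },
            data: data_lines.join("\n"),
            id: self.last_id.clone(),
        })
    }
}
```

Buffering raw bytes rather than a `String` keeps a multi-byte UTF-8 character that is split across two chunks intact until its line is complete.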

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

feat(cffi): handle table for opaque values, wire serialization for Media/PromptAst #3169: Adjusts FFI handle mappings and resource-handle usage for new streaming resource types (SseStream/StreamAccumulator).
Ergonomic sys_op traits with generic SysOpOutput<T> and auto-generated glue #3088: Overlaps on LLM primitive/sys-op surface changes and builtin LLM APIs integrated with streaming.
LLM orchestration port #3121: Prior LLM orchestration/retry refactor that this PR extends with streaming variants.

Suggested labels

rust, feature (small)

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: implementing BEP-009 streaming primitives, which is the primary focus across all modified files.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

codspeed-hq · 2026-03-06T14:09:31Z

Merging this PR will degrade performance by 33.14%

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

❌ 13 regressed benchmarks
✅ 2 untouched benchmarks
⏩ 91 skipped benchmarks¹

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ❌ | WallTime | `bench_scale_100_functions` | 2 ms | 2.4 ms | -18.01% |
| ❌ | WallTime | `bench_single_simple_file` | 996.8 µs | 1,441.1 µs | -30.83% |
| ❌ | WallTime | `bench_incremental_modify_function` | 171.8 µs | 220.7 µs | -22.15% |
| ❌ | WallTime | `bench_scale_deep_nesting` | 1.5 ms | 1.9 ms | -22.63% |
| ❌ | WallTime | `bench_incremental_rename_type` | 1.2 ms | 1.7 ms | -25.64% |
| ❌ | WallTime | `bench_incremental_add_string_char` | 985.3 µs | 1,424.2 µs | -30.82% |
| ❌ | WallTime | `bench_incremental_add_user_field` | 1.1 ms | 1.5 ms | -28.57% |
| ❌ | WallTime | `bench_empty_project` | 905.8 µs | 1,354.8 µs | -33.14% |
| ❌ | WallTime | `bench_incremental_add_field` | 172.2 µs | 220.2 µs | -21.79% |
| ❌ | WallTime | `bench_incremental_add_new_file` | 164.6 µs | 217.3 µs | -24.24% |
| ❌ | WallTime | `bench_incremental_add_attribute` | 985.5 µs | 1,424.2 µs | -30.8% |
| ❌ | WallTime | `bench_incremental_no_change` | 115.3 µs | 164.7 µs | -29.98% |
| ❌ | WallTime | `bench_incremental_close_string` | 990.1 µs | 1,427.6 µs | -30.65% |
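
The Efficiency column appears to be computed as `BASE / HEAD - 1`. That formula is inferred from the rows above, not stated by CodSpeed in this thread, and rows shown with rounded millisecond values will not reproduce exactly. A small sketch with a hypothetical helper name:

```rust
/// Relative efficiency change in percent, assuming efficiency = BASE / HEAD - 1.
/// Both arguments must use the same time unit.
fn efficiency_change_percent(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}
```

For `bench_empty_project`, `efficiency_change_percent(905.8, 1354.8)` comes out at about -33.14, matching the headline figure.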

Comparing bep009-streaming-implementation (34ca454) with canary (f32fa20)

¹ 91 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.

github-actions · 2026-03-06T14:11:35Z

Binary size checks passed

✅ 7 passed

| | Artifact | Platform | Gzip | Baseline | Delta | Status |
|---|---|---|---|---|---|---|
| ✅ | `bridge_cffi` | Linux | 4.4 MB | 4.4 MB | +29.0 KB (+0.7%) | OK |
| ✅ | `bridge_cffi-stripped` | Linux | 2.9 MB | 2.9 MB | +19.2 KB (+0.7%) | OK |
| ✅ | `bridge_cffi` | macOS | 3.6 MB | 3.6 MB | +22.0 KB (+0.6%) | OK |
| ✅ | `bridge_cffi-stripped` | macOS | 2.3 MB | 2.3 MB | +13.6 KB (+0.6%) | OK |
| ✅ | `bridge_cffi` | Windows | 3.6 MB | 3.6 MB | +24.4 KB (+0.7%) | OK |
| ✅ | `bridge_cffi-stripped` | Windows | 2.4 MB | 2.4 MB | +16.1 KB (+0.7%) | OK |
| ✅ | `bridge_wasm` | WASM | 2.2 MB | 2.2 MB | +13.4 KB (+0.6%) | OK |

Generated by cargo size-gate · workflow run

coderabbitai

Actionable comments posted: 14

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7c320720-0f3f-41ec-a595-a2ded4447dfd

📥 Commits

Reviewing files that changed from the base of the PR and between b55d814 and 7fcd5f3.

⛔ Files ignored due to path filters (9)

baml_language/Cargo.lock is excluded by !**/*.lock
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____01_lexer__llm.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____02_parser__llm.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_tir.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded_unoptimized.snap is excluded by !**/*.snap

📒 Files selected for processing (18)

baml_language/crates/baml_builtins/baml/llm.baml
baml_language/crates/baml_builtins/src/lib.rs
baml_language/crates/bex_engine/src/function_call_context.rs
baml_language/crates/bex_engine/src/lib.rs
baml_language/crates/bex_external_types/src/bex_external_value.rs
baml_language/crates/bex_resource_types/src/lib.rs
baml_language/crates/bridge_ctypes/src/handle_table.rs
baml_language/crates/bridge_python/src/lib.rs
baml_language/crates/bridge_python/src/runtime.rs
baml_language/crates/sys_llm/Cargo.toml
baml_language/crates/sys_llm/src/lib.rs
baml_language/crates/sys_llm/src/stream_accumulator.rs
baml_language/crates/sys_native/Cargo.toml
baml_language/crates/sys_native/src/lib.rs
baml_language/crates/sys_native/src/ops/http.rs
baml_language/crates/sys_native/src/registry.rs
baml_language/crates/sys_native/src/sse_parser.rs
baml_language/crates/sys_types/src/lib.rs

…return

Use type narrowing (null guard + early return) to replace `match (llm_client.retry)` patterns in build_plan_with_state, execute_client, and execute_client_stream. This reduces nesting and reads more naturally.

coderabbitai

Actionable comments posted: 3

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4ded66d1-bd47-4276-a5d0-1780fc24e15f

📥 Commits

Reviewing files that changed from the base of the PR and between 7fcd5f3 and c2daed0.

⛔ Files ignored due to path filters (5)

baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____01_lexer__llm.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____02_parser__llm.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap

📒 Files selected for processing (1)

baml_language/crates/baml_builtins/baml/llm.baml

Review fixes:
- SSE error now sets buf.done=true to prevent caller hang on retry
- stream_primitive checks accumulator.is_done() for early exit and returns failure on truncated streams
- new_accumulator rejects unsupported providers (google-ai, aws-bedrock etc.) instead of silently ignoring their events
- Poisoned mutex recovery in emit_partial deduplication
- Remove redundant event_type.clear() after mem::take in SSE parser
- Make futures/serde_json optional behind bundle-http feature gate

Refactor:
- Introduce PerCallContext struct bundling call_id, cancel, stream_callback, tick_callback; it replaces loose parameters in run_event_loop_with_epoch and execute_sys_op, removing the #[allow(clippy::too_many_arguments)] annotation

coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (1)

baml_language/crates/baml_builtins/baml/llm.baml (1)

321-355: ⚠️ Potential issue | 🟠 Major

Streaming failures still escape the retry/fallback envelope.

Any throw from fetch_sse()/next(), new_stream_accumulator(), add_events(), emit_tick(), emit_partial(), partial_parse(), or final parse() bypasses ExecutionResult { ok: false }, so execute_client_stream() never gets a chance to retry or move to the next fallback client. Once the SSE resource has been opened, that same path can also skip sse.close(). Please wrap this body in the same internal failure envelope used by the non-streaming orchestrator, and only throw from stream_llm_function() after all attempts fail.
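
The envelope requested here can be sketched in Rust terms: each attempt converts its own errors into `ok: false`, and only the boundary function reports failure once every client has been tried. The names mirror the PR's vocabulary, but the signatures are hypothetical and the real orchestration lives in llm.baml.

```rust
/// Internal envelope mirroring `ExecutionResult { ok, value }`.
#[derive(Debug)]
struct ExecutionResult {
    ok: bool,
    value: Option<String>,
}

/// Runs one attempt and absorbs any error into the envelope instead of propagating it.
fn run_attempt(attempt: &dyn Fn() -> Result<String, String>) -> ExecutionResult {
    match attempt() {
        Ok(value) => ExecutionResult { ok: true, value: Some(value) },
        Err(_) => ExecutionResult { ok: false, value: None },
    }
}

/// Boundary function: tries each client in order and fails only after all are exhausted.
fn stream_with_fallback(
    attempts: &[&dyn Fn() -> Result<String, String>],
) -> Result<String, String> {
    for attempt in attempts {
        let result = run_attempt(*attempt);
        if result.ok {
            return Ok(result.value.unwrap_or_default());
        }
        // A failed attempt falls through to the next fallback client.
    }
    Err("all streaming attempts failed".to_string())
}
```

With one failing and one succeeding closure, `stream_with_fallback(&[&fail, &ok])` returns the second client's value, and the caller never sees the first error.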

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d0998c1b-3d7a-4016-8f85-44dda632801e

📥 Commits

Reviewing files that changed from the base of the PR and between c2daed0 and 170f4cb.

⛔ Files ignored due to path filters (5)

baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____01_lexer__llm.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____02_parser__llm.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap

📒 Files selected for processing (8)

baml_language/crates/baml_builtins/baml/llm.baml
baml_language/crates/bex_engine/src/function_call_context.rs
baml_language/crates/bex_engine/src/lib.rs
baml_language/crates/sys_llm/src/stream_accumulator.rs
baml_language/crates/sys_native/Cargo.toml
baml_language/crates/sys_native/src/lib.rs
baml_language/crates/sys_native/src/ops/http.rs
baml_language/crates/sys_native/src/sse_parser.rs

… poisoning
- Replace serde_json unwrap with proper error in sse_stream_next
- Return LlmOpError instead of panic in execute_partial_parse for non-string types
- Remove early loop exit on accumulator.is_done() to fully drain SSE events
- Add SseDropGuard to prevent consumer hangs on task cancellation
- Extract token usage from OpenAI and Anthropic streaming events
- Use unwrap_or_else(PoisonError::into_inner) consistently for lock acquisition

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

baml_language/crates/sys_types/src/lib.rs (1)

997-1009: 🧹 Nitpick | 🔵 Trivial

Consider adding unit tests for the new streaming methods.

The new SysOpLlm streaming methods (lines 845-915) lack direct unit test coverage. While the existing tests validate the test context setup, consider adding tests that verify error propagation for the accumulator methods, potentially by mocking sys_llm::stream_accumulator functions.

As per coding guidelines: "Prefer writing Rust unit tests over integration tests where possible."

baml_language/crates/sys_native/src/registry.rs (1)

326-330: ⚠️ Potential issue | 🟡 Minor

Same inconsistency in ResourceRegistryRef::remove.

Line 328 uses .unwrap() instead of the .unwrap_or_else(std::sync::PoisonError::into_inner) pattern used elsewhere.

🛠️ Proposed fix

```diff
 impl ResourceRegistryRef for ResourceRegistry {
     fn remove(&self, key: usize) {
-        self.entries.write().unwrap().remove(&key);
+        self.entries.write().unwrap_or_else(std::sync::PoisonError::into_inner).remove(&key);
     }
 }
```
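
To see why the pattern in the proposed fix matters, the following sketch poisons an `RwLock` on purpose and then removes an entry anyway. `Registry` and `poison` are hypothetical stand-ins, not the PR's `ResourceRegistry`; the `std::sync` calls are the standard library's.

```rust
use std::collections::HashMap;
use std::sync::{Arc, PoisonError, RwLock};
use std::thread;

struct Registry {
    entries: RwLock<HashMap<usize, String>>,
}

impl Registry {
    /// Removes an entry even if an earlier writer panicked while holding the lock.
    fn remove(&self, key: usize) -> Option<String> {
        self.entries
            .write()
            .unwrap_or_else(PoisonError::into_inner)
            .remove(&key)
    }
}

/// Poisons the lock by panicking in another thread while the write guard is held.
fn poison(registry: &Arc<Registry>) {
    let shared = Arc::clone(registry);
    let _ = thread::spawn(move || {
        let _guard = shared.entries.write().unwrap();
        panic!("simulated failure while holding the lock");
    })
    .join();
}
```

After `poison(&registry)`, a plain `.write().unwrap()` would panic on every later call, while `remove` still succeeds. The trade-off is that the map may have been left mid-update by the panicking writer, which is acceptable for simple insert/remove bookkeeping like this.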

♻️ Duplicate comments (1)

baml_language/crates/baml_builtins/baml/llm.baml (1)
332-347: ⚠️ Potential issue | 🟠 Major

Break the SSE loop as soon as accumulator.is_done() flips true.

The post-loop check at Lines 345-347 avoids truncated success, but the loop still waits for sse.next() == null. If a provider emits its terminal event and keeps the socket open, this attempt blocks longer than necessary.
Suggested fix

```diff
     while (true) {
         let events = sse.next();
         if (events == null) { break; }
         accumulator.add_events(events);
         baml.stream.emit_tick(events);

         let content = accumulator.content();
         let parsed = primitive.partial_parse(content, return_type);
         baml.stream.emit_partial(parsed);
+
+        if (accumulator.is_done()) {
+            break;
+        }
     }
```
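
The trade-off in this comment can be sketched with a flag that switches between both behaviours: stopping at the provider's terminal event, or draining until the transport ends (a later commit note in this thread mentions removing the early exit to fully drain SSE events). `Batch`, `Outcome`, and `drain` are hypothetical; batches are modelled as pre-extracted text.

```rust
/// One batch of already-extracted text; `done` marks the provider's terminal event.
#[derive(Clone, Copy)]
struct Batch {
    text: &'static str,
    done: bool,
}

#[derive(Debug)]
struct Outcome {
    content: String,
    complete: bool,      // false means the stream was truncated
    batches_read: usize, // how many batches were pulled from the stream
}

fn drain(batches: impl IntoIterator<Item = Batch>, break_on_done: bool) -> Outcome {
    let mut outcome = Outcome { content: String::new(), complete: false, batches_read: 0 };
    for batch in batches {
        outcome.batches_read += 1;
        outcome.content.push_str(batch.text);
        outcome.complete |= batch.done;
        if break_on_done && outcome.complete {
            break; // stop at logical completion instead of waiting for EOF
        }
    }
    outcome
}
```

With `break_on_done` set, a trailing batch after the terminal event is never read; without it, the loop keeps going until the source is exhausted, which matters if late events carry data such as token usage.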

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4376c445-0dc6-40d7-b1d1-b8883a2dd324

📥 Commits

Reviewing files that changed from the base of the PR and between 170f4cb and fbffefd.

⛔ Files ignored due to path filters (5)

baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____01_lexer__llm.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____02_parser__llm.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap

📒 Files selected for processing (6)

baml_language/crates/baml_builtins/baml/llm.baml
baml_language/crates/sys_llm/src/lib.rs
baml_language/crates/sys_llm/src/stream_accumulator.rs
baml_language/crates/sys_native/src/ops/http.rs
baml_language/crates/sys_native/src/registry.rs
baml_language/crates/sys_types/src/lib.rs

… final value

Use partial_parse instead of parse for the final value in stream_primitive, since the accumulator provides raw extracted content rather than the provider's JSON envelope.

Add 8 streaming integration tests covering SSE primitives and full OpenAI streaming orchestration with wiremock.

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (4)

baml_language/crates/baml_builtins/baml/llm.baml (4)
332-347: ⚠️ Potential issue | 🟠 Major

Stop the loop on logical completion, not only on socket EOF.

This still waits for sse.next() == null before leaving the loop. If the accumulator reaches done before the provider closes the connection, streaming can hang until transport EOF instead of finishing immediately.
Suggested fix

```diff
     while (true) {
         let events = sse.next();
         if (events == null) { break; }
         accumulator.add_events(events);
         baml.stream.emit_tick(events);

         let content = accumulator.content();
         let parsed = primitive.partial_parse(content, return_type);
         baml.stream.emit_partial(parsed);
+
+        if (accumulator.is_done()) {
+            break;
+        }
     }

     sse.close();

     if (accumulator.is_done() == false) {
```

328-343: ⚠️ Potential issue | 🟠 Major

Guarantee sse.close() on exceptional exits too.

close() only runs on the happy path. A throw from sse.next(), add_events, emit_tick, or partial_parse exits this function before cleanup, leaving the SSE resource alive across retries or fallback attempts.
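
On the Rust side, this kind of guarantee is usually obtained with a drop guard, which runs cleanup on early returns as well as on success. The sketch below is a generic illustration with hypothetical types (`Sse`, `CloseGuard`, `consume`); it is not the `SseDropGuard` mentioned elsewhere in this PR, and the Baml function would need its own mechanism.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for an open SSE resource; `closed` records cleanup.
struct Sse {
    closed: Rc<Cell<bool>>,
}

/// Closes the stream when dropped, on both the success and the error path.
struct CloseGuard {
    sse: Sse,
}

impl Drop for CloseGuard {
    fn drop(&mut self) {
        self.sse.closed.set(true);
    }
}

fn consume(sse: Sse, fail: bool) -> Result<String, String> {
    let _guard = CloseGuard { sse };
    if fail {
        // Early return: `_guard` is still dropped, so the stream gets closed.
        return Err("partial_parse failed".to_string());
    }
    Ok("done".to_string())
}
```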

330-352: ⚠️ Potential issue | 🟠 Major

Fail fast for return types that partial_parse() cannot handle.

This path now uses partial_parse(...) for both incremental updates and the final value, but the streaming contract in this PR limits partial parsing to string outputs. Any streamed function with a non-string return type will fail after the SSE request has already started.

Either validate return_type before opening the stream, or keep streaming disabled for unsupported return types. Based on learnings: New language features for BAML require coordinated updates across Parser (parser-database), IR/validation (baml-core), Compiler (baml-compiler), and VM (baml-vm)

444-450: ⚠️ Potential issue | 🔴 Critical

Guard empty round-robin streaming clients before % length().

This branch still divides by zero when sub_clients is empty. build_attempt_with_state() already treats that shape as empty on Lines 89-102; the executor should return ok: false here instead of throwing.
Suggested fix

```diff
         baml.llm.ClientType.RoundRobin => {
-            let idx = baml.llm.round_robin_next(llm_client.name) % llm_client.sub_clients.length();
-            baml.llm.execute_client_stream(
-                llm_client.sub_clients.at(idx),
-                context,
-                active_delay_ms,
-            )
+            if (llm_client.sub_clients.length() == 0) {
+                baml.llm.ExecutionResult { ok: false, value: null }
+            } else {
+                let idx = baml.llm.round_robin_next(llm_client.name) % llm_client.sub_clients.length();
+                baml.llm.execute_client_stream(
+                    llm_client.sub_clients.at(idx),
+                    context,
+                    active_delay_ms,
+                )
+            }
         }
```
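
The same guard is compact in Rust, where `usize::checked_rem` returns `None` for a zero divisor instead of panicking. `round_robin_index` is a hypothetical helper, shown only to illustrate the empty-list case.

```rust
/// Picks the sub-client index for this call, or `None` when there are no sub-clients.
fn round_robin_index(counter: usize, sub_client_count: usize) -> Option<usize> {
    // `checked_rem` yields `None` when the divisor is zero, avoiding a panic.
    counter.checked_rem(sub_client_count)
}
```

A caller can map `None` to `ExecutionResult { ok: false, value: null }` so that an empty round-robin client is handled like any other failed attempt.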

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cebcc780-d15b-4e84-83df-b7fb230b63c8

📥 Commits

Reviewing files that changed from the base of the PR and between fbffefd and a997436.

⛔ Files ignored due to path filters (5)

baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____01_lexer__llm.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____02_parser__llm.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snap is excluded by !**/*.snap
baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap

📒 Files selected for processing (3)

baml_language/crates/baml_builtins/baml/llm.baml
baml_language/crates/baml_tests/src/engine.rs
baml_language/crates/baml_tests/tests/streaming.rs

- Wrap partial_parse in catch so mid-stream parse failures are silently skipped, matching legacy behavior for incomplete structured content.
- Add accumulator metadata accessors: model(), finish_reason(), input_tokens(), output_tokens() as new sys ops.
- Add final_parse sys op on PrimitiveClient for strict parsing of the final accumulated content (separate from permissive partial_parse).
- Update stream_primitive to use final_parse for the completed value.

vercel bot deployed to Preview – beps March 6, 2026 14:05

coderabbitai bot requested changes Mar 6, 2026

vercel bot deployed to Preview – beps March 6, 2026 14:20

coderabbitai bot requested changes Mar 6, 2026

vercel bot deployed to Preview – promptfiddle March 6, 2026 14:42

vercel bot deployed to Preview – beps March 6, 2026 17:08

coderabbitai bot requested changes Mar 6, 2026

vercel bot deployed to Preview – promptfiddle March 6, 2026 17:29

vercel bot deployed to Preview – beps March 6, 2026 17:32

coderabbitai bot requested changes Mar 6, 2026

vercel bot deployed to Preview – beps March 6, 2026 17:45

coderabbitai bot requested changes Mar 6, 2026

vercel bot deployed to Preview – promptfiddle March 6, 2026 18:05

vercel bot deployed to Preview – beps March 6, 2026 18:17

vercel bot deployed to Preview – promptfiddle March 6, 2026 18:37

Fix BEP-009 streaming regressions (56fc3d5)

vercel bot deployed to Preview – beps March 6, 2026 18:49

Fix builtin throw-analysis diagnostics (479d3aa)

vercel bot deployed to Preview – beps March 6, 2026 19:09

Strengthen streaming regression tests (6c7486a)

vercel bot deployed to Preview – beps March 6, 2026 19:24

antoniosarosi added 2 commits March 6, 2026 20:37

Merge remote-tracking branch 'origin/canary' into bep009-streaming-implementation (9a4a4d4)

Refresh snapshots after canary merge (6531549)

vercel bot deployed to Preview – beps March 6, 2026 19:42

coderabbitai bot approved these changes Mar 6, 2026

Address CodeRabbit follow-up comments (34ca454)

vercel bot deployed to Preview – beps March 6, 2026 19:55

vercel bot deployed to Preview – promptfiddle March 6, 2026 20:15

antoniosarosi added this pull request to the merge queue Mar 10, 2026

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 10, 2026
