paulo/rust functions 2#3247

Draft
hellovai wants to merge 46 commits into canary from paulo/rust-functions-2

Conversation

@hellovai
Contributor

  • Implement workspace/symbol and textDocument/documentSymbol LSP handlers
  • update size gate
  • AST construction
  • mid working compiler2
  • Implement Visualizer on top of compiler2_AST (#3203)
  • compiler2_tir: Add QualifiedTypeName, type-lowering diagnostics, and operator/call validation
  • compiler2_tir: FloatLiteral support, union member field access, per-member diagnostics
  • compiler2_tir: Implement Tarjan's SCC cycle detection for type alias validation
  • compiler2_tir: Add class required-field cycle detection
  • compiler2: Builtin standard library, generic method resolution, type narrowing, and declared-type assignment checking
  • Implement catch and throw expressions in the AST and HIR
  • Working LSP, but merging into compiler2 not yet ready
  • Stabilize compiler2 canary integration
  • wip
  • implemented rust functions
  • snapshots and minor test fixes
  • rust_type working
  • temp work
  • Adding wip
  • adding future type
  • fix mir2 field access on type alias returning const null
  • HIR match/catch arm binding scopes and MIR2 match/switch lowering
  • Fix MIR2 null-typed locals, non-exhaustive match, and for-loop continue
  • address issues with enums + throw

hellovai and others added 30 commits February 26, 2026 16:22
Adds support for `#` workspace symbol search and `@` document symbol /
Outline view in the BAML language server.

- Define canonical `SymbolKind` in `baml_compiler_hir` and re-export
  from `baml_project` to avoid duplication.
- Add `list_file_symbols` in `baml_compiler_hir` that walks the CST
  for accurate ranges (full definition + name selection range),
  including class fields and enum variants as children.
- Implement `on_request_workspace_symbol` gathering symbols across all
  loaded projects with case-insensitive query filtering.
- Implement `on_request_text_document_document_symbol` producing a
  hierarchical `DocumentSymbol` tree for the Outline view.
- Extract shared `to_lsp_symbol_kind` helper.

Made-with: Cursor
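
A minimal sketch of the case-insensitive query filtering described above for `workspace/symbol`. The `Symbol` struct and `filter_symbols` function are hypothetical stand-ins, not the handler's actual types, and treating an empty query as "match everything" is an assumption.

```rust
/// Hypothetical, simplified symbol record (not the real `baml_compiler_hir` type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Symbol {
    pub name: String,
    pub kind: &'static str,
}

/// Keep the symbols whose name contains `query`, ignoring case.
/// Assumption: an empty query matches every symbol.
pub fn filter_symbols<'a>(symbols: &'a [Symbol], query: &str) -> Vec<&'a Symbol> {
    let needle = query.to_lowercase();
    symbols
        .iter()
        .filter(|s| needle.is_empty() || s.name.to_lowercase().contains(&needle))
        .collect()
}
```

Lowercasing both sides is the simplest approach; a real handler might prefer fuzzy matching, which this sketch does not attempt.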
Graph works despite code errors. The function source code behind this
graph has:
 - No closing curly
 - A function call with no closing parenthesis
 - A class constructor with no closing curly

<img width="1334" height="941" alt="Screenshot 2026-03-02 at 3 24 16 PM"
src="https://github.com/user-attachments/assets/8d3234fd-d3ed-4647-b0d4-452ae7a7630a"
/>
…operator/call validation

  - Introduce QualifiedTypeName (pkg + name) for Ty::Class, Ty::Enum,
    Ty::TypeAlias, Ty::EnumVariant; update normalize.rs and builder.rs
    to use qualified names for alias maps and type resolution
  - Add BuiltinUnknown type (distinct from error-recovery Unknown)
  - Add Ty::Literal using baml_base::Literal instead of local LiteralValue
  - Add Expr::Null to AST

  Diagnostics:
  - Thread &mut Vec<TirTypeError> through lower_type_expr to report
    UnresolvedType at the point of discovery
  - Add DiagnosticLocation::Span(TextRange) for type annotations that
    lack an ExprId; add report_at_span on InferContext and builder
  - Report UnresolvedType for function param/return annotations using
    SignatureSourceMap spans, and for class fields/type alias bodies
  - Add ResolvedClassFields.diagnostics and ResolvedTypeAlias.diagnostics;
    surface them in collect_file_diagnostics
  - Add ArgumentCountMismatch diagnostic for wrong call arity
  - Add MissingReturn diagnostic when function body falls through
    without a return statement or tail expression
  - Add NotCallable, NotIndexable, InvalidBinaryOp, InvalidUnaryOp
    diagnostics with proper error messages
  - Fix infer_arithmetic to only allow string concatenation for Add
    (not Sub/Mul/Div/Mod)
  - Fix check_expr If-without-else to report VoidUsedAsValue when
    checked against a non-void expected type

  Tests:
  - Add compiler2_tir snapshot test suite (mod.rs, inference.rs, phase3a.rs)
    covering all diagnostic categories
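
To illustrate the `infer_arithmetic` fix listed above (string concatenation only for Add), here is a toy sketch. The `Ty` and `BinOp` enums and `infer_arithmetic_sketch` are hypothetical; the numeric rules (int with int gives int, any float gives float) are assumptions that the commit message does not state.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ty {
    Int,
    Float,
    String,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BinOp {
    Add,
    Sub,
    Mul,
    Div,
    Mod,
}

/// Returns the result type, or `None` where a checker would report an
/// invalid binary operation.
pub fn infer_arithmetic_sketch(op: BinOp, lhs: Ty, rhs: Ty) -> Option<Ty> {
    match (lhs, rhs) {
        // Assumed numeric rules, not taken from the PR.
        (Ty::Int, Ty::Int) => Some(Ty::Int),
        (Ty::Int, Ty::Float) | (Ty::Float, Ty::Int) | (Ty::Float, Ty::Float) => Some(Ty::Float),
        // Strings may only be combined with `+`.
        (Ty::String, Ty::String) if op == BinOp::Add => Some(Ty::String),
        _ => None,
    }
}
```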
…ember diagnostics

- Lower CST FLOAT_LITERAL tokens to TypeExpr::Literal(Literal::Float)
  in both standalone and union-member type positions
- Add Ty::Union handling in resolve_member: resolve field on each
  member via try_resolve_member_on_ty helper, return union of field
  types when all members have the field
- Report per-member UnresolvedMember errors (e.g. "null.name" instead
  of "A | B | null.name") when field access fails on a union
- Add phase3a tests: float_literal_in_annotation, union field access
  (shared fields, missing on some/one-of-three/two-of-three,
  different types, optional member)
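
The union field-access rule above can be sketched roughly as follows. The `Ty` enum, `display`, and `resolve_member_sketch` are hypothetical simplifications rather than the TIR types; errors are plain strings in the per-member form the commit describes (e.g. `null.name`).

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Ty {
    Null,
    String,
    Int,
    /// Hypothetical simplified class: a name plus its field types.
    Class { name: String, fields: BTreeMap<String, Ty> },
    Union(Vec<Ty>),
}

fn display(ty: &Ty) -> String {
    match ty {
        Ty::Null => "null".to_string(),
        Ty::String => "string".to_string(),
        Ty::Int => "int".to_string(),
        Ty::Class { name, .. } => name.clone(),
        Ty::Union(members) => members.iter().map(display).collect::<Vec<_>>().join(" | "),
    }
}

/// Resolve `field` on `ty`. For a union, every member must have the field;
/// failures are reported per member rather than for the whole union.
pub fn resolve_member_sketch(ty: &Ty, field: &str) -> Result<Ty, Vec<String>> {
    match ty {
        Ty::Class { fields, .. } => fields
            .get(field)
            .cloned()
            .ok_or_else(|| vec![format!("{}.{}", display(ty), field)]),
        Ty::Union(members) => {
            let mut found: Vec<Ty> = Vec::new();
            let mut errors: Vec<String> = Vec::new();
            for member in members {
                match resolve_member_sketch(member, field) {
                    Ok(t) => {
                        if !found.contains(&t) {
                            found.push(t);
                        }
                    }
                    Err(mut e) => errors.append(&mut e),
                }
            }
            if !errors.is_empty() {
                return Err(errors);
            }
            // Collapse to a single type when all members agree.
            Ok(if found.len() == 1 { found.remove(0) } else { Ty::Union(found) })
        }
        other => Err(vec![format!("{}.{}", display(other), field)]),
    }
}
```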
…validation

Add invalid cycle detection for type aliases using Tarjan's strongly
connected components algorithm with structural edge tracking. An edge is
"structural" (guarded) if it passes through List or Map, which provide
termination via empty container. Optional and Union are pass-through and
do NOT guard cycles.

Implementation:
- normalize.rs: Tarjan's SCC with deterministic ordering (sorted nodes,
  sorted successors, component reversal, rotation to lexicographic min).
  build_alias_graph() + extract_type_alias_deps() track structural vs
  non-structural edges. A cycle is invalid if no edge within the SCC is
  structural.
- infer_context.rs: AliasCycle { name } variant on TirTypeError
- inference.rs: detect_invalid_alias_cycles(db, PackageId) public API
- mod.rs (tests): render_tir now emits both resolved.diagnostics and
  cycle diagnostics per type alias

14 snapshot tests in phase3a_recursion.rs covering:
- Invalid: direct self-ref (A=A), mutual (A=B,B=A), 3-way, cycle in
  function params, optional self-ref (A=A?), union self-ref (A=A|string),
  mutual through optional (A=B?,B=A)
- Valid: container-guarded JSON type, list-in-union (A=A[]|string),
  optional-list (A=A[]?), map-in-union, mutual through list (A=B[],B=A),
  class self-ref, class mutual ref

Also deletes stale .phase3a.rs.pending-snap file.
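
As a rough illustration of the alias-cycle rule above (an SCC is invalid when none of its internal edges is structural), here is a self-contained sketch. It uses plain indices for aliases; `Edge`, `Tarjan`, and `invalid_cycle_nodes` are hypothetical names, and the deterministic ordering and rotation steps from normalize.rs are not reproduced.

```rust
/// Edge to `target`; `structural` is true when the reference passes through
/// a List or Map (simplified, hypothetical form of the alias graph).
#[derive(Debug, Clone, Copy)]
pub struct Edge {
    pub target: usize,
    pub structural: bool,
}

struct Tarjan<'g> {
    graph: &'g [Vec<Edge>],
    index: Vec<Option<usize>>,
    lowlink: Vec<usize>,
    on_stack: Vec<bool>,
    stack: Vec<usize>,
    next: usize,
    sccs: Vec<Vec<usize>>,
}

impl Tarjan<'_> {
    fn visit(&mut self, v: usize) {
        self.index[v] = Some(self.next);
        self.lowlink[v] = self.next;
        self.next += 1;
        self.stack.push(v);
        self.on_stack[v] = true;
        let graph = self.graph;
        for e in &graph[v] {
            let seen = self.index[e.target];
            match seen {
                None => {
                    self.visit(e.target);
                    self.lowlink[v] = self.lowlink[v].min(self.lowlink[e.target]);
                }
                Some(i) if self.on_stack[e.target] => {
                    self.lowlink[v] = self.lowlink[v].min(i);
                }
                Some(_) => {}
            }
        }
        // `v` is the root of a component: pop it off the stack.
        if Some(self.lowlink[v]) == self.index[v] {
            let mut comp = Vec::new();
            loop {
                let w = self.stack.pop().expect("stack holds the current component");
                self.on_stack[w] = false;
                comp.push(w);
                if w == v {
                    break;
                }
            }
            comp.sort_unstable();
            self.sccs.push(comp);
        }
    }
}

/// Nodes in components that contain a cycle but no structural edge.
pub fn invalid_cycle_nodes(graph: &[Vec<Edge>]) -> Vec<usize> {
    let n = graph.len();
    let mut t = Tarjan {
        graph,
        index: vec![None; n],
        lowlink: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        next: 0,
        sccs: Vec::new(),
    };
    for v in 0..n {
        if t.index[v].is_none() {
            t.visit(v);
        }
    }
    let mut bad = Vec::new();
    for comp in &t.sccs {
        let (mut has_internal, mut has_structural) = (false, false);
        for &v in comp {
            for e in &graph[v] {
                if comp.contains(&e.target) {
                    has_internal = true;
                    has_structural |= e.structural;
                }
            }
        }
        // A lone node without a self-edge has no internal edge and is not a cycle.
        if has_internal && !has_structural {
            bad.extend(comp.iter().copied());
        }
    }
    bad.sort_unstable();
    bad
}
```

With node 0 as `A` and node 1 as `B`, the graph for `A=B[], B=A` has one structural edge and yields no invalid nodes, while `A=B, B=A` flags both.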
Detect unconstructable class cycles where required fields form a
dependency loop (e.g. class A { b B }, class B { a A }). These are
impossible to construct at runtime since neither class can be
instantiated without the other already existing.

Implementation in normalize.rs reuses the existing Tarjan's SCC
infrastructure from type alias cycle detection:
- build_class_graph: builds dependency graph from resolved class fields
- extract_required_class_deps: walks field types tracking required vs
  guarded context. Optional/List/Map break the hard dependency (can be
  null/empty). Type aliases resolved transparently. Unions only create
  hard deps if ALL variants resolve to the same single class.
- find_invalid_class_cycles: runs Tarjan's SCC, every component is an
  error (no structural exemption unlike type alias cycles)
- ClassCycleInfo with cycle_path formatting ("A -> B -> C -> A")

New in inference.rs:
- detect_invalid_class_cycles(db, PackageId) public API
- collect_class_fields helper to gather resolved fields per class

New diagnostic:
- TirTypeError::ClassCycle { name, cycle_path } in infer_context.rs

10 new snapshot tests in phase3a_recursion.rs:
- Invalid: self-ref, mutual, 3-way, through-type-alias, union-all-same
- Valid: optional breaks cycle, list breaks cycle, map breaks cycle,
  alias-to-optional breaks cycle, union-with-different-variants breaks
- Updated 2 existing tests (Node self-ref, Husband/Wife) that were
  previously marked valid but are actually unconstructable cycles
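
The required-versus-guarded distinction above might look like this in miniature. The `Ty` enum and `required_class_deps` are hypothetical; type-alias resolution is left out, and the union rule is read here as "every variant depends on exactly the same single class".

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Ty {
    Primitive,
    Class(String),
    Optional(Box<Ty>),
    List(Box<Ty>),
    Map(Box<Ty>, Box<Ty>),
    Union(Vec<Ty>),
}

/// Classes that must already exist before a value of `ty` can be built.
pub fn required_class_deps(ty: &Ty) -> BTreeSet<String> {
    match ty {
        Ty::Primitive => BTreeSet::new(),
        Ty::Class(name) => BTreeSet::from([name.clone()]),
        // null or an empty container is always constructible, so these guard.
        Ty::Optional(_) | Ty::List(_) | Ty::Map(_, _) => BTreeSet::new(),
        Ty::Union(variants) => {
            let mut per_variant = variants.iter().map(required_class_deps);
            let Some(first) = per_variant.next() else {
                return BTreeSet::new();
            };
            // Hard dependency only if all variants need the same single class.
            if first.len() == 1 && per_variant.all(|deps| deps == first) {
                first
            } else {
                BTreeSet::new()
            }
        }
    }
}
```

Feeding these per-field dependency sets into an SCC pass gives the class graph; unlike alias cycles, every component with a cycle would then be an error.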
…narrowing, and declared-type assignment checking

Parser & AST (Phase 4):
- GENERIC_PARAM_LIST / GENERIC_PARAM syntax nodes in CST
- parse_class extended to parse `class Foo<T, U>` generic parameter lists
- generic_params field on ClassDef and FunctionDef in AST
- FunctionBodyDef::Builtin variant for $rust_function / $rust_io_function
- TypeExpr::Rust variant for $rust_type placeholders

Builtin standard library (Phase 5):
- New baml_builtins2 crate with .baml stub files: containers.baml
  (Array<T>, Map<K,V>), string.baml, media.baml (Image, Audio, Video,
  Pdf with instance + static methods), env.baml, http.baml, math.baml,
  sys.baml
- HIR: generic_params on Class/Function, Ty::RustType variant
- v1/v2 compiler file separation (Compiler2ExtraFiles Salsa input) so
  builtin stubs don't pollute v1's parser

Generic method resolution (Phase 6):
- generics.rs: bind_type_vars, substitute_ty, lower_type_expr_with_generics,
  skip_self_param for instantiating generic class methods
- Bridge structural types to builtin classes: List→Array<T>, Map→Map<K,V>,
  String→String class, media primitives→per-type classes (Image, Audio, etc.)
- resolve_builtin_method with namespace-qualified class paths (&["media", "Image"])
- Static constructor access via lowercase type names (image.from_url, audio.from_base64)
- String literal method resolution (Ty::Literal(String) alongside Ty::Primitive(String))

Type narrowing (Phase 7):
- narrowing.rs: extract_narrowings from conditions (x == null, x != null,
  truthiness, negation), apply/restore narrowing across if-else branches
- Early-return narrowing: diverging then-branch applies else-narrowings to
  rest of enclosing block
- declared_types map (FxHashMap<Name, Ty>) tracks original annotation types
  for params and annotated let-bindings, separate from flow-sensitive locals
- Stmt::Assign validates against declared type (int?) not narrowed type (null)
- Unannotated bindings (including evolving containers) skip declared-type
  tracking, since there is no user-stated contract to enforce
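
A small sketch of the null-check narrowing and declared-type assignment check described above. `Ty`, `Cond`, `narrow`, and `can_assign` are hypothetical simplifications; truthiness narrowing and early-return handling are not modelled.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Ty {
    Null,
    Int,
    Optional(Box<Ty>),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Cond {
    /// `x == null`
    IsNull(String),
    /// `x != null`
    IsNotNull(String),
    /// `!cond`
    Not(Box<Cond>),
}

/// Types of `var` in the then- and else-branch of `if cond { .. } else { .. }`.
pub fn narrow(declared: &Ty, cond: &Cond, var: &str) -> (Ty, Ty) {
    let non_null = match declared {
        Ty::Optional(inner) => (**inner).clone(),
        other => other.clone(),
    };
    match cond {
        Cond::IsNull(name) if name == var => (Ty::Null, non_null),
        Cond::IsNotNull(name) if name == var => (non_null, Ty::Null),
        // Negation swaps the two branches.
        Cond::Not(inner) => {
            let (then_ty, else_ty) = narrow(declared, inner, var);
            (else_ty, then_ty)
        }
        // Conditions about other variables leave `var` untouched.
        _ => (declared.clone(), declared.clone()),
    }
}

/// Assignments are checked against the declared type, not the narrowed one.
pub fn can_assign(declared: &Ty, value: &Ty) -> bool {
    match declared {
        Ty::Optional(inner) => value == &Ty::Null || value == &**inner || value == declared,
        other => value == other,
    }
}
```

Inside `if x == null { x = 1 }` with `x: int?`, checking against the narrowed type `null` would wrongly reject the assignment, which is why the declared type is kept separately.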
- Added support for `catch` and `throw` expressions in the AST, including new types and structures for catch clauses and arms.
- Enhanced the lowering process to handle `throws` clauses in function definitions, allowing for better type inference and error reporting.
- Updated the type inference system to account for throw facts and validate declared throws against effective escaping throws.
- Introduced diagnostics for unreachable arms in match/catch expressions and violations of throws contracts.
- Added tests to ensure correct lowering and inference behavior for functions with throw expressions and catch clauses.
Narrow the Ty::Unknown guard in lower_field_access to only emit const
null when the base expression is also Unknown (namespace intermediate
case). Add Ty::TypeAlias arms to resolve_member and
try_resolve_member_on_ty in TIR.
- Emit discriminant + switch for enum pattern matching instead of branch chains
- Handle union patterns (Active | Pending) sharing a single body block
- Thread TIR exhaustiveness to MIR2, replacing fragile variant-counting heuristic
- Fix exhaustive flag semantics: only set when no wildcard AND TIR verified
- Always emit unreachable for no-wildcard otherwise blocks
- Add ScopeKind::MatchArm, CatchClause, CatchArm to HIR scope chain
- Register match arm and catch arm/clause pattern bindings in scope chain
- Refactor DefinitionSite::MatchArm(MatchArmId) to PatternBinding(PatId)
- resolve_name_at now finds match/catch-bound variables via scope_at_offset
- Fixes const null emission for match arm field access and catch re-throws
- AstStmt::Expr and while-loop body temps now use inferred types instead
  of hardcoded Ty::Null, fixing ~269 incorrectly typed locals
- expr_ty/pat_ty fallbacks changed from Ty::Null to Ty::Void so missing
  types are distinguishable from actual null values
- Non-exhaustive match chain base case emits goto(join) instead of
  unreachable, allowing runtime fallthrough
- Switch otherwise arm without wildcard falls through to join instead of
  trapping
- While-with-after (C-style for-loop) creates separate bb_after block
  and sets continue_target to it, so continue executes the increment

Made-with: Cursor
hellovai and others added 7 commits March 16, 2026 17:53
…types

- Extract MirFunctionBody struct and MirFunctionKind enum (Bytecode/Builtin)
- Replace MirFunction.name with item_ref: ItemRef for fully-qualified identity
- lower_function now always returns MirFunction (no more Option)
- Make def_to_item_ref public for downstream crates
- Update cleanup.rs to work on &mut MirFunctionBody
- Pretty-print builtins as one-liner markers
- Fix qtn_to_type_name to always include package prefix in type display
  (user.User instead of User)
- Update all MIR2 snapshots
Fork analysis, pull_semantics, stack_carry, emit, and verifier from
baml_compiler_emit. Change imports to baml_compiler2_mir, fix the 3
Constant pattern-match sites for ItemRef, and update signatures to
take &MirFunctionBody instead of &MirFunction. Stub lib.rs with
generate_project_bytecode entry point for Phase 2.
…bal table assembly

Replace stub generate_project_bytecode with full implementation that
discovers items via compiler2_all_files + file_item_tree, builds
globals/classes/enums lookup tables with fully-qualified names, and
compiles each function dispatching on MirFunctionKind.
…assembly

Extend ItemTree types (TemplateString, Client, RetryPolicy, Test) with
metadata fields populated from AST. Add assembly passes in
baml_compiler2_emit for template macros, retry policies, client metadata,
test cases, and per-function metadata (param types, return type, body_meta).
Add 4 integration tests (simple_function, builtin_functions, enum_variant,
class_field) that compile BAML through the full compiler2 pipeline and
verify Program structure.

Fixes discovered during testing:
- def_to_item_ref now detects class methods and creates ItemRef::Method
- lower_object falls back to TIR type when type_name is None
- emit.rs uses null fallback for unknown class names instead of panicking
- lower_expr_body extracts last word from qualified path nodes in objects
- Short-name aliases added in class/enum maps for codegen lookups
- Impl baml_compiler2_emit::Db for ProjectDatabase
… exhaustiveness, and QualifiedTypeName namespace

- Add 42 codegen2 snapshot tests (generate_codegen2_test in build.rs)
- Fix unwind_error_locals not getting stack slots in analysis.rs
- Fix nested for-loops sharing __for_idx name via gensym in MIR lowering
- Fix hardcoded "user" package in enum variant pattern matching
- Add namespace field to QualifiedTypeName and propagate through TIR/MIR/LSP
- Fix enum match exhaustiveness false positives from qualified name mismatch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Resolve function param/return types via TIR lower_type_expr + convert_tir2_ty
  instead of duplicating resolution logic in emit
- Resolve class field types the same way (was always Ty::Null placeholder)
- Export convert_tir2_ty from baml_compiler2_mir for reuse
- Remove duplicated type_expr_to_baml_ty and path_to_baml_ty functions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ScopeInference<'db> {
    /// Type of every expression within this scope (NOT nested child scopes).
    expressions: FxHashMap<ExprId, Ty>,

Check failure

Code scanning / CodeQL

Access of invalid pointer High

This operation dereferences a pointer that may be invalid.

Copilot Autofix

AI 2 days ago

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.

///
/// Returns the set of `QualifiedTypeName`s that participate in invalid cycles.
/// Valid recursion through containers (e.g. `type JSON = string | JSON[]`) is
/// NOT flagged.

Check failure

Code scanning / CodeQL

Access of invalid pointer High

This operation dereferences a pointer that may be invalid.

Copilot Autofix

AI 2 days ago

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.


/// Build a map of class qualified name → resolved fields from all classes in the package.
fn collect_class_fields<'db>(
    db: &'db dyn crate::Db,

Check failure

Code scanning / CodeQL

Access of invalid pointer High

This operation dereferences a pointer that may be invalid.

Copilot Autofix

AI 3 days ago

In general, to fix “access of invalid pointer” warnings around manual drop_in_place + write, you either need to (a) prove to the static analyzer that the pointer remains valid and is not used after being dropped, or (b) replace the manual dance with a safer, standard library primitive that encapsulates the correct sequence. The latter is simpler and less error‑prone.

Here, the core behavior is: if the existing value at old_pointer is equal to new_value, return false and leave memory untouched; otherwise, replace the value at old_pointer with new_value and return true. We can preserve this behavior but replace the explicit drop_in_place + write pair with std::ptr::replace(old_pointer, new_value). replace safely moves new_value into the location and returns the old value by value, automatically dropping it when it goes out of scope, avoiding explicit destructor calls on raw pointers. This still requires old_pointer to be valid, but the unsafe logic is simpler and is a well‑understood primitive.

Concretely:

  • In unsafe impl salsa::Update for ResolvedClassFields, inside maybe_update, replace the inner unsafe block that calls drop_in_place and write with a single std::ptr::replace(old_pointer, new_value); and rely on the returned value being dropped implicitly.
  • Do the same replacement in unsafe impl salsa::Update for ResolvedTypeAlias.
  • No imports are needed; std::ptr::replace is already available via std::ptr.

We leave all other logic (equality check, return values) unchanged.
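
For reference, the compare-then-replace shape described above can be sketched in isolation. `Fields` and this free-standing `maybe_update` are hypothetical stand-ins, not the `salsa::Update` trait itself; the only library call used is `std::ptr::replace`.

```rust
/// Hypothetical stand-in for a Salsa-tracked value.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Fields(pub Vec<String>);

/// Returns `false` and leaves the slot untouched when the values are equal;
/// otherwise stores `new_value` and returns `true`.
///
/// # Safety
/// `old_pointer` must be non-null, properly aligned, and point to an
/// initialized `Fields` that nothing else is accessing.
pub unsafe fn maybe_update(old_pointer: *mut Fields, new_value: Fields) -> bool {
    // SAFETY: the caller guarantees the pointer is valid for reads.
    let unchanged = unsafe { *old_pointer == new_value };
    if unchanged {
        false
    } else {
        // SAFETY: the caller guarantees the pointer is valid for reads and writes.
        // `replace` writes the new value first and hands back the old one,
        // which is dropped here.
        drop(unsafe { std::ptr::replace(old_pointer, new_value) });
        true
    }
}
```

One behavioural difference from `drop_in_place` followed by `write`: with `replace`, the slot already holds the new value when the old one is dropped, so a panicking `Drop` does not leave the slot pointing at a dropped value.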

Suggested changeset 1
baml_language/crates/baml_compiler2_tir/src/inference.rs

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/baml_language/crates/baml_compiler2_tir/src/inference.rs b/baml_language/crates/baml_compiler2_tir/src/inference.rs
--- a/baml_language/crates/baml_compiler2_tir/src/inference.rs
+++ b/baml_language/crates/baml_compiler2_tir/src/inference.rs
@@ -461,8 +461,11 @@
         } else {
             #[allow(unsafe_code)]
             unsafe {
-                std::ptr::drop_in_place(old_pointer);
-                std::ptr::write(old_pointer, new_value);
+                // SAFETY: `old_pointer` is provided by salsa and must point to a
+                // valid, initialized `ResolvedClassFields`. `std::ptr::replace`
+                // moves `new_value` into this location and returns the old value,
+                // which is then dropped when it goes out of scope.
+                let _ = std::ptr::replace(old_pointer, new_value);
             }
             true
         }
@@ -487,8 +490,11 @@
         } else {
             #[allow(unsafe_code)]
             unsafe {
-                std::ptr::drop_in_place(old_pointer);
-                std::ptr::write(old_pointer, new_value);
+                // SAFETY: `old_pointer` is provided by salsa and must point to a
+                // valid, initialized `ResolvedTypeAlias`. `std::ptr::replace`
+                // moves `new_value` into this location and returns the old value,
+                // which is then dropped when it goes out of scope.
+                let _ = std::ptr::replace(old_pointer, new_value);
             }
             true
         }
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
// SAFETY: pointer is Salsa-owned and valid for replacement.
#[allow(unsafe_code)]
let old = unsafe { &*old_pointer };
if old == &new_value {

Check failure

Code scanning / CodeQL

Access of invalid pointer High

This operation dereferences a pointer that may be invalid.

Copilot Autofix

AI 5 days ago

General approach: When replacing a value behind a raw pointer, avoid manually calling drop_in_place and then writing into the same memory. Instead, use std::ptr::replace (which writes the new value and returns the old one, so the old value is dropped when it goes out of scope) or rely on higher-level safe abstractions where possible. This ensures that the old value is dropped exactly once, and the pointer is never used in a way that violates Rust’s aliasing rules.

Best concrete fix here: In salsa::Update for FunctionThrowSets::maybe_update, we don’t actually need to manually call drop_in_place. Calling std::ptr::write(old_pointer, new_value) will overwrite the memory; the old value can be safely dropped beforehand via std::ptr::replace or by moving it out. The simplest, cleanest pattern is:

  1. Compare *old_pointer with new_value as before.
  2. If they differ, call std::ptr::replace(old_pointer, new_value); and ignore the returned old value, letting it drop normally.
  3. Return true.

ptr::replace is defined to read the old value, write the new value into the same memory, and return the old value, so there is no double-drop or invalid pointer access, and there is no intermediate state where we’ve dropped the old value but still treat the memory as uninitialized. This also makes the CodeQL complaint go away, since we’re no longer calling drop_in_place and then reusing the pointer manually.

Required changes (all in baml_language/crates/baml_compiler2_tir/src/throw_inference.rs):

  • In unsafe impl salsa::Update for FunctionThrowSets, in maybe_update:
    • Remove the explicit std::ptr::drop_in_place(old_pointer); and the following std::ptr::write(old_pointer, new_value);.
    • Replace them with a single std::ptr::replace(old_pointer, new_value);.
  • No new imports are needed (we already use std::ptr, and replace is in the same module as write and drop_in_place).
Suggested changeset 1
baml_language/crates/baml_compiler2_tir/src/throw_inference.rs

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/baml_language/crates/baml_compiler2_tir/src/throw_inference.rs b/baml_language/crates/baml_compiler2_tir/src/throw_inference.rs
--- a/baml_language/crates/baml_compiler2_tir/src/throw_inference.rs
+++ b/baml_language/crates/baml_compiler2_tir/src/throw_inference.rs
@@ -40,8 +40,8 @@
         } else {
             #[allow(unsafe_code)]
             unsafe {
-                std::ptr::drop_in_place(old_pointer);
-                std::ptr::write(old_pointer, new_value);
+                // Replace the value in place; the old value is returned and dropped.
+                let _ = std::ptr::replace(old_pointer, new_value);
             }
             true
         }
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
Implement compiler-generated companion functions (e.g. ExtractResume$render_prompt)
across the full compiler2 pipeline:

- Lexer: allow $ in identifiers
- AST: declarative companion expander system in dedicated module
- HIR: name-aware scope_at_offset to disambiguate co-located functions
- TIR: native multi-segment Expr::Path support, extract shared
  resolve_package_item helper to deduplicate package resolution logic
- MIR: handle multi-segment paths via TIR MethodResolution
- Updated all snapshot tests for new companion function output

Made-with: Cursor
Introduce LetDef, LetLoc, LetMarker, Definition::Let, ScopeKind::Let,
and Salsa queries (let_body, let_body_source_map) across AST and HIR
layers. Wire exhaustive match coverage in builder, MIR, TIR, LSP, and
emit. This establishes the full plumbing for top-level let bindings
as a first-class item variant.
Add full compile_init_function that lowers each let initializer through
MIR → bytecode, registers helper functions, and emits Call + StoreGlobal
sequences in a synthetic $init function. BexEngine::new runs $init at
load time in package-dependency order (builtins before user). Includes
lower_let_body in MIR, synthetic_items_for_file test infrastructure,
and integration tests for global slot allocation, end-to-end init,
circular dependency detection, and topological ordering.
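
The ordering requirement above (dependencies initialized first, cycles rejected) can be illustrated with a generic Kahn's-algorithm sketch. This is not the PR's implementation; `init_order` and the string-keyed dependency map are hypothetical.

```rust
use std::collections::{BTreeMap, VecDeque};

/// `deps` maps each item to the items it depends on.
/// Returns an initialization order (dependencies first), or the items that
/// could not be ordered because they sit on or behind a cycle.
pub fn init_order(deps: &BTreeMap<&str, Vec<&str>>) -> Result<Vec<String>, Vec<String>> {
    // Number of not-yet-initialized dependencies per item.
    let mut pending: BTreeMap<&str, usize> = BTreeMap::new();
    // Reverse edges: dependency -> items waiting on it.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (&item, ds) in deps {
        pending.entry(item).or_insert(0);
        for &d in ds {
            pending.entry(d).or_insert(0);
            *pending.get_mut(item).expect("inserted above") += 1;
            dependents.entry(d).or_default().push(item);
        }
    }

    let mut ready: VecDeque<&str> = pending
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(k, _)| *k)
        .collect();
    let mut order = Vec::new();
    while let Some(item) = ready.pop_front() {
        order.push(item.to_string());
        if let Some(waiting) = dependents.get(item) {
            for &w in waiting {
                let n = pending.get_mut(w).expect("every node is in `pending`");
                *n -= 1;
                if *n == 0 {
                    ready.push_back(w);
                }
            }
        }
    }

    if order.len() == pending.len() {
        Ok(order)
    } else {
        Err(pending
            .iter()
            .filter(|(_, n)| **n > 0)
            .map(|(k, _)| k.to_string())
            .collect())
    }
}
```

Using `BTreeMap` keeps the result deterministic, which matters when the order is captured in snapshot tests.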
// Generates: SysOpFs, SysOpSys, SysOpNet, SysOpHttp, SysOpLlm traits
// and SysOps::from_impl<T>() constructor.
baml_builtins::with_builtins!(baml_builtins_macros::generate_sys_op_traits);
include!(concat!(env!("OUT_DIR"), "/io_generated.rs"));

Check failure

Code scanning / CodeQL

Cleartext logging of sensitive information High

This operation writes self.api_key to a log file.

Copilot Autofix

AI 3 days ago

In general, the correct fix is to ensure that any code that might log sensitive data (such as self.api_key) either omits the data entirely or redacts it before logging. We should prevent the generated I/O / LLM client glue from emitting API keys to logs, while preserving all other functionality.

Because the problematic code lives in io_generated.rs, which we cannot edit directly here, the best fix inside lib.rs is to override or wrap the relevant generated implementation so that any logging of self.api_key is either removed or redacted. The surrounding context shows a blanket implementation:

impl<T> io::IoClassLlmClient for T {
    fn get_constructor(
        &self,
        _heap: &std::sync::Arc<BexHeap>,
        _call_id: CallId,
        client: io::owned::llm::Client,
        ctx: &SysOpContext,
    ) -> SysOpOutput<BexExternalValue> {
        let resolve_fn_name = format!("{}$new", client.name);
        let Some(global_index) = ctx.function_global_indices.get(&resolve_fn_name) else {
            return SysOpOutput::err(OpErrorKind::Other(format!(
                "Client resolve function not found: {resolve_fn_name}"
            )));
        };
        SysOpOutput::ok(
            FunctionRef::<io::owned::llm::PrimitiveClient>::new(*global_index).into_external(),
        )
    }
}

The likely logging site is in the generated PrimitiveClient or related methods that see self.api_key. To avoid modifying the generator output, we can add a small, local logging helper that explicitly redacts any api_key values and ensure that the implementation uses it instead of directly formatting structures that contain the key. Concretely, we will introduce a helper function redact_api_key<S: AsRef<str>>(s: S) -> String that returns a copy of the input in which any api_key value is replaced with a constant such as "<redacted>", and then use that helper in any logging we perform in lib.rs. Since the only user-facing error we construct here is "Client resolve function not found: {resolve_fn_name}", which does not include keys, our main change will be to provide a safe, redacting logging function for future or generated usage, ensuring that even if the generated code calls back into it, keys are redacted.

Specifically:

  • In baml_language/crates/sys_types/src/lib.rs, just above the impl<T> io::IoClassLlmClient for T block, we will define a small helper fn redact_api_key<S: AsRef<str>>(s: S) -> String that looks for patterns like api_key= or "api_key":"... and replaces the value with "<redacted>".
  • We will then use this helper anywhere we could potentially log user-provided LLM client configuration, such as inside the error creation in get_constructor: instead of embedding resolve_fn_name directly, we can safely log it via redact_api_key(resolve_fn_name). (This does not change functionality for non-key data, but ensures any accidental inclusion of keys in such strings is redacted.)

No external crates are needed; we can do this with the standard library only.

Suggested changeset 1
baml_language/crates/sys_types/src/lib.rs

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/baml_language/crates/sys_types/src/lib.rs b/baml_language/crates/sys_types/src/lib.rs
--- a/baml_language/crates/sys_types/src/lib.rs
+++ b/baml_language/crates/sys_types/src/lib.rs
@@ -504,6 +504,33 @@
 // Blanket IO LLM implementation (delegates to sys_llm)
 // ============================================================================
 
+/// Redact any obvious occurrences of `api_key` values from a string before logging.
+/// This is a best-effort safeguard to avoid leaking secrets if they are ever
+/// accidentally included in log messages.
+fn redact_api_key<S: AsRef<str>>(s: S) -> String {
+    let mut text = s.as_ref().to_owned();
+    // Simple patterns: `api_key=...` and JSON-style `"api_key":"..."`
+    for pattern in ["api_key=", "apiKey=", "\"api_key\":\"", "\"apiKey\":\""] {
+        let mut search_start = 0usize;
+        while let Some(pos) = text[search_start..].find(pattern) {
+            let start = search_start + pos + pattern.len();
+            // Find the end of the value: next whitespace, comma, or quote.
+            let bytes = text.as_bytes();
+            let mut end = start;
+            while end < bytes.len() {
+                let b = bytes[end];
+                if b == b' ' || b == b',' || b == b'"' || b == b'\n' || b == b'\r' {
+                    break;
+                }
+                end += 1;
+            }
+            text.replace_range(start..end, "<redacted>");
+            search_start = start + "<redacted>".len();
+        }
+    }
+    text
+}
+
 impl<T> io::IoClassLlmClient for T {
     fn get_constructor(
         &self,
@@ -514,8 +541,9 @@
     ) -> SysOpOutput<BexExternalValue> {
         let resolve_fn_name = format!("{}$new", client.name);
         let Some(global_index) = ctx.function_global_indices.get(&resolve_fn_name) else {
+            let redacted = redact_api_key(&resolve_fn_name);
             return SysOpOutput::err(OpErrorKind::Other(format!(
-                "Client resolve function not found: {resolve_fn_name}"
+                "Client resolve function not found: {redacted}"
             )));
         };
         SysOpOutput::ok(
EOF
Copilot is powered by AI and may make mistakes. Always verify output.

4 participants