
Title: [Feat] Robust Schema-Driven Extraction (Label-Value Fallback) #3093

@arindamkhaled

Product

BAML

Problem Statement / Use Case

Description
Currently, the BAML parser is highly reliable when models wrap their responses in JSON blocks or use standard delimiters. However, for advanced workflows such as Chain-of-Thought (CoT) reasoning, or when using smaller, token-efficient models, forcing strict JSON adds token overhead and increases the "formatting failure" rate.

I am proposing a Schema-Driven Fallback mode. If the parser does not detect a valid JSON block, it should attempt to "hunt" for the defined class properties directly within the raw text using the schema as a template.

Use Case
This is particularly critical for:

Token Optimization: Avoiding the "JSON tax" (brackets, quotes, and structural tokens).

CoT Workflows: Allowing models to think naturally and provide a "labeled" answer at the end.

Small Model Reliability: Enhancing performance for models like Gemma-2b or Llama-8b which follow instructions well but occasionally miss a closing bracket.

Minimal Reproducible Example
BAML Schema:

class Evaluation {
  chain_of_thought string
  final_answer Category // Enum: A, B, C
  confidence_score float
}
Raw LLM Output (No JSON):

The logic follows that Statement 1 is true based on the definition of cosets.
chain_of_thought: Statement 1 is true because aH is a left coset... Statement 2 is false because the condition is ab⁻¹ ∈ H.

final_answer: C
confidence_score: 1.0

Desired Behavior: The parser should treat the labels chain_of_thought:, final_answer:, and confidence_score: as anchors and extract the text that follows each one into the corresponding field of the Evaluation object.
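
To make the desired behavior concrete, here is a rough Python sketch of label-anchored extraction over the example output above. It is not BAML's parser or API. The names `SCHEMA`, `Category`, and `extract_labeled_fields` are hypothetical, and the rules it applies (labels must start a line, the first occurrence of a label wins, a value runs until the next label) are assumptions made for illustration.

```python
import re
from enum import Enum
from typing import Any, Callable, Dict


class Category(Enum):
    A = "A"
    B = "B"
    C = "C"


# Hypothetical stand-in for the Evaluation class: field name -> coercion.
SCHEMA: Dict[str, Callable[[str], Any]] = {
    "chain_of_thought": str,
    "final_answer": Category,
    "confidence_score": float,
}


def extract_labeled_fields(
    text: str, schema: Dict[str, Callable[[str], Any]]
) -> Dict[str, Any]:
    """Use schema field names as anchors and coerce the text after each one."""
    names = "|".join(re.escape(name) for name in schema)
    # Assumption: a label only counts when it starts a line and ends with ':'.
    anchor = re.compile(rf"^[ \t]*({names})[ \t]*:", re.MULTILINE)
    matches = list(anchor.finditer(text))

    result: Dict[str, Any] = {}
    for i, match in enumerate(matches):
        # A value runs from the end of its label to the start of the next label.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        name = match.group(1)
        if name not in result:  # assumption: first occurrence wins
            result[name] = schema[name](text[match.end():end].strip())
    return result


RAW_OUTPUT = """The logic follows that Statement 1 is true based on the definition of cosets.
chain_of_thought: Statement 1 is true because aH is a left coset... Statement 2 is false because the condition is ab⁻¹ ∈ H.

final_answer: C
confidence_score: 1.0
"""

evaluation = extract_labeled_fields(RAW_OUTPUT, SCHEMA)
print(evaluation["final_answer"], evaluation["confidence_score"])
```

Text before the first label (the free-form reasoning) is ignored, and a value that fails coercion raises `ValueError`; a real implementation would need to decide how such failures are reported.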

Proposed Solution

Proposed Extraction Logic
Primary: Attempt standard JSON/Markdown block parsing.

Fallback: Scan for property_name + delimiter (e.g., :, \n, or ```).

Heuristic: For Enum or Boolean types, perform a "best-fit" match on the remaining text if the specific label is missing but the value is unambiguous.
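
The three steps above could be chained roughly as follows. This is only a sketch in Python under stated assumptions, not a proposal for the actual implementation: `try_json`, `scan_labels`, `best_fit_enum`, and `parse` are hypothetical names, labeled values are limited to a single line, type coercion is omitted, and "unambiguous" is interpreted as "exactly one enum value appears as a standalone token in the unlabeled text".

```python
import json
import re
from typing import Dict, List, Optional, Tuple


def try_json(text: str) -> Optional[dict]:
    """Primary: parse the outermost {...} span, if there is one."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def scan_labels(text: str, fields: List[str]) -> Tuple[Dict[str, str], str]:
    """Fallback: collect 'field: value' lines; also return the leftover text."""
    names = "|".join(re.escape(field) for field in fields)
    anchor = re.compile(rf"^[ \t]*({names})[ \t]*:[ \t]*(.*)$", re.MULTILINE)
    found: Dict[str, str] = {}
    for match in anchor.finditer(text):
        found.setdefault(match.group(1), match.group(2).strip())
    return found, anchor.sub("", text)


def best_fit_enum(text: str, values: List[str]) -> Optional[str]:
    """Heuristic: accept a value only if it is the sole enum token present."""
    hits = {
        value for value in values
        if re.search(rf"(?<!\w){re.escape(value)}(?!\w)", text)
    }
    return hits.pop() if len(hits) == 1 else None


def parse(text: str, fields: List[str], enum_field: str,
          enum_values: List[str]) -> dict:
    parsed = try_json(text)
    if parsed is not None:
        return parsed
    found, remaining = scan_labels(text, fields)
    if enum_field not in found:
        guess = best_fit_enum(remaining, enum_values)
        if guess is not None:
            found[enum_field] = guess
    return found


FIELDS = ["chain_of_thought", "final_answer", "confidence_score"]
unlabeled = "Reasoning done.\nconfidence_score: 0.9\nThe answer is clearly C."
print(parse(unlabeled, FIELDS, "final_answer", ["A", "B", "C"]))
```

Short enum values such as `A` collide easily with ordinary words (for example the English article "A"), so the heuristic deliberately returns nothing when more than one candidate appears; a stricter rule or an opt-in flag may be preferable in practice.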

Alternative Solutions

No response

Additional Context

No response
