Capturing and referencing variable in output #1516

Open
903124 opened this issue Mar 27, 2025 · 1 comment


903124 commented Mar 27, 2025

Sometimes, beyond getting a structured output for the final answer, you might want to capture intermediate output from the LLM and reference it in a later stage. Together with #1407 (and the related #1434 and #1480), here is a proposed interface for doing this.

Using a named variable, a value can be passed forward. As a simple example, to count the number of r's in the word 'strawberry':

thinking_pattern = (
    String("First, listing all letters in the word strawberry: ") +
    Capture(repeat(char + optional(whitespace), 8, 11), "letters") +
    newline
)

Then the captured letters can be parsed into a list:

letters_list = parse_value("letters", list, [])
# Keep only alphabetic characters (drops the whitespace between letters)
letters_list = [letter for letter in letters_list if letter.isalpha()]
letter_count = len(letters_list)
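
To make the parsing step concrete, here is a minimal sketch of how a capture store could behave. It is not the prototype's implementation: the module-level dictionary `_CAPTURES` is hypothetical, and the bodies of `store_value`, `parse_value` and `clear_values` are assumptions that only mirror the names and call shapes used in this issue.

```python
from typing import Any, Callable, Dict

# Hypothetical in-memory store; the real prototype may keep captures elsewhere.
_CAPTURES: Dict[str, Any] = {}


def store_value(name: str, value: Any) -> None:
    """Save a captured (or derived) value under a name."""
    _CAPTURES[name] = value


def clear_values() -> None:
    """Forget all captures, e.g. before starting a new generation."""
    _CAPTURES.clear()


def parse_value(name: str, target_type: Callable[[Any], Any], default: Any = None) -> Any:
    """Cast a stored capture with target_type, falling back to default."""
    if name not in _CAPTURES:
        return default
    try:
        return target_type(_CAPTURES[name])
    except (TypeError, ValueError):
        return default


clear_values()
# Assumed raw capture text: letters separated by single spaces.
store_value("letters", "s t r a w b e r r y ")

# list(str) yields individual characters, so whitespace is filtered out afterwards.
letters_list = [ch for ch in parse_value("letters", list, []) if ch.isalpha()]
letter_count = len(letters_list)
```

Under this sketch, casting with `list` splits the captured string into characters, which is why the alphabetic filter is needed before counting.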

A pattern can then be built in a loop, which emulates a for loop over the captured letters without the LLM having to generate the loop structure itself:

counting_items = []
for i in range(letter_count):
    counting_items.append(
        String(f" Checking letter {i+1}: ") +
        String(letters_list[i]) +
        String(" Is it 'r'? ") +
        (String("Yes") | String("No")) +
        String(".") +
        optional(String(" Running count: ") + Capture(digit, f"count_{i}")) +
        newline
    )
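
The same loop unrolling can be illustrated with plain regular expressions from the standard library, which avoids depending on the proposed `Capture` API. This is only a stand-in sketch: `build_counting_regex` is a hypothetical helper, and named groups play the role of the named captures.

```python
import re
from typing import List


def build_counting_regex(letters: List[str]) -> str:
    """Unroll one fixed-text line per letter, leaving only the answer and
    the optional running count free for the model to fill in."""
    lines = []
    for i, letter in enumerate(letters):
        fixed = re.escape(f" Checking letter {i + 1}: {letter} Is it 'r'? ")
        answer = f"(?P<answer_{i}>Yes|No)\\."
        count = f"(?: Running count: (?P<count_{i}>[0-9]))?"
        lines.append(fixed + answer + count + "\n")
    return "".join(lines)


pattern = build_counting_regex(list("str"))
sample = (
    " Checking letter 1: s Is it 'r'? No. Running count: 0\n"
    " Checking letter 2: t Is it 'r'? No.\n"
    " Checking letter 3: r Is it 'r'? Yes. Running count: 1\n"
)
match = re.fullmatch(pattern, sample)
```

Because the letter and its index are baked into the pattern, the model cannot skip or reorder letters; it only chooses between `Yes` and `No` and, optionally, a digit.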

Alternatively, one can capture a single value, e.g. String("Integer: ") + capture(integer, "integer_value"), then parse it with int_val = parse_value("integer_value", int) to cast it to a Python type:

capture_pattern = (
    String("Let's extract different data types:") + newline +
    String("Integer: ") + capture(integer, "integer_value") + newline +...

)


def process_values():
    int_val = parse_value("integer_value", int, 0)
    store_value("int_squared", int_val ** 2)

    return (
        String("Processing results:") + newline +
        String(f"Integer squared: {int_val}² = {int_val ** 2}") + newline +...

output = create_dynamic_pipeline(
    model=model,
    prompt="Extract different data types and perform operations on them.",
    patterns=[capture_pattern,  process_values...

Here is my prototype of the implementation: https://github.com/903124/outlines/tree/capture

Looking forward to any suggestions or enhancements!


903124 commented Mar 27, 2025

Full code

import outlines

# String, capture, char, whitespace, digit, newline, parse_value, store_value,
# clear_values and create_dynamic_pipeline come from the prototype branch linked above.

model = outlines.models.llamacpp(
    "microsoft/Phi-3-mini-4k-instruct-gguf",
    "Phi-3-mini-4k-instruct-q4.gguf"
)

# Clear any previous captures
clear_values()

# Step 1: Capture the individual letters
thinking_pattern = (
    String("First, listing all letters in the word strawberry starting from s: ") +
    capture((char + whitespace).repeat(8, 11), "letters", strip_whitespace=True) +
    newline
)

# Step 2: Let the LLM determine if each letter is an 'r'
def create_llm_determination_pattern():
    # Parse the captured letters into a list
    letters_list = parse_value("letters", list, [])
    # Clean up to keep only alphabetic characters
    letters_list = [letter for letter in letters_list if letter.isalpha()]

    # Store the cleaned list back for later use
    store_value("cleaned_letters", letters_list)

    # Create a pattern where the LLM determines if each letter is an 'r'
    determination_pattern = String("Now I'll determine if each letter is an 'r':")

    for i, letter in enumerate(letters_list):
        # Let the LLM decide with a yes/no capture
        determination_pattern = determination_pattern + newline + (
            String(f"Letter {i+1}: {letter}") +
            String(" Is this letter an 'r'? ") +
            capture(String("Yes") | String("No"), f"llm_is_r_{i}")
        )

        # Also capture the running count after every letter
        determination_pattern = determination_pattern + (
            String(" Running count: ") +
            capture(digit, f"llm_count_{i}", strip_whitespace=True)
        )

    return determination_pattern + newline

# Step 3: Run both stages; the second pattern is built after the first has been generated
output = create_dynamic_pipeline(
    model=model,
    prompt="Count the number of 'r' characters in the word 'strawberry'.",
    patterns=[thinking_pattern, create_llm_determination_pattern],
    token_triggers=[r"\n\n", r"\n\n"],
    max_tokens=500
)

Output:

First, listing all letters in the word strawberry starting from s: s t r a w b e r r y 
 Now I'll determine if each letter is an 'r':
Letter 1: s Is this letter an 'r'? No Running count: 0
Letter 2: t Is this letter an 'r'? No Running count: 0
Letter 3: r Is this letter an 'r'? Yes Running count: 1
Letter 4: a Is this letter an 'r'? No Running count: 1
Letter 5: w Is this letter an 'r'? No Running count: 1
Letter 6: b Is this letter an 'r'? No Running count: 1
Letter 7: e Is this letter an 'r'? No Running count: 1
Letter 8: r Is this letter an 'r'? Yes Running count: 2
Letter 9: r Is this letter an 'r'? Yes Running count: 3
Letter 10: y Is this letter an 'r'? No Running count: 3
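
Since the structure of the output is fixed by the pattern, the model's per-letter decisions can be checked deterministically afterwards. The following sketch re-parses the transcript above with the standard `re` module and compares each answer and running count against a plain Python count; `check_transcript` is a hypothetical helper, not part of the prototype.

```python
import re
from typing import List, Tuple

TRANSCRIPT = """\
Letter 1: s Is this letter an 'r'? No Running count: 0
Letter 2: t Is this letter an 'r'? No Running count: 0
Letter 3: r Is this letter an 'r'? Yes Running count: 1
Letter 4: a Is this letter an 'r'? No Running count: 1
Letter 5: w Is this letter an 'r'? No Running count: 1
Letter 6: b Is this letter an 'r'? No Running count: 1
Letter 7: e Is this letter an 'r'? No Running count: 1
Letter 8: r Is this letter an 'r'? Yes Running count: 2
Letter 9: r Is this letter an 'r'? Yes Running count: 3
Letter 10: y Is this letter an 'r'? No Running count: 3
"""

LINE_RE = re.compile(
    r"Letter (?P<index>\d+): (?P<letter>\w) Is this letter an 'r'\? "
    r"(?P<answer>Yes|No) Running count: (?P<count>\d)"
)


def check_transcript(transcript: str, target: str = "r") -> Tuple[int, List[str]]:
    """Return the recomputed count and a list of inconsistencies found."""
    expected = 0
    problems: List[str] = []
    for line in transcript.splitlines():
        m = LINE_RE.fullmatch(line.strip())
        if m is None:
            continue  # skip lines that are not per-letter decisions
        is_target = m["letter"] == target
        expected += int(is_target)
        if (m["answer"] == "Yes") != is_target:
            problems.append(f"letter {m['index']}: wrong yes/no answer")
        if int(m["count"]) != expected:
            problems.append(f"letter {m['index']}: wrong running count")
    return expected, problems


final_count, problems = check_transcript(TRANSCRIPT)
```

For the transcript shown here, the recomputed count agrees with the model's final running count and no inconsistencies are reported.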
