Skip to content

Proposition output type documentation #1575

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
105 changes: 105 additions & 0 deletions types_doc.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
---
title: Output Types
---

# Output Types

Outlines provides a simple and intutive way of defining the output format of text generation.

## General approach

The general idea is that you should provide as an output type what you would give as the type hint of the return type of a function.
Thus, you can use most Python native types as well as the additional types from the `typing` module and some widespread Python packages such as `pydantic`.

Consider the following functions for instance:

```python
from datetime import datetime
from typing import Literal
from pydantic import BaseModel

class Baz(BaseModel):
foo: datetime
bar: Union[Dict, List[str]]

def foo() -> int:
...

def bar() -> Literal["a", "b"]:
...

def baz() -> Baz:
...
```

With an Outlines generator, you can generate text that would respect the type hints above by providing those as the output type:

```python
generator("my prompt", int) # "12"
generator("my prompt", iteral["a", "b"]) # "12"
generator("my prompt", Baz) # "{'foo': '', 'bar': ['qux', 'quux']}"
```

An important difference with function type hints though is that an Outlines generator always returns a string.
You have to cast the response into the type you want yourself.

## Outlines-specific types

While Outlines favors the widely adopted types described, there are a few specific cases in which you need to use a type define in the `outlines.types` module. Those types are necessary to avoid ambiguities and to respect the principle that out output types could be used as return type hints of functions.

### JsonSchema

There are multiple types you can use to defining a json schema-like structured output:
- A Pydantic class
- A dataclass
- A TypedDict
- A GenSon SchemaBuilder

However, 2 populars format are not supported directly: json schema strings and python dictionaries describing a json schema. To use those, you should wrap them in a `JsonSchema` object. For instance :

```python
from outlines.types import JsonSchema

schema_string = '{"type": "object", "properties": {"answer": {"type": "number"}}}'
output_type = JsonSchema(schema_string)

schema_dict = {
"type": "object",
"properties": {
"answer": {"type": "number"}
}
}
output_type = JsonSchema(schema_dict)
```

### Regex

As regex patterns are expressed as simple raw string literals, we also need to wrap them in an Outlines object: `outlines.types.Regex`. For instance:

```python
from outlines.types import Regex

regex = r"[0-9]{3}"
output_type = Regex(regex)
```

### Context-Free Grammar

Outlines allows you to generate text that respects the syntax of a context-free grammar. For that, you need to define a Lark grammar. As the latter is expressed as a string, we need once again to wrap in an Outlines object: `outlines.types.CFG`. For instance:

```python
from outlines.types import CFG

grammar_string = """
start: expr
expr: "{" expr "}" | "[" expr "]" |
"""
output_type = CFG(grammar_string)
```

## Output type availability

The output types presented above are not available for all models as some have only limited support for structured outputs. In general, we need to distinguish two different types of models:
- steerable models: they are fully run locally without a separate server such as that Outlines can control the structured generation process through the creation of a `LogitsProcessor`. For those, all output types available in Outlines are supported. Steerable models are `Transformers`, `VLLMOffline`, `LlamaCpp` and `MLXLM`.
- blackbox models: Outlines interact with a client but text generation happen in a separate server (either remote or local). In that case, Outlines can provide structured generations arguments as defined by the server's API, but it does not control the generation. Specific output type support depends on each model, please consult the model's documentation for more information. All models that are not listed in the steerable models fall in that category such as `OpenAI`, `VLLM`, `Ollama`...

Loading