A Python library for building document annotation interfaces with Pydantic and Dash.
Define your annotation schema as a Pydantic model, pick widgets, and get a web-based annotation app with auto-save, progress tracking, and document navigation.
Install via uv (recommended) or pip.
```bash
uv sync
```

By default, uv creates a virtual environment in `.venv`. To activate it:

```bash
source .venv/bin/activate
```

With pip, optionally create and activate a virtual environment first, e.g.:

```bash
python -m venv .venv
source .venv/bin/activate
```

Then install:

```bash
pip install .
```

For development (editable install):

```bash
pip install -e .
```

Create a Python config file (`my_config.py`):
```python
from typing import Optional, Literal
from pydantic import BaseModel

class Schema(BaseModel):
    sentiment: Optional[Literal["positive", "negative", "neutral"]] = None
    is_relevant: bool = False
```

Run it. Widgets are auto-generated from the schema:

```bash
tater --config my_config.py --documents data/documents.json
```

Or specify widgets explicitly:
```python
from typing import Optional, Literal
from pydantic import BaseModel
from tater.widgets import SegmentedControlWidget, CheckboxWidget

class Schema(BaseModel):
    sentiment: Optional[Literal["positive", "negative", "neutral"]] = None
    is_relevant: bool = False

title = "My Annotator"
widgets = [
    SegmentedControlWidget("sentiment", label="Sentiment", required=True),
    CheckboxWidget("is_relevant", label="Relevant?"),
]
```

Alternatively, use a JSON schema file (`my_schema.json`):
```json
{
  "spec_version": "1.0",
  "title": "My Annotator",
  "data_schema": [
    {
      "id": "sentiment",
      "type": "choice",
      "options": ["positive", "negative", "neutral"],
      "widget": {"type": "segmented_control", "label": "Sentiment", "required": true}
    },
    {"id": "is_relevant", "type": "boolean"}
  ]
}
```

Fields without a `widget` block get auto-generated default widgets.
```bash
tater --schema my_schema.json --documents data/documents.json
```

Example configs and schemas are in `apps/`.
A config file is a plain Python module. The `tater` CLI looks for these names:

| Name | Required | Description |
|---|---|---|
| `Schema` | yes | Pydantic `BaseModel` subclass defining the annotation fields |
| `widgets` | no | List of `TaterWidget` instances. Omit to auto-generate all; supply a partial list to override specific fields and auto-generate the rest. `SpanAnnotationWidget` and hierarchical label widgets cannot be usefully auto-generated (entity types and hierarchy are required); always include these explicitly. |
| `title` | no | App window title (default: `"tater - document annotation"`) |
| `description` | no | Subtitle shown below the title |
| `instructions` | no | Markdown help text shown in the instructions drawer |
| `register_callbacks` | no | Callable `(app: TaterApp) -> None` called after widgets are registered; use it for custom Dash callbacks and for setting `app.on_save` |
Widgets are linked to Pydantic model fields by `schema_field`. Options for choice widgets are inferred from the field's `Literal` type, so no manual list is needed.

All widgets accept `label` and `description`, and most accept `required`.
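How tater performs this inference internally isn't shown here. As a rough sketch using only the standard `typing` introspection functions, the options can be recovered from annotations such as `Optional[Literal[...]]` or `list[Literal[...]]`; `literal_options` is a hypothetical helper, not part of tater's API.

```python
import types
from typing import Literal, Optional, Union, get_args, get_origin

# Origins that mean "union": typing.Union, plus X | Y syntax on Python 3.10+.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def literal_options(annotation: object) -> list:
    """Return the choices encoded in a Literal annotation.

    Optional[...] and list[...] wrappers are unwrapped first, so
    Optional[Literal[...]] and list[Literal[...]] both work.
    """
    origin = get_origin(annotation)
    if origin is Literal:
        return list(get_args(annotation))
    if origin in _UNION_ORIGINS:
        # Optional[X] is Union[X, None]; skip the NoneType member.
        for arg in get_args(annotation):
            if arg is not type(None):
                return literal_options(arg)
    if origin is list:
        (item_type,) = get_args(annotation)
        return literal_options(item_type)
    raise TypeError(f"no Literal options found in {annotation!r}")


# Usage with the sentiment field from the quick-start schema:
print(literal_options(Optional[Literal["positive", "negative", "neutral"]]))
# prints ['positive', 'negative', 'neutral']
```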
| Widget | Schema type | Notes |
|---|---|---|
| `CheckboxWidget` | `bool` | |
| `SwitchWidget` | `bool` | Toggle switch |
| `ChipWidget` | `bool` | Single toggleable chip |

| Widget | Schema type | Notes |
|---|---|---|
| `SegmentedControlWidget` | `Literal[...]` | Horizontal button group; `vertical=True` supported |
| `RadioGroupWidget` | `Literal[...]` | Radio buttons; `vertical=True` supported |
| `SelectWidget` | `Literal[...]` | Searchable dropdown |
| `ChipRadioWidget` | `Literal[...]` | Chip-style radio buttons; `vertical=True` supported |

| Widget | Schema type | Notes |
|---|---|---|
| `MultiSelectWidget` | `list[Literal[...]]` | Searchable multi-select dropdown |
| `CheckboxGroupWidget` | `list[Literal[...]]` | Checkbox group; `vertical=True` supported |

| Widget | Schema type | Extra params |
|---|---|---|
| `NumberInputWidget` | `int` / `float` | `min_value`, `max_value`, `step` |
| `SliderWidget` | `int` / `float` | `min_value`, `max_value`, `step` |
| `RangeSliderWidget` | `Optional[list[float]]` | `min_value`, `max_value`, `step` |

| Widget | Schema type | Extra params |
|---|---|---|
| `TextInputWidget` | `str` | `placeholder` |
| `TextAreaWidget` | `str` | `placeholder` |
The following widgets require explicit configuration and must always be included in your `widgets` list; they cannot be usefully auto-generated.

`SpanAnnotationWidget` highlights text spans and assigns entity types. The schema field must be `list[SpanAnnotation]`; auto-generation produces a widget with no entity types.
```python
from tater import SpanAnnotation
from tater.widgets import SpanAnnotationWidget, EntityType

SpanAnnotationWidget(
    "entities",
    label="Entities",
    entity_types=[
        EntityType("Medication"),
        EntityType("Diagnosis"),
        EntityType("Symptom"),
    ],
)
```

Colors are assigned automatically from the palette. To use a specific color for an entity, pass a hex string to `EntityType`:
```python
EntityType("Medication", color="#4e79a7")
EntityType("Diagnosis", color="#e15759")
```

The `palette` parameter controls auto-assigned colors (default: `"tableau10"`). Palettes are from D3's categorical schemes:
| Palette | Description |
|---|---|
| `category10` | D3's category10 |
| `accent` | ColorBrewer Accent: mixed tones |
| `dark2` | ColorBrewer Dark2: dark, saturated |
| `observable10` | Observable's 10-color palette |
| `paired` | ColorBrewer Paired: 12 colors in light/dark pairs |
| `pastel1` | ColorBrewer Pastel1: soft tones |
| `pastel2` | ColorBrewer Pastel2: soft tones |
| `set1` | ColorBrewer Set1: bold, high-contrast |
| `set2` | ColorBrewer Set2: medium saturation |
| `set3` | ColorBrewer Set3: light, 12 colors |
| `tableau10` | Tableau's 10-color categorical palette (default) |
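The order in which tater hands out palette colors isn't documented in this section. One plausible scheme, sketched here as an assumption: explicit `color=` values take precedence, and the remaining entity types draw colors from the palette in order, wrapping around when they run out. `Entity` and `assign_colors` are hypothetical stand-ins rather than tater classes; the hex values are D3's `schemeTableau10`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# D3's schemeTableau10 colors, in order (tater's default palette is "tableau10").
TABLEAU10 = (
    "#4e79a7", "#f28e2c", "#e15759", "#76b7b2", "#59a14f",
    "#edc949", "#af7aa1", "#ff9da7", "#9c755f", "#bab0ab",
)


@dataclass
class Entity:
    """Hypothetical stand-in for tater's EntityType (name plus optional color)."""
    name: str
    color: Optional[str] = None


def assign_colors(entities: Sequence[Entity], palette: Sequence[str] = TABLEAU10) -> dict:
    """Map each entity name to a hex color.

    Explicit colors win; entities without one take palette colors in
    order, wrapping around if there are more entities than colors.
    """
    colors = {}
    next_index = 0
    for entity in entities:
        if entity.color is not None:
            colors[entity.name] = entity.color
        else:
            colors[entity.name] = palette[next_index % len(palette)]
            next_index += 1
    return colors


print(assign_colors([Entity("Medication"), Entity("Diagnosis", color="#000000"), Entity("Symptom")]))
# prints {'Medication': '#4e79a7', 'Diagnosis': '#000000', 'Symptom': '#f28e2c'}
```

Skipping explicitly colored entities when advancing the palette index keeps auto-assigned colors contiguous; a real implementation might instead index by position, which would be equally reasonable.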
```python
SpanAnnotationWidget("entities", label="Entities", palette="set1", entity_types=[...])
```

Hierarchical label widgets navigate a tree hierarchy to select a node. The schema field must be `str` or `Optional[str]`. Because `Optional[str]` is indistinguishable from a plain text field during auto-generation, these widgets must be included explicitly.
```python
from tater.widgets import (
    HierarchicalLabelCompactWidget,
    HierarchicalLabelFullWidget,
    HierarchicalLabelTagsWidget,
    load_hierarchy_from_yaml,
)

ontology = load_hierarchy_from_yaml("data/ontology.yaml")

# Chosen leaves appear as removable pills
HierarchicalLabelTagsWidget("tags", label="Tags", hierarchy=ontology)

# Shows only the selected node at each level (compact breadcrumb-style)
HierarchicalLabelCompactWidget("diagnosis", label="Diagnosis", hierarchy=ontology)

# Shows all siblings at every expanded level with a breadcrumb below the search bar
HierarchicalLabelFullWidget("diagnosis", label="Diagnosis", hierarchy=ontology)
```

All three accept `searchable=True` (the default). Build a tree programmatically with `build_tree(dict_or_list)` or from a YAML file with `load_hierarchy_from_yaml(path)`.
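This section doesn't spell out the exact input format `build_tree` or the YAML loader expects. Assuming a nested dict in which each key maps to its children (a sub-dict, a list of leaf labels, or nothing), the sketch below enumerates leaf paths, which are the only selectable nodes by default, and filters them with a case-insensitive substring search, roughly what a searchable hierarchy needs. `leaf_paths`, `search_leaves`, and the example tree are illustrative, not tater code.

```python
from typing import Union

# Assumed shape (not necessarily what build_tree accepts): a dict maps node
# names to their children; a list holds leaf labels; None/{}/[] marks a leaf.
Tree = Union[dict, list]


def leaf_paths(node: Tree, prefix: tuple = ()) -> list:
    """Return the path (tuple of labels) to every leaf, in insertion order."""
    if isinstance(node, dict):
        paths = []
        for name, children in node.items():
            path = prefix + (name,)
            if children:  # non-empty dict or list: descend
                paths.extend(leaf_paths(children, path))
            else:  # None, {} or []: this node is itself a leaf
                paths.append(path)
        return paths
    # A list holds leaf labels directly.
    return [prefix + (name,) for name in node]


def search_leaves(node: Tree, query: str) -> list:
    """Case-insensitive substring match on leaf labels."""
    needle = query.lower()
    return [path for path in leaf_paths(node) if needle in path[-1].lower()]


example_tree = {
    "Cardiology": {"Arrhythmia": ["Atrial fibrillation", "Bradycardia"], "Hypertension": None},
    "Neurology": ["Migraine"],
}
print(search_leaves(example_tree, "brady"))
# prints [('Cardiology', 'Arrhythmia', 'Bradycardia')]
```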
By default, only leaf nodes can be selected. Pass `allow_non_leaf=True` to allow selecting any node: clicking a non-leaf node selects it as the annotation value and also navigates into it to show its children. The selected node is indicated by a dark border regardless of depth:

```python
HierarchicalLabelFullWidget("diagnosis", label="Diagnosis", hierarchy=ontology, allow_non_leaf=True)
```

`GroupWidget` groups child widgets under a nested Pydantic model field:
```python
from pydantic import BaseModel

class Address(BaseModel):
    city: str
    country: str

class Doc(BaseModel):
    address: Address

GroupWidget("address", label="Location", children=[
    TextInputWidget("address.city", label="City"),
    TextInputWidget("address.country", label="Country"),
])
```

`ListableWidget` is a repeatable list of sub-widgets for `list[SomeModel]` fields, rendered as a vertical stack of cards:
```python
ListableWidget("findings", label="Findings", item_label="Finding", item_widgets=[
    RadioGroupWidget("label", label="Label"),
])
```

`TabsWidget` works like `ListableWidget`, but items are shown as switchable tabs.

`AccordionWidget` works like `ListableWidget`, but items are shown as collapsible accordion panels.
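In the `GroupWidget` example, the child widgets address nested fields with dotted paths such as `address.city`. Tater's internal handling of these paths isn't described here; as a hypothetical sketch, reading and writing a dotted path against a nested annotation dict can be done like this (`set_dotted` and `get_dotted` are illustrative names, not tater functions):

```python
from typing import Any


def set_dotted(data: dict, path: str, value: Any) -> None:
    """Set a nested value, e.g. path "address.city" -> data["address"]["city"]."""
    *parents, leaf = path.split(".")
    node = data
    for key in parents:
        node = node.setdefault(key, {})  # create intermediate dicts as needed
    node[leaf] = value


def get_dotted(data: dict, path: str, default: Any = None) -> Any:
    """Read a nested value, returning default if any segment is missing."""
    node: Any = data
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


annotations = {}
set_dotted(annotations, "address.city", "Lyon")
set_dotted(annotations, "address.country", "France")
print(annotations)
# prints {'address': {'city': 'Lyon', 'country': 'France'}}
```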
`DividerWidget` is a labeled horizontal rule for visually separating sections. It has no schema field and does not contribute to the annotation model:

```python
DividerWidget(label="Clinical Findings")
DividerWidget(label="Demographics", description="Patient background info")
```

A JSON schema file has this top-level structure:
```json
{
  "spec_version": "1.0",
  "title": "My Annotator",
  "description": "Optional subtitle",
  "hierarchies": {
    "ontology": "path/to/ontology.yaml"
  },
  "data_schema": [ ... ]
}
```

| Key | Required | Description |
|---|---|---|
| `spec_version` | yes | Must be `"1.0"` |
| `data_schema` | yes | Array of field definitions |
| `title` | no | App window title |
| `description` | no | Subtitle shown below the title |
| `hierarchies` | no | Map of named hierarchies (YAML file path or inline dict) used by `hierarchical_label` fields |
Every entry in data_schema (and item_fields / fields for repeater/group types) is either a field or a divider.
A field has `id` and `type` at the top level; only data-schema keys belong at this level:

| Key | Required | Description |
|---|---|---|
| `id` | yes | Field name (Pydantic field name and widget ID) |
| `type` | yes | Field type; see the field types table below |
| `options` | for `choice` / `multi_choice` | List of option strings |
| `default` | no | Default value |
| `fields` | for `group` | Child field definitions |
| `item_fields` | for `repeater` | Item field definitions |
| `widget` | no | Widget config block; see below |
A divider has no `id` or `type`, only a `widget` block:

```json
{"widget": {"type": "divider", "label": "Section Heading"}}
```

All UI properties belong inside the `widget` block. `widget.type` is required for leaf fields when a `widget` block is present; it selects the widget class. Fields with no `widget` block get auto-generated default widgets.
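The entry rules described so far can be checked mechanically. The sketch below is not tater's validator: `check_entry` is a hypothetical helper that encodes only the rules stated in this section (field vs. divider, the keys required per field type, and `widget.type` for leaf fields).

```python
# Extra data-schema keys each field type needs (from the field keys table).
REQUIRED_KEYS = {
    "choice": ("options",),
    "multi_choice": ("options",),
    "group": ("fields",),
    "repeater": ("item_fields",),
}
# Container types whose widget block does not need a "type".
CONTAINER_TYPES = {"group", "repeater"}


def check_entry(entry: dict) -> str:
    """Classify a data_schema entry as "field" or "divider".

    Raises ValueError for entries that break the stated rules; recurses
    into group "fields" and repeater "item_fields".
    """
    if "id" not in entry and "type" not in entry:
        if entry.get("widget", {}).get("type") != "divider":
            raise ValueError(f"entry without id/type must be a divider: {entry!r}")
        return "divider"
    for key in ("id", "type"):
        if key not in entry:
            raise ValueError(f"field is missing {key!r}: {entry!r}")
    field_id, field_type = entry["id"], entry["type"]
    for key in REQUIRED_KEYS.get(field_type, ()):
        if key not in entry:
            raise ValueError(f"{field_type} field {field_id!r} needs {key!r}")
    widget = entry.get("widget")
    if widget is not None and field_type not in CONTAINER_TYPES and "type" not in widget:
        raise ValueError(f"widget block of leaf field {field_id!r} needs a 'type'")
    for child in list(entry.get("fields", [])) + list(entry.get("item_fields", [])):
        check_entry(child)
    return "field"


print(check_entry({"id": "is_relevant", "type": "boolean"}))  # prints field
```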
Widget block keys:

| Key | Description |
|---|---|
| `type` | Widget class; see the field types table for valid values |
| `label` | Display label (default: humanized `id`) |
| `description` | Help text shown below the widget |
| `required` | `true` marks the field for completion tracking |
| `auto_advance` | `true` advances to the next document on selection (choice/boolean) |
| `placeholder` | Placeholder text (`text_input`, `text_area`) |
| `orientation` | `"vertical"` or `"horizontal"` (`radio_group`, `chip_radio`, `checkbox_group`, `segmented_control`) |
| `min_value` / `max_value` / `step` | Bounds and step size (`number_input`, `slider`, `range_slider`) |
| `entity_types` | List of entity type name strings (`span_annotation`) |
| `hierarchy_ref` | Key into the top-level `hierarchies` dict (`hierarchical_label`) |
| `searchable` | Enable search, default `true` (`hierarchical_label`) |
| `item_label` | Singular label for list items (`listable`, `tabs`, `accordion`) |
| `conditional_on` | `{"field": "field_id", "value": ...}`: show this widget only when the named field equals the given value |
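The `conditional_on` rule amounts to an equality check against the current annotation values. A minimal sketch, assuming those values are available as a flat dict keyed by field `id` (`is_visible` is a hypothetical helper, not part of tater):

```python
from typing import Any, Mapping


def is_visible(widget: Mapping[str, Any], annotations: Mapping[str, Any]) -> bool:
    """Apply a widget block's conditional_on rule to the current annotations.

    Widgets without conditional_on are always shown. Otherwise the named
    field must currently equal the given value; a missing field reads as
    None, so it only matches a condition whose value is None.
    """
    condition = widget.get("conditional_on")
    if condition is None:
        return True
    return annotations.get(condition["field"]) == condition["value"]


notes_widget = {"type": "text_area", "conditional_on": {"field": "is_relevant", "value": True}}
print(is_visible(notes_widget, {"is_relevant": True}))   # prints True
print(is_visible(notes_widget, {"is_relevant": False}))  # prints False
```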
Field types, with their schema types and widgets:

| `type` | Schema type | Default widget | Widget type overrides |
|---|---|---|---|
| `boolean` | `bool` | `checkbox` | `switch`, `chip_boolean` |
| `choice` | `Literal[...]` | `segmented_control` | `radio_group`, `select`, `chip_radio` |
| `multi_choice` | `list[Literal[...]]` | `multi_select` | `checkbox_group` |
| `numeric` | `float` | `number_input` | `slider` |
| `range_slider` | `list[float]` | `range_slider` | none |
| `text` | `str` | `text_input` | `text_area` |
| `span_annotation` | `list[SpanAnnotation]` | `span_annotation` | none |
| `hierarchical_label` | `Optional[str]` | `hierarchical_label_tags` | `hierarchical_label_compact`, `hierarchical_label_full` |
| `group` | nested model | auto (`GroupWidget`) | none |
| `repeater` | `list[model]` | `listable` | `tabs`, `accordion` |
- `group`: requires `fields`; `widget.type` is not used (there is only one `GroupWidget`). Provide a `widget` block to set `label`/`description` and to explicitly control which child fields get widgets. Without a `widget` block, auto-generation covers the whole group.
- `repeater`: requires `item_fields`; `widget.type` selects the layout (`listable` by default, `tabs`, or `accordion`).
- `span_annotation`: `entity_types` is required in the `widget` block.
- `hierarchical_label`: `hierarchy_ref` (in the `widget` block) must match a key in the top-level `hierarchies` dict.
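For Python configs, the same table suggests a mapping from field annotations to default widgets. The sketch below is a hypothetical illustration, not tater's auto-generation code; it assumes `Optional[...]` is unwrapped first, which also shows why an `Optional[str]` field gets a text input rather than a hierarchical widget.

```python
import types
from typing import Literal, Optional, Union, get_args, get_origin

# Origins that mean "union": typing.Union, plus X | Y syntax on Python 3.10+.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def default_widget_type(annotation: object) -> str:
    """Pick a default widget type name for a Python annotation, per the table above."""
    origin = get_origin(annotation)
    if origin in _UNION_ORIGINS:
        # Unwrap Optional[X]; this is why Optional[str] looks like plain text.
        members = [a for a in get_args(annotation) if a is not type(None)]
        if len(members) == 1:
            return default_widget_type(members[0])
    elif annotation is bool:
        return "checkbox"
    elif origin is Literal:
        return "segmented_control"
    elif origin is list:
        (item,) = get_args(annotation)
        if get_origin(item) is Literal:
            return "multi_select"
        if item is float:
            return "range_slider"
    elif annotation in (int, float):
        return "number_input"
    elif annotation is str:
        return "text_input"
    raise TypeError(f"no default widget for {annotation!r}")


print(default_widget_type(Optional[str]))  # prints text_input
```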
The documents file is a JSON array listing the documents to annotate. Each document's text is provided either inline or via a file path (exactly one is required):

```json
[
  {"text": "Inline document text goes here."},
  {"text": "Inline document text also goes here.", "name": "Patient 1", "info": {"date": "2024-01-15"}},
  {"file_path": "data/note_001.txt"},
  {"file_path": "data/note_002.txt", "name": "Patient 2", "info": {"date": "2024-01-16"}}
]
```

Each document may have:

- `text`: inline document text (use this or `file_path`, not both)
- `file_path`: path to a `.txt` file, resolved relative to the documents file (use this or `text`, not both; not supported in hosted mode)
- `id`: unique string ID (auto-generated as `doc_000`, `doc_001`, … if omitted)
- `name`: display name
- `info`: arbitrary metadata dict shown in the UI
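A loader that follows these rules could look like the sketch below. It is an illustration rather than tater's own loader; in particular, numbering fallback IDs by each document's position in the file is an assumption.

```python
import json
from pathlib import Path
from typing import Union


def load_documents(documents_path: Union[str, Path]) -> list:
    """Load a documents file following the rules above (sketch, not tater's loader)."""
    path = Path(documents_path)
    entries = json.loads(path.read_text(encoding="utf-8"))
    documents = []
    for index, entry in enumerate(entries):
        has_text, has_file = "text" in entry, "file_path" in entry
        if has_text == has_file:  # both or neither
            raise ValueError(f"document {index}: give exactly one of 'text' or 'file_path'")
        if has_file:
            # Resolved relative to the documents file, not the working directory.
            text = (path.parent / entry["file_path"]).read_text(encoding="utf-8")
        else:
            text = entry["text"]
        documents.append({
            # Assumption: fallback IDs are numbered by position in the file.
            "id": entry.get("id", f"doc_{index:03d}"),
            "name": entry.get("name"),
            "info": entry.get("info", {}),
            "text": text,
        })
    return documents
```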
Annotations are auto-saved as JSON keyed by document ID:

```json
{
  "doc_000": {
    "annotations": {"sentiment": "positive", "summary": "Normal findings."},
    "metadata": {"flagged": false, "notes": "", "visited": true, "annotation_seconds": 42.0, "status": "complete"}
  }
}
```

Status values: `"not_started"`, `"in_progress"`, `"complete"`.
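Because the output is plain JSON, it is easy to post-process outside tater. For example, a small script (not part of tater) can tally documents by status and list flagged ones, assuming the structure shown above:

```python
import json
from collections import Counter

STATUSES = ("not_started", "in_progress", "complete")


def summarize(annotations: dict) -> dict:
    """Count documents per status and list flagged IDs.

    Only documents present in the annotations dict are counted.
    """
    counts = Counter(record["metadata"]["status"] for record in annotations.values())
    flagged = [doc_id for doc_id, record in annotations.items() if record["metadata"].get("flagged")]
    return {"counts": {status: counts[status] for status in STATUSES}, "flagged": flagged}


sample = json.loads("""
{
  "doc_000": {"annotations": {"sentiment": "positive"},
              "metadata": {"flagged": false, "status": "complete"}},
  "doc_001": {"annotations": {},
              "metadata": {"flagged": true, "status": "in_progress"}}
}
""")
print(summarize(sample))
# prints {'counts': {'not_started': 0, 'in_progress': 1, 'complete': 1}, 'flagged': ['doc_001']}
```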
Command-line usage:

```bash
tater --config CONFIG --documents PATH [options]
tater --schema SCHEMA --documents PATH [options]
tater --hosted [options]
```
| Flag | Description |
|---|---|
| `--config PATH` | Python config file (one of `--config` / `--schema` is required in single mode) |
| `--schema PATH` | JSON schema file (one of `--config` / `--schema` is required in single mode) |
| `--documents PATH` | Documents JSON file (required in single mode) |
| `--annotations PATH` | Annotations output file (default: `<documents>_annotations.json`) |
| `--hosted` | Run in hosted mode (upload page at `/`, annotation UI at `/annotate`) |
| `--port INT` | Server port (default: `8050`) |
| `--host STR` | Bind address (default: `127.0.0.1`) |
| `--debug` | Enable debug/hot-reload mode |
Environment variables: `TATER_PORT`, `TATER_HOST`, `TATER_DEBUG`, `TATER_SECRET_KEY`.

Hosted mode lets multiple users upload their own schema and documents and annotate independently; no server-side annotation state is kept between sessions.

```bash
tater --hosted --host 0.0.0.0 --port 8050
```

Flow for uploading your own files:
- User visits `/` → upload page, "Upload files" tab
- Upload schema JSON and documents JSON; status icons confirm each file is valid
- If the schema references external hierarchy files, per-file upload zones appear automatically
- Optionally upload an existing annotations JSON to resume from a previous session
- Click "Start Annotating" → redirected to `/annotate`
- Annotate documents; click "Download" in the footer to save annotations as JSON
- Click the home icon in the header to start over
Flow for built-in examples:

- User visits `/` → click the "Browse examples" tab
- Click any example card → immediately redirected to `/annotate` with that example loaded
Hosted mode constraints vs. single mode:

- No auto-save: annotations live in the browser (`dcc.Store`) and must be downloaded explicitly
- `file_path` is not supported in documents; use inline `text` instead
- Hierarchy files referenced by path in the schema must be uploaded separately (inline hierarchy dicts work without upload)
Install dev dependencies (includes `pytest`, `dash[testing]`, and `webdriver-manager`):

```bash
uv sync --group dev
source .venv/bin/activate
```

Browser tests use Chrome by default, but Dash's testing framework supports other browsers; see the Dash testing docs for alternatives.
On macOS/Windows, install Chrome normally from google.com/chrome. On standard Linux, use your package manager or the official Linux install guide.
WSL users: Chrome must be installed via the .deb package (snap does not work in WSL):
```bash
wget -q https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
sudo apt-get install -f  # resolve any missing dependencies
```

`webdriver-manager` automatically downloads a matching ChromeDriver on first run, so no manual driver install is needed.
```bash
# Unit and integration tests (fast, no browser)
python -m pytest tests/ --ignore=tests/test_browser.py

# Browser tests (headless Chrome, ~45s)
python -m pytest tests/test_browser.py --headless

# Full suite
python -m pytest tests/ --headless

# With coverage
python -m pytest tests/ --ignore=tests/test_browser.py --cov=tater --cov-report=term-missing
```