Skip to content

RENCI/tater

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

273 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Tater

tater-icon

A Python library for building document annotation interfaces with Pydantic and Dash.

Define your annotation schema as a Pydantic model, pick widgets, and get a web-based annotation app with auto-save, progress tracking, and document navigation.

Installation

Install via uv (recommended) or pip.

uv

uv sync

By default, uv will create a virtual environment in .venv. To activate it:

source .venv/bin/activate

pip

Optionally create and activate a virtual environment first, e.g.:

python -m venv .venv
source .venv/bin/activate

Then install:

pip install .

For development (editable install):

pip install -e .

Quick Start

Create a Python config file (my_config.py):

from typing import Optional, Literal
from pydantic import BaseModel

class Schema(BaseModel):
    sentiment: Optional[Literal["positive", "negative", "neutral"]] = None
    is_relevant: bool = False

Run it — widgets are auto-generated from the schema:

tater --config my_config.py --documents data/documents.json

Or specify widgets explicitly:

from typing import Optional, Literal
from pydantic import BaseModel
from tater.widgets import SegmentedControlWidget, CheckboxWidget

class Schema(BaseModel):
    sentiment: Optional[Literal["positive", "negative", "neutral"]] = None
    is_relevant: bool = False

title = "My Annotator"

widgets = [
    SegmentedControlWidget("sentiment", label="Sentiment", required=True),
    CheckboxWidget("is_relevant", label="Relevant?"),
]

Alternatively, use a JSON schema file (my_schema.json):

{
  "spec_version": "1.0",
  "title": "My Annotator",
  "data_schema": [
    {
      "id": "sentiment",
      "type": "choice",
      "options": ["positive", "negative", "neutral"],
      "widget": {"type": "segmented_control", "label": "Sentiment", "required": true}
    },
    {"id": "is_relevant", "type": "boolean"}
  ]
}

Fields without a widget block get auto-generated default widgets.

tater --schema my_schema.json --documents data/documents.json

Example configs and schemas are in apps/.

Python config reference

A config file is a plain Python module. The tater CLI looks for these names:

Name Required Description
Schema yes Pydantic BaseModel subclass defining the annotation fields
widgets no List of TaterWidget instances. Omit to auto-generate all; supply a partial list to override specific fields and auto-generate the rest. SpanAnnotationWidget and hierarchical label widgets cannot be usefully auto-generated (entity types and hierarchy are required) — always include these explicitly.
title no App window title (default: "tater - document annotation")
description no Subtitle shown below the title
instructions no Markdown help text shown in the instructions drawer
register_callbacks no Callable (app: TaterApp) -> None called after widgets are registered; use for custom Dash callbacks and setting app.on_save

Widgets

Widgets are linked to Pydantic model fields by schema_field. Options for choice widgets are inferred from the field's Literal type — no manual list needed.

All widgets accept label, description, and most accept required.

Boolean

Widget Schema type Notes
CheckboxWidget bool
SwitchWidget bool Toggle switch
ChipWidget bool Single toggleable chip

Single choice

Widget Schema type Notes
SegmentedControlWidget Literal[...] Horizontal button group; vertical=True supported
RadioGroupWidget Literal[...] Radio buttons; vertical=True supported
SelectWidget Literal[...] Searchable dropdown
ChipRadioWidget Literal[...] Chip-style radio buttons; vertical=True supported

Multiple choice

Widget Schema type Notes
MultiSelectWidget list[Literal[...]] Searchable multi-select dropdown
CheckboxGroupWidget list[Literal[...]] Checkbox group; vertical=True supported

Numeric

Widget Schema type Extra params
NumberInputWidget int / float min_value, max_value, step
SliderWidget int / float min_value, max_value, step
RangeSliderWidget Optional[list[float]] min_value, max_value, step

Text

Widget Schema type Extra params
TextInputWidget str placeholder
TextAreaWidget str placeholder

Specialized

These widgets require explicit configuration and must always be included in your widgets list — they cannot be usefully auto-generated.

Span annotation

SpanAnnotationWidget — highlight text spans and assign entity types. Schema field must be list[SpanAnnotation]. Auto-generation produces a widget with no entity types.

from tater import SpanAnnotation
from tater.widgets import SpanAnnotationWidget, EntityType

SpanAnnotationWidget(
    "entities",
    label="Entities",
    entity_types=[
        EntityType("Medication"),
        EntityType("Diagnosis"),
        EntityType("Symptom"),
    ],
)

Colors are assigned automatically from the palette. To use a specific color for an entity, pass a hex string to EntityType:

EntityType("Medication", color="#4e79a7")
EntityType("Diagnosis", color="#e15759")

The palette parameter controls auto-assigned colors (default: "tableau10"). Palettes are from D3's categorical schemes:

Palette Description
category10 D3's category10
accent ColorBrewer Accent — mixed tones
dark2 ColorBrewer Dark2 — dark, saturated
observable10 Observable's 10-color palette
paired ColorBrewer Paired — 12 colors in light/dark pairs
pastel1 ColorBrewer Pastel1 — soft tones
pastel2 ColorBrewer Pastel2 — soft tones
set1 ColorBrewer Set1 — bold, high-contrast
set2 ColorBrewer Set2 — medium saturation
set3 ColorBrewer Set3 — light, 12 colors
tableau10 Tableau's 10-color categorical palette (default)
SpanAnnotationWidget("entities", label="Entities", palette="set1", entity_types=[...])

Hierarchical label

Navigate a tree hierarchy to select a node. Schema field must be str or Optional[str]. Optional[str] is indistinguishable from a plain text field during auto-generation.

from tater.widgets import (
    HierarchicalLabelCompactWidget,
    HierarchicalLabelFullWidget,
    HierarchicalLabelTagsWidget,
    load_hierarchy_from_yaml,
)

ontology = load_hierarchy_from_yaml("data/ontology.yaml")

# Chosen leaves appear as removable pills
HierarchicalLabelTagsWidget("tags", label="Tags", hierarchy=ontology)

# Shows only the selected node at each level (compact breadcrumb-style)
HierarchicalLabelCompactWidget("diagnosis", label="Diagnosis", hierarchy=ontology)

# Shows all siblings at every expanded level with a breadcrumb below the search bar
HierarchicalLabelFullWidget("diagnosis", label="Diagnosis", hierarchy=ontology)

All three accept searchable=True (default). Build a tree programmatically with build_tree(dict_or_list) or from a YAML file with load_hierarchy_from_yaml(path).

By default only leaf nodes can be selected. Pass allow_non_leaf=True to allow selecting any node — clicking a non-leaf selects it as the annotation value and also navigates into it to show its children. The selected node is indicated by a dark border regardless of depth:

HierarchicalLabelFullWidget("diagnosis", label="Diagnosis", hierarchy=ontology, allow_non_leaf=True)

Containers

GroupWidget — groups child widgets under a nested Pydantic model field:

class Address(BaseModel):
    city: str
    country: str

class Doc(BaseModel):
    address: Address

GroupWidget("address", label="Location", children=[
    TextInputWidget("address.city", label="City"),
    TextInputWidget("address.country", label="Country"),
])

ListableWidget — repeatable list of sub-widgets for list[SomeModel] fields, rendered as a vertical stack of cards:

ListableWidget("findings", label="Findings", item_label="Finding", item_widgets=[
    RadioGroupWidget("label", label="Label"),
])

TabsWidget — same as ListableWidget but items are shown as switchable tabs.

AccordionWidget — same as ListableWidget but items are shown as collapsible accordion panels.

Divider

DividerWidget — a labeled horizontal rule for visually separating sections. Has no schema field and does not contribute to the annotation model:

DividerWidget(label="Clinical Findings")
DividerWidget(label="Demographics", description="Patient background info")

JSON schema reference

A JSON schema file has this top-level structure:

{
  "spec_version": "1.0",
  "title": "My Annotator",
  "description": "Optional subtitle",
  "hierarchies": {
    "ontology": "path/to/ontology.yaml"
  },
  "data_schema": [ ... ]
}
Key Required Description
spec_version yes Must be "1.0"
data_schema yes Array of field definitions
title no App window title
description no Subtitle shown below the title
hierarchies no Map of named hierarchies (YAML file path or inline dict) used by hierarchical_label fields

Field definition

Every entry in data_schema (and item_fields / fields for repeater/group types) is either a field or a divider.

Field — has id and type at the top level. Data-schema keys only:

Key Required Description
id yes Field name (Pydantic field name and widget ID)
type yes Field type — see table below
options for choice/multi_choice List of option strings
default no Default value
fields for group Child field definitions
item_fields for repeater Item field definitions
widget no Widget config block — see below

Divider — no id or type; only a widget block:

{"widget": {"type": "divider", "label": "Section Heading"}}

Widget config block

All UI properties belong inside the widget block. widget.type is required for leaf fields when a widget block is present; it selects the widget class. Fields with no widget block get auto-generated default widgets.

Key Description
type Widget class — see field types table for valid values
label Display label (default: humanized id)
description Help text shown below the widget
required true marks the field for completion tracking
auto_advance true advances to the next document on selection (choice/boolean)
placeholder Placeholder text (text_input, text_area)
orientation "vertical" or "horizontal" (radio_group, chip_radio, checkbox_group, segmented_control)
min_value / max_value / step Bounds and step size (number_input, slider, range_slider)
entity_types List of entity type name strings (span_annotation)
hierarchy_ref Key into the top-level hierarchies dict (hierarchical_label)
searchable Enable search (default true) (hierarchical_label)
item_label Singular label for list items (listable, tabs, accordion)
conditional_on {"field": "field_id", "value": ...} — show this widget only when the named field equals the given value

Field types

type Schema type Default widget Widget type overrides
boolean bool checkbox switch, chip_boolean
choice Literal[...] segmented_control radio_group, select, chip_radio
multi_choice list[Literal[...]] multi_select checkbox_group
numeric float number_input slider
range_slider list[float] range_slider
text str text_input text_area
span_annotation list[SpanAnnotation] span_annotation
hierarchical_label Optional[str] hierarchical_label_tags hierarchical_label_compact, hierarchical_label_full
group nested model auto (GroupWidget)
repeater list[model] listable tabs, accordion

group — requires fields; widget.type is not used (there is only one GroupWidget). Provide a widget block to set label/description and explicitly control which child fields get widgets. Without a widget block, auto-generation covers the whole group.

repeater — requires item_fields; widget.type selects the layout (listable default, tabs, or accordion).

span_annotationentity_types is required in the widget block.

hierarchical_labelhierarchy_ref (in the widget block) must match a key in the top-level hierarchies dict.

Document format

JSON file listing documents to annotate. Document text can be provided inline or via a file path (exactly one is required):

[
  {"text": "Inline document text goes here."},
  {"text": "Inline document text also goes here.", "name": "Patient 1", "info": {"date": "2024-01-15"}},
  {"file_path": "data/note_001.txt"},
  {"file_path": "data/note_002.txt", "name": "Patient 2", "info": {"date": "2024-01-16"}}
]

Each document may have:

  • text — inline document text (use this or file_path, not both)
  • file_path — path to a .txt file; resolved relative to the documents file (use this or text, not both; not supported in hosted mode)
  • id — unique string ID (auto-generated as doc_000, doc_001, … if omitted)
  • name — display name
  • info — arbitrary metadata dict shown in the UI

Annotations file format

Auto-saved JSON keyed by document ID:

{
  "doc_000": {
    "annotations": {"sentiment": "positive", "summary": "Normal findings."},
    "metadata": {"flagged": false, "notes": "", "visited": true, "annotation_seconds": 42.0, "status": "complete"}
  }
}

Status values: "not_started", "in_progress", "complete".

CLI

tater --config CONFIG --documents PATH [options]
tater --schema SCHEMA --documents PATH [options]
tater --hosted [options]
Flag Description
--config PATH Python config file (one of --config / --schema required in single mode)
--schema PATH JSON schema file (one of --config / --schema required in single mode)
--documents PATH Documents JSON file (required in single mode)
--annotations PATH Annotations output file (default: <documents>_annotations.json)
--hosted Run in hosted mode (upload page at /, annotation UI at /annotate)
--port INT Server port (default: 8050)
--host STR Bind address (default: 127.0.0.1)
--debug Enable debug/hot-reload mode

Environment variables: TATER_PORT, TATER_HOST, TATER_DEBUG, TATER_SECRET_KEY.

Hosted mode

Hosted mode lets multiple users upload their own schema and documents and annotate independently — no server-side annotation state is kept between sessions.

tater --hosted --host 0.0.0.0 --port 8050

Flow — upload your own files:

  1. User visits / → upload page, "Upload files" tab
  2. Upload schema JSON and documents JSON; status icons confirm each file is valid
  3. If the schema references external hierarchy files, per-file upload zones appear automatically
  4. Optionally upload an existing annotations JSON to resume from a previous session
  5. Click Start Annotating → redirected to /annotate
  6. Annotate documents; click Download in the footer to save annotations as JSON
  7. Click the home icon in the header to start over

Flow — built-in examples:

  1. User visits / → click the "Browse examples" tab
  2. Click any example card → immediately redirected to /annotate with that example loaded

Hosted mode constraints vs. single mode:

  • No auto-save — annotations live in the browser (dcc.Store) and must be downloaded explicitly
  • file_path is not supported in documents — use inline text instead
  • Hierarchy files referenced by path in the schema must be uploaded separately (inline hierarchy dicts work without upload)

Testing

Setup

Install dev dependencies (includes pytest, dash[testing], and webdriver-manager):

uv sync --group dev
source .venv/bin/activate

Browser tests use Chrome by default, but Dash's testing framework supports other browsers — see the Dash testing docs for alternatives.

On macOS/Windows, install Chrome normally from google.com/chrome. On standard Linux, use your package manager or the official Linux install guide.

WSL users: Chrome must be installed via the .deb package (snap does not work in WSL):

wget -q https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
sudo apt-get install -f   # resolve any missing dependencies

webdriver-manager automatically downloads a matching ChromeDriver on first run — no manual driver install needed.

Running tests

# Unit and integration tests (fast, no browser)
python -m pytest tests/ --ignore=tests/test_browser.py

# Browser tests (headless Chrome, ~45s)
python -m pytest tests/test_browser.py --headless

# Full suite
python -m pytest tests/ --headless

# With coverage
python -m pytest tests/ --ignore=tests/test_browser.py --cov=tater --cov-report=term-missing

About

Manual document annotation

Resources

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors