Shared conversation data types for ASR, NLP, and conversation analysis across the Ushadow ecosystem.
life-datatypes defines a unified conversation schema that bridges Chronicle (conversation storage), Mycelia (conversation analysis), and other services that work with transcribed conversations.
The schema models conversations at three hierarchical levels:
Conversation (time-bounded dialog with start/end)
├── Segments (atomic timeline units, delimited by speaker/pause)
│   └── Like lines in a screenplay
├── Fragments (consecutive segments grouped by topic)
│   └── Discussion of a specific subject within the timeline
└── Threads (fragments grouped by broader theme)
    └── Can be non-consecutive across the conversation
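
As an illustration of how segments might be cut "by speaker/pause", here is a minimal sketch that groups word-level ASR tokens into segment-like dicts. The `Word` class, the `split_into_segments` helper, and the 1.0 second pause threshold are all assumptions made for this example; they are not part of life-datatypes.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word-level ASR token (not a life-datatypes type)."""
    text: str
    speaker: str
    start: float  # seconds
    end: float    # seconds


def split_into_segments(words, max_pause=1.0):
    """Start a new segment on a speaker change or a pause longer than max_pause."""
    segments = []
    for word in words:
        last = segments[-1] if segments else None
        same_speaker = last is not None and last["speaker"] == word.speaker
        short_pause = last is not None and word.start - last["end"] <= max_pause
        if same_speaker and short_pause:
            # Extend the current segment with this word.
            last["text"] += " " + word.text
            last["end"] = word.end
        else:
            # Open a new segment.
            segments.append({
                "text": word.text,
                "speaker": word.speaker,
                "start": word.start,
                "end": word.end,
            })
    return segments


words = [
    Word("Hello,", "alice", 0.0, 0.4),
    Word("how", "alice", 0.5, 0.7),
    Word("are", "alice", 0.8, 1.0),
    Word("you?", "alice", 1.1, 2.5),
    Word("I'm", "bob", 3.0, 3.3),
    Word("doing", "bob", 3.4, 3.8),
    Word("great!", "bob", 3.9, 4.5),
]
segments = split_into_segments(words)
for seg in segments:
    print(seg["speaker"], seg["start"], seg["end"], seg["text"])
```

Each resulting dict carries the same `text`, `speaker`, `start`, and `end` fields that a `Segment` expects, so it could be passed on to the real type once an `id` is added.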
pip install life-datatypes

from life_datatypes import Conversation, Segment, Fragment, Thread
# Use the types...

npm install life-datatypes

import { Conversation, Segment, Fragment, Thread } from 'life-datatypes'
// Use the types...

from life_datatypes import Segment, Fragment, Thread, Conversation
from datetime import datetime
from uuid import uuid4
# Create segments from ASR output (chronological order)
segment1 = Segment(
    id=str(uuid4()),
    text="Hello, how are you?",
    speaker="alice",
    start=0.0,
    end=2.5,
    confidence=0.98,
    language="en"
)
segment2 = Segment(
    id=str(uuid4()),
    text="I'm doing great!",
    speaker="bob",
    start=3.0,
    end=4.5,
    confidence=0.96,
    language="en"
)
# Create a fragment (consecutive segments grouped by topic)
fragment = Fragment(
    id=str(uuid4()),
    text="Hello, how are you? I'm doing great!",
    segmentIds=[segment1.id, segment2.id],  # Consecutive segments
    type="utterance",
    metadata={"topic": "greeting"}
)
# Create a thread (fragments grouped by broader theme)
thread = Thread(
    id=str(uuid4()),
    title="Opening Greetings",
    description="Initial hello exchange",
    fragmentIds=[fragment.id],
    participants=["alice", "bob"],
    startTime=0.0,
    endTime=4.5
)
# Assemble into a conversation
conversation = Conversation(
    id=str(uuid4()),
    title="Meeting with Alice",
    participants=["alice", "bob"],
    startTime=datetime.now(),
    segments=[segment1, segment2],
    fragments=[fragment],
    threads=[thread]
)
# Validate and serialize
json_data = conversation.model_dump_json()

import { Conversation, Segment, Fragment } from 'life-datatypes'
import { randomUUID } from 'crypto'
const segment: Segment = {
  id: randomUUID(),
  text: 'Hello, how are you?',
  speaker: 'alice',
  start: 0.0,
  end: 2.5,
  confidence: 0.98,
  language: 'en'
}
const conversation: Conversation = {
  id: randomUUID(),
  title: 'Meeting with Alice',
  participants: ['alice', 'bob'],
  startTime: new Date().toISOString(),
  segments: [segment]
}
// Serialize to JSON
const json = JSON.stringify(conversation)

A contiguous span of transcribed text from one speaker, with timing metadata.
Segment(
    id: str            # Unique identifier
    text: str          # Transcribed text
    speaker: str       # Speaker ID or name
    start: float       # Start time (seconds)
    end: float         # End time (seconds)
    confidence: float  # ASR confidence (0-1)
    language: str      # Language code (e.g., 'en')
    metadata: dict     # Provider-specific data
)

A topic-linked group of consecutive segments (e.g., discussion of a specific subject).
Fragments must reference adjacent segments in the chronological timeline.
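
The adjacency rule can be checked mechanically. The sketch below assumes segment IDs are unique and that the conversation's segment IDs are available as a list in chronological order; `is_consecutive` is a hypothetical helper written for this example, not a function exported by life-datatypes, and treating an empty list as invalid is a choice made here.

```python
def is_consecutive(segment_ids, ordered_segment_ids):
    """Return True if segment_ids form one unbroken, in-order run of the timeline."""
    if not segment_ids:
        return False  # assumption: a fragment must reference at least one segment
    try:
        first = ordered_segment_ids.index(segment_ids[0])
    except ValueError:
        return False  # the first ID is not in the timeline at all
    # Compare against the slice of the timeline that starts at the first ID.
    window = ordered_segment_ids[first:first + len(segment_ids)]
    return window == list(segment_ids)


timeline = ["s1", "s2", "s3", "s4"]
print(is_consecutive(["s2", "s3"], timeline))  # True: adjacent in the timeline
print(is_consecutive(["s1", "s3"], timeline))  # False: skips s2
```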
Fragment(
    id: str                # Unique identifier
    text: str              # Fragment text (combined from consecutive segments)
    segmentIds: list[str]  # References to consecutive segment IDs (must be adjacent)
    type: str              # 'sentence' | 'clause' | 'turn' | 'utterance' | 'other'
    start: float           # Start time (from first segment)
    end: float             # End time (from last segment)
    metadata: dict         # Fragment-specific metadata
)

A broader topic grouping of fragments across the conversation.
Unlike fragments (which are consecutive), threads can span non-consecutive fragments. For example, a "budget" thread might include fragments scattered throughout the conversation.
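
To make the "budget" example concrete, the following sketch collects non-consecutive fragments into a thread-like dict and derives its time bounds as the earliest fragment start and the latest fragment end. It uses plain dicts instead of the real models so that it runs without the package installed; the fragment data, the `topic` metadata key, and the `thread_bounds` helper are illustrative assumptions.

```python
# Hypothetical fragments keyed by ID; f1 and f3 discuss budget, f2 interrupts them.
fragments = {
    "f1": {"start": 0.0, "end": 30.0, "metadata": {"topic": "budget"}},
    "f2": {"start": 30.0, "end": 95.0, "metadata": {"topic": "hiring"}},
    "f3": {"start": 95.0, "end": 140.0, "metadata": {"topic": "budget"}},
}


def thread_bounds(fragment_ids, fragments_by_id):
    """Return (earliest start, latest end) over the referenced fragments."""
    chosen = [fragments_by_id[fid] for fid in fragment_ids]
    return min(f["start"] for f in chosen), max(f["end"] for f in chosen)


budget_ids = [fid for fid, f in fragments.items() if f["metadata"]["topic"] == "budget"]
start_time, end_time = thread_bounds(budget_ids, fragments)

budget_thread = {
    "title": "Budget",
    "fragmentIds": budget_ids,  # f1 and f3: not adjacent in the timeline
    "startTime": start_time,
    "endTime": end_time,
}
print(budget_thread)
```

Note that the thread's time range covers the interruption as well, so `startTime` and `endTime` describe the span of the theme, not the amount of time spent on it.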
Thread(
    id: str                  # Unique identifier
    title: str               # Thread title/topic
    description: str         # Detailed description
    fragmentIds: list[str]   # Fragment IDs (can be non-consecutive)
    participants: list[str]  # Speakers involved
    startTime: float         # Earliest fragment start (seconds)
    endTime: float           # Latest fragment end (seconds)
    metadata: dict           # Tags, sentiment, importance, etc.
)

The complete conversation with all three levels: segments (atomic timeline), fragments (consecutive topic groupings), and threads (broader themes).
Conversation(
    id: str                    # Unique identifier
    title: str                 # Conversation title
    description: str           # Summary
    participants: list[str]    # All speakers
    startTime: str             # ISO 8601 timestamp
    endTime: str               # ISO 8601 timestamp
    duration: float            # Total duration (seconds)
    segments: list[Segment]    # All segments (chronological order)
    fragments: list[Fragment]  # Topic-grouped consecutive segments
    threads: list[Thread]      # Broader topic groupings
    metadata: dict             # Conversation-level metadata
)

The schema is language-agnostic, hierarchical, and version-stable:
- Single source of truth: schema/conversation.json (JSON Schema Draft 7)
- Type generation: Automatically generated from schema for Python (Pydantic) and TypeScript
- Three-level hierarchy: Segments (atomic) → Fragments (consecutive topic grouping) → Threads (broader themes)
- Extensibility: metadata fields on every type allow provider-specific extensions
- Validation: Built-in validation in generated Pydantic models
Three Levels:
- Segments: Delimited by speaker change or pause — atomic timeline units (like screenplay lines)
- Fragments: Consecutive segments linked by topic — local coherence within timeline
- Threads: Fragments linked by broader theme — can be scattered across conversation (non-consecutive)
This mirrors real conversation dynamics: you discuss budget with interruptions for other topics, then return to budget later. Threads capture this pattern; fragments capture the contiguous bits.
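
Because the levels reference each other by ID, a thread can be resolved back to the lines that make it up. The sketch below walks thread → fragments → segments and returns the lines in chronological order. It uses plain dicts with the field names from the type definitions; the sample data and the `thread_transcript` helper are assumptions made for illustration only.

```python
def thread_transcript(thread, fragments_by_id, segments_by_id):
    """Resolve a thread to 'speaker: text' lines, ordered by segment start time."""
    segment_ids = [
        sid
        for fid in thread["fragmentIds"]
        for sid in fragments_by_id[fid]["segmentIds"]
    ]
    ordered = sorted((segments_by_id[sid] for sid in segment_ids),
                     key=lambda s: s["start"])
    return [f'{s["speaker"]}: {s["text"]}' for s in ordered]


# Hypothetical data: the budget topic is interrupted by s2 and resumes at s3.
segments_by_id = {
    "s1": {"speaker": "alice", "text": "Let's review the budget.", "start": 0.0},
    "s2": {"speaker": "bob", "text": "Quick note on hiring first.", "start": 5.0},
    "s3": {"speaker": "alice", "text": "Back to the budget.", "start": 12.0},
}
fragments_by_id = {
    "f1": {"segmentIds": ["s1"]},
    "f2": {"segmentIds": ["s2"]},
    "f3": {"segmentIds": ["s3"]},
}
budget = {"title": "Budget", "fragmentIds": ["f3", "f1"]}  # order does not matter

lines = thread_transcript(budget, fragments_by_id, segments_by_id)
for line in lines:
    print(line)
```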
- Edit schema/conversation.json
- Regenerate types:
  npm run codegen:typescript
  python scripts/codegen-python.py
- Test generators with examples:
  python examples/python/example_conversation.py
  node examples/typescript/example_conversation.ts
- Create a pull request with schema changes and regenerated types
Tag a version and push to GitHub:
git tag v1.0.1
git push origin v1.0.1

The GitHub Actions workflow automatically:
- Regenerates types
- Publishes to npm and PyPI
- Creates a GitHub Release
MIT — see LICENSE
- Ushadow Contributors