life-datatypes

Shared conversation data types for ASR, NLP, and conversation analysis across the Ushadow ecosystem.

Overview

life-datatypes defines a unified conversation schema that bridges Chronicle (conversation storage), Mycelia (conversation analysis), and other services that work with transcribed conversations.

The schema models conversations at three hierarchical levels:

Conversation (time-bounded dialog with start/end)
├── Segments (atomic timeline units, delimited by speaker/pause)
│   └── Like lines in a screenplay
├── Fragments (consecutive segments grouped by topic)
│   └── Discussion of a specific subject within the timeline
└── Threads (fragments grouped by broader theme)
    └── Can be non-consecutive across the conversation

Installation

Python (PyPI)

pip install life-datatypes

from life_datatypes import Conversation, Segment, Fragment, Thread

# Use the types...

TypeScript / JavaScript (npm)

npm install life-datatypes

import { Conversation, Segment, Fragment, Thread } from 'life-datatypes'

// Use the types...

Quick Example

Python

from life_datatypes import Segment, Fragment, Thread, Conversation
from datetime import datetime
from uuid import uuid4

# Create segments from ASR output (chronological order)
segment1 = Segment(
    id=str(uuid4()),
    text="Hello, how are you?",
    speaker="alice",
    start=0.0,
    end=2.5,
    confidence=0.98,
    language="en"
)

segment2 = Segment(
    id=str(uuid4()),
    text="I'm doing great!",
    speaker="bob",
    start=3.0,
    end=4.5,
    confidence=0.96,
    language="en"
)

# Create a fragment (consecutive segments grouped by topic)
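fragment = Fragment(
    id=str(uuid4()),
    text="Hello, how are you? I'm doing great!",
    segmentIds=[segment1.id, segment2.id],  # Consecutive segments
    type="utterance",
    start=segment1.start,  # Start time comes from the first segment
    end=segment2.end,      # End time comes from the last segment
    metadata={"topic": "greeting"}
)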

# Create a thread (fragments grouped by broader theme)
thread = Thread(
    id=str(uuid4()),
    title="Opening Greetings",
    description="Initial hello exchange",
    fragmentIds=[fragment.id],
    participants=["alice", "bob"],
    startTime=0.0,
    endTime=4.5
)

# Assemble into a conversation
conversation = Conversation(
    id=str(uuid4()),
    title="Meeting with Alice",
    participants=["alice", "bob"],
    startTime=datetime.now().astimezone().isoformat(),  # ISO 8601 string, as in the data model
    segments=[segment1, segment2],
    fragments=[fragment],
    threads=[thread]
)

# Serialize to JSON (Pydantic validates the fields when the model is constructed)
json_data = conversation.model_dump_json()

TypeScript

import { Conversation, Segment } from 'life-datatypes'
import { randomUUID } from 'crypto'

const segment: Segment = {
  id: randomUUID(),
  text: 'Hello, how are you?',
  speaker: 'alice',
  start: 0.0,
  end: 2.5,
  confidence: 0.98,
  language: 'en'
}

const conversation: Conversation = {
  id: randomUUID(),
  title: 'Meeting with Alice',
  participants: ['alice', 'bob'],
  startTime: new Date().toISOString(),
  segments: [segment]
}

// Serialize to JSON
const json = JSON.stringify(conversation)

Data Model

Segment

A contiguous span of transcribed text from a single speaker, with timing metadata.

Segment(
    id: str                    # Unique identifier
    text: str                  # Transcribed text
    speaker: str               # Speaker ID or name
    start: float               # Start time (seconds)
    end: float                 # End time (seconds)
    confidence: float          # ASR confidence (0-1)
    language: str              # Language code (e.g., 'en')
    metadata: dict             # Provider-specific data
)
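Segments are the atomic timeline units, delimited by speaker change or pause. A minimal sketch of how word-level ASR output could be grouped into segments is shown here. The `Word` and `SegmentSketch` classes, the `words_to_segments` helper, and the 1.0 second `max_pause` threshold are all assumptions for illustration; they are not part of life-datatypes.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class Word:
    """Hypothetical word-level ASR token; not part of life-datatypes."""
    text: str
    speaker: str
    start: float
    end: float


@dataclass
class SegmentSketch:
    """Stand-in mirroring some Segment fields (not the generated model)."""
    text: str
    speaker: str
    start: float
    end: float
    id: str = field(default_factory=lambda: str(uuid4()))


def words_to_segments(words, max_pause=1.0):
    """Group chronologically ordered words into segments.

    A new segment starts when the speaker changes or when the gap since
    the previous word exceeds max_pause seconds (an assumed threshold).
    """
    segments = []
    for word in words:
        last = segments[-1] if segments else None
        if (
            last is not None
            and last.speaker == word.speaker
            and word.start - last.end <= max_pause
        ):
            # Same speaker and a short gap: extend the current segment.
            last.text = f"{last.text} {word.text}"
            last.end = word.end
        else:
            # Speaker change or long pause: open a new segment.
            segments.append(
                SegmentSketch(word.text, word.speaker, word.start, word.end)
            )
    return segments


words = [
    Word("Hello,", "alice", 0.0, 0.4),
    Word("how", "alice", 0.5, 0.7),
    Word("are", "alice", 0.8, 1.0),
    Word("you?", "alice", 1.1, 2.5),
    Word("I'm", "bob", 3.0, 3.3),
    Word("doing", "bob", 3.4, 3.8),
    Word("great!", "bob", 3.9, 4.5),
]
segments = words_to_segments(words)
```

With this input the helper yields two segments, one per speaker, matching the timings used in the Quick Example.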

Fragment

A topic-linked group of consecutive segments (e.g., discussion of a specific subject).

Fragments must reference adjacent segments in the chronological timeline.

Fragment(
    id: str                    # Unique identifier
    text: str                  # Fragment text (combined from consecutive segments)
    segmentIds: list[str]      # References to consecutive segment IDs (must be adjacent)
    type: str                  # 'sentence' | 'clause' | 'turn' | 'utterance' | 'other'
    start: float               # Start time (from first segment)
    end: float                 # End time (from last segment)
    metadata: dict             # Fragment-specific metadata
)
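The adjacency rule and the derived `text`, `start`, and `end` fields can be sketched with plain dictionaries. This is an illustration only: `build_fragment` is a hypothetical helper, and whether the generated models enforce adjacency themselves is not stated in this README.

```python
from uuid import uuid4


def build_fragment(segments, segment_ids, fragment_type="other"):
    """Build a fragment dict from chronologically ordered segment dicts.

    Raises ValueError if segment_ids do not form one adjacent run
    within `segments`.
    """
    if not segment_ids:
        raise ValueError("a fragment needs at least one segment")
    positions = {seg["id"]: i for i, seg in enumerate(segments)}
    try:
        indices = [positions[sid] for sid in segment_ids]
    except KeyError as exc:
        raise ValueError(f"unknown segment id: {exc.args[0]}") from None
    # Adjacent means each index is exactly one past the previous index.
    expected = list(range(indices[0], indices[0] + len(indices)))
    if indices != expected:
        raise ValueError("segmentIds must be adjacent and in timeline order")
    members = [segments[i] for i in indices]
    return {
        "id": str(uuid4()),
        "text": " ".join(seg["text"] for seg in members),
        "segmentIds": list(segment_ids),
        "type": fragment_type,
        "start": members[0]["start"],  # from the first segment
        "end": members[-1]["end"],     # from the last segment
        "metadata": {},
    }


segments = [
    {"id": "s1", "text": "Hello, how are you?", "start": 0.0, "end": 2.5},
    {"id": "s2", "text": "I'm doing great!", "start": 3.0, "end": 4.5},
    {"id": "s3", "text": "Shall we start?", "start": 5.0, "end": 6.0},
]
fragment = build_fragment(segments, ["s1", "s2"])
```

Passing `["s1", "s3"]` instead raises `ValueError`, because `s2` sits between them in the timeline.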

Thread

A broader topic grouping of fragments across the conversation.

Unlike fragments (which are consecutive), threads can span non-consecutive fragments. For example, a "budget" thread might include fragments scattered throughout the conversation.

Thread(
    id: str                    # Unique identifier
    title: str                 # Thread title/topic
    description: str           # Detailed description
    fragmentIds: list[str]     # Fragment IDs (can be non-consecutive)
    participants: list[str]    # Speakers involved
    startTime: float           # Earliest fragment start (seconds)
    endTime: float             # Latest fragment end (seconds)
    metadata: dict             # Tags, sentiment, importance, etc.
)
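Because a thread may span non-consecutive fragments, its `startTime` and `endTime` are the earliest start and latest end among its members. A minimal sketch with plain dictionaries follows; `thread_bounds` is a hypothetical helper, not a function exported by the package.

```python
def thread_bounds(fragments, fragment_ids):
    """Return (startTime, endTime) in seconds for the given fragment IDs."""
    by_id = {frag["id"]: frag for frag in fragments}
    members = [by_id[fid] for fid in fragment_ids]  # KeyError on unknown ID
    if not members:
        raise ValueError("a thread needs at least one fragment")
    return (
        min(frag["start"] for frag in members),
        max(frag["end"] for frag in members),
    )


fragments = [
    {"id": "f1", "start": 10.0, "end": 40.0},    # budget
    {"id": "f2", "start": 40.0, "end": 55.0},    # lunch plans
    {"id": "f3", "start": 120.0, "end": 150.0},  # budget again
]
start_time, end_time = thread_bounds(fragments, ["f1", "f3"])
budget_thread = {
    "id": "t1",
    "title": "Budget",
    "fragmentIds": ["f1", "f3"],
    "startTime": start_time,
    "endTime": end_time,
}
```

Note that the resulting time range also covers the unrelated `f2` fragment; the thread's bounds describe its extent, not continuous coverage.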

Conversation

The complete conversation with all three levels: segments (atomic timeline), fragments (consecutive topic groupings), and threads (broader themes).

Conversation(
    id: str                    # Unique identifier
    title: str                 # Conversation title
    description: str           # Summary
    participants: list[str]    # All speakers
    startTime: str             # ISO 8601 timestamp
    endTime: str               # ISO 8601 timestamp
    duration: float            # Total duration (seconds)
    segments: list[Segment]    # All segments (chronological order)
    fragments: list[Fragment]  # Topic-grouped consecutive segments
    threads: list[Thread]      # Broader topic groupings
    metadata: dict             # Conversation-level metadata
)
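Since fragments point at segments and threads point at fragments by ID, a consumer may want to check that every reference resolves before storing or analysing a conversation. The sketch below works on a conversation loaded as a plain dictionary; `find_dangling_references` is a hypothetical helper and not part of the package.

```python
def find_dangling_references(conversation):
    """Return a list of problems; an empty list means all IDs resolve."""
    problems = []
    segment_ids = {seg["id"] for seg in conversation.get("segments", [])}
    fragment_ids = {frag["id"] for frag in conversation.get("fragments", [])}
    # Every fragment must reference segments present in the conversation.
    for frag in conversation.get("fragments", []):
        for sid in frag.get("segmentIds", []):
            if sid not in segment_ids:
                problems.append(
                    f"fragment {frag['id']} references unknown segment {sid}"
                )
    # Every thread must reference fragments present in the conversation.
    for thread in conversation.get("threads", []):
        for fid in thread.get("fragmentIds", []):
            if fid not in fragment_ids:
                problems.append(
                    f"thread {thread['id']} references unknown fragment {fid}"
                )
    return problems


conversation = {
    "id": "c1",
    "segments": [{"id": "s1"}, {"id": "s2"}],
    "fragments": [{"id": "f1", "segmentIds": ["s1", "s2"]}],
    "threads": [{"id": "t1", "fragmentIds": ["f1", "f9"]}],
}
problems = find_dangling_references(conversation)
```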

Schema Design

The schema is language-agnostic, hierarchical, and version-stable:

Single source of truth: schema/conversation.json (JSON Schema Draft 7)
Type generation: Python (Pydantic) and TypeScript types are generated automatically from the schema
Three-level hierarchy: Segments (atomic) → Fragments (consecutive topic grouping) → Threads (broader themes)
Extensibility: metadata fields on every type allow provider-specific extensions
Validation: Built-in validation in generated Pydantic models
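As an illustration of the extensibility point, one possible convention is to keep provider-specific values under a per-provider key inside `metadata`, so that extensions from different services do not collide. This is an assumed convention rather than something the schema prescribes; the `with_provider_metadata` helper, the `example_asr` provider name, and its keys are hypothetical.

```python
import json


def with_provider_metadata(record, provider, data):
    """Return a copy of record with data merged into metadata[provider]."""
    metadata = dict(record.get("metadata") or {})
    metadata[provider] = {**metadata.get(provider, {}), **data}
    return {**record, "metadata": metadata}


segment = {
    "id": "s1",
    "text": "Hello, how are you?",
    "speaker": "alice",
    "start": 0.0,
    "end": 2.5,
    "confidence": 0.98,
    "language": "en",
    "metadata": {},
}

# Attach hypothetical provider output without touching the core fields.
enriched = with_provider_metadata(
    segment, "example_asr", {"model": "large", "avg_logprob": -0.12}
)

# The extension survives a JSON round trip unchanged.
restored = json.loads(json.dumps(enriched))
```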

Design Decisions

Three Levels:

Segments: Delimited by speaker change or pause — atomic timeline units (like screenplay lines)
Fragments: Consecutive segments linked by topic — local coherence within timeline
Threads: Fragments linked by broader theme — can be scattered across conversation (non-consecutive)

This mirrors real conversation dynamics: you discuss budget with interruptions for other topics, then return to budget later. Threads capture this pattern; fragments capture the contiguous bits.
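The budget scenario can be sketched as follows, assuming each fragment carries a `topic` key in its `metadata` (as in the Quick Example). The `group_into_threads` helper is hypothetical; real thread detection would presumably come from an analysis service rather than an exact label match.

```python
from collections import defaultdict


def group_into_threads(fragments):
    """Group fragments by metadata["topic"]; returns {topic: [fragment IDs]}."""
    threads = defaultdict(list)
    for frag in fragments:
        topic = frag.get("metadata", {}).get("topic")
        if topic is not None:
            threads[topic].append(frag["id"])
    return dict(threads)


# Fragments in chronological order: budget, an interruption, budget again.
fragments = [
    {"id": "f1", "start": 0.0, "end": 60.0, "metadata": {"topic": "budget"}},
    {"id": "f2", "start": 60.0, "end": 90.0, "metadata": {"topic": "hiring"}},
    {"id": "f3", "start": 90.0, "end": 150.0, "metadata": {"topic": "budget"}},
]
threads = group_into_threads(fragments)
```

Here the `budget` thread collects `f1` and `f3` even though `f2` sits between them, which is the non-consecutive grouping that fragments alone cannot express.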

Contributing

Modifying the Schema

Edit schema/conversation.json

Regenerate types:

npm run codegen:typescript
python scripts/codegen-python.py

Test the generated types with the examples:

python examples/python/example_conversation.py
node examples/typescript/example_conversation.ts

Create a pull request with schema changes and regenerated types

Publishing a Release

Tag a version and push to GitHub:

git tag v1.0.1
git push origin v1.0.1

The GitHub Actions workflow automatically:

Regenerates types
Publishes to npm and PyPI
Creates a GitHub Release

License

MIT — see LICENSE

Maintainers

Ushadow Contributors

Related Projects

Chronicle — Conversation storage and retrieval
Mycelia — Conversation analysis and insights
