|
| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +ULTK (Unnatural Language ToolKit) is a Python library for computational semantic typology research — specifically for "efficient communication" analyses that explain natural language structure in terms of competing pressures: minimizing cognitive complexity vs. maximizing communicative accuracy. |
| 8 | + |
| 9 | +## Commands |
| 10 | + |
| 11 | +```bash |
| 12 | +# Install all dependencies (including dev group for tests) |
| 13 | +uv sync --group dev |
| 14 | + |
| 15 | +# Run all tests |
| 16 | +uv run pytest src/tests/ |
| 17 | + |
| 18 | +# Run a single test file |
| 19 | +uv run pytest src/tests/test_language.py |
| 20 | + |
| 21 | +# Run a single test by name |
| 22 | +uv run pytest src/tests/test_language.py::TestLanguage::test_name |
| 23 | + |
| 24 | +# Format code (Black is enforced via CI on PRs) |
| 25 | +black src/ |
| 26 | +``` |
| 27 | + |
| 28 | +Tests are discovered automatically by pytest from `src/tests/`. The CI workflow runs `uv run pytest src/tests/` from the repo root. |
| 29 | + |
| 30 | +## Architecture |
| 31 | + |
| 32 | +### Two Main Modules |
| 33 | + |
| 34 | +**`ultk.language`** — Core data structures for semantic representations: |
| 35 | +- `semantics.py`: `Referent` (immutable semantic object), `Universe` (collection of Referents with a prior distribution), `Meaning` (mapping from Universe to arbitrary type T — e.g., booleans for truth values) |
| 36 | +- `language.py`: `Expression` (form + meaning pair), `Language` (frozenset of Expressions sharing a Universe). The helper `aggregate_expression_complexity()` bridges `ultk.language` and `ultk.effcomm`. |
| 37 | +- `sampling.py`: Generators for all meanings, expressions, and languages from a universe — used to enumerate the full hypothesis space. |
| 38 | +- `grammar/`: A probabilistic context-free grammar (PCFG) framework for building expressions as programs in a Language of Thought. `grammar.py` defines `Rule` and `Grammar`/`GrammaticalExpression`; `likelihood.py` provides scoring functions; `inference.py` handles MDL/Bayesian inference. |
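To make the hypothesis-space enumeration described for `sampling.py` concrete, here is a minimal sketch that lists every boolean meaning over a small universe. The function name `all_boolean_meanings` and the string referents are illustrative assumptions, not ULTK's actual API; the real generators operate on `Universe` and `Meaning` objects.

```python
from itertools import product


def all_boolean_meanings(referents):
    """Yield every assignment of True/False to the referents.

    Each meaning is a tuple of booleans parallel to `referents`
    (hypothetical stand-in for a Meaning over a Universe).
    """
    yield from product((False, True), repeat=len(referents))


# Hypothetical three-referent universe.
referents = ("r1", "r2", "r3")
meanings = list(all_boolean_meanings(referents))

# A universe of n referents has 2**n boolean meanings.
print(len(meanings))
```

The count grows as 2**n in the number of referents, which is why lazy generators are a natural fit for this kind of enumeration.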
| 39 | + |
| 40 | +**`ultk.effcomm`** — Efficient communication analysis tools: |
| 41 | +- `agent.py`: RSA (Rational Speech Act) agents — `LiteralSpeaker`, `LiteralListener`, `PragmaticSpeaker`, `PragmaticListener` — represented as weight matrices. |
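| 42 | +- `informativity.py`: `informativity()` and `communicative_success()`, which compute how well a language supports communication (vectorized as `sum((diag(prior) @ S @ R) ⊙ U)`, where `S` and `R` are the speaker and listener weight matrices and `U` is the utility matrix). |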
| 43 | +- `tradeoff.py`: Pareto front computation (`pareto_optimal_languages`, `non_dominated_2d`, `dominates`) for simplicity/informativeness trade-off analysis. |
| 44 | +- `optimization.py`: `EvolutionaryOptimizer` — iterative algorithm to approximate the Pareto frontier via mutations (`AddExpression`, `RemoveExpression`). |
| 45 | +- `sampling.py`: `get_hypothetical_variants()` — generates null-hypothesis languages by permuting speaker weight matrices. |
| 46 | +- `analysis.py`: Aggregation utilities for building results DataFrames. |
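The two core computations in the list above, communicative success and Pareto dominance, can be sketched in plain Python. This is a sketch under stated assumptions rather than ULTK's implementation: the `*_sketch` names are hypothetical, matrices are nested lists instead of NumPy arrays so that the indices are explicit, and both trade-off coordinates are assumed to be costs to minimize.

```python
def communicative_success_sketch(prior, S, R, U):
    """Sum over meanings m, words w and guesses m_hat of
    prior[m] * S[m][w] * R[w][m_hat] * U[m][m_hat].

    This is the elementwise form of sum((diag(prior) @ S @ R) * U).
    """
    total = 0.0
    for m, p_m in enumerate(prior):
        for w, s_mw in enumerate(S[m]):
            for m_hat, r_wm in enumerate(R[w]):
                total += p_m * s_mw * r_wm * U[m][m_hat]
    return total


def dominates_sketch(p1, p2):
    """True if p1 is no worse than p2 everywhere and strictly better somewhere
    (assuming lower is better on every axis)."""
    no_worse = all(a <= b for a, b in zip(p1, p2))
    strictly_better = any(a < b for a, b in zip(p1, p2))
    return no_worse and strictly_better


def non_dominated_sketch(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates_sketch(q, p) for q in points)]


identity = [[1.0, 0.0], [0.0, 1.0]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
prior = [0.5, 0.5]

perfect = communicative_success_sketch(prior, identity, identity, identity)
guessing = communicative_success_sketch(prior, uniform, uniform, identity)
front = non_dominated_sketch([(1, 3), (2, 2), (3, 1), (3, 3)])
```

With an identity utility matrix, a one-to-one speaker and listener communicate perfectly, while uniform agents succeed only at chance; the point `(3, 3)` drops out of the front because `(2, 2)` dominates it.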
| 47 | + |
| 48 | +**`ultk.util`**: |
| 49 | +- `frozendict.py`: `FrozenDict` — an immutable dict used extensively as keys in frozen dataclasses. |
| 50 | +- `io.py`: I/O helpers. |
| 51 | + |
| 52 | +### Key Design Patterns |
| 53 | + |
| 54 | +- Core objects (`Universe`, `Meaning`, `Expression`) are **frozen/immutable** (`@dataclass(frozen=True)` or manual `_frozen` flag), enabling hashing and use as dict keys. |
| 55 | +- `Meaning` stores its mapping as a `tuple[T, ...]` indexed parallel to `Universe.referents`, with `_ref_to_idx` for O(1) lookup. Access via `meaning[referent]`. |
| 56 | +- `Language` stores expressions as a `frozenset` — order-independent, hashable. |
| 57 | +- Grammar rules are defined via Python type annotations; `Rule.from_callable()` introspects function signatures to build rules automatically. |
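The patterns above can be illustrated with a small standalone sketch. All names here (`SketchUniverse`, `SketchMeaning`, `rule_spec_from_callable`, `conj`) are hypothetical and only mimic the described design; ULTK's real classes and `Rule.from_callable()` may differ in their fields and return types.

```python
import inspect
from dataclasses import dataclass, field
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class SketchUniverse:
    referents: tuple[str, ...]


@dataclass(frozen=True)
class SketchMeaning(Generic[T]):
    universe: SketchUniverse
    values: tuple[T, ...]  # parallel to universe.referents
    # Derived lookup table; excluded from equality and hashing.
    _ref_to_idx: dict = field(init=False, repr=False, compare=False)

    def __post_init__(self):
        if len(self.values) != len(self.universe.referents):
            raise ValueError("values must be parallel to universe.referents")
        index = {ref: i for i, ref in enumerate(self.universe.referents)}
        # Frozen dataclasses block normal assignment, so bypass it once here.
        object.__setattr__(self, "_ref_to_idx", index)

    def __getitem__(self, referent):
        # O(1) lookup: referent -> index -> stored value.
        return self.values[self._ref_to_idx[referent]]


def rule_spec_from_callable(func):
    """Read (name, return type, argument types) from type annotations."""
    sig = inspect.signature(func)
    arg_types = tuple(p.annotation for p in sig.parameters.values())
    return func.__name__, sig.return_annotation, arg_types


def conj(p: bool, q: bool) -> bool:
    return p and q


universe = SketchUniverse(("a", "b", "c"))
meaning = SketchMeaning(universe, (True, False, True))
spec = rule_spec_from_callable(conj)
```

Because the dataclasses are frozen and the lookup table is excluded from comparison, two meanings with the same universe and values hash identically and can serve as dict keys.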
| 58 | + |
| 59 | +### Examples |
| 60 | + |
| 61 | +`src/examples/` contains complete worked analyses: |
| 62 | +- `indefinites/` — efficient communication analysis of indefinite pronouns |
| 63 | +- `modals/` — semantic universals for modals |
| 64 | +- `learn_quant/` — quantifier learning |