51 changes: 37 additions & 14 deletions docs/glossary.md
@@ -30,26 +30,39 @@ Here are some definitions of some key ideas encountered in this documentation.
 tree
 : A "gene tree", i.e., the genealogical tree describing how a collection of
 genomes (usually at the tips of the tree) are related to each other at some
-chromosomal location. See {ref}`sec_nodes_or_individuals` for discussion
-of what a "genome" is.
+chromosomal {ref}`position <sec_data_model_definitions_position>` or location.
+As the trees may vary depending on this location, they are also known as "local
+trees". See {ref}`sec_nodes_or_individuals` for discussion of what a "genome" is.

 (sec_data_model_definitions_tree_sequence)=

 tree sequence
-: A "succinct tree sequence" (or tree sequence, for brevity) is an efficient
-encoding of a sequence of correlated trees, such as one encounters looking
-at the gene trees along a genome. A tree sequence efficiently captures the
-structure shared by adjacent trees, (essentially) storing only what differs
-between them.
+: A "succinct tree sequence" (or tree sequence, for brevity) is an object
+that stores the genetic ancestry and mutational history of a set of
+aligned DNA sequences or genomes. The name reflects the idea that a common
+way to treat genetic ancestry is as a sequence of correlated
+{ref}`trees <sec_data_model_definitions_tree>` at different chromosomal
+{ref}`positions <sec_data_model_definitions_position>`.
+Branches that are shared between these trees are efficiently stored as a
+single {ref}`edge <sec_data_model_definitions_edge>`, and adjacent trees
+may differ by only a few such edges. These edges connect
+{ref}`nodes <sec_data_model_definitions_node>` (genomes) in
+the tree sequence, forming a
+network or graph. Graphs of this sort are sometimes called ancestral
+recombination graphs (ARGs), hence tree sequences provide a
+flexible way to encode multiple types of ARG.

 (sec_data_model_definitions_node)=

 node
-: Each branching point in each tree is associated with a particular genome
+: Any point in a tree can be associated with a particular genome
 in a particular ancestor, called a "node". Since each node represents a
-specific genome it has a unique `time`, thought of as its birth time,
-which determines the height of any branching points it is associated with.
-See {ref}`sec_nodes_or_individuals` for discussion of what a "node" is.
+specific genome it has a unique `time`, thought of as its birth time. Nodes
+may or may not correspond to branching points, either in a local
+{ref}`tree <sec_data_model_definitions_tree>` or in the whole graph.
+However a branching point must always be associated with a node.
+See {ref}`sec_nodes_or_individuals` for discussion of what a "node"
+represents.

 (sec_data_model_definitions_individual)=

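The revised "tree sequence" entry says that a branch shared between local trees is stored once as a single edge. A minimal sketch of that idea in plain Python follows. It is not the `tskit` API: the `Edge` type, the `local_tree` helper, and the toy node IDs and coordinates are all hypothetical, and edges are assumed to be `(left, right, parent, child)` tuples covering the half-open interval `[left, right)`, as in the glossary's "edge" entry.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """Hypothetical stand-in for one stored edge (not a tskit class)."""
    left: float
    right: float
    parent: int
    child: int


def local_tree(edges, position):
    """Return {child: parent} for the edges whose interval covers `position`."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


# Toy data: nodes 0, 1, 2 are samples; nodes 3 and 4 are ancestors.
edges = [
    Edge(0.0, 10.0, 3, 0),  # present in both local trees, stored once
    Edge(0.0, 10.0, 3, 1),  # present in both local trees, stored once
    Edge(0.0, 5.0, 4, 2),   # only in the tree covering [0, 5)
    Edge(0.0, 5.0, 4, 3),   # only in the tree covering [0, 5)
    Edge(5.0, 10.0, 3, 2),  # only in the tree covering [5, 10)
]

left_tree = local_tree(edges, 2.0)   # local tree at position 2.0
right_tree = local_tree(edges, 7.0)  # local tree at position 7.0
```

In this toy example the two local trees share two of their edges, so five stored edges describe both trees.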
@@ -66,7 +79,7 @@ individual
 sample
 : The focal nodes of a tree sequence, usually thought of as those from which
 we have obtained data. The specification of these affects various
-methods: (1) {meth}`TreeSequence.variants` and
+methods: {meth}`TreeSequence.variants` and
 {meth}`TreeSequence.haplotypes` will output the genotypes of the samples,
 and {attr}`Tree.roots` only return roots ancestral to at least one
 sample.
@@ -81,13 +94,15 @@ edge
 : The topology of a tree sequence is defined by a set of **edges**. Each
 edge is a tuple `(left, right, parent, child)`, which records a
 parent-child relationship among a pair of nodes on the
-on the half-open interval of chromosome `[left, right)`.
+on the half-open interval `[left, right)` along the genome. The difference
+between `left` and `right` is known as the "span" of the edge.

 (sec_data_model_definitions_site)=

 site
 : Tree sequences can define the mutational state of nodes as well as their
-topological relationships. A **site** is thought of as some position along
+topological relationships. A **site** is thought of as some
+{ref}`position <sec_data_model_definitions_position>` along
 the genome at which variation occurs. Each site is associated with
 a unique position and ancestral state.

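The updated "edge" entry in this hunk introduces the "span" of an edge as the difference between `left` and `right`. A small illustrative sketch in plain Python, under the assumption that an edge is a simple record with those four fields; the `Edge` dataclass and its `span` property below are written for this example and are not presented as the library's own implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical edge record: `parent` is the parent of `child` on [left, right)."""
    left: float
    right: float
    parent: int
    child: int

    @property
    def span(self) -> float:
        # Length of genome over which this parent-child relationship holds.
        return self.right - self.left


# Example values are made up for illustration.
edge = Edge(left=2.0, right=7.5, parent=4, child=1)
edge_span = edge.span
```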
@@ -114,6 +129,14 @@ migration
 population
 : A grouping of nodes, e.g., by sampling location.

+(sec_data_model_definitions_position)=
+
+position
+: A location along the genome, from 0 to the
+{ref}`sequence length<sec_data_model_definitions_sequence_length>`. In `tskit`
+positions are stored as floating-point numbers, although it is common to
+restrict positions to occur at discrete integer locations.
+
 (sec_data_model_definitions_provenance)=

 provenance