diff --git a/docs/src/client/stores.md b/docs/src/client/stores.md
deleted file mode 100644
index 76847983e..000000000
--- a/docs/src/client/stores.md
+++ /dev/null
@@ -1 +0,0 @@
-# Work in progress
diff --git a/docs/src/concepts/data-model.md b/docs/src/concepts/data-model.md
index 9754e2392..14528fe04 100644
--- a/docs/src/concepts/data-model.md
+++ b/docs/src/concepts/data-model.md
@@ -2,11 +2,23 @@
## What is a data model?
-A **data model** refers to a conceptual framework for thinking about data and about
-operations on data.
-A data model defines the mental toolbox of the data scientist; it has less to do with
-the architecture of the data systems, although architectures are often intertwined with
-data models.
+A **data model** is a conceptual framework that defines how data is organized,
+represented, and transformed. It gives us the components for creating blueprints for the
+structure and operations of data management systems, ensuring consistency and efficiency
+in data handling.
+
+Data management systems are built to accommodate these models, allowing us to manage
+data according to the principles laid out by the model. If you’re studying data science
+or engineering, you’ve likely encountered different data models, each providing a unique
+approach to organizing and manipulating data.
+
+A data model is defined by considering the following key aspects:
+
++ What are the fundamental elements used to structure the data?
++ What operations are available for defining, creating, and manipulating the data?
++ What mechanisms exist to enforce the structure and rules governing valid data interactions?
+
+## Types of data models
Among the most familiar data models are those based on files and folders: data of any
kind are lumped together into binary strings called **files**, files are collected into
@@ -24,17 +36,16 @@ objects in memory with properties and methods for transformations of such data.
## Relational data model
The **relational model** is a way of thinking about data as sets and operations on sets.
-Formalized almost a half-century ago
-([Codd, 1969](https://dl.acm.org/citation.cfm?doid=362384.362685)), the relational data
-model provides the most rigorous approach to structured data storage and the most
-precise approach to data querying.
-The model is defined by the principles of data representation, domain constraints,
-uniqueness constraints, referential constraints, and declarative queries as summarized
-below.
+Formalized almost a half-century ago ([Codd,
+1969](https://dl.acm.org/citation.cfm?doid=362384.362685)), the relational data model is
+one of the most powerful and precise ways to store and manage structured data. At its
+core, this model organizes all data into tables (representing mathematical relations),
+where each table consists of rows (representing mathematical tuples) and columns
+(often called attributes).
### Core principles of the relational data model
-**Data representation**
+**Data representation:**
Data are represented and manipulated in the form of relations.
A relation is a set (i.e. an unordered collection) of entities of values for each of
the respective named attributes of the relation.
@@ -43,27 +54,27 @@ below.
A collection of base relations with their attributes, domain constraints, uniqueness
constraints, and referential constraints is called a schema.
-**Domain constraints**
- Attribute values are drawn from corresponding attribute domains, i.e. predefined sets
- of values.
- Attribute domains may not include relations, which keeps the data model flat, i.e.
- free of nested structures.
+**Domain constraints:**
+ Each attribute (column) in a table is associated with a specific attribute domain (or
+ datatype, a set of possible values), ensuring that the data entered is valid.
+ Attribute domains may not include relations, which keeps the data model
+ flat, i.e. free of nested structures.
-**Uniqueness constraints**
+**Uniqueness constraints:**
Entities within relations are addressed by values of their attributes.
To identify and relate data elements, uniqueness constraints are imposed on subsets
of attributes.
Such subsets are then referred to as keys.
One key in a relation is designated as the primary key used for referencing its elements.
-**Referential constraints**
- Associations among data are established by means of referential constraints with the
+**Referential constraints:**
+ Associations among data are established by means of referential constraints with the
help of foreign keys.
A referential constraint on relation A referencing relation B allows only those
entities in A whose foreign key attributes match the key attributes of an entity in B.
-**Declarative queries**
- Data queries are formulated through declarative, as opposed to imperative,
+**Declarative queries:**
+ Data queries are formulated through declarative, as opposed to imperative,
specifications of sought results.
This means that query expressions convey the logic for the result rather than the
procedure for obtaining it.
@@ -86,32 +97,76 @@ Similar to spreadsheets, relations are often visualized as tables with *attribut
corresponding to *columns* and *entities* corresponding to *rows*.
In particular, SQL uses the terms *table*, *column*, and *row*.
-## DataJoint is a refinement of the relational data model
+## The DataJoint Model
DataJoint is a conceptual refinement of the relational data model offering a more
-expressive and rigorous framework for database programming
-([Yatsenko et al., 2018](https://arxiv.org/abs/1807.11104)).
-The DataJoint model facilitates clear conceptual modeling, efficient schema design, and
-precise and flexible data queries.
-The model has emerged over a decade of continuous development of complex data pipelines
-for neuroscience experiments
-([Yatsenko et al., 2015](https://www.biorxiv.org/content/early/2015/11/14/031658)).
-DataJoint has allowed researchers with no prior knowledge of databases to collaborate
-effectively on common data pipelines sustaining data integrity and supporting flexible
-access.
-DataJoint is currently implemented as client libraries in MATLAB and Python.
-These libraries work by transpiling DataJoint queries into SQL before passing them on
-to conventional relational database systems that serve as the backend, in combination
-with bulk storage systems for storing large contiguous data objects.
+expressive and rigorous framework for database programming ([Yatsenko et al.,
+2018](https://arxiv.org/abs/1807.11104)). The DataJoint model facilitates conceptual
+clarity, efficient schema design, workflow management, and precise, flexible data
+queries. By enforcing entity normalization,
+simplifying dependency declarations, offering a rich query algebra, and visualizing
+relationships through schema diagrams, DataJoint makes relational database programming
+more intuitive and robust for complex data pipelines.
+
+The model has emerged over a decade of continuous development of complex data
+pipelines for neuroscience experiments ([Yatsenko et al.,
+2015](https://www.biorxiv.org/content/early/2015/11/14/031658)). DataJoint has allowed
+researchers with no prior knowledge of databases to collaborate effectively on common
+data pipelines sustaining data integrity and supporting flexible access. DataJoint is
+currently implemented as client libraries in MATLAB and Python. These libraries work by
+transpiling DataJoint queries into SQL before passing them on to conventional relational
+database systems that serve as the backend, in combination with bulk storage systems for
+storing large contiguous data objects.
DataJoint comprises:
-- a schema [definition](../design/tables/declare.md) language
-- a data [manipulation](../manipulation/index.md) language
-- a data [query](../query/principles.md) language
-- a [diagramming](../design/diagrams.md) notation for visualizing relationships between
++ a schema [definition](../design/tables/declare.md) language
++ a data [manipulation](../manipulation/index.md) language
++ a data [query](../query/principles.md) language
++ a [diagramming](../design/diagrams.md) notation for visualizing relationships between
modeled entities
The key refinement of DataJoint over other relational data models and their
implementations is DataJoint's support of
[entity normalization](../design/normalization.md).
+
+### Core principles of the DataJoint model
+
+**Entity Normalization**
+ DataJoint enforces entity normalization, ensuring that every entity set (table) is
+ well-defined, with each element belonging to the same type, sharing the same
+ attributes, and distinguished by the same primary key. This principle reduces
+ redundancy and avoids data anomalies, similar to Boyce-Codd Normal Form, but with a
+ more intuitive structure than traditional SQL.
+
+**Simplified Schema Definition and Dependency Management**
+ DataJoint introduces a schema definition language that is more expressive and less
+ error-prone than SQL. Dependencies are explicitly declared using arrow notation
+ (->), making referential constraints easier to understand and visualize. The
+ dependency structure is enforced as an acyclic directed graph, which simplifies
+ workflows by preventing circular dependencies.
+
+**Integrated Query Operators producing a Relational Algebra**
+ DataJoint introduces five query operators (restrict, join, project, aggregate, and
+ union) with algebraic closure, allowing them to be combined seamlessly. These
+ operators are designed to maintain operational entity normalization, ensuring query
+ outputs remain valid entity sets.
+
+**Diagramming Notation for Conceptual Clarity**
+ DataJoint’s schema diagrams simplify the representation of relationships between
+ entity sets compared to ERM diagrams. Relationships are expressed as dependencies
+ between entity sets, which are visualized using solid or dashed lines for primary
+ and secondary dependencies, respectively.
+
+**Unified Logic for Binary Operators**
+ DataJoint simplifies binary operations by requiring attributes involved in joins or
+ comparisons to be homologous (i.e., sharing the same origin). This avoids the
+ ambiguity and pitfalls of natural joins in SQL, ensuring more predictable query
+ results.
+
+**Optimized Data Pipelines for Scientific Workflows**
+ DataJoint treats the database as a data pipeline where each entity set defines a
+ step in the workflow. This makes it ideal for scientific experiments and complex
+ data processing, such as in neuroscience. Its MATLAB and Python libraries transpile
+ DataJoint queries into SQL, bridging the gap between scientific programming and
+ relational databases.
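+
+As a brief illustration of these principles, here is a minimal sketch in DataJoint's
+Python API (the schema, table, and attribute names are hypothetical):
+
+```python
+import datajoint as dj
+
+schema = dj.Schema('lab_example')
+
+
+@schema
+class Session(dj.Manual):
+    definition = """
+    # an experimental session
+    subject_id : int    # subject identifier
+    session    : int    # session number within subject
+    ---
+    session_date : date
+    """
+
+
+@schema
+class Scan(dj.Manual):
+    definition = """
+    # a scan acquired within a session
+    -> Session          # dependency declared with arrow notation
+    scan_id : int
+    ---
+    depth : float       # imaging depth (um)
+    """
+
+
+# a declarative query combining join, restriction, and projection
+deep_scans = (Session * Scan & 'depth > 300').proj('session_date', 'depth')
+```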
diff --git a/docs/src/concepts/data-pipelines.md b/docs/src/concepts/data-pipelines.md
index 6dffcdae4..cf20b075b 100644
--- a/docs/src/concepts/data-pipelines.md
+++ b/docs/src/concepts/data-pipelines.md
@@ -157,10 +157,10 @@ with external groups.
## Summary of DataJoint features
1. A free, open-source framework for scientific data pipelines and workflow management
-1. Data hosting in cloud or in-house
-1. MySQL, filesystems, S3, and Globus for data management
-1. Define, visualize, and query data pipelines from MATLAB or Python
-1. Enter and view data through GUIs
-1. Concurrent access by multiple users and computational agents
-1. Data integrity: identification, dependencies, groupings
-1. Automated distributed computation
+2. Data hosting in cloud or in-house
+3. MySQL, filesystems, S3, and Globus for data management
+4. Define, visualize, and query data pipelines from MATLAB or Python
+5. Enter and view data through GUIs
+6. Concurrent access by multiple users and computational agents
+7. Data integrity: identification, dependencies, groupings
+8. Automated distributed computation
diff --git a/docs/src/concepts/teamwork.md b/docs/src/concepts/teamwork.md
index c25394441..4cccea9f5 100644
--- a/docs/src/concepts/teamwork.md
+++ b/docs/src/concepts/teamwork.md
@@ -5,10 +5,9 @@
Science labs organize their projects as a sequence of activities of experiment design,
data acquisition, and processing and analysis.
-
- {: style="width:520px; align:center"}
- Workflow and dataflow in a common findings-centered approach to data science in a science lab.
-
+{: style="width:510px; display:block; margin: 0 auto;"}
+
+Workflow and dataflow in a common findings-centered approach to data science in a science lab.
Many labs lack a uniform data management strategy that would span longitudinally across
the entire project lifecycle as well as laterally across different projects.
@@ -29,10 +28,9 @@ This approach requires formulating a general data science plan and upfront inves
for setting up resources and processes and training the teams.
The team uses DataJoint to build data pipelines to support multiple projects.
-
- {: style="width:510px; align:center"}
- Workflow and dataflow in a data pipeline-centered approach.
-
+{: style="width:510px; display:block; margin: 0 auto;"}
+
+Workflow and dataflow in a data pipeline-centered approach.
Data pipelines support project data across their entire lifecycle, including the
following functions
@@ -55,42 +53,41 @@ data integrity.
The adoption of a uniform data management framework allows separation of roles and
division of labor among team members, leading to greater efficiency and better scaling.
-
- {: style="width:350px; align:center"}
- Distinct responsibilities of data science and data engineering.
-
+{: style="width:510px; display:block; margin: 0 auto;"}
+
+Distinct responsibilities of data science and data engineering.
-Scientists
+### Scientists
- design and conduct experiments, collecting data.
- They interact with the data pipeline through graphical user interfaces designed by
- others.
- They understand what analysis is used to test their hypotheses.
+Design and conduct experiments, collecting data.
+They interact with the data pipeline through graphical user interfaces designed by
+others.
+They understand what analysis is used to test their hypotheses.
-Data scientists
+### Data scientists
- have the domain expertise and select and implement the processing and analysis
- methods for experimental data.
- Data scientists are in charge of defining and managing the data pipeline using
- DataJoint's data model, but they may not know the details of the underlying
- architecture.
- They interact with the pipeline using client programming interfaces directly from
- languages such as MATLAB and Python.
+Have the domain expertise and select and implement the processing and analysis
+methods for experimental data.
+Data scientists are in charge of defining and managing the data pipeline using
+DataJoint's data model, but they may not know the details of the underlying
+architecture.
+They interact with the pipeline using client programming interfaces directly from
+languages such as MATLAB and Python.
- The bulk of this manual is written for working data scientists, except for System
- Administration.
+The bulk of this manual is written for working data scientists, except for System
+Administration.
-Data engineers
+### Data engineers
- work with the data scientists to support the data pipeline.
- They rely on their understanding of the DataJoint data model to configure and
- administer the required IT resources such as database servers, data storage
- servers, networks, cloud instances, [Globus](https://globus.org) endpoints, etc.
- Data engineers can provide general solutions such as web hosting, data publishing,
- interfaces, exports and imports.
+Work with the data scientists to support the data pipeline.
+They rely on their understanding of the DataJoint data model to configure and
+administer the required IT resources such as database servers, data storage
+servers, networks, cloud instances, [Globus](https://globus.org) endpoints, etc.
+Data engineers can provide general solutions such as web hosting, data publishing,
+interfaces, exports and imports.
- The System Administration section of this tutorial contains materials helpful in
- accomplishing these tasks.
+The System Administration section of this tutorial contains materials helpful in
+accomplishing these tasks.
DataJoint is designed to delineate a clean boundary between **data science** and **data
engineering**.
diff --git a/docs/src/design/alter.md b/docs/src/design/alter.md
index fe791a11f..70ed39341 100644
--- a/docs/src/design/alter.md
+++ b/docs/src/design/alter.md
@@ -1 +1,53 @@
# Altering Populated Pipelines
+
+Tables can be altered after they have been declared and populated. This is useful when
+you want to add new secondary attributes or change the data type of existing attributes.
+To alter a table, update its `definition` property and then call the `alter` method to
+apply the changes in the database. Currently, `alter` does not support changes to
+primary key attributes.
+
+Let's say we have a table `Student` with the following attributes:
+
+```python
+@schema
+class Student(dj.Manual):
+ definition = """
+ student_id: int
+ ---
+ first_name: varchar(40)
+ last_name: varchar(40)
+ home_address: varchar(100)
+ """
+```
+
+We can modify the table to include a new attribute `email`:
+
+```python
+Student.definition = """
+student_id: int
+---
+first_name: varchar(40)
+last_name: varchar(40)
+home_address: varchar(100)
+email: varchar(100)
+"""
+Student.alter()
+```
+
+The `alter` method will update the table in the database to include the new attribute
+`email` added by the user in the table's `definition` property.
+
+Similarly, you can modify the data type or length of an existing attribute. For example,
+to alter the `home_address` attribute to have a length of 200 characters:
+
+```python
+Student.definition = """
+student_id: int
+---
+first_name: varchar(40)
+last_name: varchar(40)
+home_address: varchar(200)
+email: varchar(100)
+"""
+Student.alter()
+```
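+
+After calling `alter`, one way to confirm the change is to inspect the definition as
+currently stored in the database:
+
+```python
+# print the table definition as currently stored in the database
+print(Student.describe())
+```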
diff --git a/docs/src/design/integrity.md b/docs/src/design/integrity.md
index 8c1f93376..299a2a45a 100644
--- a/docs/src/design/integrity.md
+++ b/docs/src/design/integrity.md
@@ -1,7 +1,7 @@
# Data Integrity
-The term **data integrity** describes guarantees made by the data management process
-that prevent errors and corruption in data due to technical failures and human errors
+The term **data integrity** describes guarantees made by the data management process
+that prevent errors and corruption in data due to technical failures and human errors
arising in the course of continuous use by multiple agents.
DataJoint pipelines respect the following forms of data integrity: **entity
integrity**, **referential integrity**, and **group integrity** as described in more
diff --git a/docs/src/design/tables/blobs.md b/docs/src/design/tables/blobs.md
index 76847983e..9f73d54d4 100644
--- a/docs/src/design/tables/blobs.md
+++ b/docs/src/design/tables/blobs.md
@@ -1 +1,26 @@
-# Work in progress
+# Blobs
+
+DataJoint provides functionality for serializing and deserializing complex data types
+into binary blobs for efficient storage and compatibility with MATLAB's mYm
+serialization. This includes support for:
+
++ Basic Python data types (e.g., integers, floats, strings, dictionaries).
++ NumPy arrays and scalars.
++ Specialized data types like UUIDs, decimals, and datetime objects.
+
+## Serialization and Deserialization Process
+
+Serialization converts Python objects into a binary representation for efficient storage
+within the database. Deserialization converts the binary representation back into the
+original Python object.
+
+Blobs over 1 KiB are compressed using the zlib library to reduce storage requirements.
+
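+As a minimal sketch (the schema and table names here are hypothetical), a `longblob`
+attribute can store a NumPy array directly; serialization happens on insert and
+deserialization on fetch:
+
+```python
+import datajoint as dj
+import numpy as np
+
+schema = dj.Schema('test_blobs')
+
+
+@schema
+class Waveform(dj.Manual):
+    definition = """
+    waveform_id : int
+    ---
+    trace : longblob   # serialized NumPy array
+    """
+
+
+# the array is serialized to a blob on insert and restored on fetch
+Waveform.insert1(dict(waveform_id=1, trace=np.random.randn(1000)))
+trace = (Waveform & 'waveform_id = 1').fetch1('trace')
+```
+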
+## Supported Data Types
+
+DataJoint supports the following data types for serialization:
+
++ Scalars: Integers, floats, booleans, strings.
++ Collections: Lists, tuples, sets, dictionaries.
++ NumPy: Arrays, structured arrays, and scalars.
++ Custom Types: UUIDs, decimals, datetime objects, MATLAB cell and struct arrays.
diff --git a/docs/src/design/tables/customtype.md b/docs/src/design/tables/customtype.md
index 76847983e..823dd987c 100644
--- a/docs/src/design/tables/customtype.md
+++ b/docs/src/design/tables/customtype.md
@@ -1 +1,80 @@
-# Work in progress
+# Custom Types
+
+In modern scientific research, data pipelines often involve complex workflows that
+generate diverse data types. From high-dimensional imaging data to machine learning
+models, these data types frequently exceed the basic representations supported by
+traditional relational databases. For example:
+
++ A lab working on neural connectivity might use graph objects to represent brain
+ networks.
++ Researchers processing raw imaging data might store custom objects for pre-processing
+ configurations.
++ Computational biologists might store fitted machine learning models or parameter
+ objects for downstream predictions.
+
+To handle these diverse needs, DataJoint provides the `dj.AttributeAdapter` class. It
+enables researchers to store and retrieve complex, non-standard data types (such as
+Python objects or data structures) in a relational database while maintaining the
+reproducibility, modularity, and query capabilities required for scientific workflows.
+
+## Uses in Scientific Research
+
+Imagine a neuroscience lab studying neural connectivity. Researchers might generate
+graphs (e.g., networkx.Graph) to represent connections between brain regions, where:
+
++ Nodes are brain regions.
++ Edges represent connections weighted by signal strength or another metric.
+
+Storing these graph objects in a database alongside other experimental data (e.g.,
+subject metadata, imaging parameters) ensures:
+
+1. Centralized Data Management: All experimental data and analysis results are stored
+ together for easy access and querying.
+2. Reproducibility: The exact graph objects used in analysis can be retrieved later for
+ validation or further exploration.
+3. Scalability: Graph data can be integrated into workflows for larger datasets or
+ across experiments.
+
+However, graphs are not natively supported by relational databases; this is where
+`dj.AttributeAdapter` becomes essential. It allows researchers to define custom logic for
+serializing graphs (e.g., as edge lists) and deserializing them back into Python
+objects, bridging the gap between advanced data types and the database.
+
+### Example: Storing Graphs in DataJoint
+
+To store a networkx.Graph object in a DataJoint table, researchers can define a custom
+attribute adapter:
+
+```python
+import datajoint as dj
+import networkx as nx
+
+class GraphAdapter(dj.AttributeAdapter):
+
+ attribute_type = 'longblob' # this is how the attribute will be declared
+
+ def put(self, obj):
+ # convert the nx.Graph object into an edge list
+ assert isinstance(obj, nx.Graph)
+ return list(obj.edges)
+
+ def get(self, value):
+ # convert edge list back into an nx.Graph
+ return nx.Graph(value)
+
+
+# instantiate for use as a datajoint type
+graph = GraphAdapter()
+
+
+# define a table with a graph attribute
+schema = dj.schema('test_graphs')
+
+
+@schema
+class Connectivity(dj.Manual):
+ definition = """
+ conn_id : int
+ ---
+    conn_graph = null : <graph>  # a networkx.Graph object
+ """
+```
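+
+With the adapter defined, `Connectivity` can be used like any other table. A
+hypothetical round trip (assuming adapted attribute types are enabled in your DataJoint
+configuration) might look like:
+
+```python
+import networkx as nx
+
+# store a small graph; the adapter's `put` converts it to an edge list
+g = nx.Graph([(1, 2), (2, 3)])
+Connectivity.insert1(dict(conn_id=1, conn_graph=g))
+
+# the adapter's `get` rebuilds the nx.Graph on fetch
+g2 = (Connectivity & 'conn_id = 1').fetch1('conn_graph')
+assert isinstance(g2, nx.Graph)
+```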
diff --git a/docs/src/design/tables/indexes.md b/docs/src/design/tables/indexes.md
index 76847983e..8c0b53f15 100644
--- a/docs/src/design/tables/indexes.md
+++ b/docs/src/design/tables/indexes.md
@@ -1 +1,97 @@
-# Work in progress
+# Indexes
+
+Table indexes are data structures that allow fast lookups by an indexed attribute or
+combination of attributes.
+
+In DataJoint, indexes are created by one of three mechanisms:
+
+1. Primary key
+2. Foreign key
+3. Explicitly defined indexes
+
+The first two mechanisms are obligatory. Every table has a primary key, which serves as
+a unique index. Therefore, restrictions by a primary key are very fast. Foreign keys
+create additional indexes unless a suitable index already exists.
+
+## Indexes for single primary key tables
+
+Let’s say a mouse in the lab has a lab-specific ID but also has a separate ID issued
+by the animal facility.
+
+```python
+@schema
+class Mouse(dj.Manual):
+ definition = """
+ mouse_id : int # lab-specific ID
+ ---
+ tag_id : int # animal facility ID
+ """
+```
+
+In this case, searching for a mouse by `mouse_id` is much faster than by `tag_id`
+because `mouse_id` is the primary key and is therefore indexed.
+
+To make searches faster on fields other than the primary key or a foreign key, you can
+add a secondary index explicitly.
+
+Regular indexes are declared as `index(attr1, ..., attrN)` on a separate line anywhere
+in the table declaration (below the primary key divide).
+
+Indexes can be declared with a unique constraint as `unique index (attr1, ..., attrN)`.
+
+Let’s redeclare the table with a unique index on `tag_id`.
+
+```python
+@schema
+class Mouse(dj.Manual):
+ definition = """
+ mouse_id : int # lab-specific ID
+ ---
+ tag_id : int # animal facility ID
+ unique index (tag_id)
+ """
+```
+Now, searches with `mouse_id` and `tag_id` are similarly fast.
+
+## Indexes for tables with composite primary keys
+
+Let’s now imagine that rats in a lab are identified by the combination of `lab_name` and
+`rat_id` in a table `Rat`.
+
+```python
+@schema
+class Rat(dj.Manual):
+ definition = """
+ lab_name : char(16)
+ rat_id : int unsigned # lab-specific ID
+ ---
+ date_of_birth = null : date
+ """
+```
+In this table, the primary key is a unique index on the combination `(lab_name, rat_id)`.
+Therefore searches on these attributes or on `lab_name` alone are fast. But this index
+cannot help searches on `rat_id` alone: although `rat_id` is part of the index, it is
+not first in the index. This is similar to searching for a word in a dictionary that
+orders words alphabetically: searching by the first letters of a word is easy, but
+searching by the last few letters requires scanning the whole dictionary. Similarly,
+searching by `date_of_birth` requires a full-table scan and is inefficient.
+
+To speed up searches by `rat_id` and by `date_of_birth`, we can add explicit indexes:
+
+```python
+@schema
+class Rat2(dj.Manual):
+ definition = """
+ lab_name : char(16)
+ rat_id : int unsigned # lab-specific ID
+ ---
+ date_of_birth = null : date
+
+ index(rat_id)
+ index(date_of_birth)
+ """
+```
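+
+With these indexes in place, restrictions such as the following (the values here are
+hypothetical) can be satisfied by an index lookup rather than a full-table scan:
+
+```python
+# both restrictions now use secondary indexes
+Rat2 & 'rat_id = 1001'
+Rat2 & 'date_of_birth = "2023-05-01"'
+```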
diff --git a/docs/src/faq.md b/docs/src/faq.md
index e82d67588..06ebbc2db 100644
--- a/docs/src/faq.md
+++ b/docs/src/faq.md
@@ -4,17 +4,18 @@
It is common to enter data during experiments using a graphical user interface.
-1. [DataJoint LabBook](https://github.com/datajoint/datajoint-labbook) is an open
-source project for data entry.
+1. The [DataJoint Works](https://works.datajoint.com) platform is a web-based,
+ end-to-end service to host and execute data pipelines.
-2. The DataJoint Works platform is set up as a fully managed service to host and
-execute data pipelines.
+2. [DataJoint LabBook](https://github.com/datajoint/datajoint-labbook) is an open
+source project for data entry but is no longer actively maintained.
## Does DataJoint support other programming languages?
-DataJoint [Python](https://datajoint.com/docs/core/datajoint-python/) and
-[Matlab](https://datajoint.com/docs/core/datajoint-matlab/) APIs are both actively
-supported. Previous projects implemented some DataJoint features in
+DataJoint [Python](https://datajoint.com/docs/core/datajoint-python/) is the most
+up-to-date version and all future development will focus on the Python API. The
+[Matlab](https://datajoint.com/docs/core/datajoint-matlab/) API was actively developed
+through 2023. Previous projects implemented some DataJoint features in
[Julia](https://github.com/BrainCOGS/neuronex_workshop_2018/tree/julia/julia) and
[Rust](https://github.com/datajoint/datajoint-core). DataJoint's data model and data
representation are largely language independent, which means that any language with a
@@ -92,7 +93,7 @@ The entry of metadata can be manual, or it can be an automated part of data acqu
into the database).
Depending on their size and contents, raw data files can be stored in a number of ways.
-In the simplest and most common scenario, raw data continue to be stored in either a
+In the simplest and most common scenario, raw data continues to be stored in either a
local filesystem or in the cloud as collections of files and folders.
The paths to these files are entered in the database (again, either manually or by
automated processes).
@@ -100,8 +101,8 @@ This is the point at which the notion of a **data pipeline** begins.
Below these "manual tables" that contain metadata and file paths are a series of tables
that load raw data from these files, process it in some way, and insert derived or
summarized data directly into the database.
-For example, in an imaging application, the very large raw .TIFF stacks would reside on
-the filesystem, but the extracted fluorescent trace timeseries for each cell in the
+For example, in an imaging application, the very large raw `.TIFF` stacks would reside on
+the filesystem, but the extracted fluorescent trace timeseries for each cell in the
image would be stored as a numerical array directly in the database.
Or the raw video used for animal tracking might be stored in a standard video format on
the filesystem, but the computed X/Y positions of the animal would be stored in the
@@ -163,8 +164,8 @@ This brings us to the final important question:
## How do I get my data out?
-This is the fun part. See [queries](query/operators.md) for details of the DataJoint
-query language directly from MATLAB and Python.
+This is the fun part. See [queries](query/operators.md) for details of the DataJoint
+query language directly from Python.
## Interfaces
diff --git a/docs/src/internal/transpilation.md b/docs/src/internal/transpilation.md
index cc02380c0..b263c7528 100644
--- a/docs/src/internal/transpilation.md
+++ b/docs/src/internal/transpilation.md
@@ -34,7 +34,7 @@ restriction appending the new condition to the input's restriction.
Property `support` represents the `FROM` clause and contains a list of either
`QueryExpression` objects or table names in the case of base queries.
-The joint operator `*` adds new elements to the `support` attribute.
+The join operator `*` adds new elements to the `support` attribute.
At least one element must be present in `support`. Multiple elements in `support`
indicate a join.
@@ -56,10 +56,10 @@ self: `heading`, `restriction`, and `support`.
The input object is treated as a subquery in the following cases:
-1. A restriction is applied that uses alias attributes in the heading
-1. A projection uses an alias attribute to create a new alias attribute.
-1. A join is performed on an alias attribute.
-1. An Aggregation is used a restriction.
+1. A restriction is applied that uses alias attributes in the heading.
+2. A projection uses an alias attribute to create a new alias attribute.
+3. A join is performed on an alias attribute.
+4. An aggregation is used as a restriction.
An error arises if
@@ -117,8 +117,8 @@ input — the *aggregated* query expression.
The SQL equivalent of aggregation is
1. the NATURAL LEFT JOIN of the two inputs.
-1. followed by a GROUP BY on the primary key arguments of the first input
-1. followed by a projection.
+2. followed by a GROUP BY on the primary key arguments of the first input
+3. followed by a projection.
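+
+For example, an aggregation such as the following (the tables are hypothetical)
+transpiles to exactly these steps:
+
+```python
+# count sessions per subject: NATURAL LEFT JOIN, GROUP BY on
+# Subject's primary key, then a projection
+Subject.aggr(Session, n='count(*)')
+```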
The projection works the same as `.proj` with respect to the first input.
With respect to the second input, the projection part of aggregation allows only
diff --git a/docs/src/manipulation/transactions.md b/docs/src/manipulation/transactions.md
index 4b05fc528..c7d6951a7 100644
--- a/docs/src/manipulation/transactions.md
+++ b/docs/src/manipulation/transactions.md
@@ -6,7 +6,7 @@ interrupting the sequence of such operations halfway would leave the data in an
state.
While the sequence is in progress, other processes accessing the database will not see
the partial results until the transaction is complete.
-The sequence make include [data queries](../query/principles.md) and
+The sequence may include [data queries](../query/principles.md) and
[manipulations](index.md).
In such cases, the sequence of operations may be enclosed in a transaction.
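+
+A minimal sketch (the table and key names here are hypothetical) of enclosing a
+sequence of operations in a transaction:
+
+```python
+import datajoint as dj
+
+connection = dj.conn()
+with connection.transaction:
+    # both inserts commit together; if either fails, neither is applied
+    Session.insert1(session_key)
+    Scan.insert1(scan_key)
+```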
diff --git a/docs/src/publish-data.md b/docs/src/publish-data.md
index 774cf0456..d766f49da 100644
--- a/docs/src/publish-data.md
+++ b/docs/src/publish-data.md
@@ -23,12 +23,12 @@ populated DataJoint pipeline.
One example of publishing a DataJoint pipeline as a docker container is
> Sinz, F., Ecker, A.S., Fahey, P., Walker, E., Cobos, E., Froudarakis, E., Yatsenko, D., Pitkow, Z., Reimer, J. and Tolias, A., 2018. Stimulus domain transfer in recurrent models for large scale cortical population prediction on video. In Advances in Neural Information Processing Systems (pp. 7198-7209). https://www.biorxiv.org/content/early/2018/10/25/452672
-The code and the data can be found at https://github.com/sinzlab/Sinz2018_NIPS
+The code and the data can be found at [https://github.com/sinzlab/Sinz2018_NIPS](https://github.com/sinzlab/Sinz2018_NIPS).
## Exporting into a collection of files
-Another option for publishing and archiving data is to export the data from the
+Another option for publishing and archiving data is to export the data from the
DataJoint pipeline into a collection of files.
-DataJoint provides features for exporting and importing sections of the pipeline.
-Several ongoing projects are implementing the capability to export from DataJoint
+DataJoint provides features for exporting and importing sections of the pipeline.
+Several ongoing projects are implementing the capability to export from DataJoint
pipelines into [Neurodata Without Borders](https://www.nwb.org/) files.
diff --git a/docs/src/query/restrict.md b/docs/src/query/restrict.md
index 74d396183..f8b61e641 100644
--- a/docs/src/query/restrict.md
+++ b/docs/src/query/restrict.md
@@ -191,3 +191,15 @@ experiments that are part of sessions performed by Alice.
query = Session & 'user = "Alice"'
Experiment & query
```
+
+## Restriction by `dj.Top`
+
+Restriction by `dj.Top` returns the number of entities specified by the `limit`
+argument, in the order specified by the `order_by` argument. The `offset` argument
+skips a given number of entities, which is useful for pagination in web applications.
+
+```python
+# Return the first 10 sessions in descending order of session date
+Session & dj.Top(limit=10, order_by='session_date DESC')
+```
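+
+Combining `limit`, `order_by`, and `offset` supports pagination. For example, a second
+page of ten sessions might be fetched as:
+
+```python
+# Skip the first 10 sessions and return the next 10
+Session & dj.Top(limit=10, order_by='session_date DESC', offset=10)
+```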
diff --git a/docs/src/quick-start.md b/docs/src/quick-start.md
index 056e2f6c6..f3309c066 100644
--- a/docs/src/quick-start.md
+++ b/docs/src/quick-start.md
@@ -1,5 +1,14 @@
# Quick Start Guide
+## Tutorials
+
+The easiest way to get started is through the [DataJoint
+Tutorials](https://github.com/datajoint/datajoint-tutorials). These tutorials are
+configured to run using [GitHub Codespaces](https://github.com/features/codespaces)
+where the full environment, including the database, is already set up.
+
+Advanced users can install DataJoint locally. Please see the installation instructions below.
+
## Installation
First, please [install Python](https://www.python.org/downloads/) version
diff --git a/docs/src/sysadmin/bulk-storage.md b/docs/src/sysadmin/bulk-storage.md
index 1289b8c9b..12af44791 100644
--- a/docs/src/sysadmin/bulk-storage.md
+++ b/docs/src/sysadmin/bulk-storage.md
@@ -8,18 +8,17 @@ significant and useful for a number of reasons.
### Cost
-One of these is that the high-performance storage commonly used in
-database systems is more expensive than that used in more typical
-commodity storage, and so storing the smaller identifying information
-typically used in queries on fast, relational database storage and
-storing the larger bulk data used for analysis or processing on lower
-cost commodity storage can allow for large savings in storage expense.
+One reason is that the high-performance storage commonly used in database systems is
+more expensive than typical commodity storage. Therefore, storing the smaller identifying
+information typically used in queries on fast, relational database storage and storing
+the larger bulk data used for analysis or processing on lower cost commodity storage
+enables large savings in storage expense.
### Flexibility
Storing bulk data separately also facilitates more flexibility in
usage, since the bulk data can be managed using separate maintenance
-processes than that in the relational storage.
+processes than those in the relational storage.
For example, larger relational databases may require many hours to be
restored in the event of system failures. If the relational portion of
@@ -40,11 +39,10 @@ been retrieved in previous queries.
### Data Sharing
-DataJoint provides pluggable support for different external bulk
-storage backends, which can provide benefits for data sharing by
-publishing bulk data to S3-Protocol compatible data shares both in the
-cloud and on locally managed systems and other common tools for data
-sharing, such as Globus, etc.
+DataJoint provides pluggable support for different external bulk storage backends,
+allowing data sharing by publishing bulk data to S3-protocol compatible data shares,
+both in the cloud and on locally managed systems, as well as through other common data
+sharing tools such as Globus.
## Bulk Storage Scenarios
diff --git a/docs/src/sysadmin/database-admin.md b/docs/src/sysadmin/database-admin.md
index 63f3afb7b..352a3af11 100644
--- a/docs/src/sysadmin/database-admin.md
+++ b/docs/src/sysadmin/database-admin.md
@@ -179,7 +179,7 @@ grouped together by common prefixes. For example, a lab may have a
collection of schemas that begin with `common_`. Some common
processing may be organized into several schemas that begin with
`pipeline_`. Typically each user has all privileges to schemas that
-begin with her username.
+begin with their username.
For example, alice may have privileges to select and insert data from
the common schemas (but not create new tables), and have all
diff --git a/docs/src/sysadmin/external-store.md b/docs/src/sysadmin/external-store.md
index dbcdc169d..aac61fe24 100644
--- a/docs/src/sysadmin/external-store.md
+++ b/docs/src/sysadmin/external-store.md
@@ -255,19 +255,19 @@ to upgrade to DataJoint v0.12, the following process should be followed:
5. Migrate external tracking tables for each schema to use the new format. For
instance in Python:
- ```python
- import datajoint.migrate as migrate
- db_schema_name='schema_1'
- external_store='raw'
- migrate.migrate_dj011_external_blob_storage_to_dj012(db_schema_name, external_store)
- ```
+ ```python
+ import datajoint.migrate as migrate
+ db_schema_name='schema_1'
+ external_store='raw'
+ migrate.migrate_dj011_external_blob_storage_to_dj012(db_schema_name, external_store)
+ ```
6. Verify pipeline functionality after this process has completed. For instance in
Python:
- ```python
- x = myschema.TableWithExternal.fetch('external_field', limit=1)[0]
- ```
+ ```python
+ x = myschema.TableWithExternal.fetch('external_field', limit=1)[0]
+ ```
Note: This migration function is provided on a best-effort basis, and will
convert the external tracking tables into a format which is compatible
diff --git a/docs/src/tutorials/dj-top.ipynb b/docs/src/tutorials/dj-top.ipynb
new file mode 100644
index 000000000..bbfe59f11
--- /dev/null
+++ b/docs/src/tutorials/dj-top.ipynb
@@ -0,0 +1,1004 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Using the dj.Top restriction"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "First you will need to [install](../../getting-started/#installation) and [connect](../../getting-started/#connection) to a DataJoint [data pipeline](https://datajoint.com/docs/core/glossary/#data-pipeline).\n",
+ "\n",
+ "Now let's start by importing the `datajoint` client."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "[2024-12-20 11:10:20,120][INFO]: Connecting root@127.0.0.1:3306\n",
+ "[2024-12-20 11:10:20,259][INFO]: Connected root@127.0.0.1:3306\n"
+ ]
+ }
+ ],
+ "source": [
+ "import datajoint as dj\n",
+ "dj.config[\"database.host\"] = \"127.0.0.1\"\n",
+ "schema = dj.Schema('university')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Table Definition"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "@schema\n",
+ "class Student(dj.Manual):\n",
+ " definition = \"\"\"\n",
+ " student_id : int unsigned # university-wide ID number\n",
+ " ---\n",
+ " first_name : varchar(40)\n",
+ " last_name : varchar(40)\n",
+ " sex : enum('F', 'M', 'U')\n",
+ " date_of_birth : date\n",
+ " home_address : varchar(120) # mailing street address\n",
+ " home_city : varchar(60) # mailing address\n",
+ " home_state : char(2) # US state acronym: e.g. OH\n",
+ " home_zip : char(10) # zipcode e.g. 93979-4979\n",
+ " home_phone : varchar(20) # e.g. 414.657.6883x0881\n",
+ " \"\"\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "@schema\n",
+ "class Department(dj.Manual):\n",
+ " definition = \"\"\"\n",
+ " dept : varchar(6) # abbreviated department name, e.g. BIOL\n",
+ " ---\n",
+ " dept_name : varchar(200) # full department name\n",
+ " dept_address : varchar(200) # mailing address\n",
+ " dept_phone : varchar(20)\n",
+ " \"\"\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "@schema\n",
+ "class StudentMajor(dj.Manual):\n",
+ " definition = \"\"\"\n",
+ " -> Student\n",
+ " ---\n",
+ " -> Department\n",
+    "    declare_date : date  # when student declared their major\n",
+ " \"\"\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "@schema\n",
+ "class Course(dj.Manual):\n",
+ " definition = \"\"\"\n",
+ " -> Department\n",
+ " course : int unsigned # course number, e.g. 1010\n",
+ " ---\n",
+ " course_name : varchar(200) # e.g. \"Neurobiology of Sensation and Movement.\"\n",
+ " credits : decimal(3,1) # number of credits earned by completing the course\n",
+ " \"\"\"\n",
+ " \n",
+ "@schema\n",
+ "class Term(dj.Manual):\n",
+ " definition = \"\"\"\n",
+ " term_year : year\n",
+ " term : enum('Spring', 'Summer', 'Fall')\n",
+ " \"\"\"\n",
+ "\n",
+ "@schema\n",
+ "class Section(dj.Manual):\n",
+ " definition = \"\"\"\n",
+ " -> Course\n",
+ " -> Term\n",
+ " section : char(1)\n",
+ " ---\n",
+ " auditorium : varchar(12)\n",
+ " \"\"\"\n",
+ " \n",
+ "@schema\n",
+ "class CurrentTerm(dj.Manual):\n",
+ " definition = \"\"\"\n",
+ " -> Term\n",
+ " \"\"\"\n",
+ "\n",
+ "@schema\n",
+ "class Enroll(dj.Manual):\n",
+ " definition = \"\"\"\n",
+ " -> Student\n",
+ " -> Section\n",
+ " \"\"\"\n",
+ "\n",
+ "@schema\n",
+ "class LetterGrade(dj.Lookup):\n",
+ " definition = \"\"\"\n",
+ " grade : char(2)\n",
+ " ---\n",
+ " points : decimal(3,2)\n",
+ " \"\"\"\n",
+ " contents = [\n",
+ " ['A', 4.00],\n",
+ " ['A-', 3.67],\n",
+ " ['B+', 3.33],\n",
+ " ['B', 3.00],\n",
+ " ['B-', 2.67],\n",
+ " ['C+', 2.33],\n",
+ " ['C', 2.00],\n",
+ " ['C-', 1.67],\n",
+ " ['D+', 1.33],\n",
+ " ['D', 1.00],\n",
+ " ['F', 0.00]\n",
+ " ]\n",
+ "\n",
+ "@schema\n",
+ "class Grade(dj.Manual):\n",
+ " definition = \"\"\"\n",
+ " -> Enroll\n",
+ " ---\n",
+ " -> LetterGrade\n",
+ " \"\"\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Insert"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from tqdm import tqdm\n",
+ "import faker\n",
+ "import random\n",
+ "import datetime\n",
+ "fake = faker.Faker()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def yield_students():\n",
+ " fake_name = {'F': fake.name_female, 'M': fake.name_male}\n",
+ " while True: # ignore invalid values\n",
+ " try:\n",
+ " sex = random.choice(('F', 'M'))\n",
+ " first_name, last_name = fake_name[sex]().split(' ')[:2]\n",
+ " street_address, city = fake.address().split('\\n')\n",
+ " city, state = city.split(', ')\n",
+ " state, zipcode = state.split(' ') \n",
+ " except ValueError:\n",
+ " continue\n",
+ " else:\n",
+ " yield dict(\n",
+ " first_name=first_name,\n",
+ " last_name=last_name,\n",
+ " sex=sex,\n",
+ " home_address=street_address,\n",
+ " home_city=city,\n",
+ " home_state=state,\n",
+ " home_zip=zipcode,\n",
+ " date_of_birth=str(\n",
+ " fake.date_time_between(start_date=\"-35y\", end_date=\"-15y\").date()),\n",
+ " home_phone = fake.phone_number()[:20])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "Student.insert(\n",
+ " dict(k, student_id=i) for i, k in zip(range(100,300), yield_students()))\n",
+ "\n",
+ "Department.insert(\n",
+ " dict(dept=dept, \n",
+ " dept_name=name, \n",
+ " dept_address=fake.address(), \n",
+ " dept_phone=fake.phone_number()[:20])\n",
+ " for dept, name in [\n",
+ " [\"CS\", \"Computer Science\"],\n",
+ " [\"BIOL\", \"Life Sciences\"],\n",
+ " [\"PHYS\", \"Physics\"],\n",
+ " [\"MATH\", \"Mathematics\"]])\n",
+ "\n",
+ "StudentMajor.insert({**s, **d, \n",
+ " 'declare_date':fake.date_between(start_date=datetime.date(1999,1,1))}\n",
+ " for s, d in zip(Student.fetch('KEY'), random.choices(Department.fetch('KEY'), k=len(Student())))\n",
+ " if random.random() < 0.75)\n",
+ "\n",
+ "# from https://www.utah.edu/\n",
+ "Course.insert([\n",
+ " ['BIOL', 1006, 'World of Dinosaurs', 3],\n",
+ " ['BIOL', 1010, 'Biology in the 21st Century', 3],\n",
+ " ['BIOL', 1030, 'Human Biology', 3],\n",
+ " ['BIOL', 1210, 'Principles of Biology', 4],\n",
+ " ['BIOL', 2010, 'Evolution & Diversity of Life', 3],\n",
+ " ['BIOL', 2020, 'Principles of Cell Biology', 3],\n",
+ " ['BIOL', 2021, 'Principles of Cell Science', 4],\n",
+ " ['BIOL', 2030, 'Principles of Genetics', 3],\n",
+ " ['BIOL', 2210, 'Human Genetics',3],\n",
+ " ['BIOL', 2325, 'Human Anatomy', 4],\n",
+ " ['BIOL', 2330, 'Plants & Society', 3],\n",
+ " ['BIOL', 2355, 'Field Botany', 2],\n",
+ " ['BIOL', 2420, 'Human Physiology', 4],\n",
+ "\n",
+    "    ['PHYS', 2040, 'Classical Theoretical Physics II', 4],\n",
+ " ['PHYS', 2060, 'Quantum Mechanics', 3],\n",
+ " ['PHYS', 2100, 'General Relativity and Cosmology', 3],\n",
+ " ['PHYS', 2140, 'Statistical Mechanics', 4],\n",
+ " \n",
+ " ['PHYS', 2210, 'Physics for Scientists and Engineers I', 4], \n",
+ " ['PHYS', 2220, 'Physics for Scientists and Engineers II', 4],\n",
+ " ['PHYS', 3210, 'Physics for Scientists I (Honors)', 4],\n",
+ " ['PHYS', 3220, 'Physics for Scientists II (Honors)', 4],\n",
+ " \n",
+ " ['MATH', 1250, 'Calculus for AP Students I', 4],\n",
+ " ['MATH', 1260, 'Calculus for AP Students II', 4],\n",
+ " ['MATH', 1210, 'Calculus I', 4],\n",
+ " ['MATH', 1220, 'Calculus II', 4],\n",
+ " ['MATH', 2210, 'Calculus III', 3],\n",
+ " \n",
+ " ['MATH', 2270, 'Linear Algebra', 4],\n",
+ " ['MATH', 2280, 'Introduction to Differential Equations', 4],\n",
+ " ['MATH', 3210, 'Foundations of Analysis I', 4],\n",
+ " ['MATH', 3220, 'Foundations of Analysis II', 4],\n",
+ " \n",
+ " ['CS', 1030, 'Foundations of Computer Science', 3],\n",
+ " ['CS', 1410, 'Introduction to Object-Oriented Programming', 4],\n",
+ " ['CS', 2420, 'Introduction to Algorithms & Data Structures', 4],\n",
+ " ['CS', 2100, 'Discrete Structures', 3],\n",
+ " ['CS', 3500, 'Software Practice', 4],\n",
+ " ['CS', 3505, 'Software Practice II', 3],\n",
+ " ['CS', 3810, 'Computer Organization', 4],\n",
+ " ['CS', 4400, 'Computer Systems', 4],\n",
+ " ['CS', 4150, 'Algorithms', 3],\n",
+ " ['CS', 3100, 'Models of Computation', 3],\n",
+ " ['CS', 3200, 'Introduction to Scientific Computing', 3],\n",
+ " ['CS', 4000, 'Senior Capstone Project - Design Phase', 3],\n",
+ " ['CS', 4500, 'Senior Capstone Project', 3],\n",
+ " ['CS', 4940, 'Undergraduate Research', 3],\n",
+    "    ['CS', 4970, \"Computer Science Bachelor's Thesis\", 3]])\n",
+ "\n",
+ "Term.insert(dict(term_year=year, term=term) \n",
+ " for year in range(1999, 2019) \n",
+ " for term in ['Spring', 'Summer', 'Fall'])\n",
+ "\n",
+ "Term().fetch(order_by=('term_year DESC', 'term DESC'), as_dict=True, limit=1)[0]\n",
+ "\n",
+ "CurrentTerm().insert1({\n",
+ " **Term().fetch(order_by=('term_year DESC', 'term DESC'), as_dict=True, limit=1)[0]})\n",
+ "\n",
+ "def make_section(prob):\n",
+ " for c in (Course * Term).proj():\n",
+ " for sec in 'abcd':\n",
+ " if random.random() < prob:\n",
+ " break\n",
+ " yield {\n",
+ " **c, 'section': sec, \n",
+ " 'auditorium': random.choice('ABCDEF') + str(random.randint(1,100))} \n",
+ "\n",
+ "Section.insert(make_section(0.5))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 200/200 [00:27<00:00, 7.17it/s]\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Enrollment \n",
+ "terms = Term().fetch('KEY')\n",
+ "quit_prob = 0.1\n",
+ "for student in tqdm(Student.fetch('KEY')):\n",
+ " start_term = random.randrange(len(terms))\n",
+ " for term in terms[start_term:]:\n",
+ " if random.random() < quit_prob:\n",
+ " break\n",
+ " else:\n",
+ " sections = ((Section & term) - (Course & (Enroll & student))).fetch('KEY')\n",
+ " if sections:\n",
+ " Enroll.insert({**student, **section} for section in \n",
+ " random.sample(sections, random.randrange(min(5, len(sections)))))\n",
+ " \n",
+ "# assign random grades\n",
+ "grades = LetterGrade.fetch('grade')\n",
+ "\n",
+ "grade_keys = Enroll.fetch('KEY')\n",
+ "random.shuffle(grade_keys)\n",
+ "grade_keys = grade_keys[:len(grade_keys)*9//10]\n",
+ "\n",
+ "Grade.insert({**key, 'grade':grade} \n",
+ " for key, grade in zip(grade_keys, random.choices(grades, k=len(grade_keys))))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# dj.Top Restriction"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "*student_id *dept *course *term_year *term *section *grade points \n",
+ "+------------+ +------+ +--------+ +-----------+ +--------+ +---------+ +-------+ +--------+\n",
+ "100 CS 3200 2018 Fall c A 4.00 \n",
+ "100 MATH 2280 2018 Fall a A- 3.67 \n",
+ "100 PHYS 2210 2018 Spring d A 4.00 \n",
+ "122 CS 1030 2018 Fall c B+ 3.33 \n",
+ "131 BIOL 2030 2018 Spring a A 4.00 \n",
+ "131 CS 3200 2018 Fall b B+ 3.33 \n",
+ "136 BIOL 2210 2018 Spring c B+ 3.33 \n",
+ "136 MATH 2210 2018 Fall b B+ 3.33 \n",
+ "141 BIOL 2010 2018 Summer c B+ 3.33 \n",
+ "141 CS 2420 2018 Fall b A 4.00 \n",
+ "141 CS 3200 2018 Fall b A- 3.67 \n",
+ "182 CS 1410 2018 Summer c A- 3.67 \n",
+ " ...\n",
+ " (Total: 20)"
+ ]
+ },
+ "execution_count": 47,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "(Grade * LetterGrade) & \"term_year='2018'\" & dj.Top(limit=20, order_by='points DESC', offset=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 41,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "*grade *student_id *dept *course *term_year *term *section points \n",
+ "+-------+ +------------+ +------+ +--------+ +-----------+ +--------+ +---------+ +--------+\n",
+ "A 100 CS 3200 2018 Fall c 4.00 \n",
+ "A 100 PHYS 2210 2018 Spring d 4.00 \n",
+ "A 131 BIOL 2030 2018 Spring a 4.00 \n",
+ "A 141 CS 2420 2018 Fall b 4.00 \n",
+ "A 186 PHYS 2210 2018 Spring a 4.00 \n",
+ "A 191 MATH 2210 2018 Spring b 4.00 \n",
+ "A 211 CS 2100 2018 Fall a 4.00 \n",
+ "A 273 PHYS 2100 2018 Spring a 4.00 \n",
+ "A 282 BIOL 2021 2018 Spring d 4.00 \n",
+ "A- 100 MATH 2280 2018 Fall a 3.67 \n",
+ "A- 141 CS 3200 2018 Fall b 3.67 \n",
+ "A- 182 CS 1410 2018 Summer c 3.67 \n",
+ " ...\n",
+ " (Total: 20)"
+ ]
+ },
+ "execution_count": 41,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "(LetterGrade * Grade) & \"term_year='2018'\" & dj.Top(limit=20, order_by='points DESC', offset=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "elements",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/docs/src/tutorials/json.ipynb b/docs/src/tutorials/json.ipynb
index f83b960bc..f39b43e33 100644
--- a/docs/src/tutorials/json.ipynb
+++ b/docs/src/tutorials/json.ipynb
@@ -6,7 +6,7 @@
"id": "7fe24127-c0d0-4ff8-96b4-6ab0d9307e73",
"metadata": {},
"source": [
- "# Using the `json` type"
+ "# Using the json type"
]
},
{
@@ -39,7 +39,7 @@
"metadata": {},
"outputs": [],
"source": [
- "import datajoint as dj\n"
+ "import datajoint as dj"
]
},
{
@@ -57,9 +57,9 @@
"source": [
"For this exercise, let's imagine we work for an awesome company that is organizing a fun RC car race across various teams in the company. Let's see which team has the fastest car! 🏎️\n",
"\n",
- "This establishes 2 important entities: a `Team` and a `Car`. Normally we'd map this to their own dedicated table, however, let's assume that `Team` is well-structured but `Car` is less structured then we'd prefer. In other words, the structure for what makes up a *car* is varing too much between entries (perhaps because users of the pipeline haven't agreed yet on the definition? 🤷).\n",
+ "This establishes 2 important entities: a `Team` and a `Car`. Normally the entities are mapped to their own dedicated table, however, let's assume that `Team` is well-structured but `Car` is less structured than we'd prefer. In other words, the structure for what makes up a *car* is varying too much between entries (perhaps because users of the pipeline haven't agreed yet on the definition? 🤷).\n",
"\n",
- "This would make it a good use-case to keep `Team` as a table but make `Car` actually a `json` type defined within the `Team` table.\n",
+ "This would make it a good use-case to keep `Team` as a table but make `Car` a `json` type defined within the `Team` table.\n",
"\n",
"Let's begin."
]
@@ -80,7 +80,7 @@
}
],
"source": [
- "schema = dj.Schema(f\"{dj.config['database.user']}_json\")\n"
+ "schema = dj.Schema(f\"{dj.config['database.user']}_json\")"
]
},
{
@@ -99,7 +99,7 @@
" car=null: json # A car belonging to a team (null to allow registering first but specifying car later)\n",
" \n",
" unique index(car.length:decimal(4, 1)) # Add an index if this key is frequently accessed\n",
- " \"\"\"\n"
+ " \"\"\""
]
},
{
@@ -145,7 +145,7 @@
" ],\n",
" },\n",
" }\n",
- ")\n"
+ ")"
]
},
{
@@ -193,7 +193,7 @@
" },\n",
" },\n",
" ]\n",
- ")\n"
+ ")"
]
},
{
@@ -1044,7 +1044,7 @@
"metadata": {},
"outputs": [],
"source": [
- "schema.drop()\n"
+ "schema.drop()"
]
},
{