Merged

Changes from all commits
45 commits
3358a60
Rename "Create Tables" chapter to "Tables"
claude Dec 16, 2025
70acc36
Reorder chapters: Primary Keys before Lookup Tables
claude Dec 16, 2025
7b1b5af
Revise Primary Keys chapter for clarity and flow
claude Dec 16, 2025
bd3ad7f
Merge pull request #41 from dimitri-yatsenko/claude/rename-create-tab…
dimitri-yatsenko Dec 16, 2025
ad07fdc
Correct primary key documentation for DataJoint best practices
claude Dec 16, 2025
98b6062
Add schema dimensions concept to primary key documentation
claude Dec 16, 2025
9bcae6b
Document schema dimension constraints for auto-populated tables
claude Dec 16, 2025
5e334a7
Merge pull request #42 from dimitri-yatsenko/claude/fix-primary-key-d…
dimitri-yatsenko Dec 16, 2025
d35970c
Propagate schema dimensions concept to master-part and populate chapters
claude Dec 16, 2025
28b62fe
Refine natural vs surrogate key definitions and add partial entity in…
claude Dec 16, 2025
2e01558
Merge pull request #43 from dimitri-yatsenko/claude/fix-primary-key-d…
dimitri-yatsenko Dec 16, 2025
3eb7570
Restore narrative and historical context to primary key chapter
claude Dec 16, 2025
6915ae4
Reorder chapters and add prohibition on default values in primary keys
claude Dec 16, 2025
0f545d4
Merge pull request #44 from dimitri-yatsenko/claude/restore-primary-k…
dimitri-yatsenko Dec 16, 2025
aeae65b
Fix markdown rendering in nested admonition block
claude Dec 16, 2025
3ecc3b4
Improve surrogate key section with UUID reference
claude Dec 16, 2025
4b42ed8
Merge pull request #45 from dimitri-yatsenko/claude/fix-composite-key…
dimitri-yatsenko Dec 16, 2025
520b0b8
Remove oversimplified decision framework from Primary Keys chapter
claude Dec 17, 2025
bbefc7c
Merge pull request #46 from dimitri-yatsenko/claude/improve-primary-k…
dimitri-yatsenko Dec 17, 2025
14fa4ba
Organize DataJoint and SQL variants as tabs in Primary Key chapter
claude Dec 17, 2025
9b734c2
Merge pull request #47 from dimitri-yatsenko/claude/organize-primary-…
dimitri-yatsenko Dec 17, 2025
7d4f77b
Fix syntax highlighting in tab-set code blocks
claude Dec 17, 2025
b28fd7a
Merge pull request #48 from dimitri-yatsenko/claude/organize-primary-…
dimitri-yatsenko Dec 17, 2025
6ad6351
Remove date and author metadata from chapter frontmatter
claude Dec 17, 2025
0481550
Harmonize UUID chapter with Primary Keys chapter
claude Dec 17, 2025
d244441
Wrap long code lines to ~84 characters throughout the book
claude Dec 17, 2025
4d4d9dc
Merge pull request #49 from dimitri-yatsenko/claude/remove-chapter-me…
dimitri-yatsenko Dec 17, 2025
77f9cea
Harmonize Foreign Keys, Relationships, and Diagrams with Primary Keys…
claude Dec 17, 2025
99545bc
Merge pull request #50 from dimitri-yatsenko/claude/harmonize-foreign…
dimitri-yatsenko Dec 17, 2025
11cc0f3
Fix nested code block syntax highlighting in Foreign Keys chapter
claude Dec 17, 2025
dd4848b
Merge pull request #51 from dimitri-yatsenko/claude/harmonize-foreign…
dimitri-yatsenko Dec 17, 2025
05997da
Complete SQL examples in Relationships chapter to match DataJoint code
claude Dec 17, 2025
bbd04e9
Merge pull request #52 from dimitri-yatsenko/claude/harmonize-foreign…
dimitri-yatsenko Dec 17, 2025
523e3fa
Complete SQL examples in Diagrams chapter to match DataJoint code
claude Dec 17, 2025
f7160cd
Convert Relationships and Diagrams chapters to Jupyter notebooks
claude Dec 17, 2025
9043e2e
Merge pull request #53 from dimitri-yatsenko/claude/harmonize-foreign…
dimitri-yatsenko Dec 17, 2025
e9e4df1
Add diagram intersection operator (*) documentation
claude Dec 17, 2025
8580e6c
Merge pull request #54 from dimitri-yatsenko/claude/add-diagram-inter…
dimitri-yatsenko Dec 17, 2025
a72d314
regenerate diagrams in the Diagramming and Relationships chapters
dimitri-yatsenko Dec 18, 2025
decf35d
Reorder chapters: Foreign Keys → Diagramming → Relationships
claude Dec 18, 2025
8600433
Merge pull request #55 from dimitri-yatsenko/claude/reorganize-diagra…
dimitri-yatsenko Dec 18, 2025
a909ed7
Add enum vs lookup table comparison in Relationships chapter
claude Dec 18, 2025
401933c
Add UI/dropdown menu consideration to enum vs lookup table comparison
claude Dec 18, 2025
6d8fbd6
Merge pull request #56 from dimitri-yatsenko/claude/add-lookup-tables…
dimitri-yatsenko Dec 18, 2025
d996d03
regenerate diagrams in Relationships
dimitri-yatsenko Dec 18, 2025
66 changes: 56 additions & 10 deletions book/00-introduction/49-connect.ipynb
@@ -28,19 +28,34 @@
{
"cell_type": "markdown",
"metadata": {},
"source": "# Connect with DataJoint\n\nDataJoint is the primary way to connect to the database in this book. The DataJoint client library reads the database credentials from the environment variables `DJ_HOST`, `DJ_USER`, and `DJ_PASS`. \n\nSimply importing the DataJoint library is sufficient—it will connect to the database automatically when needed. Here we call `dj.conn()` only to verify the connection, but this step is not required in normal use."
"source": [
"# Connect with DataJoint\n",
"\n",
"DataJoint is the primary way to connect to the database in this book. The DataJoint client library reads the database credentials from the environment variables `DJ_HOST`, `DJ_USER`, and `DJ_PASS`. \n",
"\n",
"Simply importing the DataJoint library is sufficient—it will connect to the database automatically when needed. Here we call `dj.conn()` only to verify the connection, but this step is not required in normal use."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "import datajoint as dj\ndj.conn() # test the connection (optional)"
"source": [
"import datajoint as dj\n",
"dj.conn() # test the connection (optional)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": "# Connect with SQL Magic\n\nSQL \"Jupyter magic\" allows executing SQL statements directly in Jupyter notebooks, implemented by the [`jupysql`](https://ploomber.io/blog/jupysql/) library. This is useful for quick interactive SQL queries and for learning SQL syntax. We will use SQL magic in this book for demonstrating SQL concepts, but it is not used as part of Python application code.\n\nThe following cell sets up the SQL magic connection to the database."
"source": [
"# Connect with SQL Magic\n",
"\n",
"SQL \"Jupyter magic\" allows executing SQL statements directly in Jupyter notebooks, implemented by the [`jupysql`](https://ploomber.io/blog/jupysql/) library. This is useful for quick interactive SQL queries and for learning SQL syntax. We will use SQL magic in this book for demonstrating SQL concepts, but it is not used as part of Python application code.\n",
"\n",
"The following cell sets up the SQL magic connection to the database."
]
},
{
"cell_type": "code",
@@ -51,43 +66,74 @@
}
},
"outputs": [],
"source": "%load_ext sql\n%sql mysql+pymysql://dev:devpass@db"
"source": [
"%load_ext sql\n",
"%sql mysql+pymysql://dev:devpass@db"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": "You can issue SQL commands from a Jupyter cell by starting it with `%%sql`.\nChange the cell type to `SQL` for appropriate syntax highlighting."
"source": [
"You can issue SQL commands from a Jupyter cell by starting it with `%%sql`.\n",
"Change the cell type to `SQL` for appropriate syntax highlighting."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "%%sql\n-- show all users\nSELECT User FROM mysql.user"
"source": [
"%%sql\n",
"-- show all users\n",
"SELECT User FROM mysql.user"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": "# Connect with a Python MySQL Client\n\nTo issue SQL queries directly from Python code (outside of Jupyter magic), you can use a conventional SQL client library such as `pymysql`. This approach gives you full programmatic control over database interactions and is useful when you need to execute raw SQL within Python scripts."
"source": [
"# Connect with a Python MySQL Client\n",
"\n",
"To issue SQL queries directly from Python code (outside of Jupyter magic), you can use a conventional SQL client library such as `pymysql`. This approach gives you full programmatic control over database interactions and is useful when you need to execute raw SQL within Python scripts."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "import os\nimport pymysql\n\n# create a database connection\nconn = pymysql.connect(\n host=os.environ['DJ_HOST'], \n user=os.environ['DJ_USER'], \n password=os.environ['DJ_PASS']\n)"
"source": [
"import os\n",
"import pymysql\n",
"\n",
"# create a database connection\n",
"conn = pymysql.connect(\n",
" host=os.environ['DJ_HOST'], \n",
" user=os.environ['DJ_USER'], \n",
" password=os.environ['DJ_PASS']\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# create a query cursor and issue an SQL query\ncur = conn.cursor()\ncur.execute('SELECT User FROM mysql.user')\ncur.fetchall()"
"source": [
"# create a query cursor and issue an SQL query\n",
"cur = conn.cursor()\n",
"cur.execute('SELECT User FROM mysql.user')\n",
"cur.fetchall()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": "We are all set for executing all the database queries in this book!"
"source": [
"We are all set for executing all the database queries in this book!"
]
}
],
"metadata": {
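The hunks above all make the same mechanical change: each cell's `source` is converted from one embedded string into a list of per-line strings, the form Jupyter itself writes and the one that produces readable line-by-line diffs like this. A minimal sketch of how such a normalization could be scripted is shown below; the helper name and the use of the standard-library `json` module are illustrative assumptions, not the tooling used in this PR.

```python
import json

def split_cell_sources(path):
    """Rewrite a notebook so every cell's "source" is a list of lines."""
    with open(path) as f:
        nb = json.load(f)
    for cell in nb["cells"]:
        src = cell.get("source")
        if isinstance(src, str):
            # keepends=True preserves the trailing "\n" on each line,
            # matching the list-of-strings form shown in the diff above
            cell["source"] = src.splitlines(keepends=True)
    with open(path, "w") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")

split_cell_sources("book/00-introduction/49-connect.ipynb")
```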
13 changes: 5 additions & 8 deletions book/20-concepts/04-integrity.md
@@ -1,8 +1,5 @@
---
title: Data Integrity
date: 2025-10-31
authors:
- name: Dimitri Yatsenko
---

# Why Data Integrity Matters
Expand Down Expand Up @@ -95,7 +92,7 @@ Entity integrity ensures a **one-to-one correspondence** between real-world enti
**Example:** Each mouse in the lab has exactly one unique ID, and that ID refers to exactly one mouse—never two different mice sharing the same ID, and never one mouse having multiple IDs.

**Covered in:**
- [Primary Keys](../30-design/020-primary-key.md) — Entity integrity and the 1:1 correspondence guarantee (elaborated in detail)
- [Primary Keys](../30-design/018-primary-key.md) — Entity integrity and the 1:1 correspondence guarantee (elaborated in detail)
- [UUID](../85-special-topics/025-uuid.ipynb) — Universally unique identifiers

---
@@ -111,7 +108,7 @@ Referential integrity maintains logical associations across tables:
**Example:** A recording session cannot reference a non-existent mouse.

**Covered in:**
- [Foreign Keys](../30-design/030-foreign-keys.ipynb) — Cross-table relationships
- [Foreign Keys](../30-design/030-foreign-keys.md) — Cross-table relationships
- [Relationships](../30-design/050-relationships.ipynb) — Dependency patterns

---
@@ -161,7 +158,7 @@ Workflow integrity maintains valid operation sequences through:
**Example:** An analysis pipeline cannot compute results before acquiring raw data. If `NeuronAnalysis` depends on `SpikeData`, which depends on `RecordingSession`, the database enforces that recordings are created before spike detection, which occurs before analysis—maintaining the integrity of the entire scientific workflow.

**Covered in:**
- [Foreign Keys](../30-design/030-foreign-keys.ipynb) — How foreign keys encode workflow dependencies
- [Foreign Keys](../30-design/030-foreign-keys.md) — How foreign keys encode workflow dependencies
- [Populate](../40-operations/050-populate.ipynb) — Automatic workflow execution and dependency resolution

---
@@ -212,8 +209,8 @@ Now that you understand *why* integrity matters, the next chapter introduces how
The [Design](../30-design/010-schema.ipynb) section then shows *how* to implement each constraint type:

1. **[Tables](../30-design/015-table.ipynb)** — Basic structure with domain integrity
2. **[Primary Keys](../30-design/020-primary-key.md)** — Entity integrity through unique identification
3. **[Foreign Keys](../30-design/030-foreign-keys.ipynb)** — Referential integrity across tables
2. **[Primary Keys](../30-design/018-primary-key.md)** — Entity integrity through unique identification
3. **[Foreign Keys](../30-design/030-foreign-keys.md)** — Referential integrity across tables

Each chapter builds on these foundational integrity concepts.
```
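The referential-integrity rule quoted above ("a recording session cannot reference a non-existent mouse") is exactly what a foreign key enforces. A minimal DataJoint sketch, using hypothetical schema and table names rather than anything defined in this book's code, could look like the following; inserting a session for an unknown mouse is rejected by the database.

```python
import datajoint as dj

schema = dj.Schema('integrity_demo')  # hypothetical schema name

@schema
class Mouse(dj.Manual):
    definition = """
    mouse_id : int          # unique animal ID
    ---
    date_of_birth : date
    """

@schema
class Session(dj.Manual):
    definition = """
    -> Mouse                # foreign key: must reference an existing mouse
    session : int           # session number within the mouse
    ---
    session_date : date
    """

Mouse.insert1(dict(mouse_id=1, date_of_birth="2025-01-15"))
Session.insert1(dict(mouse_id=1, session=1, session_date="2025-03-01"))   # ok
# Session.insert1(dict(mouse_id=99, session=1, session_date="2025-03-01"))
# would raise an integrity error: mouse 99 does not exist in Mouse
```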
4 changes: 2 additions & 2 deletions book/30-design/010-schema.ipynb
@@ -3,7 +3,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": "---\ntitle: Schemas\nauthors:\n - name: Dimitri Yatsenko\n---\n\n# What is a schema?\n\nThe term schema has two related meanings in the context of databases:\n\n## 1. Schema as a Data Blueprint\nA **schema** is a formal specification of the structure of data and the rules governing its integrity.\nIt serves as a blueprint that defines how data is organized, stored, and accessed within a database.\nThis ensures that the database reflects the rules and requirements of the underlying business or research project it supports.\n\nIn structured data models, such as the relational model, a schema provides a robust framework for defining:\n* The structure of tables (relations) and their attributes (columns).\n* Rules and constraints that ensure data consistency, accuracy, and reliability.\n* Relationships between tables, such as primary keys (unique identifiers for records) and foreign keys (references to related records in other tables).\n\n### Aims of Good Schema Design\n* **Data Integrity**: Ensures consistency and prevents anomalies.\n* **Query Efficiency**: Facilitates fast and accurate data retrieval, supports complex queries, and optimizes database performance.\n* **Scalability**: Allows the database to grow and adapt as data volumes increase.\n\n### Key Elements of Schema Design\n* **Tables and Attributes**: Each table is defined with specific attributes (columns), each assigned a data type.\n* **Primary Keys**: Uniquely identify each record in a table.\n* **Foreign Keys**: Establish relationships between entities in tables.\n* **Indexes**: Support efficient queries.\n\nThrough careful schema design, database architects create systems that are both efficient and flexible, meeting the current and future needs of an organization. The schema acts as a living document that guides the structure, operations, and integrity of the database.\n\n## 2. Schema as a Database Module\n\nIn complex database designs, the term \"schema\" is also used to describe a distinct module of a larger database with its own namespace that groups related tables together. \nThis modular approach:\n* Separates tables into logical groups for better organization.\n* Avoids naming conflicts in large databases with multiple schemas."
"source": "---\ntitle: Schemas\n---\n\n# What is a schema?\n\nThe term schema has two related meanings in the context of databases:\n\n## 1. Schema as a Data Blueprint\nA **schema** is a formal specification of the structure of data and the rules governing its integrity.\nIt serves as a blueprint that defines how data is organized, stored, and accessed within a database.\nThis ensures that the database reflects the rules and requirements of the underlying business or research project it supports.\n\nIn structured data models, such as the relational model, a schema provides a robust framework for defining:\n* The structure of tables (relations) and their attributes (columns).\n* Rules and constraints that ensure data consistency, accuracy, and reliability.\n* Relationships between tables, such as primary keys (unique identifiers for records) and foreign keys (references to related records in other tables).\n\n### Aims of Good Schema Design\n* **Data Integrity**: Ensures consistency and prevents anomalies.\n* **Query Efficiency**: Facilitates fast and accurate data retrieval, supports complex queries, and optimizes database performance.\n* **Scalability**: Allows the database to grow and adapt as data volumes increase.\n\n### Key Elements of Schema Design\n* **Tables and Attributes**: Each table is defined with specific attributes (columns), each assigned a data type.\n* **Primary Keys**: Uniquely identify each record in a table.\n* **Foreign Keys**: Establish relationships between entities in tables.\n* **Indexes**: Support efficient queries.\n\nThrough careful schema design, database architects create systems that are both efficient and flexible, meeting the current and future needs of an organization. The schema acts as a living document that guides the structure, operations, and integrity of the database.\n\n## 2. Schema as a Database Module\n\nIn complex database designs, the term \"schema\" is also used to describe a distinct module of a larger database with its own namespace that groups related tables together. \nThis modular approach:\n* Separates tables into logical groups for better organization.\n* Avoids naming conflicts in large databases with multiple schemas."
},
{
"cell_type": "markdown",
@@ -40,7 +40,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": "# Using the `schema` Object\n\nThe schema object groups related tables together and helps prevent naming conflicts.\n\nBy convention, the object created by `dj.Schema` is named `schema`. Typically, only one schema object is used in any given Python namespace, usually at the level of a Python module.\n\nThe schema object serves multiple purposes:\n* **Creating Tables**: Used as a *class decorator* (`@schema`) to declare tables within the schema. \nFor details, see the next section, [Create Tables](015-table.ipynb)\n* **Visualizing the Schema**: Generates diagrams to illustrate relationships between tables.\n* **Exporting Data**: Facilitates exporting data for external use or backup.\n\nWith this foundation, you are ready to begin declaring tables and building your data pipeline."
"source": "# Using the `schema` Object\n\nThe schema object groups related tables together and helps prevent naming conflicts.\n\nBy convention, the object created by `dj.Schema` is named `schema`. Typically, only one schema object is used in any given Python namespace, usually at the level of a Python module.\n\nThe schema object serves multiple purposes:\n* **Creating Tables**: Used as a *class decorator* (`@schema`) to declare tables within the schema. \nFor details, see the next section, [Tables](015-table.ipynb)\n* **Visualizing the Schema**: Generates diagrams to illustrate relationships between tables.\n* **Exporting Data**: Facilitates exporting data for external use or backup.\n\nWith this foundation, you are ready to begin declaring tables and building your data pipeline."
},
{
"cell_type": "markdown",
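The cell quoted above lists what the `schema` object is used for: declaring tables with the `@schema` decorator and generating diagrams. As a quick illustration, assuming a working connection and a hypothetical schema name, the pattern looks like this:

```python
import datajoint as dj

schema = dj.Schema('tutorial')   # by convention, one schema object per module

@schema                          # the class decorator that declares the table
class Subject(dj.Manual):
    definition = """
    subject_id : int             # unique subject identifier
    ---
    species : varchar(60)
    """

dj.Diagram(schema)               # visualize all tables declared in this schema
```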