Merged
19 commits
- f512163 Add Pipeline Projects chapter to Operations section (claude, Dec 15, 2025)
- 43abd03 Merge pull request #32 from dimitri-yatsenko/claude/add-pipeline-proj… (dimitri-yatsenko, Dec 15, 2025)
- db6726d Update Pipeline Projects chapter with standard structure (claude, Dec 15, 2025)
- 5c91846 Merge pull request #33 from dimitri-yatsenko/claude/add-pipeline-proj… (dimitri-yatsenko, Dec 15, 2025)
- 50b3f59 Move Pipeline Projects chapter to Design section and harmonize with C… (claude, Dec 15, 2025)
- 0c85b14 Reposition Pipeline Projects as production deployment guidance (claude, Dec 15, 2025)
- 4033f8d Add DataJoint Platform as managed deployment option (claude, Dec 15, 2025)
- fdc7241 Merge pull request #34 from dimitri-yatsenko/claude/add-pipeline-proj… (dimitri-yatsenko, Dec 15, 2025)
- ec273dc Move Pipeline Projects to end of Design section (claude, Dec 15, 2025)
- 273daa8 Merge pull request #35 from dimitri-yatsenko/claude/add-pipeline-proj… (dimitri-yatsenko, Dec 15, 2025)
- 50ff79b Fix figure sizes in Pipeline Projects chapter (claude, Dec 15, 2025)
- bcc76fa Merge pull request #36 from dimitri-yatsenko/claude/add-pipeline-proj… (dimitri-yatsenko, Dec 15, 2025)
- d81a98a Improve Indexes chapter organization and harmonization (claude, Dec 15, 2025)
- 52e413d Merge pull request #37 from dimitri-yatsenko/claude/improve-indexes-c… (dimitri-yatsenko, Dec 15, 2025)
- 7d2085f Clean up Databases chapter: remove unnecessary emphasis and add links (claude, Dec 15, 2025)
- 55c977d Add links to other chapters in Databases chapter summary (claude, Dec 15, 2025)
- 11d36b8 Merge pull request #38 from dimitri-yatsenko/claude/databases-chapter… (dimitri-yatsenko, Dec 15, 2025)
- 1aff5cb Remove knowledge check chapter (concepts-quiz.md) (claude, Dec 15, 2025)
- 6569311 Merge pull request #39 from dimitri-yatsenko/claude/remove-knowledge-… (dimitri-yatsenko, Dec 15, 2025)

35 changes: 22 additions & 13 deletions book/20-concepts/00-databases.md
@@ -13,14 +13,14 @@ The database not only tracks the current state of the enterprise's processes but
**Key traits of databases**:
- Structured data reflects the logic of the enterprise's operations
- Supports the organization's operations by reflecting and enforcing its rules and constraints (data integrity)
- **Precise access control ensures only authorized users can view or modify specific data**
- Precise access control ensures only authorized users can view or modify specific data
- Ability to evolve over time
- Facilitates distributed, concurrent access by multiple users
- Centralized data consistency, appearing as a single source of data even if physically distributed, reflecting all changes
- Allows specific and precise queries through various interfaces for different users
```

Databases are crucial for the smooth and organized operation of various entities, from hotels and airlines to universities, banks, and research projects. They ensure that processes are accurately tracked, essential rules are enforced, only valid transactions are allowed, and **sensitive data is protected** from unauthorized access. This combination of data integrity and data security makes databases indispensable for any operation where data reliability and confidentiality matter.
Databases are crucial for the smooth and organized operation of various entities, from hotels and airlines to universities, banks, and research projects. They ensure that processes are accurately tracked, essential rules are enforced, only valid transactions are allowed, and sensitive data is protected from unauthorized access. This combination of data integrity and data security makes databases indispensable for any operation where data reliability and confidentiality matter.

## Database Management Systems (DBMS)

@@ -29,28 +29,28 @@ A Database Management System (DBMS) is a software system that serves as the comp
It defines and enforces the structure of the data, ensuring that the organization's rules are consistently applied.
A DBMS manages data storage and efficiently executes data updates and queries while safeguarding the data's structure and integrity, particularly in environments with multiple concurrent users.

**Critically, a DBMS also manages user authentication and authorization**, controlling who can access which data and what operations they can perform.
Critically, a DBMS also manages user authentication and authorization, controlling who can access which data and what operations they can perform.
```

Consider an airline's database for flight schedules and ticket bookings. The airline must adhere to several key rules:

* A seat cannot be booked by two passengers for the same flight
* A seat is considered reserved only after all details are verified and payment is processed
* **Only authorized ticketing agents can modify reservations**
* **Passengers can view only their own booking information**
* **Financial data is accessible only to accounting staff**
* Only authorized ticketing agents can modify reservations
* Passengers can view only their own booking information
* Financial data is accessible only to accounting staff

A robust DBMS enforces such rules reliably, ensuring smooth operations while interacting with multiple users and systems at once. The same system that prevents double-booking also prevents unauthorized access to passenger records.
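
The first airline rule can be sketched with Python's built-in `sqlite3` module, chosen here only because it needs no server. This is a minimal illustration under assumed names: the `booking` table, its columns, and the `book_seat` helper are hypothetical. A `UNIQUE` constraint on `(flight_no, seat)` lets the database itself reject a second booking of the same seat, regardless of which program issues the insert.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE booking (
        flight_no TEXT NOT NULL,
        seat      TEXT NOT NULL,
        passenger TEXT NOT NULL,
        UNIQUE (flight_no, seat)  -- at most one passenger per seat per flight
    )
""")


def book_seat(con, flight_no, seat, passenger):
    """Try to reserve a seat; return True on success, False if it is taken."""
    try:
        with con:  # commits on success, rolls back if an exception is raised
            con.execute(
                "INSERT INTO booking (flight_no, seat, passenger) VALUES (?, ?, ?)",
                (flight_no, seat, passenger),
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected a second booking of the same seat.
        return False


print(book_seat(con, "UA100", "12A", "alice"))  # True
print(book_seat(con, "UA100", "12A", "bob"))    # False
```

Because the rule lives in the schema rather than in application code, every client that writes to the table is held to it.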

Databases are dynamic, with data continuously updated by both users and systems. Even in the face of disruptions like power outages, errors, or cyberattacks, the DBMS ensures that the system recovers quickly and returns to a stable state. For users, the database should function seamlessly, allowing actions to be performed without interference from others working on the system simultaneously—**while ensuring they can only perform actions they're authorized to do**.
Databases are dynamic, with data continuously updated by both users and systems. Even in the face of disruptions like power outages, errors, or cyberattacks, the DBMS ensures that the system recovers quickly and returns to a stable state. For users, the database should function seamlessly, allowing actions to be performed without interference from others working on the system simultaneously—while ensuring they can only perform actions they're authorized to do.
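
Returning to a stable state after a disruption rests on transactions: a group of changes either takes effect as a whole or not at all. The sketch below again uses `sqlite3`, with hypothetical `payment` and `reservation` tables and a hypothetical `reserve` helper. It simulates a failure partway through a booking by raising an exception; a real crash is handled by the DBMS's own recovery mechanisms, not by Python code.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical tables: a booking is valid only if both rows exist.
con.executescript("""
    CREATE TABLE payment     (booking_id INTEGER PRIMARY KEY, amount REAL NOT NULL);
    CREATE TABLE reservation (booking_id INTEGER PRIMARY KEY, seat TEXT NOT NULL);
""")


def reserve(con, booking_id, seat, amount, fail_midway=False):
    """Record payment and seat in one transaction; return True if committed."""
    try:
        with con:  # both INSERTs are committed together, or neither is
            con.execute("INSERT INTO payment VALUES (?, ?)", (booking_id, amount))
            if fail_midway:
                raise RuntimeError("simulated failure before the seat is recorded")
            con.execute("INSERT INTO reservation VALUES (?, ?)", (booking_id, seat))
        return True
    except RuntimeError:
        return False  # the connection rolled back the payment row


reserve(con, 1, "12A", 250.0)
reserve(con, 2, "12B", 250.0, fail_midway=True)
print(con.execute("SELECT COUNT(*) FROM payment").fetchone()[0])      # 1
print(con.execute("SELECT COUNT(*) FROM reservation").fetchone()[0])  # 1
```

The failed booking leaves no orphaned payment, which also reflects the rule that a seat counts as reserved only once payment and reservation are both recorded.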

## Data Security and Access Management

One of the most critical features distinguishing databases from simple file storage is **precise access control**. In scientific research, healthcare, finance, and many other domains, not all data should be accessible to all users.

### Authentication and Authorization

Before you can work with a database, you must **authentication**—prove your identity with a username and password. Once authenticated, the database enforces **authorization** rules that determine what you can do:
Before you can work with a database, you must authenticate—prove your identity with a username and password. Once authenticated, the database enforces authorization rules that determine what you can do:

- **Read**: View specific tables or columns
- **Write**: Add new data to certain tables
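
SQLite has no user accounts, so it cannot show per-user privileges the way server systems such as MySQL or PostgreSQL do with `GRANT`. As a rough stand-in, Python's `sqlite3.Connection.set_authorizer` lets a connection veto individual operations. The sketch below assumes a hypothetical read-only role (the `read_only` callback and `WRITE_ACTIONS` set are illustrative names) that may query tables but not insert, update, or delete rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE booking (flight_no TEXT, seat TEXT, passenger TEXT)")
con.execute("INSERT INTO booking VALUES ('UA100', '12A', 'alice')")
con.commit()

# Action codes for statements that modify rows (hypothetical read-only role).
WRITE_ACTIONS = {sqlite3.SQLITE_INSERT, sqlite3.SQLITE_UPDATE, sqlite3.SQLITE_DELETE}


def read_only(action, arg1, arg2, db_name, trigger_or_view):
    """Authorizer callback: permit reads, deny any row modification."""
    if action in WRITE_ACTIONS:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


con.set_authorizer(read_only)

print(con.execute("SELECT passenger FROM booking").fetchall())  # [('alice',)]
try:
    con.execute("DELETE FROM booking")  # rejected when the statement is prepared
except sqlite3.DatabaseError as err:
    print("denied:", err)
```

In MySQL or PostgreSQL, the equivalent restriction is a per-user privilege, for example granting `SELECT` on a table without granting `INSERT`, `UPDATE`, or `DELETE`.
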
@@ -109,10 +109,19 @@ This book focuses on **DataJoint**, a framework that extends relational database
The relational data model—introduced by Edgar F. Codd in 1970—revolutionized data management by organizing data into tables with well-defined relationships. This model has dominated database systems for over five decades due to its mathematical rigor and versatility. Modern relational databases like MySQL and PostgreSQL continue to evolve, incorporating new capabilities for scalability and security while maintaining the core principles that make them reliable and powerful.

The following chapters build the conceptual foundation you need to understand DataJoint's approach:
- **Data Models**: What data models are and why schemas matter for scientific work
- **Relational Theory**: The mathematical foundations that make relational databases powerful
- **Relational Practice**: Hands-on experience with database operations
- **Relational Workflows**: How DataJoint extends relational theory for computational pipelines
- **Scientific Data Pipelines**: How workflows scale into complete research data operations systems
- [Data Models](01-models.md): What data models are and why schemas matter for scientific work
- [Relational Theory](02-relational.md): The mathematical foundations that make relational databases powerful
- [Data Integrity](04-integrity.md): How databases enforce the rules and constraints that keep data valid
- [Relational Workflows](05-workflows.md): How DataJoint extends relational theory for computational pipelines
- [Scientific Data Pipelines](06-pipelines.md): How workflows scale into complete research data operations systems

By the end, you'll understand both the mathematical foundations and their practical application to your research.

## Links

- [MySQL](https://www.mysql.com/) — Popular open-source relational database management system
- [PostgreSQL](https://www.postgresql.org/) — Advanced open-source relational database
- [SQLite](https://www.sqlite.org/) — Embedded relational database engine
- [Google Spanner](https://cloud.google.com/spanner) — Distributed relational database service
- [CockroachDB](https://www.cockroachlabs.com/) — Distributed SQL database
- [DataJoint](https://datajoint.com/) — Framework for scientific data pipelines