-
Notifications
You must be signed in to change notification settings - Fork 92
Open
Description
Problem
Currently, DataJoint's type system is closely tied to the underlying SQL backend (e.g., MySQL, PostgreSQL), exposing database-specific type names like TINYINT UNSIGNED or DOUBLE. This approach has several drawbacks:
- Lack of Portability: A schema defined for MySQL may not work on PostgreSQL without modification, hindering pipeline portability.
- Inconsistent User Experience: Scientists and developers are forced to learn the nuances of different SQL dialects.
- Less Intuitive Naming: Scientific programmers are more familiar with terms like uint8 and float64 than their SQL equivalents.
A standardized, abstract layer for attribute types is needed to create a consistent, portable, and user-friendly experience.
Requirements
The following core attribute types SHALL be supported by DataJoint implementations:
| Category | Type | Description |
|---|---|---|
| UUID | uuid |
Universally Unique Identifier (RFC 4122). Default values are not supported for uuid attributes. |
| Integers | int8, uint8, int16, uint16,int32, uint32, int64, uint64 |
Standard signed and unsigned integer types of varying bit widths. |
| Floating-Point | float32, float64 |
Single-precision (32-bit) and double-precision (64-bit) floating-point numbers. Note: NaN (Not a Number) behavior MAY vary by backend; e.g., MySQL does not natively support NaN in indexed FLOAT columns. |
| Decimal | decimal(M,N) |
Fixed-point decimal number with a total precision of M digits and N digits after the decimal point (scale). |
| Character Strings | char(N), varchar(N) |
Fixed-length (char) or variable-length (varchar) character strings, where N specifies the maximum length. |
| Enumeration | enum('val1', 'val2', ...) |
A type restricted to a predefined set of allowed string values. |
| Date | date |
Represents a calendar date (year, month, day) in ISO 8601 format (YYYY-MM-DD). A special default value of NOW MAY be used to set the current date upon insertion. |
| Time / Timestamp | timestamp |
Represents a point in time, typically with microsecond precision, stored in UTC. Values SHOULD conform to ISO 8601 format. A special default value of NOW MAY be used to set the current timestamp upon insertion. |
| Binary Large Object | blob |
Stores large binary data directly within the database row (inline storage). Suitable for moderately sized binary objects. |
| Object Reference | object |
Stores a reference (e.g., a key or path) to an external data object managed by DataJoint but stored outside the primary database (e.g., in an object store or file system). See Object Types. |
| Custom Type | <adaptor_name> |
A user-defined type managed by a Custom Type Adaptor, allowing for specialized storage and handling of complex data structures. |
Distinction: blob vs. object Attribute Types
The blob and object types both handle non-scalar data, but differ in their storage strategy and typical use cases:
| Type | Intended Use | Data Storage Location |
|---|---|---|
blob |
Raw binary data stored directly within the database table row. Suitable for relatively small to moderately sized binary data where inline storage is acceptable and efficient. | Database System (e.g., as BLOB or BYTEA columns in MySQL/PostgreSQL). |
object |
References to data entities stored externally to the primary database. Suitable for large files, datasets, or complex objects where external storage is preferred for scalability or management. | External Storage Systems (e.g., file systems, cloud object stores like S3/GCS/Azure Blob, network-attached storage). The database stores metadata and a reference key. |
Metadata
Metadata
Assignees
Labels
featureIndicates new featuresIndicates new features