Commit 24c2008

Design document with a diagram of metadata life cycle
Metadata lifecycle inspired by the one we created for BIDS: see bids-standard/bids-website#626
1 parent 430cc57 commit 24c2008

doc/design/vendored-schema-1.md

Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
# Use of dandi-schema

## Current situation

This mermaid diagram depicts the current overall definition and flow of the metadata schema:

```mermaid
flowchart TD
    %% repositories as grouped nodes
    subgraph dandi_schema_repo["<a href='https://github.com/dandi/dandi-schema/'>dandi/dandi-schema</a>"]
        Pydantic["Pydantic Models"]
    end

    subgraph schema_repo["<a href='https://github.com/dandi/schema/'>dandi/schema</a>"]
        JSONSchema["JSONSchema<br>serializations"]
    end

    subgraph dandi_cli_repo["<a href='https://github.com/dandi/dandi-cli'>dandi-cli</a>"]
        CLI["CLI & Library<br>validation logic<br/>(Python)"]
    end

    subgraph dandi_archive_repo["<a href='https://github.com/dandi/dandi-archive/'>dandi-archive</a>"]
        Meditor["Web UI<br/>Metadata Editor<br/>(meditor; Vue)"]
        API["Archive API<br/>(Python; DJANGO)"]
        Storage[("DB (Postgresql)")]
    end

    %% main flow
    Pydantic -->|"serialize into<br/>(CI)"| JSONSchema
    Pydantic -->|used to validate| CLI
    Pydantic -->|used to validate| API

    JSONSchema -->|used to produce| Meditor
    JSONSchema -->|used to validate??| Meditor
    Meditor -->|submits metadata| API

    CLI -->|used to upload & submit metadata| API

    API <-->|metadata JSON| Storage

    %% styling
    classDef repo fill:#f9f9f9,stroke:#333,stroke-width:1px;
    classDef code fill:#e1f5fe,stroke:#0277bd,stroke-width:1px;
    classDef ui fill:#e8f5e9,stroke:#2e7d32,stroke-width:1px;
    classDef data fill:#fff3e0,stroke:#e65100,stroke-width:1px;
    JSONSchema@{ shape: docs }

    class dandi_schema_repo,schema_repo,dandi_cli_repo,dandi_archive_repo repo;
    class Pydantic,CLI,API code;
    class JSONSchema,Storage data;
    class Meditor ui;
```

NB: The diagram might need fixing, since we failed to find any explicit use of the serialized JSONSchemas by the frontend for validation.

In summary, `dandi-archive` relies on two *instantiations* of `dandi-schema`:

- **Pydantic**: the backend validates metadata using the Python library;
- **JSONSchema**: the frontend is produced from, and supposedly validates against, the JSONSchema serialization.

### Pydantic models: backend

The JSONSchema models are generated from the Pydantic models in the `dandi-schema` repository and stored in the `dandi/schema` repository for every version of the `dandi-schema` Pydantic model.
The idea was to be able to validate against a specific version of the `dandi-schema` model.
AFAIK this was never realized, and `dandi-archive` always uses one specific version of the `dandi-schema` model, as prescribed by the `DANDI_SCHEMA_VERSION` constant [in `dandischema.consts`](https://github.com/dandi/dandi-schema/blob/HEAD/dandischema/consts.py), with the possibility to override it in [dandiapi.settings](https://github.com/dandi/dandi-archive/blob/HEAD/dandiapi/settings.py#L98C1-L101C85):

```python
from dandischema.consts import DANDI_SCHEMA_VERSION as _DANDI_SCHEMA_VERSION

class DandiMixin(ConfigMixin):
    ...
    # This is where the schema version should be set.
    # It can optionally be overwritten with the environment variable, but that should only be
    # considered a temporary fix.
    DANDI_SCHEMA_VERSION = values.Value(default=_DANDI_SCHEMA_VERSION, environ=True)
```

In addition, we hardcode an exact version of `dandischema` in the `dandi-archive` repository's [`setup.py`](https://github.com/dandi/dandi-archive/blob/HEAD/setup.py):

```python
# Pin dandischema to exact version to make explicit which schema version is being used
'dandischema==0.11.0',  # schema version 0.6.9
```
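
The override described in the settings comment amounts to "use the library's default unless the environment says otherwise". A minimal standard-library sketch of that precedence follows. It is not the `dandi-archive` implementation: `resolve_schema_version`, `is_overridden` and the `DJANGO_DANDI_SCHEMA_VERSION` variable name are assumptions made for illustration, and the default is hardcoded here rather than imported from `dandischema.consts`.

```python
import os

# Stand-in for dandischema.consts.DANDI_SCHEMA_VERSION (hardcoded for this sketch).
LIBRARY_SCHEMA_VERSION = "0.6.9"

# Hypothetical variable name; the real one depends on the settings framework setup.
ENV_VAR = "DJANGO_DANDI_SCHEMA_VERSION"


def resolve_schema_version(environ=None, default=LIBRARY_SCHEMA_VERSION):
    """Return the schema version, letting the environment override the default."""
    environ = os.environ if environ is None else environ
    override = environ.get(ENV_VAR, "").strip()
    # An unset or blank variable falls back to the library default.
    return override or default


def is_overridden(environ=None):
    """Tell whether the effective version differs from the library default."""
    return resolve_schema_version(environ) != LIBRARY_SCHEMA_VERSION
```

A check such as `is_overridden()` could be logged at startup to make the "temporary fix" visible.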

The backend then uses the `dandischema` library to validate the metadata (via celery tasks AFAIK) against both the Pydantic and JSONSchema models:

```shell
❯ git grep -e 'validate(' -e 'import.*validate\>' dandiapi/api/services/
dandiapi/api/services/metadata/__init__.py:from dandischema.metadata import aggregate_assets_summary, validate
dandiapi/api/services/metadata/__init__.py: validate(metadata, schema_key='PublishedAsset', json_validation=True)
dandiapi/api/services/metadata/__init__.py: validate(
dandiapi/api/services/publish/__init__.py:from dandischema.metadata import aggregate_assets_summary, validate
dandiapi/api/services/publish/__init__.py: validate(new_version.metadata, schema_key='PublishedDandiset', json_validation=True)
```

### Web frontend (Vue)

The frontend uses the JSONSchema model via vjsf to produce the web UI.
It is unclear, though, whether it is up-to-date, since the generated typings refer to schema v0.6.2

```shell
❯ head -n4 web/src/types/schema.ts
/**
 * This file was automatically generated by json-schema-to-typescript.
 * DO NOT MODIFY IT BY HAND. All changes should be made through the "yarn migrate" command.
 * TypeScript typings for dandiset metadata are based on schema v0.6.2 (https://raw.githubusercontent.com/dandi/schema/master/releases/0.6.2/dandiset.json)
```

although we already use schema v0.6.9 (`dandischema` 0.11.0).

NB: Yarik failed to find the location where we explicitly load the JSONSchema, if we do so at all.
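
The version drift noted above (typings at v0.6.2, backend at v0.6.9) could be caught mechanically. The following is only a sketch of such a check, assuming the header comment keeps the "based on schema vX.Y.Z" wording shown in the `head` output; the function names are hypothetical and nothing like this is known to exist in `dandi-archive`.

```python
import re

# Matches the "schema vX.Y.Z" mention in the generated file's header comment.
_HEADER_VERSION_RE = re.compile(r"based on schema v(\d+\.\d+\.\d+)")


def typings_schema_version(header_text):
    """Extract the schema version named in the header, or None if absent."""
    match = _HEADER_VERSION_RE.search(header_text)
    return match.group(1) if match else None


def typings_are_current(header_text, backend_schema_version):
    """True only if the header names exactly the backend's schema version."""
    return typings_schema_version(header_text) == backend_schema_version


# Header copied from the `head -n4 web/src/types/schema.ts` output.
HEADER = """/**
 * This file was automatically generated by json-schema-to-typescript.
 * DO NOT MODIFY IT BY HAND. All changes should be made through the "yarn migrate" command.
 * TypeScript typings for dandiset metadata are based on schema v0.6.2 (https://raw.githubusercontent.com/dandi/schema/master/releases/0.6.2/dandiset.json)
"""
```

With the header above, `typings_are_current(HEADER, "0.6.9")` evaluates to `False`, which is the mismatch this section describes.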

### Vendorization

ATM we also have some hardcoded vendorization in the `dandi-archive` code (see below).
Work is ongoing in [dandi-schema:PR#294](https://github.com/dandi/dandi-schema/pull/294) to make vendorization of the schema configurable.
That would result in the `dandi/schema` JSONSchema serializations becoming generally de-vendorized.
It would then be the responsibility of each `dandi-archive` instance to vendorize, which would primarily consist of making regular expressions more restrictive via configuration/environment variables.

#### Backend

Excluding some places where we might want to vendorize too (e.g. email subjects etc.):

```shell
❯ git grep DANDI: -- dandiapi | grep -v -e test_ -e 'subject=' -e 'verbose_name'
dandiapi/api/models/version.py: 'identifier': f'DANDI:{self.dandiset.identifier}',
dandiapi/api/models/version.py: 'id': f'DANDI:{self.dandiset.identifier}/{self.version}',
dandiapi/api/services/metadata/__init__.py: f'DANDI:{publishable_version.dandiset.identifier}/{publishable_version.version}'
dandiapi/api/tests/fuzzy.py:DANDISET_SCHEMA_ID_RE = Re(r'DANDI:\d{6}')
dandiapi/api/views/dandiset.py: if identifier.startswith('DANDI:'):
```

#### Web frontend

```shell
❯ git grep DANDI: -- web | grep -v -e test_ -e 'subject=' -e 'verbose_name'
web/src/components/DandisetList.vue: DANDI:<b>{{ item.dandiset.identifier }}</b>
web/src/stores/dandiset.ts: schema['properties']['identifier']['pattern'] = '^DANDI:\\d{6}$'
web/src/views/DandisetLandingView/DownloadDialog.vue: // Use the special 'DANDI:' url prefix if appropriate.
web/src/views/DandisetLandingView/DownloadDialog.vue: const dandiUrl = `DANDI:${identifier}`;
```
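
The hardcoded `DANDI:` occurrences listed above fall into three uses: a validation pattern, identifier formatting, and prefix detection. One way to de-hardcode them would be to derive all three from a single instance name. The sketch below illustrates that idea only; the helper names and the `instance_name` parameter are hypothetical, and how PR#294 actually exposes this configuration is not assumed here.

```python
import re


def identifier_pattern(instance_name="DANDI", digits=6):
    """Build the anchored dandiset identifier regex for one archive instance."""
    return rf"^{re.escape(instance_name)}:\d{{{digits}}}$"


def format_identifier(dandiset_id, instance_name="DANDI"):
    """Prefix a bare dandiset identifier with the instance name."""
    return f"{instance_name}:{dandiset_id}"


def strip_instance_prefix(identifier, instance_name="DANDI"):
    """Drop the instance prefix if present, mirroring the startswith() check."""
    prefix = f"{instance_name}:"
    return identifier[len(prefix):] if identifier.startswith(prefix) else identifier
```

With the defaults, `identifier_pattern()` yields the same `^DANDI:\d{6}$` pattern that the frontend currently sets by hand, while another instance would pass its own name (for example a made-up `"EXAMPLE"`).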

### Summary

We

- neither support nor use multiple versions of the schema in `dandi-archive`
- use two instantiations of the schema and rely on an external process to generate JSONSchema from the Pydantic models
- manually trigger updates of the web frontend files according to some version of the schema
- have hardcoded some vendorization inside the `dandi-archive` codebase (backend and frontend)

## Proposed solution idea

The idea was to remove `dandi-archive`'s reliance on the https://github.com/dandi/schema/ JSONSchema serializations and instead have it serialize the JSONSchema needed by the frontend directly at startup time.

## Current verdict

However, after reviewing the code, it seems that we do not use the JSONSchema serializations in `dandi-archive` at run time at all.

So we might be OK to switch to a vendorized version of dandi-schema and just address the hardcoded vendorizations.

**Note:** We would still need `context.json` among those `dandi/schema` serializations, but it is not clear whether the others are used explicitly anywhere. We do expose the `dandiset.json` schema as `schema_url` in our "server info" at https://dandiarchive.org/server-info and https://api.dandiarchive.org/api/info/. But I do not think `schema_url` is actually used by anything ATM.
