|
| 1 | +# Use of dandi-schema |
| 2 | + |
| 3 | +## Current situation |
| 4 | + |
| 5 | +This mermaid diagram depicts the current overall definition and flow of the metadata schema: |
| 6 | + |
| 7 | +```mermaid |
| 8 | +flowchart TD |
| 9 | + %% repositories as grouped nodes |
| 10 | + subgraph dandi_schema_repo["<a href='https://github.com/dandi/dandi-schema/'>dandi/dandi-schema</a>"] |
| 11 | + Pydantic["Pydantic Models"] |
| 12 | + end |
| 13 | +
|
| 14 | + subgraph schema_repo["<a href='https://github.com/dandi/schema/'>dandi/schema</a>"] |
| 15 | + JSONSchema["JSONSchema<br>serializations"] |
| 16 | +
|
| 17 | + end |
| 18 | +
|
| 19 | + subgraph dandi_cli_repo["<a href='https://github.com/dandi/dandi-cli'>dandi-cli</a>"] |
| 20 | + CLI["CLI & Library<br>validation logic<br/>(Python)"] |
| 21 | + end |
| 22 | +
|
| 23 | + subgraph dandi_archive_repo["<a href='https://github.com/dandi/dandi-archive/'>dandi-archive</a>"] |
| 24 | + Meditor["Web UI<br/>Metadata Editor<br/>(meditor; Vue)"] |
| 25 | + API["Archive API<br/>(Python; Django)"] |
| 26 | + Storage[("DB (PostgreSQL)")] |
| 27 | + end |
| 28 | +
|
| 29 | + %% main flow |
| 30 | + Pydantic -->|"serialize into<br/>(CI)"| JSONSchema |
| 31 | + Pydantic -->|used to validate| CLI |
| 32 | + Pydantic -->|used to validate| API |
| 33 | +
|
| 34 | + JSONSchema -->|used to produce| Meditor |
| 35 | + JSONSchema -->|used to validate??| Meditor |
| 36 | + Meditor -->|submits metadata| API |
| 37 | +
|
| 38 | + CLI -->|used to upload & submit metadata| API |
| 39 | +
|
| 40 | + API <-->|metadata JSON| Storage |
| 41 | +
|
| 42 | + %% styling |
| 43 | + classDef repo fill:#f9f9f9,stroke:#333,stroke-width:1px; |
| 44 | + classDef code fill:#e1f5fe,stroke:#0277bd,stroke-width:1px; |
| 45 | + classDef ui fill:#e8f5e9,stroke:#2e7d32,stroke-width:1px; |
| 46 | + classDef data fill:#fff3e0,stroke:#e65100,stroke-width:1px; |
| 47 | + JSONSchema@{ shape: docs } |
| 48 | +
|
| 49 | + class dandi_schema_repo,schema_repo,dandi_cli_repo,dandi_archive_repo repo; |
| 50 | + class Pydantic,CLI,API code; |
| 51 | + class JSONSchema,Storage data; |
| 52 | + class Meditor ui; |
| 53 | +``` |
| 54 | + |
| 55 | +NB: The diagram might need fixing, since no explicit use of the serialized JSONSchemas by the frontend for validation could be found. |
| 56 | + |
| 57 | +In summary, `dandi-archive` relies on two *instantiations* of `dandi-schema`: |
| 58 | + |
| 59 | +- **Pydantic**: the backend validates metadata using the `dandischema` Python library; |
| 60 | +- **JSONSchema**: the frontend UI is produced from the JSONSchema serialization and (possibly) validates against it. |
| 61 | + |
| 62 | +### Pydantic models: backend |
| 63 | + |
| 64 | +The JSONSchema models are generated from the Pydantic models in the `dandi-schema` repository and stored in the `dandi/schema` repository for every version of the `dandi-schema` Pydantic models. |
| 65 | +The idea was to be able to validate against a specific version of the `dandi-schema` model. |
| 66 | +AFAIK this was never realized, and `dandi-archive` always uses a single version of the `dandi-schema` model, as prescribed by the `DANDI_SCHEMA_VERSION` constant [in `dandischema.consts`](https://github.com/dandi/dandi-schema/blob/HEAD/dandischema/consts.py), with the possibility to override it in [dandiapi.settings](https://github.com/dandi/dandi-archive/blob/HEAD/dandiapi/settings.py#L98C1-L101C85): |
| 67 | + |
| 68 | +```python |
| 69 | +from dandischema.consts import DANDI_SCHEMA_VERSION as _DANDI_SCHEMA_VERSION |
| 70 | + |
| 71 | +class DandiMixin(ConfigMixin): |
| 72 | + ... |
| 73 | + # This is where the schema version should be set. |
| 74 | + # It can optionally be overwritten with the environment variable, but that should only be |
| 75 | + # considered a temporary fix. |
| 76 | + DANDI_SCHEMA_VERSION = values.Value(default=_DANDI_SCHEMA_VERSION, environ=True) |
| 77 | +``` |
| 78 | + |
| 79 | +In addition, `dandi-archive` pins an exact version of `dandischema` in its [`setup.py`](https://github.com/dandi/dandi-archive/blob/HEAD/setup.py): |
| 80 | + |
| 81 | +```python |
| 82 | + # Pin dandischema to exact version to make explicit which schema version is being used |
| 83 | + 'dandischema==0.11.0', # schema version 0.6.9 |
| 84 | +``` |
| 85 | + |
| 86 | +We then use the `dandischema` library to validate metadata in the backend (via Celery tasks, AFAIK) against both the Pydantic and the JSONSchema models (note `json_validation=True`): |
| 87 | + |
| 88 | +```shell |
| 89 | +❯ git grep -e 'validate(' -e 'import.*validate\>' dandiapi/api/services/ |
| 90 | +dandiapi/api/services/metadata/__init__.py:from dandischema.metadata import aggregate_assets_summary, validate |
| 91 | +dandiapi/api/services/metadata/__init__.py: validate(metadata, schema_key='PublishedAsset', json_validation=True) |
| 92 | +dandiapi/api/services/metadata/__init__.py: validate( |
| 93 | +dandiapi/api/services/publish/__init__.py:from dandischema.metadata import aggregate_assets_summary, validate |
| 94 | +dandiapi/api/services/publish/__init__.py: validate(new_version.metadata, schema_key='PublishedDandiset', json_validation=True) |
| 95 | +``` |
| 96 | + |
| 97 | +### Web frontend (Vue) |
| 98 | + |
| 99 | +The frontend uses the JSONSchema model via vjsf to produce the web UI. |
| 100 | +It is unclear, though, whether it is up to date, since the generated TypeScript typings refer to schema v0.6.2: |
| 101 | + |
| 102 | +```shell |
| 103 | +❯ head -n4 web/src/types/schema.ts |
| 104 | +/** |
| 105 | + * This file was automatically generated by json-schema-to-typescript. |
| 106 | + * DO NOT MODIFY IT BY HAND. All changes should be made through the "yarn migrate" command. |
| 107 | + * TypeScript typings for dandiset metadata are based on schema v0.6.2 (https://raw.githubusercontent.com/dandi/schema/master/releases/0.6.2/dandiset.json) |
| 108 | +``` |
| 109 | + |
| 110 | +although the backend already uses schema v0.6.9 (shipped with `dandischema` 0.11.0). |
| 111 | + |
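A drift like this could be caught automatically. The following is only a minimal sketch: it assumes the header wording shown in the `schema.ts` excerpt above stays stable, and the helper names (`typings_schema_version`, `is_in_sync`) are hypothetical and do not exist in `dandi-archive`.

```python
import re
from typing import Optional

# Assumption: the generated header keeps the wording "based on schema vX.Y.Z",
# as in the `web/src/types/schema.ts` excerpt above.
_SCHEMA_VERSION_RE = re.compile(r"based on schema v(\d+\.\d+\.\d+)")


def typings_schema_version(header_text: str) -> Optional[str]:
    """Return the schema version named in the typings header, or None if absent."""
    match = _SCHEMA_VERSION_RE.search(header_text)
    return match.group(1) if match else None


def is_in_sync(header_text: str, backend_schema_version: str) -> bool:
    """Return True when the typings were generated from the backend's schema version."""
    return typings_schema_version(header_text) == backend_schema_version


# Header text copied from the excerpt above.
header = (
    "/**\n"
    " * This file was automatically generated by json-schema-to-typescript.\n"
    " * TypeScript typings for dandiset metadata are based on schema v0.6.2 "
    "(https://raw.githubusercontent.com/dandi/schema/master/releases/0.6.2/dandiset.json)\n"
)

print(typings_schema_version(header))
print(is_in_sync(header, "0.6.9"))
```

In a real check, `header` would be read from the first lines of `web/src/types/schema.ts` and the second argument would come from `DANDI_SCHEMA_VERSION`; a CI job could then fail whenever the two differ.
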
| 112 | +NB: Yarik failed to find the location where we explicitly load the JSONSchema, if we do so at all. |
| 113 | + |
| 114 | +### Vendorization |
| 115 | + |
| 116 | +ATM we also have some hardcoded vendorization in the `dandi-archive` code (see below). |
| 117 | +Work is ongoing in [dandi-schema:PR#294](https://github.com/dandi/dandi-schema/pull/294) to make vendorization of the schema configurable. |
| 118 | +That would result in the `dandi/schema` JSONSchema serializations becoming generally de-vendorized. |
| 119 | +It would then be the responsibility of each `dandi-archive` instance to vendorize, which would primarily consist of making regular expressions more restrictive via configuration/environment variables. |
| 120 | + |
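To illustrate what "making regular expressions more restrictive via configuration" could look like, here is a minimal sketch. It is not the mechanism implemented in PR#294: the environment variable name `EXAMPLE_INSTANCE_NAME`, the generic prefix pattern, and the function `dandiset_id_pattern` are all assumptions made for illustration.

```python
import os
import re

# Hypothetical variable name; the actual configuration mechanism is the
# subject of dandi-schema PR#294 and may look different.
INSTANCE_NAME_ENV = "EXAMPLE_INSTANCE_NAME"

# Assumed de-vendorized default: any upper-case instance prefix is accepted.
GENERIC_PREFIX_PATTERN = r"[A-Z][-A-Z]*"


def dandiset_id_pattern(instance_name=None):
    """Build the dandiset identifier regex, restricted to one instance if configured."""
    name = instance_name or os.environ.get(INSTANCE_NAME_ENV)
    # re.escape guards against regex metacharacters in a configured name.
    prefix = re.escape(name) if name else GENERIC_PREFIX_PATTERN
    return rf"^{prefix}:\d{{6}}$"


# A vendorized instance accepts only its own prefix.
vendorized = dandiset_id_pattern("DANDI")
print(vendorized)
print(bool(re.match(vendorized, "DANDI:000123")))
print(bool(re.match(vendorized, "OTHER:000123")))
```

With the instance name set to `DANDI`, the resulting pattern is the one currently hardcoded in `web/src/stores/dandiset.ts`, so the same configured value could in principle drive both backend and frontend.
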
| 121 | +#### Backend |
| 122 | + |
| 123 | +Occurrences of the hardcoded `DANDI:` prefix, excluding some places where we might want to vendorize too (e.g. email subjects): |
| 124 | + |
| 125 | +```shell |
| 126 | +❯ git grep DANDI: -- dandiapi | grep -v -e test_ -e 'subject=' -e 'verbose_name' |
| 127 | +dandiapi/api/models/version.py: 'identifier': f'DANDI:{self.dandiset.identifier}', |
| 128 | +dandiapi/api/models/version.py: 'id': f'DANDI:{self.dandiset.identifier}/{self.version}', |
| 129 | +dandiapi/api/services/metadata/__init__.py: f'DANDI:{publishable_version.dandiset.identifier}/{publishable_version.version}' |
| 130 | +dandiapi/api/tests/fuzzy.py:DANDISET_SCHEMA_ID_RE = Re(r'DANDI:\d{6}') |
| 131 | +dandiapi/api/views/dandiset.py: if identifier.startswith('DANDI:'): |
| 132 | +``` |
| 133 | + |
| 134 | +#### Web frontend |
| 135 | + |
| 136 | +```shell |
| 137 | +❯ git grep DANDI: -- web | grep -v -e test_ -e 'subject=' -e 'verbose_name' |
| 138 | +web/src/components/DandisetList.vue: DANDI:<b>{{ item.dandiset.identifier }}</b> |
| 139 | +web/src/stores/dandiset.ts: schema['properties']['identifier']['pattern'] = '^DANDI:\\d{6}$' |
| 140 | +web/src/views/DandisetLandingView/DownloadDialog.vue: // Use the special 'DANDI:' url prefix if appropriate. |
| 141 | +web/src/views/DandisetLandingView/DownloadDialog.vue: const dandiUrl = `DANDI:${identifier}`; |
| 142 | +``` |
| 143 | + |
| 144 | + |
| 145 | +### Summary |
| 146 | + |
| 147 | +We |
| 148 | +- neither support nor use multiple versions of the schema in `dandi-archive` |
| 149 | +- use two instantiations of the schema and rely on an external process to generate JSONSchema from the Pydantic models |
| 150 | +- manually trigger updates of the web frontend files according to some version of the schema |
| 151 | +- have hardcoded some vendorization inside the `dandi-archive` codebase (backend and frontend) |
| 152 | + |
| 153 | +## Proposed solution idea |
| 154 | + |
| 155 | +The idea was to remove `dandi-archive`'s reliance on the https://github.com/dandi/schema/ JSONSchema serializations and instead have `dandi-archive` serialize the JSONSchema needed by the frontend directly at startup time. |
| 156 | + |
| 157 | +## Current verdict |
| 158 | + |
| 159 | +However, after reviewing the code, it seems that we do not use the JSONSchema serializations in `dandi-archive` at run time at all. |
| 160 | + |
| 161 | +So we might be OK to switch to a vendorized version of `dandi-schema` and just address the hardcoded vendorizations. |
| 162 | + |
| 163 | +**Note:** We would still need `context.json` among those `dandi/schema` serializations, but it is not clear whether the others are used explicitly anywhere. We do expose the `dandiset.json` schema as `schema_url` in our "server info" at https://dandiarchive.org/server-info and https://api.dandiarchive.org/api/info/, but I do not think `schema_url` is actually used by anything ATM. |