assert ranks for MIxS slots (in the context of classes) #901

turbomam · 2025-02-19T15:06:01Z

I generated some support files, mainly in https://github.com/microbiomedata/external-metadata-awareness/tree/main/notebooks

They probably won't stay there indefinitely.

mixs_slot_rank_template.tsv assumes that ranks will be asserted in slot_usage. @jfy133 has also proposed a mechanism in which the millions, thousands, etc. places of a rank carry meaning. Presumably ranks like that would make more sense asserted on slots globally, not in slot_usage.
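To make the two options concrete, here is a minimal sketch of one possible place-value scheme. It is not necessarily what @jfy133 proposed: the section width of 1000, the helper names, and the example rank values are all hypothetical. It also shows the usual LinkML-style resolution where a class's slot_usage rank, if present, overrides the slot's global rank.

```python
# Hypothetical place-value rank scheme: the thousands place identifies a section
# of related slots and the lower places give the position within that section.
# SECTION_WIDTH is an assumption chosen for illustration.
SECTION_WIDTH = 1000


def encode_rank(section: int, position: int) -> int:
    """Combine a section number and a within-section position into one rank."""
    if not 0 <= position < SECTION_WIDTH:
        raise ValueError(f"position must be in [0, {SECTION_WIDTH})")
    return section * SECTION_WIDTH + position


def decode_rank(rank: int) -> tuple[int, int]:
    """Split a rank back into (section, position)."""
    return divmod(rank, SECTION_WIDTH)


def effective_rank(slot_name: str, global_ranks: dict, slot_usage: dict):
    """Rank of a slot within a class: a slot_usage rank wins over the global one."""
    usage = slot_usage.get(slot_name, {})
    return usage.get("rank", global_ranks.get(slot_name))


# Hypothetical global ranks: section 2 holds location slots, section 5 environment slots.
global_ranks = {
    "lat_lon": encode_rank(2, 1),
    "geo_loc_name": encode_rank(2, 2),
    "env_medium": encode_rank(5, 3),
}
# Hypothetical class-level override that moves env_medium ahead of the other slots.
slot_usage = {"env_medium": {"rank": encode_rank(1, 0)}}

print(decode_rank(global_ranks["geo_loc_name"]))  # (2, 2)
print(effective_rank("env_medium", global_ranks, slot_usage))  # 1000
print(effective_rank("lat_lon", global_ranks, slot_usage))  # 2001
```

With global place-value ranks, a class would only need slot_usage entries for the few slots it wants to reorder; everything else inherits its section and position.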

Or should we just start from the slot order that NCBI or EBI use in their templates? See https://www.ncbi.nlm.nih.gov/biosample/docs/packages/?format=xml


turbomam · 2025-02-19T15:07:07Z

rank is a metaslot whose implementation is the responsibility of the client application (like DH or LinkML documentation pages)

grouping of "like" slots

single axis of similarity?

other supporting LinkML features:

subsets
slot_groups (used by DH!)
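Because rank only suggests an order, each client has to decide how to handle ties and unranked slots. The sketch below shows one hypothetical policy (ranked slots first by rank, then unranked slots alphabetically); the function name and example slots are illustrative, not taken from DH or the LinkML doc generator.

```python
# Hypothetical client-side ordering: rank is only a hint, so the client decides
# what to do with ties and with slots that have no rank at all.
def order_slots(slots: dict[str, dict]) -> list[str]:
    """Return slot names sorted by rank; unranked slots go last, alphabetically."""
    def sort_key(name: str):
        rank = slots[name].get("rank")
        # The leading 0/1 puts ranked slots first and avoids comparing None with int.
        return (0, rank, name) if rank is not None else (1, 0, name)

    return sorted(slots, key=sort_key)


slots = {
    "samp_name": {"rank": 1},
    "analysis_type": {"rank": 3},
    "collection_date": {},
    "env_medium": {"rank": 2},
    "depth": {},
}
print(order_slots(slots))
# ['samp_name', 'env_medium', 'analysis_type', 'collection_date', 'depth']
```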

turbomam · 2025-02-19T15:11:08Z

The NMDC submission-schema uses ranks and slot_groups

https://github.com/microbiomedata/submission-schema/blob/main/src/nmdc_submission_schema/schema/nmdc_submission_schema.yaml

  analysis_type:
    name: analysis_type
    description: Select all the data types associated or available for this biosample
    title: analysis/data type
    examples:
    - value: metagenomics; metabolomics; metaproteomics
    from_schema: https://example.com/nmdc_submission_schema
    see_also:
    - MIxS:investigation_type
    rank: 3
    domain_of:
    - Biosample
    slot_group: Sample ID
    range: AnalysisTypeEnum
    recommended: true
    multivalued: true
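When rank and slot_group are combined as in the analysis_type example, a client can group slots into sections and sort within each section. This is a minimal sketch assuming every slot has a rank and that sections are ordered by the lowest rank they contain; that ordering rule is an assumption, not documented DH behaviour. Apart from analysis_type, the slots, ranks and the "Environment" group are hypothetical.

```python
from collections import defaultdict


def group_slots(slots: dict[str, dict]) -> list[tuple[str, list[str]]]:
    """Group ranked slots by slot_group, ordering groups by their lowest member rank."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, spec in slots.items():
        groups[spec.get("slot_group", "Other")].append(name)
    # Within a group, order members by rank.
    for members in groups.values():
        members.sort(key=lambda n: slots[n]["rank"])
    # After sorting, each group's first member holds its lowest rank.
    return sorted(groups.items(), key=lambda item: slots[item[1][0]]["rank"])


slots = {
    "analysis_type": {"rank": 3, "slot_group": "Sample ID"},
    "samp_name": {"rank": 1, "slot_group": "Sample ID"},
    "env_medium": {"rank": 12, "slot_group": "Environment"},
    "env_broad_scale": {"rank": 10, "slot_group": "Environment"},
}
print(group_slots(slots))
# [('Sample ID', ['samp_name', 'analysis_type']), ('Environment', ['env_broad_scale', 'env_medium'])]
```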

Woolly-at-EBI · 2025-02-25T16:32:35Z

Below is a snippet from ena_checklists_simplified.json showing its structure; the entire file is attached.
(This is data extracted from an internal ENA XML file that contains all the checklists, field groups, fields, values, etc.) Publicly, the individual XML files can be browsed at
https://www.ebi.ac.uk/ena/browser/checklists and downloaded individually, e.g.:
curl -s 'https://www.ebi.ac.uk/ena/browser/api/xml/ERC000012?download=true'

In ena_checklists_simplified.json the structure is:
"checklists"/{checklist_id}/"description"
"checklists"/{checklist_id}/"name"
"checklists"/{checklist_id}/"checklists_source" # I made this one up to make it easy for people to parse out just the "GSC MIxS" checklists.
"checklists"/{checklist_id}/"ordered_field" # a list of the field_names ordered as they appear in our checklists (to my knowledge!), which was the original driver of this mini-task
"checklists"/{checklist_id}/"field"/{field_name}/"requirement" # mandatory|recommended|optional
"checklists"/{checklist_id}/"field"/{field_name}/"field_group"
"field_group" # list of all field groups, each used in at least one checklist. I talked about them on 25th Feb. but only shared a subset during the meeting; this is all of them. Some are a little artificial: e.g., for now-obsolete technical reasons we could only have about 100 field_names per field_group, so some groups had to be split into subsets. Cleaning this up is on my to-do list for later this year.
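Given that structure, turning "ordered_field" into per-checklist ranks (i.e., ranks in a slot_usage context) is a short loop. This sketch uses a trimmed copy of the ERC000038 entry from the snippet; the function name is hypothetical, and filtering by "checklists_source" relies on the made-up field described above.

```python
import json


def ranks_by_checklist(data: dict, source: str) -> dict[str, dict[str, int]]:
    """Map checklist_id -> {field_name: rank} for checklists from one source.

    The rank is the 1-based position of the field in the checklist's "ordered_field".
    """
    result = {}
    for checklist_id, checklist in data["checklists"].items():
        if checklist.get("checklists_source") != source:
            continue
        result[checklist_id] = {
            field_name: position
            for position, field_name in enumerate(checklist["ordered_field"], start=1)
        }
    return result


# Trimmed copy of the ERC000038 entry shown in the snippet (other fields omitted).
sample = json.loads("""
{"checklists": {"ERC000038": {
  "name": "ENA Shellfish Checklist",
  "checklists_source": "ENA",
  "field": {
    "Latitude Start": {"requirement": "mandatory", "field_group": "Marine Event"},
    "Longitude Start": {"requirement": "mandatory", "field_group": "Marine Event"},
    "Protocol Label": {"requirement": "mandatory", "field_group": "Marine Sample"}
  },
  "ordered_field": ["Latitude Start", "Longitude Start", "Protocol Label"]
}}}
""")

print(ranks_by_checklist(sample, "ENA"))
# {'ERC000038': {'Latitude Start': 1, 'Longitude Start': 2, 'Protocol Label': 3}}
```

Against the attached file, the same call would read the data with `json.load` on an open file handle and pass `"GSC MIxS"` as the source.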

Below is a snippet of the JSON file showing some real data, along with all the field_group names (across all the checklists):
{
  "checklists": {
    "ERC000038": {
      "description": "Shellfish contextual information associated with molecular data. The checklist has been developed in collaboration with EMBRIC Project partners.",
      "name": "ENA Shellfish Checklist",
      "checklists_source": "ENA",
      "field": {
        "Latitude Start": {
          "requirement": "mandatory",
          "field_group": "Marine Event"
        },
        "Longitude Start": {
          "requirement": "mandatory",
          "field_group": "Marine Event"
        },
        "Protocol Label": {
          "requirement": "mandatory",
          "field_group": "Marine Sample"
        },
        ...
      "ordered_field": [
        "Latitude Start",
        "Longitude Start",
        "Protocol Label",
        ....
  },
  "field_group": [
    "Associated host information",
    "Collection event information",
    "Environmental information",
    "General collection event information",
    "Host association",
    "Host inoculation",
    "Human surveillance data",
    "Infraspecies information",
    "Marine Environment",
    "Marine Event",
    "Marine Sample",
    "Marine Sampling",
    "Organism characteristics",
    "Organism characteristics: aquatic specific",
    "Organism characteristics: ecosystem",
    "Organism characteristics: genetic",
    "Part and developmental stage of organism",
    "Pathogen testing",
    "Pointer to physical material",
    "Reference",
    "Serology detection",
    "Virus isolate information",
    "bioreactor",
    "building related",
    "concentration measurement",
    "default",
    "demography",
    "experimental factor and block",
    "food and agriculture",
    "food and agriculture: farm",
    "growth medium",
    "host description",
    "host details",
    "host disorder",
    "internal environment",
    "investigation and results",
    "investigation experiment design",
    "link",
    "local environment conditions",
    "local environment conditions imposed",
    "local environment conditions: soil",
    "local environment history",
    "non-sample terms",
    "non-sample terms: study or project",
    "other",
    "sample collection",
    "sample collection: core sample properties",
    "sample collection: methods, storage and transport",
    "sample collection: site related",
    "sample processing",
    "treatment",
    "unusual properties"
  ]
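To compare the ENA ordering with a rank template, one option is to flatten each checklist into one row per field, keeping field_group (roughly analogous to a LinkML slot_group) and requirement alongside the rank. This is a hypothetical sketch: the column layout is illustrative and is not the layout of mixs_slot_rank_template.tsv.

```python
import csv
import io

# Hypothetical column layout for a flattened rank table.
FIELDNAMES = ["checklist_id", "field_name", "field_group", "requirement", "rank"]


def checklist_rows(data: dict, checklist_id: str) -> list[dict]:
    """Flatten one checklist into rows, one per field, in "ordered_field" order."""
    checklist = data["checklists"][checklist_id]
    rows = []
    for rank, field_name in enumerate(checklist["ordered_field"], start=1):
        field = checklist["field"][field_name]
        rows.append({
            "checklist_id": checklist_id,
            "field_name": field_name,
            "field_group": field["field_group"],
            "requirement": field["requirement"],
            "rank": rank,
        })
    return rows


def to_tsv(rows: list[dict]) -> str:
    """Serialise rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Trimmed copy of the ERC000038 entry from the snippet above.
sample = {
    "checklists": {
        "ERC000038": {
            "field": {
                "Latitude Start": {"requirement": "mandatory", "field_group": "Marine Event"},
                "Longitude Start": {"requirement": "mandatory", "field_group": "Marine Event"},
                "Protocol Label": {"requirement": "mandatory", "field_group": "Marine Sample"},
            },
            "ordered_field": ["Latitude Start", "Longitude Start", "Protocol Label"],
        }
    }
}

print(to_tsv(checklist_rows(sample, "ERC000038")))
```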

ena_checklists_simplified.json
