Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Omero metadata optional #297

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

will-moore
Copy link
Member

NB: This shouldn't be merged into 0.5, and should wait until #276 is resolved.

Fixes #192.

The omero metadata is useful for storing channels names (labels), colors and rendering settings, but currently ALL fields are required which means it is hard to use if any of these values are unknown.

This PR proposes that all the omero.channels values are optional.
It also adds more details for "label", "active" and "inverted" as well as notes on the "window" values.

cc @giovp @dstansby

Copy link
Contributor

github-actions bot commented Feb 7, 2025

Automated Review URLs

@d-v-b
Copy link
Contributor

d-v-b commented Feb 7, 2025

-1 on optional fields in metadata, this is a headache for validation. can't we use a fixed set of fields, and allow the optional ones to take the value null?

@will-moore
Copy link
Member Author

Elsewhere we have optional fields, such as multiscale: name, type, coordinateTransform, metadata. And the spec doesn't say "These values MUST be set but can be null".

So is this already making validation hard?

Looking at
https://github.com/ome-zarr-models/ome-zarr-models-py/blob/b9058a772a720bc55896ec5d5f126a6ff043ffdc/src/ome_zarr_models/common/multiscales.py#L247
I'm not sure I understand the the difference between | None = None and simply = None?
e.g.

    name: JsonValue | None = None
    type: JsonValue = None

Would this be different (simpler) if the spec said "Multiscale MUST be set but can be null"?

@d-v-b
Copy link
Contributor

d-v-b commented Feb 7, 2025

Elsewhere we have optional fields, such as multiscale: name, type, coordinateTransform, metadata. And the spec doesn't say "These values MUST be set but can be null".

So is this already making validation hard?

Yes, those are exactly the fields that caused headaches on the validation side. Python objects like to have a static set of properties, so to model metadata with dynamic properties with a python object means you have to "fill in" the missing fields with None when you create your python object, but then you cant distinguish "this field was unset" from "this field was set to None". We could introduce a new special value to indicate unset fields, but it's rather pointless work when we control the metadata and could just use a static set of fields.

@d-v-b
Copy link
Contributor

d-v-b commented Feb 7, 2025

Would this be different (simpler) if the spec said "Multiscale MUST be set but can be null"?

Yes, and IMO the simplest thing would be to use a table like this for every single field in the spec. It would remove a lot of ambiguity.

field name required type
"name" True string or null

@will-moore
Copy link
Member Author

Is it important to distinguish between setting a value to None and not setting it?

The spec doesn't distinguish between these anywhere for optional properties (unless I've missed some)?

For the name example above, why would a user want to use name = None?
Does ome-zarr-model-py distinguish between name: null and no name set when reading?

As discussed elsewhere, if name is not set, it shouldn't write name: null.

Making name and other optional fields required (but allowed to be null)would be a breaking change, so that's not an easy fix. As a user, I wouldn't be too happy if I've got to fill in every "optional" field withnull`.

@d-v-b
Copy link
Contributor

d-v-b commented Feb 10, 2025

Is it important to distinguish between setting a value to None and not setting it?

From a python POV, a class where an attribute can be None or a string is easy to model:

class Foo:
  required_field: int
  name: str | None

but directly modelling a class where an attribute might not exist is harder: you need two classes:

class Foo:
  required_field: int
  ...

class FooWithName(Foo):
  name: str

This latter thing is cumbersome. So at least in python, it's easier to represent an optional value as T | None, where T is the type the value takes.

I'm a little confused by the discussion about the name field, since that's not part of this PR. I'm voicing a preference that the fields of the OMERO metadata (color, label, active, etc) be required (and nullable), instead of optional. If that's impossible for some reason then you can ignore this opinion.

@d-v-b
Copy link
Contributor

d-v-b commented Feb 10, 2025

As a user, I wouldn't be too happy if I've got to fill in every "optional" field withnull`.

This is what default values in a class / function definition are for.

@dataclass
class Foo:
  required_field: int
  name: str | None = None

Foo(required_field=2)
Foo.name == None
# True

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

relax omero metadata
2 participants