"Group By" to display multiple values as a hierarchy #434

Open
will-moore opened this issue Feb 14, 2025 · 7 comments

@will-moore
Contributor

I'd like to provide a way for users to browse their zarr images on a local machine, and to see the images in a hierarchy that looks like their existing folder structure.

Work in progress at ome/ome-zarr-py#436

I have added a 'Folder' column to the CSV that contains the directories each image was in, as a comma-delimited list, e.g. "path,to,my,data".
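As a minimal sketch (not the code from ome/ome-zarr-py#436), a Folder value like this can be derived from each image's path relative to the data root. The function name `folder_value` and its `root` parameter are hypothetical, chosen for illustration.

```python
from pathlib import PurePosixPath


def folder_value(image_path: str, root: str = "") -> str:
    """Return the directories containing an image as a comma-delimited string.

    Hypothetical helper: `root` is stripped first so that only the folders
    below the data root end up in the value.
    """
    path = PurePosixPath(image_path)
    if root:
        path = path.relative_to(root)
    # parent.parts drops the image itself and keeps only the enclosing folders
    return ",".join(path.parent.parts)


print(folder_value("path/to/my/data/image.zarr"))  # path,to,my,data
```

Folder names that themselves contain commas would need escaping or a different delimiter under this scheme.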

I was hoping there might be a way to use the Group By feature to browse the folders as a hierarchy?
This would actually be much more powerful than a regular folder hierarchy as you could pick and choose the folders in any order.

The current behaviour only gives me a single level of hierarchy: I can't group the images under data into sub-groups for the other folders, like idr, idr0062, 2023, etc.

[Screenshot: images grouped by the comma-delimited Folder column, showing a single level of hierarchy]

One other approach I tried was to use multiple table columns Folder1, Folder2, Folder3 which would be fine if all the images existed at the same depth in the hierarchy. But with images that are in the top Folder (Folder1) I need to create dummy "-" values for the other Folders:

All the images at the top here are in the data folder but to see them I need to expand 2 other "-" folders. This approach also restricts the order in which I can build my hierarchy.

[Screenshot: images grouped by Folder1, Folder2 and Folder3, with "-" dummy values]

I wonder if there's any other way I might be able to achieve this?

@SeanLeRoy
Contributor

> One other approach I tried was to use multiple table columns Folder1, Folder2, Folder3 which would be fine if all the images existed at the same depth in the hierarchy. But with images that are in the top Folder (Folder1) I need to create dummy "-" values for the other Folders:

This is a really good point, and potentially an example of a feature we have been considering adding: displaying those top-level folder elements alongside the nested folders. From my perspective it sounds like that would solve your request, except that you mention "This would actually be much more powerful than a regular folder hierarchy as you could pick and choose the folders in any order." I'm not sure I understand what you mean by picking and choosing the order.

> I wonder if there's any other way I might be able to achieve this?

I don't have a great suggestion for how to accomplish this at the moment beyond what you have already tried, which is to create those dummy values. We do have a feature underway that will let you open all folders at once, which would in part solve the issue of "All the images at the top here are in the data folder but to see them I need to expand 2 other "-" folders."

@will-moore
Contributor Author

Re: "choosing the order"...

In my first screenshot above, instead of opening the data group first (which includes images tagged data, idr and idr0062), I could open idr, then idr0062, then data.

Or, if I have some images in a directory path like:

data/HeLa/GFP-H2B/Monastrol/24hours/microtubule-stain/

and some others in

data/Mouse/GFP-H2B/microtubule-stain/

If I could "choose the order" to browse, I could Group By microtubule-stain, then GFP-H2B, then Mouse/HeLa, to give this hierarchy:

  • microtubule-stain
    • GFP-H2B
      • HeLa
      • Mouse

This works even though the images are at different depths in the hierarchy.
It wouldn't be possible with the Folder1, Folder2 approach, because microtubule-stain would be in Folder6 for some images and Folder4 for others.
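The "choose the order" idea can be sketched by treating each image's folders as an unordered set of tags. This is only an illustration under assumptions of our own: `build_tree`, the `Group` class and the rule that a level may match any one of several tags (e.g. HeLa or Mouse) are hypothetical, not how BFF implements Group By.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Group:
    children: dict = field(default_factory=dict)  # tag -> nested Group
    images: list = field(default_factory=list)    # images filed at this level


def build_tree(image_paths, levels):
    """Nest images under tags in the order given by `levels`.

    Each entry in `levels` is a list of candidate tags; an image is filed
    under the first candidate it carries. If it carries none, it stops at
    the current level, so its depth in the file system does not matter.
    """
    root = Group()
    for image_path in image_paths:
        path = PurePosixPath(image_path)
        tags = set(path.parent.parts)  # folder names; their position is ignored
        node = root
        for candidates in levels:
            match = next((t for t in candidates if t in tags), None)
            if match is None:
                break
            node = node.children.setdefault(match, Group())
        node.images.append(path.name)
    return root


tree = build_tree(
    [
        "data/HeLa/GFP-H2B/Monastrol/24hours/microtubule-stain/a.zarr",
        "data/Mouse/GFP-H2B/microtubule-stain/b.zarr",
    ],
    levels=[["microtubule-stain"], ["GFP-H2B"], ["HeLa", "Mouse"]],
)
stain = tree.children["microtubule-stain"].children["GFP-H2B"]
print(sorted(stain.children))  # ['HeLa', 'Mouse']
```

Because the tags are a set, microtubule-stain matches at level 1 for both images even though it is the 6th folder for one and the 4th for the other.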

@toloudis

toloudis commented Feb 17, 2025

We didn't want people to think of this as a file system browser, because it's not. We also didn't want to encourage "lazy" annotation naming. But in some cases, a folder hierarchy from the filesystem actually does capture interesting metadata that people would otherwise be entering as annotations. We've talked about the possibility of tagging special annotations so that their values are parsed as slash-delimited annotation values.
It definitely makes things powerful when you can drag the folders around to invert the hierarchy.
Some details:

  • "Folder N" or "Level N" seems a bit weird, but I can't think of a better way to auto-assign the annotation name.
  • Where the folder structure is uneven, did you try leaving empty values at the deeper levels instead of using "-"?
  • It could get weird when a user starts mixing other annotations into the displayed hierarchy.
  • A simple Python script could parse the filesystem and turn the folders into annotations ahead of time (when creating the CSV), which also gives you a chance to give them real names. Choosing these names is actually semantically important for datasets...
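A minimal sketch of such a script, assuming the images are `.zarr` directories under a single root and that the user supplies a name for each folder level. `write_level_csv` and the level names `Project`, `Study` and `Year` are placeholders chosen for illustration, not names from this thread.

```python
import csv
from pathlib import Path

# Hypothetical names for the first three directory levels below the root.
LEVEL_NAMES = ["Project", "Study", "Year"]


def write_level_csv(root, out_csv, level_names=LEVEL_NAMES):
    """Write one CSV row per *.zarr image, with a named column per folder level."""
    root = Path(root)
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["File Path", *level_names])
        writer.writeheader()
        for image in sorted(root.rglob("*.zarr")):
            folders = image.relative_to(root).parent.parts
            # zip() stops at the shorter sequence: folders deeper than
            # level_names are dropped, and levels an image doesn't reach
            # are filled by DictWriter with its default empty value.
            row = dict(zip(level_names, folders))
            row["File Path"] = str(image)
            writer.writerow(row)
```

Images that sit higher up the tree get empty values for the deeper levels rather than "-" placeholders.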

@will-moore
Contributor Author

I agree that BFF is definitely NOT a file system browser. For me, the file system is just the source of "Tags" on each image.
You don't really need fancy dragging to rearrange. You can already expand any of the Groups / Tags that an image is under, so you can choose which one is at the "top" level of the hierarchy; you just can't expand any deeper levels.

  • I would really prefer not to use the "Folder N" approach (my 2nd screenshot) because of the limitations I mentioned.
  • I tried leaving empty values, but then the groups just don't show up at all, so you can't see the images.
  • Yes, I can see it may get weird if you've got a column of Tags plus other columns, and you Group By Tags and other columns. I think it's possible to imagine what the UI would look like, but the coding would definitely be tricky!
  • The Python script I already have is in "View in biofile finder" (ome/ome-zarr-py#436). It walks the file system looking for zarrs, builds them into a CSV, serves the CSV (and the images) from a local server and opens BFF with the CSV, all in one command.
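For the "serves the CSV (and the images) from a local server" step, one possible standard-library approach is a static file server that adds a CORS header, on the assumption that BFF runs on a different origin and needs cross-origin access to the files. This is a hypothetical sketch, not the code in ome/ome-zarr-py#436, and it does not show how BFF is opened with the CSV; `CORSRequestHandler` and `make_server` are illustrative names.

```python
import functools
import http.server


class CORSRequestHandler(http.server.SimpleHTTPRequestHandler):
    """Serve files from a directory and allow cross-origin reads."""

    def end_headers(self):
        # Assumption: the browser app loading these files lives on another origin.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def make_server(directory, host="127.0.0.1", port=8000):
    """Create (but do not start) a threaded static file server for `directory`."""
    handler = functools.partial(CORSRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer((host, port), handler)
```

Calling `make_server("/path/to/data").serve_forever()` would then serve that folder until interrupted. Allowing every origin with `*` is convenient for a local-only server but too permissive for anything exposed on a network.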

@toloudis

Sean will know more about the current plans and can correct me if I'm saying anything wrong. My comment about things getting weird was just about mixing user-defined annotations with the filesystem-derived hierarchy annotations, which is also pretty powerful. I was imagining that under the hood the code treats them all the same way: as annotation names that have values (empty or not) for every file (row). So the implementation doesn't have to care what they mean.

Regarding empty values, I was imagining that this would do what you want:
A/B/C/filename0: Folder1=A, Folder2=B, Folder3=C
A/B/filename1: Folder1=A, Folder2=B, Folder3=[no value]
I think if you then put all the Folder columns in the Group By, you would reconstruct your filesystem hierarchy?
I probably should just try it too...

lynwilhelm added the UX label on Feb 17, 2025
@toloudis

I tried it with a tiny example CSV and got the same result you saw, Will.

test.csv

I wonder if we could add an option that lets grouped items without the subgroup annotations still appear. If you have grouped by Folder1/Folder2/Folder3 and a file has F1 and F3 but not F2, the file would still not show up. (This can't happen in a directory structure, but it can with arbitrary annotations.) But if the file has F1 and neither F2 nor F3, it could show up under the Folder1=F1 grouping?
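The proposed option can be modelled in a few lines by treating each row as a dict of annotation values. This is a sketch of the behaviour described above, not BFF code: `group_rows` and its `keep_partial` flag are hypothetical, and `gap/filename2` is an invented row for the F1-and-F3-but-not-F2 case. With `keep_partial=False` any missing value hides the file, as in the test.csv experiment; with `keep_partial=True` a file with only trailing gaps is listed under its deepest complete group, while a file with a gap in the middle stays hidden.

```python
from collections import defaultdict


def group_rows(rows, keys, keep_partial=False):
    """Map each group path (tuple of values) to the files listed under it."""
    groups = defaultdict(list)
    for row in rows:
        values = [row.get(key, "") for key in keys]
        # depth = number of leading grouped annotations that have a value
        depth = next((i for i, v in enumerate(values) if not v), len(values))
        if any(values[depth:]):
            continue  # gap, e.g. Folder1 and Folder3 set but not Folder2: hidden
        if depth < len(keys) and not keep_partial:
            continue  # strict grouping: any missing value hides the file
        # a row with no values at all lands under the empty tuple (ungrouped)
        groups[tuple(values[:depth])].append(row["File Path"])
    return dict(groups)


rows = [
    {"File Path": "A/B/C/filename0", "Folder1": "A", "Folder2": "B", "Folder3": "C"},
    {"File Path": "A/B/filename1", "Folder1": "A", "Folder2": "B", "Folder3": ""},
    {"File Path": "gap/filename2", "Folder1": "A", "Folder2": "", "Folder3": "C"},
]
keys = ["Folder1", "Folder2", "Folder3"]
print(group_rows(rows, keys))
# {('A', 'B', 'C'): ['A/B/C/filename0']}
print(group_rows(rows, keys, keep_partial=True))
# {('A', 'B', 'C'): ['A/B/C/filename0'], ('A', 'B'): ['A/B/filename1']}
```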

@will-moore
Contributor Author

@toloudis Thanks, yes, I think that would be a nice fix (just so you don't "lose" images that are missing a value for the Group).

However, for my usage I think I'll stick with the single column of comma-delimited "Tags" (as in my first screenshot above), since then you can find all the images that have the C tag (or folder), regardless of where it appears in the hierarchy.

E.g. csv:

File Path,Folders (or Tags)
image1.zarr,"A,B,C"
image2.zarr,"A,C"
image3.zarr,C
image4.zarr,"B,C"
image5.zarr,"B,A"
image6.zarr,B

Hierarchy:

  • A
  • B
  • C (expanded)
    • image1.zarr, "A,B,C"
    • image2.zarr, "A,C"
    • image3.zarr, C
    • image4.zarr, "B,C"
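The single-column approach can be checked against the CSV above with Python's `csv` module: each image is listed under every tag in its comma-delimited value, so the C group collects image1 to image4 whatever position C has in their tag lists. `group_by_tag` is a hypothetical helper for illustration, not part of BFF.

```python
import csv
import io
from collections import defaultdict

CSV_TEXT = '''File Path,Folders (or Tags)
image1.zarr,"A,B,C"
image2.zarr,"A,C"
image3.zarr,C
image4.zarr,"B,C"
image5.zarr,"B,A"
image6.zarr,B
'''


def group_by_tag(csv_text, column="Folders (or Tags)"):
    """Return {tag: [file paths]} for a column of comma-delimited tags."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        for tag in row[column].split(","):
            tag = tag.strip()
            if tag:  # skip empty entries, e.g. from a trailing comma
                groups[tag].append(row["File Path"])
    return dict(groups)


groups = group_by_tag(CSV_TEXT)
print(groups["C"])  # ['image1.zarr', 'image2.zarr', 'image3.zarr', 'image4.zarr']
```

Note that the quoted values must follow the comma directly: with a space before the opening quote, `csv` treats the quote as ordinary text and splits "A,B,C" into separate fields.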


5 participants