Member

@olivialynn olivialynn commented Dec 1, 2025

Closes (with other PR) astronomy-commons/hats-import#628

This pair of PRs adds a create_metadata flag to hats-import, making creation of the `/dataset/_metadata` parquet file optional (it is still created by default).

This PR adds a create_metadata flag to write_parquet_metadata specifically, which is the method called anytime we wish to create the _metadata file from hats-import.
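
A minimal sketch of the opt-out behaviour described above, assuming only what the description states: the file lives at `dataset/_metadata` and is created unless the flag is turned off. The function `write_metadata_files` and the placeholder file contents are hypothetical stand-ins for illustration, not the actual `write_parquet_metadata` implementation or its signature.

```python
import tempfile
from pathlib import Path


def write_metadata_files(catalog_dir, create_metadata=True):
    """Hypothetical stand-in for illustration; not hats's write_parquet_metadata."""
    dataset_dir = Path(catalog_dir) / "dataset"
    dataset_dir.mkdir(parents=True, exist_ok=True)
    if not create_metadata:
        # Caller opted out: skip the dataset-level _metadata file entirely.
        return None
    metadata_path = dataset_dir / "_metadata"
    # Placeholder bytes; the real file is parquet metadata.
    metadata_path.write_bytes(b"placeholder")
    return metadata_path


with tempfile.TemporaryDirectory() as tmp:
    # Default call: the file is created.
    default_path = write_metadata_files(Path(tmp) / "catalog_a")
    default_written = default_path is not None and default_path.exists()

    # Opt-out call: nothing is written under dataset/.
    skipped = write_metadata_files(Path(tmp) / "catalog_b", create_metadata=False)
    skipped_written = (Path(tmp) / "catalog_b" / "dataset" / "_metadata").exists()
```

Defaulting the flag to `True` keeps existing callers unchanged, which matches the "created by default" wording in the description.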


codecov bot commented Dec 1, 2025

Codecov Report

❌ Patch coverage is 56.52174% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.38%. Comparing base (b62d29c) to head (9078104).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/hats/catalog/partition_info.py | 18.18% | 9 Missing ⚠️ |
| src/hats/io/paths.py | 50.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #605      +/-   ##
==========================================
- Coverage   92.92%   92.38%   -0.54%     
==========================================
  Files          49       49              
  Lines        2261     2286      +25     
==========================================
+ Hits         2101     2112      +11     
- Misses        160      174      +14     



github-actions bot commented Dec 1, 2025

| Before [fbd235e] <v0.7.3> | After [0e5e408] | Ratio | Benchmark (Parameter) |
|---|---|---|---|
| 106±0.5ms | 107±0.5ms | 1.01 | benchmarks.time_test_alignment_even_sky |
| 1.04±0.03ms | 1.05±0.02ms | 1.01 | benchmarks.time_test_cone_filter_multiple_order |
| 358±2ms | 356±2ms | 0.99 | benchmarks.Suite.time_outer_pixel_alignment |
| 36.2±0.7ms | 35.8±0.3ms | 0.99 | benchmarks.Suite.time_pixel_tree_creation |
| 12.7±0.6ms | 12.4±0.3ms | 0.98 | benchmarks.Suite.time_inner_pixel_alignment |
| 230±2ms | 226±1ms | 0.98 | benchmarks.time_open_large_catalog |
| 36.6±0.6ms | 35.9±0.3ms | 0.98 | benchmarks.time_open_midsize_catalog |
| 228±2ms | 225±0.8ms | 0.98 | benchmarks.time_small_cone_large_catalog |

Contributor

@delucchi-cmu delucchi-cmu left a comment

For more information on the general parquet metadata files, and why we write them, see
Creates files (relative to the catalog root):
data_thumbnail.parquet (only if create_thumbnail=True)
Contributor

This is written to dataset/data_thumbnail.parquet.

Member Author

Looking at this diagram in the IVOA note, it seems to be at the root level - have we changed the behavior since?

"""Write Parquet dataset-level metadata files (and optional thumbnail) for a catalog.
For more information on the general parquet metadata files, and why we write them, see
Creates files (relative to the catalog root):
Contributor

This table isn't rendered well by sphinx, and the connection to the below paragraphs is awkward. How about:

* thumbnail
  * only if create_thumbnail=True
  * <purpose and contents>
* _common_metadata
  * always written
  * <purpose and contents>
* etc...

Member Author

Updated based on our conversation earlier today, see rendered docs here: https://hats--605.org.readthedocs.build/en/605/autoapi/hats/io/parquet_metadata/index.html

I mostly copied the descriptions from the IVOA note, and ofc totally open to revisions to writing/formatting/etc
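
For illustration, the bullet layout suggested in this thread could look like the sketch below. It is one possible arrangement, not necessarily what was merged: the function name `write_metadata_sketch` is hypothetical, and the `<purpose and contents>` placeholders are kept as the reviewer wrote them. The blank lines between each parent item and its nested list are deliberate, since reStructuredText needs them to parse nested bullets.

```python
def write_metadata_sketch(create_thumbnail=False, create_metadata=True):
    """Write Parquet dataset-level metadata files (and optional thumbnail) for a catalog.

    Files that may be created:

    * ``data_thumbnail.parquet``

      * only if ``create_thumbnail=True``
      * <purpose and contents>

    * ``_common_metadata``

      * always written
      * <purpose and contents>

    * ``_metadata``

      * only if ``create_metadata=True``
      * <purpose and contents>
    """
```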

The `data_thumbnail.parquet` file contains one row from each data partition,
up to a maximum of `thumbnail_threshold` rows.
References
Contributor

Sphinx doesn't know about References. Maybe Notes? It's also not happy about the bullets.
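
The thumbnail rule quoted in the docstring excerpt above (one row from each data partition, up to `thumbnail_threshold` rows) can be sketched roughly as follows. This assumes partitions are plain in-memory lists of rows and that "one row" means the first row; both are assumptions for illustration, and `build_thumbnail` is a hypothetical helper, not part of hats.

```python
from itertools import islice


def build_thumbnail(partitions, thumbnail_threshold):
    """Take the first row of each non-empty partition, stopping at the cap."""
    # Lazily yield one row per partition, skipping empty partitions.
    first_rows = (rows[0] for rows in partitions if rows)
    # islice stops once thumbnail_threshold rows have been collected.
    return list(islice(first_rows, thumbnail_threshold))


partitions = [
    [{"id": 1}, {"id": 2}],
    [{"id": 3}],
    [],
    [{"id": 4}],
]

capped = build_thumbnail(partitions, 2)      # [{"id": 1}, {"id": 3}]
uncapped = build_thumbnail(partitions, 10)   # [{"id": 1}, {"id": 3}, {"id": 4}]
```

With more partitions than the threshold, the cap decides the row count; otherwise the thumbnail has one row per non-empty partition.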
