Ad-hoc testing of Kerchunk engine compatibility with xCDAT #61

Open
tomvothecoder wants to merge 25 commits into main from kerchunk

Conversation

@tomvothecoder (Collaborator) commented Jan 12, 2026

Related to xCDAT/xcdat#812

Perform ad-hoc manual testing to confirm that xCDAT works seamlessly with Kerchunk without modifying any code or adding a formal test suite. The goal is to confirm functional parity between Kerchunk-backed datasets and traditional NetCDF I/O through exploratory testing and to document any issues found.

Key areas to check manually:

  1. Basic Open Behavior

    • Can open single-file and multi-file Kerchunk JSONs (see the first sketch after this list).
  2. Performance and Stability

    • Compare performance (random sample of n=40 datasets)
      • I/O aggregate metrics (median and mean)
      • I/O per-dataset metrics -- in progress
      • Investigate specific 3hr datasets where Kerchunk is slower
      • Investigate single-file-only datasets -- done
      • Add a test for .load() behavior on a subset in a new notebook -- in progress
    • Lazy loading works as expected (no data read on open).
    • No Dask graph errors or performance regressions.
  3. Metadata and CF Handling

    • Dataset contents match the same data opened via NetCDF.
    • CF axes (time, lat, lon, lev) are detected correctly.
    • Time decoding, bounds variables, and attributes are preserved.
  4. xCDAT Functionalities -- identical results? (see the second sketch after this list)

    • Temporal
    • Spatial
    • Horizontal
    • Vertical regridding
  5. xCDAT Functionalities -- performance differences and .load() behavior? (also covered by the second sketch)

    • Temporal
    • Spatial
    • Horizontal
    • Vertical regridding
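
A minimal sketch of the open-behavior and metadata checks (items 1 through 3), assuming a combined Kerchunk reference JSON and the matching NetCDF files. The paths and the `tas` variable name are placeholders, and the fsspec "reference" filesystem is used to expose the JSON as a Zarr-like store:

```python
# Sketch only: paths, variable name ("tas"), and chunking choices are assumptions.
import fsspec
import xarray as xr
import xcdat as xc

KERCHUNK_JSON = "refs/tas_Amon_combined.json"  # hypothetical combined reference
NETCDF_GLOB = "data/tas_Amon_*.nc"             # hypothetical source files

# Kerchunk-backed open: expose the reference JSON as a Zarr-like mapping.
fs = fsspec.filesystem("reference", fo=KERCHUNK_JSON)
ds_kerchunk = xc.open_dataset(
    fs.get_mapper(""),
    engine="zarr",
    backend_kwargs={"consolidated": False},
    chunks={},
)

# Baseline: traditional NetCDF I/O through xCDAT.
ds_netcdf = xc.open_mfdataset(NETCDF_GLOB, chunks={})

# Lazy loading: the variable should still be Dask-backed right after open.
assert ds_kerchunk["tas"].chunks is not None

# CF axis detection: the same coordinate should resolve for each axis.
for axis in ("T", "X", "Y"):
    assert xc.get_dim_coords(ds_kerchunk, axis).name == xc.get_dim_coords(ds_netcdf, axis).name

# Content parity: time decoding, bounds, and attributes preserved.
xr.testing.assert_identical(ds_kerchunk["tas"], ds_netcdf["tas"])
```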
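A second sketch for the functionality and timing checks (items 4 and 5), reusing `ds_kerchunk` and `ds_netcdf` from the sketch above. The operations shown (temporal group average, spatial average) stand in for the full list, and the subset size is arbitrary:

```python
# Sketch only: reuses ds_kerchunk / ds_netcdf from the previous snippet.
import time

import xarray as xr


def timed(label, func):
    """Run func(), print elapsed wall-clock seconds, and return the result."""
    start = time.perf_counter()
    result = func()
    print(f"{label}: {time.perf_counter() - start:.2f} s")
    return result


# Temporal averaging: results should be numerically identical.
t_kc = timed("temporal (kerchunk)", lambda: ds_kerchunk.temporal.group_average("tas", freq="year")["tas"].load())
t_nc = timed("temporal (netcdf)", lambda: ds_netcdf.temporal.group_average("tas", freq="year")["tas"].load())
xr.testing.assert_allclose(t_kc, t_nc)

# Spatial averaging over X/Y: same check.
s_kc = timed("spatial (kerchunk)", lambda: ds_kerchunk.spatial.average("tas", axis=["X", "Y"])["tas"].load())
s_nc = timed("spatial (netcdf)", lambda: ds_netcdf.spatial.average("tas", axis=["X", "Y"])["tas"].load())
xr.testing.assert_allclose(s_kc, s_nc)

# .load() on a subset: compare eager read performance through both backends.
timed("load subset (kerchunk)", lambda: ds_kerchunk["tas"].isel(time=slice(0, 12)).load())
timed("load subset (netcdf)", lambda: ds_netcdf["tas"].isel(time=slice(0, 12)).load())
```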

@tomvothecoder self-assigned this Jan 12, 2026
@tomvothecoder (Collaborator, Author) commented Jan 12, 2026

Notes from 01/12/26 meeting

Debug

  • Loop over CFsubhr datasets and pin down individual timings; capture outliers or issues (extends the first task into the script extensions below)

Script Extensions

  • Use pandas to store timings (see the sketch after this list)
  • Capture timing for each JSON/NetCDF pairing
  • Add the number of timesteps for each pairing
  • Add the number of netCDF files for each pairing
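
A sketch of what such a timing script could look like, assuming a list of (reference JSON, NetCDF glob) pairings. The paths, the `time` dimension name, and the CSV output name are placeholders:

```python
# Sketch only: pairings, paths, and the "time" dimension name are assumptions.
import glob
import time

import fsspec
import pandas as pd
import xcdat as xc

PAIRINGS = [
    ("refs/tas_Amon_combined.json", "data/tas_Amon_*.nc"),
    ("refs/pr_3hr_combined.json", "data/pr_3hr_*.nc"),
]

rows = []
for json_path, nc_glob in PAIRINGS:
    nc_files = sorted(glob.glob(nc_glob))

    # Time the traditional multi-file NetCDF open.
    start = time.perf_counter()
    ds_nc = xc.open_mfdataset(nc_files, chunks={})
    netcdf_open_s = time.perf_counter() - start

    # Time the Kerchunk-backed open through the fsspec reference filesystem.
    start = time.perf_counter()
    fs = fsspec.filesystem("reference", fo=json_path)
    ds_kc = xc.open_dataset(
        fs.get_mapper(""),
        engine="zarr",
        backend_kwargs={"consolidated": False},
        chunks={},
    )
    kerchunk_open_s = time.perf_counter() - start

    rows.append(
        {
            "json": json_path,
            "n_netcdf_files": len(nc_files),
            "n_timesteps": ds_nc.sizes.get("time"),
            "netcdf_open_s": netcdf_open_s,
            "kerchunk_open_s": kerchunk_open_s,
        }
    )

df = pd.DataFrame(rows)
df.to_csv("kerchunk_open_timings.csv", index=False)
print(df.sort_values("kerchunk_open_s", ascending=False))
```

Sorting the resulting table by the Kerchunk timings should make outliers (e.g., the slower 3hr or CFsubhr cases) easy to pin down.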

Notes

  • Steve suggests that frequency might not be the biggest factor in the speed difference; other factors or issues may be at play. Capturing more granular information will allow us to extract more insight as needed.
