Loading a netCDF file with multiple variables is very slow #6223

Closed as not planned
@schlunma

📰 Custom Issue

Hi! While evaluating a large number of files that each contain multiple variables, I noticed that ESMValTool is much slower when files contain many variables. I traced this back to Iris' load function. Here is an example of loading files with 1 and 61 variables:

import iris

one_path = "data/one_cube.nc"  # file with 1 variable
multi_path = "data/multiple_cubes.nc"  # file with 61 variables

%%timeit
iris.load(one_path)  # 13.2 ms ± 136 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
iris.load(multi_path)  # 673 ms ± 984 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
constraint = iris.Constraint("zonal stress from subgrid scale orographic drag")
iris.load(multi_path, constraint)  # 611 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

As you can see, loading the file with 61 variables takes ~51 times as long as loading the file with 1 variable. Using a constraint barely helps (611 ms instead of 673 ms).
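The timings above use the IPython `%%timeit` cell magic, which only works in a notebook or IPython session. A minimal sketch of an equivalent measurement in a plain Python script, using only the standard library, could look like the following. The helper name `time_loader` and the stand-in `dummy_loader` are hypothetical and not part of Iris or xarray; to reproduce the numbers above you would pass `iris.load` (or a lambda wrapping `xr.open_dataset`) together with the real file paths.

```python
import statistics
import timeit


def time_loader(loader, path, repeat=7, number=1):
    """Return (mean, std. dev.) in seconds per call of loader(path).

    `loader` is any callable taking a path, e.g. iris.load (assumption:
    the caller supplies it; nothing here imports Iris or xarray).
    """
    # timeit.repeat returns one total time per run; dividing by `number`
    # gives the per-call time for that run.
    totals = timeit.repeat(lambda: loader(path), repeat=repeat, number=number)
    per_call = [total / number for total in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)


# Hypothetical stand-in loader so the sketch runs without the sample files.
def dummy_loader(path):
    return [path]


mean_s, std_s = time_loader(dummy_loader, "data/one_cube.nc")
print(f"{mean_s * 1e3:.4f} ms ± {std_s * 1e3:.4f} ms per call")
```

Passing the loader as a callable keeps the harness identical for both libraries, so any difference in the results comes from the load call itself rather than from the measurement code.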

Doing the same with xarray gives:

import xarray as xr

one_path = "data/one_cube.nc"  # file with 1 variable
multi_path = "data/multiple_cubes.nc"  # file with 61 variables

%%timeit
xr.open_dataset(one_path, chunks='auto')  # 7.75 ms ± 164 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
xr.open_dataset(multi_path, chunks='auto')  # 54.6 ms ± 241 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Here, the difference between 1 and 61 variables is only a factor of ~7.

If only a single file needs to be loaded, this is not a problem, but it quickly adds up to a lot of time if hundreds or even thousands of files need to be read (which can be the case for climate models that write one file with many variables per time step).
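To make the scale concrete, here is a rough back-of-the-envelope sketch based on the per-file timings reported above. It assumes that total load time grows linearly with the number of files and that nothing is cached between loads; both are simplifying assumptions, and the function names are purely illustrative.

```python
# Per-file load times reported above, in seconds.
IRIS_ONE_S = 13.2e-3     # iris.load, file with 1 variable
IRIS_MULTI_S = 673e-3    # iris.load, file with 61 variables
XR_ONE_S = 7.75e-3       # xr.open_dataset, file with 1 variable
XR_MULTI_S = 54.6e-3     # xr.open_dataset, file with 61 variables


def slowdown(multi_s, single_s):
    """Ratio of multi-variable to single-variable load time."""
    return multi_s / single_s


def projected_total_s(per_file_s, n_files):
    """Projected total load time, assuming linear scaling with file count."""
    return per_file_s * n_files


print(f"Iris slowdown:   ~{slowdown(IRIS_MULTI_S, IRIS_ONE_S):.0f}x")
print(f"xarray slowdown: ~{slowdown(XR_MULTI_S, XR_ONE_S):.0f}x")

for n_files in (100, 1000):
    iris_total = projected_total_s(IRIS_MULTI_S, n_files)
    xr_total = projected_total_s(XR_MULTI_S, n_files)
    print(f"{n_files} files: Iris {iris_total:.1f} s, xarray {xr_total:.1f} s")
```

Under these assumptions, 1000 files with 61 variables each would take roughly 11 minutes to load with Iris, compared with just under a minute with xarray.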

Have you ever encountered this problem? Are there any tricks to make loading faster? As mentioned, I tried with a constraint, but that didn't work.

Thanks for your help!

Sample data:
