Description
📰 Custom Issue
Hi! While evaluating a large number of files with multiple variables each, I noticed that ESMValTool is much slower when files contain a lot of variables. I could trace that back to Iris' load function. Here is an example of loading files with 1 and 61 variables:
```python
import iris

one_path = "data/one_cube.nc"  # file with 1 variable
multi_path = "data/multiple_cubes.nc"  # file with 61 variables
```
```python
%%timeit
iris.load(one_path)
# 13.2 ms ± 136 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
```python
%%timeit
iris.load(multi_path)
# 673 ms ± 984 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
```python
%%timeit
constraint = iris.Constraint("zonal stress from subgrid scale orographic drag")
iris.load(multi_path, constraint)
# 611 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
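For anyone reproducing this outside a notebook, the `%%timeit` cells above can be mimicked with the standard-library `timeit` module. This is just a sketch of the harness; the stand-in workload should be replaced with e.g. `lambda: iris.load(multi_path)` to get the numbers quoted:

```python
import timeit
from statistics import mean, stdev

def benchmark(func, repeats=7, number=1):
    """Time `func` like %%timeit: `repeats` runs of `number` loops each,
    returning (mean, std dev) of the per-loop time in seconds."""
    totals = timeit.repeat(func, repeat=repeats, number=number)
    per_loop = [t / number for t in totals]
    return mean(per_loop), stdev(per_loop)

# Stand-in workload for illustration; swap in the iris.load calls above.
m, s = benchmark(lambda: sum(range(10_000)), repeats=7, number=100)
print(f"{m * 1e3:.3f} ms ± {s * 1e3:.3f} ms per loop (mean ± std. dev. of 7 runs)")
```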
As you can see, loading the file with 61 variables takes ~51 times as long as loading the file with 1 variable. Using a constraint does not help.
Doing the same with xarray gives:
```python
import xarray as xr

one_path = "data/one_cube.nc"  # file with 1 variable
multi_path = "data/multiple_cubes.nc"  # file with 61 variables
```
```python
%%timeit
xr.open_dataset(one_path, chunks='auto')
# 7.75 ms ± 164 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
```python
%%timeit
xr.open_dataset(multi_path, chunks='auto')
# 54.6 ms ± 241 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
Here, the difference between 1 and 61 variables is only a factor of ~7.
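For reference, the slowdown factors quoted here follow directly from the timings above:

```python
# Per-call times from the benchmarks above, in milliseconds.
iris_one, iris_multi = 13.2, 673.0
xr_one, xr_multi = 7.75, 54.6

print(f"iris slowdown:   {iris_multi / iris_one:.0f}x")   # ~51x
print(f"xarray slowdown: {xr_multi / xr_one:.0f}x")       # ~7x
```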
If only a single file needs to be loaded, this is not a problem, but it quickly adds up to a lot of time if hundreds or even thousands of files need to be read (which can be the case for climate models that write one file with many variables per time step).
Have you ever encountered this problem? Are there any tricks to make loading faster? As mentioned, I tried using a constraint, but that didn't help.
Thanks for your help!
Sample data: