views of variables don't use the DiskArrays interface #274
Comments
Just to demonstrate one possible fix: removing these view methods fixes the problem completely.
The very same code as above gives
I'll just add an example with NCDatasets only, might be easier to see what is going on:

    using NCDatasets
    NCDataset("test_file.nc","c") do ds
        defVar(ds,"temp",rand(100,100),("lon","lat");
            chunksizes = [10,10]
        )
        @time variable(ds, "temp") .+= 1
    end;

Before deleting the view methods:
@Alexander-Barth, would you like to share your thoughts here?
Personally, I do not perform element-wise operations on NCDatasets types. Better integration with DiskArrays would be very desirable, but it is not that straightforward to implement.
All the code is in
Needing a getproperties field is a real problem for shared APIs, and not really possible with DiskArrays.jl. But you can easily make an […]

Otherwise this will be another ecosystem integration blocker. In Rasters we have to wrap your variables to prevent this from happening.

(Also just realised a function-based interface can get attributes after reshape or permutedims or whatever; a getproperty interface has the same problem with all of them.)
A function-based interface would be quite a massive breaking change. I like the ability to modify attributes as if they were a dictionary. Maybe one approach would be that a view of a CFVariable (wrapping a DiskArray) should also be a CFVariable (wrapping the corresponding DiskArray view)?
CFVariable is not a DiskArray... I'm always confused by this conversation, because wrapping a DiskArray essentially makes it useless: broadcast and many other Base methods won't be chunked, as happens with the SubVariable in this issue.

We need an object that is a disk array and always stays a disk array. (Or alternatively implement broadcast and everything like DimensionalData.jl does, so broadcast and other methods are properly forwarded to the inner array, and rewrap around […]

As with the old […] If you switch to just using […]

For now we will just wrap and hide CDM variables in other geo packages, and make sure chunk handling works on top of them. See: rafaqz/Rasters.jl#892. We just really need disk array chunking to always work.

Or maybe I misunderstood... do you mean you keep the […]
Calling view on a Variable returns a SubVariable from CommonDataModel, which doesn't implement the DiskArray interface.

This unfortunately means that any chunked operation (such as lazy raster operations) is extremely slow, as discussed in rafaqz/Rasters.jl#889
For example:
The last operation here is slow because we are copying a DiskArray to a DiskArray, which happens chunk by chunk, so view is called internally. So clearly this is not great.

Two possible ways forward are to implement (parts of) the DiskArray interface for SubVariable, or to return a SubDiskArray from view on Variable. Arguably NCDatasets violates the DiskArray interface here.
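The cost of a view that falls back to generic element-wise indexing can be illustrated with a toy array type in plain Julia. This is only a sketch under stated assumptions: CountingArray and readblock are hypothetical names invented for this illustration and are not part of DiskArrays.jl, NCDatasets or CommonDataModel. The counter stands in for the number of separate disk reads.

```julia
# Toy stand-in for an on-disk array. Hypothetical type, NOT the DiskArrays.jl API.
struct CountingArray{T} <: AbstractMatrix{T}
    data::Matrix{T}
    reads::Base.RefValue{Int}   # number of separate "disk reads" so far
end
CountingArray(data::Matrix) = CountingArray(data, Ref(0))

Base.size(a::CountingArray) = size(a.data)

# Scalar access: every element costs one read.
function Base.getindex(a::CountingArray, i::Int, j::Int)
    a.reads[] += 1
    return a.data[i, j]
end

# Block access (hypothetical helper): one read for a whole rectangular region.
function readblock(a::CountingArray, rows::UnitRange{Int}, cols::UnitRange{Int})
    a.reads[] += 1
    return a.data[rows, cols]
end

a = CountingArray(rand(100, 100))

# A generic view is a plain SubArray, so reductions go through scalar getindex.
v = view(a, 1:10, 1:10)
sum(v)
elementwise_reads = a.reads[]

# Reading the same 10x10 region as a single block.
a.reads[] = 0
sum(readblock(a, 1:10, 1:10))
block_reads = a.reads[]

println("element-wise reads: ", elementwise_reads, ", block reads: ", block_reads)
```

In this toy setup the view touches the backing store once per element, while the block read touches it once per region. A view type that keeps block-wise access avoids that per-element overhead, which appears to be the point of both proposals above.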