Handling large IFD data arrays #70

Open
feefladder opened this issue Mar 20, 2025 · 5 comments

Comments

@feefladder
Contributor

feefladder commented Mar 20, 2025

EDIT: Deferring the fetch of tile_info (tile_offsets + tile_byte_counts) turned out to be quite an intrusive change, and the remaining tile_infos can be reasonably estimated from the first tile_offsets. My question has therefore shifted, and this issue is now mainly about #76.

TileByteCounts and TileOffsets arrays are special-cased in the COG spec and generally don't fit within the first 16 kB.

Thus, deferring the fetch of that data until all tags are read, and then using coalesce_ranges, could be beneficial because it reduces the number of requests. I've updated the amazing diagram to include the COG file structure:

Image
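
To make the request-count argument concrete, here is a minimal, std-only sketch of range coalescing. It is a simplified stand-in for the idea behind coalesce_ranges, not that function's actual signature; the `coalesce` helper, the `max_gap` parameter, and the byte offsets are all made up for illustration.

```rust
use std::ops::Range;

/// Merge byte ranges whose gaps are at most `max_gap` bytes, so that
/// nearby tag-data reads can share one request.
/// Hypothetical helper for illustration only.
fn coalesce(mut ranges: Vec<Range<u64>>, max_gap: u64) -> Vec<Range<u64>> {
    ranges.sort_by_key(|r| r.start);
    let mut merged: Vec<Range<u64>> = Vec::new();
    for r in ranges {
        if let Some(last) = merged.last_mut() {
            // Overlapping or close enough to the previous range: extend it.
            if r.start <= last.end.saturating_add(max_gap) {
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}

fn main() {
    // Made-up offsets of three tag-value arrays collected while reading IFDs.
    let wanted = vec![40_000..48_000, 16_500..24_500, 24_600..28_600];
    let requests = coalesce(wanted, 1_024);
    println!("{} requests: {:?}", requests.len(), requests);
}
```

Collecting all out-of-line tag ranges first and merging them afterwards is what allows two of the three reads here to become a single request.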

@kylebarron
Member

Fetching 16 kB is just a default; the user can override how much to prefetch. The user can also add a "block cache middleware" or something like that to speed up requests that extend beyond the prefetch range.
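
As a rough illustration of what such a block cache would have to compute, the sketch below maps a requested byte range to the fixed-size blocks that cover it. The `covering_blocks` function and the choice of a 16 KiB block size are assumptions for this example and are not part of async-tiff.

```rust
use std::ops::Range;

/// Indices of the fixed-size blocks that cover `range`.
/// A block cache would fetch (or reuse) whole blocks rather than the exact range.
/// Hypothetical helper for illustration only.
fn covering_blocks(range: &Range<u64>, block_size: u64) -> Range<u64> {
    assert!(block_size > 0 && range.start < range.end);
    let first = range.start / block_size;
    let last = (range.end - 1) / block_size;
    first..last + 1
}

fn main() {
    // Assumed block size, chosen to match the default prefetch size.
    let block_size = 16 * 1024;
    // A made-up tag array that starts inside the prefetched block and spills past it.
    let blocks = covering_blocks(&(12_000..40_000), block_size);
    println!("blocks needed: {blocks:?}");
}
```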

@feefladder
Contributor Author

feefladder commented Mar 22, 2025

The main problem is that the array size is unknown a priori and a good estimate is hard to make, unless the application knows exactly which types of COGs are going to be handled.

I've thought about this a bit, and I think what would let the user do such a thing is being clear about where we are in the file (fetching metadata or data), so I've made a PR, #73, that adds functions to the trait. In #74, I concluded that, when fetching data arrays, the additional smaller overviews can be fetched approximately by fetching (and caching) twice the size, based on this heuristic:

  1. number of tiles of next overview = nrows.div_ceil(2)*ncols.div_ceil(2) ~= ntiles/4
  2. geometric series => total tiles = ntiles/(1-1/4) = 4/3*ntiles
  3. add the byte_counts, which are Long (4 bytes), whereas tile_offsets are Long8 (8 bytes), so bytes*3/2
  4. total_bytes = 4/3 * 3/2 * bytes = 2*bytes
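
The heuristic can be checked against an exact count with a small sketch. It assumes that each overview halves the tile grid in both dimensions (rounding up) and that the pyramid continues down to a single tile; both function names are made up for this example.

```rust
/// Heuristic: bytes of TileOffsets + TileByteCounts over the full-resolution
/// IFD and all overviews, estimated from the full-resolution tile count alone.
fn estimate_tile_info_bytes(ntiles: u64) -> u64 {
    let offsets_bytes = ntiles * 8; // Long8 entries in the first tile_offsets array
    // 4/3 for the overviews times 3/2 for the Long byte counts gives 2.
    2 * offsets_bytes
}

/// Exact count under the stated assumptions, walking the overview pyramid.
fn exact_tile_info_bytes(mut rows: u64, mut cols: u64) -> u64 {
    let mut tiles = 0;
    loop {
        tiles += rows * cols;
        if rows <= 1 && cols <= 1 {
            break;
        }
        rows = rows.div_ceil(2);
        cols = cols.div_ceil(2);
    }
    tiles * (8 + 4) // 8-byte offset plus 4-byte byte count per tile
}

fn main() {
    let (rows, cols) = (100, 100);
    let estimate = estimate_tile_info_bytes(rows * cols);
    let exact = exact_tile_info_bytes(rows, cols);
    println!("estimate = {estimate}, exact = {exact}");
}
```

The rounding up in `div_ceil` is what pushes the exact total slightly above the estimate for grids whose dimensions are not powers of two.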

That underestimates by a bit, resulting in another fetch for fewer bytes, which conveniently also clears the data structure :) - I've tested with some COGs; undershoot in bytes and as a percentage of the first count:

@kylebarron
Member

Ok it sounds like your desire is "fetch multiple overviews for a tile concurrently"? Are you speaking of data or IFD metadata?

I don't think this is anything that async-tiff needs to concern itself with at the core level.

The main things to consider when deciding whether to merge requests or not are:

  • network conditions
  • latency
  • bandwidth

None of those depend on which zoom level the tile in the range we're currently fetching belongs to.
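
One way to read the latency and bandwidth points is as a single break-even threshold: reading through a gap costs gap / bandwidth, while a separate request costs one more round trip. The sketch below uses that deliberately simplistic model (sequential requests, constant bandwidth); the function names and numbers are assumptions for illustration, not anything async-tiff provides.

```rust
use std::time::Duration;

/// Largest gap (in bytes) for which reading through the gap is no slower
/// than paying one more round trip, under the simplistic model above.
fn break_even_gap(latency: Duration, bytes_per_sec: u64) -> u64 {
    // Integer arithmetic in microseconds avoids floating-point rounding.
    (latency.as_micros() * bytes_per_sec as u128 / 1_000_000) as u64
}

/// Decide whether two ranges separated by `gap` bytes should share a request.
fn should_merge(gap: u64, latency: Duration, bytes_per_sec: u64) -> bool {
    gap <= break_even_gap(latency, bytes_per_sec)
}

fn main() {
    let latency = Duration::from_millis(50); // assumed round-trip time
    let bandwidth = 10_000_000; // assumed 10 MB/s
    println!("break-even gap: {} bytes", break_even_gap(latency, bandwidth));
    println!("merge a 100 kB gap? {}", should_merge(100_000, latency, bandwidth));
}
```

Note that nothing in this decision looks at what the bytes contain, only at where they are in the file.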

@kylebarron
Member

> fetching the additional smaller overviews can be done approximately by fetching (and caching) twice the size

Instead of doing this, just make separate fetches for each tile (pseudocode):

let cog = TIFF::open(file);
let full_res_ifd = &cog.ifds()[0];
let overview1 = &cog.ifds()[1];

let full_res_tile_future = full_res_ifd.load_tile();
let overview_tile_future = overview1.load_tile();
// Await both requests concurrently. In Rust, `futures::join!` plays the
// role that `Promise.all` plays in JS.
let (full_res_tile, overview_tile) =
    futures::join!(full_res_tile_future, overview_tile_future);

@feefladder
Contributor Author

feefladder commented Mar 25, 2025

> Ok it sounds like your desire is "fetch multiple overviews for a tile concurrently"? Are you speaking of data or IFD metadata?

Ok, sorry, it's specifically about tag data, which is IFD metadata, like so:

Image

> The main things to consider when deciding whether to merge requests or not are:
>
> * network conditions
> * latency
> * bandwidth

I would like to add:

  • number of requests for loading (all/desired) IFDs

The fact that these arrays can vary so much in size is my reason for #76 and #74.

cast showing http example
asciicast
