-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read tag data in its entirety from AsyncFileReader. #76
Conversation
Can you summarize what you're trying to do here? The changes aren't clear.
How do you know the full byte range of the IFD? |
(depends on #71) Basically trying to solve #70 in the most non-intrusive way possible. The biggest problem, imho is that a tiff has clear access patterns that could be exploited. The current implementation gives literally zero information down to the middleware, since only ever values are read using AsyncCursor, which reads single values1. This PR is an improvement, since it fetches the entire array in one go2. to be clear: the change only works on a single tag, there is no way to know the location of all tags, except COGs make things a bit better.
So if we are reading out-of-prefetch or without any middleware, it will at most make 1 request per IFD data, in stead of In The IFD size can also be known beforehand, when Footnotes
|
88ce7be
to
ab6c6a3
Compare
src/ifd.rs
Outdated
// prefetch all tag data | ||
let mut data = if (bigtiff && value_byte_length <= 8) || value_byte_length <= 4 { | ||
// value fits in offset field | ||
let res = cursor.read(value_byte_length).await?; | ||
if bigtiff { | ||
cursor.advance(8-value_byte_length); | ||
} else { | ||
cursor.advance(4-value_byte_length); | ||
} | ||
res | ||
} else { | ||
// Seek cursor | ||
let offset = if bigtiff { | ||
cursor.read_u64().await? | ||
} else { | ||
cursor.read_u32().await?.into() | ||
}; | ||
let reader = cursor.reader().get_bytes(offset..offset+value_byte_length).await?.reader(); | ||
EndianAwareReader::new(reader, cursor.endianness()) | ||
// cursor.seek(offset); | ||
// cursor.read(value_byte_length).await? | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the main logic, after which there are only 2 match statements
ab6c6a3
to
72ffc22
Compare
Yes it reads single values, but that's by design. It makes the underlying code simple to reason about, while deferring all of the actual request logic to the network layer. The I think it's possibly/likely a good idea to tell the network layer when you're doing image vs metadata requests (#78 ), but I'm not sure about some of these other more invasive code changes.
Right but my point is that at most one of these async functions should ever request data anyways. I'd be happy to make the docs more explicit to say that the underlying reader should never be used without a caching/buffering middleware.
The overhead of an async function is likely nanoseconds, so changing the number of async calls (as long as the underlying request layer responds from its cache) should be insignificant. |
I would say the image-tiff implementation of
This PR splits that logic into 2 easier steps.
Or we could give the
yes
It's not about the overhead of an async function, but about the ability for #74 to estimate the position of remaining tag data. Footnotes
|
Ok first things first you need to designate one PR to review, instead of four, and ensure that is up to date. I can't even tell how many things are changing in this PR. |
Ok, will rebase on top of #79, is that ok? |
72ffc22
to
3721436
Compare
3721436
to
feefadd
Compare
Read tags from a buffer which is fetched in its entirety in stead of reading them value-by-value. This allows any middleware to be a bit smart about caching outside of the prefetch range. Also it simplifies logic in
read_tag_value
.This same idea could also be used for pre-fetchin/reading IFDs themselves from the point the count is known. That does change things a tiny little more, so is its own PR