-
Notifications
You must be signed in to change notification settings - Fork 15
Implement zoneinfo parsing/compilation and add TZif structs #257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
f0dd735
to
fe9d51b
Compare
It's time to finally tackle the zoneinfo and tzif sourcing issue. The work here lays the baseline for to handle sourcing. There's still a lot more to add, and maybe some bugs to uncover. But this is also major progress.
fe9d51b
to
12221b7
Compare
03303bd
to
3a9ffcc
Compare
3a9ffcc
to
a12f14e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding some general comments for context.
write!(file, "//@generated\n\n{generated}") | ||
} | ||
|
||
fn write_debug(&self, debug_path: &Path) -> io::Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there may be a better way to output the debug info rather than just one single JSON that will hopefully make the debugs a little more digestible
zoneinfo/src/lib.rs
Outdated
let tzif = TransitionData { | ||
transitions, | ||
single_line_zone, | ||
// TODO: Handle POSIX tz string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is already really big, and sort of dense. Not sure that POSIX tz string serialization should be added to it.
But if it's preferred, then I can add it.
/// | ||
/// Please note: implementation is very minimal | ||
#[derive(Debug)] | ||
pub struct TzifBlockV2 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of TZif structs everywhere. I like the convenience of a related struct being in zoneinfo_compiler
, but I could be convinced that it should live elsewhere.
It might be nice to use structs from tzif
, but that may require some upstream changes.
}) | ||
} | ||
|
||
pub(crate) fn calculate_transitions_for_year( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is where a large bulk of the zoneinfo compilation work lives.
I tried my best to document it as much as possible because it's basically a rough reverse engineering of outzone
. So keeping general steps in the code seemed like a good idea for whenever we have to come back to it.
Just to note: I'm working through a couple bugs that I came across when double checking some more TZifs. |
zoneinfo/src/lib.rs
Outdated
fn test_chicago() { | ||
let manifest_dir = Path::new(env!("CARGO_MANIFEST_DIR")); | ||
let mut zoneinfo = | ||
ZoneInfoCompiler::from_filepath(manifest_dir.join("examples/northamerica")).unwrap(); | ||
|
||
// Association is needed. | ||
let computed_transitions = match zoneinfo | ||
.associate_and_build_for_zone("America/Chicago", &test_default_settings()) | ||
{ | ||
CompiledZone::Single(_) => unreachable!(), | ||
CompiledZone::Transitions(set) => set, | ||
}; | ||
|
||
let data = tzif::parse_tzif_file(Path::new("/usr/share/zoneinfo/America/Chicago")).unwrap(); | ||
let fs_transitions = data.data_block2.unwrap().transition_times; | ||
|
||
for (computed, fs) in computed_transitions.iter().zip(fs_transitions.iter()) { | ||
assert_eq!(computed.at_time, fs.0); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to bundle every zoneinfo file? IMO we should just create an artificial sample from the files that covers all the edge cases.
I'm mostly worried that if we test using /usr/share/zoneinfo
as our source of truth, the tests could break every time a new version of tzdb gets released.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had initially created an artificial zoneinfo file of test cases. My concern after doing so was that if down the line there are changes to any zone used as a test case in the artificial version, then it potentially becomes a very manual process to update the zone versus just dropping the update zoneinfo file into the examples folder and letting git handle the diffing. I was trying to minimize the amount bundled, which is why there are only 4 zoneinfo files included vs. the every file.
On the opposite side of things, creating an artificial file would help to isolate test cases, and probably make things a little less cumbersome to deal with. Comment lines could also then be used to note different edge cases of the test. So I'm definitely open to doing so. I just had some concerns about long term maintenance.
Might have been overthinking on my part though, tbh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern after doing so was that if down the line there are changes to any zone used as a test case in the artificial version, then it potentially becomes a very manual process to update the zone versus just dropping the update zoneinfo file into the examples folder and letting git handle the diffing.
I don't think we need to worry about this because we're testing the correct parsing of timezone data, not if the data reflects the real world in a correct way, so artificial cases are a lot more useful to test if the parser works correctly.
We can still add a checker to see if the parser works with the latest tzdb data, but IMO that should be done in a nightly or weekly check on the CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, fair enough. There's a good argument for another tool completely separate from our unit tests to test against all the tzif data anyways.
I'll isolate them into a virtual file and pull the compiled data for the test cases.
I think this is finally ready for merge. Or at least, I'm feeling more confident about the correctness of the data this generates/our path towards correctness. The full zoneinfo files have been pulled in favor of a custom made zoneinfo file. I do have some serialized tzif data in json files locally that we can check in to test the output against if there's any interest in that approach. |
404d5b4
to
2f115fe
Compare
It's time to finally tackle the zoneinfo compilation and tzif sourcing issue. The work here lays the baseline for to handle sourcing. There's still a lot more to add, and maybe some bugs to uncover. But this is also major progress to supporting the ability to source tzifs.
Opening as a draft first for early feedback due to just how dense this actually is. So far, the work here allows us to parse zoneinfo files and compile sets of transition data from the parsed zoneinfo files. There is also related work in provider to setup the tzif structures.
The steps missing from this PR is: