Skip to content

Implement zoneinfo parsing/compilation and add TZif structs #257

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

nekevss
Copy link
Member

@nekevss nekevss commented Apr 2, 2025

It's time to finally tackle the zoneinfo compilation and tzif sourcing issue. The work here lays the baseline for to handle sourcing. There's still a lot more to add, and maybe some bugs to uncover. But this is also major progress to supporting the ability to source tzifs.

Opening as a draft first for early feedback due to just how dense this actually is. So far, the work here allows us to parse zoneinfo files and compile sets of transition data from the parsed zoneinfo files. There is also related work in provider to setup the tzif structures.

The steps missing from this PR is:

  • All that's left is to bridge the gap from transition sets -> TZIFs,
  • Plug everything into the bakeddata tool or a new connected tool.

@nekevss nekevss requested a review from jedel1043 April 2, 2025 03:04
@nekevss nekevss force-pushed the impl-zoneinfo-support branch 2 times, most recently from f0dd735 to fe9d51b Compare April 2, 2025 03:16
It's time to finally tackle the zoneinfo and tzif sourcing
issue. The work here lays the baseline for to handle sourcing.
There's still a lot more to add, and maybe some bugs to
uncover. But this is also major progress.
@nekevss nekevss force-pushed the impl-zoneinfo-support branch from 03303bd to 3a9ffcc Compare April 4, 2025 05:36
@nekevss nekevss force-pushed the impl-zoneinfo-support branch from 3a9ffcc to a12f14e Compare April 4, 2025 13:33
Copy link
Member Author

@nekevss nekevss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding some general comments for context.

write!(file, "//@generated\n\n{generated}")
}

fn write_debug(&self, debug_path: &Path) -> io::Result<()> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there may be a better way to output the debug info rather than just one single JSON that will hopefully make the debugs a little more digestible

let tzif = TransitionData {
transitions,
single_line_zone,
// TODO: Handle POSIX tz string
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is already really big, and sort of dense. Not sure that POSIX tz string serialization should be added to it.

But if it's preferred, then I can add it.

///
/// Please note: implementation is very minimal
#[derive(Debug)]
pub struct TzifBlockV2 {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of TZif structs everywhere. I like the convenience of a related struct being in zoneinfo_compiler, but I could be convinced that it should live elsewhere.

It might be nice to use structs from tzif, but that may require some upstream changes.

})
}

pub(crate) fn calculate_transitions_for_year(
Copy link
Member Author

@nekevss nekevss Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is where a large bulk of the zoneinfo compilation work lives.

I tried my best to document it as much as possible because it's basically a rough reverse engineering of outzone. So keeping general steps in the code seemed like a good idea for whenever we have to come back to it.

@nekevss nekevss marked this pull request as ready for review April 4, 2025 22:07
@nekevss
Copy link
Member Author

nekevss commented Apr 5, 2025

Just to note: I'm working through a couple bugs that I came across when double checking some more TZifs.

Comment on lines 242 to 261
fn test_chicago() {
let manifest_dir = Path::new(env!("CARGO_MANIFEST_DIR"));
let mut zoneinfo =
ZoneInfoCompiler::from_filepath(manifest_dir.join("examples/northamerica")).unwrap();

// Association is needed.
let computed_transitions = match zoneinfo
.associate_and_build_for_zone("America/Chicago", &test_default_settings())
{
CompiledZone::Single(_) => unreachable!(),
CompiledZone::Transitions(set) => set,
};

let data = tzif::parse_tzif_file(Path::new("/usr/share/zoneinfo/America/Chicago")).unwrap();
let fs_transitions = data.data_block2.unwrap().transition_times;

for (computed, fs) in computed_transitions.iter().zip(fs_transitions.iter()) {
assert_eq!(computed.at_time, fs.0);
}
}
Copy link
Member

@jedel1043 jedel1043 Apr 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to bundle every zoneinfo file? IMO we should just create an artificial sample from the files that covers all the edge cases.

I'm mostly worried that if we test using /usr/share/zoneinfo as our source of truth, the tests could break every time a new version of tzdb gets released.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had initially created an artificial zoneinfo file of test cases. My concern after doing so was that if down the line there are changes to any zone used as a test case in the artificial version, then it potentially becomes a very manual process to update the zone versus just dropping the update zoneinfo file into the examples folder and letting git handle the diffing. I was trying to minimize the amount bundled, which is why there are only 4 zoneinfo files included vs. the every file.

On the opposite side of things, creating an artificial file would help to isolate test cases, and probably make things a little less cumbersome to deal with. Comment lines could also then be used to note different edge cases of the test. So I'm definitely open to doing so. I just had some concerns about long term maintenance.

Might have been overthinking on my part though, tbh.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern after doing so was that if down the line there are changes to any zone used as a test case in the artificial version, then it potentially becomes a very manual process to update the zone versus just dropping the update zoneinfo file into the examples folder and letting git handle the diffing.

I don't think we need to worry about this because we're testing the correct parsing of timezone data, not if the data reflects the real world in a correct way, so artificial cases are a lot more useful to test if the parser works correctly.

We can still add a checker to see if the parser works with the latest tzdb data, but IMO that should be done in a nightly or weekly check on the CI.

Copy link
Member Author

@nekevss nekevss Apr 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, fair enough. There's a good argument for another tool completely separate from our unit tests to test against all the tzif data anyways.

I'll isolate them into a virtual file and pull the compiled data for the test cases.

@nekevss nekevss requested a review from jedel1043 April 15, 2025 03:08
@nekevss
Copy link
Member Author

nekevss commented Apr 15, 2025

I think this is finally ready for merge. Or at least, I'm feeling more confident about the correctness of the data this generates/our path towards correctness.

The full zoneinfo files have been pulled in favor of a custom made zoneinfo file. I do have some serialized tzif data in json files locally that we can check in to test the output against if there's any interest in that approach.

@nekevss nekevss force-pushed the impl-zoneinfo-support branch from 404d5b4 to 2f115fe Compare April 17, 2025 06:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants