Skip to content

Conversation

@leontrolski
Copy link

@leontrolski leontrolski commented Jul 9, 2025

I was somewhat surprised to find that tick wasn't deterministic and actually increments by time passed. At $WORK, this was causing some tests to become flakey and not reproducible. (Just to clarify - this seems like a reasonable design decision, just surprising to me and a couple of colleagues).

This commit adds an optional tick_delta argument such that it's possible to make the tick deterministic.

I was somewhat surprised to find that `tick` wasn't deterministic and
actually increments by time passed. At $WORK, this was causing some
tests to become flakey and not reproducible.

This commit adds an optional `tick_delta` argument such that it's possible
to make the tick deterministic.
@adamchainz
Copy link
Owner

Recent issue #518 suggested making tick accept a timedelta, which I think is a more reasonable API than adding another argument.

I'm still not convinced of the utility, though. IMO, you can't fully remove flakiness with a deterministic tick, because the tests still depend on the number of clock reads, which can change due to library changes, background threads, etc.

It's much better to loosen tests to avoid relying on precise values.

@leontrolski
Copy link
Author

making tick accept a timedelta

This does make a lot more sense - I was worried about making a backwards incompatible change, but I guess it's not one.

I'm still not convinced of the utility, though ... still depend on the number of clock reads
It's much better to loosen tests to avoid relying on precise values.

I agree in principle, but let me walk you through the test failure that motivated this PR - the key problem is that because the current behaviour isn't deterministic, I can't reproduce a failure from CI where things run slower.

A test depending on the number of clock reads is bad, but assuming my build is reproducible, it's at least consistent across environments.


The test is something along the lines of:

def run_application() -> None:
    for f in files:
        filename = f"{name}-{dt.datetime.now()}.txt"
        write_to_s3(filename, f)

def test_application() -> None:
    with time_machine.travel(NOW, tick=True):
        run_application()
    assert read_s3_filenames(...) == [
        "A-2025-12-03T00:00:00.txt"
        "B-2025-12-03T00:00:00.txt"
    ]

In the passing case:

def run_application() -> None:
    filename = "A-2025-12-03T00:00:00.txt"
    write_to_s3(filename, f)  # takes < 1s
    filename = "B-2025-12-03T00:00:00.txt"
    write_to_s3(filename, f)

And in the failing case:

def run_application() -> None:
    filename = "A-2025-12-03T00:00:00.txt"
    write_to_s3(filename, f)  # takes 9s
    filename = "B-2025-12-03T00:00:09.txt"  # the assertion will fail on this
    write_to_s3(filename, f)

The reason for using tick=True at all (and I suspect is a common pattern), is that all of our db inserts have a updated_at=now() added by Django and many of our queries rely on ORDER BY updated_at.

Again, feedback from colleagues was that they too were surprised that tick=dt.timedelta(microseconds=1) wasn't the default behaviour - it feels that by using the library you're making everything deterministic.

@adamchainz
Copy link
Owner

adamchainz commented Jul 10, 2025

I would avoid testing exact timestamps in the filenames there, and instead check the format of the filenames. That could be some str.startswith() and str.endswith() calls, or more complexly but accurately, dirty-equals IsStr with a regex.

I don't think your test cares about the exact value of the time that went into the filename, but rather that the files were written with names of the correct format. Loosening the assertions will prevent them from failing on slower machines (a problem that's really orthogonal to mocking time).

I appreciate that time-machine may give the impression of determinism, but it's not possible to promise that. I fear that depending on deterministic ticking will leave you with a bunch of tests that fail as soon as one more clock read is introduced, which could come from any library upgrade.

@leontrolski
Copy link
Author

I agree with your comment on the test - it's not great. However, the CI-reproducibility thing is still important to me.

a bunch of tests that fail as soon as one more clock read is introduced

Agreed again, this is bad, but at least when that happens in CI (due to a library upgrade or something similarly tangential to the test at hand), I can run locally and reproduce.


Just for my future reference, we did end up subclassing a là:

import time_machine

class Coordinates(time_machine.Coordinates):
    def time_ns(self) -> int:
        if self._tick:
            out = self._destination_timestamp_ns
            self._destination_timestamp_ns += 1000
            return out
        return self._destination_timestamp_ns

class travel(time_machine.travel):
    def start(self) -> Coordinates:
        time_machine.coordinates_stack
        time_machine._time_machine.patch_if_needed()  # type: ignore[attr-defined]

        if not time_machine.coordinates_stack:
            time_machine.uuid_generate_time_patcher.start()
            time_machine.uuid_uuid_create_patcher.start()

        coordinates = Coordinates(
            destination_timestamp=self.destination_timestamp,
            destination_tzname=self.destination_tzname,
            tick=self.tick,
        )
        time_machine.coordinates_stack.append(coordinates)
        coordinates._start()

        return coordinates

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants