Skip to content

incrementalFetch(): fix race condition which would miss rows based on changed_at#288

Closed
Al2Klimov wants to merge 1 commit intomainfrom
changed_at-racecond
Closed

incrementalFetch(): fix race condition which would miss rows based on changed_at#288
Al2Klimov wants to merge 1 commit intomainfrom
changed_at-racecond

Conversation

@Al2Klimov
Copy link
Copy Markdown
Member

@Al2Klimov Al2Klimov commented Mar 11, 2025

incrementalFetch() relies on every newly inserted row having a higher changed_at than every already committed one. At the moment, web doesn't guarantee this. It even writes changed_at rounded to whole seconds. Hence, we could skip one of two rows which appeared during the same second.

This change simply ignores the most recent 3s of config updates (for 3s), so that two or more new rows which appear during the same second aren't a problem. They're processed together 3s after they get committed.

closes #237

@Al2Klimov Al2Klimov requested a review from oxzi March 11, 2025 11:25
@cla-bot cla-bot Bot added the cla/signed CLA is signed by all contributors of a PR label Mar 11, 2025
@oxzi
Copy link
Copy Markdown
Member

oxzi commented Mar 11, 2025

Before you spend any more time looking into this, please take a look at #237.

@Al2Klimov
Copy link
Copy Markdown
Member Author

Before you spend any more time looking into this,

Don't worry. I was already done. (#288 (comment), #289)

please take a look at #237.

Seems more complicated to me. 😅

@oxzi
Copy link
Copy Markdown
Member

oxzi commented Apr 16, 2025

Please rebase to rerun the failing CI job, fixed in the current main branch.

@Al2Klimov
Copy link
Copy Markdown
Member Author

This is just a workaround. I prefer to fix the root cause properly, i.e:

If you agree, please review that PR's intention (description) itself, not necessarily the PHP code.

… changed_at

incrementalFetch() relies on every newly inserted row having a higher changed_at than every already committed one. At the moment, web doesn't guarantee this. It even writes changed_at rounded to whole seconds. Hence, we could skip one of two rows which appeared during the same second.

This change simply ignores the most recent 3s of config updates (for 3s), so that two or more new rows which appear during the same second aren't a problem. They're processed together 3s after they get committed.
@Al2Klimov Al2Klimov force-pushed the changed_at-racecond branch from ab1d223 to d5110bb Compare April 17, 2025 10:36
@Al2Klimov
Copy link
Copy Markdown
Member Author

@Al2Klimov
Copy link
Copy Markdown
Member Author

@nilmerg You said you don't like voluntary thresholds like these 3s. But Icinga DB already gives up after the DB is down for 5m which is also a voluntary threshold. :)

@Al2Klimov
Copy link
Copy Markdown
Member Author

As discussed offline with @nilmerg, I think a changed_at map per row id, and not just per table, could also solve this, but this PR is the way of the least resistance.

@julianbrost
Copy link
Copy Markdown
Member

Can you please stop doing random things. Yes, I'm aware that there are limitations with the current implementation, but now doing things without any coordination between @nilmerg and me for such core mechanism makes little sense.

And now this ist getting quite messy here:
20250424_13h22m16s_grim
(Not to mention that you've also closed #237 in the process.)

@Al2Klimov
Copy link
Copy Markdown
Member Author

@Al2Klimov Al2Klimov closed this May 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla/signed CLA is signed by all contributors of a PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants