Implemented fixes based on identified issues by qiangao7 · Pull Request #8 · opensafely/vaccines-data-quality

qiangao7 · 2026-05-19T10:28:20Z

KP2 added to approval logic
Added campaign_product_lookup table
Merged product and campaign into a single product × campaign table
Removed old product and campaign tables
Added campaign × interval bin table
Removed redundant same-day multiple product flag
Updated flag total to records total

…ampaign table.

… due to time constraints for re-output and checks).

venexia

Hi @qiangao7 - thank you for promptly turning around these edits. You have a duplicated line of code that needs correcting. Otherwise, I have just two queries:

You filter to remove same_day_same_product records but this means all records get removed whereas I think we want to keep one and remove the others as duplicates - please can you confirm which of these you meant to do?
Also, this morning, we discussed adding death date to the active patients filter (alongside (de)registration) to minimize the denominator issues, while acknowledging that some mismatch is to be expected as active patients is determined at a different time than when the records are fetched for pre-campaigns.

venexia · 2026-05-19T14:13:39Z

+  filter(!flag_same_day_same_product) |> # exclude same-day multiple-record combinations
+  filter(!flag_same_day_same_product) |> # exclude same-day multiple-record combinations


Line 174 is duplicated on line 175.

Also, this removes all records that are flagged, but I think we want to keep one of the records and remove the others as duplicates?

Thanks for flagging this, @venexia ! I was considering deduplicating same-day records by keeping just one record first, and dropping mixed product cases for now. Since mixed cases are a small proportion and can create odd interval patterns, it might be cleaner to exclude them at this stage.

At the same time, we could generate a separate table to look at mixed product patterns and use that to define a more comprehensive cleaning rule. Then in a later run, we can apply a consistent approach to all same-day records and get a more accurate interval pattern.

Do you think this sounds reasonable, or do you have any other thoughts?

Hi, @venexia! I think this is already covered in the active patient definition in fn_covid_data_quality (lines 125–127):
active_on_vax_date = registered_on_vax_date & (is.na(death_date) | death_date >= vax_date)

So this should already ensure patients are both registered and alive on the vaccination date — but please correct me if I’ve missed anything.

Hi @qiangao7. I am a bit confused by your reply because it doesn't match what I think is happening in the code. For example, consider patient 1033 in the dummy data, who has both a same product/same day flag in one campaign and a mixed product/same day in another campaign.

Same product/same day: The code is removing all records of the same product on the same day. So, for patient 1033, both their sanofigsk_B1 records get removed (this seems contrary to your comment above?), which means ultimately they have nothing recorded for the spring 2025 campaign. I think we want to keep one record as you suggest, as it is relevant to the interval calculation and vax_product is not contentious.

Mixed product/same day Patient 1033 also happens to have mixed products on the same day during the autumn 2023 campaign, and both are kept and intervals calculated - seemingly just in the order that they were in the original dataset (this also seems contrary to your comment above?). For an interval calculation, I would probably replace this with a single record with the common date and mark the vax_product as 'conflicted'. We want to know something happened in autumn 2023, but we can't be sure what [We haven't really talked about having a conflicted option for some tables before but it could make sense here.]

Finally, from a code review point of view, the same line of code is repeated, so line 175 needs to be removed.

Hi @venexia. Thanks - I think this was based on an earlier version. I updated the logic locally after my reply yesterday but wanted to wait for your confirmation before committing, so things got a bit crossed. Sorry for the confusion!

venexia

Hi @qiangao7 - looks great, thank you for the quick fix!

venexia · 2026-05-20T12:59:52Z

+  filter(!flag_same_day_same_product) |> # exclude same-day multiple-record combinations
+  filter(!flag_same_day_same_product) |> # exclude same-day multiple-record combinations


Hi @qiangao7. I am a bit confused by your reply because it doesn't match what I think is happening in the code. For example, consider patient 1033 in the dummy data, who has both a same product/same day flag in one campaign and a mixed product/same day in another campaign.

Same product/same day: The code is removing all records of the same product on the same day. So, for patient 1033, both their sanofigsk_B1 records get removed (this seems contrary to your comment above?), which means ultimately they have nothing recorded for the spring 2025 campaign. I think we want to keep one record as you suggest, as it is relevant to the interval calculation and vax_product is not contentious.

Mixed product/same day Patient 1033 also happens to have mixed products on the same day during the autumn 2023 campaign, and both are kept and intervals calculated - seemingly just in the order that they were in the original dataset (this also seems contrary to your comment above?). For an interval calculation, I would probably replace this with a single record with the common date and mark the vax_product as 'conflicted'. We want to know something happened in autumn 2023, but we can't be sure what [We haven't really talked about having a conflicted option for some tables before but it could make sense here.]

Finally, from a code review point of view, the same line of code is repeated, so line 175 needs to be removed.

qiangao7 added 5 commits May 19, 2026 10:18

Add KP2 to approval flow; introduce campaign_product_lookup;

a8aed36

Remove separate product and campaign tables; build combined product×c…

4ebb3b0

…ampaign table.

Remove redundant flag (same-day multiple product)

11bee9d

Update flag total to records total (5.11 report used early-year total…

62082d7

… due to time constraints for re-output and checks).

Generate campaign × interval bin table.

4510106

qiangao7 requested a review from venexia May 19, 2026 10:28

venexia requested changes May 19, 2026

View reviewed changes

qiangao7 added 2 commits May 20, 2026 13:06

Keep only one per same product and remove all mixed products first

2893c7a

Create a table to analyze patterns of mixed products

42444f8

qiangao7 requested a review from venexia May 20, 2026 13:07

venexia approved these changes May 20, 2026

View reviewed changes

qiangao7 merged commit 805f53b into main May 20, 2026
1 check passed

qiangao7 deleted the fix-covid-dq-updates branch May 20, 2026 13:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implemented fixes based on identified issues#8

Implemented fixes based on identified issues#8
qiangao7 merged 7 commits into
mainfrom
fix-covid-dq-updates

qiangao7 commented May 19, 2026

Uh oh!

venexia left a comment

Uh oh!

venexia May 19, 2026

Uh oh!

venexia May 19, 2026

Uh oh!

qiangao7 May 19, 2026 •

edited

Loading

Uh oh!

qiangao7 May 19, 2026

Uh oh!

venexia May 20, 2026

Uh oh!

qiangao7 May 20, 2026

Uh oh!

venexia left a comment

Uh oh!

venexia May 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		filter(!flag_same_day_same_product) \|> # exclude same-day multiple-record combinations
		filter(!flag_same_day_same_product) \|> # exclude same-day multiple-record combinations

Conversation

qiangao7 commented May 19, 2026

Uh oh!

venexia left a comment

Choose a reason for hiding this comment

Uh oh!

venexia May 19, 2026

Choose a reason for hiding this comment

Uh oh!

venexia May 19, 2026

Choose a reason for hiding this comment

Uh oh!

qiangao7 May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

qiangao7 May 19, 2026

Choose a reason for hiding this comment

Uh oh!

venexia May 20, 2026

Choose a reason for hiding this comment

Uh oh!

qiangao7 May 20, 2026

Choose a reason for hiding this comment

Uh oh!

venexia left a comment

Choose a reason for hiding this comment

Uh oh!

venexia May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

qiangao7 May 19, 2026 •

edited

Loading