Implemented fixes based on identified issues #8
qiangao7 commented May 19, 2026
- KP2 added to approval logic
- Added campaign_product_lookup table
- Merged product and campaign into a single product × campaign table
- Removed old product and campaign tables
- Added campaign × interval bin table
- Removed redundant same-day multiple product flag
- Updated flag total to records total
… due to time constraints for re-output and checks).
venexia left a comment
Hi @qiangao7 - thank you for promptly turning around these edits. You have a duplicated line of code that needs correcting. Otherwise, I have just two queries:
- You filter out same_day_same_product records, but this removes every flagged record, whereas I think we want to keep one and remove the others as duplicates. Please can you confirm which of these you meant to do?
- Also, this morning, we discussed adding death date to the active patients filter (alongside (de)registration) to minimize the denominator issues, while acknowledging that some mismatch is to be expected as active patients is determined at a different time than when the records are fetched for pre-campaigns.
filter(!flag_same_day_same_product) |> # exclude same-day multiple-record combinations
filter(!flag_same_day_same_product) |> # exclude same-day multiple-record combinations
Line 174 is duplicated on line 175.
Also, this removes all records that are flagged, but I think we want to keep one of the records and remove the others as duplicates?
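To make the difference concrete, here is a minimal sketch on toy data. The column names patient_id, vax_date and vax_product, the dates, the placeholder product_A, and the way the flag is derived are all assumptions for illustration; the real flag logic in the repository may differ.

```r
library(dplyr)

# Hypothetical toy data; only sanofigsk_B1 and patient 1033 come from the thread.
vax <- tibble(
  patient_id  = c(1033, 1033, 1033, 2001),
  vax_date    = as.Date(c("2025-04-10", "2025-04-10", "2024-10-01", "2025-04-12")),
  vax_product = c("sanofigsk_B1", "sanofigsk_B1", "product_A", "sanofigsk_B1")
)

# Assumed flag definition: more than one row for the same patient, date and product.
flagged <- vax |>
  group_by(patient_id, vax_date, vax_product) |>
  mutate(flag_same_day_same_product = n() > 1) |>
  ungroup()

# Current behaviour: every flagged row is dropped, so the 2025-04-10 dose disappears.
dropped_all <- flagged |>
  filter(!flag_same_day_same_product)

# Alternative: keep exactly one row per patient/date/product combination.
kept_one <- vax |>
  distinct(patient_id, vax_date, vax_product, .keep_all = TRUE)
```

With this input, dropped_all loses both 2025-04-10 rows for patient 1033, while kept_one retains a single copy that can still feed the interval calculation.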
Thanks for flagging this, @venexia! I was considering deduplicating same-day records by first keeping just one record, and dropping mixed product cases for now. Since mixed cases are a small proportion and can create odd interval patterns, it might be cleaner to exclude them at this stage.
At the same time, we could generate a separate table to look at mixed product patterns and use that to define a more comprehensive cleaning rule. Then in a later run, we can apply a consistent approach to all same-day records and get a more accurate interval pattern.
Do you think this sounds reasonable, or do you have any other thoughts?
Hi, @venexia! I think this is already covered in the active patient definition in fn_covid_data_quality (lines 125–127):
active_on_vax_date = registered_on_vax_date & (is.na(death_date) | death_date >= vax_date)
So this should already ensure patients are both registered and alive on the vaccination date — but please correct me if I’ve missed anything.
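As a quick check of how that expression behaves, here is a small sketch on made-up patients. The expression itself is the one quoted above; the data values are hypothetical and chosen only to cover the missing, earlier and later death date cases.

```r
library(dplyr)

# Hypothetical patients, all vaccinated on the same (assumed) date.
patients <- tibble(
  patient_id             = 1:4,
  registered_on_vax_date = c(TRUE, TRUE, FALSE, TRUE),
  death_date             = as.Date(c(NA, "2024-01-01", NA, "2024-06-01")),
  vax_date               = as.Date(rep("2024-05-01", 4))
) |>
  mutate(
    # Registered AND (no recorded death OR death on/after the vaccination date).
    active_on_vax_date = registered_on_vax_date &
      (is.na(death_date) | death_date >= vax_date)
  )
```

Patient 2 (died before the vaccination date) and patient 3 (not registered) come out as inactive; patients 1 and 4 remain active. Because is.na(death_date) is evaluated first in the OR, a missing death date does not propagate NA into the result.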
Hi @qiangao7. I am a bit confused by your reply because it doesn't match what I think is happening in the code. For example, consider patient 1033 in the dummy data, who has both a same product/same day flag in one campaign and a mixed product/same day in another campaign.
Same product/same day: The code is removing all records of the same product on the same day. So, for patient 1033, both their sanofigsk_B1 records get removed (this seems contrary to your comment above?), which means ultimately they have nothing recorded for the spring 2025 campaign. I think we want to keep one record as you suggest, as it is relevant to the interval calculation and vax_product is not contentious.
Mixed product/same day: Patient 1033 also happens to have mixed products on the same day during the autumn 2023 campaign, and both are kept and intervals calculated - seemingly just in the order that they were in the original dataset (this also seems contrary to your comment above?). For an interval calculation, I would probably replace this with a single record with the common date and mark the vax_product as 'conflicted'. We want to know something happened in autumn 2023, but we can't be sure what. [We haven't really talked about having a conflicted option for some tables before but it could make sense here.]
Finally, from a code review point of view, the same line of code is repeated, so line 175 needs to be removed.
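One possible way to implement both suggestions in a single step is sketched below. It is not the repository's code: the column names, the dates, and the placeholder products product_A and product_B are assumptions, and interval_days is a hypothetical output column.

```r
library(dplyr)

# Hypothetical records for patient 1033: mixed products on one day,
# then a duplicated sanofigsk_B1 record on a later day.
vax <- tibble(
  patient_id  = c(1033, 1033, 1033, 1033),
  vax_date    = as.Date(c("2023-10-05", "2023-10-05", "2025-04-10", "2025-04-10")),
  vax_product = c("product_A", "product_B", "sanofigsk_B1", "sanofigsk_B1")
)

vax_clean <- vax |>
  # Collapse to one row per patient and date.
  group_by(patient_id, vax_date) |>
  summarise(
    # Same product repeated: keep it. Different products: mark as 'conflicted'.
    vax_product = if (n_distinct(vax_product) > 1) "conflicted" else first(vax_product),
    .groups = "drop"
  ) |>
  # Intervals are then computed between distinct vaccination dates only.
  arrange(patient_id, vax_date) |>
  group_by(patient_id) |>
  mutate(interval_days = as.integer(vax_date - lag(vax_date))) |>
  ungroup()
```

This leaves one 'conflicted' row for the earlier date and one sanofigsk_B1 row for the later date, so the interval no longer depends on the row order in the original dataset.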
Hi @venexia. Thanks - I think this was based on an earlier version. I updated the logic locally after my reply yesterday but wanted to wait for your confirmation before committing, so things got a bit crossed. Sorry for the confusion!