Skip to content

Implemented fixes based on identified issues#8

Merged
qiangao7 merged 7 commits into
mainfrom
fix-covid-dq-updates
May 20, 2026
Merged

Implemented fixes based on identified issues#8
qiangao7 merged 7 commits into
mainfrom
fix-covid-dq-updates

Conversation

@qiangao7

Copy link
Copy Markdown
Contributor
  1. KP2 added to approval logic
  2. Added campaign_product_lookup table
  3. Merged product and campaign into a single product × campaign table
  4. Removed old product and campaign tables
  5. Added campaign × interval bin table
  6. Removed redundant same-day multiple product flag
  7. Updated flag total to records total

@qiangao7 qiangao7 requested a review from venexia May 19, 2026 10:28

@venexia venexia left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @qiangao7 - thank you for promptly turning around these edits. You have a duplicated line of code that needs correcting. Otherwise, I have just two queries:

  1. You filter to remove same_day_same_product records but this means all records get removed whereas I think we want to keep one and remove the others as duplicates - please can you confirm which of these you meant to do?
  2. Also, this morning, we discussed adding death date to the active patients filter (alongside (de)registration) to minimize the denominator issues, while acknowledging that some mismatch is to be expected as active patients is determined at a different time than when the records are fetched for pre-campaigns.

Comment thread analysis/covid/2_covid_data_quality.R Outdated
Comment on lines +174 to +175
filter(!flag_same_day_same_product) |> # exclude same-day multiple-record combinations
filter(!flag_same_day_same_product) |> # exclude same-day multiple-record combinations

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 174 is duplicated on line 175.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this removes all records that are flagged, but I think we want to keep one of the records and remove the others as duplicates?

@qiangao7 qiangao7 May 19, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for flagging this, @venexia ! I was considering deduplicating same-day records by keeping just one record first, and dropping mixed product cases for now. Since mixed cases are a small proportion and can create odd interval patterns, it might be cleaner to exclude them at this stage.

At the same time, we could generate a separate table to look at mixed product patterns and use that to define a more comprehensive cleaning rule. Then in a later run, we can apply a consistent approach to all same-day records and get a more accurate interval pattern.

Do you think this sounds reasonable, or do you have any other thoughts?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @venexia! I think this is already covered in the active patient definition in fn_covid_data_quality (lines 125–127):
active_on_vax_date = registered_on_vax_date & (is.na(death_date) | death_date >= vax_date)

So this should already ensure patients are both registered and alive on the vaccination date — but please correct me if I’ve missed anything.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @qiangao7. I am a bit confused by your reply because it doesn't match what I think is happening in the code. For example, consider patient 1033 in the dummy data, who has both a same product/same day flag in one campaign and a mixed product/same day in another campaign.

Image

Same product/same day: The code is removing all records of the same product on the same day. So, for patient 1033, both their sanofigsk_B1 records get removed (this seems contrary to your comment above?), which means ultimately they have nothing recorded for the spring 2025 campaign. I think we want to keep one record as you suggest, as it is relevant to the interval calculation and vax_product is not contentious.

Mixed product/same day Patient 1033 also happens to have mixed products on the same day during the autumn 2023 campaign, and both are kept and intervals calculated - seemingly just in the order that they were in the original dataset (this also seems contrary to your comment above?). For an interval calculation, I would probably replace this with a single record with the common date and mark the vax_product as 'conflicted'. We want to know something happened in autumn 2023, but we can't be sure what [We haven't really talked about having a conflicted option for some tables before but it could make sense here.]

Finally, from a code review point of view, the same line of code is repeated, so line 175 needs to be removed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @venexia. Thanks - I think this was based on an earlier version. I updated the logic locally after my reply yesterday but wanted to wait for your confirmation before committing, so things got a bit crossed. Sorry for the confusion!

@qiangao7 qiangao7 requested a review from venexia May 20, 2026 13:07

@venexia venexia left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @qiangao7 - looks great, thank you for the quick fix!

Comment thread analysis/covid/2_covid_data_quality.R Outdated
Comment on lines +174 to +175
filter(!flag_same_day_same_product) |> # exclude same-day multiple-record combinations
filter(!flag_same_day_same_product) |> # exclude same-day multiple-record combinations

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @qiangao7. I am a bit confused by your reply because it doesn't match what I think is happening in the code. For example, consider patient 1033 in the dummy data, who has both a same product/same day flag in one campaign and a mixed product/same day in another campaign.

Image

Same product/same day: The code is removing all records of the same product on the same day. So, for patient 1033, both their sanofigsk_B1 records get removed (this seems contrary to your comment above?), which means ultimately they have nothing recorded for the spring 2025 campaign. I think we want to keep one record as you suggest, as it is relevant to the interval calculation and vax_product is not contentious.

Mixed product/same day Patient 1033 also happens to have mixed products on the same day during the autumn 2023 campaign, and both are kept and intervals calculated - seemingly just in the order that they were in the original dataset (this also seems contrary to your comment above?). For an interval calculation, I would probably replace this with a single record with the common date and mark the vax_product as 'conflicted'. We want to know something happened in autumn 2023, but we can't be sure what [We haven't really talked about having a conflicted option for some tables before but it could make sense here.]

Finally, from a code review point of view, the same line of code is repeated, so line 175 needs to be removed.

@qiangao7 qiangao7 merged commit 805f53b into main May 20, 2026
1 check passed
@qiangao7 qiangao7 deleted the fix-covid-dq-updates branch May 20, 2026 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants