This adds the checks that need to happen on the user action context, following Algorithm 2 of Big Bird. Note that it follows the latest version, which moves the conversion check inside the for loop over epochs.
In Algorithm 2 of Big Bird, safety limit deductions occur if and only if a privacy budget deduction also occurs. I am therefore putting the safety limits into the deduct privacy budget function (renamed "deduct privacy and safety budgets").
|
User actions quota counts -- for the spec I think we should stick closely to the paper on the safety limit quotas themselves, but for the user action quota counts I think we could do a simplification. The paper partitions the quota counts for a single user action by impression and conversion quotas with conversion quotas further partitioned by epoch. I think it would be simpler to just have one single quota count per user action. If we think that is okay for now I can simplify this PR a bit. Let me know if you have any thoughts on this @apasel422 @mt @csharrison @andyleiserson. |
|
Notes from meeting:
Follow ups:
|
remove conversion site quota and remove the store of user action contexts, replacing with a global boolean flag attached to the window
create function to calculate deductions for impression sites.
the simpler version was more than just missing optimizations; it would have under-deducted in the case of a single epoch with multiple impression sites.
incorporate @apasel422 's feedback
|
@apasel422 thanks for the review! I think I incorporated all of your feedback.
|
@martinthomson I updated the user activation check to throw an exception instead of returning a boolean, if you want to take another look at that. I also replied on a couple of open comment threads. @apasel422 thanks for the second round of edits; I updated the PR with all of those except one that I want to look into more.
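A minimal sketch of the throw-instead-of-boolean check, assuming the per-window flag mentioned in the follow-ups. The names `Window`, `attribution_api_activated`, and `AttributionActivationError` are hypothetical and do not come from the spec text; the consumption of the gesture mirrors the draft as described in this thread.

```python
class AttributionActivationError(Exception):
    """Raised when the API is called without a prior user gesture (hypothetical name)."""


class Window:
    """Toy stand-in for a browsing window; not the HTML spec's Window."""

    def __init__(self) -> None:
        self.has_transient_activation = False   # set by a user gesture
        self.attribution_api_activated = False  # assumed per-window flag

    def check_attribution_activation(self) -> None:
        """Throw on failure rather than returning a boolean."""
        if self.attribution_api_activated:
            return  # an earlier gesture already enabled the API in this window
        if not self.has_transient_activation:
            raise AttributionActivationError("user activation required")
        # Consume the gesture and remember that the API is now enabled.
        self.has_transient_activation = False
        self.attribution_api_activated = True
```

With this shape, one gesture enables any number of later calls in the same window, and a call made before any gesture fails loudly.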
adding more prose to describe what we are doing with requiring user activation and some limitations with that.
| that could be maliciously triggered. | ||
| ### Attribution API Activation ### {#s-api-activation} |
There was a problem hiding this comment.
@martinthomson I added some more explanation to go with this section and links out to the HTML spec. Can you see if this captures what we want to say here?
martinthomson
left a comment
There was a problem hiding this comment.
I'm not following the whole business of impression quota computation. It might even be wrong. I think that this could be a lot simpler in that area.
| 1. Since calling the Attribution API consumes a user activation, the site would no longer have this | ||
| particular user activation to use for other APIs (e.g., opening popups). |
There was a problem hiding this comment.
Thankfully, this problem can be fixed by doing the hard work of doing our own activation tracking, which I think is going to be necessary.
api.bs
Outdated
| <p class=note>This approach allows a single user action to enable multiple | ||
| API invocations within the same session, while still requiring | ||
| an initial user gesture to activate the API. |
There was a problem hiding this comment.
| <p class=note>This approach allows a single user action to enable multiple | |
| API invocations within the same session, while still requiring | |
| an initial user gesture to activate the API. | |
| <p class=note>This approach allows a single user action | |
| to enable multiple API invocations within the same session, | |
| while only making the API available to one site | |
| per [=activation triggering input event|activation=]. |
There was a problem hiding this comment.
Where should we define the term "activation triggering input event"?
api.bs
Outdated
| 1. If the [=impression site quota store=] does not [=map/contain=] |impressionQuotaKey| | ||
| and |siteDeduction| is greater than the [=impression site quota per epoch=], return false. |
There was a problem hiding this comment.
Fix the indent.
Also, the above comments.
| in the [=privacy budget store=]. | ||
| 1. Let |epoch| be the [=epoch index=] component of |key|. | ||
| 1. Let |sensitivity| be |l1Norm| if |l1Norm| is non-null, 2 * |value| otherwise. |
There was a problem hiding this comment.
This bit duplicates the computation you have in the impression store bit. I think we can factor out a "compute privacy budget deduction" process for finding the number.
There was a problem hiding this comment.
don't quite follow this suggestion
There was a problem hiding this comment.
The overall algorithm appears to compute a deduction amount twice. That's the part that I think can be factored differently.
I think that - at least for now - we want to have a single number that is deducted from all active budgets.
There was a problem hiding this comment.
yeah, we sort of do compute the deduction twice overall, once for the individual sensitivity in the single epoch case and then again for a global sensitivity in the case of multiple epochs.
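A rough sketch of the factoring discussed in this thread, assuming a single deduction amount that is applied to every active budget or to none of them. The sensitivity rule is the one from the diff line above; `deduct_from_all`, the budget names, and the integer units are hypothetical.

```python
from typing import Dict, Optional


def compute_sensitivity(value: int, l1_norm: Optional[int]) -> int:
    """|l1Norm| if it is non-null, 2 * |value| otherwise."""
    return l1_norm if l1_norm is not None else 2 * value


def deduct_from_all(budgets: Dict[str, int], deduction: int) -> bool:
    """Deduct one number from every active budget, or from none of them."""
    if any(remaining < deduction for remaining in budgets.values()):
        return False  # some budget cannot cover it, so nothing is touched
    for name in budgets:
        budgets[name] -= deduction
    return True


# Hypothetical budgets in arbitrary integer units.
budgets = {"per-site": 100, "global": 15}
deduction = compute_sensitivity(value=5, l1_norm=None)  # 2 * 5 = 10
first = deduct_from_all(budgets, deduction)   # succeeds
second = deduct_from_all(budgets, deduction)  # global has only 5 left
```

Checking every budget before mutating any of them keeps the "if and only if" property: a safety limit is only charged when the privacy budget is charged too.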
address mt PR feedback and remove impression site map though we might want it back in the future to support w3c#377
martinthomson
left a comment
There was a problem hiding this comment.
Some more (major) problems. Sorry for not noticing these earlier.
| * The [=global privacy budget store=] records the state | ||
| of the per-[=epoch=] global [=privacy budget=] | ||
| that applies across all [=sites=]. |
There was a problem hiding this comment.
I just realized that there is another bug here.
The global privacy store is being indexed by an epoch index that only has relevance on a per-site basis.
For sites, we don't really try to hide when epochs start, though we don't publish a value either. If we have a single global store, we have to have a single value for when the store starts.
The obvious thing to do is pick a starting point when the global store is first used, but that leads to an interesting question: if this value might leak, then how do we prevent that from being used as a supercookie? After all, when clearing state, we want to retain the global budget state so that clearing state doesn't make privacy worse.
Another option is to align the cycle to a fixed point in time that is the same for everyone. That might work. We don't expect the limit to ever be hit, except for the very few people who have unusually highly active conversion use across many sites. The effect on skewing results might then be diffuse enough that sites using the API won't need to consider it.
So that would be my suggestion. When accessing this, use a value derived from a fixed reference point (the unix epoch is available) rather than the site-specific value.
That leads me to the next problem...
There was a problem hiding this comment.
Yeah, this is a good point. We've generally considered site-independent epoch start times as an option for the epoch start time on the per-site budgets, but that doesn't let you align them all with the global epoch.
Fixing the global epoch for everyone and keeping per-site epoch start times independent seems reasonable. It probably throws some issues in the theory proofs, but at least it's implementable.
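To make the fixed reference point concrete, here is a small sketch that derives the global epoch index from Unix time. The one-week epoch length and the function name are assumptions for illustration only; the thread does not fix either.

```python
EPOCH_LENGTH_SECONDS = 7 * 24 * 60 * 60  # assumed epoch length: one week


def global_epoch_index(now_seconds: int) -> int:
    """Index of the global epoch containing `now_seconds` (seconds since the Unix epoch).

    Every browser computes the same index for the same instant, so the
    boundary itself carries no per-browser information.
    """
    return now_seconds // EPOCH_LENGTH_SECONDS
```

Because the boundary is the same for everyone, retaining the global budget state across a state clear does not add a browser-specific start time that could leak.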
| * The [=impression site quota store=] records the state | ||
| of per-[=impression site=] and per-[=epoch=] quota [=privacy budgets=]. |
There was a problem hiding this comment.
The impression site quota store is indexed incorrectly as well. It uses the per-site "privacy budget key" and epoch.
For this, I think that we need to consider extensions to the epoch start store for impressions.
The alternative would be to reuse the epoch start store, so that impression site quotas refresh at exactly the same time as the per-site budget for that site. I can't see why that would be wrong, at least offhand. However, the quota is a cross-site store, which means it might require safeguards we haven't really considered for the per-site store. We essentially need to prevent the value from leaking, which isn't something we really try to do for the per-site budget (because it's just a random number that we generate for each site).
Either way, implementing this is tricky, because the key to this store will end up covering different periods. What is a single epoch query for one site might cover two epochs in the quota store.
There was a problem hiding this comment.
Yes, you're right the impression site quotas should be indexed by the epoch in which the impression(s) were stored. That should be doable to adjust; we need to get the epochs for every impression for a site.
> We essentially need to prevent the value from leaking
Do you mean we don't want to leak the start time of the per-impression site quota epoch start time?
There was a problem hiding this comment.
That's right. The time that the per-impression site quota epoch starts will be the same across websites, which means that it will be a unique identifier for a browser.
There was a problem hiding this comment.
Thinking more about this, there are two things to work out:
- When to start each epoch. As a safeguard, we might assume that it isn't operating often, so the actual alignment of epochs won't need to be evenly distributed in the same way that we distribute the per-site budget. In that spirit, we might be able to do the same as what is done for the global budget: align it to a fixed point. Like the global budget, that can't be a per-browser fixed value, as that risks leaking a per-browser identifier, but we might be able to fix a value in the spec.
- What to do about the "single epoch" queries that don't end up hitting a single epoch against the per-impression site quota. I'm less sure about this part. It's tempting to suggest that we ignore the inter-epoch interactions for the quota. That would mean that if the per-site budget believes something to be single-quota, then that would affect impression site quotas less than if the per-site budget had to span two epochs.
There was a problem hiding this comment.
> The time that the per-impression site quota epoch starts will be the same across websites, which means that it will be a unique identifier for a browser.

But we would never expose this directly to any websites; it's just private internal state for the browser. They could try to learn about it indirectly through DP queries.

> In that spirit, we might be able to do the same as what is done for the global budget: align it to a fixed point.

Yes, setting the quota start times to be the same as the global one makes sense: the quotas exist to disperse the global budget, so it makes sense to align them with it.
There was a problem hiding this comment.
In what I said earlier:

> Yes, you're right the impression site quotas should be indexed by the epoch in which the impression(s) were stored. That should be doable to adjust; we need to get the epochs for every impression for a site.

I think we are doing this already, because in "do attribution and fill a histogram" we loop over all epochs in the attribution window: "For each |epoch| from |startEpoch| to |currentEpoch|, inclusive:"
There was a problem hiding this comment.
I think what we are doing right now is essentially to assume that all epochs are on the same cadence: per-site, global, quotas. That's the only way this for loop makes sense from |startEpoch| to |currentEpoch|.
If we change to have epoch cadence as:
- global, impression quotas on a fixed cadence
- per-site on a site independent cadence starting from a first visit
then I think we will need some way to map the attribution window lookback into the |epochs| for each per-site budget that would be considered.
Is it possible to just keep everything on a fixed cadence for now and punt on making per-site epochs independent?
There was a problem hiding this comment.
A fixed cadence for quotas and global is easiest, but you'll have to restructure the code a little. It will need the time when you are accessing multiple stores.
The basic rule is: Every time you index one of the epoch-based stores, you will need to translate |now| into an epoch specific to that store.
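A sketch of that rule, assuming each store carries its own epoch start and length. `EpochClock`, `overlapping_epochs`, and the day-based units are hypothetical. It also shows how one per-site epoch can straddle two epochs of a fixed-cadence store.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EpochClock:
    start: int   # time at which this store's epoch 0 begins
    length: int  # epoch duration, in the same unit as `start`

    def index(self, now: int) -> int:
        """Translate |now| into an epoch index specific to this store."""
        return (now - self.start) // self.length

    def span(self, index: int) -> Tuple[int, int]:
        """Half-open time interval [begin, end) covered by an epoch."""
        begin = self.start + index * self.length
        return begin, begin + self.length


def overlapping_epochs(source: EpochClock, index: int, target: EpochClock) -> range:
    """Epoch indices of `target` that overlap one epoch of `source`."""
    begin, end = source.span(index)
    return range(target.index(begin), target.index(end - 1) + 1)


# Units are days for brevity; the values are made up.
global_clock = EpochClock(start=0, length=7)  # fixed cadence
site_clock = EpochClock(start=3, length=7)    # per-site start
now = 12
touched = list(overlapping_epochs(site_clock, site_clock.index(now), global_clock))
# touched == [1, 2]: one per-site epoch covers two global epochs
```

Passing `now` and translating it per store, rather than passing a precomputed epoch index, is what lets the per-site, global, and quota stores run on different cadences.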
Creating a PR to add safety limits to the Attribution spec. This is based primarily on the Big Bird algorithm from Section 4 of this paper: https://arxiv.org/pdf/2506.05290. Algorithm 2 is the main algorithm; it encompasses both budget deduction and safety limit deduction.
This PR is still WIP but ready for some initial review.
Intended to address this open issue #237