[RFC] Safety limits #309

Open
bmcase wants to merge 37 commits into w3c:main from bmcase:safety_limits
@bmcase bmcase commented Nov 12, 2025

Creating a PR to add safety limits to the Attribution spec. This is based primarily on the BigBird algorithm from Section 4 of this paper: https://arxiv.org/pdf/2506.05290. Algorithm 2 is the main algorithm, which encompasses both budget deduction and safety limit deduction.

This PR is still WIP but ready for some initial review.

Intended to address this open issue #237



This adds the checks that need to happen on user action context, following Alg 2 of BigBird; note that it follows the latest version, which has the conversion check moved inside the for loop over epochs.
In Algo 2 in Big Bird, safety limit deductions occur if and only if the privacy budget deduction also happens.
Thus going to put the safety limits into the deduct privacy budget function (renamed as deduct privacy and safety budgets).
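
The "if and only if" coupling described in these commits can be pictured as a two-phase check-then-deduct. The following is only a minimal sketch under stated assumptions: `BudgetStore`, its methods, and the `(store, key, amount)` request shape are hypothetical names invented for illustration and do not come from the spec or the paper.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetStore:
    """Hypothetical store mapping a key to its remaining budget."""
    initial: float
    remaining: dict = field(default_factory=dict)

    def available(self, key) -> float:
        # A key that has never been touched still has its full budget.
        return self.remaining.get(key, self.initial)

    def deduct(self, key, amount: float) -> None:
        self.remaining[key] = self.available(key) - amount


def deduct_privacy_and_safety_budgets(requests) -> bool:
    """Deduct from every store, or from none of them.

    `requests` is a list of (store, key, amount) tuples. This sketch
    assumes each (store, key) pair appears at most once in the list.
    """
    # Phase 1: check every budget without modifying anything.
    if any(store.available(key) < amount for store, key, amount in requests):
        return False
    # Phase 2: every check passed, so deduct everywhere.
    for store, key, amount in requests:
        store.deduct(key, amount)
    return True
```

Because nothing is written until every check has passed, a failed safety limit never leaves a partial privacy budget deduction behind, and vice versa.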
bmcase changed the title from "[WIP] Safety limits" to "[RFC] Safety limits" on Dec 5, 2025
bmcase commented Dec 5, 2025

User actions quota counts -- for the spec I think we should stick closely to the paper on the safety limit quotas themselves, but for the user action quota counts I think we could do a simplification.

The paper partitions the quota counts for a single user action by impression and conversion quotas with conversion quotas further partitioned by epoch.

I think it would be simpler to just have one single quota count per user action. If we think that is okay for now I can simplify this PR a bit.

Let me know if you have any thoughts on this @apasel422 @mt @csharrison @andyleiserson.

bmcase commented Dec 11, 2025

Notes from meeting:

  1. drop user action stores by many dimensions
  2. drop conversion site quota

Follow ups:

  1. set minimum recommended multipliers
  2. clear history
  3. locking for atomic transaction

bmcase added 10 commits February 6, 2026 10:06
remove conversion site quota and remove the store of user action contexts, replacing with a global boolean flag attached to the window
create function to calculate deductions for impression sites.
the simpler version was more than lacking optimizations; it would have under-deducted in the single-epoch, multiple-impression-site case.
incorporate @apasel422 's feedback
bmcase commented Feb 10, 2026

@apasel422 thanks for the review! I think I incorporated all of your feedback.

bmcase commented Feb 12, 2026

@martinthomson I updated the user activation check to throw an exception instead of returning a boolean, if you want to take another look at that. I also replied on a couple of open comment threads.

@apasel422 thanks for the second round of edits; I updated the PR with all of those except one I want to look into more.
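
For readers following along, here is a rough sketch of an activation check that throws rather than returning a boolean, combined with the per-window flag mentioned in the earlier commit message. Everything here is an assumption made for illustration: `Window`, `ActivationRequiredError`, and the attribute names are hypothetical and are not the spec's algorithm or a real browser API.

```python
class ActivationRequiredError(Exception):
    """Hypothetical error raised when the API has not been activated."""


class Window:
    """Toy stand-in for a browsing context; not a real browser API."""

    def __init__(self):
        self.has_transient_activation = False   # set by a user gesture
        self.attribution_api_activated = False  # the per-window flag

    def user_gesture(self):
        self.has_transient_activation = True


def check_attribution_api_activation(window: Window) -> None:
    """Raise unless the window is activated or can consume a gesture."""
    if window.attribution_api_activated:
        return  # an earlier gesture already enabled the API here
    if not window.has_transient_activation:
        raise ActivationRequiredError("user activation required")
    # Consume the activation, so it is no longer available to other APIs.
    window.has_transient_activation = False
    window.attribution_api_activated = True
```

The first successful call consumes the gesture and sets the flag; later calls in the same window pass without needing a new gesture.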

adding more prose to describe what we are doing with requiring user activation and some limitations with that.
that could be maliciously triggered.


### Attribution API Activation ### {#s-api-activation}
Contributor Author

@martinthomson I added some more explanation to go with this section and links out to the HTML spec. Can you see if this captures what we want to say here?

Member

@martinthomson martinthomson left a comment

I'm not following the whole business of impression quota computation. It might even be wrong. I think that this could be a lot simpler in that area.

Comment on lines +1360 to +1361
1. Since calling the Attribution API consumes a user activation, the site would no longer have this
particular user activation to use for other APIs (e.g., opening popups).
Member

Thankfully, this problem can be fixed by doing the hard work of doing our own activation tracking, which I think is going to be necessary.

api.bs Outdated
Comment on lines 1379 to 1381
<p class=note>This approach allows a single user action to enable multiple
API invocations within the same session, while still requiring
an initial user gesture to activate the API.
Member

Suggested change
- <p class=note>This approach allows a single user action to enable multiple
- API invocations within the same session, while still requiring
- an initial user gesture to activate the API.
+ <p class=note>This approach allows a single user action
+ to enable multiple API invocations within the same session,
+ while only making the API available to one site
+ per [=activation triggering input event|activation=].

Contributor Author

Where should we define the term "activation triggering input event"?

api.bs Outdated
Comment on lines 1255 to 1256
1. If the [=impression site quota store=] does not [=map/contain=] |impressionQuotaKey|
and |siteDeduction| is greater than the [=impression site quota per epoch=], return false.
Member

Fix the indent.

Also, the above comments.

in the [=privacy budget store=].
1. Let |epoch| be the [=epoch index=] component of |key|.

1. Let |sensitivity| be |l1Norm| if |l1Norm| is non-null, 2 * |value| otherwise.
Member

This bit duplicates the computation you have in the impression store bit. I think we can factor out a "compute privacy budget deduction" process for finding the number.

Contributor Author

don't quite follow this suggestion

Member

The overall algorithm appears to compute a deduction amount twice. That's the part that I think can be factored differently.

I think that - at least for now - we want to have a single number that is deducted from all active budgets.

Contributor Author

yeah, we sort of do compute the deduction twice overall, once for the individual sensitivity in the single epoch case and then again for a global sensitivity in the case of multiple epochs.
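
One way to picture the factoring suggested in this thread is a single helper that returns the deduction amount, which every store then reuses. This is a sketch, not the spec text: the sensitivity rule is taken from the quoted step above, while the function names and the `epsilon * sensitivity / max_value` scaling are assumptions added for illustration.

```python
from typing import Optional


def compute_sensitivity(value: float, l1_norm: Optional[float]) -> float:
    """Return |l1Norm| if it is non-null, 2 * |value| otherwise."""
    # Compare against None explicitly so that an l1_norm of 0 is respected.
    return l1_norm if l1_norm is not None else 2 * value


def compute_privacy_budget_deduction(
    epsilon: float, value: float, max_value: float, l1_norm: Optional[float]
) -> float:
    """Compute one deduction to apply to all active budgets.

    ASSUMPTION: scaling the sensitivity by epsilon / max_value is
    illustrative only; the thread does not state this formula.
    """
    return epsilon * compute_sensitivity(value, l1_norm) / max_value
```

With this shape, the single-epoch case passes a non-null `l1_norm` (individual sensitivity) and the multi-epoch case passes `None` (global sensitivity), but the deduction is still computed in one place.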

address mt PR feedback and remove impression site map though we might want it back in the future to support w3c#377
@martinthomson martinthomson linked an issue Feb 19, 2026 that may be closed by this pull request
Member

@martinthomson martinthomson left a comment

Some more (major) problems. Sorry for not noticing these earlier.

Comment on lines +1092 to +1094
* The [=global privacy budget store=] records the state
of the per-[=epoch=] global [=privacy budget=]
that applies across all [=sites=].
Member

I just realized that there is another bug here.

The global privacy store is being indexed by an epoch index that only has relevance on a per-site basis.

For sites, we don't really try to hide when epochs start, though we don't publish a value either. If we have a single global store, we have to have a single value for when the store starts.

The obvious thing to do is pick a starting point when the global store is first used, but that leads to an interesting question: if this value might leak, then how do we prevent that from being used as a supercookie? After all, when clearing state, we want to retain the global budget state so that clearing state doesn't make privacy worse.

Another option is to align the cycle to a fixed point in time that is the same for everyone. That might work. We don't expect the limit to ever be hit, except for the very few people who have unusually highly active conversion use across many sites. The effect on skewing results might then be diffuse enough that sites using the API won't need to consider it.

So that would be my suggestion. When accessing this, use a value derived from a fixed reference point (the unix epoch is available) rather than the site-specific value.
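
As a minimal sketch of that suggestion, the global store's epoch index could be derived directly from seconds since the Unix epoch. The one-week epoch length and the function name are assumptions for illustration; the thread does not fix either.

```python
EPOCH_LENGTH_SECONDS = 7 * 24 * 60 * 60  # ASSUMPTION: one-week epochs


def global_epoch_index(now_seconds: float) -> int:
    """Epoch index counted from the Unix epoch.

    The result depends only on the clock, not on any per-site or
    per-browser state, so it is the same for every browser.
    """
    return int(now_seconds // EPOCH_LENGTH_SECONDS)
```

A caller would pass something like `time.time()`; because no stored start value is involved, there is no per-browser offset to leak or to preserve when clearing state.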

That leads me to the next problem...

Contributor Author

yeah this is a good point. We've generally considered site-independent epoch start times as an option for epoch start time on the per-site budgets, but that doesn't let you align them all with the global epoch.

Fixing the global epoch for everyone and keeping per-site epoch start times independent seems reasonable. It probably throws some issues in the theory proofs, but at least it's implementable.

Contributor Author

continuing the discussion on #386

Comment on lines +1096 to +1097
* The [=impression site quota store=] records the state
of per-[=impression site=] and per-[=epoch=] quota [=privacy budgets=].
Member

The impression site quota store is indexed incorrectly as well. It uses the per-site "privacy budget key" and epoch.

For this, I think that we need to consider extensions to the epoch start store for impressions.

The alternative would be to reuse the epoch start store, so that impression site quotas refresh at exactly the same time as the per-site budget for that site. I can't see why that would be wrong, at least offhand. However, the quota is a cross-site store, which means it might require safeguards we haven't really considered for the per-site store. We essentially need to prevent the value from leaking, which isn't something we really try to do for the per-site budget (because it's just a random number that we generate for each site).

Either way, implementing this is tricky, because the key to this store will end up covering different periods. What is a single epoch query for one site might cover two epochs in the quota store.
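
The misalignment can be made concrete with a small sketch. It assumes, purely for illustration, that both stores use the same epoch length, that times are integer seconds, and that the quota store counts epochs from a fixed origin; the function and parameter names are hypothetical.

```python
def quota_epochs_covered(
    site_epoch_start: int,
    site_epoch_index: int,
    epoch_length: int,
    quota_epoch_origin: int = 0,
) -> range:
    """Quota-store epochs overlapped by a single per-site epoch."""
    # First and last instant of the per-site epoch.
    start = site_epoch_start + site_epoch_index * epoch_length
    end = start + epoch_length - 1
    # Translate both instants into the quota store's own epoch numbering.
    first = (start - quota_epoch_origin) // epoch_length
    last = (end - quota_epoch_origin) // epoch_length
    return range(first, last + 1)
```

With an epoch length of 100 and a per-site start of 30, per-site epoch 0 covers times 30 to 129 and therefore touches quota epochs 0 and 1; only a per-site start that is a multiple of the epoch length maps to a single quota epoch.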

Contributor Author

@bmcase bmcase Feb 20, 2026

Yes, you're right: the impression site quotas should be indexed by the epoch in which the impression(s) were stored. That should be doable to adjust; we need to get the epochs for every impression for a site.

> We essentially need to prevent the value from leaking

Do you mean we don't want to leak the start time of the per-impression-site quota epoch?

Member

That's right. The time that the per-impression site quota epoch starts will be the same across websites, which means that it will be a unique identifier for a browser.

Member

Thinking more about this, there are two things to work out:

  1. When to start each epoch. As a safeguard, we might assume that it isn't operating often, so the actual alignment of epochs won't need to be evenly distributed in the same way that we distribute the per-site budget. In that spirit, we might be able to do the same as what is done for the global budget: align it to a fixed point. Like the global budget, that can't be a per-browser fixed value, as that risks leaking a per-browser identifier, but we might be able to fix a value in the spec.
  2. What to do about the "single epoch" queries that don't end up hitting a single epoch against the per-impression site quota. I'm less sure about this part. It's tempting to suggest that we ignore the inter-epoch interactions for the quota. That would mean that if the per-site budget believes something to be single-quota, then that would affect impression site quotas less than if the per-site budget had to span two epochs.

Contributor Author

> The time that the per-impression site quota epoch starts will be the same across websites, which means that it will be a unique identifier for a browser.

But we would never expose this directly to any websites; it's just private internal state for the browser. They could try to learn about it indirectly through DP queries.

> In that spirit, we might be able to do the same as what is done for the global budget: align it to a fixed point.

Yes, setting the quota start times to be the same as the global one makes sense: they are quotas for dispersing the global budget, so it makes sense to have them aligned with it.

Contributor Author

Regarding what I said earlier:

> Yes, you're right: the impression site quotas should be indexed by the epoch in which the impression(s) were stored. That should be doable to adjust; we need to get the epochs for every impression for a site.

I think we are doing this already, because in "do attribution and fill a histogram" we loop over all epochs in the attribution window: "For each |epoch| from |startEpoch| to |currentEpoch|, inclusive".

Contributor Author

I think what we are doing right now is essentially to assume that all epochs are on the same cadence: per-site, global, quotas. That's the only way this for loop makes sense from |startEpoch| to |currentEpoch|.

If we change to have epoch cadence as:

  1. global, impression quotas on a fixed cadence
  2. per-site on a site independent cadence starting from a first visit

then I think we will need some way to map the attribution window lookback into the |epochs| for each per-site budget that would be considered.

Is it possible to just keep everything on a fixed cadence for now and punt on having per-site epochs being independent?

Member

A fixed cadence for quotas and global is easiest, but you'll have to restructure the code a little. It will need the time when you are accessing multiple stores.

The basic rule is: Every time you index one of the epoch-based stores, you will need to translate |now| into an epoch specific to that store.
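
A hedged sketch of that rule: give each epoch-based store its own clock and derive every key from the same |now|. The `EpochClock` type, the key shapes, and the store names below are assumptions for illustration, not the spec's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpochClock:
    """Hypothetical per-store epoch numbering."""
    origin: int  # time at which this store's epoch 0 starts
    length: int  # epoch duration, in the same units as `origin`

    def epoch_at(self, now: int) -> int:
        return (now - self.origin) // self.length


def store_keys(now: int, site: str, site_clock: EpochClock,
               fixed_clock: EpochClock) -> dict:
    """Translate one timestamp into a key for each epoch-based store."""
    return {
        # Per-site budget: the epoch start is specific to the site.
        "per_site": (site, site_clock.epoch_at(now)),
        # Global budget and impression quotas: shared fixed cadence.
        "global": fixed_clock.epoch_at(now),
        "impression_quota": (site, fixed_clock.epoch_at(now)),
    }
```

For example, with a fixed clock starting at 0, a site clock starting at 30, and an epoch length of 100, the time 210 falls in per-site epoch 1 but in global and quota epoch 2, which is why the algorithm has to carry the time rather than a single shared epoch index.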

Development

Successfully merging this pull request may close these issues.

Overly conservative check and deduct for budget
Add global privacy budget and per-impression-site quotas

4 participants