Merged
55 changes: 55 additions & 0 deletions api.bs
@@ -2338,6 +2338,45 @@ for a number of reasons:
Without allocating [=privacy budget=] for new data,
sites could exhaust their budget forever.

### Formal Analysis of Privacy Properties and Their Limitations ### {#formal-analysis}
The paper [[PPA-DP-2]] provides a formal analysis of the mathematical privacy guarantees
afforded by *per-site budgets* and by *safety limits*. Per-site
budgets include [=site=] in the [=privacy unit=], whereas safety limits exclude it,
thereby enforcing a global individual DP guarantee.

The analysis shows that *per-site individual DP guarantees* hold under a restricted system
model that makes two assumptions, which may not always be satisfied in practice:

1. *No cross-site adaptivity in data generation.* A site’s queryable data stream (impressions
and conversions) must be generated independently of past DP query results from other sites.
1. *No leakage through cross-site shared limits.* Queries from one site must not affect which
reports are emitted to others.

Assumption 1 is necessary because the system involves multiple sites that could interact
Member

Suggested change
- Assumption 1 is necessary because the system involves multiple sites that could interact
+ The assumption that sites cannot adapt their queries is necessary
+ because the system involves multiple sites that could interact

with the same user over time and change the ads they show to the user, or impact the
conversions the user has, based on each other’s DP measurements. For example, if one advertiser
learns, from DP measurements, to make an ad more effective, a user may convert on their site
Member

Suggested change
- learns, from DP measurements, to make an ad more effective, a user may convert on their site
+ learns generally-applicable information that helps them make their ads more effective,
+ that will make it more likely that their ads are attributed for conversions,
+ as opposed to a competitor.

Here, "DP measurements" refers to measureConversion() as well. Watch out for that.

Contributor Author

This is a case where we are talking about aggregate results from many devices that you get back from the aggregation service.

I feel we need a term in the spec for the final aggregate DP results that go back to the advertiser. "Query" may be too general, but something like "DP attribution results" or "DP measurements" should be clear, and maybe we need to define it somewhere in the intro.

Contributor Author

Took a stab at defining "attribution result" in the intro and tried to use it wherever we mean the final outputs learned by the advertiser.

rather than a competitor’s. In this case, the first site’s DP outputs -- counted only against
its own per-site budget -- alter the data (or absence of data) visible to the competitor, yet
this impact is not reflected in the competitor’s per-site budget. When Assumption 1 is violated,
the analysis shows that per-site guarantees cannot be achieved.
Member

This is part of the assumption, but I think that the main challenge here is different. Sites might gain an understanding that a particular visitor to each site in a set is the same person (due to federated login, same email address, or anything including navigation tracking which we can't stop). AND THEN they decide to pool their per-site budgets to use the API to extract more information about that person. In that case, we have no defense from the per-site budget. Sites are only limited by their ability to link activity across sites (which is too easy, as noted) and then the global budget.

So we should acknowledge that limitation as well as the more theoretical one here.

Contributor Author

@martinthomson I added a paragraph at the end of this section to capture this additional challenge an adversary is faced with. Let me know if you think that still needs any adjusting.

In addition to facing the safety limits discussed above, an attacker using multiple colluding sites to gain
more DP budget about users also faces the practical limitation of having to link a user across sites.
This limitation does not itself provide a theoretical DP benefit, but it does impose a significant
challenge on the attacker when the user agent has made it difficult to link users across sites.
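
The interaction between pooled per-site budgets and the global safety limit can be sketched numerically. The following is a minimal illustration, assuming basic sequential composition of pure ε-DP losses and a single global limit enforced by the user agent. The function name and its parameters are hypothetical; they do not appear in the spec or in [[PPA-DP-2]].

```python
def pooled_privacy_loss_bound(num_colluding_sites: int,
                              per_site_epsilon: float,
                              global_epsilon: float) -> float:
    """Worst-case privacy loss about one user under this toy model.

    Assumption: colluding sites that can link the user each spend their
    full per-site budget, and losses add up (basic composition), but the
    total can never exceed the global safety limit.
    """
    pooled = num_colluding_sites * per_site_epsilon
    return min(pooled, global_epsilon)


# Three linked sites with a per-site budget of 1.0 pool up to 3.0.
few_sites = pooled_privacy_loss_bound(3, 1.0, 10.0)

# Twenty linked sites would pool 20.0, but the global limit caps the loss.
many_sites = pooled_privacy_loss_bound(20, 1.0, 10.0)
```

Under these assumptions, the per-site budget only bounds what a single unlinked site can learn; once sites can link a user, the binding constraint becomes the global limit.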


Assumption 2 is necessary when we have shared limits that span multiple sites. An example of
Member

Suggested change
- Assumption 2 is necessary when we have shared limits that span multiple sites. An example of
+ An assumption that sites are unable to coordinate their use of the API is necessary
+ when we have shared limits that span multiple sites. An example of

Contributor Author

What I'm trying to say here is just that if you want to have shared limits you have to make Assumption 2 for the per-site budgets to hold.

such shared limits are the global safety limits that aim to provide a global DP guarantee.
If queries from some sites cause a shared limit to be reached, reports to other sites may be
Member

Again, queries... If you want to use that word, it might be necessary to explain it up front.

Contributor Author

Here we can use measureConversion(), since this is about what happens on a single device.

filtered, creating dependencies across separate per-site privacy units and affecting the validity
of the per-site guarantees. Thus, care must be taken when introducing any new shared limit, such
as cross-site rate limiters on privacy loss. If only Assumption 2 is violated, it is unknown whether
per-site guarantees can still be preserved, for example via special designs of the shared limits.
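
The cross-site dependency described above can be made concrete with a toy model. This is a minimal sketch, not the spec's budgeting algorithm: the `Device` class, its fields, and the `"real"`/`"null"` return values are hypothetical names chosen for illustration, and the deduction rule is deliberately simplified.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Toy budget state for one user agent (hypothetical, simplified)."""
    per_site_budget: float
    global_limit: float
    spent_per_site: dict = field(default_factory=dict)
    spent_global: float = 0.0

    def measure_conversion(self, site: str, epsilon: float) -> str:
        """Return "real" if the report may use real data, else "null"."""
        site_spent = self.spent_per_site.get(site, 0.0)
        # Per-site check: depends only on this site's own past requests.
        if site_spent + epsilon > self.per_site_budget:
            return "null"
        # Shared check: depends on what every other site has requested.
        if self.spent_global + epsilon > self.global_limit:
            return "null"
        self.spent_per_site[site] = site_spent + epsilon
        self.spent_global += epsilon
        return "real"


# Scenario 1: two other sites exhaust the shared limit first.
busy = Device(per_site_budget=1.0, global_limit=2.0)
busy.measure_conversion("a.example", 1.0)
busy.measure_conversion("b.example", 1.0)
filtered = busy.measure_conversion("c.example", 0.5)

# Scenario 2: same request from c.example, but no other site has queried.
quiet = Device(per_site_budget=1.0, global_limit=2.0)
unfiltered = quiet.measure_conversion("c.example", 0.5)
```

In this sketch `c.example` has spent none of its own budget in either scenario, yet the outcome of its request differs depending on what `a.example` and `b.example` did. That difference is the kind of leakage through a shared limit that Assumption 2 rules out.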

These results suggest that per-site protections should be regarded as theoretically grounded approximations
of an ideal per-site individual DP guarantee that can be established only under certain assumptions.
The extent to which privacy protection from per-site budgets may be impacted in practice remains unknown.

By contrast, the analysis shows that *safety limits* -- which operate at global level,
excluding [=site=] from the [=privacy unit=] -- can be implemented to deliver *sound global individual
DP guarantees* regardless of whether either assumption is satisfied.
Member

Consider defining this as [=global safety limits=], per above.

Suggested change
- By contrast, the analysis shows that *safety limits* -- which operate at global level,
- excluding [=site=] from the [=privacy unit=] -- can be implemented to deliver *sound global individual
- DP guarantees* regardless of whether either assumption is satisfied.
+ The analysis shows that [=global safety limits=] --
+ which do not have a [=site=]-specific [=privacy unit=] --
+ deliver sound individual DP guarantees
+ without relying on either of these assumptions.

Importantly, after introducing the analyses and some context, this is the first thing I would say. It's a simple statement that is easy to understand.

Contributor Author

sure, we can move this up top if we want to start with what holds without any assumptions.



### Browser Instances ### {#dp-instance}

@@ -3139,6 +3178,22 @@ spec:structured header; type:dfn; urlPrefix: https://httpwg.org/specs/rfc9651;
"title": "Cookie Monster: Efficient On-device Budgeting for Differentially-Private Ad-Measurement Systems",
"publisher": "SOSP'24"
},
"ppa-dp-2": {
"authors": [
"Pierre Tholoniat",
"Alison Caulfield",
"Giorgio Cavicchioli",
"Mark Chen",
"Nikos Goutzoulias",
"Benjamin Case",
"Asaf Cidon",
"Roxana Geambasu",
"Mathias Lécuyer",
"Martin Thomson"
],
"href": "https://arxiv.org/abs/2506.05290",
"title": "Big Bird: Privacy Budget Management for W3C's Privacy-Preserving Attribution API",
},
"prio": {
"authors": [
"Henry Corrigan-Gibbs",