Talk:Differential privacy

	This article is within the scope of WikiProject Cryptography, a collaborative effort to improve the coverage of Cryptography on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
	This article has not yet received a rating on the importance scale.
	This article is supported by WikiProject Computer science.

Mass surveillance

	Differential privacy is within the scope of WikiProject Mass surveillance, which aims to improve Wikipedia's coverage of mass surveillance and mass surveillance-related topics. If you would like to participate, visit the project page, or contribute to the discussion.
	This article has not yet received a rating on the project's importance scale.

Content could be extracted

Content may have been extracted from another source, judging by the formatting and the unusually large initial edit. Utopiah (talk) 19:04, 16 January 2010 (UTC)

@ 89.189.87.253 (talk) 21:07, 5 December 2023 (UTC)

Style Guide

I doubt that the style guide should be part of an encyclopedic article on differential privacy, but it still might be useful to follow the suggested capitalization rules for article writers. If I hear no objections, I'm planning to remove the style guide from the article and add it here. 141.201.13.113 (talk) 12:39, 26 November 2018 (UTC)

"coins of the algorithm"

What does "coins of the algorithm" mean? Is this correct usage? 129.6.223.113 (talk) 22:36, 27 February 2015 (UTC)

It is correct usage within the cryptography community, but I agree that it is super-confusing. I'll be removing it over the next few weeks as I continue my total rewrite of this section. Simsong (talk) 02:30, 18 September 2018 (UTC)

Please finish example

Please finish the example and show how to make the diabetes database differentially private. — Preceding unsigned comment added by 129.6.223.113 (talk) 22:40, 27 February 2015 (UTC)

I'm probably going to replace this with a different example, one that is less loaded and easier to understand. Simsong (talk) 02:31, 18 September 2018 (UTC)

lay people cannot read the formula

the example should be understandable without the mathematical formula. — Preceding unsigned comment added by 87.78.208.167 (talk) 22:02, 13 August 2015 (UTC)

Agreed. After reading the article I still have no idea what differential privacy is or how it is going to counter the given de-anonymization efforts. 84.245.149.53 (talk) 13:39, 31 August 2016 (UTC)

In particular, I found the equation (1/4)(1-p) + (3/4)p = (1/4) + p/2 odd. The (1/4) + p/2 is reasonably obvious just from the description of the coin flips, but the longer expression on the left side isn't obvious at all. Worse, the equation people actually want is not "given p, what is the result?" but rather "given the result, what is p?"

Perhaps this is a better formulation: if p is the true proportion of people with A, we can expect a result of 1/4 just from people who get tails on the first coin flip and then tails, plus p/2 from people who get heads on the first coin flip and answer truthfully. Reversing the equation, given an actual result R, the best estimate of p is (R - 1/4) * 2.

2001:4898:80E8:C:0:0:0:563 (talk) 18:44, 2 May 2017 (UTC)
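The scheme and the estimator discussed in this thread can be checked with a short simulation. This is a sketch, assuming the usual two-coin convention (first flip heads: answer honestly; first flip tails: answer "yes" if the second flip is heads); which face of the second coin maps to "yes" does not change the probabilities. The function names and the true proportion 0.3 are illustrative, not taken from the article.

```python
import random

def randomized_response(has_a: bool, rng: random.Random) -> bool:
    """Two-coin randomized response (assumed convention, see lead-in)."""
    if rng.random() < 0.5:        # first coin: heads
        return has_a              # answer honestly
    return rng.random() < 0.5     # first coin: tails -> second coin decides "yes"

def estimate_proportion(responses: list[bool]) -> float:
    """Invert E[R] = 1/4 + p/2, i.e. p_hat = (R - 1/4) * 2."""
    r = sum(responses) / len(responses)
    return (r - 0.25) * 2

if __name__ == "__main__":
    rng = random.Random(0)
    population = [rng.random() < 0.3 for _ in range(100_000)]  # hypothetical data
    answers = [randomized_response(x, rng) for x in population]
    true_p = sum(population) / len(population)
    print(f"true p = {true_p:.3f}, estimate = {estimate_proportion(answers):.3f}")
```

No individual answer reveals whether that person has A, yet the aggregate estimate lands close to the true proportion for a large population.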

Definition of ε-stability

The definition of ε-stability assumes two datasets, $D_{1}$ and $D_{2}$, that differ only in a single element. It does not specify what this difference should be: whether that element is present in one set and absent from the other, or whether its attributes are simply different. However, the formula for the exact definition explicitly establishes an ordering in which the probability related to $D_{1}$ is bounded by that related to $D_{2}$. Why this ordering?

Besides, I agree with some of the statements above that the term "the coins of the algorithm" is not clear. Maybe a link to the Wikipedia page that clarifies this would help. Elferdo (talk) 08:36, 19 August 2015 (UTC)
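For reference, the standard form of ε-differential privacy is sketched below in the notation of the question ($\mathcal{A}$ for the randomized algorithm and $S$ for any set of its possible outputs are assumed symbols). The probability is taken over the algorithm's internal randomness, which is what "the coins of the algorithm" refers to.

$$
\Pr[\mathcal{A}(D_{1}) \in S] \le e^{\varepsilon}\,\Pr[\mathcal{A}(D_{2}) \in S]
\quad \text{for every } S \text{ and every pair } D_{1}, D_{2} \text{ differing in a single element.}
$$

Because "differing in a single element" is a symmetric relation, the condition also applies with $D_{1}$ and $D_{2}$ exchanged, so the ordering in the formula is only notational; the two directions together give $e^{-\varepsilon} \le \Pr[\mathcal{A}(D_{1}) \in S] / \Pr[\mathcal{A}(D_{2}) \in S] \le e^{\varepsilon}$ whenever the denominator is nonzero. Which kind of difference counts (adding or removing a record versus changing one) is a modelling choice that different papers make differently.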

another famous example?

AOL search logs - does this count as another example? — Preceding unsigned comment added by Adsah98 (talk • contribs) 17:02, 7 February 2016 (UTC)

No, it is not a good example. It should not be in this article; it belongs in an article on de-identification. The original author of this article confused the two concepts. Simsong (talk) 02:32, 18 September 2018 (UTC)

Link to reference not working anymore

The link to reference 21, differential privacy in iOS, does not work anymore. 185.87.72.149 (talk) 14:17, 21 August 2017 (UTC)

PATE algorithm and utility/privacy trade-off

The authors of the PATE algorithm (https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/0e08bda44d22e076d15edc45afcb2e1a7a231a84.pdf) claim that their approach can answer an unlimited number of queries by training an ancillary model with a finite number of privacy-preserving queries to an ensemble of models (the ancillary model gets only an obfuscated consensus among the members of the ensemble). According to them, further queries to the ancillary model do not imply additional loss of privacy. This seems to me to increase the applicability of DP.

Do we really want this article to reference every DP algorithm? There are hundreds of them now. Simsong (talk) 02:33, 18 September 2018 (UTC)
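A minimal sketch of the noisy aggregation step as I read the linked paper: each teacher votes for a label, Laplace noise with scale 1/γ is added to every class's vote count, and the noisy argmax becomes the label the student sees. The function names, the `gamma` parameter name and the vote counts below are illustrative assumptions, and the sketch omits teacher training, the student model and privacy accounting. On the claim above: only these aggregation queries consume privacy budget; whatever the student computes afterwards is post-processing of differentially private output and so incurs no further loss.

```python
import random
from collections import Counter

def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_max_label(teacher_votes: list[int], num_classes: int, gamma: float,
                    rng: random.Random) -> int:
    """Return argmax_j (n_j + Lap(1/gamma)), where n_j counts teachers voting for class j."""
    counts = Counter(teacher_votes)  # missing classes count as 0
    noisy = [counts[j] + laplace(1 / gamma, rng) for j in range(num_classes)]
    return max(range(num_classes), key=lambda j: noisy[j])

if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical ensemble of 250 teachers, most of which predict class 3.
    votes = [3] * 200 + [1] * 50
    print(noisy_max_label(votes, num_classes=10, gamma=0.1, rng=rng))
```

When the teachers agree strongly, the noise rarely changes the winning label, which is why a strong consensus costs little utility.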

Synopsis

The current version of this differential privacy article begins by attributing differential privacy to a patent application by Dwork and McSherry.

The correct attribution is:

Dwork C., McSherry F., Nissim K., Smith A. (2006) Calibrating Noise to Sensitivity in Private Data Analysis. In: Halevi S., Rabin T. (eds) Theory of Cryptography. TCC 2006. Lecture Notes in Computer Science, vol 3876. Springer, Berlin, Heidelberg^[1]

In terms of the date, the submission deadline for this publication was September 6, 2005.^[2] That is over 3 months before the patent was submitted.

After differential privacy took off, this original paper was revised and republished in 2017.^[3] In the 2017 version, the history of differential privacy is spelled out: "In the initial version of this paper, differential privacy was called indistinguishability. The name “differential privacy” was suggested by Michael Schroeder, and was first used in Dwork (2006)." (page 13, section 2.2)

The odd history of the name probably explains the confusion, but the Wikipedia article should be corrected. --73.93.186.114 (talk) 05:56, 29 January 2019 (UTC)

I have deleted the synopsis. After removing the inaccurate info, there was nothing left. 73.93.186.114 (talk) 04:51, 30 January 2019 (UTC)

References

Differential privacy doesn't contradict the "coin toss" example

While it is true that differential privacy is sometimes considered to be designed to protect the identity of the participants of the database, this is more a design question than a property of differential privacy. The core difference arises from the notion of neighboring databases. One can define neighboring databases as those that differ only in the users' private information, not in their identifying information (a reasonable use case being one where the identity of the participants is widely known). Furthermore, a common variation of differential privacy, known as local differential privacy, concerns mechanisms that behave exactly like the "coin flip" mechanism, i.e. where users randomize their responses themselves (instead of, for example, relying on a trusted curator). Finally, this paragraph misinterprets the idea that differential privacy requires combining the data into a single output: it implies that things like synthetic databases do not conform to the differential privacy requirements, which is clearly not true, as differentially private generation of synthetic databases is commonly studied and solutions with provable guarantees exist. Vexlerneil (talk) 16:15, 4 April 2019 (UTC)
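To make the local-model point concrete: in the two-coin scheme discussed earlier on this page, each respondent randomizes their own answer, saying "yes" with probability 3/4 if they have attribute A and 1/4 if they do not (the coefficients in the equation quoted in the May 2017 comment). Comparing the output probabilities for the two possible inputs gives the privacy level of that local mechanism, as a short worked check:

$$
\frac{\Pr[\text{yes} \mid A]}{\Pr[\text{yes} \mid \neg A]} = \frac{3/4}{1/4} = 3,
\qquad
\frac{\Pr[\text{no} \mid \neg A]}{\Pr[\text{no} \mid A]} = \frac{3/4}{1/4} = 3,
\qquad \text{so} \qquad \varepsilon = \ln 3 \approx 1.10 .
$$

Since the bound holds for each respondent's own input before anything is collected, it is a local differential privacy guarantee with no trusted curator involved.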

What algorithms are we talking about? Do you mean database queries? You mean omission of variables is an algorithm? Are you sure they are "algorithms"?

Another way to describe differential privacy is as a constraint on the algorithms used to publish aggregate information about a statistical database which limits the disclosure of private information of records whose information is in the database. 68.134.243.51 (talk) 14:22, 7 August 2022 (UTC)
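The classic instance of such a constraint is the Laplace mechanism from the Dwork, McSherry, Nissim and Smith paper cited in the Synopsis thread. Below is a minimal sketch for a single counting query; the function names and the toy data are hypothetical, and a real deployment would also need privacy accounting across repeated queries.

```python
import random

def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-DP using the Laplace mechanism.

    A counting query has sensitivity 1: adding, removing or changing one
    record moves the true count by at most 1, so noise of scale 1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace(1 / epsilon, rng)

if __name__ == "__main__":
    rng = random.Random(42)
    ages = [23, 35, 41, 52, 67, 71, 29, 80]  # hypothetical toy data
    print(private_count(ages, lambda a: a >= 65, epsilon=0.5, rng=rng))
```

The algorithm publishes only the noisy aggregate; a smaller `epsilon` means more noise and a stronger guarantee.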

Hypman

Hype 102.90.45.231 (talk) 11:18, 21 March 2023 (UTC)

@ 2A01:5EC0:1801:C567:FD00:AA66:596E:C57D (talk) 07:14, 26 March 2023 (UTC)

This article should not contain mathematical proofs; it is too technical

Wikipedia is not a math textbook. This article contains mathematical proofs of things like the Laplace notation, and it is far too technical. It needs dramatic simplification to be useful to the general Wikipedia reader. Simsong (talk) 16:57, 30 September 2023 (UTC)

Randomized response should not be the first example of using DP

DP performs very poorly in the local model and with randomized response. We should lead with better examples. Simsong (talk) 16:57, 30 September 2023 (UTC)
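One way to quantify that cost, assuming the two-coin scheme and the estimator $\hat p = 2(\hat R - \tfrac14)$ from the May 2017 comment above, with $n$ independent respondents and $q = \tfrac14 + \tfrac{p}{2}$ the probability of a "yes":

$$
\operatorname{Var}(\hat p) = 4\operatorname{Var}(\hat R) = \frac{4q(1-q)}{n} = \frac{p(1-p) + \tfrac34}{n},
$$

compared with $p(1-p)/n$ for the sample proportion computed from truthful answers. Randomization therefore adds $3/(4n)$ to the variance; at $p = 1/2$ the variance grows from $1/(4n)$ to $1/n$, so the standard error doubles.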

[1] https://link.springer.com/chapter/10.1007%2F11681878_14

[2] https://www.iacr.org/workshops/tcc2006/cfp.html

[3] https://journalprivacyconfidentiality.org/index.php/jpc/article/view/405
