2019-05-16 03:30:21

#### Differential Privacy

## Comparing Truncation to Differential Privacy

Traditional methods of data de-identification obscure *data* values. For example, you might truncate a date to just the year.

Differential privacy obscures *query* values by injecting enough noise to keep from revealing information on an individual.

Let’s compare two approaches for de-identifying a person’s age: truncation and differential privacy.

## Truncation

First consider truncating birth date to year. For example, anyone born between January 1, 1955 and December 31, 1955 would be recorded as being born in 1955. This effectively produces a 100% confidence interval that is one year wide.
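As a minimal sketch of truncation, assuming birth dates are stored as Python `date` objects (the function name `truncate_to_year` is hypothetical and only for illustration), every date within a year collapses to the same recorded value:

```python
from datetime import date

def truncate_to_year(birth_date: date) -> int:
    """Keep only the year of a birth date, discarding month and day."""
    return birth_date.year

# Both ends of 1955 are recorded identically after truncation.
first_day = truncate_to_year(date(1955, 1, 1))
last_day = truncate_to_year(date(1955, 12, 31))
print(first_day, last_day)
```

The recorded value is always exact to within one year, which is the sense in which the interval has 100% confidence.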

Next we’ll compare this to a 95% confidence interval using ε-differential privacy.

## Differential privacy

Differential privacy adds noise in proportion to the sensitivity Δ of a query. Here sensitivity means the maximum impact that one record could have on the result. For example, a query that counts records has sensitivity 1.

Suppose people live to a maximum of 120 years, and suppose we query the average age. Then in a database with n records [1], one person’s presence in or absence from the database would change the average by no more than 120/n years, the worst case corresponding to the extremely unlikely event of a database of n - 1 newborns and one person 120 years old.
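The worst case can be checked numerically. The sketch below is illustrative only: the helper names `mean_age` and `worst_case_change` are made up for this example, and it assumes n is at least 2 so that the database is non-empty after removing one record.

```python
def mean_age(ages):
    """Average of a list of ages in years."""
    return sum(ages) / len(ages)

def worst_case_change(n, max_age=120.0):
    """Change in the mean when the one oldest record is removed from a
    database of n - 1 newborns (age 0) and one person aged max_age."""
    with_person = [0.0] * (n - 1) + [max_age]
    without_person = [0.0] * (n - 1)
    return abs(mean_age(with_person) - mean_age(without_person))

# For n = 1000 the change equals 120 / 1000 years.
print(worst_case_change(1000))
```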

The Laplace mechanism implements ε-differential privacy by adding noise with a Laplace(Δ/ε) distribution, which in our example means Laplace(120/(nε)).
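Here is a rough sketch of that mechanism for the average-age query, using only the Python standard library. The function names are hypothetical, and the Laplace sample is built as the difference of two exponential draws, which is one standard way to generate it rather than anything the post prescribes.

```python
import random

def laplace_noise(scale, rng=random):
    """Draw one sample from a Laplace distribution centered at 0.

    The difference of two independent exponential variables, each with
    mean `scale`, follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_mean_age(ages, epsilon, max_age=120.0, rng=random):
    """Return the mean age plus Laplace noise of scale sensitivity / epsilon."""
    n = len(ages)
    sensitivity = max_age / n  # the 120/n bound from the text
    return sum(ages) / n + laplace_noise(sensitivity / epsilon, rng)

# Example: a noisy average over three hypothetical ages with epsilon = 1.
print(private_mean_age([30.0, 40.0, 50.0], epsilon=1.0))
```

A production system would also need to guard against floating-point attacks on the noise sampler, which this sketch ignores.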

A 95% confidence interval for a Laplace distribution with scale b centered at 0 is

[b log 0.05, -b log 0.05]

which, since the natural log of 0.05 is about -2.996, is very nearly

[-3b, 3b].
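The interval follows from the Laplace tail probability P(|X| > t) = exp(-t/b). A short sketch (the function name `laplace_interval` is an illustrative choice) computes it for any coverage level:

```python
import math

def laplace_interval(scale, coverage=0.95):
    """Symmetric interval around 0 holding `coverage` of a Laplace(scale) law.

    For a Laplace distribution, P(|X| > t) = exp(-t / scale), so the
    half-width t solves exp(-t / scale) = 1 - coverage.
    """
    half_width = -scale * math.log(1.0 - coverage)
    return (-half_width, half_width)

low, high = laplace_interval(1.0)
print(low, high)  # close to -3 and 3 for scale 1
```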

In our case b = 120/(nε), and so a 95% confidence interval for the noise we add would be [-360/(nε), 360/(nε)].

When n = 1000 and ε = 1, this means we’re adding noise that’s usually between -0.36 and 0.36, i.e. we know the average age to within about 4 months. But if n = 1 (still with ε = 1), our confidence interval is the true age ± 360. Since this is wider than the a priori bounds of [0, 120], we’d truncate our answer to be between 0 and 120. So we could query for the age of an individual, but we’d learn nothing.
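These two cases can be reproduced with a few lines of arithmetic. The sketch below uses the 3b rule of thumb rather than the exact quantile, and the helper names `noise_half_width` and `clip_age` are invented for illustration.

```python
def noise_half_width(n, epsilon, max_age=120.0):
    """Approximate 95% half-width of the noise, using the 3b rule of thumb."""
    scale = max_age / (n * epsilon)
    return 3.0 * scale

def clip_age(value, max_age=120.0):
    """Force a noisy answer back into the a priori range [0, max_age]."""
    return max(0.0, min(max_age, value))

# n = 1000, epsilon = 1: about 0.36 years, a little over 4 months.
half_width_years = noise_half_width(1000, 1.0)
print(half_width_years, half_width_years * 12)

# n = 1, epsilon = 1: the half-width exceeds the whole plausible age range.
print(noise_half_width(1, 1.0))
```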

With n = 1, the width of our confidence interval is 720/ε, and so to get a confidence interval one year wide, as we get with truncation, we would set ε = 720. Ordinarily ε is much smaller than 720 in application, say between 1 and 10, which means differential privacy reveals far less information than truncation does.
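Solving the width formula for ε gives the privacy budget that would match a given truncation width. This is a small sketch under the same 3b approximation, and the name `epsilon_for_width` is hypothetical.

```python
def epsilon_for_width(width_years, n=1, max_age=120.0):
    """Epsilon at which the approximate 95% noise interval, of width
    6 * max_age / (n * epsilon), is `width_years` wide."""
    return 6.0 * max_age / (n * width_years)

print(epsilon_for_width(1.0))   # interval one year wide
print(epsilon_for_width(10.0))  # interval ten years wide
```

For a single record, a one-year interval corresponds to ε = 720 and a ten-year interval to ε = 72, both far above the range of 1 to 10 mentioned above.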

Even if you truncate age to *decade* rather than year, this still reveals more information than differential privacy provided ε < 72.

[1] Ordinarily even the number of records in the database is kept private, but we’ll assume here that for some reason we know the number of rows a priori.


http://feedproxy.google.com/~r/TheEndeavour/~3/gfZQTYEQB0s/

#johndcook #Math #Privacy #ProbabilityandStatistics