
posted by Fnord666 on Tuesday January 08 2019, @05:11PM   Printer-friendly
from the making-a-difference dept.

Submitted via IRC for takyon

Can a set of equations keep U.S. census data private?

The U.S. Census Bureau is making waves among social scientists with what it calls a "sea change" in how it plans to safeguard the confidentiality of data it releases from the decennial census.

The agency announced in September 2018 that it will apply a mathematical concept called differential privacy to its release of 2020 census data after conducting experiments that suggest current approaches can't assure confidentiality. But critics of the new policy believe the Census Bureau is moving too quickly to fix a system that isn't broken. They also fear the changes will degrade the quality of the information used by thousands of researchers, businesses, and government agencies.

The move has implications that extend far beyond the research community. Proponents of differential privacy say a fierce, ongoing legal battle over plans to add a citizenship question to the 2020 census has only underscored the need to assure people that the government will protect their privacy.

[...] Differential privacy, first described in 2006, isn't a substitute for swapping (exchanging attributes between similar households in different locations) and other ways to perturb the data. Rather, it allows someone—in this case, the Census Bureau—to measure the likelihood that enough information will "leak" from a public data set to open the door to reconstruction.

"Any time you release a statistic, you're leaking something," explains Jerry Reiter, a professor of statistics at Duke University in Durham, North Carolina, who has worked on differential privacy as a consultant with the Census Bureau. "The only way to absolutely ensure confidentiality is to release no data. So the question is, how much risk is OK? Differential privacy allows you to put a boundary" on that risk.

A database can be considered differentially private if the information it yields about someone doesn't depend on whether that person is part of the database. Differential privacy was originally designed to apply to situations in which outsiders make a series of queries to extract information from a database. In that scenario, each query consumes a little bit of what the experts call a "privacy budget." After that budget is exhausted, queries are halted to prevent database reconstruction.
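To make the query-and-budget mechanics concrete, here is a minimal Python sketch using the classic Laplace mechanism. The class and names are illustrative only, not the Census Bureau's actual system:

```python
import random

class PrivateDatabase:
    """Toy interactive query server that enforces an epsilon privacy budget.
    Illustrative sketch -- not the Census Bureau's implementation."""

    def __init__(self, records, total_budget):
        self.records = records            # list of numeric values
        self.remaining = total_budget     # total epsilon available

    def count_query(self, predicate, epsilon):
        """Answer 'how many records satisfy predicate?' with Laplace noise.

        A counting query has sensitivity 1 (adding or removing one person
        changes the count by at most 1), so Laplace noise with scale
        1/epsilon makes the answer epsilon-differentially private."""
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted; query refused")
        self.remaining -= epsilon         # each query consumes budget

        true_count = sum(1 for r in self.records if predicate(r))
        # Difference of two Exponential(epsilon) draws is Laplace(0, 1/epsilon)
        noise = random.expovariate(epsilon) - random.expovariate(epsilon)
        return true_count + noise

db = PrivateDatabase(records=[34, 41, 29, 57, 62], total_budget=1.0)
print(db.count_query(lambda age: age >= 40, epsilon=0.5))  # noisy answer
print(db.count_query(lambda age: age >= 40, epsilon=0.5))  # budget now spent
# A third 0.5-epsilon query would raise: the budget is exhausted.
```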

In the case of census data, however, the agency has already decided what information it will release, and the number of queries is unlimited. So its challenge is to calculate how much the data must be perturbed to prevent reconstruction.
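A hedged sketch of that one-shot setting, again with the Laplace mechanism: a hypothetical scheme splits the total budget evenly across the fixed set of tables, adds noise once, and then publishes. The noisy tables can be queried without limit afterward, because post-processing a differentially private release leaks nothing further:

```python
import random

def laplace(scale):
    # Difference of two exponentials with rate 1/scale is Laplace(0, scale)
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def release_tables(tables, total_epsilon):
    """One-shot differentially private release of a fixed set of count tables.

    Hypothetical scheme: by sequential composition, an even split of the
    budget across tables gives epsilon_per_table each; every count has
    sensitivity 1, so Laplace noise of scale 1/epsilon_per_table suffices."""
    eps_each = total_epsilon / len(tables)
    noisy = {}
    for name, counts in tables.items():
        # Rounding is post-processing and does not weaken the guarantee
        noisy[name] = [round(c + laplace(1.0 / eps_each)) for c in counts]
    return noisy

tables = {"age_by_block": [12, 7, 30], "households_by_tract": [140, 95]}
print(release_tables(tables, total_epsilon=1.0))
```

The bureau's real problem is harder than this sketch: it must choose how to allocate the budget across thousands of interrelated tables so that the noise does the least damage to the statistics researchers rely on.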


Original Submission

 
  • (Score: 2) by Runaway1956 (2926) Subscriber Badge on Wednesday January 09 2019, @03:06AM (#783958) Journal

    Supposing that the census bureau comes up with the bestest algorithms evah. They do perfect work. The next generation of researchers reverse engineers those algorithms - and we're right back where we started. Big corporations with lots of money to throw at the problem still identify the individuals who supplied information to the census.

    Meanwhile - those same corporations are busy mining data on their own, which can be correlated to the census data.

    It's just another arms race, which the common people can't possibly win.
