
posted by Fnord666 on Tuesday January 08 2019, @05:11PM   Printer-friendly
from the making-a-difference dept.

Submitted via IRC for takyon

Can a set of equations keep U.S. census data private?

The U.S. Census Bureau is making waves among social scientists with what it calls a "sea change" in how it plans to safeguard the confidentiality of data it releases from the decennial census.

The agency announced in September 2018 that it will apply a mathematical concept called differential privacy to its release of 2020 census data after conducting experiments that suggest current approaches can't assure confidentiality. But critics of the new policy believe the Census Bureau is moving too quickly to fix a system that isn't broken. They also fear the changes will degrade the quality of the information used by thousands of researchers, businesses, and government agencies.

The move has implications that extend far beyond the research community. Proponents of differential privacy say a fierce, ongoing legal battle over plans to add a citizenship question to the 2020 census has only underscored the need to assure people that the government will protect their privacy.

[...] Differential privacy, first described in 2006, isn't a substitute for swapping (exchanging attributes between similar households in different locations) and other ways to perturb the data. Rather, it allows someone—in this case, the Census Bureau—to measure the likelihood that enough information will "leak" from a public data set to open the door to reconstruction.

"Any time you release a statistic, you're leaking something," explains Jerry Reiter, a professor of statistics at Duke University in Durham, North Carolina, who has worked on differential privacy as a consultant with the Census Bureau. "The only way to absolutely ensure confidentiality is to release no data. So the question is, how much risk is OK? Differential privacy allows you to put a boundary" on that risk.

A database can be considered differentially private if the information it yields about someone doesn't depend on whether that person is part of the database. Differential privacy was originally designed to apply to situations in which outsiders make a series of queries to extract information from a database. In that scenario, each query consumes a little bit of what the experts call a "privacy budget." After that budget is exhausted, queries are halted to prevent database reconstruction.
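To make the query-and-budget mechanics concrete, here is a minimal Python sketch using the classic Laplace mechanism. The class and names are illustrative only, not the Census Bureau's actual system:

```python
import random

class PrivateDatabase:
    """Toy interactive query server that enforces an epsilon privacy budget.
    Illustrative sketch -- not the Census Bureau's implementation."""

    def __init__(self, records, total_budget):
        self.records = records            # list of numeric values
        self.remaining = total_budget     # total epsilon available

    def count_query(self, predicate, epsilon):
        """Answer 'how many records satisfy predicate?' with Laplace noise.

        A counting query has sensitivity 1 (adding or removing one person
        changes the count by at most 1), so Laplace noise with scale
        1/epsilon makes the answer epsilon-differentially private."""
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted; query refused")
        self.remaining -= epsilon         # each query consumes budget

        true_count = sum(1 for r in self.records if predicate(r))
        # Difference of two Exponential(epsilon) draws is Laplace(0, 1/epsilon)
        noise = random.expovariate(epsilon) - random.expovariate(epsilon)
        return true_count + noise

db = PrivateDatabase(records=[34, 41, 29, 57, 62], total_budget=1.0)
print(db.count_query(lambda age: age >= 40, epsilon=0.5))  # noisy answer
print(db.count_query(lambda age: age >= 40, epsilon=0.5))  # budget now spent
# A third 0.5-epsilon query would raise: the budget is exhausted.
```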

In the case of census data, however, the agency has already decided what information it will release, and the number of queries is unlimited. So its challenge is to calculate how much the data must be perturbed to prevent reconstruction.
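A hedged sketch of that one-shot setting, again with the Laplace mechanism: a hypothetical scheme splits the total budget evenly across the fixed set of tables, adds noise once, and then publishes. The noisy tables can be queried without limit afterward, because post-processing a differentially private release leaks nothing further:

```python
import random

def laplace(scale):
    # Difference of two exponentials with rate 1/scale is Laplace(0, scale)
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def release_tables(tables, total_epsilon):
    """One-shot differentially private release of a fixed set of count tables.

    Hypothetical scheme: by sequential composition, an even split of the
    budget across tables gives epsilon_per_table each; every count has
    sensitivity 1, so Laplace noise of scale 1/epsilon_per_table suffices."""
    eps_each = total_epsilon / len(tables)
    noisy = {}
    for name, counts in tables.items():
        # Rounding is post-processing and does not weaken the guarantee
        noisy[name] = [round(c + laplace(1.0 / eps_each)) for c in counts]
    return noisy

tables = {"age_by_block": [12, 7, 30], "households_by_tract": [140, 95]}
print(release_tables(tables, total_epsilon=1.0))
```

The bureau's real problem is harder than this sketch: it must choose how to allocate the budget across thousands of interrelated tables so that the noise does the least damage to the statistics researchers rely on.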


Original Submission

 
  • (Score: 2) by Runaway1956 (2926) Subscriber Badge on Wednesday January 09 2019, @03:06AM (#783958) Journal

    Supposing that the census bureau comes up with the bestest algorithms evah. They do perfect work. The next generation of researchers reverse engineers those algorithms - and we're right back where we started. Big corporations with lots of money to throw at the problem still identify the individuals who supplied information to the census.

    Meanwhile - those same corporations are busy mining data on their own, which can be correlated to the census data.

    It's just another arms race, which the common people can't possibly win.
