
SoylentNews is people

posted by Fnord666 on Tuesday January 08 2019, @05:11PM   Printer-friendly
from the making-a-difference dept.

Submitted via IRC for takyon

Can a set of equations keep U.S. census data private?

The U.S. Census Bureau is making waves among social scientists with what it calls a "sea change" in how it plans to safeguard the confidentiality of data it releases from the decennial census.

The agency announced in September 2018 that it will apply a mathematical concept called differential privacy to its release of 2020 census data after conducting experiments that suggest current approaches can't assure confidentiality. But critics of the new policy believe the Census Bureau is moving too quickly to fix a system that isn't broken. They also fear the changes will degrade the quality of the information used by thousands of researchers, businesses, and government agencies.

The move has implications that extend far beyond the research community. Proponents of differential privacy say a fierce, ongoing legal battle over plans to add a citizenship question to the 2020 census has only underscored the need to assure people that the government will protect their privacy.

[...] Differential privacy, first described in 2006, isn't a substitute for swapping and other ways to perturb the data. Rather, it allows someone—in this case, the Census Bureau—to measure the likelihood that enough information will "leak" from a public data set to open the door to reconstruction.

"Any time you release a statistic, you're leaking something," explains Jerry Reiter, a professor of statistics at Duke University in Durham, North Carolina, who has worked on differential privacy as a consultant with the Census Bureau. "The only way to absolutely ensure confidentiality is to release no data. So the question is, how much risk is OK? Differential privacy allows you to put a boundary" on that risk.

A database can be considered differentially private if the information it yields about someone doesn't depend on whether that person is part of the database. Differential privacy was originally designed for situations in which outsiders make a series of queries to extract information from a database. In that scenario, each query consumes a little bit of what the experts call a "privacy budget." After that budget is exhausted, queries are halted in order to prevent database reconstruction.
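To make the query model concrete, here is a minimal, illustrative sketch of a counting-query interface that spends a finite privacy budget and adds Laplace noise to each answer. This is a toy based on the standard epsilon-differential-privacy textbook construction, not the Census Bureau's actual system; the class name, interface, and parameters are invented for illustration.

```python
import random

class PrivateCounter:
    """Toy query interface enforcing a finite privacy budget.

    Illustrative sketch of the Laplace mechanism for counting queries,
    not the Census Bureau's actual implementation.
    """

    def __init__(self, records, total_epsilon):
        self.records = records
        self.budget = total_epsilon  # remaining privacy budget

    def count(self, predicate, epsilon):
        """Answer a counting query; each call spends `epsilon` of the budget."""
        if epsilon > self.budget:
            # Budget exhausted: refuse further queries to prevent
            # reconstruction of the underlying database.
            raise RuntimeError("privacy budget exhausted; no more queries")
        self.budget -= epsilon
        true_count = sum(1 for r in self.records if predicate(r))
        # A counting query has sensitivity 1: adding or removing one person
        # changes the count by at most 1, so Laplace noise with scale
        # 1/epsilon suffices. The difference of two iid exponentials with
        # rate epsilon is Laplace-distributed with scale 1/epsilon.
        noise = random.expovariate(epsilon) - random.expovariate(epsilon)
        return true_count + noise
```

Each answer is deliberately noisy; smaller per-query epsilons give noisier answers but let the budget stretch over more queries.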

In the case of census data, however, the agency has already decided what information it will release, and the number of queries is unlimited. So its challenge is to calculate how much the data must be perturbed to prevent reconstruction.
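The fixed-release setting described above can be sketched the same way: instead of metering queries, the whole privacy budget is spent up front by perturbing every cell of a pre-decided tabulation once. The function below is an assumed, simplified illustration; the Bureau's actual disclosure-avoidance machinery is far more elaborate.

```python
import random

def laplace(scale):
    # Sample Laplace noise as the difference of two iid exponentials.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def release_histogram(records, key, epsilon):
    """Publish a noisy tabulation once, spending the whole budget up front.

    Each record falls in exactly one cell, so the histogram's sensitivity
    is 1 and Laplace noise of scale 1/epsilon per cell suffices.
    Illustrative sketch only, not the Census Bureau's algorithm.
    """
    hist = {}
    for r in records:
        k = key(r)
        hist[k] = hist.get(k, 0) + 1
    return {k: v + laplace(1.0 / epsilon) for k, v in hist.items()}
```

Because the table is published rather than queried, anyone can examine it as often as they like; the privacy guarantee comes entirely from the noise added before release.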


Original Submission

 
This discussion has been archived. No new comments can be posted.
  • (Score: 3, Interesting) by ikanreed on Tuesday January 08 2019, @07:12PM (2 children)

    by ikanreed (3164) Subscriber Badge on Tuesday January 08 2019, @07:12PM (#783800) Journal

The census as currently conducted reports a large amount of information about individuals: demographics, job, family relationships, and whether you've spent time in school or in jail in the last year, all of which one might imagine some people would want kept private.

But the census before the most recent one, that is to say the 2000 census, included what was known as the "long form," which collected all sorts of details that could definitely be considered private. They decided to detach that from the politicized process of counting population.

Instead, the Census now has a separate full-interview-based survey process that collects that kind of detailed information. Information such as:

    • Industry and occupation
    • Type of worker, and employer
    • Health insurance purchasing and usage
• Details related to race and Hispanic origin
    • Retirement plans
    • Weeks worked per year
    • How you commute
    • How you communicate and telephone usage
    • How you use the internet
    • Your relationship status
    • A cognitive abilities test
    • Wages
    • Parental place of birth and history
    • Income from rental property

All of which they promise to keep anonymized. Every one of those you could imagine being sensitive. So yeah, they record what country you live in (also what state, county, city, and district, since that's how they build the congressional districts), but also a shit-ton of stuff you might want private.

  • (Score: 0) by Anonymous Coward on Wednesday January 09 2019, @10:59AM (1 child)

    by Anonymous Coward on Wednesday January 09 2019, @10:59AM (#784048)

Just lie.
That's what Aussies do now on the census.
The data is complete junk, but who cares? The ABS can't be trusted. The government can't be trusted. Who would be stupid enough to give that much private information to a government body that admits it sells the data to third parties for a profit?

    • (Score: 2) by ikanreed on Wednesday January 09 2019, @03:47PM

      by ikanreed (3164) Subscriber Badge on Wednesday January 09 2019, @03:47PM (#784143) Journal

Let me get this idea straight. You conjecture that anonymizing data ruins its utility, so you therefore create intentionally bad data?