posted by chromas on Monday July 29 2019, @09:09AM   Printer-friendly
from the submitted-anonymously dept.

'Anonymised' data can never be totally anonymous, says study

"Anonymised" data lies at the core of everything from modern medical research to personalised recommendations and modern AI techniques. Unfortunately, according to a paper, successfully anonymising data is practically impossible for any complex dataset.

An anonymised dataset is supposed to have had all personally identifiable information removed from it, while retaining a core of useful information for researchers to operate on without fear of invading privacy. For instance, a hospital may remove patients' names, addresses and dates of birth from a set of health records in the hope researchers may be able to use the large sets of records to uncover hidden links between conditions.

But in practice, data can be deanonymised in a number of ways. In 2008, an anonymised Netflix dataset of film ratings was deanonymised by comparing the ratings with public scores on the IMDb film website; in 2014, the home addresses of New York taxi drivers were uncovered from an anonymous dataset of individual trips in the city; and anonymous medical billing data released by Australia's health department was reidentified by cross-referencing "mundane facts" such as the year of birth for older mothers and their children, or for mothers with many children.

Now researchers from Belgium's Université catholique de Louvain (UCLouvain) and Imperial College London have built a model to estimate how easy it would be to deanonymise any arbitrary dataset. A dataset with 15 demographic attributes, for instance, "would render 99.98% of people in Massachusetts unique". And for smaller populations, it gets easier: if town-level location data is included, for instance, "it would not take much to reidentify people living in Harwich Port, Massachusetts, a city of fewer than 2,000 inhabitants".
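The intuition behind such uniqueness figures can be illustrated with a toy sketch (synthetic data and made-up attribute names, not the researchers' actual statistical model): a record is reidentifiable to the extent that its combination of attribute values occurs only once in the dataset, and adding attributes rapidly drives that fraction toward 100%.

```python
import random
from collections import Counter

def uniqueness(records, attrs):
    """Fraction of records whose combination of the given
    attributes appears exactly once in the dataset."""
    keys = [tuple(r[a] for a in attrs) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)

# Synthetic population of 10,000 people with a few demographic attributes.
random.seed(0)
people = [
    {
        "zip": random.randrange(100),
        "birth_year": random.randrange(1940, 2005),
        "sex": random.choice("MF"),
        "children": random.randrange(6),
    }
    for _ in range(10_000)
]

# One coarse attribute identifies nobody; four together identify most people.
print(uniqueness(people, ["sex"]))
print(uniqueness(people, ["zip", "birth_year"]))
print(uniqueness(people, ["zip", "birth_year", "sex", "children"]))
```

With only four attributes most synthetic records are already unique, which mirrors the paper's point that 15 demographic attributes suffice for almost everyone in a real population.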


Original Submission

 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 4, Interesting) by BsAtHome (889) on Monday July 29 2019, @10:21AM (#872572)

    Yes, stop collecting data. That will work,... not. The whole point of Big Data is that you can do things like identify people on the basis of partial data. See how Paul Revere [kieranhealy.org] was identified using a very sparse data set.

    The simple fact is that everything leaves a trace, and you cannot prevent data from being left in all kinds of places. It is the combination of those traces that is dangerous. The paper [nature.com] (open access) makes it very clear that only subsets are required. Therefore, maybe we should be focusing on how to stifle the statistics: what kind of data/noise will prevent successful identification?
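One established answer to the commenter's closing question is differential privacy: perturb each released statistic with calibrated noise so that no individual's presence can be confirmed, while aggregates remain usable. A minimal sketch (illustrative only; the Laplace mechanism for a count query, which has sensitivity 1):

```python
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) as the difference of two
    exponential variates, each with mean `scale`."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon=1.0):
    """Differentially private count: a query with sensitivity 1
    is perturbed with Laplace noise of scale 1/epsilon."""
    return true_count + laplace_noise(1.0 / epsilon)

random.seed(1)
# A single answer is noticeably perturbed...
print(noisy_count(42))
# ...but the noise is zero-mean, so repeated or large aggregates stay accurate.
```

Smaller epsilon means more noise and stronger privacy; the whole game is choosing epsilon so the statistics survive while individual records do not.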

    Starting Score:    1  point
    Moderation   +2  
       Interesting=1, Informative=1, Total=2
    Extra 'Interesting' Modifier   0  
    Karma-Bonus Modifier   +1  

    Total Score:   4