Why the search for a privacy-preserving data sharing mechanism is failing:
From banking to communication, our modern daily lives are driven by data, and with that come ongoing concerns over privacy. Now, a new EPFL paper published in Nature Computational Science argues that many of the promises made around privacy-preserving mechanisms will never be fulfilled, and that we need to accept these inherent limits rather than chase the impossible.
Data-driven innovation, in the form of personalized medicine, better public services or greener and more efficient industrial production, promises enormous benefits for people and the planet, and widespread access to data is considered essential to drive this future. Yet aggressive data collection and analysis practices raise alarm over societal values and fundamental rights.
As a result, how to widen access to data while safeguarding the confidentiality of sensitive, personal information has become one of the most prevalent challenges in unleashing the potential of data-driven technologies. A new paper from EPFL's Security and Privacy Engineering Lab (SPRING) in the School of Computer and Communication Sciences argues that the promise that any data use is solvable with both good utility and privacy is akin to chasing rainbows.
Head of the SPRING Lab and co-author of the paper, Assistant Professor Carmela Troncoso, says that there are two traditional approaches to preserving privacy. "There is the path of using privacy-preserving cryptography, processing the data in the encrypted domain and getting a result. But the limitation is the need to design very targeted algorithms and not just undertake generic computations."
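As a rough illustration of what "processing data in the encrypted domain" can mean in practice, here is a minimal sketch using partially homomorphic (Paillier) encryption, assuming the third-party python-paillier (`phe`) package. It is only an illustration of the general idea, not the paper's own construction: an analyst can sum encrypted values without ever decrypting them, but cannot run arbitrary computations, which is the "very targeted algorithms, not generic computations" limitation Troncoso describes.

```python
# Minimal sketch: aggregate statistics over encrypted values with Paillier
# (partially homomorphic) encryption, assuming the python-paillier package
# (pip install phe). Illustrative only, not the paper's own method.
from phe import paillier

# The data owner generates a keypair and encrypts individual salaries.
public_key, private_key = paillier.generate_paillier_keypair()
salaries = [52_000, 61_500, 47_250]
encrypted = [public_key.encrypt(s) for s in salaries]

# An untrusted analyst can add ciphertexts (and multiply by plaintext
# constants) without ever seeing the underlying values...
encrypted_total = encrypted[0]
for e in encrypted[1:]:
    encrypted_total = encrypted_total + e

# ...but cannot compute arbitrary functions (a median, a comparison, etc.):
# only this narrow, pre-agreed class of operations is supported.
print("Decrypted total:", private_key.decrypt(encrypted_total))  # 160750
```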
The problem with this type of privacy-preserving technology, the paper argues, is that it doesn't solve one of the key problems most relevant to practitioners: how to share high-quality individual-level data in a manner that preserves privacy but allows analysts to extract a dataset's full value in a highly flexible manner.
The second avenue that attempts to solve this challenge is the anonymization of data, that is, the removal of names, locations and postcodes. But, Troncoso argues, often the problem is the data itself. "There is a famous Netflix example where the company decided to release datasets and run a public competition to produce better 'recommendation' algorithms. It removed the names of clients, but when researchers compared movie ratings to other platforms where people rate movies, they were able to de-anonymize people."
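For readers who want to see the mechanics, here is a minimal sketch of the linkage idea behind that kind of de-anonymization, using invented toy data rather than the actual Netflix Prize or review-site records: ratings that look anonymous become identifying once they can be matched against a public profile with overlapping movies, scores and dates.

```python
# Toy linkage-attack sketch: re-identify "anonymized" ratings by matching
# them against a public source where names are known. All data is made up.
from datetime import date

anonymized = {  # user_id -> list of (movie, stars, date)
    "user_117": [("Movie A", 5, "2005-03-02"),
                 ("Movie B", 1, "2005-03-09"),
                 ("Movie C", 4, "2005-04-11")],
}

public_reviews = {  # real name -> list of (movie, stars, date)
    "Jane Doe": [("Movie A", 5, "2005-03-02"),
                 ("Movie C", 4, "2005-04-12")],  # dates may be off by a day
    "John Roe": [("Movie D", 2, "2005-05-01")],
}

def to_date(s):
    y, m, d = map(int, s.split("-"))
    return date(y, m, d)

def overlap(anon_ratings, public_ratings, date_slack_days=3):
    """Count (movie, stars) pairs that match with roughly the same date."""
    hits = 0
    for movie, stars, d1 in anon_ratings:
        for m2, s2, d2 in public_ratings:
            if movie == m2 and stars == s2 and \
               abs((to_date(d1) - to_date(d2)).days) <= date_slack_days:
                hits += 1
    return hits

for uid, ratings in anonymized.items():
    best = max(public_reviews, key=lambda name: overlap(ratings, public_reviews[name]))
    print(uid, "most closely matches", best)  # user_117 most closely matches Jane Doe
```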
[...] Another key message of the paper is the idea of a slower, more controlled release of technology. Today, ultra-fast deployment is the norm, with a "we'll fix it later" mentality if things go wrong, an approach that Troncoso believes is very dangerous: "We need to start accepting that there are limits. Do we really want to continue this data-driven free-for-all where there is no privacy and with big impacts on democracy? It's like Groundhog Day, we've been talking about this for 20 years and the same thing is now happening with machine learning. We put algorithms out there, they are biased and the hope is that later they will be fixed. But what if they can't be fixed?"
Journal Reference:
Theresa Stadler and Carmela Troncoso, "Why the search for a privacy-preserving data sharing mechanism is failing," Nature Computational Science (2022). DOI: 10.1038/s43588-022-00236-x
(Score: 4, Touché) by Anonymous Coward on Saturday June 04 2022, @05:27PM (1 child)
PII that can be sold is the most valuable part of the dataset. Preventing exactly that is the core of any meaningful privacy protection mechanism.
(Score: 5, Insightful) by Immerman on Sunday June 05 2022, @05:03AM
Yep, privacy-preserving data sharing is like dry water - if anyone ever cracks it they'll make a killing, but the smart money is on it being an inherent oxymoron, and on anyone selling it being a con artist who doesn't actually care about the privacy side of the equation.
(Score: 5, Insightful) by Anonymous Coward on Saturday June 04 2022, @05:55PM (1 child)
There are loads of data locked behind paywalls; research journals come to mind. Sharing all data from scientific experiments is still not the norm.
Next, there is a lot of data gathered by governments, environmental data comes to mind, which is also still not shared as the norm.
Attempts to make the above datasets publicly available struggle a lot, but somehow when it's MY PRIVATE data and that of other people, we shouldn't expect that we could possibly keep that data private. Sharing MY personal data is apparently the norm now.
This paper is pure bullshit. Add some significant penalties for violating privacy and it will quickly become the norm to stop intruding on people's privacy. (Significant == jail time or fines to bankrupt companies)
(Score: 2, Insightful) by Anonymous Coward on Sunday June 05 2022, @03:06AM
Fines are just the cost of doing business. Start jailing C-suite executives and you'll see real change.
(Score: 5, Insightful) by MIRV888 on Saturday June 04 2022, @05:57PM (4 children)
'but allows analysts to extract a dataset's full value in a highly flexible manner.'
Well there's your problem right there.
(Score: 1, Insightful) by Anonymous Coward on Saturday June 04 2022, @07:15PM
Exactly, what a load of shit. "We can't give away your personal data without someone getting access to it." Duh?
I was going to suggest that each person could have an encrypted (if they want) storage device and take their data with them. Why don't individuals already get their own charts easily? Why must our personal info be stored in centralized files for a long time?
We need enhanced privacy laws like yesterday.
(Score: 2) by acid andy on Saturday June 04 2022, @07:23PM
Welcome to your post truth dystopia advanced utopia, citizen.
error count exceeds 100; stopping compilation
(Score: 1, Insightful) by Anonymous Coward on Saturday June 04 2022, @07:26PM (1 child)
Beat me to it. Darn.
What exactly is the need for this high-quality individual-level data that isn't targeted advertising or surveillance? It's like insisting on 64-bit resolution ADCs in a direct-conversion AM radio. Just make do with modelling populations to an accuracy of 5%, and stop being Evil because you're desperate for a job in academia and scabby enough to enter the field.
(Score: 0) by Anonymous Coward on Sunday June 05 2022, @05:11AM
Well put.
For anyone who didn't get the analogy "insisting on 64-bit resolution ADCs in a direct-conversion AM radio", it's like using a 64-pound bowling ball when all you need is a 13-pounder. AmIRite?
(Score: 4, Insightful) by Runaway1956 on Saturday June 04 2022, @07:20PM (3 children)
There is no solution to the problem stated. None.
Now, let us restate the problem, and its inherent solution.
"There is no good purpose in collecting users' data. The only purpose for data collection is to sell the data forward, for profit. Data collection needs to end."
Or, to state it more succinctly, data sharing is anathema to privacy.
Seriously, people, NO ONE needs all that data. Not your insurance agent, not your doctor, not the DMV, not Town Hall, not the state, and certainly not any of the businesses from which you purchase goods and services.
Just say "NO!" Do not supply data to anyone.
Marital status: "None of your business."
Annual income: "None of your business."
Education level: "None of your business."
Children living with you: "None of your business."
Age: "None of your business."
SSN: "None of your business."
Phone #: "None of your business."
And, on it goes.
“I have become friends with many school shooters” - Tampon Tim Walz
(Score: 5, Interesting) by HiThere on Saturday June 04 2022, @08:25PM (1 child)
There are LOTS of good purposes that have nothing to do with either advertising, personal identification or selling (except after the product is developed).
E.g. there is a lot of information in what genetic codes are common in people who get which diseases. That could be quite useful in diagnosis, and also possibly in researching the causes of the diseases, and thus how to treat them. (The simpler assumptions are often incorrect, but they're also often correct.) Some of these have been/are being researched, such as Thalassaemia. But they are extremely sensitive personal identifiers.
Now, how do you collect the data when you haven't yet identified the genetic correlation? (I say correlation, because you need a lot more evidence to go from that to cause.) Basically all you can do is sample the genetic codes of people who have the disease and relatives who don't. But that's highly identifying personal information.
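One standard compromise for this kind of sensitive collection, offered here only as a hedged illustration and not as anything the commenter proposes, is randomized response: each participant adds noise to their own answer, so no individual record is reliable, yet the population-level rate can still be estimated.

```python
# Randomized-response sketch for collecting a sensitive yes/no attribute
# (e.g. "do you carry variant X?") with plausible deniability for each
# individual. Toy example, not a full genomic-privacy protocol.
import random

def randomized_answer(true_value: bool, p_truth: float = 0.5) -> bool:
    """With probability p_truth report the truth; otherwise report a coin flip."""
    if random.random() < p_truth:
        return true_value
    return random.random() < 0.5

def estimate_rate(reports, p_truth: float = 0.5) -> float:
    """Invert the noise: E[reported yes] = p_truth * true_rate + (1 - p_truth) * 0.5."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth

random.seed(0)
true_population = [random.random() < 0.12 for _ in range(100_000)]  # ~12% carriers
reports = [randomized_answer(v) for v in true_population]
print(f"Estimated carrier rate: {estimate_rate(reports):.3f}")  # close to 0.12
```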
Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
(Score: 2, Insightful) by Anonymous Coward on Sunday June 05 2022, @03:49AM
When data collection advocates talk about "extract(ing) a dataset's full value" they invariably mean monetizing it, regardless of what excuses they make, and the most profitable sales channels are invariably the most nefarious. Knowledge is power, and the more you know about a population the more power you have over them, and governments and large corporations already have far too much power. Giving them a panopticon would make that power unassailable and is thus a direct and material threat to democracy.
(Score: 0, Troll) by Anonymous Coward on Saturday June 04 2022, @10:15PM
Or, you could not help but talk about yourself so much that the minutest details of your life are known to all Soylentils. Medical centre, on Wednesday? Australian shepherds? The wife says, you say? What's for dinner?
(Score: 1, Insightful) by Anonymous Coward on Saturday June 04 2022, @08:36PM (2 children)
i agree that data privacy (umbrella term here) in the digital and interconnected world (computers and their networks) is very very difficult. i'll go as far as to liken the effort required to mining bitcoins, or worse.
srsly, enforcing data privacy will require ever increasing cost (i see some are drooling over this prospect already).
i propose going the other way: there is no data privacy. none. zero. nada. if it (your "private" data) gets put into a computer, it's public.
this way you can look up the prez's health, the nsa director's credit card billings, how many trash bags your neighbour is buying etc etc.
it's only if we keep up the semblance of "data privacy" that some with insight will have leverage over those who don't.
if all data entered into computers is automatically PUBLIC, we lose all the worries that come with the lie of "private data". and yes, the insurance study trying to exclude old people from moving to another insurance will also be public.
what we should think about is: what rights do we have to NOT HAVE TO BE ENTERED at all?
like, obviously we need a social clubbermint id. but maybe, sometimes, we have the right to withhold information from being digitized?
all the "securiti" effort could be saved, and a pure paper department (with drawbacks of course (this will take two weeks, for example)) for certain individuals could be arranged...
(Score: 4, Interesting) by Immerman on Sunday June 05 2022, @05:13AM (1 child)
Radical transparency.
There certainly seems to be a lot to be said for it at first glance, though it's a bitter pill to swallow.
On deeper consideration, though, you'll note that even though you know everything about the powerful, there's not actually a whole lot you can do with that information. The powerful, on the other hand, are perfectly positioned to exploit that information as mercilessly as they wish, to make life miserable for anyone who threatens their position or whom they just don't approve of.
You're a member of a political reformation movement? They know your plans almost as soon as you do, and you're not accomplishing anything. Gay? Unless you're so far in the closet even you're not sure you're really in there, every bigot in town will know it. Etc.
(Score: 0) by Anonymous Coward on Sunday June 05 2022, @01:08PM
yes, prolly gay 'cause i think pregnant women should not work. somebody's gotta look after the base (either way, or you don't care). some things can be better accomplished by one gender ... stuff like that.
the problem is not about being swatted by "the elite". i am being, every day (and you prolly are too, since you're prolly not elite). can we assume that it is secret information that gives them that extra leg up?
i am saying that promising "data privacy" knowing full well that it will leak ... well, zer0day hoarding (financed by your taxes), lobbying, endless update cycles, closed source ... basically, trying to hide information in a world that has split the atom (global annihilation possible) and has computers and networks is shoe-horning an old idea onto a new, better world that isn't really compatible with it.
"too many secrets", not for a few but for all?
also, the cost (energy and time) to keep secrets, where walking straight in ANY direction will lead back to the same place (prison; secrets allow for prisons inside prisons)
(Score: 3, Informative) by pdfernhout on Sunday June 05 2022, @01:57AM (3 children)
Lawrence Lessig wrote in Code 2.0 that there are at least four ways of shaping human behavior: rules, norms, prices, and architecture.
Laws (rules) can control what data is stored where (as in Europe).
Norms (enforced by software engineers and managers and executives) can shape what is done.
Fines (prices) for misusing data, or subsidies for using data wisely/ethically, can make a difference. Advertising-based data collection could be shaped by, say, charging government fees for targeted advertising or whatever.
Better software and systems (architecture) that support privacy, whether using local storage, peer-to-peer exchanges, or some other approach like tagging and filtering via metadata, can help make it easier to do the right thing.
Also along the lines of changing social architecture, there could be broader economic changes shifting business models -- like if the balance in society shifts from mainly exchange-based transactions more to subsistence production, gift exchange, and government-planned activities.
We can also ask why the loss of data privacy is an issue in the first place, and try to address the root concerns related to jobs, relationships, financial security, and so on.
David Brin's "The Transparent Society" is another path:
https://en.wikipedia.org/wiki/The_Transparent_Society
"The Transparent Society (1998) is a non-fiction book by the science-fiction author David Brin in which he forecasts social transparency and some degree of erosion of privacy, as it is overtaken by low-cost surveillance, communication and database technology, and proposes new institutions and practices that he believes would provide benefits that would more than compensate for lost privacy. ... Brin argues that it will be good for society if the powers of surveillance are shared with the citizenry, allowing "sousveillance" or "viewing from below," enabling the public to watch the watchers. According to Brin, this only continues the same trend promoted by Adam Smith, John Locke, the US Constitutionalists and the western enlightenment, who held that any elite (whether commercial, governmental, or aristocratic) should experience constraints upon its power. And there is no power-equalizer greater than knowledge. ..."
The biggest challenge of the 21st century: the irony of technologies of abundance used by scarcity-minded people.
(Score: 5, Insightful) by Immerman on Sunday June 05 2022, @05:18AM (1 child)
The problem with transparency is that the powerful are inherently in a much better position to abuse that information than the masses are.
(Score: 1, Interesting) by Anonymous Coward on Sunday June 05 2022, @12:45PM
chicken or egg?
they are powerful because they have information -or- they have information (others don't have) which makes them powerful.
i suppose it just boils down to: can you be powerful without information?
and then: do you use power to limit information (for others), like dumbing down the surroundings (i am looking a bit sideways through squinting eyes at social media here), or do you use the power to gain more information?
i just feel it's miserable to limit an "information storing, gathering and processing" device/technology.
it's like trying to force a fish to NOT swim.
so, "hacking" really is acquiring information thru unconventional means and a shitty interface :]
note: data reading should be public; data modification is a different can of worms.
(Score: 0) by Anonymous Coward on Monday June 06 2022, @05:30PM
As a society, we need to change our concepts of both privacy and consent.
Privacy is straightforward: You are not allowed to use your knowledge of me without my consent, no matter how you acquired that knowledge. If you do, I get to sue you into bankruptcy.
Consent is more complicated: We need a consolidated and streamlined way of granting consent. So, I have an agent in the cloud somewhere to which I have told my preferences, and when you need my consent to send me targeted advertising, for example, my agent will tell you yes, or no, or only if you pay me $0.02, or whatever other conditions I want to put on my consent. And then you have the option of either meeting my conditions, or not invading my privacy.
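A minimal sketch of what such a consent agent could look like, with invented names and a hypothetical preference format (nothing here corresponds to any real service or API):

```python
# Hypothetical consent-agent sketch: evaluates an incoming data-use request
# against the owner's stored preferences and returns a decision. All names
# and fields are invented for illustration.
from dataclasses import dataclass

@dataclass
class Decision:
    allowed: bool
    price_usd: float = 0.0      # price the requester must pay, if any
    reason: str = ""

class ConsentAgent:
    def __init__(self, preferences: dict):
        # preferences maps a purpose (e.g. "targeted_advertising") to either
        # "deny", "allow", or a price in dollars the owner demands.
        self.preferences = preferences

    def evaluate(self, requester: str, purpose: str) -> Decision:
        rule = self.preferences.get(purpose, "deny")   # default: deny
        if rule == "deny":
            return Decision(False, reason=f"{purpose} is never allowed")
        if rule == "allow":
            return Decision(True)
        return Decision(True, price_usd=float(rule),
                        reason=f"{requester} must pay ${rule} per contact")

agent = ConsentAgent({"targeted_advertising": 0.02, "medical_research": "allow"})
print(agent.evaluate("AdCo", "targeted_advertising"))  # allowed, at $0.02
print(agent.evaluate("AdCo", "location_tracking"))     # denied by default
```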