SoylentNews Comments | Two Methods to De-identify Large Patient Datasets Greatly Reduced the Risk of Re-identification

Two Methods to De-identify Large Patient Datasets Greatly Reduced the Risk of Re-identification

posted by Fnord666 on Tuesday August 01 2017, @07:11PM

from the who's-on-first dept.

"exec" writes:

Two de-identification methods, k-anonymization and adding a "fuzzy factor," significantly reduced the risk of re-identification of patients in a dataset of 5 million patient records from a large cervical cancer screening program in Norway, according to results published in Cancer Epidemiology, Biomarkers & Prevention, a journal of the American Association for Cancer Research.
"Researchers typically get access to de-identified data, that is, data without any personal identifying information, such as names, addresses, and Social Security numbers. However, this may not be sufficient to protect the privacy of individuals participating in a research study," said Giske Ursin, MD, PhD, director of Cancer Registry of Norway, Institute of Population-based Research.
Patient datasets often have sensitive data, such as information about a person's health and disease diagnosis that an individual may not want to share publicly, and data custodians are responsible for safeguarding such information, Ursin added. "People who have the permission to access such datasets have to abide by the laws and ethical guidelines, but there is always this concern that the data might fall into the wrong hands and be misused," she added. "As a data custodian, that's my worst nightmare."
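For readers who haven't met the two techniques named in the summary, here is a minimal sketch of the general idea, not the method used in the paper. A dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The summary does not say how the "fuzzy factor" was applied, so the date shift below is purely an assumption, and the field names (`birth_year`, `region`, `screened`) and helper functions are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta
import random


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values(), default=0)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    return smallest_group(records, quasi_identifiers) >= k


def fuzz_date(d, max_days, rng):
    """Assumed 'fuzzy factor': shift a date by a random number of days in [-max_days, max_days]."""
    return d + timedelta(days=rng.randint(-max_days, max_days))


# Hypothetical toy records; the real dataset's fields are not given in the summary.
records = [
    {"birth_year": 1970, "region": "A", "screened": date(2010, 3, 14)},
    {"birth_year": 1970, "region": "A", "screened": date(2011, 6, 2)},
    {"birth_year": 1985, "region": "B", "screened": date(2012, 9, 30)},
]

if __name__ == "__main__":
    rng = random.Random(0)
    print(is_k_anonymous(records, ["birth_year", "region"], 2))
    print([fuzz_date(r["screened"], 30, rng) for r in records])
```

The third record is alone in its (birth_year, region) group, so the toy data is not 2-anonymous; generalizing or suppressing that record's values would be the usual next step.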

http://www.aacr.org/Newsroom/Pages/News-Release-Detail.aspx?ItemID=1074

Journal reference:
Giske Ursin, Sagar Sen, Jean-Marie Mottu and Mari Nygård, Protecting Privacy in Large Datasets—First We Assess the Risk; Then We Fuzzy the Data, Cancer Epidemiology, Biomarkers & Prevention, http://dx.doi.org/10.1158/1055-9965.EPI-17-0172

-- submitted from IRC

can't leak data that you don't collect (Score: 3, Interesting) by Runaway1956 (2926) on Wednesday August 02 2017, @12:04AM (#547789)

Why is it necessary to put all that identifying information into the database to start with? Your family doctor can treat you for whatever ails you, taking all of your information. Insurance, address, etc., ad nauseam. All those fields on his forms should just be flagged, so that those data bits never leave his office. If the data is never input into the database, the database can't leak the data.

Of course, it becomes a minor issue to determine what must and must not be included in the data. Age is pertinent to many medical research projects. Ethnic background is important to some others. Medical people often demand information that is probably irrelevant to a lot of research, such as place of birth, number of siblings, and more. Being a twin/trip/octo MIGHT be important to some research, but that bit of data need not be available to the entire world of medical personnel.

Clean up the input, and the output will require a lot less attention for "security".

Re:can't leak data that you don't collect (Score: 0) by Anonymous Coward on Wednesday August 02 2017, @03:13PM (#547950)

Ah, just stick it in "The Cloud". Hey, ask IBM, like Sweden (Norway's neighbour) did. Certainly it will be fine, and secure!
