Will Data Aggregation For Precision Medicine Compromise Patient Privacy?

Posted on April 10, 2017 | Written By

Anne Zieger is a healthcare journalist who has written about the industry for 30 years. Her work has appeared in all of the leading healthcare industry publications, and she’s served as editor in chief of several healthcare B2B sites.

Like anyone else who follows medical research, I’m fascinated by the progress of precision medicine initiatives. I often find myself explaining to relatives that in the (perhaps far distant) future, their doctor may be able to offer treatments customized specifically for them. The prospect is awe-inspiring even for me, someone who’s been researching and writing about health data for decades.

That said, there are problems in bringing so much personal information together into a giant database, suggests Jennifer Kulynych in an article for OUPblog, which is published by Oxford University Press. In particular, bringing together a massive trove of individual medical histories and genomes may have serious privacy implications, she says.

In arguing her point, she makes a sobering observation that rings true for me:

“A growing number of experts, particularly re-identification scientists, believe it simply isn’t possible to de-identify the genomic data and medical information needed for precision medicine. To be useful, such information can’t be modified or stripped of identifiers to the point where there’s no real risk that the data could be linked back to a patient.”

As she points out, norms in the research community make it even more likely that patients could be individually identified. For example, while a doctor might need your permission to test your blood for care, in some states it’s quite legal for a researcher to take possession of blood not needed for that care, she says. Those researchers can then sequence your genome and place that data in a research database without your ever having consented to it, or even knowing that it happened.

And there are other, perhaps even more troubling ways in which existing laws fail to protect the privacy of patients in researchers’ data stores. For example, current research and medical regs let review boards waive patient consent or even allow researchers to call DNA sequences “de-identified” data. This flies in the face of the growing expert view that such data carries a real re-identification risk, she writes.

On top of all of this, the technology already exists to leverage this information for personal identification. For example, genome sequences can potentially be re-identified through comparison to a database of identified genomes. Law enforcement organizations have already used genomic data to predict key aspects of an individual’s appearance (such as eye color and race).
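
To make the comparison idea concrete, here is a minimal sketch of how a "de-identified" genome could in principle be matched against a database of identified ones. Everything in it is an assumption made for illustration: the marker names (m1 to m5), the genotype values, the people, the 0.95 threshold, and the helper functions `match_fraction` and `reidentify` are all hypothetical and do not come from Kulynych's article. Real matching works on far more markers and uses statistical models, so treat this as a toy, not as a description of any actual tool.

```python
from typing import Dict, Optional

# A genotype maps a marker name to the value observed at that marker.
# All marker names and values here are invented for illustration.
Genotype = Dict[str, str]


def match_fraction(a: Genotype, b: Genotype) -> float:
    """Return the share of markers present in both genotypes that agree."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    same = sum(1 for marker in shared if a[marker] == b[marker])
    return same / len(shared)


def reidentify(sample: Genotype,
               reference: Dict[str, Genotype],
               threshold: float = 0.95) -> Optional[str]:
    """Return the best-matching name in the reference database,
    or None if no entry reaches the (assumed) threshold."""
    best_name, best_score = None, 0.0
    for name, genotype in reference.items():
        score = match_fraction(sample, genotype)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Hypothetical database of genomes with names attached.
reference_db = {
    "Person A": {"m1": "AA", "m2": "AG", "m3": "GG", "m4": "AA", "m5": "AG"},
    "Person B": {"m1": "AG", "m2": "AG", "m3": "AA", "m4": "GG", "m5": "AA"},
}

# A research record with the name removed but the genotype intact.
deidentified = {"m1": "AG", "m2": "AG", "m3": "AA", "m4": "GG", "m5": "AA"}

# Someone who does not appear in the database at all.
unknown = {"m1": "GG", "m2": "AA", "m3": "AG", "m4": "AG", "m5": "GG"}

print(reidentify(deidentified, reference_db))
print(reidentify(unknown, reference_db))
```

The point of the sketch is simply that removing a name does nothing if the genotype itself acts as the identifier: the record matches one person in the database, while the stranger's does not match anyone.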

Then there’s the issue of what happens with EMR data storage. As the author notes, healthcare organizations are increasingly adding genomic data to their stores, and sharing it widely with individuals on their network. While such practices are largely confined to academic research institutions today, this type of data use is growing, and could also expose patients to involuntary identification.

Not everyone is as concerned as Kulynych about these issues. For example, a group of researchers recently concluded that a single patient anonymization algorithm could offer a “standard” level of privacy protection to patients, even when the organizations involved are sharing clinical data. They argue that larger clinical datasets that use this approach could protect patient privacy without generalizing or suppressing data in a manner that would undermine its usefulness.

But if nothing else, it’s hard to argue with Kulynych’s central concern, that too few rules have been updated to reflect the realities of big genomic and medical data stores. Clearly, state and federal rules need to address the emerging problems associated with big data and privacy. Otherwise, by the time a major privacy breach occurs, neither patients nor researchers will have any recourse.