RT Journal Article
SR Electronic
T1 Privacy-preserving generative deep neural networks support clinical data sharing
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 159756
DO 10.1101/159756
A1 Brett K. Beaulieu-Jones
A1 Zhiwei Steven Wu
A1 Chris Williams
A1 Ran Lee
A1 Sanjeev P. Bhavnani
A1 James Brian Byrd
A1 Casey S. Greene
YR 2018
UL http://biorxiv.org/content/early/2018/12/20/159756.abstract
AB Background: Data sharing accelerates scientific progress, but sharing individual-level data while preserving patient privacy presents a barrier. Methods and Results: Using pairs of deep neural networks, we generated simulated, synthetic “participants” that closely resemble participants of the SPRINT trial. We showed that such paired networks can be trained with differential privacy, a formal privacy framework that limits the likelihood that queries of the synthetic participants’ data could identify a real participant in the trial. Machine-learning predictors built on the synthetic population generalize to the original dataset. This finding suggests that the synthetic data can be shared with others, enabling them to perform hypothesis-generating analyses as though they had the original trial data. Conclusions: Deep neural networks that generate synthetic participants facilitate secondary analyses and reproducible investigation of clinical datasets by enhancing data sharing while preserving participant privacy.