RT Journal Article
SR Electronic
T1 FANCY: Fast Estimation of Privacy Risk in Functional Genomics Data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 775338
DO 10.1101/775338
A1 Gamze Gürsoy
A1 Charlotte M. Brannon
A1 Fabio C.P. Navarro
A1 Mark Gerstein
YR 2020
UL http://biorxiv.org/content/early/2020/02/11/775338.abstract
AB Functional genomics data is becoming clinically actionable, raising privacy concerns. However, quantifying the privacy leakage from genotyping is difficult due to the heterogeneous nature of sequencing techniques. Thus, we present FANCY, a tool that rapidly estimates the number of leaking variants from raw RNA-Seq, ATAC-Seq and ChIP-Seq reads, without explicit genotyping. FANCY employs supervised regression using overall sequencing statistics as features and provides an estimate of the overall privacy risk before data release. FANCY can predict the cumulative number of leaking SNVs with an average R² of 0.95 across all independent test sets. Because accurate prediction is important even when the number of leaked variants is low, we also developed a special version of the model that makes more accurate predictions when only a few variants leak. Python and MATLAB implementations of FANCY, as well as custom scripts to generate the features, can be found at https://github.com/gersteinlab/FANCY. We also provide Jupyter notebooks so that users can optimize the parameters of the regression model based on their own data. An easy-to-use webserver that takes inputs and displays results can be found at fancy.gersteinlab.org.
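
The abstract describes supervised regression from overall sequencing statistics to a count of leaking variants, scored by R² on independent test sets. The sketch below illustrates only that general idea and is not FANCY's actual model or feature set: it assumes a single hypothetical feature (`depth`), uses ordinary least squares as a stand-in regressor, and runs on synthetic toy numbers invented for illustration. The function names `fit_line` and `r_squared` are likewise hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y ~ slope * x + intercept (one feature)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic toy values (assumption): a hypothetical sequencing statistic and
# a made-up count of leaking variants. These are NOT data from the paper.
train_depth = [1.0, 2.0, 3.0, 4.0, 5.0]
train_leaking = [12.0, 19.0, 33.0, 41.0, 48.0]

# Held-out samples, standing in for an independent test set.
test_depth = [6.0, 7.0, 8.0]
test_leaking = [60.0, 66.0, 79.0]

slope, intercept = fit_line(train_depth, train_leaking)
test_pred = [slope * d + intercept for d in test_depth]
r2_test = r_squared(test_leaking, test_pred)

print(f"slope={slope:.2f} intercept={intercept:.2f} held-out R^2={r2_test:.3f}")
```

Scoring on samples that were not used for fitting mirrors the abstract's emphasis on independent test sets; the released implementation and notebooks at the repository above are the reference for the real features and model.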