Abstract
Motivation Whole-genome DNA sequencing (WGS) enables the discovery of non-coding variants, but tools to prioritize the subset that functionally impacts human phenotypes are lacking. DNA sequence variants that disrupt or create transcription factor binding sites (TFBS) can modulate gene expression. find-tfbs efficiently scans phased WGS data from large cohorts to identify and count TFBSs in regulatory sequences. These per-individual TFBS counts can then be used in association testing to find putatively functional non-coding variants associated with complex human diseases or traits.
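As a rough illustration of the counting idea described above, the following minimal sketch builds the two haplotype sequences of one individual for a single regulatory region and counts motif matches on both strands. It is not the find-tfbs implementation: the toy position weight matrix (an exact-match "GATA" scorer), the threshold, the SNV-only variant model, and all function names are hypothetical assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy scoring matrix for the 4-bp motif "GATA" (assumption, not from the paper).
PWM: List[Dict[str, float]] = [
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
]
THRESHOLD = 4.0  # with this toy matrix, only exact "GATA" windows reach the threshold

Snv = Tuple[int, str]  # (0-based offset within the region, alternate base)


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_sites(seq: str, pwm: List[Dict[str, float]], threshold: float) -> int:
    """Count windows on either strand whose summed score reaches the threshold."""
    width = len(pwm)
    count = 0
    for strand in (seq, revcomp(seq)):
        for i in range(len(strand) - width + 1):
            score = sum(pwm[j][strand[i + j]] for j in range(width))
            if score >= threshold:
                count += 1
    return count


def apply_snvs(ref: str, snvs: List[Snv]) -> str:
    """Build one haplotype sequence by substituting alternate bases into the reference."""
    bases = list(ref)
    for offset, alt in snvs:
        bases[offset] = alt
    return "".join(bases)


def tfbs_count(ref: str, hap1: List[Snv], hap2: List[Snv]) -> int:
    """Sum the motif counts over the two phased haplotypes of one individual."""
    return sum(count_sites(apply_snvs(ref, hap), PWM, THRESHOLD) for hap in (hap1, hap2))


# Toy region: the reference carries no GATA site; a C>A change at offset 5 creates one.
REGION = "CCGATCCC"
heterozygous = tfbs_count(REGION, [(5, "A")], [])
homozygous_alt = tfbs_count(REGION, [(5, "A")], [(5, "A")])
```

In this toy setting the per-individual count (0, 1 or 2 sites in the region) plays the role of the quantity that would be tested for association with a trait. Phasing matters because the count depends on which variants sit together on the same haplotype.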
Results We applied find-tfbs to discover functional non-coding variants associated with hematological traits in the NHLBI Trans-Omics for Precision Medicine (TOPMed) WGS dataset (Nmax=44,709). We identified >2000 associations at P<1×10⁻⁹, implicating specific blood-cell types, transcription factors and causal genes. The vast majority of these associations are captured by variants identified in large genome-wide association studies (GWAS) for blood-cell traits. find-tfbs is computationally efficient and robust, allowing for the rapid identification of non-coding variants associated with multiple human phenotypes in very large samples.
Contacts sebastian.meric.de.bellefon{at}umontreal.ca and guillaume.lettre{at}umontreal.ca
Supplementary information Supplementary data are available.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
# The TOPMed membership is available online at: https://www.nhlbiwgs.org/topmed-banner-authorship