Abstract
Over recent decades, genome-wide association studies (GWAS) have dramatically changed the understanding of human genetics. A recent genetic data release by UK Biobank has allowed many researchers worldwide to have comprehensive look into the genetic architecture of thousands of human phenotypes. In this study, we developed a novel statistical framework to assess phenome-wide significance and genetic pleiotropy across the human phenome based on GWAS summary statistics. We demonstrate widespread sharing of genetic architecture components between distinct groups of traits. Apart from known multiple associations inside the MHC locus, we discover high degree of pleiotropy for genes involved in immune system function, apoptosis, hemostasis cascades, as well as lipid and xenobiotic metabolism. We find several notable examples of novel pleiotropic loci (e.g., the MIR2113 microRNA broadly associated with cognition), and provide several possible mechanisms for these association signals. Our results allow for a functional phenome-wide look into the shared components of genetic architecture of human complex traits, and highlight crucial genes and pathways for their development.
Introduction
Over the recent decades, many advances have been made in uncovering the genetic architecture of human complex traits. Genome-wide association studies (GWAS), emerging at the onset of the millennium, have facilitated the discovery of susceptibility loci for both complex disease and quantitative traits (reviewed in Mills and Rahal1). Most recent GWAS for common phenotypes such as type 2 diabetes include up to 900,000 individuals and allow to identify numerous novel risk loci2. Many methods have been developed to interpret the GWAS signal on both single-variant and genome scale, including a group of specialized gene set enrichment tests3–5 aimed at discovery of biological pathways and mechanisms related to complex traits. Integration of GWAS and expression quantitative trait loci (eQTL) data has also shed light onto the role of gene expression in complex traits, and allowed for identification and validation of causal genes6.
Advances in biobanking, genetic data acquisition, and analysis have allowed first glimpse into the genetic architecture of multiple traits at once, with the UK Biobank (UKB) genetic dataset of 500,000 individuals being extensively used for diverse studies in statistical genetics7. These included, for example, estimation of natural selection effects on complex trait-associated variation8, and development of trait-specific risk scores for identification of individuals at high risk for complex disease9. Genetic associations in the UK Biobank data have attracted the attention of researchers worldwide, and several efforts have been made to aggregate genetic association data across all traits10–12. These studies highlighted the complexity of the human phenome, and pinpointed notable loci with multiple associations (e.g., the MHC locus).
Several attempts have been made to assess the association of individual variants or genes across the whole phenome (phenome-wide association studies, PheWAS) (e.g., Schmidt et al13), though no unified statistical framework to interpret such phenome-wide associations has been proposed. Pleiotropy of trait-associated variants in the human genome has also attracted lots of attention in the field; and Mendelian randomization based approaches have been proposed to detect pleiotropy in GWAS data14. At the same time, UK Biobank dataset offers an excellent opportunity to analyze shared components of the genetic architecture of human phenotypes on the phenome-scale, with global overviews suggesting widespread signals of pleiotropy in the UK Biobank data15,16. In this study, we developed a statistical framework to explore the landscape of phenome-wide associations in GWAS summary statistics derived from UK Biobank dataset, and identified multiple shared blocks of genetic architecture of diverse human complex traits.
Results
Complexity reduction by clustering of similar traits
To conduct a genome-wide scan for pleiotropic loci, we first obtained sets of significantly associated SNPs for all phenotypes in the UK Biobank data using pre-calculated GWAS summary statistics provided by Benjamin Neale’s lab (release 1, downloaded 2018-02-25). The dataset included both standard GWAS loci as well as imputed variants, totaling 10,894,597 variants. We focused our analysis only on 543 complex traits that have significant non-zero partitioned heritability estimates (h2). A total of 469,013 (4.27%) SNPs had at least one phenotype associated at genome-wide significant level, with an average of 4.34 phenotypes associated with each locus. Interestingly, we observed numerous multiple associations across the dataset, with 230296 (49.21%) of SNPs having > 1 associated phenotype, and more than 10 associated phenotypes for 57,856 (12.34%) SNPs. While many of these multiple associations could be true pleiotropic variants, much of the signal likely arises from statistical artifacts or multiple highly correlated phenotypes. Hence, we went on to cluster traits that share a significant proportion of their genetic architecture. We applied the Jaccard distance metric as a simple and efficient way that can group traits based on pre-selected significant SNPs. Hierarchical clustering based on Jaccard distance between phenotypes identified 307 independent trait clusters with 1.76 phenotypes on average. Among these, 29 clusters comprised 3 or more phenotypes (see Extended Data Figure 1). The clustering procedure has dramatically decreased the amount of SNPs having multiple associations (92,224 (19.7%) SNPs with more than one association compared to 230,296 in non-clustered data) (see Extended Data Figure 2); as well as the average number of associated trait clusters (1.33 cluster per variant). Notably the number of pleiotropic SNPs with more than 10 associations has dramatically dropped to 0.014% (64 variants) after performing clusterization procedure (Figure 1d-e). Interestingly, variants that have > 5 associated clusters also have higher minor allele frequencies, i.e. most rare variants are not pleiotropic (Figure 1e). Similar results were obtained when comparing SNPs with less or more than 15 associated individual phenotypes (see Extended Data Figure 3), indicating that the clustering procedure didn’t affect the relationship between MAF and pleiotropy.
Investigation of molecular pathways relevant to major trait clusters
We then focused on investigation of the genetic architecture of the major trait clusters. Many of the identified clusters comprised biologically similar complex traits or same phenotypes in different encoding (ICD-10 and self-reported disease status, etc.). As a way to decipher the molecular basis of traits and clusters, we used a modified version of the gene set enrichment analysis we call Locus Set Enrichment Analysis (LSEA) which accounts for linkage disequilibrium (LD) between SNPs and can work with few significantly associated SNPs for each trait (see Online methods for description of the method).
To focus solely on shared genetic architecture components in each cluster, we considered loci which were significantly associated with 2 or more phenotypes for each cluster. In contrast to other published methods3,5, LSEA approach sensitively identified enrichment for coagulation hallmark genes and specific coagulation pathways (e.g., intrinsic pathway, extrinsic pathway) for the cluster 35 comprising blood clotting and other related phenotypes (Figure 2a-b) accurately accounting for linkage between functionally related genes.
We next made functional annotation of all identified multi-phenotype clusters using the LSEA method. Cluster 21, comprising anthropometric traits, such as height, whole body weight, and properties of body parts and fat percentage, was enriched for diverse gene sets including extracellular matrix components and signalling pathways, such as GPCRs pathway, MAPK cascade and tyrosine kinases (see Extended Data Figure 4). For hypertension-related cluster 33, LSEA identified enrichments for platelet signalling, blood vessel formation, and hemostasis. Interestingly, we also observed highly significant enrichment for GPCR signaling pathway, which is concordant with investigation of GPCR signaling regulation in hypertension17. Yet another cluster comprised phenotypes characterized by high cholesterol level, with enriched gene sets predominantly related to lipid digestion, mobilization and transport, and small molecule metabolism. Genes from these gene sets were also overrepresented in the association signal of cluster 34, which included most heart pathologies, such as heart attack, angina, and acute myocardial infarction (see Extended Data Figure 7). Finally, cluster 36 related to autoimmune diseases such as asthma, hypo- and hyperthyroidism, and psoriasis, was expectedly enriched for immune system signaling pathways, e.g. cytokine-related ones (see Extended Data Figure 9).
Overall, we observed substantial overlap between gene sets enriched in major trait clusters. We identified 8393 gene sets that showed significant enrichment for at least 2 different phenotypes or clusters (760 of which belonged to the hallmark gene sets or canonical pathways set defined by Molecular Signatures Database (MSigDB)18,19. Of these, more than half had 50 or more phenotypes with enrichment signal (Figure 2c). We also investigated the enrichment statistics for another approach based on selection of gene sets enriched for at least 2 phenotypes in each cluster (Extended Data Figure 10). The results of such analysis were highly concordant with the ones obtained by using shared associated loci.
We then turned our attention to the analysis of gene sets which were highly enriched in the major trait clusters. To this end, we selected top-100 enriched gene sets for each cluster, and focused on gene sets having 2 or more enrichments after such filtering. Quite expectedly, we observed that most of the shared enriched gene sets belonged to the immune system, metabolism, and extracellular matrix (Figure 2d). Of these, hemostasis cascade elements, matrisome components, and xenobiotic metabolism genes were enriched in the largest number of trait clusters. We also assessed the enrichment of association signal inside two large regions of chromosome 6, the MHC region and the histone gene cluster 1, which is known to harbor multiple binding sites for the master regulator of MHC genes, CIITA20. We found 12 trait clusters to be significantly associated with variation inside the MHC locus, supporting its pivotal role in complex traits.
Analysis of highly pleiotropic loci
As we observed a large amount of shared gene sets in our functional annotation studies (Figure 2d), we went on to systematically identify specific variants and loci involved in multiple complex traits and trait clusters. To this end, we first constructed a statistical model based on Poisson distribution that accounts for the randomly expected number of associated clusters for SNPs with different minor allele frequency (MAF), with such model accurately predicting the proportion of signal-harboring SNPs for each MAF bin. We used the constructed model to obtain the empirical p-value of multiple associations for each SNP given its MAF, and selected variants that showed genome-wide significance (P < 5 * 10-9); i.e. the ones involved in many more associations than expected by a random coincidence. Such filtering resulted in 2,000 genome-wide significant pleiotropic loci corresponding to 53 independent chromosomal regions. Pathway analysis of such pleiotropic loci identified strong enrichment for diverse immunity-related molecular pathways including complement system, chromosome maintenance, and hemostatic cascades.
We then went on to analyze the functional impact of the identified pleiotropic variants. We identified a very strong enrichment for cis-eQTLs (data collected from the Genotype Tissue Expression (GTEx) project21) in the set of pleiotropic variants (1953/2191 compared to 247,551/469,013, p-value = 1.2 * 10-293), suggesting that variants with high pleiotropic effect tend to have a significant influence on gene expression (Figure 3c). These results are in good concordance with the other reports on genetic pleiotropy in the UKB dataset16. We also assessed the prevalence of certain functional variant types in the previously defined set of pleiotropic variants. We detected significant overrepresentation of missense variants (46/2191 compared to 3073/469013, Fisher p-value = 2.4 * 10-11), as well as the other functionally important variant classes, among the pleiotropic variants compared to variants associated with at least one trait (see Extended Data Figure 11). Furthermore, we identified a strong enrichment for open chromatin and promoter marks (H3K4Me3, H3K27Ac, TBP, DNAse hypersensitivity) in the set of pleiotropic loci using genomic track enrichment analysis (ChIP-Atlas22, Figure 3d).
We then set off to characterize the specific sets of loci shared by different clusters. Firstly, we investigated the overall structure of the pleiotropy network represented as weighted graph with nodes corresponding to trait clusters and edges corresponding to shared pleiotropic loci defined above. 62 trait clusters shared at least one pleiotropic locus with the others (see Extended Data Figure 12), with each pair of clusters shared 2.4 loci on average (see Extended Data Figure 13). The anthropometric trait-related cluster (21) shared at least one pleiotropic locus with all the others, while several other clusters (11, 33, and 36) shared loci with a large number of clusters.
We then continued by selecting highly connected subgraphs of the pleiotropy network (combinations of nodes sharing a specific set of loci) (see Methods). While the most noticeable sets of shared pleiotropic loci belonged to the MHC complex, we found numerous interesting examples of shared groups of loci (shown in Figure 3e). As expected, larger combinations of trait clusters tended to share fewer common loci (see Extended Data Figure 14).
One of the most notable loci included the FTO gene, which was found to be associated with 14 clusters. This finding is concordant with previous reports suggesting this gene to be connected to obesity and many obesity-related comorbidities23. Variants in the FTO gene affecting body composition and metabolic rates, are likely to increase risk of diabetes (clusters 14, 34), vascular problems (33), and lipid metabolism disorders (73). Another interesting single-gene pleiotropic locus comprised the MSRA gene associated with trait clusters related to anxiety (6), neuroticism (11), and sleep duration (7). This gene has already been linked to neuroticism24. We also found the LPA gene to be connected to clusters comprising diverse heart problems and increased cholesterol level, including myocardial infarction and chronic ischemic heart disease (belonging to cluster 34), high cholesterol level (73), and ezetimibe intake (24). The relationship between SNPs in lipoprotein(a) [Lp(a)] and major adverse cardiovascular (MACE) events has also previously been reported25.
Among the other genes, ABO was found to be associated with trait clusters related to a wide variety of vascular phenotypes such as warfarin treatment (72), blood pressure dysregulation (33), embolism (35), and heel bone mineral density (195). While the role of ABO in thrombosis has been excessively studied26, the overall significance of this locus to the majority of cardiovascular phenotypes has not been previously investigated. Yet another group of loci included APOE and TOMM40 genes associated with body mass index (21), diverse lipid-metabolism associated phenotypes (73), Alzheimer’s disease (25). Observed data is quite convergent with studies of APOE and TOMM40 alleles causing Alzheimer’s disease27, and further support a solid link between lipid metabolism and diverse complex phenotypes.
One of the most interesting pleiotropic loci included the MIR2113 gene encoding a microRNA of unknown function. This locus is associated with several phenotypes related to cognitive characteristics, including college or university degree qualifications (cluster 11) and fluid intelligence score (cluster 153), as well as ion concentrations (sodium in urine (181), potassium in urine (153)). Previously, MIR2113 was shown to play a role in memory processes28. However, the exact mechanism of its function has never been reported. Our data on its involvement in maintenance of ionic balance, together with a strong enrichment for potassium channel genes among its predicted targets according to miRBase (Extended Data Figure 15), suggest that MIR2113 regulates cognitive processes through regulation of ion channel gene expression, which is important for understanding the development of cognition-related traits.
Discussion
Large-scale genetic datasets, such as the UK Biobank or the Genome Aggregation Database (gnomAD)29 allow the first ever look into the ‘Holy Grail’ of human genetics research, i.e. the map of genotype-phenotype relationships. In our study, we aimed at identification of highly pleiotropic loci in the human genome and dissecting the key genetic architecture components relevant to human complex traits. The UKB genetic dataset contains a massive amount of similar and/or related traits which are commonly separated into several major domains16,30. To overcome this limitation and group traits that share a significant proportion of their genetic architecture independently of their arbitrary classification, we used a hierarchical clustering approach using the simple Jaccard distance metric calculated using sets of significantly associated variants. While such an approach could misclassify phenotypes with low signal-to-noise ratio, it efficiently clusters similar and identical phenotypes. The results of such clustering approach are concordant with the other efforts to computationally classify traits in the UKB dataset12,16. Notably, our clustering procedure identified close relationships between traits that have been demonstrated as comorbid to each other. For instance, deep venous thrombosis and pulmonary embolism combine in one cluster (35) with diverse thromboembolisms. It was clearly shown that deep venous thrombosis significantly increases the risk of pulmonary embolism25. Grouping of asthma, diabetes, and increased risk of nasal polyps (cluster 14) is also consistent with known data31,32. Therefore, our results are in good concordance with existing biological and clinical knowledge.
One of the major results presented in this work is the comprehensive functional annotation of GWAS signal for individual traits and trait clusters. To perform such functional annotation using a limited set of significant variants, we’ve constructed an LD-aware algorithm for GSEA (LSEA), which allowed us to identify crucial molecular pathways involved in multiple groups of phenotypes. For example, we discovered multiple associations for genes involved in innate and adaptive immunity (Figure 2d), as well as an overwhelming enrichment for variants inside the HLA locus. These results provide additional evidence that variation in genes and proteins involved in immune system functioning affects a large number of developmental and physiological processes. Similar observations have been made previously using site-level PheWAS33. Interestingly, we also discover high degree of pleiotropy for the loci inside the HIST1 gene cluster, which is known to harbor multiple binding sites for the CIITA transactivator protein that regulates MHC gene expression34. We also observe stronger association with traits from the smoking-related cluster (24) for variants inside the HIST1 gene cluster than for the MHC genes themselves. This association signal provides new mechanistic insights into the previously reported association of the histone genes with increased risk of squamous cell lung carcinoma35. Another set of genes involved in multiple complex trait clusters comprises the matrisome components. Matrisome genes are involved in the highest amount of individual phenotypes and clusters, including the anthropometric trait cluster 21. This is not surprising, regarding functional significance of ECM proteins in governing body compositions and their previously reported role in diverse diseases36.
Deciphering the mechanisms of complex traits and their co-morbidity is a capstone for GWAS data interpretation and further clinical application of the results37. In this regard, Mendelian randomization (MR) approaches could be quite useful when analyzing chains of causal effects between genetic variants and their epistatic interactions that lead to phenotypic manifestation. MR approaches have successfully been applied to investigation of genetic pleiotropy and identification of variants influencing multiple traits38. In our study, we applied a different statistical approach to investigate multiple association in genome-wide data. Noteworthy, the results of our network analysis of pleiotropic interactions (Figure 3) corroborate and expand previous MR-based findings. For example, we report much higher multiplicity of genetic associations for the FTO locus, comprising breast cancer, heel bone mineral density, anthropometric traits and sleep duration. We report additional associations for the diabetes-related GCKR gene, such as gout, high cholesterol levels and narcolepsy. We also expand the list of associated phenotypes for the APOE gene, adding vascular/heart problems (chronic ischemic heart disease and angina) and pulse rate. Among the sets of pleiotropic loci we found the SLC39A8 gene associated with osteoporosis, as also reported previously39. The same locus is reported to be associated with another osteoporosis- and osteoarthritis-related traits that were also investigated in a recent meta-analysis of UKB data40. These include wait-hip ratio, BMI, body fat percentage, strenuous sport, alcohol intake, and prospective memory result. The exact relationships between these traits, though, are still to be investigated.
We identify several key properties of the pleiotropic variation. First, SNPs with high amount of associated traits and clusters tend to be common (Figure 1e, Extended Data Figure 3). This is unexpected given that variants with larger effect sizes tend to be rare due to negative selection8. However, low degree of pleiotropy for rare variants might be explained by sample size limitations. Secondly, pleiotropic loci identified in this study are significantly enriched in open chromatin regions bearing H3K4Me3 and H3K27Ac histone modifications as identified by ChIP-Atlas search22 (Figure 3d). This is consistent with previous observations made for immune system41. Third, variants with high degree of pleiotropy have higher functional impact: these are enriched for both protein-altering missense variants and broader range eQTLs (Figure 3c). All of the aforementioned properties of pleiotropic loci are in very good concordance with the results of other global overviews of pleiotropy made using UKB data15,16.
Overall, our approach allowed us to detect multiple novel interrelations between complex traits, which can shed light on common molecular patterns driving the human phenome. Further investigations would help to determine causal relationships between key genes and molecular pathways that give rise to multigenic disease. Investigation of all possible genetic interactions between common and specific risk factors should help to decipher sophisticated molecular mechanisms behind complex traits.
Methods
Data acquisition and filtering criteria
We acquired data from the Benjamin Neale’s lab website (UK Biobank GWAS results imputed v2, downloaded at 2017-10-04; http://www.nealelab.is/uk-biobank). We focused our attention only on phenotypes with significant non-zero partitioned heritability estimates (p < 0.05, estimates provided by the authors of the dataset). After obtaining the list of heritable phenotypes we retained genome-wide significant variants for each phenotype (p-value < 5 * 10-9). Association summary statistics for each variant for each trait were then merged into a singly matrix for further analysis.
Clustering of traits
To group similar traits in space of pre-selected associated variants, we utilized the Jaccard distance metric calculated for each pair of traits as follows: , where D is the distance between two phenotypes, and T1 and T2 are sets of associated SNPs for two traits. We then performed hierarchical clustering together with the analysis of silhouettes to determine the most appropriate number of clusters. We then calculated the number of associated trait clusters for each variant. To this end, we considered an SNP to be associated with a cluster if this SNP is associated with at least one trait inside this cluster.
Description of the LSEA method
We implemented a novel Locus Set Enrichment Analysis (LSEA) method for gene set enrichment analysis of GWAS data. LSEA operates on the sets of pre-selected significant SNPs or summary statistics file for each trait. At first stage, LSEA groups associated SNPs into individual genomic loci. First, variants are clustered by LD using the clumping method in PLINK42. The resulting groups of SNPs are transformed to an interval list based on the leftmost and rightmost coordinate of variants in each clump. The resulting intervals are then merged if these overlap by more than 70%, or span different parts of the same gene, and intervals overlapping the MHC region or the HIST1 gene cluster (defined as chr6:28,866,528-33,775,446 and chr6:25,000,528-28,000,446, respectively, based on hg19 human genome annotation) are omitted, as suggested previously3. The resulting intervals are then intersected with the pre-computed universe of all possible loci bearing GWAS signal.
To construct such universe, we merged all loci that were identified for the whole UKB dataset. We then transformed all of the curated gene sets obtained from the MsigDB database (http://software.broadinstitute.org/gsea/msigdb/annotate.jsp) the following way: for each interval i in the intervals universe, we assigned i to a gene set if at least one gene in the gene set overlaps i. Such method efficiently corrects for functionally related genes that are genetically linked, as intervals spanning several genes belonging to each gene set are counted only once.
The enrichment statistic is computed for each gene set as follows: , where n is the number of intervals in the query set, ij = 1 if interval j belongs to a gene set of interest, and ij = 0 otherwise. Enrichment p-value is then computed from hypergeometric distribution. The resulting p-values are then adjusted for multiple comparisons using the Holm FDR correction.
Gene set enrichment analysis of individual traits and clusters
We used the LSEA method to analyze gene set enrichment for all individual phenotypes and trait clusters. We used two complementary approaches to make functional annotations of multi-trait clusters. The first approach was based on using variants that are significantly associated with at least 2 of the traits in each cluster (termed “union of SNPs” on Figure 2c). For the second approach, we used LSEA results for all individual phenotypes in each cluster, and retained only gene sets that were significantly enriched for 2 or more traits in a cluster (termed “union of gene sets” on Figure 2c.
As the HLA locus and the HIST1 gene cluster were omitted during gene set analysis with LSEA, we analyzed the overrepresentation of variants inside these loci separately. To this end, we created sets of intervals for the HIST1 and HLA regions in a way analogous to the LSEA method. We then obtained the overrepresentation p-values using hypergeometric distribution, with the test statistic being equal to the number of intervals in the query that overlap the regions of interest.
Identification of pleiotropic genetic variants
To assess the phenome-wide significance and identify key pleiotropic variants, we constructed a model to account for the null expectation of random overlap between association signals. If the association signals are randomly distributed across the genome for two individual phenotypes (or clusters), the number of per-SNP associations should follow a Poisson distribution in a manner similar to the sequencing read coverage43. As we noted that SNPs with different MAF differ in the average number of associations (Figure 1e, Extended Data Figure 3), we divided the variants into 500 bins based on MAF ranged from 0 to 0.5. We then calculated the mean number of associations for variants in each bin (λ, see Extended Data Figure 16). This parameter was then used to calculate the phenome-wide significance p-value: , where k is the number of traits or trait clusters associated with a given variant. Variants with p-value < 5 * 10-9 were selected as pleiotropic SNPs and used for further analyses.
Overrepresentation analysis for functional variant classes
To analyze the overrepresentation of cis-eQTLs, we used the GTEx data (REF). Sets of variants of interest (i.e., pleiotropic variants and variants with at least 1 association) were overlapped with significant cis-eQTLs (p < 5 * 10-8). Overrepresentation of cis-eQTLs in the set of pleiotropic SNPs was assessed by comparison of the resulting proportion of significant eQTLs using the hypergeometric distribution.
We also annotated all SNPs from the UKB dataset with SnpEff44 and separated the variants by their functional type (e.g., missense, intronic, etc. (the distributions are shown in Extended Data Figure 14)). We then compared the proportions of certain functional classes of variants in the sets of pleiotropic SNPs and SNPs bearing at least one association. The proportions were compared using the hypergeometric test.
To analyze the enrichment of epigenetic marks and chromatin states in the set of pleiotropic loci, we used the previously described ChIP-Atlas server22. For this analysis, sets of pleiotropic variants were converted to an interval list, and the resulting intervals were used as the input for ChIP-Atlas. To narrow down the search space, we narrowed the analysis down to a set of blood-derived genomic tracks. For each type of epigenetic mark, we retained the minimal enrichment p-value for further analysis.
Network analysis of pleiotropic loci
To group loci that share sets of associated clusters, we constructed a network of genetic associations for the identified pleiotropic loci. To this end, we created a graph representation of the data, with nodes corresponding to individual clusters and edge weight between a pair of adjacent nodes corresponding to the number of shared independent loci. To search for individual fully connected subgraphs of this network (i.e., nodes sharing the same sets of loci) we used a simple deterministic algorithm. At the first stage, we identified all possible large combinations of nodes connected by an individual pleiotropic locus by iterating over edges of the graph. We then merged loci that share the same combinations of associated nodes.
Data and source code availability
All data and scripts pertinent to the analysis presented here, as well as the source code for LSEA, are available through GitHub (https://github.com/bioinf/ukb_phewas).
Acknowledgements
We thank Mykyta N. Artomov for useful comments and suggestions. The authors declare no conflict of interest.
Footnotes
↵* These authors jointly supervised the work.