Abstract
Bronchodilator drugs are commonly prescribed for treatment and management of obstructive lung function present with diseases such as asthma. Administration of bronchodilator medication can partially or fully restore lung function as measured by pulmonary function tests. The genetics of baseline lung function measures taken prior to bronchodilator medication has been extensively studied, and the genetics of the bronchodilator response itself has received some attention. However, few studies have focused on the genetics of post-bronchodilator lung function. To address this gap, we analyzed lung function phenotypes in 1,103 subjects from the Study of African Americans, Asthma, Genes, and Environment (SAGE), a pediatric asthma case-control cohort, using an integrative genomic analysis approach that combined genotype, locus-specific genetic ancestry, and functional annotation information. We integrated genome-wide association study (GWAS) results with an admixture mapping scan of three pulmonary function tests (FEV1, FVC, and FEV1/FVC) taken before and after albuterol bronchodilator administration on the same subjects, yielding six traits. We identified 18 GWAS loci, and 5 additional loci from admixture mapping, spanning several known and novel lung function candidate genes. Most loci identified via admixture mapping exhibited wide variation in minor allele frequency across genotyped global populations. Functional fine-mapping revealed an enrichment of epigenetic annotations from peripheral blood mononuclear cells, fetal lung tissue, and lung fibroblasts. Our results point to three novel potential genetic drivers of pre- and post-bronchodilator lung function: ADAMTS1, RAD54B, and EGLN3.
Introduction
Asthma is a disease characterized by episodic obstruction of airways that affects nearly 339 million people worldwide (The Global Asthma Network, 2018) and is the most common chronic disease among children. Asthma constitutes a massive global economic burden, representing $81.9 billion in medical costs in the United States alone (Nurmagambetov et al., 2018). As a complex disease, asthma results from both environmental and genetic factors, with genetic heritability estimates ranging from 0.35 to 0.90 (Ober & Yao, 2011). The advent of genome-wide association studies (GWAS) (Risch & Merikangas, 1996), combined with progressively larger sample sizes in recent years, has enabled researchers to query the genetic basis of asthma at unprecedented scale, with numerous loci identified in autoimmune and inflammatory pathways (Demenais et al., 2018). However, these loci account for a small portion of asthma liability (Demenais et al., 2018).
Pulmonary function tests are recommended to guide in the diagnosis of asthma and monitor patient status (Asthma and Allergy Foundation of America, 2019). During these tests, patients breathe through a spirometer that captures key measures of lung function, including the forced expiratory volume in 1 second (FEV1), which measures initial forced exhalatory capacity; the forced vital capacity (FVC), which measures the maximum total volume of air that a patient can forcibly exhale; and their ratio (FEV1/FVC). Lung function measures can be population-normalized according to expected lung function values that account for age, sex, height and ethnicity of the patient (Hankinson et al., 1999). Spirometric measurements can be taken both before bronchodilator treatment (Pre-BD) and after (Post-BD) to further understand lung function status. Historically, baseline lung function is measured with Pre-BD measures, but among people with asthma, post-BD lung function may best reflect lung health (Brehm et al., 2015).
While the genetic contribution to asthma and lung function has been extensively studied via GWAS, most analyses have relied on subjects of European descent (Demenais et al., 2018; Johansson et al., 2019; Pickrell et al., 2016; Z. Zhu et al., 2018). This overrepresentation of ethnically white subjects in biomedical research has impaired the generalizability of genetic studies of complex disease (Burchard, 2014; Bustamante et al., 2011; Popejoy & Fullerton, 2016). Ethnic differences in lung function, particularly between African Americans and European Americans, have been reported for over 40 years (Binder et al., 1976; Glindmeyer et al., 1995; Hsi et al., 1983; Rossiter & Weill, 1974; Schwartz et al., 1988). Ethnic disparities in lung function were attributed to population differences in sitting height, as increased height leads to increased lung capacity. However, adjustment for sitting height only explains 42-50% of ethnic differences in lung function between African Americans and European Americans (Harik-Khan et al., 2004), suggesting that a simplistic reduction to ethnic differences in height cannot account for the observed disparity in lung function. Unequal socioeconomic conditions were also thought to contribute to ethnic differences in lung function (Braun, 2015; Quanjer, 2013, 2015), but socioeconomic factors only account for 7-10% of unexplained variance (Harik-Khan et al., 2004). Self-identified race or ethnicity are commonly used in the clinic to interpret lung function measures, but these are not ideal variables for understanding genetic differences in lung function between populations. Kumar et al. observed that the proportion of global African genetic ancestry is inversely correlated with lung function (Kumar et al., 2010). Spear et al. later observed population differences among African Americans, Mexican Americans, and Puerto Ricans in bronchodilator drug response to albuterol, the short-acting β2-adrenergic receptor agonist that is the most commonly prescribed drug for the treatment of acute asthma symptoms (Spear et al., 2019). Specifically, Spear et al. performed admixture mapping, a technique designed to identify regions of the genome where locus-specific ancestry drives variation in a disease trait (Shriner, 2013) that has been helpful in studies of complex diseases, including asthma and breast cancer (Féjerman et al., 2012; Pino-Yanes et al., 2015). However, admixture mapping studies comparing baseline and post-bronchodilator lung function have not yet been performed in African Americans. In this study, we address this gap in knowledge by evaluating the effect of locus-specific ancestry on both pre- and post-bronchodilator lung function measures in a pediatric case-control cohort of African Americans children and adolescents.
Methods
Study Population
The Study of African Americans, Asthma, Genes and Environments (SAGE) is a case-control cross-sectional cohort study of genetics and gene-environment interactions in African American children and adolescents in the USA. SAGE includes detailed clinical, social, and environmental data on both asthma and asthma-related conditions. Full details of the SAGE study protocols are described in detail elsewhere (Borrell et al., 2013; Nishimura et al., 2013; Thakur et al., 2013; Mak et al., 2018). Briefly, SAGE was initiated in 2006 and recruited participants with and without asthma through a combination of clinic- and community-based recruitment centers in the San Francisco Bay Area. All participants in SAGE self-identified as African American and self-reported that all four grandparents were African American.
Pulmonary function tests were taken prior to administration of albuterol bronchodilator medication for all individuals, both those with and without asthma. Post-bronchodilator spirometry measures were performed only for individuals with asthma. Analyses of pre-bronchodilator lung function measures included all 1,103 asthma cases and controls with complete covariate information. Post-bronchodilator analyses were performed on the 831 asthma cases with post-bronchodilator measurements.
Genotyping and Quality Control
DNA was isolated from whole blood collected from SAGE participants at the time of study enrollment as described previously (Borrell et al., 2013). DNA was extracted using the Wizard® Genomic DNA Purification kits (Promega, Fitchburg, WI). Samples were genotyped with the Affymetrix Axiom LAT1 array (World Array 4, Affymetrix, Santa Clara, CA).
Genotype quality control was performed in PLINK v1.9 (Chang et al., 2015). Of the 772,703 genotyped variants, 111,901 SNPs were excluded from analysis due to genotype missingness more than 5% (n = 28,211), minor allele frequency (MAF) less than 1% (n = 80,420) or deviation from Hardy Weinberg expectations (HWE) at p < 0.001 (n = 3,270). The final set of 660,802 genotyped markers. (Supplementary Table 1).
Genotyped SNPs were submitted to the Michigan Imputation Server (Das et al., 2016), phased using EAGLE v2.3 (Loh et al., 2016), and imputed from the 1000 Genomes Project reference panel (The 1000 Genomes Project Consortium, 2015) using Minimac3 (Das et al., 2016). Imputed SNPs with imputation R2 < 0.3, with deviation from Hardy-Weinberg equilibrium (HWE) p-value < 10−4, or with minor allele frequency (MAF) < 1% were discarded. Of the 47,101,126 imputed SNPs, a total of 31,146,322 were culled due to either low MAF (n = 31,095,418) or deviation from HWE (n = 50,904). All variants in the imputed set showed a genotype missingness of no more than 5%. The final number of SNPs used in association analyses was 15,954,804 (Supplementary Table 1).
Outcome Phenotypes
Pulmonary function testing was performed at the time of recruitment according to the American Thoracic Society / European Respiratory Society standards (Miller et al., 2005; Pellegrino et al., 2005; Wanger et al., 2005) with a KoKo PFT Spirometer (nSpire Health Inc., Louisville, CO). Spirometry was performed both before and 15 minutes after administration of four puffs of albuterol (90ug per puff) through a 5-cm plastic mouthpiece from a standard metered-dose inhaler. Patients were assessed for the following spirometric measures before and after bronchodilator drug usage (pre-BD and post-BD, respectively): (a) FEV1, (b) FVC, and (c) FEV1/FVC. A total of six phenotypes were assessed for genotype association: pre-BD FEV1 (Pre-FEV1), pre-BD FVC (Pre-FVC), pre-BD FEV1/FVC (Pre-FEV1/FVC), post-BD FEV1 (Post-FEV1), post-BD FVC (Post-FVC), and post-BD FEV1/FVC (Post-FEV1/FVC). All phenotype values were normalized based on the expected lung function values calculated from the Hankinson equations (Hankinson et al., 1999), which account for age, sex, height, and self-reported ethnicity. Phenotype distributions were checked for normality and to detect outliers. Outliers were determined using the method of Tukey fences (John Tukey, 1977). For each phenotype, we computed the first quartile value (Q1), the third quartile value (Q3), and the interquartile range (IQR). We declared as outliers all values outside of the range
Individuals with outlier values for a phenotype were removed from association analyses for that phenotype.
Covariates
Age, Sex, and BMI
Biometric covariates such as age, sex, body mass index (BMI), and height were measured directly at time of recruitment. BMI was categorized into underweight, normal, overweight, and obese, according to CDC guidelines for defining childhood obesity (Barlow, 2007; Cote et al., 2013; Whitlock et al., 2005). An overweight status was defined as a BMI at or above the 85th percentile for the general population of children of the same sex and in the same age group. An obese status was defined as a BMI at or above the 95th percentile. Underweight individuals (bottom 5th percentile, n = 9) were excluded from analysis.
Asthma status
Case status was defined as physician-diagnosed asthma supported by reported asthma medication use and symptoms of coughing, wheezing, or shortness of breath in the 2 years preceding enrollment.
Maternal Educational Attainment
Maternal educational attainment was measured at recruitment and included in analyses to control for socioeconomic status. It was coded as total years of education completed from the first grade: for example, a complete K-6 education was 6 years, a complete high school education was 12 years, and any additional years (college or trade school and beyond) were counted as 1 year each.
Genetic Ancestry
Global genetic ancestry was estimated for each individual with the ADMIXTURE software (Alexander et al., 2009) in supervised learning mode assuming one West African and one European ancestral population, with HapMap Phase III YRI and CEU populations as references (The International HapMap 3 Consortium, 2010). Local ancestry estimation was performed with RFMix (Maples et al., 2013; Spear et al., 2019) using the same two-way ancestry reference from HapMap Phase III.
Estimation of Genetic Relatedness and Genotype Principal Components
Genetic relatedness matrices (GRMs) were generated in R using GENESIS (Gogarten et al., 2019), which provides a computational pipeline for handling complex population structure. We used PCAir (Conomos et al., 2015) to correct for distant population structure accounting for relatedness, and PC-Relate (Conomos et al., 2016) to adjust for genetic relatedness in recently admixed populations. The resulting principal components provide better correction for population stratification in admixed populations compared to standard PCA on genotypes (Patterson et al., 2006).
Genetic Association Analyses
Genotype association testing was performed with the MLMA-LOCO algorithm from GCTA (Yang et al., 2011, 2014) to correct for population structure using GRMs generated with GENESIS. Association of outcome phenotypes with allele dosages at 15,954,804 biallelic SNPs was performed with a “leave one chromosome out” model to avoid double-fitting tested variants. Other variables included in models were age, sex, BMI, maternal educational attainment, and three genotype principal components. Models of Pre-BD also included asthma status.
The suggestive and significant association thresholds for each outcome phenotype were determined by the effective number of independent statistical tests (Meff) calculated with CODA (Plummer et al., 2006). CODA computes Meff using the autocorrelation of p-values from GWAS. For our analyses, Meff ranged from 488,819 to 507,975 (Supplementary Table 2). The Bonferroni corrected genome-wide significant threshold was computed as 0.05/Meff, while the suggestive threshold was computed as (1/Meff), yielding a single pair of thresholds for all six outcome phenotypes considered: p < 1.99 × 10−6 for suggestive association, and p < 9.95 × 10−8 for significant association (Supplementary Table 2).
Admixture mapping analyses were performed using linear regression models in R and local ancestry calls from RFMix for 454,322 genotyped SNPs. Counts of 0, 1, or 2 alleles of African descent were computed for each person at each SNP. Phenotypes were then regressed onto ancestral allele counts for each SNP while including age, sex, height, BMI, maternal educational attainment (as a proxy for socioeconomic status), and global African genetic ancestry proportion as covariates. Analyses with pre-BD outcome measures also included asthma status as a covariate.
Fine-mapping genetic associations
Functional fine-mapping with PAINTOR (Kichaev et al., 2014) was used to identify putative causal variants in novel loci deemed statistically significant by admixture mapping. PAINTOR applies a Bayesian probabilistic framework to integrate functional annotations, association summary statistics (Z-scores), and linkage disequilibrium information for each locus to prioritize the most likely causal variants in a given region. Functional annotations were selected per locus as recommended by the authors of PAINTOR (Kichaev, 2017). A subset of lung- and blood-related functional annotations from the Roadmap Epigenomics Project (Roadmap Epigenomics Consortium et al., 2015) and the ENCODE Consortium (ENCODE Project Consortium, 2012) were assessed for their individual improvement to the posterior probability of causality; the top 5 minimally correlated annotations were selected for each locus.
Annotation Tools
The NHGRI/EBI GWAS Catalog (Buniello et al., 2019), Ensembl Genome Browser release 98 (Cunningham et al., 2019) and gnomAD browser v3.0 (Karczewski et al., 2019) were used to look up known associations at significant loci according to our analyses. Annotation lookups in the gnomAD browser v3.0 used hg38 coordinates translated from our hg19-aligned genotypes via liftOver (Hinrichs et al., 2006). Data management, statistical analysis, and figure generation made extensive use of GNU parallel (Tange, 2018) and several R packages, including data.table, doParallel, optparse, ggplot2, and the tidyverse bundle (Calaway et al., 2018; Davis et al., 2019; Dowle et al., 2019; Wickham, Hadley, 2016, p. 2; Wickham, Hadley & Grolemund, Garrett, 2017).
Results
Cohort Characteristics
Characteristics of all SAGE participants included in analyses are shown in Table 1. Distributions of each lung function measure stratified by case/control status and bronchodilator administration (pre-BD vs. post-BD) are shown in Supplementary Figure 1. FVC showed no significant difference between asthma cases and controls (Kruskal-Wallis p-value = 0.073), while stratification by case/control status yielded significantly different distributions for FEV1 (Kruskal-Wallis p-value = 4.8 x 10-7) and FEV1/FVC (Kruskal-Wallis p-value = 1.5 x 10-7). Among cases, statistically significant differences were observed between distributions of pre-BD and post-BD measures of FEV1 (Kruskal-Wallis p-value = 1.2 x 10-38), FVC (Kruskal-Wallis p-value = 5.4 x 10-16), and FEV1/FVC (Kruskal-Wallis p-value = 4.0 x 10−29), illustrating a measurable effect of bronchodilator medication on lung function.
The global African genetic ancestry proportion in our sample varied from 30.7% to 100%, with an average proportion of 80.2% (Supplementary Figure 2), concordant with empirically observed averages (Baharian et al., 2016). Global ancestry contained the same information as the first genotype principal component (R2 = 0.984, Supplementary Figure 3).
Genetic association testing finds novel and known loci
Figure 1 shows results from GWAS performed on pre-BD phenotypes (Pre-FEV1, Pre-FVC, and Pre-FEV1/FVC) and post-BD phenotypes (Post-FEV1, Post-FVC, and Post-FEV1/FVC) using linear mixed modeling. The association results showed no evidence of genomic inflation, with genetic control λ ranging from 0.98 to 0.99 (Supplementary Table 2, Supplementary Figure 4). Table 2 lists the 18 genome-wide significant associations found, each associated with exactly one of the six lung function measures. An additional 252 variants were suggestively associated with at least one phenotype (Supplementary Tables 3–8). Of the 18 variants, 4 variants on chromosome 13 in a region spanned by the gene ATP8A2 were associated with Pre-FEV1/FVC (Supplementary Figure 6). Two variants on chromosome 16 that were associated with Pre-FVC flanked the promoter region of IRX3 (Supplementary Figure 7). A third variant associated with Pre-FVC was located on chromosome 20 near THBD (Supplementary Figure 8), a gene linked to venous thromboembolism in African American and Afro-Caribbean individuals (Hernandez et al., 2016). Two variants associated with Post-FVC were in a gene-rich region on chromosome 19 (Supplementary Figure 9), with the peak near TMIGD2 and SHD, while eight other variants pointed to a second gene-rich region on chromosome 11 near CXCR5 and HYOU1 (Supplementary Figure 10). Post-FEV1/FVC was associated with a region on chromosome 15 near the genes AKAP13 and ADAMTS7P4 (Supplementary Figure 11).
Among the suggestive associations, a variant on chromosome 12 associated with Post-FEV1 was near BTBD11 (Supplementary Figure 12), a gene previously associated with Post-FEV1, Post-FEV1/FVC, and ΔFEV1, the change in lung function due to bronchodilator administration (Hardin et al., 2016; Lutz et al., 2015), as well as BMI (Kichaev et al., 2019). A suggestive association with Pre-FEV1 on chromosome 12 fell near SCARB1 (Supplementary Figure 13), which was previously associated with FEV1 and FVC (Wyss et al., 2018) and HDL cholesterol levels (Wojcik et al., 2019). Another suggestive association with Pre-FEV1 on chromosome 20 was near the gene PTPRT (Supplementary Figure 14), which was previously associated with thromboembolism susceptibility in 5,334 African American individuals (Heit et al., 2017).
Admixture mapping identified five novel loci not found by GWAS
Table 3 shows five regions where admixture proportions were statistically significantly associated with one of the six phenotypes. The three pre-BD phenotypes (Pre-FEV1, Pre-FVC, Pre-FEV1/FVC) were each associated with one region, while Post-FVC was associated with two distinct regions. Post-FEV1 and Post-FEV1/FVC had no significant associations. None of the regions overlapped with those significant in our GWAS, and none showed large deviations from mean genome-wide African genetic ancestry. A small region on chromosome 21 that was significantly associated with Pre-FEV1 flanked the genes ADAMTS1 and ADAMTS5 (Supplementary Figure 15). The region on chromosome 4 associated with Pre-FVC pointed to two candidate genes, RCHY1 and THAP6, that had no prior lung disease associations (Supplementary Figure 17). A region on chromosome 19 associated with Pre-FEV1/FVC spanned the genes ZNF557 and INSR (Supplementary Figure 18). Post-FVC was associated with two regions, one on chromosome 8 spanning the genes ESRP1, INTS8, TP53INP1 and NDUFAF6 (Supplementary Figure 19), and another on chromosome 14 encompassing EGLN3 and SNX6 (Supplementary Figure 20).
Functional fine-mapping found three novel putatively causal loci for lung function phenotypes
Table 4 lists the most probable causal SNP for each of the five admixture mapping loci according to PAINTOR. SNP rs13615 showed the highest probability of causality (0.630) with Pre-FEV1 on locus 1 (Figure 2). This variant falls within the 3’ untranslated region of ADAMTS1, suggesting that ADAMTS1 drives the admixture mapping association and not its physical neighbor ADAMTS5. The minor allele frequency of rs13615 in African and African diaspora populations was lower than every other global population (2.6% AFR vs. 7.0 – 54.5% other populations, gnomAD v3; see Supplementary Figure 21). The SNP rs10857225 emerged as the most likely causal variant (probability 0.361) for the association of Pre-FVC with locus 2 on chromosome 4 (Figure 3). This variant is located within an intron of the gene THAP6, suggesting that THAP6 is more likely the causal gene behind the association with Pre-FVC. In contrast to locus 1, the minor allele frequency of rs10857225 is highest in global African populations and markedly lower in other global populations (59.1% AFR vs. 28.1 – 38.4% other populations, gnomAD v3). Locus 3 on chromosome 19 associated with Pre-FEV1/FVC, and locus 4 on chromosome 8 associated with Post-FVC, showed little information gain from functional fine-mapping. The driving variant for locus 3, SNP rs72986681, was located in the 3’ untranslated region of ZNG557, but showed a low probability of causality (0.168, Supplementary Figure 22). The most probable marker for locus 4, the SNP rs2470740, which is located in intron 2 of RAD54B, showed an even lower probability of causality (0.109, Supplementary Figure 23). Functional fine-mapping of locus 5, a region on chromosome 14 associated with Post-FVC, yielded the SNP rs1351618 with a moderate probability of causality (0.390, Figure 4). rs1351618 is located in an intron of EGLN3. As with locus 2, rs1351618 had a much higher minor allele frequency in populations of African ancestry versus other global populations (12.4% AFR vs. < 2.2% other populations, gnomAD v3).
Discussion
We analyzed the genetic basis of six lung function phenotypes in 1,103 African American children with and without asthma. The phenotypes consisted of three standard spirometric measures – FEV1, FVC, and FEV1/FVC – measured before and after administration of bronchodilator medication. Our GWAS identified 18 genome-wide significant loci, while our integrative genetic analysis approach that layered GWAS, admixture mapping, and functional fine-mapping identified another 5 putatively causal loci that could drive differences between pre- and post-bronchodilator lung function.
The four variants on chromosome 13 associated with Pre-FEV1/FVC pointed to ATP8A2 as a candidate gene. ATP8A2 encodes an ATPase involved in phospholipid transport and is highly expressed in brain tissue, testes, and the adrenal glands, and to a lesser degree, lung (Fagerberg et al., 2014). Mutations in ATP8A2 have been linked to several neurological disorders (Martín-Hernández et al., 2016). Two variants on chromosome 16 that were associated with Pre-FVC point to IRX3 as a candidate gene. IRX3 encodes a homeobox protein crucial for neural development and its promoters previously showed a long-range interaction with the FTO gene. Expression levels of the FTO gene are known to influence BMI and are of great interest in type II diabetes and obesity research (Smemo et al., 2014). Post-FVC showed associations on two chromosomes. Notable genes near the association peak on chromosome 19 included MAP2K2 and ZBTB7A, genes associated with variation in BMI, visceral adiposity, and eosinophil counts (Kichaev et al., 2019; Pulit et al., 2019; Rüeger et al., 2018); and CHAF1A, HDGFL2, PLIN4, ANKRD24, MPND, and SH3GL1, previously associated with corpuscular volume and hemoglobin concentration (Astle et al., 2016; Kichaev et al., 2019; van Rooij et al., 2017). Interestingly, the two genes closest to the association peak, TMIGD2 and SHD, have not been previously associated with any traits. TMIGD2 is involved in T-cell co-stimulation and the immune response through an interaction with HHLA2, suggesting that it could possibly play an immune or allergic response role in lung function (Y. Zhu et al., 2013). Among the genes within or near the chromosome 11 peak, two emerge as potentially key loci. The first is CXCR5, which has been linked to increased risk of childhood onset asthma (M. A. R. Ferreira et al., 2019; Johansson et al., 2019; Pividori et al., 2019) and respiratory disease (Kichaev et al., 2019), as well as related allergic and immunological conditions such as eczema, leukocyte count, rheumatoid arthritis, and Sjögren’s syndrome (M. A. Ferreira et al., 2017; Kichaev et al., 2019; Laufer et al., 2019; Lessard et al., 2013). The second is HYOU1, which has been associated with BMI and Post-FEV1/FVC (Lutz et al., 2015; Pulit et al., 2019). Post-FEV1/FVC was associated with two genes, AKAP13 and ADAMTS7P4. AKAP13 has been previously associated with numerous conditions, including interstitial lung disease and psoriasis in European individuals (Fingerlin et al., 2013; Tsoi et al., 2015) as well as weight, BMI, and cardiovascular traits such as blood pressure and hemoglobin count in multiple populations (Giri et al., 2019; Kichaev et al., 2019). ADAMTS7P4 was previously associated with red blood cell volume (Kichaev et al., 2019). The statistically suggestive association of Post-FEV1 with BTBD11 pointed to previous associations with various lung function measures, including Post-FEV1, Post-FEV1/FVC, and ΔFEV1 (Hardin et al., 2016; Lutz et al., 2015). These previously detected associations were based on much larger sample sizes than what was available to us: the associations with Post-FEV1 and Post-FEV1/FVC found by Lutz et al. were discovered in a population of 10,094 European and 3,260 African American smokers with chronic obstructive pulmonary disorder (COPD), while the association with ΔFEV1 found by Hardin et al. was based on 5,766 Europeans and 811 African Americans with COPD, suggesting that our inability to reach genome-wide significance in our sample was insufficient statistical power.
Among significant and suggestive GWAS loci, the association of Post-FVC with variants in or near CXCR5 and HYOU1 is the only one that replicates known lung function loci: CXCR5 was previously associated with asthma (M. A. R. Ferreira et al., 2019; Johansson et al., 2019; Pividori et al., 2019), and HYOU1 was previously associated with Post-FEV1/FVC (Lutz et al., 2015). The association with Post-FEV1/FVC comes from an adult COPD cohort ascertained by smoking status; in contrast, SAGE is a pediatric asthma cohort. The mechanism by which HYOU1 affects lung function in both youth and and adults is unclear. Nevertheless, the overlap of post-bronchodilator pulmonary function measures at this locus suggests that the region encompassing CXCR5 and HYOU1 plays a role in lung disease among people with obstructive lung function.
Admixture mapping identified 5 genomic regions where variation in genetic ancestry was significantly associated with phenotypic variation. Locus 1 on chromosome 21 spanned the genes ADAMTS1 and ADAMTS5, which encode extracellular proteases within the same protein family but with different consequences for disease. Although both genes have been linked to blood protein levels (Suhre et al., 2017), ADAMTS1 has been associated with pre-FVC (Kichaev et al., 2019) and is expressed in arterial, adipose, and lung tissue, while ADAMTS5 is not appreciably expressed in the lung (Supplementary Figure 16). Further fine-mapping with PAINTOR places the most likely causal SNP (rs13615) within the 3’ UTR of ADAMTS1, suggesting that ADAMTS1 may be the gene that is functionally related to lung function. Interestingly, the association of ADAMTS1 with pre-FVC (Kichaev et al., 2019) was discovered in a European sample of substantially larger size than our cohort, highlighting the ability of admixture mapping to detect associations in scenarios with low statistical power. Locus 3, spanning a region on chromosome 19 that was associated with Pre-FEV1/FVC, contains the genes ZNF557 and INSR. ZNF557 has not been previously associated with any traits, while INSR is the well-known insulin receptor that has been previously associated with childhood onset asthma in our own cohort (White et al., 2016), as well as blood pressure levels, triglyceride levels, HDL cholesterol levels, and hypothyroidism across multiple populations (Bentley et al., 2019; Ehret et al., 2016; Kichaev et al., 2019; Klarin et al., 2018). Post-FVC showed two distinct admixture mapping signals. The first region on chromosome 8, which we call Locus 4, includes the genes ESRP1, INTS8, TP53INP1 and NDUFAF6 was previously associated with type II diabetes and eosinophil counts (Kichaev et al., 2019; Mahajan et al., 2018). The second region, Locus 5, spans EGLN3 and SNX6, both of which show previous associations with blood phenotypes such as blood pressure and hematocrit levels (Astle et al., 2016; Evangelou et al., 2018).
Overall, evaluation of our GWAS and admixture mapping lung function results suggests that genetics of this trait underlies some pleiotropy observed across pulmonary, hematological, cardiovascular, and obesity-related traits. Such pleiotropy has been observed in UK BioBank participants: as lung function decreases, BMI and type II diabetes incidence increases, as well as levels of eosinophils and neutrophils, both of which are common biomarkers for allergic disease (Supplementary Figure 25)(McInnes et al., 2019; Stanford Global Biobank Engine, 2020). The link between obesity and lung function is particularly interesting since obesity is a known asthma comorbidity, and lung function may play a role in obese asthma (Baffi et al., 2015)(Gruchała-Niedoszytko et al., 2015). Our findings suggest that genetically-based differences in lung function may provide a link between obesity and asthma.
It is curious that the regions identified by admixture mapping and subjected to functional fine-mapping did not overlap with the statistically significant GWAS loci. We attribute this in part to the different types of information used by each approach: GWAS analyzes how allelic variation affects a trait, while admixture mapping analyzes the phenotypic consequences of variation in genetic ancestry. By integrating GWAS summary statistics with loci identified via admixture mapping, we found that three of the admixture mapping-based loci -- ADAMTS1, THAP6, and EGLN3 -- had evidence of causal effects. Each of the sentinel SNPs tagging these genes showed a notable difference in ancestral allele frequency: populations of African descent had either the highest or the lowest minor allele frequency among all global populations, likely the result of admixture mapping prioritizing loci that varied by genetic ancestry. None of these loci have been previously associated with lung traits, highlighting the strength of our integrative analysis. The association with EGLN3 is particularly curious since it has been previously associated with a variety of traits, including heart rate response to beta-blocker therapy (Shahin et al., 2018). Short-acting beta-2 agonists such as albuterol selectively target beta-2 receptors in the lungs, while the first-generation beta-blockers taken for cardiac conditions bind to both beta-1 and beta-2 receptors, affecting the heart as well the lungs. Bronchospasm and FEV1 reduction are clinically significant side effects of first-generation beta-1 selective and nonselective beta-blockers for cardiac conditions. Consequently, these non-selective beta-blockers must be initiated with caution and close monitoring in patients with asthma (Christiansen & Zuraw, 2019). Beta-blockers lower blood pressure by reducing heart rate and cardiac contractility and are less effective in people with high levels of African genetic ancestry (Brewster & Seedat, 2013; Whelton et al., 2018). It has been previously observed that African Americans with asthma demonstrate lower bronchodilator drug response than European Americans (Blake et al., 2008), suggesting a possible pharmacological interaction between beta-2 receptors and African ancestry. Furthermore, EGLN3 is strongly expressed in cardiac tissue, suggesting that EGLN3 could possibly influence Post-FVC through cardiac phenotypes (Supplementary Figure 24). Further functional studies are required to elucidate the role of EGLN3 on lung function and bronchodilator drug response.
Our integrative analysis approach leverages available functional annotations and genetic ancestry estimates in the absence of molecular data to yield some promise for discovery of novel loci. Our study is limited to three tiers – genotypes, genetic ancestry, and functional annotations – and makes use of gene expression results from GTEx v8. However, it does not directly incorporate any transcriptomic, metabolomic, proteomic, or methylomic information. As large multi-omic datasets from NHLBI TOPMed, UK Biobank, and the NIH Million Veterans Program become available, the need for integrative genomic approaches to studying complex diseases will increase. Future multi-omic models of complex diseases, including obstructive lung function disorders, may deliver on the promise of precision medicine and provide actionable clinical translation of biomedical and pharmacogenomic insights into novel therapies.
Author Contributions
PCG, KLK, and MJW designed the study. SSO, SS, CE, and EGB recruited study subjects and generated the data. ACYM, JRE, DU, and SH cleaned and organized the data and provided analytic support. PCG, KLK, EYL, OR, MGC, and MJW performed the analysis. AKL and LSB provided clinical pharmacological expertise for interpretation of results. EGB and BEH funded the study. EGB supervised all recruitment. All authors contributed to manuscript writing and editing.
Conflicts of Interest
The authors declare no competing financial interests.
Supplementary Tables and Figures
Acknowledgements
This work was supported in part by the Sandler Family Foundation, the American Asthma Foundation, the RWJF Amos Medical Faculty Development Program, the Harry Wm. and Diana V. Hind Distinguished Professor in Pharmaceutical Sciences II, the National Heart, Lung, and Blood Institute (NHLBI) grants R01HL117004, R01HL128439, R01HL135156, X01HL134589, R01HL141992, R01HL104608, R01HL141845, and U01HL138626, the National Human Genome Research Institute (NHGRI) grants U01HG007419 and U01HG009080, the National Institute of Environmental Health Sciences grants R01ES015794, R21ES24844, the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) grant R01HD085993, the National Institute on Minority Health and Health Disparities (NIMHD) grants P60MD006902, R01MD010443, RL5GM118984, R56MD013312, and R56MD013312, and the Tobacco-Related Disease Research Program under Award Numbers 24RT-0025 and 27IR-0030. Research reported in this article was funded by the National Institutes of Health Common Fund and Office of Scientific Workforce Diversity under three linked awards RL5GM118984, TL4GM118986, 1UL1GM118985 administered by the National Institute of General Medical Sciences (NIGMS).
The authors wish to acknowledge the following SAGE co-investigators for subject recruitment, sample processing and quality control: Luisa N. Borrell, DDS, PhD, Emerita Brigino-Buenaventura, MD, Adam Davis, MA, MPH, Michael A. LeNoir, MD, Kelley Meade, MD, Fred Lurmann, MS and Harold J. Farber, MD, MSPH. The authors also wish to thank the staff and participants who contributed to the SAGE study.
This manuscript uses results and visualization provided by the Stanford Global Biobank Engine. The authors would like to thank the Rivas Lab for making the resource available.
PCG was additionally funded by NHGRI training grant T32 HG000044 to the Department of Genetics at Stanford University.
KLK was additionally supported by NHLBI grant supplement R01HL135156-S1, the UCSF Bakar Computational Health Sciences Institute, the Gordon and Betty Moore Foundation grant GBMF3834, and the Alfred P. Sloan Foundation grant 2013-10-27 to UC Berkeley through the Moore-Sloan Data Sciences Environment initiative at the Berkeley Institute for Data Science (BIDS). The logistical space, technical support, administrative assistance, and indefatigable good humor of the members and staff at BIDS are gratefully acknowledged.
EYL, LSB, and AKL were supported by a National Research Service Award grant T32GM007546 from the NIGMS. MGC was additionally supported by NIH MARC U-STAR grant T34GM008574 at San Francisco State University. MJW was additionally supported by NHLBI grant supplement R01HL117004-S1, an NIGMS Institutional Research and Academic Career Development Award K12GM081266, and an NHLBI Research Career Development Award K01HL140218.