Summary
We present the results of the largest genome wide association study (GWAS) performed so far in dilated cardiomyopathy (DCM), a leading cause of systolic heart failure and cardiovascular death, with 2,719 cases and 4,440 controls in the discovery population. We identified and replicated two new DCM-associated loci, one on chromosome 3p25.1 (lead SNP rs62232870, p = 8.7 × 10−11 and 7.7 × 10−4 in the discovery and replication step, respectively) and the second on chromosome 22q11.23 (lead SNP rs7284877, p = 3.3 × 10−8 and 1.4 × 10−3 in the discovery and replication step, respectively) while confirming two previously identified DCM loci on chromosome 10 and 1, BAG3 and HSPB7. The genetic risk score constructed from the number of lead risk-alleles at these four DCM loci revealed that individuals with 8 risk-alleles were at a 27% increased risk of DCM compared to individuals with 5 risk alleles (median of the referral population). We estimated the genome wide heritability at 31% ± 8%.
In silico annotation and functional 4C-sequencing analysis on iPSC-derived cardiomyocytes strongly suggest SLC6A6 as the most likely DCM gene at the 3p25.1 locus. This gene encodes a taurine and beta-alanine transporter whose involvement in myocardial dysfunction and DCM is supported by recent observations in humans and mice. Although less easy to discriminate the better candidate at the 22q11.23 locus, SMARCB1 appears as the strongest one.
This study provides both a better understanding of the genetic architecture of DCM and new knowledge on novel biological pathways underlying heart failure, with the potential for a therapeutic perspective.
Introduction
Dilated cardiomyopathy (DCM) is a heart muscle disease characterized by left ventricular dilatation and systolic dysfunction in the absence of abnormal loading conditions or coronary artery disease (CAD)(1,2). It is a major cause of systolic heart failure, the leading indication for heart transplantation, and therefore a major public health problem due to the important cardiovascular morbidity and mortality(1,2). The understanding of the genetic basis of DCM has improved during the past years with the role of both rare and common variants resulting in a complex genetic architecture of the disease(3,4). More than 50 genes(4–6) with rare pathogenic mutations have been reported as causing DCM, mainly inherited as dominant with variable penetrance. Several large scale association studies in sporadic cases have been performed to identify common DCM-associated alleles(7–13) including four genome wide association studies (GWAS)(9,10,12,13). Altogether, these genetic investigations have so far robustly identified 2 loci harboring common susceptibility alleles: a locus on chromosome 1, encompassing multiple candidate genes in strong linkage disequilibrium (LD), including ZBZTB17/MIZ-1, HSPB7 and CLCNKA(7–12); and a second locus on chromosome 10 for which the culprit gene, BAG3, is also involved in familial forms of DCM(9,14). The analysis of 116,855 common coding variants in an exome wide association study (EWAS) has also suggested the existence of six potential additional DCM loci(11) but, in absence of proper replication, their true contribution to DCM has not been definitively established. Here, we report the results of a new GWAS for sporadic DCM performed on 2,719 cases and 4,440 controls. Individual genotype data were imputed for the 1000Genomes reference panel and the main findings replicated in two independent case-control samples totaling 584 cases and 986 controls. Then, in silico annotation and functional analyses were performed in candidate loci to identify the better candidates. We also estimated the global genetic heritability, and built a genetic risk score (GRS) for DCM.
Material and Methods
Populations inclusion and samples collection
DCM patients and controls from five populations (France, Germany, USA, Italy and UK) were included in the GWAS (a detailed description of the cohorts is provided in Supplemental Material). Sporadic DCM was diagnosed according to standard criteria(1,4,15,16) by reduced ejection fraction (EF, echocardiography: <45% or MRI: <2 standard deviations (SDs) below the age- and sex-adjusted mean (16)) and an enlarged left ventricle end-diastolic volume/diameter (LVEDD >117% of value predicted from age and body surface area on echocardiography, or >2 SDs from the age- and sex-adjusted mean by MRI(16)) in the absence of significant coronary artery disease or intrinsic valvular disease, documented myocarditis, systemic disease, sustained arterial hypertension, or congenital malformation. All patients signed informed consent and the study protocol was approved by local ethics committees. In total, 2,719 cases and 4,440 controls were included in the discovery GWAS analysis.
Two case-control samples were available for replication of the main discovery GWAS findings, a Dutch population composed of 145 DCM cases and 527 controls(17) and a German collection of 439 patients with left ventricle dilation and/or hypokinetic non-dilated cardiomyopathy (HNDC) and 439 controls(18,19) (detailed descriptions are provided in Supplemental Material).
Genotyping, genotype calling, and imputation
Genotyping was performed with high-density arrays for all samples and was further imputed with the 1,000 Genomes reference dataset. Summary descriptions of specific genotyping arrays, QC filtering, and imputation methods are given in Supplemental Material and Supplementary Table 1.
Association analysis in the discovery phase
Association of imputed SNPs with DCM was performed using the Mach2dat software(20,21) implementing a logistic regression model adjusted for sex and genome-wide genotype-derived principal components under the assumption of additive allele effects. Only bi-allelic SNPs with imputation quality, r2, greater than 0.5 and minor allele frequency (MAF) higher than 0.005 were kept for association analysis. A statistical threshold of 5 × 10−8 was used to declare genome-wide significance.
To check the existence of several independent DCM-associated SNPs at loci with genome-wide statistical significance, we conducted conditional analyses adjusting for the lead SNP at each identified locus. In the case of more than one significant SNP at a given locus, additional haplotype analyses were conducted using the THESIAS software(22).
Association analysis in the replication phase
SNPs selected for replication were tested using the same statistical model as that used in the discovery stage, separately in each of the two replication cohorts, adopting a one-tailed hypothesis and applying a Bonferroni correction procedure to declare statistical replication while controlling for the number of tested SNPs. Results of the replication studies were subsequently meta-analyzed using a fixed-effects model based on the inverse-weighting method as implemented in the Metal software(23). A similar meta-analysis framework was applied to combine the results of the discovery and replication phase. Heterogeneity across studies (between the two replication studies or between the discovery and replication stage) was tested using the Cochran’s Q statistic and the I2 index was used to express its magnitude.
Regional association plot
At each of the replicated associated genetic loci, a regional association plot was performed using the LocusZoom online tool (http://locuszoom.sph.umich.edu/).
Genetic risk score analysis
A genetic risk score (GRS) was built upon the two SNPs already known to associate with DCM, HSPB7-rs10927886(7–12) and BAG3-rs2234962(9,14) and the independent replicated genome-wide significant DCM associated SNPs. That score was tested, using logistic regression analysis, for association with DCM on a continuous scale and with quintile repartition. For each individual i, the GRS was defined as
, where K is the number of variants composing the score, βk is the log-odds ratio for DCM associated with variant k obtained in the replication phase and Gik is the allele dosage at variant k for individual i.
Genetic heritability
The linkage disequilibrium (LD) score regression approach(24) was used to get an estimate of the genome-wide genetic heritability underlying DCM. We also used this methodology to calculate the genetic correlation between DCM and several cardiovascular traits capitalizing on the GWAS results available at the LD Hub (http://ldsc.broadinstitute.org/ldhub/).
Candidates selection strategy at associated loci
For each newly identified DCM locus, a downstream fine-mapping strategy was deployed using in silico and experimental functional data to select the best candidates at each locus.
Cis-regulation of associated SNPs
At each locus, DCM associated SNPs (p-value ≤ 5 × 10−8 and/or presenting a high LD (r2>0.7) with the lead SNP) were used to define the associated “LD block”. We checked whether those blocks could overlap with DNA regulatory elements by using the UCSC Genome Browser(25) in human assembly hg19 (http://genome.ucsc.edu/) by visualizing the ENCODE3 DNase hypersensitivity sites (DHS)(26) and transcription factor (TF) chromatin immunoprecipitation sequencing (ChIP-seq)(27) tracks produced on 125 and 130 cell lines, respectively. To detect left ventricle (LV)-specific putative regulatory regions we enriched those tracks with H3K27ac (active chromatin), H3K4me1 (enhancer) and H3K4me3 (promotor) histone marks of ENCODE heart LV samples (GSM910575, GSM910580, GSM908951)(28). To support the potential regulatory/promotor role of those regions, we looked at regulatory elements predicted from ORegAnno(29) and checked sequence conservation in a subset of the vertebrates(30).
Topologically associating domains (TAD) and intra-TAD chromatin interactions
TADs are domains of preferential chromatin interaction separated by insulators where genes are accessible to intra-TAD regulatory elements, such as enhancers(31). Taking advantage of public resources describing those TADs, we delimited the region comprising the LD block, thus the subset of candidate genes, that are jointly regulated. Regional candidate genes were defined based on published LV TADs described in Leung et al.(28), as well as preferential chromatin interaction measured via promoter chromatin Hi-C (PCHi-C) on iPSC-derived cardiomyocytes (iPSC-CM) by Montefiori et al.(32).
Those published TAD boundaries were confirmed by in-house circular chromatin conformation capture (4C)-sequencing data produced on an iPSC-derived cardiomyocyte line from a donor (a full description is enclosed in the Supplemental Material).
Candidate genes’ biological insights
The heart expression level of each candidate gene in an associated region was evaluated via RNA-sequencing cardiac expression data issued from the Genotype-Tissue Expression (GTEx) project database22 (http://www.gtexportal.org/home/) (LV and atrial appendage) and from LV explants of DCM cases and controls produced by Henig et al.(33). That later work also enabled us to check whether differential expression existed between the 97 DCM patients and the 108 healthy donors. For the subset of gene displaying interesting expression features, we then scrutinized publicly available resources for gene annotation and functions.
Annotation of associated SNPs
The annotation of SNPs’ blocks was realized with Annovar software(34) and bioinformatics prediction of effects using RegulomeDB(35), Regulatory Mendelian mutation (ReMM)(36) and the GRCh37-v1.4 Combined Annotation Dependent Depletion (CADD) model(37) (https://cadd.gs.washington.edu/snv). Each SNP of an association block being prone to be implicated in an alteration of local regulatory function, we went through various in silico data in an effort to identify regulatory ones. We thus screened GTEx database for expression and splicing quantitative trait loci (eQTL and sQTL) in human cardiac and skeletal muscle tissues, checked whether some of those SNPs could be associated with blood DNA methylation levels (mQTL)(38) and looked at their location in putative enhancer or promotor region (H3K27ac, H3K4me1, and H3K4me3 histone marks).
Results
Main statistical findings
After quality controls, 9,152,885 SNPs (including 8,945,131 autosomal and 207,754 X chromosome SNPs) were tested for association with DCM in a total of 2,651 cases and 4,329 controls. Results of the discovery GWAS are summarized in a Manhattan and a Quantile-Quantile plot (Figure 1 and Supplementary Figure 1). The main findings are summarized in Table 1. Five loci reached genome-wide significance. Two were already known to associate with DCM, BAG3 (p = 4.7 × 10−14 for rs61869036) and HSPB7 (p-value = 2.12 10−13 for rs10927886). Of note, the BAG3 rs61869036 is in complete LD (r2∼1) with the nonsynonymous rs2234962 already reported to associate with DCM(9,11) that was thus further used as BAG3 lead SNP (p = 5.6 × 10−14). Three new loci reached genome-wide significance on chr3p25.1 (rs62232870, p = 8.7 × 10−11) downstream LSM3, on chr16p13.3 (PKD1 rs2519236, p = 3.0 10−8) and on chr22q11.23 (SMARCB1 rs7284877, p = 3.3 10−8). Regional association plots at these 5 loci are shown in Supplementary Figures 2-6.
Conditional GWAS adjusted for the 5 lead SNPs did not reveal any new genome-wide association signal (Supplementary Figure 7 and 8).
At chr3p25 locus, a second SNP, rs4684185, showed a strong statistical association (p = 8.4 10−9) and is in negative LD (r2 = 0.12, D’ = −0.95) with the lead rs62232870. While the rs62232870-A allele with frequency ∼0.21 was associated with an increased OR for DCM of 1.36 [1.24 - 1.49], the common rs4684185-C allele with frequency ∼0.70 was associated with an OR of 1.28 [1.17 – 1.40]. After adjusting on rs62232870 lead SNP, the association of rs4684185 is no longer significant but a residual signal remained (p = 5.10−4) suggesting a more complex interaction. A detailed haplotype analysis of these SNPs (Supplementary Table 2) showed that, compared to the most frequent rs62232870-G/rs4684185-C haplotype, the rs62232870- A/rs4684185-C haplotype was associated with an increased risk of DCM (OR = 1.22 [1.11 – 1.33]) while its yin-yang haplotype defined by the rs6223870-G/rs4684185-T alleles was protective against DCM (OR = 0.82 [0.76 – 0.89]).
We sought to replicate the observed associations at chromosome 3, 16 and 22 loci in two independent studies totaling 584 DCM patients and 966 controls. The PKD1 rs148248535 did not show any statistical evidence for replication (p = 0.11) but we confirmed the associations observed at chr3p25.1(p = 7.70 10−4 and p = 6.10−3 for rs6223870 and rs4684185, respectively), including the yin-yang haplotype association (Supplementary Table 2), and at chr22q11.23 (p = 1.40 10−3 for rs7284877) (Table 1).
In a combined meta-analysis of the discovery and replication findings, the resulting ORs for DCM were 1.36 [1.25 - 1.48] (p = 5.3 10−13) and 1.27 [1.18 - 1.37] (p = 4.8 10−10) for chr3p25.1 rs6223870 and rs4684185, respectively and 1.33 [1.22 - 1.46] (p = 5.0 10−10) for chr22q11.23 SMARCB1 rs7284877, with no evidence for heterogeneity across studies (Table 1).
Genetic risk score analysis
We built both an unweighted and a weighted genetic risk score (GRS) using the 4 lead SNPs at the two already known loci (BAG3 and HSPB7) and those at the two replication loci, chr3p25.1 (rs62232870) and chr22q11.23 (rs7284877).
GRS findings are summarized in Figure 2A and B and Table 2A and 2B. Briefly, the risk of DCM in the discovery cohort for the unweighted GRS (Table 2A) was increased by 27% for the subjects with 8 risk alleles (1.27 [1.14-1.42]) and decreased by 21% for those having only one risk allele (0.79 [0.66-0.95]) as compared with the 5 risk alleles reference group (Figure 2A). Similar results are found for the weighted GRS (30% increase (1.30 [1.16-1.45]) and 19% decreased (0.81 [0.67-0.97] for those having the lowest and highest score as compared with the 1.6 score reference group (Figure 2B). An OR increase with each risk allele increment is still visible, but not significant, in the replication cohorts (Supplementary Table 3). To improve size homogeneity between the groups, a quintile repartition was also realized (Supplementary Figure 9).
Weighing was realized taking the sub-meta-analysis (two replication cohorts) beta value
Unweighted Genetic Risk Score for the 6,980 individuals of the discovery cohort and associated OR taking score 5 (presence of 5 risk alleles) as reference
*Score of each SNP weighted by the beta value of this SNP in the sub meta-analysis of the two replication cohorts
Heritability
Using our GWAS summary statistics, the estimated genome-wide DCM heritability in our European populations was around 31% ±8.4. We also examined the genetic correlation between DCM and various cardiovascular traits (e.g. coronary artery disease, heart diseases) but did not observe much shared genetic heritability. The strongest genetic correlation was observed with lipid- and obesity-related traits, e.g. waist circumference (ρ = 0.47, p = 3.2 10−8), whole-body mass fat (ρ =0.46, p = 6 10−8), or adiponectin (ρ =0.38, p = 7 10−6).
Candidate culprit gene selection strategy at chr3p25.1
As shown in Figure 3A, the top SNP, rs62232870, is located at the edge of an active enhancer region lying distal to LSM3 as evidenced by H3K27ac and H3K4me3 histone marks from ENCODE human LV samples (GSM910575, GSM910580, GSM908951). Of note, this enhancer region is absent from the seven default ENCODE non-cardiomyocyte cell lines suggesting cardiac tissue-specific expression. Other remarkable features support this region as regulatory active, such as significant sequence conservation in a subset of vertebrates, predicted regulatory elements from ORegAnno, hypersensitive sites for DNAseI, and multiple transcription factor (TF) chromatin immunoprecipitation sequencing (ChIP-seq).
The associated region is located in a gene desert with long non-coding RNA (lncRNA) sequences for chr3p25.1 [chr3:14,257,356-14,307,016] but covered MMP11, SMARCB1, DERL3 and lncRNA at chr22q11.23 [chr22:24,110,180-24,182,174]. All SNPs with association p-value < 5 10−8 and/or in LD (r2 ≥ 0.7) with the lead SNPs (rs62232870 in red; rs4684185 in dark blue; rs7284877 in dark green) are indicated. The SNPs in LD with the lead SNPs are colored orange, light blue and light green, respectively. Features associated with regulatory sequence elements are aligned under the SNPs track (Associated SNPs track) and show that the loci contain enhancer signature according to H3K27ac, H3K4me1 and H3K4me3 ChIP-seq signals prediction for left ventricle enhancers by Leung et al25, positive OregAnno regulatory element score26, conservation between vertebrates species27, DNaseI hypersensitivity and ChiP-seq signal for chromatin interacting proteins linked to transcription activity. Vertical blue lines highlight SNPs with Regulome (http://www.regulomedb.org/)33 prediction score below 4 indicating significant potential for being regulatory variants (see Table 4 for more details)
The associated block of SNPs covers ∼50kb [chr3:14,257,356-14,307,016] overlapping with another partially independent block of SNPs (in LD with the second lead SNP rs4684185, r2>0.87) (Supplementary Figure 3 and Supplementary Table 4) where predicted LV H3K27ac and H3K4me1 enhancer marks, reported by Leung et al.(28), are present. It is located in a clearly delimited TAD spanning [chr3:14,160,000-14,680,000] (Figure 4Ab) which encompasses 6 genes (CHCHD4, TMEM43, XPC, LSM3, SLC6A6 and GRIP2) (Supplementary Table 5) whose promoters interact with H3K27ac/H3K4me1 enhancer marks in iPSC-CM (Figure 4Af). Interestingly, promoter Hi-C data analysis revealed specific interactions of SLC6A6 and GRIP2 promoters with the chr3p25.1 lead SNP bait.
* Lead SNP at chr3p25 .1 (rs62232870) and chr22q11.23 (rs7284877) or proxies whose r2 with the lead, >0.6, is given into brackets
To delimitate the chromatin Topology Associated Domains (TADs) where genes are accessible to intra-TAD regulatory elements such as enhancers(28), we first looked for publicly available Left Ventricle Topology Associating Domain25 (track b). TAD boundaries were comforted by the results of in-house Circular Chromatin Conformation Capture (4C)-Sequencing data produced on a iPSC-derived cardiomyocyte line from a donor (fully described in Supplemental Material iPSC reprogramming paragraph); 4C baits localization is schematized as a vertical black bar (track g). The p-values for interaction below 10−8 are shown as a blue scale colored bar given below. We then looked at the preferential chromatin interactions measured via PCHi-C (Promoter Chromatin Hi-C)29 on iPS derived cardiomyocytes (gene promoters predicted from GenHancer (https://www.genecards.org/)) that revealed preferential contact inside TADs as shown by the red curves (track f) and defined the intra-TAD candidate gene list (track c) prone to be regulated in cis by associated regions (blue highlight, track a. The color code for the SNP is identical to that of close figure 3). Specific DNA interactions are joining associated regions with histone enhancer marks (H3K27ac; H3K4me1, track d and e respectively
This excel table is given in a supplemental excel file.
The 4C-seq results from in-house iPSC-CM cell lines presented in Figure 4Ag, show more extended significant interaction (p<10−8) between the associated region bait [chr3:14,257,356-14,307,016] and intra-TAD regional promoters, confirming TAD boundaries and enhancer function of the chromosome 3p25.1 locus. The strongest interaction signals (dark blue scale figure 4Ag; p < 10−50, Supplementary Table 6) are localized on SLC6A6, XPC/LSM3 region as well as in the intergenic region between those two genes where the bait is localized. Several baits specific of the 4C region and interacting with the associated SNPs block were designed to test for reverse interactions and confirmed the specificity of those cis-interaction signals (data not shown).
For each Supplementary Table 5 positional candidate gene, we filtered out the intra-TAD better candidates based on cardiac expression data from GTEx, Montefiori et al.(32) and Henig et al.(33) as well as publicly available resources for gene annotation.
Looking at LV and atrial appendage expression, all those genes are expressed with from the most to the less expressed TMEM43, CHCHD4, LSM3, SLC6A6, XPC and GRIP2 (Supplementary Table 7A). Moreover, XPC (p = 8.3 10−15) and SLC6A6 (p = 6.9 10−6) LV expressions were significantly increased in DCM patients compared to healthy donors (Supplementary Table 7A) while LSM3 expression was significantly decreased (p = 7.6 10−8). Functional candidate at this locus may include TMEM43, implied in arrhythmogenic right ventricular cardiomyopathy (ARVC)(39–41), and SLC6A6, for which a homozygous deletion affecting a splice site was found by whole-exome sequencing in a patient with idiopathic DCM(42).
Among the associated SNPs’ block, we then screened the GTEx database and Lemire et al results for eQTL, sQTL and mQTL(38). Even though we did not observe any evidence that the lead SNP, rs62232870, could influence the expression of nearby genes in relevant heart and skeletal muscle tissues, Table 3 presents DCM associated SNPs, in moderate LD, associated with SLC6A6 expression in heart atrial appendage. For example, rs7612736 (r2 = 0.71, D’ = 0.90, association p-value 1.6 10−9) moderately associates with that gene expression (p = 5.8 10−5). No sQTL was present but all the SNPs are associated with methylation level of several neighbor genes (Table 4). rs62232870 and the SNPs in strong LD with it are strongly associated with the methylation level of CpG site in SLC6A6 and TMEM43 (cg08926287 and cg15025640 with p-value around 10−30 and 10−16, respectively) while the rs4684185 LD SNP block associates primarily with SLC6A6 and XPC methylation levels (cg08926287 and cg23070574 with p-value around 10−70 and 10−36, respectively).
Combining all the data available so far (Table 4), the best candidate to take shape at this locus is SLC6A6 which is expressed in heart, differently between DCM and controls, and for which associated SNPs modulates methylation and expression (mQTLs and eQTLs).
Candidate culprit gene selection strategy at chr22q11.23 locus
The LD block of DCM-associated SNPs (Supplementary Figure 6 - Supplementary Table 4) extends over 70kb from the 5’ region of MMP11 and CHCHD10 to the 5’ region of DERL3 including SMARCB1 where the lead SNP maps to [chr22:24,110,180-24,182,174]. Multiple features witnessing the regulatory role of that locus, including LV enhancers(28), are observed (Figure 3B) and published ChIP-Seq experiments demonstrate that it is also strongly associated with H3K27ac and H3K4me1 LV marks (Figure 3B) providing support for an active enhancer function of that locus. TAD analysis at chromosome 22 showed that the associated SNPs block is located at the edge of 2 CM predicted TADs covering 1.2Mb [chr22:23,480,001-24,680,000] (Figure 4B). As multiple and strong intra- and inter-TADs interactions are found in that region (Figure 4B), the 21 genes covered by the two TADs should be considered as positional candidates (Supplementary Table 5).
Cardiomyocytes 4C-seq using the bait at chromosome 22 associated region revealed strong interactions with enhancer (H3K27ac/H3K4me1 positive) elements located close by (SMARCB1, DERL3, MMP11, SLC2A11, and CHCHD10 principally), and to a lesser extent with the BCR gene locus and an intergenic 50kb 3’ of IGLL1 and 18kb 3’ of RGL4 and with the promoter rich region adjacent to the bait up to CABIN1 (Figure 4Bd-f). The strongest 4C interaction signals (Figure 4Bg), are found for SMARCB1 and DERL3 (Supplementary Table 8; p < 10−50). Reverse interactions were tested with several baits confirming the specificity of those cis-interaction signals (data not shown).
Cardiac expression data showed that the most strongly expressed gene was CHCHD10 (Supplementary Table 7B) followed by GSTT1, DDT and SMARCB1 and, to a lesser extent, CABIN1 and SLC2A11. The other 15 genes (MIF, ZNF70, DERL3, MMP11 and further genes) appeared to be lower or not expressed. In LV tissue, CHCHD10 and to a lesser extent DDT, SMARCB1, SLC2A11, and CABIN1 were found to be differentially expressed between DCM and healthy donors (Supplementary Table 7B).
Among the association block, eQTL and sQTL were checked in GTEx database. Table 3 presented the significant eSNP in relevant cardiac and skeletal tissues. From these 6 candidate genes, only SMARCB1 expression was observed to be consistently strongly influenced by DCM associated SNPs, including the lead rs7284877 in the heart (Supplementary Figure 10) and skeletal muscle tissues. Three other genes, MMP11, VPREB3, and DERL3, even though not strongly expressed in heart, are also influenced by those associated SNPs in heart and/or skeletal muscle tissues (Table 3). No sQTL was present but rs7284877 and the SNPs in LD with it are associated with methylation level variation of several nearby genes (Table 4). The strongest regulation signals are detected for SMARCB1 and DERL3 (cg08219923 and cg25907215) with the best p-values below 10−200.
Combining all the data available so far (Table 4), SMARCB1 that is expressed in heart, differentially in DCM and control heart explants and for which associated SNPs modulates methylation and expression (mQTLs and eQTLs) appears to be the strongest candidate.
Discussion
By adopting a GWAS strategy, performed in the largest population of DCM assembled so far, we identified and replicated two new susceptibility loci for DCM while confirming two previously reported DCM associated ones, HSPB7 and BAG3.
The first novel locus maps to chr3p25.1 where the minor A allele of the lead rs62232870 was associated with an increased OR of 1.36 [1.25 – 1.48] (p = 5.3 10−13) in the combined discovery and replication samples. The pattern of association observed at this locus extends over several genes among which two, TMEM43 and SLC6A6, are expressed in heart and have previously been suspected to be involved in human structural cardiac disorders. Rare pathogenic mutations in the TMEM43, transmembrane protein 43, have been reported in arrhythmogenic right ventricular cardiomyopathy(39–41) while a homozygous splice site deletion has been observed in SLC6A6 in an idiopathic DCM patient(42). Taurine (TAU) is a highly concentrated amino-acide with cyto-protective action in numerous tissues, especially in contractile ones such as heart(43). Oral TAU supplementation is cardio-protective, while depletion associates with dilated cardiomyopathy in several mammalian species(44,45). Intracellular TAU concentration relies on its transmembrane transporter (TauT) expression and activity encoded by SLC6A6. Accordingly, mice KO for slc6a6 present tissue TAU level depletion and dilated cardiomyopathy(46) and a recent publication revealed that long-term treatment of dystrophic mice with taurine could prevent late heart dysfunction (47). It is thus tempting to speculate that SLC6A6 expression regulation in humans could influence cardiac function as well. Our finding of a genetic association of SLC6A6 alleles with DCM is remarkable in this context. Moreover, based on LV transcriptomic data, we observed that SLC6A6 expression was decreased in DCM patients compared to controls and that, for several SNPs in LD with rs62232870-A lead-SNP, the allele associated with the rs62232870-A risk allele, showed a slight decrease in SLC6A6 expression in heart atrial appendage (i.e. rs62231957 A allele is associated with the rs62232870-A risk allele and showed a decreased SLC6A6 expression (p=1.9 10−5) (Table 3 and Supplementary Figure 11).
As a whole, in silico annotation, chromatin interactions analysis showing specific interaction of the SNP LD block with SLC6A6 regulatory elements and the well-reported role of taurine in cardiac function strongly suggest SLC6A6 as the most likely gene influencing DCM phenotype at the 3p25.1 locus. The underlying pathway leading to heart failure remains to be fully studied in humans, but our results may suggest the potential for therapeutic perspective through taurine administration or up-regulation.
The second novel DCM locus maps to chr22q11.23, where the rs7284877 lead SNP, intronic to SMARCB1, is associated with DCM, the C-risk allele presenting an OR of 1.33 [1.22 – 1.46] (p = 5.0 10−10). Our fine-mapping revealed that, among the positional candidates, six demonstrated significant expression in heart and difference in LV gene expression between DCM and healthy heart (from the highest to the lowest: CHCHD10, GSTT1, DDT, SMARCB1, CABIN1, and SLC2A11) among which only SMARCB1 (SWI/SNF related matrix-associated actin-dependent regulator of chromatin subfamily b member 1) was under the influence of the lead rs7284877 in LV. Interestingly, the lead SNP is in complete LD with SMARCB1-rs5760054 recently reported to associate with LV internal dimension in systole and fractional shortening in a multi-trait GWAS conducted in 162,255 Japanese individuals(48). Even though not regulated by rs7284877, two other highlighted genes were already implied in cardiovascular disorders and/or phenotypes. First, GSTT1, glutathione S-transferase theta 1, has been suggested to associate with various cardiovascular diseases(49,50). It is shown via RNA-seq studies to be differentially expressed in heart depending on the pathological state: overexpression in male left ventricular hypertrophy (LVF) but downregulation in DCM heart compared to control(51); overexpression in DCM or non-failure vs. ischemic heart(52); Second, CABIN1, calcineurin binding protein 1, is a protein that binds to and inhibits calcineurin (CaN), a calcium-regulated phosphatase. CABIN1 is cleaved by calpain that is upregulated in heart failure LV. Calpain upregulation may increase cabin1 cleavage and thus lead to CaN overexpression in HF patients which can contribute to cardiac remodeling (53,54). Two other genes at this locus, DERL3 and MMP11, were not included in our candidates because of their low cardiac expression but could nonetheless be interesting candidates. DERL3-rs5760061, in complete LD with rs7284877, associated with LV internal dimension and fractional shortening in the Japanese GWAS. Another SNP of this gene, rs6003909, in high LD (r2=0.8) with rs7284877, also associated with LV geometric remodeling in a UK Biobank GWAS of LV image-derived phenotypes (55). We also found that another SNP, rs2186370, in complete LD with the lead SNP, is a mQTL strongly associated (p=8.4 10−71) with the methylation level of CpG cg25907215 lying in the 3’UTR of DERL3. DERLIN3 (DERL3) plays a role in heart homeostasis maintenance (56,57) and its expression in skeletal muscle is under the control of DCM associated SNPs including the lead. MMP11 has an expression consistently influenced by rs7284877. Interestingly, unlike the 10 other rs7284877 cis-regulated expressions, the eQTL activity on MMP11 is quasi restricted to cardiac tissues (p= 4 10−20, 5.7 10−15 and 7.4 10−5 in heart left ventricle, heart atrial appendage, and cells-cultured fibroblast, respectively). MMP11, matrix metallopeptidase 11, encodes an enzyme implied in the regulation of extracellular matrix and in the development of myocardial fibrosis and ventricular remodeling(58,59) functions that could be related with DCM. Even though we did not select a striking candidate at this locus, SMARCB1, whose expression and methylation are directly regulated by the lead SNP and associates with various cardiac phenotypes seems to be the stronger.
However, for both of these DNA loci with a regulatory function, it is also plausible that a single gene does not explain the whole functional activity detected(60). Given the strong bi-allelic activity of the enhancer regions in both loci and interaction shown in 4C data, multiple genes expressed in cardiomyocytes might be regulated in parallel to produce DCM phenotype.
Beyond the discovery of two novel DCM-associated loci, this GWAS investigation provided an innovative estimate of the heritability of the disease in European descent individuals, h2 = 31% ±8%, a value consistent with that (h2 ∼30%) recently reported in a population of African origin(12). However, the four independent lead SNPs (BAG3, HSPB7, chr3p25.1-SLC6A6, and chr22q11.23-SMARCB1) reaching genome-wide significance only contribute to 2% of the heritability suggesting the role of additional genetic factors or gene/gene and gene/environment interactions yet to be identified. Despite their modest contribution to the DCM heritability, these 4 SNPs may have clinical utility. Indeed, a GRS constructed on these 4 SNPs revealed that individuals with 8 risk alleles were at 30% increased risk of DCM and those with 1 risk allele at 19% reduced risk of DCM compared to individuals with 5 the risk alleles referral. This GRS, the first of its kind developed in the field of DCM, may have practical implications for better management of subjects at risk for DCM or systolic dysfunction, such as patients with drugs increasing the risk of myocardial dysfunction, or relatives in DCM families. Further clinical studies are warranted to validate its clinical utility.
Despite its innovative findings in the context of DCM, this study suffers from some limitations. First, we robustly identified two new DCM loci but were not able to fully demonstrate which culprit variant(s) and gene(s) are directly responsible for the observed susceptibility to the disease. Further intensive molecular and cellular investigations are needed to fill this gap. Second, despite being the largest GWAS ever performed on DCM, with both a discovery and a replication phase, our GWAS may have been suboptimal in identifying common susceptibility DCM alleles due to the absence of perfectly matched healthy controls for the British and US populations. Therefore, we performed our discovery GWAS on combined individual data while handling any potential hidden population stratification through adjustment on genetically derived principal components. Nevertheless, the robust replication of 2 out of 3 genome-wide significant associations in two case-control studies of Dutch and German origins provides strong support for the validity of our general strategy. The replication of the reported genetic associations in non-European ancestry populations is now needed. Similarly, the in-depth interrogation of electronic health records database with GWAS data, such as UK biobank (https://www.ukbiobank.ac.uk/)(61) or the Million Veteran Program (https://www.research.va.gov/mvp/)(62), and DCM information, may increase the power to detect additional DCM associated loci.
In conclusion, we identified two new genetic loci associated with DCM. While SLC6A6 clearly stands out as the most likely candidate at the chr3p25.1 locus, several plausible candidates SMARCB1, DERL3, and MMP11 could be proposed at chr22q11.23 locus. A genetic risk score was built with a potential clinical application for the prediction of systolic heart failure. These findings add both on the understanding of the genetic architecture of DCM and on potential new players/pathways involved in the pathophysiology of systolic heart failure.
Funding
This work was supported by grants from the GENMED Laboratory of Excellence on Medical Genomics (ANR-10-LABX-0013), DETECTIN-HF project (ERA-CVD framework), Assistance Publique-Hôpitaux de Paris (PHRC programme hospitalier de recherche Clinique, AOM04141), Délégation à la recherche clinique AP-HP (EMUL and PHRC n°AOM95082), the ‘Fondation LEDUCQ’ (Eurogene Heart Failure network; 11CVD 01), the PROMEX charitable foundation, the Société Française de Cardiologie/Fédération Française de Cardiologie.
The SFB-TR19 registry was supported by the Deutsche Forschungsgemeinschaft (DFG). The Study of Health in Pomerania (SHIP) is part of the Community Medicine Research net of the University of Greifswald, Germany, funded by the Federal Ministry of Education and Research (Grants 01ZZ9603, 01ZZ0103, and 01ZZ0403), the Ministry of Cultural Affairs and the Social Ministry of the Federal State of Mecklenburg-West Pomerania. This work was also funded in part by grants from the German Center for Cardiovascular Research (DZHK).
The KORA study was initiated and financed by the Helmholtz Zentrum München – German Research Center for Environmental Health, which is funded by the German Federal Ministry of Education and Research (BMBF) and by the State of Bavaria. Furthermore, KORA research was supported within the Munich Center of Health Sciences (MC-Health), Ludwig-Maximilians-Universität, as part of LMUinnovativ
Benjamin Meder is supported by grants from the Deutsches Zentrum für Herz-Kreislauf-Forschung (German Center for Cardiovascular Research, DZHK), the German Ministry of Education and Research (CaRNAtion, FKZ 031L0075B), Informatics for Life (Klaus Tschira Foundation), the Deutsche Forschungsgemeinschaft (DFG) and by an excellence fellowship of the Else Kröner Fresenius Foundation.
Folkert Asselbergs is supported by UCL Hospitals NIHR Biomedical Research Centre and Magdalena Harakalova by the NWO VENI grant (no. 016.176.136).
Supplementary Tables and Figures
Supplementary Figure 1. QQplot summarizing the discovery GWAS results
Supplementary Figure 2 to 6. Regional-association-plot at chromosome 1, 3, 10, 16 and 22
Supplementary Figure 7-8. Manhattan and QQ summarizing the discovery GWAS conditional-analysis results
Supplementary Figure 9. Barplot of the weighted scores grouped by quintile
Supplementary Figure 10. Violin-plot showing the regulation of SMARCB1 (a) and MMP11 (b) expression by rs7284877 lead SNP
Supplementary Figure 11. Violin-plot showing the regulation of SLC6A by rs62231950
Supplementary Table 1. Genotyping methods and quality controls
Supplementary Table 2. Haplotypic-association signal at chromosome 3 locus
Supplementary Table 3. Unweighted (A) and weighted (B) scores replications
Supplementary Table 4. SNPs’ LD block at chromosome 3 and 22 loci
Supplementary Table 5. List of the gene at chromosome 3 and 22 loci
Supplementary Table 6. 4C interaction at chromosome 3 locus
Supplementary Table 7. Gene Expression level at chromosome 3 and 22 loci
Supplementary Table 8. 4C interaction at chromosome 22 locus
Supplementary Table 9. 4C-seq primers
Table 4: SNP annotation. This excel table is given in a supplemental excel file.
Acknowledgment
Footnotes
↵# Member of European Reference Networks for rare, low prevalence and complex diseases of the heart (ERN GUARD-Heart)












