Polygenic Link Between Blood Lipids And Amyotrophic Lateral Sclerosis

Dyslipidemia is common among patients with amyotrophic lateral sclerosis (ALS). We aimed to test the association and causality between blood lipids and ALS, using polygenic analyses on the summary results of genome-wide association studies. Polygenic risk scores (PRS) based on low-density lipoprotein cholesterol (LDL-C) and total cholesterol (TC) risk alleles were significantly associated with a higher risk of ALS. Using single nucleotide polymorphisms (SNPs) specifically associated with LDL-C and TC as the instrumental variables, statistically significant causal effects of LDL-C and TC on ALS risk were identified in Mendelian randomization analysis. No significant association was noted between PRS based on triglycerides or high-density lipoprotein cholesterol risk alleles and ALS, and the PRS based on ALS risk alleles were not associated with any studied lipids. This study supports that high levels of LDL-C and TC are risk factors for ALS, and it also suggests a causal relationship of LDL-C and TC to ALS.


Introduction
Dyslipidemia is common among patients with amyotrophic lateral sclerosis (ALS). Higher-than-expected levels of low-density lipoprotein cholesterol (LDL-C), total cholesterol (TC), triglycerides (TG) and low/high-density lipoprotein cholesterol ratio (LDL-C/HDL-C) have been reported among patients with ALS by many although not all studies. [1][2][3][4][5][6] In a recent observational study, we found that patients with ALS had increased levels of LDL-C and apolipoprotein B up to 20 years before diagnosis. 7 Whether or not there is a genetic contribution to the noted dyslipidemia in ALS is unknown, and it is also unclear whether dyslipidemia plays a causal role in ALS.
By using single nucleotide polymorphisms (SNPs) from large-scale genome-wide association studies (GWAS), polygenic methods have been developed to investigate the genetic etiology and causality between human complex traits. 8 SNPs derived polygenic risk scores (PRS) of lipids have for instance been shown to be predictive of circulating lipid trajectories. 9,10 Mendelian randomization (MR) study using SNPs that are specifically associated with LDL-C and TG (but not with other risk factors or confounders) has further clearly suggested causal relationships of LDL-C and TG to coronary artery diseases (CAD). 11,12 In this study, we aimed to test the association and causality between blood lipids and ALS, by using polygenic methods on the latest GWAS results among Europeans.

Data
All data used in this study were summary GWAS results in European ancestry populations. Because the association and causality between blood lipids and CAD have been established by polygenic methods in previous study, we used CAD as a "point of reference" outcome to verify the data quality and parameter setting of the present analysis on lipids and ALS.

Polygenic analyses
Bi-directional PRS analyses were performed to check the association between blood lipids and ALS, in which blood lipids were used first as base (exposure) and then as target (outcome). In the base data, correlated SNPs were clumped by using HapMap_ceu_all genotypes (release 22, 60 individuals, 3.96 million SNPs) as the linkage disequilibrium (LD) reference, with the recommended parameter settings in PRSice v1.25: including a clumping threshold of p1=p2=0.5, a LD threshold of r 2 =0.05, and a distance threshold of 300Kb; independent SNPs were then included in quantiles with gradually increasing P-value thresholds (PT, ranging from 0 to 0.5, in steps of 1x10 -4 per quantile); the PT of the quantile that explained the largest variance of the target was defined as the best-fitted PT. 16 When a significant association was identified between certain lipid and ALS in the PRS analyses, we conducted a MR analysis to test the potential causality. 17 PhenoScanner database was used to identify the specific loci for the lipid in question: the lead SNP in each locus achieving genome-wide significance (P-value<5x10 -8 ) and are not associated with other lipids or potential confounders of the lipid-ALS relationship (e.g., lifestyle factors and body mass index, P-value>5x10 -8 ). 18 The lead SNPs (or their proxy SNPs) in the selected loci were then used as instrumental variables (Fig. 1), and the causal effect of the exposure (lipid) on outcome (ALS) was calculated and plotted by using the "gtx v0.0.8" package in R 3.2.5.

Bi-directional PRS analyses
The best-fitted PRS based on LDL-C and TC risk alleles were both statistically significantly associated with ALS risk ( Table 1). The odds ratios (ORs) were 1.16 (95% confidence interval [CI] 1.10-1.22, P=8.82x10 -8 ) and 1.15 (95% CI 1.09-1.21, P=8.17x10 -7 ) per standard deviation of the PRS based on LDL-C and TC risk alleles, respectively. The best-fitted PT was the same (5x10 -5 ) for LDL-C and TC, with SNPs in the corresponding quantiles explaining 0.08% and 0.07% of the variance of ALS. The associations of PRS based on LDL-C and TC risk alleles with ALS were similar and statistically significant across quantiles with PT≤5x10 -5 ( Fig. 2A-B). Similar results were noted for the quantiles when including only genome-wide significance SNPs (PT≤5x10 -8 ), and the variance of ALS explained by these SNPs was similar to the variance explained by the best-fitted quantile ( Fig. 2C-D).
In contrast, PRS based on TG and HDL-C risk alleles were not associated with ALS (Table 1). PRS based on LDL-C, TC and TG risk alleles were all statistically significantly associated with a higher risk of CAD, and PRS based on HDL-C risk alleles was associated with a lower CAD risk. In the reverse PRS analyses, PRS based on ALS risk alleles were not significantly associated with levels of any studied lipids. PRS based on CAD risk alleles were weakly but significantly associated with higher levels of LDL-C, TC, and TG whereas lower level of HDL-C.

Causality tested by MR analyses
Based on the PRS results, we conducted MR analyses between LDL-C, TC and ALS. Because LDL-C and TC are known to be highly correlated, we chose the lead (or proxy) SNPs in all of the 27 independent loci (Supplementary Table 1) reported to be specifically associated with both LDL-C and TC, but with neither HDL-C nor TG. 13 The combined effect of these SNPs on ALS was statistically significant (OR=1.01, 95% CI 1.00-1.02, P=0.008; Supplementary Table 2). Using these 27 SNPs as instrumental variables, the causal effects of LDL-C and TC on ALS were shown to be statistically significant (OR≥1.21, P-value≤0.001, Fig. 3A; Supplementary Fig.1A). We conducted a further analysis using 13 of the 27 SNPs, which are specific for LDL-C and TC but not associated with any other traits according to the current findings of GWAS. The causal effects of LDL-C and TC on ALS risk remained similar and statistically significant (OR≥1.26, P-value<0.03), and the P-value for heterogeneity became non-significant ( Fig.  3B; Supplementary Fig.1B). Stronger and significant causal effects of LDL-C and TC on CAD were also identified using these sets of instrumental variables (OR≥1.43, P-value<5x10 -5 , Supplementary Fig.2-3).

Discussion
This is the first study using polygenic methods on summary GWAS results to test the association and causality between blood lipids and ALS. When using unrestricted but independent SNPs across the genome, we find PRS based on LDL-C and TC risk alleles to be significantly associated with ALS risk. When using restricted and independent SNPs that are specifically associated with LDL-C and TC but not with other lipids or risk factors in the MR analysis setting, we further identified evidence for a causal relationship between LDL-C, TC and ALS.
In the bi-directional PRS analyses, we found that PRS based on LDL and TC risk alleles were significantly associated with ALS risk, whereas PRS based on ALS risk alleles were not associated with any of the studied lipids. One major reason might be the smaller sample size of ALS GWAS compared to GWAS of blood lipids and CAD, which would influence the precision of the estimated effect sizes and thereby their predictive ability. Moreover, SNPs in the best-fitted quantiles explain extremely low variance (close to zero) of the target, whereas in the analysis of lipids and CAD, SNPs in the best-fitted quantiles explain 0.1~0.3% of the target in both direction (Table 1). A first explanation is similarly the inadequate sample size of ALS GWAS, with less than eight genome-wide significant loci have been identified to be associated with ALS in the latest study. 14 Another explanation might be the special polygenic rare variant architecture of ALS. Unlike most common complex diseases, low-frequency or rare alleles may explain proportionally more heritability of ALS. 14 Because GWAS only capture contributions from common SNPs with minor allele frequency over 1% in the population, it is insensitive to capture effects from rare variants. This explain partly the reason why SNPbased heritability of ALS estimated from GWAS was less than 9%, whereas previous family and twin studies reported a heritability of ALS as high as 60%. 19 Such a possibility is further supported by the fact that the estimates of OR and explained variance of ALS stay at a stable level from PT=5x10 -19 to 5x10 -5 , but become smaller and not statistically significant as additional and less informative or less specific SNPs (PT>5x10 -5 ) were included (Fig. 2).
In the MR analyses, we identified statistically significant causal effects of LDL-C and TC on ALS, when using specific genetic variants for LDL-C and TC as the instrumental variables. These genetic variants are in the loci close to the genes involved in the cholesterol transportation or lipid modification processes, which may point to potential polygenic pathways between dyslipidemia and ALS (Supplementary Table 2).
Limitations of the presented polygenic methods should be acknowledged. First, in the PRS analyses, we grouped independent SNPs (by LD clumping) into 5000 quantiles with gradually increasing P-value thresholds (PT from 0 to 0.5) across the whole genome. The threshold that explains the largest variance was defined as the best-fitted PT. Such optimization over a particular parameter may introduce some degree of over-fitting (winners curse). However, even if we did apply overly conservative Bonferroni correction with a new significance level at 1x10 -5 , our findings would still remain statistically significant, arguing against the possibility that our conclusion is sensitive to such potential over-fitting. Second, we should bear in mind that the causality tested by MR analysis is based on certain assumptions. The most important one is that instrumental variables should be specific for exposure but not associated with other confounding factors. 12 To test the potential influence of this assumption, we used PhenoScanner to further select 13 SNPs that are specific for LDL-C and TC and not associated with any other traits according to current GWAS findings. As the results are largely similar ( Fig.3; Supplementary Fig.1-3), the concern about MR assumption was relieved.
Although our study found clear polygenic evidence that LDL-C and TC are causally associated with ALS, larger GWAS and deep sequencing for rare variants might be essential to further understand the genetic mechanisms of dyslipidemia in ALS. In parallel, functional molecular studies are needed to shed light on potential pathogenic mechanisms of the suggested polygenic pathways relating lipid processing to ALS risk.

Disclosure statement
The authors have no conflicts of interest to disclose. Standardized polygenic risk score (PRS) based on the risk alleles of base phenotype was used to predict the target phenotype, by using summary results from European ancestry genome-wide association studies. PT: P-value threshold of the association between single nucleotide polymorphisms (SNPs) and the base phenotype, the best PT was defined as the threshold corresponding to the largest explained variance. Because independent SNPs were grouped into different quantiles with gradually increasing PT (ranging from 0 to 0.5, in steps of 1x10 -4 per quantile), there were 5000 PT quantiles for each pair. log OR: log transformed odds ratio per standard deviation of PRS; SE: standard error; nSNP: number of independent SNPs included in the best PT quantile; r 2 : Nagelkerke r 2 , the proportion of the target variation explained by SNPs in the best PT quantile. LDL-C and HDL-C: low-and high-density lipoprotein cholesterol; TC: total cholesterol; TG: triglycerides; ALS: amyotrophic lateral sclerosis; CAD: coronary artery disease.