Genetics of human plasma lipidome: Understanding lipid metabolism and its link to diseases beyond traditional lipids

Aim: To investigate the genetics of the human plasma lipidome and gain insights into lipid-related disorders beyond traditional lipid measures. Methods and Results: We performed a genome-wide association study (GWAS) of 141 lipid species (n=2,181 individuals), followed by phenome-wide scans (PheWAS) of 44 clinical endpoints related to cardiometabolic, psychiatric and gastrointestinal disorders (n=456,941 individuals). SNP-based heritability of the lipid species ranged from 0.10 to 0.54. Lipids with long-chain polyunsaturated fatty acids showed higher heritability and genetic sharing, suggesting considerable genetic regulation at the level of acyl chains. We identified 35 genomic regions associated with at least one lipid species (P<5×10⁻⁸), revealing 37 new SNP-lipid species pair associations, for example a new association between ABCG5/8 and CE(20:2;0). PheWAS of lipid-species-associated loci suggested new associations of BLK with obesity, FADS2 with thrombophlebitis, and BLK and SPTLC3 with gallbladder disease (false discovery rate <0.05). The association patterns of lipid-species-associated loci supplied clues to their probable roles in lipid metabolism, for example a suggestive role of SYNGR1, MIR100HG and PTPRN2 in the desaturation and/or elongation of fatty acids. At known lipid loci (FADS2, APOA5 and LPL), the genetic associations provided detailed insights into their roles in lipid biology and disease. We also show that traditional lipid measures may fail to capture lipids such as lysophosphatidylcholines (LPCs) and phosphatidylcholines (PCs) that are potential disease risk factors but are not included in routine screens. The full genome-wide association statistics are available in a web-based database (http://35.205.141.92). Conclusion: Our study reveals the genetic regulation of the plasma lipidome and highlights the potential of lipidomic profiling in disease gene mapping.


Introduction
Plasma lipids are risk factors for various complex disorders such as cardiovascular disease (CVD) and type 2 diabetes. 1 other risk factors for type 2 diabetes and CVD events. [4][5][6][7][8][9] Moreover, the fatty acid composition of phospholipids has been implicated in coronary heart disease beyond traditional lipid measures. 10 Genetic screens have identified over 250 genomic loci associated with traditional lipid levels. 11,12 These genetic findings have helped to elucidate lipid biology and the physiological processes underlying CVD and other diseases. For the majority of these loci, however, the causal genes and/or their effects on detailed lipidomes beyond traditional lipid measures are unknown. Only a few studies have reported genetic associations for lipid species, either through studies on subsets of the lipidome 13,14 or through GWASs of metabolomic measures. [15][16][17][18][19][20] In most of these studies, however, the lipids were not resolved at the molecular level of fatty acid or acyl chain composition (molecular lipid species).
We reasoned that the genetic investigation of detailed lipidomic profiles could provide better insight into lipid metabolism and its link to clinical outcomes than traditional lipid measures. To test this, we carried out a GWAS of the lipidomic profiles of 2,181 individuals, followed by PheWAS of 44 clinical endpoints related to cardiometabolic, psychiatric and gastrointestinal disorders in an independent dataset of 456,941 individuals from the Finnish and UK Biobanks (Figure 1). We aimed to identify genomic loci influencing plasma levels of lipid species and to determine their effects on disease risk. We also set out to answer the following questions: (1) How heritable are the lipid species, and do they share genetic components? (2) Can detailed measures of lipid species provide mechanistic insights into the pathways linking genetic variation to disease risk? (3) Can detailed lipid profiles provide additional biological insights into the genetic regulation of lipid metabolism at previously identified lipid loci?

Study cohorts
A detailed description of subject recruitment and measurements is provided in the Supplementary Data. Briefly, the study included participants from the following cohorts:

EUFAM:
The European Multicenter Study on Familial Dyslipidemias in Patients with Premature Coronary Heart Disease (EUFAM) cohort comprises Finnish families with familial combined hyperlipidemia. 21 The EUFAM families were identified via probands admitted to Finnish university hospitals with a diagnosis of premature coronary heart disease. For the lipidomic profiling, the 1,039 EUFAM participants for whom serum samples were available were included (Supplementary Table 1).

FINRISK:
The Finnish National FINRISK study is a population-based survey conducted every 5 years since 1972 (detailed in Supplementary Data). 22 Lipidomic profiling was performed for 1,142 participants that were randomly selected from the FINRISK 2012 survey (Supplementary Table 1).

Finnish Biobank:
The Finnish Biobank data comprise 47,980 Finnish participants with 807 phenotypes derived from ICD codes in the Finnish national hospital registries and the cause-of-death registry, as part of the FinnGen project.

UK Biobank:
The UK Biobank comprises >500,000 participants based in the UK and aged 40-69 years, annotated for over 2,000 phenotypes. 23 The PheWAS analyses in the present study included 408,961 samples from white British participants.
Written informed consent was obtained from all the study participants. The study protocols were approved by the ethics committees of the participating centers. The study was conducted in accordance with the principles of the Helsinki declaration.

Lipidomic profiling
Mass spectrometry-based lipid analysis of 2,181 participants was performed in three batches-353  Table 2) detected consistently in three batches were included in further analyses. The total amount of each lipid class was calculated by summing the respective lipid species.
The measured concentrations of the lipid species and the calculated class totals were transformed to a normal distribution by rank-based inverse normal transformation.
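A rank-based inverse normal transformation replaces each value by the standard normal quantile of its rank. The sketch below is illustrative only: it assumes the Blom offset c = 3/8 (the offset used in the study is not stated) and gives tied values the mean of their ranks. The function names `average_ranks` and `rank_inverse_normal` are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_inverse_normal(values, c=3 / 8):
    """Map values to normal quantiles of (rank - c) / (n - 2c + 1); Blom offset assumed."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in average_ranks(values)]


# Hypothetical concentrations of one lipid species (not study data).
z = rank_inverse_normal([1.2, 3.4, 0.8, 3.4, 10.0])
```

Because only ranks enter the formula, the transformed values are insensitive to outliers and to the original measurement scale, which is why this transformation is a common choice before linear-model GWAS.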

Genotyping and imputation
Genotyping for both EUFAM and FINRISK cohorts was performed using the HumanCoreExome Quality control and imputation were performed using the same pipeline as described above.
Genotyping for the majority of the UK Biobank participants was done using the Affymetrix UK Biobank Axiom Array, while a subset of participants was genotyped using the Affymetrix UK BiLEVE Axiom Array.
The heritability estimates of lipid species in different groups were compared using the Mann-Whitney U test. Phenotypic correlations between all pairs of lipid species and traditional lipid measures were calculated from their plasma levels as Pearson's correlation coefficients. The heatmaps and hierarchical clustering based on genetic and phenotypic correlations were generated using the heatmap.2 function of the gplots R package.
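The comparison of heritability estimates between two groups of lipid species (for example, PUFA-containing species versus all others) can be illustrated with a minimal Mann-Whitney U test. This sketch assumes the large-sample normal approximation with no tie or continuity correction, so p-values may differ slightly from statistical software for small groups; the function name `mann_whitney_u` is hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the large-sample normal approximation.

    U counts the (x_i, y_j) pairs with x_i > y_j, with ties counted as 0.5.
    No tie or continuity correction is applied in this sketch.
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


# Hypothetical heritability estimates for two groups of lipid species.
u, p = mann_whitney_u([0.45, 0.50, 0.54], [0.10, 0.22, 0.30])
```

Because U depends only on the ordering of the estimates, the test does not assume that heritability estimates are normally distributed within each group.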

Lipidomics GWAS and meta-analysis
We performed univariate association tests for 141 individual lipid species, 12 total lipid classes and 4 traditional lipid measures (HDL-C, LDL-C, total cholesterol and triglycerides) separately in each batch to control for possible batch effects, and combined the summary statistics by meta-analysis. The association analyses for the EUFAM cohort were performed using linear mixed models including the above-mentioned covariates as fixed effects and the kinship matrix as a random effect, as implemented in MMM. 28 For each chromosome, the kinship matrix was computed from directly genotyped variants (MAF >0.01, missingness <2%) on all other chromosomes. The FINRISK cohort was analyzed with a linear regression model adjusting for age, sex, the first ten PCs, lipid medication and diabetes using SNPTEST v2.5. 29 Meta-analyses were performed in METAL using the fixed-effects inverse variance weighted method with adjustment for the genomic inflation factor. 30 In addition, for the identified variants, analyses adjusting for the traditional lipid measures (in addition to the above-mentioned covariates) were performed to determine their effects on lipid species independent of traditional lipid measures.
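The fixed-effects inverse variance weighted meta-analysis can be sketched as follows. This is a minimal illustration, not METAL itself: it assumes per-batch effect sizes and standard errors are available and, as an illustrative assumption about how genomic control is applied, inflates a batch's standard error by √λ when that batch's λ exceeds 1. The function name `ivw_fixed_effect` is hypothetical.

```python
from math import erfc, sqrt


def ivw_fixed_effect(betas, ses, lambdas=None):
    """Fixed-effects inverse variance weighted meta-analysis of per-batch estimates.

    If per-batch genomic inflation factors are supplied, each standard error is
    multiplied by sqrt(lambda) when lambda > 1 (genomic control, assumed here).
    Returns the pooled effect, its standard error and a two-sided p-value.
    """
    if lambdas is not None:
        ses = [se * sqrt(lam) if lam > 1.0 else se for se, lam in zip(ses, lambdas)]
    weights = [1.0 / se ** 2 for se in ses]  # weight = inverse sampling variance
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = sqrt(1.0 / sum(weights))
    z = beta / se
    p = erfc(abs(z) / sqrt(2))  # precise even for GWAS-scale p-values
    return beta, se, p


# Hypothetical estimates for one variant in three batches.
beta, se, p = ivw_fixed_effect([0.21, 0.18, 0.25], [0.05, 0.06, 0.07], [1.05, 0.99, 1.19])
```

Weighting by inverse variance gives more precise batches more influence, and the pooled standard error shrinks as batches are added, which is where the gain in power from meta-analysis comes from.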
Before meta-analysis, test statistics were adjusted for λ when λ >1.0. The genomic inflation factor (λ) ranged from 0.98 to 1.19 across the batches, whereas λ for the meta-analyses ranged from 0.998 to 1.045 (Supplementary Table 3).
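The genomic inflation factor is conventionally the median of the observed 1-df chi-square statistics divided by the median of the chi-square(1) null distribution (about 0.455). The sketch below assumes Wald z-scores as input and applies the λ >1.0 rule stated above by dividing each z-score by √λ; the function names are hypothetical and the code is not the software used in the study.

```python
from statistics import NormalDist, median


def genomic_inflation(z_scores):
    """Median observed chi-square (1 df, z squared) divided by its null median."""
    null_median = NormalDist().inv_cdf(0.75) ** 2  # median of chi-square(1), about 0.455
    return median(z * z for z in z_scores) / null_median


def gc_correct(z_scores):
    """Return lambda and z-scores divided by sqrt(lambda), applied only when lambda > 1."""
    lam = genomic_inflation(z_scores)
    scale = lam ** 0.5 if lam > 1.0 else 1.0
    return lam, [z / scale for z in z_scores]
```

Using the median rather than the mean keeps λ robust to the small number of truly associated variants, so it mainly reflects genome-wide inflation from stratification or relatedness.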

PheWAS
Among the derived phenotypes in the Finnish Biobank, we identified 95 disease phenotypes that have previously been linked to lipid levels, including cardiometabolic, psychiatric and gastrointestinal disorders. We manually mapped 44 of these 95 phenotypes to UK Biobank phenotypes (Supplementary Table 4). For the Finnish cohort, the associations between the 35 lead variants from the identified loci and these 44 phenotypes were obtained from the ongoing PheWAS conducted as part of the FinnGen project. These associations were tested using the saddle point approximation method, adjusting for age, sex and the first 10 PCs, as implemented in the SPAtest R package. 31 The associations between these 44 phenotypes and the 35 lead variants in the UK Biobank were obtained from Zhou et al., who tested them using a logistic mixed model with saddle point approximation in SAIGE, adjusting for the first four principal components, age and sex (https://www.leelabsg.org/resources). 32 In addition, associations with obesity and body mass index (BMI) were tested using logistic and linear regression models, respectively, with the same covariates as above, in both the Finnish and UK Biobank cohorts. Individuals with BMI ≥30 kg/m² were categorized as obese. Meta-analyses of both cohorts were performed using the fixed-effects inverse variance weighted method in METAL. PheWAS associations with a false discovery rate (FDR) <5%, estimated using the Benjamini-Hochberg method, were considered significant.
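The Benjamini-Hochberg procedure sorts the m p-values, scales the p-value of rank k by m/k, and enforces monotonicity from the largest rank downward; an association passes FDR <5% when its adjusted value is below 0.05. A minimal sketch, with a hypothetical function name and made-up p-values in the usage line:

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up PheWAS p-values for four variant-phenotype pairs.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.2])
significant = [qi < 0.05 for qi in q]
```

In this made-up example only the first pair passes FDR <5%, even though three raw p-values are below 0.05, illustrating how the correction accounts for the number of tests.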

Variance explained
To determine the variance explained by the known loci for traditional lipid measures, we included all lead variants with MAF >0.005 in the 250 genomic loci previously associated with one or more of the four traditional lipid measures (Supplementary Table 5). Of the 636 reported variants, 557 variants with MAF >0.005 (including six proxies) were available in our QC-passed imputed genotype data. A genetic relationship matrix (GRM) based on these 557 variants was generated using GCTA and used to estimate the variance in plasma levels of each lipid species explained by the known variants, through variance component analysis in biMM.
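The idea behind a GRM restricted to known variants can be sketched with the standardized-genotype estimator that GCTA uses for off-diagonal entries, A_jk = (1/M) Σ_i (x_ij − 2p_i)(x_ik − 2p_i) / (2p_i(1 − p_i)). This simplified sketch applies the same formula on the diagonal (GCTA uses a slightly different diagonal estimator) and ignores missing genotypes; the input layout and the function name are assumptions for illustration.

```python
def genetic_relationship_matrix(genotypes):
    """Simplified GRM from allele dosages.

    genotypes[i][j] is the dosage (0, 1 or 2) of variant i in individual j.
    Every entry uses A_jk = mean over variants of standardized x_ij * x_ik;
    GCTA itself uses a slightly different estimator on the diagonal.
    """
    n = len(genotypes[0])
    standardized = []
    for dosages in genotypes:
        p = sum(dosages) / (2 * n)  # sample frequency of the counted allele
        if p in (0.0, 1.0):
            continue  # monomorphic variants carry no relatedness information
        denom = (2 * p * (1 - p)) ** 0.5
        standardized.append([(x - 2 * p) / denom for x in dosages])
    m = len(standardized)
    return [[sum(row[j] * row[k] for row in standardized) / m for k in range(n)]
            for j in range(n)]


# Hypothetical dosages: three variants (rows) in four individuals (columns).
grm = genetic_relationship_matrix([[0, 1, 2, 1], [2, 1, 0, 1], [1, 1, 0, 2]])
```

Building the matrix only from the known lipid variants means the variance component estimated from it reflects what those loci jointly explain, rather than genome-wide SNP heritability.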

Genetic contribution to lipidome
SNP-based heritability estimates ranged from 0.10 to 0.54 (Figure 2A, Supplementary Table 2), with considerable variation across lipid classes (Figure 2B) and trends similar to those reported previously. 33 Figure 2B), which is similar to a previous study that reported higher heritability for sphingolipids, ranging from 0.28 to 0.53, estimated from pedigrees. 33 Lipids containing polyunsaturated fatty acids, particularly C20:4, C20:5 and C22:6, had significantly higher heritability than other lipid species (Figure 2C).  Table 6). This can be seen in the hierarchical clustering based on genetic correlations, which segregates TAG subspecies into two clusters based on carbon content and degree of unsaturation (Figure 2D).
These patterns were not seen in phenotypic correlations that were estimated based on the plasma Note that this sample size might not provide sufficient power for heritability estimation in unrelated samples. However, our study also included family samples, which 1 provide higher statistical power for heritability estimation than unrelated samples. Moreover, lipid species with at least one genome-wide significant association had higher heritability estimates than lipid species with no significant association (Supplementary Figure 3).
Furthermore, we found that APOA5 rs964184-C and LPL rs11570891-T were associated with reduced levels of medium-length TAGs (C50 to C56), with the strongest associations with TAG(52:3;0). The strikingly similar patterns of association of APOA5 rs964184-C and LPL rs11570891-T with TAG species suggest that both variants might lead to more efficient hydrolysis of medium-length TAGs (Figure 5). To test this, we determined the effect of LPL rs11570891-T on LPL enzymatic activity, and the relationship between LPL activity and TAG subspecies, using post-heparin LPL activity measured in the EUFAM cohort. We found that LPL rs11570891-T (an eQTL increasing LPL expression) was associated with increased LPL activity, which in turn was associated with TAG species, with larger effects on medium-length TAGs than on other TAGs (Figure 5).

Lipid species associated loci and disease risks
PheWAS revealed associations of lead variants from six lipid-species-associated loci (APOA5, ABCG5/8, BLK, LPL, FADS2 and SPTLC3) with at least one of the clinical endpoints (FDR<5%) (Table 2, Supplementary Table 9). These included novel associations of variants at FADS2, BLK and SPTLC3 with various disease outcomes. FADS1-2-3 is a well-known lipid-modifying locus; however, as for many other known lipid loci, its effects on CVD risk have been unclear. We found an association of FADS2 rs28456-G with lower risk of phlebitis and thrombophlebitis.
The lead variant at the PC(16:0;0-16:0;0)-associated locus BLK (rs1478898-A), which is an eQTL for BLK, showed association with decreased risk of obesity and hypertension and increased risk of gallbladder disorders. In addition to its role in B-cell receptor signaling and B-cell development, BLK stimulates insulin synthesis and secretion in response to glucose and enhances the expression of several pancreatic beta-cell transcription factors. 37 Consistent with these physiological roles, BLK has previously been implicated in autoimmune diseases such as systemic lupus erythematosus, 38 maturity-onset diabetes of the young (MODY), 37 and hypertension; 39 however, its role in obesity and gallbladder disorders has not been described before.

Discussion
Our study integrates lipidome, genome and phenome data to provide a detailed description of the genetic regulation of lipid metabolism and its effect on disease risk. To the best of our knowledge, this is the first large-scale study of the genetics of lipidomics presenting SNP-based heritability, genetic sharing among lipid species, and new genomic loci associated with one or several lipid species and with disease risks in humans. The detailed profiling also provided clues to probable molecular mechanisms for genetic variants at both new and previously reported loci.
The results presented here allow us to draw several conclusions. First, despite the influence of dietary intake on circulating lipid levels, plasma levels of lipid species are heritable, suggesting a considerable role of endogenous regulation in lipid metabolism. Importantly, genetic mechanisms do not seem to regulate all lipid species within a lipid class in the same way, as also observed in recent mouse lipidomics studies. 41,42 Longer and more unsaturated lipid species from different lipid classes clearly display greater genetic sharing. These observations are consistent with a previous study based on family pedigrees. 33 This finding is important in light of the proposed role of lipids containing PUFAs in CVD, diabetes and neurological disorders. [43][44][45] Identifying the genetic factors that regulate these particular lipids is important for understanding the subtleties of lipid metabolism and for devising disease-preventive strategies, including dietary interventions, for these common complex diseases, which cause an enormous public health burden globally. Our study provides multiple leads in this direction by identifying 11 genomic loci (KLHL17, APOA1, CD33, SHTN1, FADS2, LIPC, LPL, MBOAT7, MIR100HG, PTPRN2, and TMEM86B) associated with long, polyunsaturated lipids. Of these, the FADS2, APOA1 and LPL variants were also associated with cardiovascular-related phenotypes in our PheWAS analysis (Figure 5). Below, we discuss how identifying genetic variants at these lipid-species-associated loci provided new insights into their roles in lipid metabolism.
Second, we identified genetic variants associated with 74 lipid species from 11 lipid classes.
Individual lipid species from several lipid classes, including CERs, CEs, TAGs and PCs, have been shown to predict risk of CVD and diabetes. [4][5][6][7][8][9][10] This knowledge can directly fuel studies on disease markers or drug target discovery. For example, Cer(d18:1/24:0) was recently reported to be associated with increased risk of CVD events. 46 We identified a variant near ZNF385D associated with Cer(42:1;2), a species that presumably includes the molecular species Cer(d18:1/24:0). The ZNF385D rs13070110-C allele was associated with increased levels of Cer(42:1;2). We also observed a nominal association of rs13070110-C with increased risk of arterial and venous thrombosis (Supplementary Table 9). CEs have also been reported to modulate the risk of CVD events. 4,8 Our study revealed three loci associated with CEs, including two novel loci, ABCG5/8 and SYNGR1. The rs76866386-C variant at ABCG5/8, the locus encoding the ABC sterol transporters G5 and G8, has previously been associated with TC, LDL-C and CEs in LDL, 11 codes for long non-coding RNAs that act as regulators of cell proliferation. 50 The MIR100HG variant rs10790495 is an eQTL for the heat shock protein HSPA8, which also has a role in cell proliferation. 51 However, it is not known whether PTPRN2, MIR100HG or HSPA8 have any role in lipid metabolism.
Our results suggest that these variants might have a role in the negative regulation of either fatty acid elongation and desaturation or the incorporation of long-chain unsaturated fatty acids during TAG biosynthesis.
Fourth, our results point to probable risk and protective variants for diseases, highlighting the potential of detailed lipidomic profiles in disease gene mapping. Two lipid-species-associated loci, BLK and SPTLC3, were associated with cholelithiasis (gallstones). BLK was also identified as a new susceptibility locus for obesity in the present study. Cholelithiasis is one of the most prevalent gastrointestinal diseases, with a prevalence of up to 15% in adult populations. Although up to two-thirds of patients are asymptomatic, cholelithiasis is the most significant risk factor for acute cholecystitis. 52 Risk factors for cholelithiasis include obesity, hyperlipidemia and type 2 diabetes, and its pathogenesis is now recognized to be influenced by the immune system. 53 Given its roles in the immune response, insulin synthesis and insulin secretion, BLK seems to be a plausible risk-modifying gene for obesity and cholelithiasis. A relationship between ceramides and cholelithiasis has also been suggested previously, 40 and given the role of SPTLC3 in ceramide biosynthesis, the SPTLC3 variant might influence the risk of gallstones.
However, the associations of these variants with disease risks warrant further investigation. profiles. 54 The UK Biobank cohort is reported to have a "healthy volunteer" effect, 55 which may affect the PheWAS results. However, given the large sample size, selection bias is unlikely to have a substantial effect on genetic case-control association analyses. Furthermore, lipidomic profiles were measured in whole plasma comprising all lipoprotein classes and particle sizes, which provides no information at the level of individual lipoprotein subclasses and limits our ability to gain detailed mechanistic insights. We also excluded poorly detected lipid species from all analyses to ensure high data quality, which narrowed the spectrum of lipidomic profiles. Further advances in lipidomics platforms might capture more comprehensive lipidomic profiles, including the positions of fatty acyl chains on the glycerol backbone of TAGs and glycerophospholipids and the detection of sphingosine-1-P species and several other species, which would help overcome these limitations.
In conclusion, our study demonstrates that lipidomics enables deeper insights into the genetic regulation of lipid metabolism than clinically used lipid measures, which in turn might help guide future biomarker and drug target discovery and disease prevention.