Genome-wide association study of circulating liver enzymes reveals an expanded role for manganese transporter SLC30A10 in liver health

To better understand molecular pathways underlying liver health and disease, we performed genome-wide association studies (GWAS) on circulating levels of alanine aminotransferase (ALT) and aspartate aminotransferase (AST) across 408,300 subjects from four ethnic groups in the UK Biobank, focusing on variants associating with both enzymes. Of these variants, the strongest effect is a rare (MAF in White British = 0.12%) missense variant in the gene encoding manganese efflux transporter SLC30A10, Thr95Ile (rs188273166), associating with a 5.9% increase in ALT and a 4.2% increase in AST. Carriers have higher prevalence of all-cause liver disease (OR = 1.70; 95% CI = 1.24 to 2.34) and higher prevalence of extrahepatic bile duct cancer (OR = 23.8; 95% CI = 9.1 to 62.1) compared to non-carriers. Over 4% of the cases of extrahepatic cholangiocarcinoma in the UK Biobank carry SLC30A10 Thr95Ile. Unlike variants in SLC30A10 known to cause the recessive syndrome hypermanganesemia with dystonia-1 (HMNDYT1), the Thr95Ile variant has a detectable effect even in the heterozygous state. Also unlike HMNDYT1-causing variants, Thr95Ile results in a protein that is properly trafficked to the plasma membrane when expressed in HeLa cells. These results suggest that coding variation in SLC30A10 impacts liver health in more individuals than the small population of HMNDYT1 patients.


Introduction
Liver disease remains an area of high unmet medical need, and better characterizing the environmental and genetic determinants of liver disease is key to developing new therapeutic strategies. In addition, liver injury is a common side effect of drugs, and is a frequent reason that drugs fail to progress through the development pipeline; understanding the molecular mechanisms of liver injury can aid in rational drug design to avoid off-target effects. Circulating liver enzymes are sensitive biomarkers of liver injury; in particular, alanine aminotransferase (ALT) and aspartate aminotransferase (AST) are released into the circulation during damage to hepatocyte membranes 1,2 . One powerful approach for understanding the molecular basis of liver disease has been to perform genome-wide association studies (GWAS) of levels of circulating liver enzymes across large population samples 1, [3][4][5][6][7][8][9][10][11][12][13] . Combined GWAS of ALT and AST have previously revealed genetic associations providing potential therapeutic targets for liver disease such as PNPLA3 14 and HSD17B13 15 . To further study the genetics of hepatocellular damage, we performed GWAS on circulating levels of ALT and AST in 408,300 subjects, meta-analyzed across four ethnic groups in the UK Biobank.

GWAS summary
We performed a GWAS of ALT and AST in four sub-populations in the UK Biobank (Asian or  Figures 1 and 2). After meta-analyzing across sub-populations to obtain a single set of genome-wide p-values for each enzyme (Manhattan plots, Figure 1), we found 244 and 277 independent loci associating at p < 5 x 10 -8 with ALT and AST, respectively, defined by lead SNPs separated by at least 500 kilobases and pairwise linkage disequilibrium (LD) r 2 less than 0.2.
Enzyme levels were strongly associated with coding variants in the genes encoding the enzymes, representing strong protein quantitative trait loci in cis (cis-pQTLs). For example rs147998249, a missense variant Val452Leu in GPT (glutamic-pyruvic transaminase) encoding ALT, strongly associates with ALT (p < 10 -300 ) and rs11076256, a missense variant Gly188Ser in GOT2 (glutamic-oxaloacetic transaminase 2) encoding the mitochondrial isoform of AST, strongly associates with AST (p = 6.3 x 10 -62 ). While these strong cis-pQTL effects validated our ability to detect direct genetic influences on ALT and AST levels, the aim of this study was to detect genetic determinants of liver health that in turn had downstream effects on both ALT and AST due to hepatocellular damage; therefore we focused the remainder of our analyses on the variants only associated with levels of both enzymes (labeled with black text on Figure 1). AST  for shared signals between the two GWAS; for clarity, the shared signals are marked only once, on the plot for the GWAS in which the more significant association is detected. Cis-pQTLs (at GPT and GOT2) are labeled in blue. Loci with shared signals are labeled (for clarity, only when p < 10-25 and only on the GWAS for which the association is most significant). Loci previously reported to associate with both ALT and AST are named in bold. SLC30A10, the main topic of this report, is labeled in red on both plots.
Focusing only on loci with both ALT and AST GWAS signals (lead SNPs from either GWAS were identical or shared proxies with r 2 ³ 0.8), we found a total of 100 independent loci associated with both enzymes (Figure 2, Supplementary  with an asterisk, loci are named by the closest protein-coding gene. "Coding" indicates that one of the variants linked to the lead SNP is predicted to have a moderate or high impact on a protein-coding gene. "Liver eQTL" and "Muscle or kidney eQTL" indicate that one of the variants linked to the lead SNP is the strongest eQTL for a gene in those tissues by GTEx. The strongest estimated effect of the lead SNPs on either enzyme was a novel association: and a 1.3% increase in AST (95% CI, 1.3% to 1.4%; p < 10 -300 ).

Association of ALT-and AST-associated variants with liver disease risk
We tested the 100 lead SNPs from the ALT and AST GWAS analysis for association with a combined liver disease phenotype (ICD10 codes K70-77, I85, and C22; 14,648 cases and 487,968 controls across the entire biobank), meta-analyzing liver disease association results across all four sub-populations (Supplementary Table 3). Of the 100 lead SNPs, 26 SNPs associate with liver disease with p < 0.05. As expected, variants associated with an increase in ALT and AST tend to be associated with a proportional increase in liver disease risk (across all lead SNPs, effect size r = 0.73 for ALT, r = 0.71 for AST; Figure 3). Consistent with SLC30A10 Thr95Ile (rs188273166) having the strongest effect on ALT and AST of all lead SNPs, it also has the strongest estimated effect on liver disease risk (OR = 1.70; 95% CI, 1.24 to 2.34; p = 7.4 x 10 -3 ). Because SLC30A10 Thr95Ile had the strongest effect of all of our lead SNPs and also has not been reported as being associated with any phenotypes in the literature, we centered the following analyses on better understanding its function.

Validation of SLC30A10 Thr95Ile genotype and replication of ALT and AST associations
Because rare variants are especially prone to errors in array genotyping 25 Table 5). Meta-analyzing the association results across these two outgroups revealed higher mean ALT and AST in the small sample of additional Thr95Ile carriers, although the effect was not statistically significant for ALT (ALT increase = 3.7%; 95% CI, -1.1% to 8.4%, p = 0.084; AST increase = 4.1%; 95% CI, 1.1% to 7.2%; p = 1.7 x 10 -3 ) (Supplementary Figure   3).

Magnitude of ALT and AST elevation in SLC30A10 Thr95Ile carriers
After establishing the association between SLC30A10 Thr95Ile and ALT and AST, we sought to further explore the relationship between genotype and enzyme levels to understand clinical relevance. Carriers of Thr95Ile had a mean ALT of 27.37 U/L vs 23.54 U/L for noncarriers, and a mean AST of 28.85 U/L vs 26.22 U/L for noncarriers. Counting individuals with both ALT and AST elevated above 40 U/L, a commonly-used value for the upper limit of normal (ULN) 2 , 5.6% of carriers vs 3.6% of noncarriers had both enzymes elevated at the time of their UK Biobank sample collection, a relative increased risk of 58% (Supplementary Table 6).

Independence of SLC30A10 Thr95Ile from neighboring ALT and AST associations
Because we applied distance and LD pruning to the results of the genome-wide scan, it was unclear how many independent association signals existed at the SLC30A10 locus. Revisiting trans-ethnic association results in a window including 1 Mb flanking sequence upstream and downstream of  To test for independence between these three loci, we performed ALT and AST association tests for each of the three array-typed SNPs while including the genotype of either one or both of the others as covariates. Associations were similar in these conditional analyses, suggesting that each of these three associations are not confounded by linkage disequilibrium with the other regional association signals (Supplementary Table 7.) Therefore SLC30A10 Thr195Ile has a novel independent association with ALT and AST levels.

Linkage of Thr95Ile to GWAS SNPs at SLC30A10
A GWAS of circulating toxic metals 28 discovered an association between a common intronic SNP in SLC30A10 (rs1776029; MAF in White British, 19.5%) and blood manganese levels, where the reference allele -which is the minor allele -is associated with increased circulating manganese.
We calculated linkage disequilibrium statistics between rs1776029 and Thr95Ile and found that the minor allele of Thr95Ile (A) was in almost perfect linkage with the minor allele of rs1776029  Table 8); however genotypes of Thr95Ile in the manganese GWAS or manganese measurements in the UK Biobank would be needed in order to perform conditional analysis. We then systematically tested nearby SNPs reported in the GWAS Catalog for any phenotype for linkage to Thr95Ile, measured by high |D'|. Combining GWAS Catalog information and |D'| calculations, we find nearly perfect linkage (|D'| > 0.90) between rs188273166-A (rare missense Thr95Ile) with rs1776029-A (intronic), rs2275707-C (3'UTR), and rs884127-G (intronic), all within the gene body of SLC30A10 (Supplementary Table 9). In addition to increased blood Mn 28 , these three common alleles have been associated with decreased magnesium/calcium ratio in urine 29 , decreased mean corpuscular hemoglobin (MCH) 30-32 , increased red blood cell distribution width [30][31][32] , and increased heel bone mineral density (BMD) [32][33][34][35] .
A recent study not yet in the GWAS catalog reported an association between another common intronic SNP in SLC30A10 (rs759359281; MAF in White British, 5.6%) and liver MRI-derived cT1 measures, a proxy for liver fibrosis and steatohepatitis 36 . However, the reported cT1increasing allele (liver disease risk allele) of rs759359281, which is the minor allele, is in complete linkage (D' = 1) with the major allele of Thr95Ile (rs188273166); in other words, the cT1increasing allele and Thr95Ile liver disease risk allele occur on different haplotypes, suggesting that the mechanism of this reported cT1 association is independent of Thr95Ile.

Phenome-wide associations of SLC30A10 Thr95Ile
To explore other phenotypes associated with SLC30A10 Thr95Ile, we tested for association with 135 quantitative traits and 4,533 ICD10 diagnosis codes within the White British population    To test whether the association with extrahepatic bile duct cancer was driven by a nearby association, we performed an association study of the diagnosis with variants within a window including 1 Mb flanking sequence upstream and downstream of SLC30A10; Thr95Ile was the strongest association (Supplementary Figure 4).

Bioinformatic characterization of SLC30A10 Thr95Ile
To understand potential functional mechanisms of the Thr95Ile variant, we examined bioinformatic annotations of SLC30A10 Thr85Ile. The UNIPROT database shows that Thr95Ile occurs in the third of six transmembrane domains and shares a domain with a variant known to cause HMNDYT1 (Figure 5). In silico analysis predicts that Thr95Ile is a damaging mutation; it has a CADD PHRED score of 23.9, placing it in the top 1% of deleteriousness scores for genome wide potential variants; the algorithm SIFT predicts it as deleterious; and the algorithm PolyPhen-2 gives it a HumDiv score of 0.996 (probably damaging) and a HumVar score of 0.900 (possibly damaging.) Cross-species protein sequence alignment in PolyPhen-2 shows only threonine or serine at position 95 across animals. These properties suggest that Thr95Ile substitution ought to have an effect the function of the SLC30A10 protein.

Characterization of SLC30A10 variants in vitro
To test the protein localization of SLC30A10 harboring Thr95Ile as well as other variants, we created constructs with Thr95Ile (rs188273166) and the HMNDYT1-causing variants Leu89Pro (rs281860284) and del105-107 and transfected these constructs into HeLa cells.
Immunofluorescence staining revealed membrane localization for wild-type (WT) SLC30A10 which was abolished by the two HMNDYT1 variants, consistent with previous reports which showed that the HMNDYT1 variants proteins are mislocalized in the endoplasmic reticulum (ER) 38 . In contrast, Thr95Ile showed membrane localization similar to WT, suggesting that Thr95Ile does not cause a deficit in protein trafficking to the membrane (Figure 6). Thr95Ile may affect SLC30A10 function in other ways, for example altering Mn transport efficiency. Among the novel loci are many that had been previously identified as risk loci for liver disease, but had never been explicitly associated through ALT or AST GWAS, such as SERPINA1

Properties of SLC30A10 Thr95Ile
The variant with the strongest predicted effect on ALT and AST, SLC30A10 Thr95Ile  Table 5). The increased frequency we see in European-ancestry populations is not merely due to those populations' overrepresentation in the UK Biobank, but is also consistent with global allele frequency data catalogued in dbSNP 52 .
The Thr95Ile variant occurs in the third of six transmembrane domains of the SLC30A10 protein 53 , the same domain affected by a previously-reported loss-of-function variant causing HMNDY1 (hypermanganesemia with dystonia 1), Leu89Pro (rs281860284) 38 (Figure 6). In vitro, Leu89Pro abolishes trafficking of SLC30A10 to the membrane 38 , and another study pointed to a functional role of polar or charged residues in the transmembrane domains of SLC30A10 for manganese transport function 54 . Bioinformatic analysis suggest that Thr95Ile should impact protein function.
Our site-directed mutagenesis experiment of SLC30A10 shows that Thr95Ile, unlike reported HMNDYT1-causing variants, results in a protein that is properly trafficked to the cell membrane.
Further biochemical studies will be required to investigate whether the Thr95Ile variant of SLC30A10 has reduced manganese efflux activity, which would be consistent with our genetic observations.

Comparison of SLC30A10 Thr95Ile phenotypes to HMNDYT1 phenotypes
SLC30A10 (also known as ZNT10, and initially identified through sequence homology to zinc transporters 16 ) encodes a cation diffusion facilitator expressed in hepatocytes, the bile duct epithelium, enterocytes, and neurons 22 that is essential for excretion of manganese from the liver into the bile and intestine 17,22 . Homozygous loss-of-function of SLC30A10 was recently identified as the cause of the rare disease HMNDYT1, which in addition to hypermanganesemia and dystonia is characterized by liver cirrhosis, polycythemia, and Mn deposition in the brain 19

Comparison of Thr95Ile phenotypes to SLC30A10 common variant phenotypes
Apart from rare variants in SLC30A10 causing HMNDYT1, Thr95Ile can also be compared to common SNPs in SLC30A10 that have been associated with phenotypes by GWAS. We find that the minor allele of Thr95Ile is in almost complete linkage with a common intronic SNP associated with increased blood manganese. Other GWAS SNPs in almost perfect linkage with Thr95Ile associate with decreased MCH, increased RBC distribution width, decreased magnesium/calcium ratio, and increased heel bone mineral density (BMD). Decreased MCH could reflect the anemia experienced by HMNDYT1 patients, caused by the closely linked homeostatic regulation of manganese and iron 17 . Increased BMD may reflect the protective role of manganese in bone maintenance 60,61 . Looking for the subset of these phenotypes available in our scan of Thr95Ile, we do find a nominally significant increase in BMD and a suggestive increase in MCH but no detectable increase in erythrocyte distribution width. By contrast, we find that the a common intronic SNP in SLC30A10 recently reported to associate with liver MRI cT1, a sign of steatohepatitis and fibrosis 36 , is in complete linkage with the major allele of Thr95Ile, suggesting an independent genetic mechanism but also providing independent evidence of the role of

Clinical relevance: genome interpretation
Currently, SLC30A10 Thr95Ile (rs188273166) is listed as a variant of uncertain significance in the ClinVar database 77 . Individuals carrying SLC30A10 Thr95Ile may benefit from increased surveillance of liver function and blood manganese levels and subsequent treatment if these conditions arise, because evidence from HMNDYT1 patients has demonstrated that chelation therapy combined with iron supplementation is effective at reversing the symptoms of SLC30A10 insufficiency 78 . Further studies will be needed to define whether other damaging missense variants or protein-truncating variants in SLC30A10, including the variants known to cause HMNDYT1, also predispose to liver disease in their heterozygous state. Because we only observe one homozygous carrier of SLC30A10 Thr95Ile in our data (who has no diagnosed liver or neurological disease, and unfortunately does not have ALT or AST measurements), further study will also be needed to understand the inheritance model of this association; we cannot determine whether risk in homozygotes is stronger than risk in heterozygotes, unlike cases of HMNDYT1 where identified cases have all been homozygous loss-of-function.
More broadly, the case of SLC30A10 fits a pattern of discoveries of recent discoveries showing that recessive Mendelian disease symptoms can manifest in heterozygous carriers of deleterious variants, blurring the distinction between recessive and dominant disease genes and bridging the gap between common and rare disease genetics 79,80 . These discoveries are possible only by combining massive, biobank-scale genotype and phenotype datasets such as the UK Biobank.

Sub-population definition and PC calculation
Sub-populations for analysis were obtained through a combination of self-reported ethnicity and genetic principal components. First, the White British population was defined using the categorization performed previously by the UK Biobank 81 (Field 22006 value "Caucasian"); briefly, this analysis selected the individuals who identify as White British (Field 21000), performed a series of subject-level QC steps (to remove subjects with high heterozygosity or missing rate over 5%, removing subjects with genetic and self-reported sex discrepancies and putative sex chromosome aneuploidies, and removing subjects with second or first degree relatives and an excess of third degree relatives), performed Bayesian outlier detection using the R package aberrant 82 to remove ancestry outliers using principal components (PCs) 1+2, 3+4, and 5+6 (calculated from the global UK Biobank PCs stored in Field 22009), selected a subset of SNPs in preparation for PCA by limiting to directly-genotyped (info = 1), missingness across individuals < 2%, MAF > 1%, regions of known long range LD, and pruning to independent markers with pairwise LD < 0.1. Based on this procedure used by the UK Biobank to define the "White British" subset, we defined three additional populations, using other self-reported ancestry groups as starting points (Field 21000 values "Asian or Asian British", "Black or Black British", and "Chinese"). Principal components were estimated in PLINK using the unrelated subjects in each subgroup. We then projected all subjects onto the PCs. For calculation of per-variant allele frequency and missingness thresholds, calculation of LD, and for association analyses performed in PLINK, the unrelated sets were used. For association analyses performed in SAIGE, the related sets were used.
For validation in an independent population, two other self-reported ethnicity groups with a sufficient number of SLC30A10 Thr95Ile carriers were assembled, who were not included in "White British" (Field 21000 values "White" subgroup "Irish", and "White" subgroup "Any other white background" or no reported subgroup).

Array genotype data for association analysis
Data were obtained from the UK Biobank through application 26041. Genotypes were obtained through array typing and imputation as described previously 81 . For genome-wide association analysis, variants were filtered so that imputation quality score (INFO) was greater than 0.8.
Genotype missingness, Hardy-Weinberg equilibrium (HWE), and minor allele frequency (MAF) were then each calculated across the unrelated subset of individuals in each of the four subpopulations. For each sub-population a set of variants for GWAS was then defined by filtering missingness across individuals less than 2%, HWE p-value > 10 -12 , and MAF > 0.1%.

Phenotype data
For genome-wide analysis, blood biochemistry values were obtained for ALT (Field 30620) and AST (Field 30650) and log10 transformed, consistent with previous genetic studies 3,83 , resulting in an approximately normal distribution.
For phenome-wide analysis, ICD10 codes were obtained from inpatient hospital diagnoses (Field 41270), causes of death (Field 40001 and 40002), the cancer registry (Field 40006), and GP clinical event records (Field 42040). A selection of 135 quantitative traits was obtained from other fields, encompassing anthropomorphic measurements, blood and urine biochemistry, smoking, exercise, sleep behavior, and liver MRI; all were inverse rank normalized using the RNOmni R package 84 .

Genome-wide association studies of ALT and AST
Because of the high level of relatedness in the UK Biobank participants 85 , to maximize power by retaining related individuals we used SAIGE software package 86 to perform generalized mixed model analysis for GWAS. A genetic relatedness matrix (GRM) was calculated for each subpopulation with a set of 100,000 LD-pruned variants selected from across the allele frequency spectrum. SAIGE was run on the filtered imputed variant set in each sub-population using the following covariates: age at recruitment, sex, BMI, and the first 12 principal components of genetic ancestry (learned within each sub-population as described above). Manhattan plots and Q-Q plots were created using the qqman R package 87 . The association results for each enzyme were metaanalyzed across the four populations using the METAL software package 88 using the default approach (using p-value and direction of effect weighted according to sample size) To report pvalue results, the default approach was used. To report effect sizes and standard errors, because the authors of the SAIGE method advise that parameter estimation may be poor especially for rare variants 89 , the PLINK software package v1.90 90 was run on lead variants on the unrelated subsets of each subpopulation, and then the classical approach (using effect size estimates and standard errors) was used in METAL to meta-analyze the resulting betas and standard errors.

Identifying independent, colocalized association signals between the two GWAS
Meta-analysis results for each enzyme were LD clumped using the PLINK software package, v1.90 90 with an r 2 threshold of 0.2 and a distance limit of 10 megabases, to group the results into approximately independent signals. LD calculations were made using genotypes of the White British sub-population because of their predominance in the overall sample. Lead SNPs (the SNPs with the most significant p-values) from these "r 2 > 0.2 LD blocks" were then searched for proxies using an r 2 threshold of 0.8 and a distance limit of 250 kilobases, resulting in "r 2 > 0.8 LD blocks" defining potentially causal SNPs at each locus. The "r 2 > 0.8 LD blocks" for the ALT results were then compared to the "r 2 > 0.8 LD blocks" for the AST results, and any cases where these blocks shared at least one SNP between the two GWAS were treated as potentially colocalized association signals between the two GWAS. In these cases, a representative index SNP was chosen to represent the results of both GWAS by choosing the index SNP of the GWAS with the more significant p-value. Next, these putative colocalized association signals were then distance pruned by iteratively removing neighboring index SNPs within 500 kilobases of each index SNP with less significant p-values (the minimum p-value between the two GWAS was used for the distance pruning procedure.) The Manhattan plot of METAL results with labeled colocalization signals was created using the CMplot R package 91 .

Annotation of associated loci and variants
Index SNPs and their corresponding strongly-linked (r 2 > 0.8) variants were annotated using the These pairs of association tests were then meta-analyzed in METAL using the default approach (using p-values) between the with-GP and without-GP subsets. Then, association results were meta-analyzed across the four sub-populations using the same method to obtain the final p-value.
To obtain effect sizes and standard errors, the same procedure was performed but using PLINK (on the unrelated subset of each population) and using the classical method in METAL (using effects and standard errors.)

Sequencing-based validation of rs188273166 array genotyping
Whole exome sequencing was available for 301,473 of the 487,327 array-genotyped samples.
DNA was extracted from whole blood and was prepared and sequenced by the Regeneron Genetics Center (RGC). A complete protocol has been described elsewhere 98 . Briefly, the xGen exome capture was used and reads were sequenced using the Illumina NovaSeq 6000 platform. Reads were aligned to the GRCh38 reference genome using BWA-mem 99 . Duplicate reads were identified and excluded using the Picard MarkDuplicates tool (Broad Institute) 100 . Variant calling of SNVs and indels was done using the WeCall variant caller (Genomics Plc.) 101 to produce a GVCF for each subject. GVCFs were combined to using the GLnexus joint calling tool 102 . Postvariant calling filtering was applied as described previously 98 .

Replication of SLC30A10 Thr95Ile associations
ALT and AST association tests were repeated as described for the genome-wide scans, using SAIGE and PLINK, in the "Other White" and "White Irish" populations, for the SLC30A10 Thr95Ile (rs188273166) variant. Results were meta-analyzed across the two populations. A forest plot was created using the forestplot package in R 103 .

Testing linkage of SLC30A10 Thr95Ile to common GWAS SNPs
To test linkage of SLC30A10 Thr95Ile (rs188273166) to common GWAS SNPs, the GWAS Catalog was searched for all results where "Mapped Gene" was assigned to SLC30A10; because of the very relevant phenotype, blood Mn-associated SNP rs1776029, an association that is not in the GWAS Catalog, was also included in the analysis, as well at cT1-associated SNP rs759359281.
LD calculations were performed in PLINK, using the White British unrelated subpopulation, between rs188273166 and the GWAS SNPs with the options --r2 dprime-signed in-phase withfreqs --ld-window 1000000 --ld-window-r2 0. For rs1776029, an additional Fisher's exact test was performed to determine the confidence interval of the enrichment of rs188273166 on the rs1776029 haplotype. The linked alleles from PLINK were then used in conjunction with the effect allele from the reported papers to determine the direction of effect. The GWAS Atlas website 32 was used (the PheWAS tool) to determine the direction of effect for the linked alleles from the original paper; in cases where the original paper from the GWAS Catalog did not report a direction of effect, other papers for the same phenotype and SNP from GWAS Atlas were used to determine the direction of effect and cited accordingly. Reference epigenome information for the GWAS SNPs was obtained by searching for rs1776029 in HaploReg v4.1 104 .

Phenome-wide association study of SLC30A10 Thr95Ile
A phenome-wide association study of SLC30A10 Thr95Ile (rs188273166) was performed by running SAIGE and PLINK against a set of ICD10 diagnoses and quantitative traits, obtained as described above, and meta-analyzed using METAL as described above for the test of association with liver disease. ICD10 diagnoses were filtered to include only those at a three-character (category), four-character (category plus one additional numeral), or "block" level that were frequent enough to test in both subpopulations and without significant collinearity with the sex covariate: at least 100 diagnoses overall, at least 10 diagnoses in both the with-and without-GP data subgroups, and at least 10 diagnoses in both men and women. This resulted in 4,533 ICD10 codes to test, serving as the multiple hypothesis burden. For 5 of these ICD10 code phenotypes, the first step of SAIGE (fitting the null logistic mixed model to estimate the variance component and other model parameters) failed before the association test; for these phenotypes, only PLINK was run.

Bioinformatic analysis of SLC30A10 Thr95Ile
To visualize Thr95Ile on the protein sequence of SLC30A10, UNIPROT entry Q6XR72 (ZN10_HUMAN) was accessed 53 . In UNIPROT, natural variants causing HMNDYT1 22,24,38 and mutagenesis results 38,54,105 were collated from the literature and highlighted. CADD score v1.5 106 was downloaded from the authors' website. SIFT score was obtained from the authors' website using the "dbSNP rsIDs (SIFT4G predictions)" tool 107 . PolyPhen score and multiple species alignment was obtained from the authors' website using the PolyPhen-2 tool 108 .