Tissue-Specific Alteration of Metabolic Pathways Influences Glycemic Regulation

Metabolic dysregulation in multiple tissues alters glucose homeostasis and influences risk for type 2 diabetes (T2D). To identify pathways and tissues influencing T2D-relevant glycemic traits (fasting glucose [FG], fasting insulin [FI], two-hour glucose [2hGlu] and glycated hemoglobin [HbA1c]), we investigated associations of exome-array variants in up to 144,060 individuals without diabetes of multiple ancestries. Single-variant analyses identified novel associations at 21 coding variants in 18 novel loci, whilst gene-based tests revealed signals at two genes, TF (HbA1c) and G6PC (FG, FI). Pathway and tissue enrichment analyses of trait-associated transcripts confirmed the importance of liver and kidney for FI and pancreatic islets for FG regulation, implicated adipose tissue in FI and the gut in 2hGlu, and suggested a role for the non-endocrine pancreas in glucose homeostasis. Functional studies demonstrated that a novel FG/FI association at the liver-enriched G6PC transcript was driven by multiple rare loss-of-function variants. The FG/HbA1c-associated, islet-specific G6PC2 transcript also contained multiple rare functional variants, including two alleles within the same codon with divergent effects on glucose levels. Our findings highlight the value of integrating genomic and functional data to maximize biological inference. Highlights 23 novel coding variant associations (single-point and gene-based) for glycemic traits 51 effector transcripts highlighted different pathway/tissue signatures for each trait The exocrine pancreas and gut influence fasting and 2h glucose, respectively Multiple variants in liver-enriched G6PC and islet-specific G6PC2 influence glycemia


28
It has long been recognized that rare and penetrant disease-causing mutations can pinpoint key proteins 29 and pathways involved in human metabolism (Froguel et al., 1992;Gloyn et al., 2004;Montague et al., 30 1997). Type 2 diabetes (T2D) results from an inability of the pancreatic islet beta cells to produce and 31 secrete sufficient insulin, compounded by the failure of metabolic tissues to respond to insulin and store 32 glucose appropriately. Blood glucose levels are regulated by the co-ordination of homeostatic pathways 33 operating across multiple tissues that control metabolism, therefore a clearer understanding of their 34 relative roles is critical in guiding efforts to modulate them pharmacologically to treat T2D and pre- These measures can influence the risk of developing pathophysiological conditions such as T2D and 40 cardiovascular disease. However, as in all genome-wide association studies (GWAS), it has proven 41 challenging to translate the associated genetic signals into biological pathways, as the vast majority of 42 association signals lie within non-coding regions, and connecting them to their respective effector genes 43 is less straightforward. There are to date over 97 loci reported to be associated with glycemic traits, 44 across different genetic approaches (Wheeler et al., 2017b). One approach to facilitate identification of 45 likely causal variants and transcripts is to focus on coding variation, whose effects on protein sequence 46 can be predicted and functionally tested, facilitating identification of likely causal genes and the ensuing 47 114 To increase power to detect rare variant associations, we additionally performed gene-burden and 115 sequence kernel association (SKAT) tests for gene-level analyses (Methods). We identified six genes with 116 significant evidence of association (P<2.5 × 10 -6 ), of which two -G6PC (for FG and FI) and TF (for HbA1c) 117 -represented novel associations (Tables 2 and S4). To establish whether the associated coding variants (both novel and those at established loci) were 121 likely to be causal, and/or likely to pinpoint an effector transcript, we first integrated these results with 122 published data with higher density GWAS coverage (Manning et al., 2012;Wheeler et al., 2017a). This is 123 important because coding variants can sometimes erroneously point to the wrong effector transcript, as 124 they can "piggy-back" on non-coding alleles that drive the association, and by virtue of having a 125 predicted effect on protein sequence they may falsely implicate the gene in which they reside as the 126 causal one. For example, the coding variant rs56200889 (p.Q802E) at ARAP1 is strongly associated with 127 FG (=-0.016, P=1.8 × 10 -14 , Table 1), and when considered in isolation might have suggested ARAP1 as 128 the effector transcript. However, T2D fine-mapping efforts showed this association to be secondary to a 129 much stronger non-coding signal (Mahajan et al., 2018b), and recent data integrating human islet and 130 mouse knockout information has established neighbouring gene STARD10 as the most likely gene 131 mediating the GWAS signal at this locus (Carrat et al., 2017). Therefore, we conditioned the coding 132 variants identified here on existing non-coding GWAS index variants at established loci from two 133 previously published GWAS datasets (Manning et (Table S5), as well as a body of literature establishing the 141 role of certain genes (mapping within our loci) in glucose metabolism, or red blood cell biology (for 142 HbA1c) to inform effector transcript classification. We further considered significant gene-based 143 associations driven by multiple coding variants within a single gene as strong evidence for the 144 determination of effector transcripts (Methods). 145 146 Combining the above approaches, we curated the 74 coding variant associations (in 58 genes) displayed 147 in Table 1, and where possible identified and classified effector transcripts into "gold", "silver" and 148 "bronze" categories, depending on the strength of evidence (Table S6, Methods). Loci with strong 149 evidence from reciprocal conditional analysis or from published data that supported the relevance of 150 the identified effector transcript to the glycemic trait were labelled "gold" (e.g. GLP1R, SLC30A8, G6PD, 151 PPARG, ANK1); those where an effector transcript could not be defined by conditional analysis (either 6 because it was inconclusive or due to lack of data) but where there was strong biological plausibility for 153 a given gene at the locus were labelled "silver" (e.g. MADD, MLXIPL, FN3K/FN3KRP, HK1, VPS13C); those 154 where we had some evidence but that was not as strong as "silver" were labelled "bronze" (e.g. DCAF12, 155 OBSL1, STEAP2, RAPGEF3); the remaining were left with an undetermined effector transcript (Figure 1, 156 Table S6). Effector transcript classification into the three categories was undertaken independently by 157 four of the authors and the consensus was used as the final classification for effector transcripts. From 158 74 single variant and six gene-based signals, we identified 51 unique effector transcripts (24 gold, 11 159 silver, 16 bronze), with many of them shared across traits (Figure 1). One case in point pertains to 160 VPS13C, which harboured a missense variant (p.R974K) associated with 2hGlu (labelled "bronze") at 161 exome-wide significance (=-0.069, P=6.4 × 10 -10 ; Table 1), and also exhibited a significant gene-based 162 association with FG (labelled "silver"; P SKAT =3.7 × 10 -7 ; Table S4 To identify pathways enriched for glycemic trait associations, and to subsequently determine the extent 172 to which associations within the same trait implicate the same or similar pathways (as indicated by the 173 functional connectivity of the network), we used GeneMANIA network analysis (Franz et al., 2018). 174 GeneMANIA takes a query list of genes and finds functionally-similar genes based on large, publicly-175 available biological datasets. We analysed all loci harbouring non-synonymous variants that reached P<1 176 × 10 -5 for any of the four glycemic traits in our study (totaling 121 associations). A high degree of 177 connectivity was observed within the HbA1c network, with enrichment of processes related to blood cell 178 biology such as porphyrin metabolism, erythrocyte homeostasis and iron transport (Figures 2 and S1, 179 Table S7). In comparison, the network generated from FG-associated genes captured several processes 180 known to contribute to glucose regulation and islet function, including insulin secretion, zinc transport 181 and fatty acid metabolism (Figure 2, Table S7). The FG network further revealed linking nodes (that are 182 not among the association signals) with known links to glucose homeostasis and diabetes, such as GCK 183 (encoding the beta cell glucose sensor glucokinase), GCG (encoding the peptide hormone glucagon 7 secreted by the alpha cells of the pancreas) and GIP (encoding the incretin hormone gastric inhibitory 185 polypeptide). One gene within the FG cluster for lipid-related pathways is CERS2, which encodes 186 ceramide synthase 2, an enzyme known to be associated with the sphingolipid biosynthetic process 187 (Figure 2, Table S7). Although CERS2 is only nominally associated with FG and is significantly associated 188 with HbA1c, it does not cluster together with any HbA1c-enriched pathway, suggesting that CERS2 is 189 regulating FG and HbA1c indirectly through its role in lipid metabolism. Given that there were fewer 190 genes associated with FI and 2hGlu, we were less powered to draw meaningful insights from the 191 enriched pathways in those traits ( Figure S1, Table S7). based on large-scale co-expression data (Fehrmann et al., 2015;Pers et al., 2015). These gene sets take 196 the form of z-scores, where higher z-scores indicate a stronger prediction that a given gene is a member 197 of a gene set. To reduce some of the redundancy in the gene sets (many of which are strongly correlated 198 with one another), we clustered them into 1,396 "meta-gene sets" using affinity propagation clustering 199 (Frey and Dueck, 2007). These meta-gene sets are used to simplify visualizations and aid interpretation 200 of results. Here, we combined and analyzed all variants that reached P<1 × 10 -5 for any of the four 201 glycemic traits (Methods). We found 234 significant gene sets in 86 meta-gene sets with false discovery 202 rate (FDR) of <0.05 (Table S8). As expected, we observed a strong enrichment of insulin-and glucose-203 related gene sets, as well as exocytosis biology (in keeping with insulin vesicle release). In agreement 204 with the GeneMANIA network analyses, we also noted a strong enrichment for blood-related pathways, 205 which was primarily driven by HbA1c-associated variants. This was likely because HbA1c levels are 206 influenced not only by glycation but also by blood cell turnover rate (Cohen et al., 2008;Wheeler et al., 207 2017a). To disentangle blood cell turnover from effects due to glycation, we repeated the analysis 208 excluding variants that were significantly associated with HbA1c only and found 128 significant gene sets 209 in 53 meta-gene sets (FDR <0.05) ( Table S8). We also analyzed each of the four traits separately (Table  210 S8, Methods). 211 To identify additional candidate genes, we then performed heat map visualization with unsupervised 212 clustering of the membership predictions (z-scores) of trait-associated genes for each significant gene 213 set (Figures 2, S2 and S3). This strategy has previously been effective for gene prioritization for 214 downstream analyses (Marouli et al., 2017;Turcot et al., 2018), as it becomes visually apparent which 215 9 secretion, we investigated whether this effect could be mediated by incretin levels. However, we found 248 no associations at rs147238447 for GLP-1 levels in the largest available dataset (fasting GLP-1, N=4170: 249 MAF=0.00457, P=0.495; 2h GLP-1, N=3839: MAF=0.00464, P=0.076) (Almgren et al., 2017), though this 250 might again be explained by limited power. Although additional validation of the rare coding variant 251 rs147238447 (p.L6V) as a potential causal variant is required given the absence of clear associations with 252 T2D risk and other glycemic traits, the results discussed above suggest a role of CTRB2 in glycemic 253 regulation. 254 We also noted a small but distinct cluster in the FG-only analysis indicating the role of the 255 cilium/axoneme, pointing to novel biology relating to sensing and signaling in response to the 256 extracellular environment ( Figure 2D). Two genes were the main drivers of this association: WDR78 and 257 AGBL2. These represent potentially interesting candidates for follow-up, although we note that the 258 AGBL2 signal may be driven through effects of the nearby MADD gene, which harbors a FG-associated 259 coding variant in our study and is labelled "silver" in our effector transcript classification (Tables 1 and  260 S6). Overall, our network and pathway analyses highlighted several trait-associated genes that do not 261 reach exome-wide significance in conventional single variant or gene-based tests, but show evidence of 262 contribution to glycemic regulation. In addition to identifying key metabolic pathways involved in glucose regulation, we sought to establish 267 the relative importance of particular tissues in the regulation of the different glycemic phenotypes. This 268 time, we assessed the tissues that are most highly enriched for the expression of the 51 effector 269 transcripts we have curated at the associated loci identified in this study, to highlight specific tissues 270 that contribute critically to the regulation of each glycemic trait. Using publicly-available tissue 271 expression data from GTEx (Battle et al., 2017) and human islets (van de Bunt et al., 2015), we noted 272 clear differences in tissue enrichment patterns as well as tissues shared between traits (Figure 3). 273 Comparisons between analyses of FG-and FI-associated effector transcripts underscored the relative 274 roles of the liver in both traits (P<0.05), whereas pancreatic islets were enriched in associations for FG 275 (P=9.99 × 10 -5 ) but not FI (P=0.75). In contrast, adipose (P=0.01) and kidney tissues (P=0.01) were 276 enriched in FI but not FG (P>0.05). These results not only highlight the established role of pancreatic 277 islets in influencing FG levels, but also the under-appreciated role of insulin clearance in the kidney and 278 likely the liver, in addition to insulin action in liver and adipose tissue, in influencing FI levels (Goodarzi 279 et al., 2011). Consistent with the EC-DEPICT GSEA, there was also support for the role of the exocrine 280 pancreas (which typically represents >95% of whole pancreas tissue) in addition to the endocrine 281 pancreas (islets) in FG (P=9.99 × 10 -5 ) and 2hGlu (P=2.99 × 10 -4 ) associations. We also observed 282 enrichment for genes expressed in stomach for 2hGlu (P=1.99 × 10 -4 ) but not for FG (P=0.16). HbA1c 283 analysis revealed enrichment in "metabolic" tissues reflecting insulin secretion (islets, P=1.59 × 10 -2 and 284 pancreas, P=0.01), insulin action (muscle, P=1.50 × 10 -2 ), insulin clearance (liver, P=0.03), as well as 285 strong enrichment for whole blood (P=3.99× 10 -3 ). These indicate key factors relating to hemoglobin 286 glycation and blood cell function in influencing overall HbA1c levels (Figure 3). To delve deeper into tissue-specific gene effects, we focused on two homologues, G6PC and G6PC2, 297 with constrasting tissue expression profiles where we identified gene-based association signals for FG/FI 298 and FG/HbA1c, respectively (Tables 2 and S4). Both genes encode gluconeogenic enzymes that catalyze 299 the same biochemical pathway but are known to have distinct tissue expression profiles. G6PC2 is 300 largely expressed in pancreatic islets whereas G6PC is highly expressed in the liver, kidney, and small 301 intestine (Foster et al., 1997;Mithieux, 1997). Our gene-based analyses highlighted G6PC through novel 302 associations with FG and FI, driven primarily by rare missense variants p.A204S (rs201961848) and 303 p.R83C (rs1801175), and protein-truncating variant (PTV) p.Q347X (rs80356487), none of which 304 achieved exome-wide significance at single-variant level (Table S4). Homozygous inactivating alleles in 305 G6PC, which include both p.R83C and p.Q347X, are known to give rise to glycogen storage disease type 306 Ia (GSD1a), a rare autosomal recessive metabolic disorder (Chou and Mansfield, 2008;Lei et al., 1995), 307 but this is the first time that rare coding variants in G6PC have been shown to influence FG and FI levels 308 in normoglycemic individuals. 309 310 Given the well-known role of G6PC in hepatic glucose homeostasis, we were interested in elucidating 311 the molecular impact of rare heterozygous G6PC coding variants highlighted in our exome-array 312 analysis, in particular novel variant p.A204S, one of the statistical drivers of the gene-based G6PC signal 313 (Table S4). In transient protein overexpression assays, p.R83C and p.A204S resulted in significantly 314 reduced protein levels compared to wild type (WT) G6PC in both Huh7 (human hepatoma) and HEK293 315 (human embryonic kidney) cell lines (Figure 4A-D). The PTV p.Q347X, which in our in vitro system 316 generated a smaller molecular weight protein, exhibited markedly lower protein expression levels in 317 Huh7 cells but not HEK293 cells. However, in both cell types, the cellular localization pattern of p.Q347X 318 appears to be largely diffuse and did not co-localize with the Golgi apparatus, which is important for 319 post-translational modification of G6PC protein (Figures 4E and S4A). Further functional characterization 320 of glucose-6-phosphatase (G6Pase) activity revealed that both p.R83C and p.Q347X variants lead to 321 proteins lacking any detectable phosphatase activity ( Figure S4B-C), consistent with previous 322 observations of several GSD1a-causing coding variants (Shieh et al., 2002). As we observed that the 323 p.R83C variant resulted in complete loss of glycosylation, we determined if glycosylation is essential for 324 G6Pase activity by treating cells with tunicamycin to inhibit N-linked glycosylation. The ability of 325 unglycosylated G6PC to catalyze glucose-6-phosphate (G6P) was found to be downregulated by up to 326 14%, although this difference was not statistically significant ( Figure 4F). We therefore concluded that 327 whilst glycosylation contributes to overall functional activity, it may not be a requisite for G6P 328 hydrolysis. Finally, we were unable to accurately assess p.A204S-G6PC phosphatase activity as the level 329 of expression in the microsomes was reduced by 41% relative to WT, supporting the hypothesis that 330 p.A204S-G6PC exhibits partial loss-of-function (LOF) most likely due to loss of protein expression. 331 332 Together, our functional studies support p.A204S, p.R83C, and p.Q347X as functional LOF variants due 333 to loss of G6Pase protein expression and/or activity. This results in a reduced potential to hydrolyze G6P 334 to glucose in gluconeogenic tissues (such as in the liver and kidney), thus directly reducing FG levels and 335 consequently lowering circulating FI levels in the plasma. Our data suggest that rare inactivating 336 mutations in G6PC (such as p. R83C Wessel et al., 2015). In this current study, 344 gene-based association signals for both FG and HbA1c were observed at the G6PC2 locus, primarily 345 driven by multiple coding variants (p.H177Y, p.Y207S, p.R283X, and p.S324P) ( Table S4). We aimed to 346 extend the investigation of coding variation in this gene, which is likely to harbor a series of functional 347 alleles, by characterizing the four G6PC2 coding variants above and six others, across the allelic 348 frequency spectrum (all with single-variant P<0.05 for FG or HbA1c in our analyses) (  As three variants (p.I171T, p.I171V, and p.F256L) appeared to be stably expressed and processed like WT 358 G6PC2 protein, we hypothesized that these alleles could be influencing glycemic levels through effects 359 on protein activity. As there is a high level of conservation between the catalytic domains in G6PC and 360 G6PC2, we adapted the G6Pase assay used earlier, to indirectly analyse the effect of the G6PC2 variants 361 on G6Pase enzymatic activity. We assumed that the G6PC2 alleles of interest, which mapped to the 362 conserved regions, will give rise to the same consequence in the G6PC backbone due to the strong 363 homology and preserved topology of both proteins. The adaptation was necessary as we were unable to 364 detect G6PC2 activity using the same experimental conditions. First, we generated variants that mapped 365 to equivalent sites within the G6PC protein (G6PC-p.L173T, p.L173V, and p.F258L correspond to G6PC2-366 p.I171T, p.I171V, and p.F256L, respectively), and then performed the enzymatic studies. Two alleles, 367 p.L173T, p.L173V, affected the same codon and were each genetically associated with FG levels but with 368 opposite directions of effect (Table S4). We found that G6PC-p.L173T exhibited ~20% decreased activity 369 compared to WT based on assessment of V max (maximal rate of reaction), a measure of enzymatic 370 activity ( Figure 5B). In contrast, G6PC-p.L173V had enhanced activity through both increased V max and 371 lowered K m (Michaelis constant, whereby a lower K m indicates higher substrate affinity) ( Figure 5B). 372 Importantly, our in vitro observations mirrored the genetic effects on FG (β I171T =-0.084 mmol/l; 373 13 β I171V =+0.131 mmol/l) and HbA1c levels (β I171T =-0.007%; β I171V =+0.093%) ( Table S4). The G6PC-p.F258L 374 variant also displayed impaired phosphatase activity due to reduced V max and a tendency towards higher 375 K m relative to WT (Figure 5C), consistent with the observed glucose-lowering effects of G6PC2-p.F256L. 376 To ensure that the observed effects of the rare variants on FG were not influenced by the common 377 G6PC2 variant rs560887, as was the case for a common variant V219L shown in an earlier study 378 (Mahajan et al., 2015) which we confirm here, conditional analyses were performed conditioning on 379 rs560887 (Table S9). Conditional results for p.I171T, p.I171V and p.F256L confirmed that the directions 380 of effect for the variants remain unchanged, making it unlikely that the regulatory variant rs560887 is 381 regulating these effects (Table S9). These results provided the first example of an activating allele in 382 G6PC2 (p.I171V) and highlighted the unique protein changes at a single codon that can give rise to a 383 corresponding loss or gain of functional activity. These data therefore show that variations in G6PC2 384 may influence FG levels through their impact on protein expression or activity. 385

386
To further characterize these variants, we set out to determine the effect of the G6PC2 LOF variants on 387 ER integrity, given that G6PC2 is an ER-resident protein and that beta cells, which are highly-specialized 388 secretory cells, are highly sensitive to ER stress. Specifically, we evaluated the expression of G6PC2 389 variant proteins on the canonical ER stress response (ERSE) and unfolded protein response (UPRE) 390 pathways. The three G6PC2 variants which displayed relatively severe effects on protein stability 391 (p.H177Y, p.Y207S, p.S324P) in our study were found to activate ERSE and UPRE reporter activities by 392 ~3-fold, in contrast to the variants p.I171T and p.F256L which exert their effects primarily on enzymatic 393 function ( Figure 5D). The common p.V219L variant, which reduces protein expression by approximately 394 50%, displayed an intermediary effect ( Figure 5D). These results suggest that G6PC2 variant proteins, 395 especially those that result in severe LOF due to protein instability, may also influence beta cell ER 396 homeostasis. 397

398
In previous studies, the G6PC2-p.R283X variant has shown inconsistencies in terms of their associations 399 with FG levels (Mahajan et al., 2015;Wessel et al., 2015). With a larger dataset we have now confirmed 400 that this variant influences both FG and HbA1c levels (Tables 1 and S3). As the nonsense p.283X allele is 401 located in the last exon of the gene and may evade NMD, we queried RNA sequencing data from human 402 islets and observed an allelic balance in heterozygous carriers, indicating that variant transcripts are 403 indeed likely to escape NMD and be translated ( Figure S6A). Based on our pipeline of in vitro assays, we 404 14 confirmed G6PC2-p.R283X loss-of-function due to reduced protein expression, failure to localize to the 405 Golgi network, and a high likelihood of complete loss of phosphatase activity (Figures 5A and S5D). establish whether this variant indeed influences G6PC2 regulation in human islets, we determined its 411 effect on G6PC2 isoform expression. We found that in human islets, the presence of the rs560887-G 412 allele is associated with increased expression of the full-length G6PC2 isoform as compared with the 413 shorter isoform lacking exon 4 ( Figure S6B). This observation supports the hypothesis that rs560887 may 414 alter splicing and is consistent with the association between rs560887-G and elevated FG and HbA1c 415 levels due to increased G6PC2 function. As the phenotypic consequence of rare coding variants can be 416 influenced by regulatory variants on the same haplotype, we therefore performed conditional analyses 417 to explore the relationship between rs560887 and the rare coding variants. We showed that the 418 direction of effects of all the rare alleles in our study remained the same after conditioning on rs560887, 419 though it is notable that the variants p.Y207S and p.R283X showed some reduction in strength of 420 association after conditioning (Table S9). 421 422 Functional assessment of G6PC2 variants improves gene-based association analysis 423 We next evaluated the utility of our functional data to enhance gene-based association analyses. We 424 showed that the gene-based signals were strengthened when the tests were informed by in vitro 425 functional validation of the variants (as determined in this study) as opposed to the predictive in silico 426 annotations based on the NSbroad and NSstrict masks (Table S9, Methods). In fact, in line with 427 expectation, flipping the alleles in the gain-of-function variant p.I171V (which we now know acts in the 428 opposite direction compared to other rare variants in the test), to align all alleles with the same 429 direction of effect, augmented the strength of association for both FG (from P=4.34 × 10 -71 to P=6.47 × 430 10 -78 )and HbA1c (P= 6.37 × 10 -30 to P=6.37 × 10 -33 ) in the gene burden test (Table S9). Improved methods 431 of filtering variants will enhance the performance of gene-based tests and increase the likelihood of 432 identifying true association signals, especially for those that are of borderline significance or that initially 433 fall below the significance threshold. 434 435 G6PC2 regulates basal insulin secretion in human beta cells 436 Although G6PC2 is known to be specifically enriched in pancreatic islet beta cells, its role in the 437 regulation of human beta cell function has not been shown. Using gene knockdown studies in the 438 human EndoC-βH1 beta cell line, we found that G6PC2-deficient cells exhibited significantly (but 439 modestly) increased insulin secretion at low glucose (1 mM) and a trend towards increased insulin 440 secretion at sub-maximal glucose (6 mM) levels (Figures 5F and S5E). When expressed as a fraction of 441 insulin content (Figure 5F), insulin secretion was significantly increased across multiple glucose 442 conditions, although this was primarily driven by reduced total insulin content in G6PC2-deficient cells 443 by ~15%. Overall, G6PC2 knockdown increases glucose responsiveness at sub-threshold levels of glucose 444 but not at maximal glucose concentration in EndoC-βH1 cells, suggesting enhancement of basal glucose 445 sensitivity by promoting glycolytic flux at sub-stimulatory glucose concentrations, and warranting more 446 in-depth characterization experiments. 447 16 Discussion 448 We have identified novel coding variant associations with FG, FI, 2hGlu and HbA1c, across the allele 449 frequency spectrum, and assigned these variants to their effector transcripts using available genetic and 450 biological evidence. We further pinpointed novel loci and effector transcripts that have now been 451 associated with T2D and other related metabolic traits since the time of our analysis. Our results 452 revealed that 15 out of 58 glycemic trait-associated loci have evidence of association with T2D risk 453 ( We used this work to explore the pathways and metabolic tissues through which the associated genes 460 influence variation in glycemic traits and highlighted those with key roles in glucose regulation and traits 461 that act through multiple metabolic tissues, including islets, liver, fat, and in addition, exocrine pancreas, 462 gut and kidney. Our GSEA enabled us to identify additional genes (e.g. CTRB2) within these tissues and 463 pathways which were below the threshold for statistical significance in our initial discovery effort and 464 that merit follow-up. We report an emerging role for the gut and exocrine pancreas for 2hGlu levels and and exocrine pancreas and liver, whilst FI is mediated by the insulin-sensitive tissues such as liver, 470 kidney, and adipose tissue, indicating the importance of both insulin action and insulin clearance 471 mechanisms. Genes expressed in muscle, also an insulin-sensitive tissue, were enriched in HbA1c-472 associated effector loci but not FI, though this could be due to differences in power between the two 473 analyses. We see evidence of multiple metabolic tissues being important for HbA1c regulation, and note 474 that the HbA1c-associated set of effector transcripts appear enriched for those that influence blood cell 475 biology. 476 We have also shown for the first time that genetic variation in G6PC, a gene implicated in GSD1a, 477 influences glycemic traits within the normal physiological range in heterozygote carriers. In vitro follow-478 up of the variants driving the gene-based association -p.A204S, p.R83C, and p.Q347X -confirmed that 479 these were indeed causal LOF variants at this locus that contribute to modulation of FG and FI levels. We 480 then reported novel rare coding variant associations for FG and HbA1c within a member of the same 481 gene family, G6PC2, and expanded the allelic spectrum of reported variants to include variants affecting 482 the same codon with both loss and gain of function alleles. Our comprehensive analysis of this locus 483 demonstrates multiple molecular mechanisms by which variants influence protein function, including 484 evidence from human islets that the common regulatory variant rs560887 influences G6PC2 isoform 485 expression, and that a rare PTV (p.R283X) evades NMD and results in a catalytically-null enzyme. Given 486 the possiblility that the effects of any coding variants in exon 4 which are carried in cis with the 487 rs560887-A allele could potentially be "diluted" due to the splicing effect, we checked whether the 488 observed rare variant effects could be driven by rs560887 in LD by repeating the single-variant 489 association tests with conditional analyses (Table S9) It has long been suspected that particular metabolic tissues are key to governing specific processes of 499 glucose metabolism. Using human genetics, our study has explored this within an unbiased approach 500 and has illustrated the impact of altered glycolysis in multiple metabolic tissues on various glycemic 501 phenotypes. Uniquely, our parallel studies of G6PC and G6PC2 highlighted two homologous proteins 502 that act through different tissues to influence glycemic traits. As G6PC is involved in hepatic glucose 503 production it influences both FG and FI levels. Previous studies have also established a potential role for 504 G6PC in influencing lipid and urate levels (Dewey et al., 2016;Sever et al., 2012). In contrast, due to its 505 restricted expression in the islet beta cell, variants in G6PC2 only influence FG and HbA1c due to a beta 506 cell-driven effect. There are also notable differences in the molecular mechanisms underlying protein 507 dysfunction: for G6PC variants the effect is primarily on enzymatic activity, whilst G6PC2 variants largely 508 cause protein instability. 509 18 510 A limitation of the present study is that we were not able to fine-map association signals, being 511 restricted to variants captured on the exome array, leaving many associated loci with unknown effector 512 transcripts. Additional large-scale studies, with higher density GWAS arrays and imputation to dense 513 reference panels, will be required for fine-mapping and further effector transcript identification. 514 515 In conclusion, we have combined human genetic discovery with pathway analysis and functional studies 516 to uncover tissue-specific effects in common pathways that influence glycemic traits. Our findings will 517 inform efforts to target these pathways therapeutically to modulate metabolic function. 518 Reactome pathways that were significantly enriched are depicted. To summarize these results, the most 532 significant term of all calculated terms within the same group is represented. Barplots with the 533 Bonferroni-adjusted -log10(p-values) of the most significant terms within each group are are shown. 534

Figure Legends
Each group was assigned a specific color; if a gene is present in more than one term, it is displayed in 535 more than one color. 536 is MAF 1-5%, and common is MAF >5%, (2) whether the gene has an Online Mendelian Inheritance in 547 Man (OMIM) annotation as causal for a diabetes/glycemic-relevant syndrome or blood disorder, (3) the 548 effector transcript classification for that variant: gold, silver, bronze, or NA (note that only array-wide 549 significant variants were classified, so suggestively-significant variants are by default classified as "NA"), 550 20 (4-7) whether each variant was significant (P<2 × 10 -7 ), suggestively significant (P<1 × 10 -5 ), or not 551 significant in Europeans for each of the four traits, and (8) Table S1. Aldrich), 5.5 µg/ml transferrin (Sigma Aldrich) and 6.6 ng/ml, sodium selenite (Sigma Aldrich). All cell 652 lines were tested negative for mycoplasma contamination using the MycoAlert Assay kit (Lonza). Cells 653 were maintained at 37°C and 5% CO 2 . 654

Studies in humans 657
Phenotypes 658 Studied outcomes were FG (mmol/L), Ln-transformed FI (pmol/L), 2hGlu (mmol/L) and HbA1c (% of 659 hemoglobin). Glycemic measurements are described in detail for each contributing cohort in Table S1. Individual cohorts ran linear mixed models using the raremetalworker (v 4.13.2) or rvtests (v20140723) 684 software (Table S1). For each glycemic outcome, analyses were performed using an additive model for 685 the raw and the inverse normal transformed trait. In the manuscript and in all tables and figures effect 686 estimates and standard errors are for the raw trait, while the p-values are from the inverse normal 687 transformed trait analyses. Analyses were adjusted for age, sex, BMI, study-specific number of PCs and 688 other study-specific covariates (Table S1). Raremetal (v4.13.7 or higher) was used to combine results by 689 fixed-effect meta-analyses. Variants with P < 10 -4 for deviation from Hardy-Weinberg equilibrium or with 690 call rate < 0.99 in individual cohorts were excluded from meta-analyses. In single variant analyses, the 691 threshold for significance was P < 2.2×10 -7 for coding variants (stop-gained, stop lost, frameshift, splice 692 donor, splice acceptor, initiator codon, missense, in-frame indel and splice region variants). These Combining these approaches, we attempted to identify effector transcripts at each locus, and we 745 classified their likelihood of being correct depending on the strength of the evidence. Those effector 746 transcripts where there was strong evidence from reciprocal conditional analysis or support from 747 27 published data for the relevant glycemic trait or phenotype were labelled "gold"; those where the 748 effector transcript could not be defined by conditional analysis (either because it was inconclusive or 749 due to lack of data) but where there was strong biological plausibility for a given gene at the locus were 750 labelled "silver"; those where we had some tentative evidence but that was not strong enough to 751 warrant a "silver" classification were labelled "bronze", and the remainder were left with an unknown 752 effector transcript. Effector transcript classification into "gold", "silver" and "bronze" was undertaken 753 independently by four of the authors and the highly concordant consensus score was given (Table S6). Briefly, the basic EC-DEPICT method is as follows. We first obtain a list of significant input variants (the 806 most significant nonsynonymous variant per locus) and then map variants to genes based on 807 annotations from the CHARGE consortium (see URL). For each gene set, we obtain the gene set 808 29 membership z-scores for all trait-associated input genes and sum them to generate a test statistic. We 809 then take 2,000 permuted ExomeChip association studies (described in more detail below) and calculate 810 the average permuted test statistic for that gene set, as well as the permuted standard deviation. For 811 each permutation, the number of top genes we take as "input genes" is matched to the actual observed 812 number of input genes. We then calculate (observed test statistic -average permuted test 813 statistic)/(permuted standard deviation) to generate a z-score, which is converted to a p-value via the 814 normal distribution. False discovery rates were calculated by comparing the observed p-values to a 815 permuted P-value distribution generated with an additional set of 50 permuted association studies. 816 The permuted ExomeChip association studies are conducted by (1) generating 2,200 sets of normally 817 distributed phenotypes and (2) using these randomly generated phenotypes to conduct 2,200 818 association studies with real ExomeChip data. Using these permutations to adjust the observed test 819 statistics corrects for any inherent structure in the data (e.g. that pathways made up of longer genes 820 may be more likely to come up as significant by chance). 821 For these analyses, we first generated permutations based on ExomeChip data we had used previously 822 for this purpose: 11,899 samples drawn from three cohorts (Malmö Diet and Cancer [MDC], All New 823

Diabetics in Scania [ANDIS], and Scania Diabetes Registry [SDR]
). For simplicity, we refer to these cohorts 824 as the "Swedish permutations." 825 As part of our GSEA pipeline, we remove input trait-associated variants that are not present in the 826 permuted data to ensure that all variants are appropriately modeled. When using the Swedish 827 permutations, this generally results in removing a substantial fraction of the variants, especially of the 828 very rarest variants (due to the smaller sample size of the Swedish data relative to the data being 829 analyzed). We have previously observed that this filtering can actually improve the GSEA signal, possibly 830 due to more heterogeneous biology or a higher false-positive rate in these very rare variants (Turcot et  831 al., 2018). However, in this case, we observed that in performing this filtering, we excluded variants in 832 several known monogenic disease genes, such as HNF1A and SLC2A2. Therefore, we wished to repeat 833 the analysis with a set of permutations which would allow us to retain these variants. We thus repeated 834 the analysis with a second set of permutations consisting of 152,249 samples from the UK Biobank 835 (referred to as the "UKBB permutations"). The larger sample size in the UKBB permutations means more 836 variants are present and can therefore be included in the analysis. 837

Concordance of results from two different sets of permuted distributions across phenotypes:
For 838 30 completeness, we report the results from the use of both sets of permutations. We note that the results 839 are strongly concordant. The larger number of significant gene sets reported based on the UK Biobank 840 permutations is generally a combination of 1) overall improved power (i.e. more variants are included) 841 and 2) the inclusion of variants in key driver genes absent in the Swedish permutations, encompassing 842 both the monogenic genes mentioned above (e.g. SLC2A2) and additional genes with clearly relevant 843 biology (e.g. CTRB2, SLC30A8). The results from both sets of permutations are summarized below. For all 844 analyses, "significance" refers to a false discovery rate of <0.05. were absent in the permutations or did not have a nonsynonymous annotation in the CHARGE 879 annotation file). This is because we assume that if the genes harboring those variants have strong 880 predicted membership in significantly trait-associated gene sets, they are still good candidates for 881 prioritization. In fact, this may be even stronger evidence in favor of these genes because they did not 882 contribute to the enrichment analysis and therefore their prioritization is independently derived (and 883 provides even more support to the implicated biology). We are grateful to the study participants who dedicated their time and samples to these studies. We also thank the VHS, the Swedish Diabetes Registry and Umeå Medical Biobank staff for biomedical data and DNA extraction. We also thank M Sterner, G Gremsperger and P Storm for their expert technical assistance with genotyping and genotype data preparation. The current study was funded by Novo Nordisk, the Swedish Research Council, Påhlssons Foundation, the Swedish Heart Lung Foundation, and the Skåne Regional Health Authority (all to PWF).

DPS
The DPS has been financially supported by grants from the Academy of Finland and Wellcome Trust. We are grateful to all the volunteers for their time and help, and to the General Practitioners and practice staff for assistance with recruitment.
We thank the Fenland Study Investigators, Fenland Study Co-ordination team and the Epidemiology Field, Data and Laboratory teams.

FIA3
Jansson J-H was responsible for the identification of MI cases. The FIA3 study was supported in part by a grant from the Swedish Heart-Lund Foundation (to PWF).

Hidetoshi Kitajima Hidetoshi Kitajima was funded by Manpei Suzuki Diabetes Foundation Grant-in-Aid
for the young scientists working abroad.

Inter99
The Inter99 was initiated by Torben Jørgensen (PI), Knut Borch-Johnsen (co-PI), Hans Ibsen and Troels F. Thomsen Table S6. Annotation and classification of effector transcripts into "gold", "silver" and "bronze" using GO terms and REACTOME gene sets. The enrichment results were considered significant when 1154 Bonferroni-adjusted p-value < 0.05 and at least 3% of the genes contained in the tested gene set is 1155 included in the network. Gene sets were also grouped using kappa score into functional groups to 1156 improve visualization of enriched pathways. Related to Figures 2 and S1. 1157    Effector transcript classification into "gold", "silver" and "bronze" categories based on strength of genetic and biological evidence. A total of 51 effector transcripts from 74 single variant and six gene-based signals were identified, with many of them shared across traits. The classification was undertaken independently by four of the authors and the consensus was used as the final classification for effector transcripts (see Methods). *Asterisk indicates "silver" for FG, "bronze" for 2hGlu. wh e r e d a r k r e d me a n s " v e r y l i k e l y a me mb e r " a n d d a r k b l u e me a n s " v e r y u n l i k e l y a me mb e wh e r e d a r k r e d me a n s " v e r y l i k e l y a me mb e r " a n d d a r k b l u e me a n s " v e r y u n l i k e l y a me mb e r . " T h e g e n e s e t a n n o t a t i o n s i n d i c a t e wh e t h e r t h a t me t a -g e n e s e t wa s s i g n i f i c a n t a t F DR < 0 . 0 5 o r n o t s i g n i f i c a n t ( n . s . ) f o r e a c h o f t h e o t h e r E C-DE P I CT a n a l y s e s . F o r h e a t ma p i n t e n s i t y a n d E C-DE P I CT P -v a l u e s , t h e me t a -g e n e s e t v a l u e s a r e t a k e n f r o m t h e mo s t s i g n i f i c a n t l y e n r i c h e d me mb e r g e n e s e t . T h e g e n e v a r i a n t a n n o t a t i o n s a r e a s f o l l o ws : ( 1 ) t h e E u r o p e a n mi n o r a l l e l e f r e q u e n c y ( MA F ) o f t h e i n p u t v a r i a n t , wh e r e r a r e i s MA F < 1 %, l o w-f r e q u e n c y i s MA F 1 -5 %, a n d c o mmo n i s MA F > 5 %, ( 2 ) wh e t h e r t h e g e n e h a s a n On l i n e Me n d e l i a n I n h e r i t a n c e i n Ma n ( OMI M) a n n o t a t i o n a s c a u s a l f o r a d i a b e t e s / g l y c e mi c -r e l e v a n t s y n d r o me o r b l o o d t h e i n p u t v a r i a n t , wh e r e r a r e i s MA F < 1 %, l o w-f r e q u e n c y i s MA F 1 -5 %, a n d c o mmo n i s MA F > 5 %, ( 2 ) wh e t h e r t h e g e n e h a s a n On l i n e Me n d e l i a n I n h e r i t a n c e i n Ma n ( OMI M) a n l y a r r a y -wi d e s i g n i f i c a n t v a r i a n t s we r e c l a s s i f i e d , s o s u g g e s t i v e l y -s i g n i f i c a n t v a r i a n t s a r e b y d e f a u l t c l a s s i f i e d a s " NA " ) , ( 4 -7 ) wh e t h e r e a c h v a r i a n t wa s s i g n i f i c a n t ( P < 2 × 1 0 -7 ) , s u g g e s t i v e l y s i g n i f i c a n t ( P < 1 × 1 0 -5 ) , o r n o t s i g n i f i c a n t i n E u r o p e a n s f o r e a c h o f t h e f o u r t r a i t s , a n d ( 8 ) wh e t h e r e a c h v a r i a n t wa s i n c l u d e d i n t h e a n a l y s i s o r e x c l u d e d b y f i l t e r s ( s e e Me t h o d s ) . wa s s i g n i f i c a n t ( P < 2 × 1 0 -7 ) , s u g g e s t i v e l y s i g n i f i c a n t ( P < 1 × 1 0 -5 ) , o r n o t s i g n i f i c a n t i n E u r o p e a n s Ge n e V a r i a n t A n n o t a t i o n s C B F G n e t wo r k A Hb A 1 c n e t wo r k