A new hypothesis for type 1 diabetes risk: The at-risk allele at rs3842753 associates with increased beta cell INS mRNA in a meta-analysis of single cell RNA sequencing data

Type 1 diabetes is characterized by the autoimmune destruction of insulin secreting β cells. Genetic variations upstream at the insulin (INS) locus contribute to ~10% of type 1 diabetes heritable risk. Multiple studies showed an association between rs3842753 C/C genotype and type 1 diabetes susceptibility, but the molecular mechanisms remain unclear. To date, no large-scale studies have looked at the effect of genetic variation at rs3842753 on INS mRNA at the single cell level. We aligned all human islet single cell RNA sequencing datasets available to us in 2020 to the reference genome GRCh38.98 and genotyped rs3842753, integrating 2315 β cells and 1223 β-like cells from 13 A/A protected donors, 23 A/C heterozygous donors, and 35 C/C at-risk donors, including adults without diabetes and with type 2 diabetes. INS expression mean and variance were significantly higher in single β cells from females compared with males. Comparing across β cells and β-like cells, we found that rs3842753 C containing cells (either homozygous or heterozygous) had the highest INS expression. We also found that β cells with the rs3842753 C allele had significantly higher ER stress marker gene expression compared to the A/A homozygous genotype. These findings support the emerging concept that inherited risk of type 1 diabetes may be associated with inborn, persistent elevated insulin production which may lead to β cell ER stress and fragility.


INTRODUCTION
Pancreatic β cells are the body's primary source of insulin, a hormone that promotes glucose uptake from the bloodstream into tissues [1]. Type 1 diabetes constitutes ~10% of diabetes cases and is caused by the autoimmune destruction of a substantial proportion of β cells [1]. Type 1 diabetes is highly heritable with a twin concordance between 30% and 70% [2]. The HLA locus is the major genetic determinant of type 1 diabetes risk (OR = 0.02 to 11), but other genes play a significant role as well [3]. Gene mapping and GWAS studies identified the human insulin (INS) locus as the 2 nd most important contributor of type 1 diabetes genetic risk (OR = 2.38, ~10% of heritable risk) [4; 5]. The region of the INS locus implicated in type 1 diabetes susceptibility has three genetic variants in near-perfect linkage disequilibrium with each other [4][5][6].  [4][5][6][7][8]. Often, the SNP alleles are examined as surrogates for INS-VNTR length [6; 8-10]. The rs3842753 C allele has been consistently correlated with increased type 1 diabetes susceptibility [5; 6; 8].
Despite genetic studies linking the rs3842753 C allele to increased type 1 diabetes susceptibility, there have only been three studies examining the effect of these genetic variants on INS expression in the pancreas [6; 9; 10]. The studies found an allele specific association between rs3842753 C allele and increased INS expression in heterozygous adult and fetal whole pancreas tissue, but were confounded by a lack of cell type specificity. Moreover, the sample sizes were small (n = 3, 1, and 10) [6; 9; 10]. These findings lend support to the β cell ER stress model of type 1 diabetes pathogenesis. It has been hypothesized that increased insulin demand could generate large proportions of misfolded or unfolded proinsulin, leading to ER stress [11]. Under glucose stimulation [12] and even basal conditions [13], insulin production is a significant source of ER stress. If the allele-specific increase in INS expression is translated into increased insulin production, there could be increased ER stress activated unfolded protein response leading to increased generation of unique antigen that present exclusively under specific conditions (neoautoantigens) [14]. Neoautoantigens derived from INS have been shown to stimulate Tcell proliferation and cytokine production, though the role of these neoautoantigens on type 1 diabetes autoimmunity remain unclear [15].
Recently, multiple data sets of single cell RNA sequencing (scRNAseq) from human pancreatic islet cell have been produced [16][17][18][19][20][21][22][23]. Studies have found distinct subpopulations of β cells and β-like cells, with varying conclusions for differences in INS expression, confirming heterogeneity as a core feature of islet cell biology [18; 23]. Studies have also found transcriptomic differences in β cells from donors without diabetes compared to donors with type 2 diabetes [16][17][18], although these have not always been reproducible between studies [24] (likely due to the complex type 2 diabetes etiology and limited sample size). We reasoned that integrating all available scRNAseq data from human pancreatic islets into one dataset and conducting analyses with a greater sample size could resolve true differences.
In this study, we examined the effect of rs3842753 genotype, sex, and type 2 diabetes status on INS expression independently and in combination in single pancreatic β cells using an integrated dataset containing all available scRNAseq data. Our findings are consistent with the novel hypothesis that elevated β cell insulin production contributes to inherited type 1 diabetes risk by increasing β cell stress.

Data inclusion
We conducted all scRNAseq alignments and initial dataset integration analysis using the Cedar Compute Canada cloud compute resource (www.computecanada.ca). All subsequent analyses were done in RStudio version 3.6.0. We included all published pancreatic islet scRNAseq datasets obtained using Smart-seq2 or Smart-seq methods for library preparation due to their higher sequencing depth for accurate SNP genotyping. We also included the Human Pancreas Analysis Program (HPAP) data available at the time of our study [22]. We obtained the published datasets through the Gene Expression Omnibus (GEO) or the European Bioinformatics Institute (EMBL-EBI), and the HPAP datasets directly from investigators (Supplemental Table 1). All dataset metadata contain donor age, sex, and diabetes status. The read length ranges from 43bp to 100bp and median read depth per cell ranges from 0.75 to 4.4 million reads. In total, we used data from 71 donors, split into 48 donors without diabetes and 23 donors with type 2 diabetes. At the time of our analysis, single cell data was only available from 1 adult donor and 1 child donor with type 1 diabetes, precluding their use in our analysis which requires multiple donors of each genotype. We also expect that gene expression analysis of β cell from donors with type 1 diabetes would be severely confounded by the stresses associated with the disease.

Read alignment and genotyping
Single cell RNAseq reads were aligned to human reference genome GRCh38.98 using STAR version 2.7.1a [25] with gene read counts obtained using --quantMode GeneCounts. Read count files were aggregated into study specific read count matrices using a custom code. Each cell was genotyped at SNP rs3842753 (chr 11:2159830), using Samtools mpileup version 1.9 [26]. Pancreatic β cells (indicated by total read count (DP4) at rs3842753 > 100) were used for donor genotype determination. Reference allele percentage was calculated as The alternate allele percentage was calculated as Homozygosity was determined as reference or alternate allele percent > 80%, and heterozygosity otherwise. Donor genotype is determined as the majority of the individual's β cells' genotype. Less than 5% of individual β cells' genotype are non-concordant with donor genotype. Genotype non-concordant cells are included in downstream analysis and re-labeled with donor genotype.

Dataset filtering and cell type analysis
Initial filtering removed cells with abnormally high total RNA counts and genes expressed (i.e. doublet cells) and cells with low number of expressed genes and/or high mitochondrial gene percent (low viability), through visual inspection of distribution graphs for each dataset. The upper and lower limits were specific to each dataset to account for library preparation and sequencing protocol differences. The highest mitochondrial gene percentage was set to 25%.Datasets were integrated into a single dataset in Seurat version 3.1 [27] using SCTransform [28] and clustered in UMAP space using default settings. Top ten differentially expressed genes in each cluster (compared to all other cells) was used to determine cluster cell type identify. As well, the location of pancreatic hormone genes (GCG, INS, PPY and SST) expression was used to determine Alpha (α), β , PP, and delta cell clusters, respectively. Enrichr (gene enrichment analysis) [29] was used when cluster identity could not be determined using previous methods.

Gene expression analysis
ER stress marker genes expression was normalized to all genes expressed in cell according to default   Table 2).

Cell type clustering and identification
We performed cluster analysis and labeling to identify specific cell types since our dataset included a mixture of cells from pancreatic islets. We found 14 distinct clusters within 13622 pancreatic islet cells

rs3842753 C allele associates with increased ER stress markers in single β cells and β-like cells
Insulin production has been shown to sustain chronic baseline ER stress [13]. Due to the increase in INS expression observed in cells with the at-risk rs3842753 C allele compared to the protective A allele, we were tested whether rs3842753 genotype also affected markers of ER stress. Indeed, the rs3842753 C allele was significantly associated with increased ERN1 and ATF6 expression in β cells ( Figure 4A, B).
There was no association between ERN1 or ATF6 expression and rs3842753 genotype in β-like cells ( Figure 4C, D). These results indicate that rs3842753 C allele is associated with increased ER stress in 'mature' β cells.  Figure   1A). Cell type variation in the β -like cells cluster was likely the primary factor affecting INS expression variance compared to the rs3842753 genotype. In other work from our lab, we have recently found that β cells can fluctuate between these states of high and higher insulin gene expression [31]. In the current study, we are unable determine whether comparatively high or low INS expression in single cells reflects dynamic or stable states.

DISCUSSION
Sex and diabetes status play an important role in insulin secretion and insulin resistance. We found that female donors had more pancreatic β cell INS expression than males. This is consistent with previous findings that males secrete less insulin after glucose stimulation than females, though insulin secretion under basal conditions is less extensively studied [32]. proposed for other systems [33]. BMI, dietary differences, and ethnicity [1]. To address these limitations, we would need to increase the sample size, especially in younger age ranges.
We argue for an increase in studies using integrated scRNAseq datasets to take advantage of the massive amount of publicly available data. Our integrated dataset includes the most recent pancreatic islet scRNAseq data and will be available upon request. benefit greatly with increased sample size and robust single-cell protein measurements, although it should also be noted that insulin protein content does not necessarily reflect transcription or translation rates.
What is the translational significance of these molecular genetic findings? Our results confirm the association between the rs3842753 C allele and increased INS expression, first proposed using whole pancreas from a small number of donors, at the single cell level using 2315 β cells and 1223 β-like cells from 71 donors. Our findings support the emerging concept that inherited risk of type 1 diabetes at SNP rs3842753 may be associated with elevated insulin production at the mRNA level and an increase in β cell ER stress. The insulin production-driven ER stress we have previously described [13] could make β cells, especially those near their maximal insulin synthesis and folding capacity, more fragile and sensitive to external stresses. Our data in β-like cells, which have lower absolute INS expression and do not display the same signs of ER stress, offers further support for that idea. Increased insulin production may also lead to errors in insulin mRNA translation, proinsulin protein processing and insulin peptide degradation, as well as other factors that could promote neoautoantigen generation and presentation, and the provocation of an autoimmune reaction (Figure 4 C) [11; 36]. Our results do not preclude a pathological role for VNTR-associated decrease in INS expression in the thymus, which has been proposed to impair central tolerance [37], although a dominant role for thymic negative selection in type 1 diabetes has been questioned [38]. Taken all together, paradoxically elevated pancreatic insulin production from birth could make some β cell more vulnerable to stress, against the background of HLAdriven autoimmunity. This hypothesis should be testable in animal models that have insulin production is specifically reduced in β cells [39], if they were crossed to a type 1 diabetes susceptibility background