A Novel Gene ARHGAP44 for Longitudinal Changes in Glycated Hemoglobin (HbA1c) in Subjects without Type 2 Diabetes: Evidence from the Long Life Family Study (LLFS) and the Framingham Offspring Study (FOS)

Glycated hemoglobin (HbA1c) indicates average glucose levels over three months and is associated with insulin resistance and type 2 diabetes (T2D). Longitudinal changes in HbA1c (ΔHbA1c) are also associated with aging processes, cognitive performance, and mortality. We analyzed ΔHbA1c in 1,886 non-diabetic Europeans from the Long Life Family Study to uncover gene variants influencing ΔHbA1c. Using growth curve modeling adjusted for multiple covariates, we derived ΔHbA1c and conducted linkage-guided sequence analysis. Our genome-wide linkage scan identified a significant locus on 17p12. In-depth analysis of this locus revealed a variant rs56340929 (explaining 27% of the linkage peak) in the ARHGAP44 gene that was significantly associated with ΔHbA1c. RNA transcription of ARHGAP44 was associated with ΔHbA1c. The Framingham Offspring Study data further supported these findings on the gene level. Together, we found a novel gene ARHGAP44 for ΔHbA1c in family members without T2D. Follow-up studies using longitudinal omics data in large independent cohorts are warranted.


Introduction
Glycated hemoglobin (HbA1c) is used for both the diagnosis and monitoring of type 2 diabetes (T2D) and is indicative of glycemic control and long-term complication risks in T2D management (Sherwani et al., 2016). HbA1c levels have a genetic basis. More than 120 loci associated with HbA1c have been identified in individuals without T2D through genome-wide association studies (GWAS) (Chen et al., 2021). Linkage scans have also revealed significant genomic regions influencing HbA1c (Meigs et al., 2002, 2007). Some of those regions and genes have been confirmed in multi-ancestry cohorts (Sarnowski et al., 2019).
Despite the acknowledged importance of HbA1c for T2D diagnosis and management, there remains a lack of longitudinal studies focusing on long-term changes in HbA1c levels (ΔHbA1c). Most existing research focuses on short-term fluctuations and control of HbA1c in T2D patients. However, understanding long-term trends and changes in HbA1c levels, especially in populations without T2D, is crucial for developing more effective strategies for pre-diabetes diagnosis and for improving healthy aging.
A previous GWAS from the Long Life Family Study (LLFS) confirmed two known common loci at GCK and HK1 and uncovered 25 suggestive loci influencing baseline HbA1c among nondiabetic participants of the LLFS (An et al., 2014). In the present study, we conducted a GWAS of ΔHbA1c followed by a linkage-guided sequence analysis under a significant linkage peak on 17p12 using the latest available whole genome sequencing and omics data, and further pursued replication using the Framingham Offspring Study (FOS) data.

Cohort Populations
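The LLFS is a comprehensive, international, longitudinal study that spans two generations of European ancestry, focusing on longevity and the factors underlying healthy aging (Wojczynski et al., 2022). The LLFS was conducted at four field centers, three in the United States (Boston, Pittsburgh, New York) and one in Denmark. It enrolled 4,953 participants from 539 families between 2006 and 2009. Visit 1 collected data on anthropometrics, blood pressure, physical performance, pulmonary function, and various blood tests. A second visit from 2014 to 2017 replicated the initial protocols and added carotid ultrasonography measures. HbA1c levels were measured at the University of Minnesota's Advanced Research and Diagnostics Laboratory using high-performance liquid chromatography (HPLC). The measurements employed Tosoh analyzers calibrated to the National Glycohemoglobin Standardization Program's standards. The laboratory's precision for HbA1c values showed a coefficient of variation ranging from 1.4% to 1.9%. ΔHbA1c, derived by growth curve modeling using HbA1c collected from two exams seven years apart, was adjusted for age, sex, BMI, smoking, field centers and 20 principal components, and Blom-transformed to approximate normality prior to genetic testing. Subjects with a clinical diagnosis of T2D or T2D treatment, and undiagnosed T2D cases with fasting glucose ≥ 126 mg/dl or HbA1c ≥ 6.5%, were excluded from this analysis.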
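The exclusion rule and the Blom transformation described above can be sketched as follows. This is a minimal illustration rather than the study's actual pipeline: the function names and the input values are hypothetical, the covariate-adjusted residuals are assumed to be already available, and the Blom formula used is the standard rank-based one, z = Φ⁻¹((r − 3/8) / (n + 1/4)).

```python
from statistics import NormalDist


def is_excluded(diagnosed_t2d, on_t2d_treatment, fasting_glucose_mg_dl, hba1c_percent):
    """Exclusion rule from the text: diagnosed or treated T2D, or undiagnosed
    T2D defined by fasting glucose >= 126 mg/dl or HbA1c >= 6.5%."""
    return (diagnosed_t2d or on_t2d_treatment
            or fasting_glucose_mg_dl >= 126 or hba1c_percent >= 6.5)


def blom_transform(values):
    """Rank-based inverse normal (Blom) transform.
    Tied values receive the average of their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]


# Hypothetical adjusted ΔHbA1c residuals for five participants.
residuals = [0.3, -0.1, 0.0, 0.9, 0.2]
z_scores = blom_transform(residuals)
```

The transform depends only on ranks, so outlying residuals (such as 0.9 above) are pulled toward the tails of a standard normal distribution instead of dominating the downstream linkage and association tests.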

Sequencing Data Preparation
Genotyping was performed on participants using the Illumina Human Omni 2.5 v1 chip by the Center for Inherited Disease Research (CIDR), yielding 1,421,289 SNPs after quality control, which removed SNPs with call rate < 98%, minor allele frequency (MAF) < 1%, or Hardy-Weinberg equilibrium p < 1e−6, and verified correct correspondence with the 1000 Genomes Project.
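The three numeric SNP filters above can be illustrated with a short sketch. It assumes genotype counts per SNP are available and uses a 1-df chi-square test for Hardy-Weinberg equilibrium; the text does not state which HWE test was used, so that choice, like the function names, is an assumption. The check against the 1000 Genomes Project is not modeled here.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no departure can be tested
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_snp_qc(n_aa, n_ab, n_bb, n_missing):
    """Keep a SNP only if call rate >= 98%, MAF >= 1%, and HWE p >= 1e-6."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1 - freq_a)
    return (call_rate >= 0.98 and maf >= 0.01
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= 1e-6)
```

For example, `passes_snp_qc(810, 180, 10, 0)` keeps a SNP whose genotype counts match Hardy-Weinberg proportions exactly, while `passes_snp_qc(500, 0, 500, 0)` rejects a SNP with no heterozygotes.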
Whole genome sequencing (WGS) was executed using Illumina platforms at the McDonnell Genome Institute (MGI), Washington University, with reads aligned to GRCh38. Variant calling followed a four-step process using GATK tools, with additional QC to eliminate contaminated samples and those with unsuitable coverage or high Mendelian errors.
The visit 1 RNA sequencing was performed on RNA extracted from PAXgene™ Blood RNA tubes using the PAXgene microRNA extraction kit. Library preparation and quality control were managed by the Division of Computation & Data Sciences, Washington University. The nf-core/rnaseq 3.14.0 pipeline was used for read alignment, duplication marking, and transcript quantification. Post-processing included filtering out samples with a high fraction of reads mapping to intergenic regions, filtering out genes with very low expression levels, and normalizing gene counts with the variance stabilizing transformation in DESeq2, leading to a final selection of 1,810 samples and 16,418 genes.
Lipid metabolomic profiling at visit 1 was executed at Washington University's Biomedical Mass Spectrometry Lab. The laboratory implemented LC/MS for untargeted lipid detection, matched polar metabolites with internal and online databases, and annotated MS/MS lipid data. After rigorous QC and batch effect correction using a pooled QC sample, the analysis yielded data on 188 lipids across 13 compound classes.

Framingham Heart/Offspring Study for Replication
The Framingham Heart/Offspring Study (FHS/FOS) is a longitudinal cohort study tracking three generations for up to 65 years to assess cardiovascular disease risk factors. HbA1c measurements were taken from exam 5 and exam 7, selected because their spacing is similar to the LLFS visit interval. Longitudinal HbA1c change was derived using the same growth curve modeling, adjusted for age and sex. FHS genomic research, including whole genome sequencing (WGS) and related phenotypic analysis, is incorporated into the NHLBI's Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Program. WGS (freeze 9b) data from the Framingham Offspring Generation were used for replication. Bi-allelic single nucleotide variants (SNVs) that passed all QC filters were kept, resulting in ~52 million variants. Samples were excluded if they were sequence controls, were not sequenced in blood, or had a FREEMIX percentage > 3%. Additionally, samples were removed if they had a mean depth of < 30x, or < 95% of sites covered at 10x, or < 80% at 20x. A total of 2,186 participants, derived from the offspring cohort's sequencing data, were queried for replication.
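The sample-level exclusions listed above amount to a simple filter, sketched below. The dictionary keys are hypothetical names chosen for this illustration and do not correspond to actual TOPMed field names.

```python
def passes_sample_qc(sample):
    """Apply the sample exclusions described in the text.
    `sample` is a dict with hypothetical keys (not actual TOPMed field names)."""
    if sample["is_sequence_control"] or not sample["sequenced_in_blood"]:
        return False
    if sample["freemix_pct"] > 3.0:        # contamination estimate above 3%
        return False
    if sample["mean_depth"] < 30.0:        # mean sequencing depth below 30x
        return False
    if sample["pct_sites_10x"] < 95.0:     # fewer than 95% of sites at >= 10x
        return False
    if sample["pct_sites_20x"] < 80.0:     # fewer than 80% of sites at >= 20x
        return False
    return True


example_sample = {
    "is_sequence_control": False,
    "sequenced_in_blood": True,
    "freemix_pct": 0.4,
    "mean_depth": 38.2,
    "pct_sites_10x": 98.7,
    "pct_sites_20x": 91.5,
}
keep = passes_sample_qc(example_sample)
```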

Statistical Analysis
The Sequential Oligogenic Linkage Analysis Routines (SOLAR) program is designed to accommodate familial relatedness by employing maximum-likelihood based methods. These methods are used to estimate the residual genetic heritability of outcome measures, as well as the variance attributable to fixed covariate effects (Almasy & Blangero, 1998). We used SOLAR to identify linked families: families with PEDLOD > 0.1 or PEDLOD/N > 0.01, where N is the number of family members included in the pedigree trait count, were defined as the "potentially linked" families. We then ranked the families by LOD, and the minimum number of families with the highest LODs that add up to at least 9 were defined as the "top linked" families. We performed GWAS analyses using a linear mixed model for additive dosage of the variants. Familial relationships were accounted for by including a kinship matrix, estimated via the "kinship" R package, as a random effect in the "lmekin" R function. LODs > 3 for the genome-wide linkage scan (GWLS) and p < 5e-8 for GWAS association were used to declare significance. Associations between phenotype and RNASeq or phenotype and metabolites were analyzed using similar linear models adjusted for age, sex, and field centers.
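The two-stage family selection described above can be sketched as follows, assuming per-family LOD contributions (PEDLOD) and member counts have already been exported from SOLAR. The function name, the tuple layout, and the example values are hypothetical.

```python
def select_linked_families(families, target_lod=9.0):
    """families: list of (family_id, pedlod, n_members) tuples.

    Returns the "potentially linked" families, the "top linked" families
    (smallest set of highest-LOD families whose LODs sum to at least
    target_lod), and the cumulative LOD of the top linked set.
    """
    potentially_linked = [
        fam for fam in families
        if fam[1] > 0.1 or (fam[2] > 0 and fam[1] / fam[2] > 0.01)
    ]
    ranked = sorted(potentially_linked, key=lambda fam: fam[1], reverse=True)
    top_linked, cumulative = [], 0.0
    for fam in ranked:
        if cumulative >= target_lod:
            break
        top_linked.append(fam)
        cumulative += fam[1]
    return potentially_linked, top_linked, cumulative


# Hypothetical per-family values for illustration only.
example_families = [
    ("F1", 4.0, 20), ("F2", 3.5, 15), ("F3", 2.0, 10),
    ("F4", 0.5, 8), ("F5", 0.05, 10), ("F6", 0.08, 4),
]
potential, top, total_lod = select_linked_families(example_families)
```

Adding families in descending LOD order guarantees that the selected set is the smallest one reaching the target. If the potentially linked families never reach the target, this sketch returns all of them.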

Demographic Characteristics
This analysis included a total of 1,886 family members (826 men and 1,060 women) from the LLFS with complete phenotypic data at both visits and genotypic information (Table 1). Similar inclusions and exclusions were applied in the replication cohort, and a total of 1,739 participants (752 men and 987 women) from the FOS with complete phenotypic data at both visits and genotypic information were selected for replication (Table 1). Detailed characteristics of the LLFS linkage-enriched group and others are also given in Table 2. We identified 176 subjects from 16 linkage-enriched families ("top linked" families with cumulative LODs over 9) in the LLFS. Significant mean differences in characteristics were noted both between the study group and the others, as well as within each group across sexes. Additionally, the two cohorts differ significantly in terms of HbA1c changes; the LLFS shows minimal change in HbA1c levels, whereas the FOS exhibits a more substantial change (Table 1).
Taking advantage of our available transcriptome sequencing (RNAseq) data, we assessed the association between the quantified ARHGAP44 RNA transcript and ΔHbA1c among the 176 subjects and found that they were significantly associated (β = -0.0002, SE = 0.002, p = 0.02). We further explored the association between the ARHGAP44 RNA transcript and its corresponding SNP rs56340929 and found that it narrowly missed nominal significance (β = -0.21, SE = 0.16, p = 0.07). We assessed currently available metabolomics data (188 metabolites in 13 compound classes) among the 16 linkage-enriched families. We found that triacylglycerol (p = 0.024) and sphingomyelin (p = 0.036) appeared to be marginally associated with ΔHbA1c. These associations, however, were non-significant after correction for multiple testing (threshold p < 3e-4 for metabolites or p < 0.004 for compound classes). We also assessed associations of rs56340929 with the currently available lipidomic data and found no significant associations after correction for multiplicity. At least in this analysis, these results do not suggest that the ARHGAP44 gene is directly engaged in regulating lipid metabolism.
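The multiple-testing thresholds quoted above are consistent with a Bonferroni correction at α = 0.05 over 188 metabolites and 13 compound classes. The text does not name the correction method, so the sketch below is an assumption that simply reproduces the arithmetic.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


metabolite_threshold = bonferroni_threshold(188)  # 0.05 / 188, about 2.7e-4
class_threshold = bonferroni_threshold(13)        # 0.05 / 13, about 3.8e-3

# Class-level p-values reported in the text for ΔHbA1c.
reported = {"triacylglycerol": 0.024, "sphingomyelin": 0.036}
significant = {name: p < class_threshold for name, p in reported.items()}
```

Both reported p-values are below 0.05 but well above the class-level threshold, which matches the statement that neither association survives correction.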

Replication in the FOS
A total of 1,739 non-diabetic subjects from the FOS with complete phenotypic data at both visits and genotypic information were used for replication. Another SNP, rs140270267 (830 bp downstream of rs56340929, p = 0.0002, MAF = 2%), was identified. ARHGAP44-rs56340929 was therefore not replicated at the exact SNP site, but only at a nearby site (rs140270267, D' = 1, r² = 0.002, not in linkage disequilibrium by r²) or at the ARHGAP44 gene level (simpleM, Neff = 148, p = 0.00029 < 0.00034).
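Two numbers in the paragraph above can be made concrete with a small sketch. First, the gene-level threshold of 0.00034 is consistent with 0.05 divided by the effective number of tests (Neff = 148). Second, D' = 1 can coexist with a very small r² when two rare alleles never occur on the same haplotype. The haplotype frequency used below is hypothetical and the two allele frequencies are taken from different cohorts, so the result only illustrates the pattern and does not reproduce the reported r².

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic sites.

    p_ab: frequency of the haplotype carrying both minor alleles
    p_a, p_b: minor allele frequencies at the two sites
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Gene-level threshold implied by the effective number of independent tests.
gene_level_threshold = 0.05 / 148
gene_level_significant = 0.00029 < gene_level_threshold

# Hypothetical case: minor alleles (6% and 2%) never share a haplotype.
d_prime, r2 = ld_stats(p_ab=0.0, p_a=0.06, p_b=0.02)
```

In this hypothetical case D' equals 1 while r² stays far below 0.01, which is why a neighboring variant with D' = 1 cannot serve as a proxy for the lead SNP.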

DISCUSSION
This analysis found an appreciable and significant genetic component (heritability of 37%) for ΔHbA1c among non-diabetic subjects from the LLFS. The heritability appears to be compatible with or slightly lower than our previously reported estimate of 42% in the LLFS (An et al., 2014) and estimates of 47-59% in other family studies (Meigs et al., 2002; Pilia et al., 2006; Soranzo, 2011) for HbA1c at baseline. We are not aware of other reports of familial aggregation of ΔHbA1c.
Interestingly, this analysis identified a significant linkage peak on 17p12 (LOD = 3.6) for ΔHbA1c. Several studies found suggestive linkage evidence in this region for relevant traits including fasting glucose (Loos et al., 2003), circulating leptin (Kissebah et al., 2000), T2D (Lindgren et al., 2002), coronary artery disease (Gao et al., 2014), and metabolic syndrome (Kissebah et al., 2000; Zhang et al., 2013). While no previous linkage scans were found for ΔHbA1c, Meigs and colleagues reported a suggestive linkage on chromosome 1 for baseline HbA1c in the FHS (Meigs et al., 2002). In this analysis, we also found that our linkage peak on 17p12 was substantially attenuated (LOD from 3.6 to 1.0) when ΔHbA1c was corrected for baseline HbA1c, whereas it changed only modestly when ΔHbA1c was corrected for baseline fasting glucose levels (LOD from 3.6 to 3.4) or baseline hemoglobin levels (LOD from 3.6 to 3.8). This observation suggests that the linkage peak for ΔHbA1c may be partially conditional on baseline HbA1c levels, but it does not clearly distinguish between glycemic and non-glycemic (erythrocytic) pathways.
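To give a sense of scale for the LOD scores discussed above, a LOD can be converted to an approximate pointwise p-value. The sketch assumes the usual asymptotic result for variance-components linkage, in which 2·ln(10)·LOD follows a 50:50 mixture of a point mass at zero and a chi-square distribution with 1 degree of freedom. This is an illustration only and not a calculation reported in the study.

```python
import math


def lod_to_pvalue(lod):
    """Approximate pointwise p-value for a variance-components LOD score,
    assuming the 50:50 mixture of chi-square(0) and chi-square(1) under the null."""
    if lod <= 0:
        return 1.0
    chi2 = 2 * math.log(10) * lod
    # Half the survival function of a chi-square variable with 1 df.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2))


p_unadjusted = lod_to_pvalue(3.6)  # peak before adjusting for baseline HbA1c
p_adjusted = lod_to_pvalue(1.0)    # peak after adjusting for baseline HbA1c
```

Under this approximation, the drop from LOD 3.6 to LOD 1.0 corresponds to a change in the pointwise p-value of roughly three orders of magnitude.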
The only statistically significant discovery from our linkage-guided sequence analysis was ARHGAP44-rs56340929 (p = 2e-6, MAF = 6%) for ΔHbA1c. We assessed all sequence elements under the linkage peak on 17p12 (within the 1-LOD support interval from 12.7 Mb to 12.9 Mb) and found that this lead variant accounted for nearly 30% of the linkage peak. Our RNAseq data showed that the ARHGAP44 expression level narrowly missed a nominally significant association with the variant (p = 0.07) but was significantly associated with ΔHbA1c (p = 0.02). Our lipidomics data did not reveal any metabolites that were significantly associated (p < 3e-4 after multiple testing correction) with ΔHbA1c or with the variant. This observation suggests that the ARHGAP44 gene variant does not directly regulate lipid metabolism. ARHGAP44 encodes Rho GTPase activating protein 44, which is involved in the control of Rho-type GTPases (Xu et al., 2017). The ARHGAP44 gene has also been reported to be associated with cardiovascular diseases, serum creatinine and glycemic traits including HbA1c (Dornbos et al., 2022). Finally, we looked up ARHGAP44-rs56340929 in the FOS and found encouraging validation evidence at a neighboring variant (rs140270267, 830 bp downstream, p = 0.0002, r² = 0.002) and at the gene level (simpleM, p < 0.0003, Neff = 148, after correction for multiple testing), though not at the exact SNP site (p = 0.8).
Strengths of the current study include its extended family design, relatively large sample size, longitudinal data with multiple visits, well-defined phenotypic measures, and the availability of multi-omics data. However, a few limitations should be noted. Sample heterogeneity, and thus genetic heterogeneity, may exist across the LLFS and FOS. This was evidenced by significant mean differences in HbA1c levels and key covariates between the two studies (see Table 1). The exceptional longevity of the LLFS sample compared to the general population may introduce selection bias that potentially contributed to data heterogeneity. Additionally, incomplete access to the FOS data, with 12 covariate variables missing, may impede exact replication of the phenotype adjustment methods, which may further diminish the validity of the replication.
In conclusion, this integrated linkage-guided sequence analysis allowed us to identify a novel gene, ARHGAP44, for ΔHbA1c in the LLFS, with supportive evidence from our omics data as well as encouraging replication data from the FOS. Further independent replications in large cohorts are needed to confirm and extend our findings.

Figure 1A. An overlay of the Manhattan plot from the GWAS and GWLS results.

Figure 1B. Highlights of the identified lead SNP rs56340929 under the right linkage peak (17p12)

Table 1. Sample characteristics of the LLFS and FOS cohorts. *Absolute mean change