Inflammatory and JAK-STAT Pathways as Shared Molecular Targets for ANCA-Associated Vasculitis and Nephrotic Syndrome

Background Glomerular diseases of the kidney are presently differentiated, diagnosed and treated according to conventional clinical or structural features. While etiologically diverse, these diseases share common clinical features including but not limited to reduced glomerular filtration rate, increased serum creatinine and proteinuria suggesting shared pathogenic mechanisms across diseases. Renal biopsies from patients with nephrotic syndrome (NS) or ANCA-associated vasculitis (AAV) were evaluated for molecular signals cutting across conventional disease categories as candidates for therapeutic targets. Methods Renal biopsies were obtained from patients with NS (minimal change disease, focal segmental glomerulosclerosis, or membranous nephropathy) (n=187) or AAV (granulomatosis with polyangiitis or microscopic polyangiitis) (n=80) from the Nephrotic Syndrome Study Network (NEPTUNE) and the European Renal cDNA Bank. Transcriptional profiles were assessed for shared disease mechanisms. Results In the discovery cohort, 10–25% transcripts were differentially regulated versus healthy controls in both NS and AAV, >500 transcripts were shared across diseases. The majority of shared transcripts (60–77%) were validated in independent samples. Therapeutically targetable networks were identified, including inflammatory JAK-STAT signaling. STAT1 eQTLs were identified and STAT1 expression associated with GFR-based outcome. A transcriptional STAT1 activity score was generated from STAT1-regulated target genes which correlated with CXCL10 (p<0.001), a JAK-STAT biomarker, predictors of CKD progression, interstitial fibrosis (r=0.41, p<0.001), and urinary EGF (r=-0.51, p<0.001). Conclusion AAV and NS caused from histopathologically distinct disease categories share common intra-renal molecular pathways cutting across conventional disease classifications. This approach provides a starting point for de novo drug development, and repurposing efforts in rare kidney diseases.


Introduction
Rare diseases represent a unique challenge in both patient management and drug development. Nearly 30 million Americans, or approximately 10% of the population, are afflicted by at least one of over 7,000 rare diseases 1,2 . Because the patient population for any single rare disease cause can be quite small, understanding the underlying mechanisms of each disease and thus, the opportunity to develop and apply effective treatments is limited. As a consequence, the majority of rare diseases have no approved therapies 1,3 . With the advance of molecular profiling of tissue samples obtained from organ systems damaged by the rare disease, a data-driven understanding of the molecular basis across these diseases is in reach. Understanding shared molecular pathways activated during tissue damage can help broadly define mechanisms in distinct diseases that may be leveraged for drug discovery and repurposing across rare diseases 4 . By selectively targeting specific mechanisms of rare diseases responsible for chronic progressive end-organ dysfunction and damage, it may be possible to preserve organ function even when the disease initiating process is not directly impacted.
Nephrotic syndrome (NS) refers to a group of symptoms resulting from etiologically distinct diseases including focal segmental glomerulosclerosis (FSGS), minimal change disease (MCD), and membranous nephropathy (MN). To address the opportunity for drug repurposing and target identification in rare renal diseases, histologically distinct rare diseases (primary nephrotic syndrome (NS) and ANCA-associated vasculitis (AAV, granulomatosis with polyangiitis (GPA) and microscopic polyangiitis (MPA)) that each can result in end-organ kidney damage, were evaluated to identify shared molecular targets. Based on previous work that identified shared pathways across many distinct histopathologically defined chronic kidney diseases including primary forms of NS: FSGS, MCD and MN, as well as diabetic kidney disease and lupus nephritis 5 , the hypothesis tested was that transcriptional profiling in etiologically distinct rare diseases would enable the identification of shared disease mechanisms. Using primary NS (FSGS, MCD, and MN) and AAV as exemplar disease indications, we sought to test this hypothesis. These profiles were subsequently interrogated to identify shared activated pathways, with a focus on inflammation and fibrosis, and subsequently linked to clinical features and noninvasive biomarkers of pathway activation ( Figure 1). Activated pathways were further evaluated to discover opportunities for drug repurposing, and identification of novel drug targets and/or biomarkers of kidney disease ( Figure 1).

Results
Previous studies have indicated shared networks in FSGS, MCD, and MN 5 ; and current studies from our group have demonstrated a continuum of expression between FSGS and MCD (unpublished data), we first sought to confirm common transcriptional programs in these primary forms of NS using data reduction and data-driven approach to assess transcriptional profiles from subjects with FSGS, MCD, or MN. Using transcriptional data generated from subjects with FSGS, MCD, and MN from ERCB, there were no distinct disease-specific clusters when a principal component analysis was performed on both the glomeruli or tubulointerstitium compartments (using data deposited in GEO under GSE104948 and GSE104954, Supplemental Figure 1a and 1b). To assess this statistically, eigen vectors for the first three principal components were assessed for differences across histopathologically classified diseases and displayed no evidence of significant differeces (Supplemental Figure 1c-e), thus FSGS, MCD, and MN were considered as a group of primary NS for comparison with AAV. To identify shared molecular mechanisms in NS and AAV (Figure 1), intrarenal transcripts in NS and AAV were evaluated in a discovery set from the ERCB. Affymetrix GeneChip-based steady state transcript expression from the kidney was separately derived from the glomeruli and tubulointerstitial compartment of kidney biopsies from 62 patients with NS and 23 patients with AAV from the ERCB (Table 1, Discovery cohort) and compared to intrarenal transcript profiles generated from 22 healthy living transplant donor biopsies (LD). In the tubulointerstitium, 1768 of the total of 11,666 expressed transcripts assessed in samples from patients with AAV and 565 genes in samples from patients with NS were differentially expressed (│FC│>1.3, q-value<0.05) versus LD. Of the 565 differentially expressed transcripts in the tubulointerstitium from subjects with NS, 90% (511/565) were also differentially expressed in the tubulointerstitium from subjects with AAV ( Figure 2a). Importantly, the direction of expression was conserved across all transcripts in the tubulointerstitium, with a heatmap of the differentially expressed genes in the discovery cohort shown in Figure 2b. The top 10 differentially expressed transcripts in the tubulointerstitium from the discovery cohort are shown in Table 2, while a full list of differentially expressed transcripts is provided in Supplemental Table   1. In the glomeruli, 2322 genes were differentially expressed in AAV relative to LD, and 1059 in NS versus LD biopsies. A majority of differentially expressed transcripts in the glomeruli from patients with NS, (88% or 930/1059) were also differentially expressed in the glomeruli from patients with AAV, and nearly all (99.8% or 928/930) of shared transcripts were regulated in the same direction across diseases (Supplemental Figure 2, differentially expressed transcripts in the glomerular compartment are provided in Supplemental Table 2).
These results are consistent with a common shared intra-renal transcriptional response across kidney diseases, independent of histolopathological diagnosis.
Hierarchical clustering was then performed on the differentially expressed genes in the discovery cohort to explore sample and cluster level similarities or differences in expression profiles from patients with nephrotic syndrome and ANCA-associated vasculitis. Using the tubulointerstitial data, hierarchical clustering identified four clusters (Figure 3a). The first cluster contained all of the healthy LD samples, as well as a broad spectrum of samples across kidney diseases (MCD, FSGS, AAV), while clusters 2, 3, and 4 contained samples from all diseases, but no LD (Figure 3b). The association of transcript clusters with phenotypes was evaluated by assessing eGFR across the clusters (Figure 3c). Within each histological diagnosis, significant eGFR differences were observed across clusters (ANOVA, p<0.05) (Figure 3c). The majority of transcriptional profiles from MCD patient samples were found in clusters 1 and 2 and these were from patients with MCD that had the highest mean eGFR (110±20 and 82±35 in clusters 1 and 2, respectively). Conversely, a majority of profiles from AAV patient samples were found in clusters 3 and 4 (Figure 3b), and were from patients with the lowest eGFR across AAV (14±10 and 15±10 for clusters 3 and 4 respectively). Samples from patients within each histological diagnosis, had significantly lower eGFR in cluster 4 (p<0.001), and increased transcripts in this cluster of patients were negatively correlated with log2 eGFR across diseases. For example, the top differentially regulated genes in cluster 4 -CXCL6, C3 and SERPINA3 showed a negative GFR correlation (r≤-0.4, p<0.001 for all three genes, Figure 3d) linking an inflammatory disease state to impaired renal function independent of underlying histological diagnosis.
To validate shared transcripts in the tubulointerstitium and glomeruli, we profiled an independent set of samples for AAV from ERCB and an independent set of samples for nephrotic syndrome from NEPTUNE on Affymetrix ST2.1 gene chips (Table 1, Validation cohort), assessing the 511 shared transcripts identified in the tubulointerstitium and the 928 shared transcripts in the glomeruli from the discovery set (Table 3). In the tubulointerstitium, 508 of 511 genes identified in the discovery cohort were expressed above background expression levels in the validation cohort and could be assessed for differential expression, while all 928 differentially expressed genes in the glomeruli were available for assessment in the validation cohort. Table 3 provides summary level information and individual genes validated in both the tubulointerstitium and glomeruli; full results are provided in Supplemental Tables 3 and 4 for each compartment, respectively. Thus, the vast majority of transcripts identified as differentially regulated and shared between diseases in the discovery phase were validated in a test dataset of independent samples.
The complement gene, C3 was among the highest up-regulated transcript in both tubulointerstitial and glomerular samples from patients with NS and AAV in the discovery and validation cohorts (Figure 4a-d). As complement activation has been implicated in AAV 6 , the status of the pathway in AAV and NS was investigated further. Multiple genes in the complement pathway showed tissue level transcript up-regulation in both NS and AAV in the TI (Figure 4e). The complement system pathway was one of the most significantly increased canonical pathways as tested by a panel of enrichment strategies (Table 4).
While individual gene expression profiles can be identified and mapped as drug targets, identifying the upstream mechanisms influencing downstream gene expression is likely a more rigorous strategy to identify functionally active, and potentially targetable signaling networks [7][8][9] . To derive the activation state of targetable pathways from the transcriptional profiles, two approaches were taken. The first strategy assessed whether any of the validated differentially expressed transcripts (396 shared transcripts with q-value<0.1 in the tubulointerstitium) could be explained by predicted upstream regulators. This was paired with an independent approach to match the shared transcriptional profile with those of cell lines exposed to various perturbagens targeting known pathways (small molecules, over-expressing constructs, and siRNA knockdown).
Upstream regulators are molecules (including but not limited to: transcription factors, growth factors, kinases, chemicals) predicted to affect activity of downstream expression profiles. Activities are predicted using literature derived cause and effect relationships, e.g. up-regulation of a transcription factor target gene supports a predicted activation in the transcription factor, while down-regulation supports inhibition [7][8][9] . The top biological networks from upstream regulator analysis in the TI are shown in Table 5 (an expanded list of results is provided in Supplemental Table 5; glomerular results are found in Supplemental Table 6). Predicted activities of upstream regulators are based on directionality of underlying gene expression that is similar to (predicted activation) or inverse (predicted inhibition) of the uploaded differential expression signature. The top enriched upstream regulators were indicative of inflammatory pathway activation, and included predicted activation of cytokines IFNγ, TNF and predicted inhibition of the glucocorticoid receptor (NR3C1). A mechanistic network of interconnected upstream regulators was discovered (upstream regulators that can explain multiple downstream transcripts as part of a network) (Figure 5a), and explains 56% of the downstream gene expression changes in the shared, tubulointerstitial data set ((222/396)).
To further define functional associations, the 396 gene shared transcriptional profile from the tubulointerstitium was used to query Connectivity Map (CMap) L1000 profiles 10 . The resulting CMap output contains a ranking of experimentally generated transcriptional profiles generated in cell lines that are inverse (potential therapeutic) or similar (potentially exacerbating disease) to the shared transcriptional profile. This provided an additional, independent, data-driven approach to identify signaling networks and potential drug targets active in patients with NS and AAV. For these analyses, CMap profiles were limited to those with absolute profile scores >50 (13% of CMap profiles met this criteria). Looking at compound profiles that are inversely related to the shared signature, profiles of glucocorticoid agonists were the most frequent ( Figure   5b). Calcium channel blockers were also identified. IKK inhibitors, which block NFκB activation were also identified as inversely related to the shared signature. Activation of NFκB and RELA (p65 subunit of NFκB) were also predicted as part of the shared transcriptional profile and are present in the upstream regulator network ( Figure 5a). Interestingly, a number of profiles from ATPase inhibitors, HDAC inhibitors, protein synthesis inhibitors, and proteasome inhibitors were also identified. Over-expression (OE) experiments (Supplemental Table 7) with the highest CMap profile scores indicating a signature similar to the shared molecular profile included HLA-DRA (99.91), consistent with an inflammatory signal and PSMB10 (95.43), which would suggest elevated proteasome signaling shared across diseases.
To gain further insight into key signaling networks shared across diseases, upstream regulators with predicted activities from IPA were compared with CMap profiles to test for consensus between these two independent modeling approaches. The top 15 unique TI networks are shown in Table 6, (multiple CMap profiles for a compound are often observed due to profiles being generated for different treatment concentrations or duration) with 67% (10/15) showing consensus using independent approaches. These included networks representing lower glucocorticoid signaling (dexamethasone and fluticasone), reduced transcription factor activities in the TI of FOXA1 and HNF4A, and activation of CD44, STAT1, and SOX2 (Table 6). In CMap profiles, experimental knock-down of STAT1 resulted in an expression profile that was the inverse of the shared disease profile, and was among the top 5% of profiles showing this inverse relationship (Figure 5c). In the glomeruli, 18 unique nodes were identified (Supplemental Table 8) and 61% (11/18) were consistent across the two analyses. These included nodes representing a lowered glucocorticoid signal (dexamethasone), reduced transcription factor activities of FOXA1, HNF1A and HNF4A, and activation of TLR7, IL2, CCND1, PLAUR, IFNGR1, and NR0B2. These pathways are consistent with inflammatory activation and decreased epithelial differentiation states common among NS and AAV.
Consensus findings from IPA and CMap profiles were then investigated for the potential underlying genetic components that might contributed to an alteration in disease-associated expression and activity. TI and glomerular-specific cis-eQTLs were generated across the NEPTUNE cohort (www.nephqtl.org) 11 ; cis-eQTLs were fine mapped using the Bayesian Deterministic Approximation of Posteriors 12 , which identifies clusters of variants predicted to drive eQTL associations. STAT1, CD44, SOX2, FOXA1, ATP2A1, ATP2A2, ATP2A3 (the ATP2A family members encode sarcoplasmic reticulum Ca 2+ ATPase and are targets of thapsigargin), IPMK, HNF4A, MAPK1, and NR3C1 (translated protein is the glucocorticoid receptor, a target of prednisone and dexamethasone) were assessed in TI-specific eQTL analyses, and TLR7, IL2, NR0B2, PLAUR, IFNGR1, FLT3 (the translated protein is a target of sunitinib) FOXA1, HNF4A, HNF1A, and NR3C1 were assessed in glomerularspecific multi-SNP eQTL analyses. Analyses of eQTLs for these genes in the TI and glomeruli only identified STAT1 with a gene level FDR<0.05 (Supplemental Table 9. As variants associated with STAT1 expression were non-coding, expression of STAT1 was investigated for associations with outcome across the NEPTUNE cohort. Samples were binned in tertiles according to STAT1 expression. In an unadjusted model, elevated STAT1 expression was significantly associated with faster progression to a composite eGFR endpoint (time to ESRD or 40% loss of eGFR) (log rank p-value<0.01) ( Figure 6).
Because expression of a single gene is not necessarily indicative of network or pathway activity, a STAT1 score was generated, given that a similar approach yielded a JAK-STAT activation score that was associated with increased STAT1 and STAT3 phosphorylation in FSGS 13 . Using a data driven approach, a kidney-specific STAT1 network comprised of 17 genes was identified 14 , and further refined for genes functionally up-regulated by IFNγ stimulation by utilizing the Database of IFN Regulated Genes, Interferome 15 . Transcripts were considered validated if they showed at least 2-fold increase by IFNγ stimulation in at least 5 datasets. Next, the genes were assessed for the presence of STAT1 or STAT family binding sites, bringing the STAT1 activated gene set to 14 genes (detailed in Supplemental Table 10). The STAT1 activation score was strongly correlated with STAT1 expression in the discovery (r=0.95, p<0.001, n=68) and validation cohorts (r=0.89, p<0.001) (data not shown).
IP-10 (CXCL10) has been used as a target engagement and renal inflammation biomarker in a clinical trial assessing the effectiveness of JAK inhibition in patients with diabetic kidney disease treated with baricitinib 16 , but was not used to generate the STAT1 activation score. The STAT1 activation score strongly correlated with CXCL10 mRNA levels in both the discovery (Figure 7a, r=0.81, p<0.001, n=68) and validation (Figure 7b, r=0.58,   p<0.001, n=166) AAV and NS datasets, as it did in DKD in confirmation of the clinical trial observation 17,18 (Supplemental Figure 3). Because STAT1 activation can be considered a reflection of the intra-renal inflammatory state and has been reported to drive tissue fibrosis 19 , the interaction of the STAT1 activation score with interstitial fibrosis was assessed. In NEPTUNE NS patients, the STAT1 activation score was correlated with interstitial fibrosis (Figure 7c, r=0.41, p<0.001, n=95). Similar results were found associating STAT1 activation score with tubular atrophy (Supplemental Figure 4). Intrarenal epidermal growth factor (EGF) mRNA levels and urinary EGF to creatinine ratios (uEGF) have been identified as markers of tubular differentiation and regeneration. Indeed, renal tissue EGF mRNA and urinary EGF protein have been reported to negatively correlate with interstitial fibrosis and tubular atrophy, and can predict progression of CKD 17 . The

Discussion
Parenchymal organs require high degrees of differentiation to perform their specialized functions. This differentiation, however, can limit cellular programs that are activated in response to a diverse set of injury mechanisms. Tissues require, at least in part, dedifferentiation to carry out cellular damage responses [20][21][22] .
Across many epithelial organs a uniform response consisting of loss of epithelial function, innate immune response activation, and activation of chronic fibrotic responses can be observed [23][24][25][26] . Through the evaluation of shared tissue injury mechanisms across diseases, pathways associated with progressive loss of organ function may be used to identify disease-modifying pathways and networks as therapeutic targets capable of impacting patient care in multiple rare diseases. In the setting of the NIH Rare Clinical Disease Research Network this hypothesis was tested by mapping the transcriptional profiles of different diseases that cause kidney damage.
Consistent with previous findings of shared kidney-specific transcriptional mechanisms among patients with chronic kidney disease from various causes, in particular a strong transcriptional overlap in found in patients with diabetic nephropathy, lupus nephritis, and IgA nephropathy 5 , an analysis of samples from patients with diverse causes of primary NS or AAV yielded a large number of shared transcripts and mechanisms that were shared across conventional histological disease classifications. The majority of transcripts differentially expressed in NS (90%) were also found to be differentially expressed in AAV, were directionally consistent, and were validated in an independent set of patient samples. Of note, multiple genes in the classical and alternative complement pathway were among the shared altered transcripts in AAV and NS, in both the TI and glomeruli. Given the role that the alternative complement pathway plays in the kidney of patients with AAV 6 , Phase II trials suggesting the efficacy of alternate complement activation in patients with AAV through inhibition of C5a 27 , and the heterogeneous C3 mRNA expression upstream of the alternative pathway, expanded targeting of complement activation in patients with other kidney diseases may be warranted.
Using the validated gene set for further analyses, a convergence of molecular mechanisms was identified including activation of inflammatory pathways and suppression or inhibition of differentiation related pathways. For the inflammatory-related pathways, evidence for their potential causal role in autoimmune or renal disease has been established, including TNF 28-30 , JAK-STAT 16 , NFκB 31 , proteasome signaling 32,33 , and reactivation of glucocorticoid signaling through the use of glucocorticoid agonists 34 . JAK-STAT pathway activation has already been implicated across other kidney diseases including diabetic kidney disease (DKD) [35][36][37][38] , and FSGS 13 . In addition to the inflammatory-related pathways, reduced HNF4 transcriptional activity could represent a state of de-differentiation as Hnf4a can induce mesenchymal to epithelial transition 39 . Hnf4a coexpressed with Emx2, Hnf1b, and Pax8 has been shown to induce tubular epithelial cell differentiation 40 .
Furthermore, HNF4A was identified as a hub protein in a protein-protein interaction network of CKDGen genes with eGFR-associated expression, suggesting that reactivation of the HNF4A transcriptional program could have broad implications across CKD 5 .
Expertly curated causal inference associations, pathway enrichment, and experimentally-derived computational approaches to identify perturbations associated with a shared transcriptional profile in NS and AAV all converge on transcriptional mechanisms related to inflammatory and immune signaling, JAK-STAT, complement activation, and HNF4A. An assessment of converagent mechanism in the compartment specific eQTL landscape in kidney disease was performed across the NEPTUNE cohort. This analysis identified clusters of non-coding snps that were associated with STAT1 expression in the tubulointerstitium, and elevated STAT1 expression was associated with faster progression to a composite eGFR endpoint (40% loss of eGFR or ESRD).
Because transcription factor activation is driven by its ability to activate downstream transcription and not simply expression, we generated a STAT1 activation score. This approach was previously used to characterize JAK-STAT pathway activation and correlated with activated (phos-STAT) activation in FSGS 13 in the NEPTUNE cohort. The STAT1 score correlated across diseases on an individual patient level with CXCL10 (IP-10) mRNA, and urinary EGF. IP-10 has been used as a pharmacodynamic biomarker in a Phase II trial suggesting the efficacy of baricitinib, an inhibitor of JAK1 and JAK2, in diabetic kidney disease 35 . Given the successful phase II trial of baricitinib in DKD, the similarity of the STAT1 score in AAV relative to DKD, and the superiority of baricitinib compared to TNF inhibition in a distinct systemic autoimmune disease, rheumatoid arthritis 41 , argues for clinical evaluation of JAK inhibitors in patients with AAV. In addition, the similar level of signaling identified in NS, along with activation being associated with a progression biomarker, lends support to developing baricitinib for treatment of NS as well. Using a biomarker-driven approach in a precision medicine strategy, patients with active JAK-STAT signaling regardless of histological diagnosis could potentially benefit from targeted inhibition of the pathway using a basket clinical trial design 42 . The FDA recently approved the first treatment for cancer that applies a common biomarker to guide decisions about therapy, cutting across solid tumors as opposed to approval based on histological assessment of tumor origin. 43 This work establishes the landscape of underlying molecular mechanisms shared between nephrotic syndrome and ANCA-associated vasculitis. In a more general sense, identifying common disease mechanisms shared across rare diseases is an approach that may expand the currently limited treatment options. This may help to overcome limits in the molecular understanding of rare diseases, identify novel and known targetable mechanisms shared across diseases, and use the knowledge towards therapeutic repurposing for available drugs, ultimately improving treatment options for patients.

Patient cohorts
For the discovery cohort, samples from adult patients with AAV (n=23) and NS (n=62) were used from the European Renal cDNA Bank (ERCB). ERCB is a European multicenter study that collected renal biopsy tissue for gene expression analysis in RNA fixative (RNAlater, Qiagen), coupled with cross-sectional clinical information collected at the time of a clinically indicated renal biopsy 44 . For the validation cohort, samples from adult patients with AAV (n=57) were obtained from the ERCB, and samples and clinical data from adult patients with NS (n=116) were obtained from the Nephrotic Syndrome Study Network (NEPTUNE). NEPTUNE is part of the National Institutes of Health Rare Disease Clinical Research Network (NIH RDCRN), and is a multicenter prospective cohort study that, at the time of this study, had enrolled patients with proteinuric glomerular disease and performed comprehensive clinical and molecular phenotyping at 23 sites. Patients with secondary glomerular disease (such as diabetic kidney disease, lupus nephritis, and amyloidosis) were excluded. The overall study design has been previously published 45 . Comparable healthy renal tissue was obtained from living transplant donors (LD, n=29) and included in the study. Biopsy material from samples was microdissected into glomerular and tubulointerstitial compartments prior to transcriptional profiling as previously described 46 .

Transcriptional profiling
Kidney tissue was processed as previously described 46 . Briefly, renal biopsy tissue was stored in RNAlater® (ThermoFisher, Waltham, MA), and manually microdissected into tubulointerstitial and glomerular compartments. Microdissected renal biopsy specimens were processed and analyzed using Affymetrix GeneChip Human Genome U133A 2.0, U133 Plus 2.0 and Human Gene ST 2.1 Array platforms. Probe sets were annotated to Entrez Gene IDs using custom CDF version 19 generated from the university of Michigan Brain Array group 47 . For the analysis of samples profiled on the U33A 2.0 and U133 Plus 2.0 platforms, we restricted the analysis to the non-Affx probe sets common to both platforms (12,074). Expression data was quantile normalized and batch corrected using COMBAT 48 . Technical replicates of human reference RNA profiles (Stratagene, La Jolla, CA) were included with each batch and used to assess batch correction efficiency.
Differential gene expression across the transcriptome was compared between patients with NS and AAV versus living donors using the SAM method 49 in MEV (version 4.9); genes were defined as differentially expressed with q-value<0.05 and fold-change thresholds were applied as defined in the manuscript.
Hierarchical clustering analysis (Ward's method) was performed in Spotfire using Z-score normalized transcript expression. CEL files and processed data used in these analyses are accessible in GEO 50 under reference numbers: GSE104948, GSE104954, GSE108109, and GSE108112.

Functional enrichment and network analysis
Functional networks were interrogated for common disease mechanisms and upstream regulators, shared between both diseases using: QIAGEN's Ingenuity Pathways Analysis (IPA) (QIAGEN, Redwood City, CA, https://www.qiagenbioinformatics.com/products/ingenuitypathway-analysis). Automated assessment of literature curated cause and effect relationships was used to identify potential regulators of the shared transcriptional profile 9 . The approach derives a prediction of activation or inhibition by identifying potential network changes that can explain the observed gene expression profile (activated), or reverse the observed expression profile (inhibited). Genomatix Genome Analyzer (GGA) 51

Development of STAT1 activation gene networks and activation scores
The STAT1 activation network was generated by first developing a kidney specific STAT1 interaction network using GIANT 14 (minimum relationship confidence of 0.70, maximal network size of 20 genes), which resulted in a 17 gene interaction network. To further refine the network, genes were assessed for functional activation in response to IFN-gamma stimulation using the Interferome database 53 , and for the presence of STAT1 and STAT family binding sites in promoter regions, resulting in a 14 gene set representing STAT1 activation. The STAT1 activation score was generated in patient samples by transforming log2 expression profiles into Zscores, and averaging Z-scores of 14-STAT1 dependent genes to generate a STAT1 activation score for each patient sample.

GFR estimates
GFR was estimated by the four-variable Modification of Diet in Renal Disease (MDRD) study equation 54 for each patient in ERCB and NEPTUNE.

Measurement of urinary EGF
uEGF concentration was measured in spot urine samples using the Human EGF Immunoassay Quantikine ELISA (R&D Systems) as previously described 17.

Assessment of interstitial fibrosis and tubular atrophy
Histopathology was assessed in the NEPTUNE biopsy cohort with digital images obtained from paraffinembedded kidney tissue stained with hematoxylin and eosin, PAS, silver-based, and trichrome stains 55 . Whole slide images of glass slides from cases stored in the NEPTUNE digital pathology repository were assessed for percentage of cortex involved by IF/TA by six pathologists. The percentage of cortex involved by IF/TA was determined in each individual stain and averaged for an overall % value 56 . Cases with paired tubulointerstitial gene expression were assessed.

Statistical analyses
Correlation analyses were performed using Pearson's correlation within Spotfire (TIBCO). Analysis of variance (ANOVA) was used to compare PC1, PC2, and PC3 eigenvalues across FSGS, MCD, and MN in the ERCB cohort, and to assess eGFR differences across patient clusters within Spotfire (TIBCO). A two-tailed unpaired t-test (unequal variance) was used to compare eGFR differences between clusters 1 and 4.

Author contributions
SE planned and analyzed human transcriptome data, database investigation, and correlation with clinical parameters, supervised JH, managed the project, critically reviewed data, and wrote the manuscript. VN        Table 3. Summary level information for shared gene expression profiles in the glomeruli and tubulointerstitium from the discovery and validation cohorts. In the discovery cohort, differentially expressed (DE) genes were defined as those with │ FC│>1.3 and q-value<0.05.