Discovery of NRG1-VII: a novel myeloid-derived class of NRG1 isoforms

Macrophages are a primary source of the growth factor Neuregulin-1 (NRG1), which has pleiotropic roles in proliferation and differentiation of the stem cell niche in different tissues, and has been implicated in gut, brain and muscle development and repair. Six isoform classes of NRG1 and over 28 protein isoforms have been previously described. Here we report a new class of NRG1, designated NRG1-VII to denote the start of this isoform class from a myeloid-specific transcriptional start site (TSS). Long-read sequencing identified up to nine different NRG1-VII transcripts that show major structural differences from one another due to use of cassette exons and alternative stop codons. Expression of NRG1-VII was confirmed in human monocytes and tissue-resident macrophages. Isoform switching via cassette exon usage and alternate polyadenylation was apparent during monocyte maturation and macrophage differentiation. NRG1-VII


Neuregulins (NRGs) are a family of highly pleiotropic growth factors derived from four paralogous genes (NRG1-4) (Figure 1A). NRGs are typically synthesized as transmembrane pro-peptides cleaved by metalloproteases in the extracellular space to form a bioactive peptide with an exposed epidermal growth factor-like (EGF) domain that can bind ERBB receptors. The human Neuregulin-1 (NRG1) locus, on Chromosome 8p12, generates a large number of isoforms (Figure 1A, B) which are thought to be tissue-specific and functionally diverse (Falls, 2003). NRG1 has been implicated in the development of multiple tissues by promoting cell division within the stem cell niche and in differentiation trajectories (Yu et al., 2021; Wagner et al., 2007) of progenitor cells in the gut (Jardé et al., 2020), skeletal muscle (Gumà et al., 2010,

Schematic annotating exons in the human NRG1 genomic locus (Chr 8p12) with modular protein coding

(Jardé et al., 2020; Garrido-Trigo et al., 2022). However, the specific NRG1 isoforms expressed by macrophages in these contexts have not yet been described.

Understanding the cellular origins of different NRG1 isoforms is an important part of understanding how NRG1 directs these different developmental, reparative and inflammatory outcomes. Here, data mining to assess the expression profiles of NRG1 led to the identification of a previously uncharacterized TSS that appears to be used exclusively in cells of the myeloid lineage. We propose that transcripts generated from this alternative TSS belong to a new NRG1 class, NRG1-VII. Using Oxford Nanopore sequencing, we identified nine class VII isoforms with distinct transcript structures and predicted the protein characteristics of these isoforms. qRT-PCR targeting the unique first exon of NRG1-VII transcripts in human cells confirmed that type VII isoforms are expressed by monocytes and tissue-resident macrophages. Immunohistochemistry using antibodies directed towards the EGF-like or intracellular domains (ICD) demonstrated that tissue-resident macrophages are the major source of NRG1 in these tissues, an observation supported by transcriptional evidence derived from single-cell data collated within the Human Protein Atlas. This study therefore contributes to untangling the complexity of this already intricate locus by characterizing the structure and distribution of myeloid-specific NRG1 isoforms.

NRG1-VII is defined by a novel TSS discovered in myeloid cells.

Since the different isoforms of NRG1 have different structures and binding affinities for their receptors, we sought to investigate which of these were expressed by myeloid cells. First, we used the FANTOM5 (Functional Annotation of the Mammalian Genome) database (The FANTOM Consortium, 2014), a catalogue that maps the TSSs of genes expressed in 975 human samples (including cells, tissues, and cell lines). Here we found a previously uncharacterized TSS of NRG1 that was exclusively active in myeloid cells including monocytes, macrophages, and basophils (Figure 2A). A schematic of the locus suggested that this represents a potential new class of NRG1 transcripts, which we prospectively named NRG1-VII (Figure 2). The FANTOM TSS predicted that the NRG1-VII starting exon contained 147 bp of mRNA sequence unique to transcripts originating from this site, and an in-frame methionine was identified 162 bp downstream from the start of transcription (Figure 2B, C). Therefore, we aimed to characterize whether this newly described TSS would lead to the expression of a new class of protein-coding transcripts.
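The frame check described in this paragraph can be illustrated with a minimal Python sketch. It is not the analysis used in this study: the function name `in_frame_start_codons`, the anchor convention and the toy sequence are assumptions made purely for illustration. Given a transcript and the position of any codon known to lie in the downstream reading frame, the sketch walks upstream one codon at a time and reports ATG codons that reach the anchor without crossing a stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_start_codons(transcript, anchor):
    """Return 0-based positions of upstream ATG codons in frame with `anchor`.

    `anchor` is the 0-based position of the first base of a codon that is
    known to belong to the downstream reading frame (hypothetical input).
    Only ATGs that reach the anchor without an intervening in-frame stop
    codon are reported.
    """
    seq = transcript.upper()
    starts = []
    pos = anchor - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # an upstream stop closes the open reading frame
        if codon == "ATG":
            starts.append(pos)
        pos -= 3
    return sorted(starts)


# Toy sequence (not NRG1): the codon "CCC" at position 9 defines the frame.
toy_transcript = "GGCATGAAACCCGGGTTT"
toy_starts = in_frame_start_codons(toy_transcript, anchor=9)
```

Applied to a real transcript, the anchor would be taken from an annotated coding exon; the sketch ignores Kozak context and any other determinant of translation initiation.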

The TSS for NRG1-VII initiates in the intron positioned 5' of ENSEMBL exon ENSE00003743434, which encodes part of the spacer sequence between the Ig and EGF-like domains in the canonical NRG1 protein (Figure 2B). The TSS adds a 5' untranslated region (UTR) to the transcript that extends the exon from 51 bp to 190 bp (ENSE00002124728) and includes an initiating methionine at nucleotide position 154, in frame with the canonical NRG1 coding regions and that are also present in other isoforms (I, II, IV and V); thus NRG1-VII is the only reported NRG1 protein class that would not present a class-specific N-terminal amino acid tail.

We mapped the transcriptional activity arising from this TSS and identified two Expressed Sequence Tags (ESTs) that had been sequenced from the 5' end (NCBI accessions BI908144 and BI907799) and originated from the NRG1-VII TSS. Both came from the same library (SAMN00164230), made from a pool of non-activated human leukocytes from anonymous donors. Both ESTs harbored evidence of an open reading frame (ORF) in frame with other NRG1 isoforms, but both were 3' truncated. To further validate the potential activity of this TSS we looked for sequence evidence of conservation in other mammalian species. Data available from 15 non-human primate species (Pipes et al., 2013) and cross-species alignment show that the NRG1-VII TSS is conserved and highly specific to bone marrow and whole blood, while isoforms isolated from other tissues use alternate TSSs (Figure S1A). We also found transcriptional evidence of an

To characterize the diversity of NRG1 transcripts that use the novel NRG1-VII TSS, we performed Oxford Nanopore long-read amplicon sequencing. We designed a forward primer that targets the unique 5' UTR of NRG1-VII transcripts. Two reverse primers were designed to target two of the known alternative transcriptional stops in which we had previously observed transcriptional activity in myeloid cells. We ran this test separately in in vivo and in vitro progenitors and differentiated cells. In total, we identified nine novel high-confidence transcripts. genome between TSS-3 and 6 of the NRG1 locus. This exon introduces an early stop codon in frame; hence we classified it as a poison exon. We also report the presence of isoforms that contain both α and β EGF-like exons. This alternative splicing possibility had previously been captured in transcript NM_004495, but as this was a truncated transcript its ORF and TSS had not been defined.
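The logic behind classifying a cassette exon as a poison exon can be sketched in Python as follows. This is a hedged illustration rather than the annotation procedure used here: the function names and the toy exon sequences are hypothetical and are not NRG1 sequence. The sketch compares the codon index of the first in-frame stop codon with and without the cassette exon included.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_index(cds):
    """Return the codon index of the first in-frame stop codon, or None.

    The sequence is read in frame from its first base.
    """
    seq = cds.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def cassette_introduces_early_stop(upstream, cassette, downstream):
    """True if the first in-frame stop occurs at an earlier codon index
    when the cassette exon is included than when it is skipped."""
    skipped = first_stop_index(upstream + downstream)
    included = first_stop_index(upstream + cassette + downstream)
    if included is None:
        return False
    return skipped is None or included < skipped


# Toy exons (hypothetical): the first cassette carries an in-frame TAA.
toy_upstream = "ATGGCC"
toy_downstream = "CCCAAAGGGTGA"
poison_like = cassette_introduces_early_stop(toy_upstream, "GGTTAAGCA", toy_downstream)
neutral = cassette_introduces_early_stop(toy_upstream, "GGTGCAGCA", toy_downstream)
```

Whether a transcript carrying such a stop is actually degraded depends on additional features, such as the position of the stop relative to downstream exon junctions, which this sketch does not model.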

We predict that five of these transcripts would be protein coding (Figure 3B). Of these five proteins, three are predicted to include a transmembrane domain (NRG1-VII α2a, α2b and β2a) and undergo canonical processing through metalloproteases in order for the EGF-containing peptide to be released, while the other two lack this domain (Figure 3C). We observed that these five protein-coding transcripts can be found in all our studied myeloid samples, except for monocytes, in which we saw a different transcriptional profile and isoforms NRG1-VII α2a and β2a were not detected. The predicted NRG1-VII protein isoforms all share the absence of a specific N-terminal domain, but they all present domains that had been previously observed in other NRG1 isoforms. Exceptionally, the presence of both α and β domains in NRG1-VII αβ3 causes

The yellow β domain represents the peptide sequence that arises from the related exon but is translated to a different peptide sequence due to a frameshift.

Macrophages are a major source of NRG1 in human tissue.

We next assessed the distribution patterns of NRG1 in single-cell RNAseq experiments in the Human Protein Atlas (Karlsson et al., 2021), which revealed that the NRG1 locus was actively expressed by many different cell types, such as neurons in brain and eye; stromal cells in colon; endothelial cells in heart and liver; and epithelial cells in kidney and lung. In most of the tissues that we examined, NRG1 expression was observed in macrophages (Figure 4A). However, no isoform-specific data is available to compare differential isoform expression between these tissues.

NRG1-VII expression in myeloid cells is affected by differentiation and maturation.

We sought to confirm whether the class VII TSS is the only, or even the primary, TSS used by myeloid cells. Therefore, using primers that could discriminate between each unique start exon (Table 1)

To confirm that NRG1 mRNAs are translated into proteins in myeloid cells, we then performed immunohistochemical staining of NRG1 on human glioblastoma (GBM) tissue, which is enriched in bone marrow-derived macrophages (Klemm et al., 2020). Myeloid cells were identified using a CD68 antibody; antibodies detecting the extracellular (EGF-like) and intracellular (ICD) domains of NRG1 were used. NRG1 was detected in both myeloid and non-myeloid cells (Figure 4D-E). Not all CD68+ cells expressed NRG1, and those cells that did exhibited different patterns of expression (Figure S2). Altogether, these results show that myeloid cells synthesize NRG1 peptides in human tissues but are not the sole contributors to the NRG1 pool. They also reveal diverse NRG1 expression patterns in myeloid cells, suggesting that regulation of NRG1 expression differs among myeloid cells.

Atlas mRNA dataset, suggested that macrophages are a source of NRG1 in most tissues. However, the specific isoforms involved in the macrophage-mediated NRG1 secretion were uncharacterized. Here we report that myeloid cells preferentially use a novel TSS that is myeloid-specific and conserved to generate a previously uncharacterized class of NRG1 isoforms. The tissue variability and functional versatility of NRG1 are a consequence of the alternative promoter usage and exon retention that give rise to a high diversity of isoforms. Description of a new TSS class, NRG1-VII, including at least five new protein-coding isoforms, has expanded the known NRG1 protein-coding isoforms from 28 to 33.

It had been previously reported that the highest NRG1 expression levels in human tissues were in circulating . We further found that monocytes express at least six NRG1-VII isoforms, including transcripts that contain a novel 'poison' exon (NRG1-VII Pαca and Pαa) predicted to prematurely terminate translation. This exon introduces an early stop codon and is therefore likely to drive the transcript to nonsense-mediated decay, generating a monocyte-specific transcriptional mechanism to regulate the levels of NRG1 synthesis. Additionally, transcripts containing this exon present unusual splicing on the 3' end, suggesting that there are additional differences and undiscovered mechanisms controlling transcription of these mRNAs. Due to the high levels of expression of this gene in monocytes compared to other cell types, we hypothesize that these represent mechanisms that regulate NRG1 expression in circulation.
here show that tumor-associated myeloid cells (CD68+) may contribute to the NRG1 pool seen in these tumors. Due to the lack of unique domains in type VII isoforms, the antibodies used in this study targeted common NRG1 regions (the EGF-like and intracellular domains). Even though the available antibodies lack specificity to prove that type VII isoforms are translated in this disease, the only isoforms detected in primary monocytes or macrophages in this study belong to classes I and VII.

The human NRG1 locus has over 30 curated isoforms, to which this study adds nine new transcripts that are specific to myeloid cells. It is likely that the full transcriptional profile of this locus has not yet been described; for example, while long-read amplicon sequencing is a sensitive isoform recovery method (Clark et al., 2020), it is limited by the primer set(s) used. Thus, only NRG1-VII isoforms utilizing linker 3 or the a tail regions were amplifiable in this study. In addition, transcript NM_004495 presents domains α and β, but is 5' truncated. Hence it is unique but unassigned to a specific NRG1 class, as its TSS has not been identified. Our results show an isoform with both exons also present, supporting the existence of splice

NRG1 is involved in many different functions; this is attributed to the many isoforms, which are tissue-restricted during different developmental stages. Only through a thorough investigation of this locus can we better understand the specifics of each process and how to better develop clinical strategies to prevent or treat the different pathologies associated with this locus. Here we showed that myeloid cells exhibit a unique regulation pattern of the NRG1 locus to generate cell-specific isoforms that could play important roles in many diseases. Further detailed investigation of the molecular genetic features and functions of these novel isoforms might uncover how NRG1-VII isoforms elicit differential receptor activity and downstream effects, as previously described for other NRG1 isoforms. This could ultimately allow for targeted therapies and a better understanding of the complexity of an in vivo system, and might help recreate the processes in which appropriate signals are essential to model the desired biological mechanisms.

After isolation, monocytes could be differentiated into macrophages by culture under the same conditions as iPSC

RNA was extracted from a pool of monocytes from three different blood donors, a pool of myeloid progenitors derived in vitro, macrophages derived from monocytes, and macrophages differentiated from in vitro derived progenitors. The samples were then prepared using specific primers (Table 1) and purified using AMPure beads. 2 ng of purified cDNA from each sample was then barcoded following the EXP-PBC096 protocol from Oxford Nanopore Technologies. Samples were pooled (equimolar) and a sequencing library was prepared as described in the SQK-LSK110 Oxford Nanopore protocol. Samples were then loaded on a Flongle flow cell (FLG-001) and sequencing was performed using a GridION device.
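As an illustration of the equimolar pooling step, the following Python sketch converts a measured DNA concentration and mean amplicon length into the volume that contributes a fixed molar amount to the pool. It is a sketch under stated assumptions, not the protocol's own calculation: the average mass of 660 g/mol per base pair of double-stranded DNA is a common approximation, and the sample names, concentrations, amplicon lengths and 10 fmol target are hypothetical values that were not reported in this study.

```python
# Approximate average molar mass of one base pair of double-stranded DNA.
DSDNA_G_PER_MOL_PER_BP = 660.0


def fmol_from_ng(mass_ng, length_bp):
    """Convert a mass of dsDNA (ng) of a given mean length (bp) to fmol."""
    return mass_ng * 1e6 / (length_bp * DSDNA_G_PER_MOL_PER_BP)


def volume_for_fmol(conc_ng_per_ul, length_bp, target_fmol):
    """Volume (µL) of a sample that contains `target_fmol` of amplicon."""
    ng_needed = target_fmol * length_bp * DSDNA_G_PER_MOL_PER_BP / 1e6
    return ng_needed / conc_ng_per_ul


def equimolar_volumes(samples, target_fmol):
    """Map sample name -> volume (µL) so each contributes `target_fmol`.

    `samples` maps a name to (concentration in ng/µL, mean length in bp).
    """
    return {
        name: volume_for_fmol(conc, length, target_fmol)
        for name, (conc, length) in samples.items()
    }


# Hypothetical barcoded amplicon pools; values are illustrative only.
example_samples = {
    "monocytes": (6.6, 1000),
    "progenitors": (13.2, 1000),
    "macrophages": (6.6, 2000),
}
example_pool = equimolar_volumes(example_samples, target_fmol=10.0)
```

Longer amplicons need more mass to reach the same molar amount, which is why pooling by mass alone would over-represent shorter products.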

Formalin-fixed paraffin-embedded (FFPE) GBM tissue sections were stained using the Bond RX automated stainer (Leica Biosystems). Slides were deparaffinized in xylene followed by exposure to a graded series of ethanol solutions for rehydration. Heat-induced epitope retrieval was performed with either a citrate pH 6 buffer or Tris-ethylenediaminetetraacetic acid (EDTA) pH 9 buffer. Slides were treated with 3% hydrogen peroxide (H2O2) to block endogenous peroxidase activity. For multiplexed IHC staining the Opal 6-plex Detection Kit (Akoya Biosciences) was used. Serial multiplexing was performed by repeating the sequence of antigen retrieval, primary antibody, and Opal polymer incubation, followed by Opal fluorophore visualisation for all antibodies as follows. GBM tissue was stained with CD68 (Abcam ab955, 1:100), NRG1