The Nβ motif of NaTrxh directs secretion as an endoplasmic reticulum transit peptide and variations might result in different cellular targeting

Soluble secretory proteins with a signal peptide reach the extracellular space through the conventional endoplasmic reticulum-Golgi pathway. During translation, the signal peptide is recognised by the signal recognition particle, which results in co-translational translocation into the endoplasmic reticulum, from where the protein continues along the secretory pathway. However, soluble secretory proteins lacking a signal peptide are also abundant, and several unconventional (endoplasmic reticulum/Golgi-independent) pathways have been proposed and some demonstrated. This work describes new features of the secretion signal called Nβ, originally identified in NaTrxh, a plant extracellular thioredoxin that does not possess an orthodox signal peptide. We provide evidence that other proteins with similar sequences, including type h thioredoxins, are also secretory proteins that lack a signal peptide. To be a secretion signal, positions 5, 8 and 9 must contain neutral residues (a negative residue in position 9 in animal proteins) to keep the Nβ motif negatively charged and its profile hydrophilic. Moreover, our results suggest that NaTrxh translocation to the endoplasmic reticulum occurs as a post-translational event. Finally, the Nβ motif sequence at the N- or C-terminus could be a feature that helps to predict protein localisation, mainly in plant and animal proteins.

The small size of the N motif itself, just eleven amino acid residues long, may lead 149 to random identification of similar but functionally-unrelated sequences that would not 150 represent any relationship to a secretory signal as hypothesised. However, 191 of the animal 151 proteins detected are associated with membrane traffic, from which 98 are involved in 152 endocytosis and exocytosis; among the plant proteins found, four are identified as chaperones 153 belonging to the Nicotiana genus and annotated as Trxh-S2, where NaTrxh is grouped (S1 154 Table) [13]. The chance that these results were due to random factors is diminished because 155 most proteins containing the N motif of NaTrxh -or a similar one-are related to membrane-156 traffic mobility in cellular processes, suggesting an N presence-function relationship.

Amino acid positions in the Nβ motif required for secretion in plant cells

The consensus sequence of the Nβ motif comprising all sequences found in the different eukaryotic proteins (Fig 1C) showed that positions 1 and 3 are the least conserved and include two hydrophobic amino acid residues (Figs 1A and 1C). To determine whether these amino acids were functionally relevant and to define the minimum size of the functional Nβ motif, we generated two types of protein variants (Fig 2A). The first type used the NaTrxh-GFP fusion protein, in which deletions of different sizes were made at the N-terminus (residues at either the first 3 or the first 6 positions of the Nβ motif): NaTrxhNβ(+3)-GFP and NaTrxhNβ(+6)-GFP. The second type was generated by deleting 3 or 6 amino acid residues from the start of the Nβ-GFP sequence: Nβ(-3)-GFP and Nβ(-6)-GFP. All these protein variants were transiently expressed in onion epidermal cells.

When the Nβ deletions were transiently expressed in onion epidermal cells, we found that the first three positions are not required for the Nβ motif to direct protein secretion: both NaTrxhNβ(+3)-GFP and Nβ(-3)-GFP were localised in the extracellular space, whereas NaTrxhNβ(+6)-GFP and Nβ(-6)-GFP localised to the cytoplasm (Fig 2B). NaTrxh-GFP and Nβ-GFP, used as controls, both localised to the extracellular space (Fig 2B), as previously reported [15]. These results confirm that Nβ motif-guided secretion is independent of the core of NaTrxh, as previously reported [15].

As indicated above, the consensus sequence of the Nβ motif shown in Fig 1C is formed from all the eukaryotic sequences found in the set of predicted secretory and cytoplasmic proteins obtained from the BLASTP search (S1 Table). To

To assess which amino acid residues might be essential for the Nβ motif to work as an SP-like sequence, we compared the two distinctive consensus sequences generated from proteins in contrasting localisation categories (featured as logos; Fig 3B): (1) one generated from A category proteins; and (2) one from C category proteins. Comparing these two consensus sequences revealed that positions 5, 8 and 9 vary from one group to another. While the A category proteins contain a serine residue in positions 5 and 9, some C category ones contain Gly-5 and Ala-9. Thus, to test whether these changes are relevant for the Nβ motif to direct secretion, we generated the Nβ(S5G) and the Nβ(S9A) variants (both replacing an A category residue with a C category residue) using the Nβ-GFP sequence as a template, which contains the NaTrxh Nβ sequence (Fig 1A). The results from transient expression in onion epidermal cells showed that while Nβ(S5G)-GFP was secreted, Nβ(S9A)-GFP was localised within the cytoplasm (Fig 3C). This provides evidence that, while variations in position 5 of the Nβ motif appear not to be relevant, position 9 requires a serine residue (or a neutral one) to maintain its role as a secretion signal. Therefore, the motif cannot be predicted to act as a secretion signal sequence if Ala-9 (or probably any hydrophobic residue at this position) is present.

Regarding position 8, while most of the category A proteins contain a glutamate residue (NaTrxh being an exception since it contains a serine, which is considered in the consensus sequence of Fig 3B), the C category proteins mainly contain an aspartate residue here and only a few have glutamate at this position (Fig 3B). Therefore, we generated the following variants fused to GFP: (1) Nβ(S8D), to mimic the sequence found in proteins within the C category; (2) Nβ(S8E), which is the predominant form in the A category and could be expected to work as a signal peptide-like sequence; and (3) Nβ(S8A), to assess the relevance of a negatively charged residue at this position.

The results (Fig 3C) indicated that the protein variant Nβ(S8A)-GFP is localised in the extracellular space, showing that the change from a neutral to a hydrophobic residue at this position did not affect the role of the Nβ motif as a secretion signal. However, when Ser-8 was replaced by a negatively charged residue (Asp or Glu), both Nβ(S8D)-GFP and Nβ(S8E)-GFP were localised within the cell (Fig 3C). These data provide different possible scenarios. First, Ser-8 appears essential for the Nβ motif to direct protein secretion. However, the amino acid predominantly found at this position in secretory proteins (A category) is Glu. Notably, as Glu-8 was found in animal proteins, a glutamate at this position is suggested to be crucial for the Nβ motif to work as a secretion signal within this taxon. This would also explain why Asp-8 was found in cytoplasmic proteins (C category).

notably, they all possess the Ser-8 residue (Fig 4), reinforcing the hypothesis that the D category (and probably also B) might be overestimated, and the A category underestimated.

From the BLASTP results (see above), we identified four Trxh-S2 from Nicotiana (S1 Table), namely: NaTrxh (from N. alata), NtTrxh2L (N. tabacum), NsTrxh2 (N. sylvestris) and NtoTrxh2 (N. tomentosiformis). All four of them possess identical Nβ sequences (Fig 4). To include more Trxh-S2 sequences in the analysis, a second BLASTP search was performed using the whole NaTrxh sequence, from which the AtTrxh2 sequence was retrieved (Fig 4), which corresponds to the mitochondrial Trxh-2 from A. thaliana. This sequence was used for an additional BLASTP search. Only Trxh-S2 were considered for a multiple alignment analysis using the E. coli Trx1 (EcTrx1) as an outgroup (Fig 4). We also included the mitochondrial PtTrxh2, as well as AtTrxh9 and AtTrxh8, because they are known to be membrane-associated proteins [35].

The Nβ(S8E)-GFP variant accumulates in the cytosol when expressed in onion epidermal cells (Fig 3C). A group of sequences was found to contain Glu at this position and, except for one, they have Ala-5 instead of Ser-5; all of these grouped with the mitochondrially localised PtTrxh2 (Fig 4, purple cluster), raising the possibility that these differences result in the formation of a transit peptide that directs the proteins to the mitochondria.

Our data suggest that this short sequence might be an evolutionary source for different

The Nβ motif of secretory proteins is predominantly located towards the N-terminus and a structural trait appears to make it function as a secretion signal

The hallmark feature of Trxh-S2 proteins is that they possess N-terminal extensions whose sequences are quite variable [14]. The Nβ motif is located between positions 17 and 27 in NaTrxh (Fig 1A), and when it is fused to the GFP N-terminus (Nβ-GFP fusion protein), it leads to GFP secretion from plant cells [15]. These data suggest that the Nβ motif must be located towards the N-terminus to exert an SP-like role, similar to a typical hydrophobic SP. Therefore, we classified the output sequences considered in Fig 1B -and S1 Table-

While all the A category proteins shown in Fig 3A contain the Nβ motif in P1, none of the category C proteins did. Instead, for category C proteins, the Nβ sequences were distributed in P2, P3 and P4 (Fig 5C). These data together led us to hypothesise that the Nβ motif must be located towards the N-terminus of the protein to function as an SP-like sequence.

To test this hypothesis, we generated two different constructs (Fig 6A): (1)

When the NaTrxhNβ-Nβ-GFP chimeric protein was transiently expressed in onion epidermal cells, GFP fluorescence, as predicted, was observed in the cytoplasm (Fig 6B), indicating that the Nβ motif was not identified as a secretion signal within the middle core of the protein, which corresponds to the P2 and P3 positions of our previous analysis (Fig 5). Unexpectedly, when the Nβ motif was located at the C-terminus (GFP-Nβ construct), the GFP signal was detected in the apoplast of the onion epidermal cells (Fig 6B), indicating that the Nβ motif is able to direct secretion whether located at the N- or the C-terminus. It is noteworthy that, although none of the proteins that contain the Nβ motif towards the C-terminus (P4) are annotated as secretory proteins (A category; Fig 5C), in none of these cases is the motif present at the very C-terminus; the Nβ sequence closest to the C-terminal end of a cytoplasmic protein (C category) is located 122 residues away from it (accession code NP_694967.3, which is 648 amino acids long; S1 Table).
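The positional categories P1 to P4 used above can be illustrated with a short sketch. This is an assumption-laden illustration rather than the authors' procedure: it assumes that P1 to P4 are four equal quarters of the protein length and that a motif is assigned by its midpoint, neither of which is stated in this section. The function name `position_bin` is hypothetical. The worked coordinates read "122 residues away" as the distance between the last motif residue and the C-terminal residue, which is also an assumption.

```python
def position_bin(motif_start: int, motif_length: int, protein_length: int) -> str:
    """Return 'P1'..'P4' for the quarter of the protein that holds the motif midpoint.

    Coordinates are 1-based. Equal quarters and midpoint assignment are
    assumptions made for this sketch, not definitions taken from the paper.
    """
    motif_end = motif_start + motif_length - 1
    if motif_start < 1 or motif_end > protein_length:
        raise ValueError("motif does not fit inside the protein")
    midpoint = (motif_start + motif_end) / 2
    fraction = midpoint / protein_length
    # int() floors the positive value; min() keeps a motif at the last residue in P4.
    quarter = min(4, int(fraction * 4) + 1)
    return f"P{quarter}"


if __name__ == "__main__":
    # NP_694967.3 is 648 residues long; an 11-residue motif ending 122 residues
    # before the C-terminus would span residues 516-526 under the reading above.
    print(position_bin(516, 11, 648))
```

Under these assumptions the NP_694967.3 example falls in P4, which agrees with the category given in the text, but other binning schemes would also be compatible with that single data point.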

In the case of the GFP-Nβ protein, the Nβ motif is not only at the very C-terminal end of the protein, but it is also likely to be a mobile element that would be free and exposed to the solvent in a similar manner to the N-terminal extension of NaTrxh [36]. As long as the Nβ motif is free and solvent-exposed, whether it is located at the N- or the C-terminus, factors (yet to be identified) interact with it, directing translocation of GFP into the ER. This latter assumption is based on the tertiary structure of GFP (PDB 6YLQ), in which both the N- and C-termini are outside the β-barrel and are oriented in the same direction (Fig 6C). From

residues from its C-terminus, respectively) were included in the B category (S1 Table). According to the models predicted by AlphaFold in UniProtKB, the Nβ motif might be mobile and solvent-exposed in both cases, suggesting that these proteins might be secreted. As expected, PLN81894.1 is associated with the plasma membrane (PM) and transport activity (S1 Table). These data reinforce the hypothesis that the B category is possibly overestimated.

To assess the relevance of the Nβ motif itself, its charge and/or the free and solvent-exposed structure it needs to function as a signal peptide, an inverted version of the Nβ sequence was fused to the GFP N-terminus (Nβinv-GFP; Fig 6A). When the Nβinv-GFP fusion protein was transiently expressed in onion epidermal cells, it localised to the extracellular space (Fig 6B), as Nβ-GFP, GFP-Nβ and NaTrxh-GFP do. This outcome indicates that the overall charge of the Nβ motif (negative at physiological pH), rather than the position of each amino acid, is essential to direct protein secretion. The negative charge is mainly provided by Glu-4 and Glu-10, both present in A and C category Nβ sequences (Fig 3B). Still, position 9 must contain a serine residue, as shown in Fig 3C, or at least a neutral polar residue, to maintain the hydrophilic profile of the Nβ sequence.

Nevertheless, we discard this scenario for Nβ-directed secretion because Nβ is predominantly hydrophilic.

The fact that the GFP-Nβ protein was found in the extracellular space (Fig 6B) rules out the possibility that GFP-Nβ translation is coupled to ER translocation by SRP.
to the corresponding organelle. Good examples of this latter group are nuclear proteins, such as transcription factors, that remain in a latent form in the cytosol and are re-located to the nucleus only in response to certain stimuli. An example of this is the NF-κB complex, which contains a nuclear transit peptide but remains in the cytosol because its inhibitor, I-κB, is tightly bound and obscures the transit peptide [37,38]. During an immune or inflammatory response, for instance, I-κB is phosphorylated and degraded, releasing NF-κB and exposing the nuclear localisation sequence, resulting in its translocation towards the nucleus, where it will activate transcription of specific target genes [37,38]. A similar scenario could occur with NaTrxh or any Nβ-containing protein with the features described here, but with translocation towards the ER rather than the nucleus. This would represent an efficient mechanism to regulate a portion of soluble protein secretion, which would occur when the Nβ motif is exposed.

NaTrxh is directly involved in Nicotiana's gametophytic SI system, known as S-

If not completely hidden, the NaTrxh N-terminal extension might become a stable and ordered element [36]. This evidence also explains why proteins, including the NaTrxhNβ-Nβ-GFP chimera, that contain the Nβ motif in an inner position, although potentially exposed to the solvent, are not secreted, because this motif might form stable secondary structural elements. Additionally, in the case of the GFP-Nβ protein, the GFP C-terminus is likely to be accessible, away from the GFP β-barrel. The Nβ motif would be mobile, disordered and fully exposed to the solvent, which might be the reason it works as an ER translocation and further secretion signal in the GFP-Nβ fusion protein (Figs 6 and 7).

Trxs are widely distributed proteins, from prokaryotes to eukaryotes, that reduce

Dissecting both NaTrxh extensions has provided useful information on how they are involved in NaTrxh cellular localisation and its specificity towards its identified target protein in N. alata styles (the S-RNase). Both E. coli Trx and NaTrxh reduce insulin disulphide bonds as expected from any Trx [13,46], but E. coli Trx is not able to reduce S-RNase [16]; the major difference between these two Trxs is the occurrence of the N- and C-terminal sequences in NaTrxh. In addition, according to the structural model of the NaTrxh-S-RNase protein complex, the NaTrxh N-terminus assists the correct orientation of the interaction to reduce only one of the four disulphide bonds that the S-RNases typically contain [36]. The differences detected in the Nβ sequences of the Trxh-S2 (Fig 4), apart from those in positions 5, 8 and 9, might contribute to different specificities regarding their respective target proteins. However, there is also the N motif, which in the case of NaTrxh, contributes to the interaction with S-RNase and its reduction [15,16]. This motif, with some variation, is also present in the other analysed Trxh-S2 (Fig 4) and could contribute to their target specificity too.

Finally, finding other leaderless proteins containing a similar Nβ motif towards the N-terminus (like NaTrxh) with annotated functions related to protein traffic, membrane mobility and secretion (some experimentally confirmed) raises the possibility that their cellular localisation is due to this short sequence. Therefore, the Nβ motif sequence might help to predict the localisation of those proteins that contain it.

The cellular localisation of some proteins containing a similar Nβ motif is not yet known (B and D categories of our analysis; Figs 3 and 5). Some, particularly the Trxh-S2 proteins, might move from the B or D category to the A category, as was the case for NaTrxh itself. Notably, the Nβ motif of NaTrxh and of other plant proteins containing this motif must contain a serine at position 8. However, a glutamate residue was usually found at this position in secreted animal proteins (Fig 3). This opens the possibility of distinctive functional secretion motifs in plant and animal proteins, which merits investigation.

Sequence logos were constructed using the WebLogo server