A unified evolutionary origin for SecY and YidC

Cells use transporters to move protein across membranes, but the most ancient transporters’ origins are unknown. Here, we analyse the protein-conducting channel SecY and deduce a plausible path to its evolution. We find that each of its pseudosymmetric halves consists of a three-helix bundle interrupted by a two-helix hairpin. Unexpectedly, we identify this same motif in the YidC family of membrane protein biogenesis factors, which is as ancient as SecY. In YidC, the two-helix hairpin faces the cytosol and facilitates substrate delivery, whereas in SecY it forms the substrate-binding transmembrane helices of the lateral gate. We propose that SecY originated as a YidC homolog which formed a channel by juxtaposing two hydrophilic grooves in an antiparallel homodimer. Archaeal and eukaryotic YidC family members have repurposed this interface to heterodimerise with conserved partners. Unification of the two ancient membrane protein biogenesis factors reconstructs a key step in the evolution of cells.


Introduction
By the time of the last universal common ancestor (cenancestor), cells had already evolved a hydrophobic membrane and integral membrane proteins (IMPs) which carried out core metabolic functions (Lombard et al., 2012). Among those IMPs was SecY, a protein-conducting channel (Park and Rapoport, 2012). As is typical for channels, SecY (termed Sec61 in eukaryotes) catalyses the translocation of hydrophilic substrates across the hydrophobic membrane by creating a conducive hydrophilic environment inside the membrane. The substrates which it translocates are secretory proteins and the extracytoplasmic segments of IMPs.
SecY requires that its hydrophilic translocation substrates be connected to a hydrophobic α-helix (von Heijne, 1985; Krogh et al., 2001; Petersen et al., 2011). These hydrophobic helices are essential because they serve as signals which open the SecY channel (Jungnickel and Rapoport, 1995; Li et al., 2016; Voorhees and Hegde, 2016). SecY comprises two rigid halves (Van den Berg et al., 2004), which open like a clamshell when a helix binds to the lipid interface between them (Figure 1a). Spreading apart the clamshell destabilises a plug which sits between the halves, opening a hydrophilic pore that spans the membrane. Binding at this site also threads one of the signal's hydrophilic flanking regions through the hydrophilic pore, thereby initiating its translocation.
The site between SecY's halves where signals bind is called the lateral gate. After binding and initiating translocation, the hydrophobic signal can diffuse away from the lateral gate into the surrounding hydrophobic membrane. Many signals, particularly those at the N-terminus of secretory proteins, are then cleaved off by signal peptidase, a membrane-anchored protease whose active site resides on the extracytoplasmic side of the membrane (Paetzel et al., 2002). Longer and more hydrophobic signals that are not cleaved serve as the transmembrane helices (TMHs) of IMPs (White and von Heijne, 2005). SecY is the only universally conserved transporter for protein secretion. There is, however, a second universally conserved protein transporter that is specialised for IMP integration: YidC (Hennon et al., 2015). Unlike SecY, YidC is asymmetric (Kumazaki et al., 2014a), and so widely divergent across species that its universality was only recently appreciated (Anghel et al., 2017). Nonetheless, a conserved core of three TMHs is evident in all the YidC homologs of known structure: YidC of the bacterial plasma membrane (Kumazaki et al., 2014a, 2014b), Ylp1 of the archaeal plasma membrane (Borowska et al., 2015), and TMCO1, EMC3, and GET1 of the eukaryotic endoplasmic reticulum (ER) (Anghel et al., 2017). The chloroplast and mitochondrial inner membranes also contain YidC homologs, Alb3 and Oxa1, respectively (Bauer et al., 1994; Bonnefoy et al., 1994; Sundberg et al., 1997). Thus, each membrane that is topologically and functionally equivalent to the plasma membrane of the cenancestor contains at least one YidC homolog.
Like SecY, YidC facilitates the integration of IMPs by translocating their extracytoplasmic segments across the membrane (Hell et al., 1998; Samuelson et al., 2000). Unlike SecY substrates, however, YidC substrates are limited in the length of translocated polypeptide, typically to less than 30 amino acids (Shanmugam et al., 2019). This limitation may be due to its lack of a membrane-spanning hydrophilic pore; instead, structures of YidC show a membrane-exposed hydrophilic groove that only penetrates partway into the membrane (Kumazaki et al., 2014a).
The halves of the SecY clamshell are structurally similar and related by a two-fold rotational (C2) pseudosymmetry axis which bisects the membrane plane (Van den Berg et al. 2004; Figure 1b). Such pseudosymmetry is common among membrane proteins, and is believed to arise when the gene encoding an asymmetric progenitor undergoes duplication and fusion (Forrest, 2015). Channels are particularly likely to have a membrane-bisecting C2 axis of structural symmetry because they have the same axis of functional symmetry: they passively facilitate bi-directional diffusion across the membrane. Although transport through SecY is usually unidirectional, it is no exception to this symmetry rule; polypeptides can indeed slide through it bidirectionally (Ooi and Weiss, 1992), with unidirectionality arising from other factors (Erlandson et al., 2008; Matlack et al., 1999).
The ubiquity and essentiality of the SecY channel motivated us to investigate how it might have evolved. By focusing on structural elements which were conserved between domains and across species, we identified a core motif of three TMHs which bury a hydrophilic patch inside the membrane. Speculating that this three-TMH motif may resemble an evolutionary ancestor, we compared it to cenancestor membrane proteins and detected a previously unrecognised similarity with YidC.
Drawing on the extensive functional literature about SecY and YidC, we analyse their structural similarities and differences in terms of functional consequences. Based on this analysis, we propose that in a parsimonious model, SecY evolved from a dimeric YidC homologue by gene duplication and fusion. One prediction of this model is that YidC should conserve a tendency to form dimers via the same interface as the SecY progenitor, and indeed we discover novel heterodimers formed via this interface by archaeal and eukaryotic YidC. We discuss the implications of this model for the evolution of YidC itself, and other components of the general secretory pathway.

Conserved ancestral features in the SecY structure
Although the N- and C-domains of SecY have significantly diverged, the architecture of their five TMHs is conserved (Figure 1b; Figure 2a; Van den Berg et al., 2004). The conservation of these five TMHs between the N- and C-domains indicates that the same five TMHs were also present in their last common ancestor. This last pre-duplication ancestor of the SecY domains we term proto-SecY. To facilitate comparisons, we label these five consensus helices H1-H5 (Figure 2a). A prefix of N or C is used when referring to a specific instance of a consensus element in the N- or C-domain of SecY. For example, TM6 of SecY is labelled C.H1 because it is located in the C-domain and corresponds to H1 of proto-SecY, as does TM1 (N.H1) in the N-domain. Flanking and intervening segments are labelled using lower-case references to the nearest consensus elements. For example, the ribosome-binding loop between C.H1 and C.H2 is C.h1h2. The N-terminal peripheral helix of each domain, which we argue later was probably also conserved in proto-SecY, is named H0.
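The labelling scheme described above can be written out as a short sketch. It is illustrative only: the function names are hypothetical, and it assumes that the ten TMHs of SecY are numbered consecutively and split five per domain, which agrees with the two examples given in the text (TM1 = N.H1, TM6 = C.H1).

```python
# Hypothetical helpers illustrating the consensus labelling scheme.
# Assumption: SecY TMHs are numbered TM1-TM10, five per domain.
HELICES_PER_DOMAIN = 5


def consensus_label(tm_number: int) -> str:
    """Map a SecY TMH number (1-10) to a consensus label such as 'C.H1'."""
    if not 1 <= tm_number <= 2 * HELICES_PER_DOMAIN:
        raise ValueError(f"expected TM1-TM10, got TM{tm_number}")
    # TM1-TM5 belong to the N-domain, TM6-TM10 to the C-domain.
    domain = "N" if tm_number <= HELICES_PER_DOMAIN else "C"
    # Position within the domain gives the consensus helix index (1-5).
    helix = (tm_number - 1) % HELICES_PER_DOMAIN + 1
    return f"{domain}.H{helix}"


def segment_label(domain: str, first: int, second: int) -> str:
    """Label the segment between two consensus helices, e.g. 'C.h1h2'."""
    return f"{domain}.h{first}h{second}"


if __name__ == "__main__":
    print(consensus_label(1))          # N-domain copy of H1
    print(consensus_label(6))          # C-domain copy of H1
    print(segment_label("C", 1, 2))    # loop between C.H1 and C.H2
```

Lower-case labels are reserved for flanking and intervening segments, so the two helpers never produce colliding names.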
The nearest non-duplicated ancestor of a channel is rarely detectable (Hennerdal et al., 2010), presumably because it is made redundant by the duplicated, fused form. Indeed, no obvious candidates for proto-SecY are evident in sequence or structure databases. We speculated that an even earlier ancestor not redundant to SecY might persist in a more divergent form that nonetheless retains recognisable similarity with SecY. To facilitate a search for this more distant ancestor, we considered sub-domains within the five-TMH core of proto-SecY that might represent a more widely conserved precursor structure. Our analysis was guided by the fact that the folding pathway of a protein imposes constraints on how it is subsequently elaborated during evolution.
In this context, we noted that H1, which is synthesized and inserted into the membrane first, makes negligible contacts with the tightly packed H2/3 hairpin (Figure 2b). Instead, H1 packs against H4 and H5, both of which are located between H1 and the H2/3 hairpin. This arrangement, where sequential TMHs are separated, is highly unusual in IMPs (Bowie, 1997), suggesting that it represents a divergence from the ancestral fold. Because H4/5 passes between H1 and H2/3 from one side to the other, it is unlikely that H1 and H2/3 were separated by gradual adjustments to the helices' tilts or positions, as would seem likely if instead H4/5 penetrated the space between them from a single side. Instead, the most conservative explanation for the separation of H1 and H2/3 is that one or both of them were not transmembrane in the ancestral fold.
Omission of the H2/3 hairpin preserves H1/4/5 as a three-TMH bundle (Figure 2c), whereas omission of H1 isolates H5, yielding a much less compact fold. This suggests that an ancestral fold containing the H1/4/5 three-TMH bundle subsequently acquired the H2/3 hairpin, which packed against the bundle's surface. Acquisition of a transmembrane hairpin is highly plausible because it is a common transition in membrane protein evolution. Mutations which increase the hydrophobicity of a structural element tend to promote its membrane insertion. Insertion as a hairpin (generally defined as an antiparallel self-associating motif) is both more physically favourable (Engelman and Steitz, 1981) and less topologically disruptive than insertion as a single TMH, which would by contrast invert the topology of any subsequent TMHs. One of many examples of acquired transmembrane hairpins is provided by SecE, which in some proteobacteria such as E. coli has acquired a transmembrane hairpin in its N-terminal peripheral helix (Cao and Saier 2003; Figure 2-Figure supplement 1).
Figure 2 (caption, partial). b The N-domain as in a, except with H1 (cyan) and H2/3 (green) coloured and overlaid with a semitransparent representation of their solvent-excluded surfaces. Lateral (top) and axial (bottom) views are shown. c As in b, except with the H1/4/5 three-TMH bundle (orange) recoloured.
These considerations led us to posit that although the five-TMH precursor of SecY may have been lost, the three-TMH bundle of H1/4/5 might persist in a protein family that diverged at an earlier point. Notably, a significant portion of the hydrophilic pore of SecY is lined by the three-TMH bundle, hinting that even the putative distant ancestor could have had a transport function.

Identification of YidC as a candidate proto-SecY homolog
Any non-duplicated homolog of SecY that persists today would also have been present in the cenancestor, which was the last common ancestor of the two fundamental phylogenetic domains, Archaea and Bacteria. Attribution to the cenancestor is difficult, but several IMPs that are widely conserved display divergences between phylogenetic domains that allow us to exclude trans-domain gene transfer as an explanation for their conservation. These include SecY, SecE, TatC, signal peptidase, the rotor-stator ATPase, the multiple resistance and pH (Mrp) sodium-proton antiporters, and certain redox factors (Lombard et al., 2012). An additional IMP, YidC, recently joined this group; its divergence across domains is so great that its universality was only gradually uncovered.
When YidC was first studied in mitochondria (as Oxa1/2), sequence similarity alone was sufficient for detection of its homologues in bacteria (SpoIIIJ, YidC; Bauer et al. 1994; Bonnefoy et al. 1994), and also subsequently in chloroplasts (Alb3; Sundberg et al. 1997). Archaeal homologues were not detectable by these methods, but homology candidates were subsequently identified using position-specific scoring matrices (Yen et al., 2001; Zhang et al., 2009). These candidates were, however, only sporadically distributed within a single archaeal phylum, Euryarchaeota, and their marginal detectability engendered little consensus between reports. Validation and extension to Crenarchaeota came from genomic neighbourhood analysis, which identified a widely conserved cluster of YidC, SecY, and ribosomal proteins (Makarova et al., 2015). YidC thus displays the two-domain phylogeny and wide conservation required for confident attribution to the cenancestor.
Because of YidC's divergence across phylogenetic domains, its conserved structural features were also only recently identified. Crystal structures of bacterial YidC show at least five TMHs (Kumazaki et al., 2014a, 2014b), but a crystal structure of an archaeal YidC-like protein (Ylp1 from Methanocaldococcus jannaschii) showed only three TMHs (Borowska et al., 2015). Because the crystal structure of Ylp1 displayed domain swapping between adjacent chains (Figure 3-Figure supplement 1a), the three-TMH fold's physiological relevance was initially questioned (Kuhn and Kiefer, 2017).
The three-TMH fold is now well supported by the identification of similar homologs in the eukaryotic ER (Anghel et al., 2017). Although not detected by less sensitive methods, comparisons of profile hidden Markov models for archaeal YidC and eukaryotic proteins identified three YidC paralogs in the ER: TMCO1, EMC3, and GET1. Recent single-particle electron cryomicroscopy (cryo-EM) studies yielded structural models for all three paralogs (Bai et al., 2020; McDowell et al., 2020; McGilvray et al., 2020; Miller-Vedam et al., 2020; O'Donnell et al., 2020; Pleiner et al., 2020), which together with the prokaryotic crystal structures reveal a conserved core consisting of a three-TMH bundle interrupted after the first TMH by a cytosolic helical hairpin. In the prokaryotic forms, a sixth, N-terminal peripheral helix is also present.
Among the cenancestor IMPs, the hairpin-interrupted three-TMH motif of YidC is strikingly similar to the consensus proto-SecY elements identified above. Each consensus helix from the YidC family can be matched to a consensus helix from proto-SecY, unambiguously and with the same connectivity (Figure 3, Table 1). This surprising structural similarity identifies the YidC family as a uniquely good candidate for the origin of proto-SecY. The functional similarity between SecY and YidC as mediators of IMP integration further supports this idea. The remainder of this paper articulates the evidence for and implications of this possibility. For convenience, we use YidC to refer to the family as a whole, as we have used SecY to refer to both SecY and its Sec61 homologs, and will specify particular clades only when discussing characteristics not shared by the whole.

The cores of SecY and YidC are structurally and functionally similar
Like SecY, YidC facilitates the diffusion of hydrophilic protein segments across the hydrophobic membrane by burying hydrophilic groups inside the membrane (Figure 4a). SecY buries a hydrophilic funnel on each side of the membrane and thereby forms a continuous hydrophilic pore across it. By contrast, YidC's hydrophilic groove is only open to the cytosol, and only penetrates part-way into the membrane. Biophysical considerations and molecular dynamics simulations suggest that the groove's exposure of hydrophilic groups to the hydrophobic membrane distorts and thins the membrane in its vicinity (Chen et al., 2017).
YidC's hydrophilic groove is similar to those recently observed in components of the retrotranslocation machinery for ER-associated degradation (ERAD; Wu et al., 2020). Here, the membrane proteins Hrd1 and Der1 each display hydrophilic grooves, which are open to the cytosol and ER lumen, respectively. The juxtaposition of these two 'half-channels' forms a nearly continuous hydrophilic pore, interrupted by only a thin membrane through which polypeptide translocation is thought to occur. A YidC-derived proto-SecY can similarly be considered a half-channel capable of forming a near-complete channel by antiparallel homodimerisation.
As argued above, the most ancient core of proto-SecY is the three-TMH bundle of H1/4/5 which lines the hydrophilic translocation pore. Strikingly, the corresponding H1/4/5 bundle in YidC forms its hydrophilic translocation groove (Figure 4a,b). The three-TMH bundles in both SecY and YidC have a right-handed twist, with H1 and H4 near parallel and H5 packing crossways against them. Of the three helices, H5 makes the closest contacts with the translocating hydrophilic substrate in SecY (Figure 4b) and in E. coli YidC as determined by chemical crosslinking experiments (He et al., 2020). These crosslinking data indicate that YidC's substrates initiate translocation in a looped configuration, analogous to that of SecY's substrates (Mothes et al., 1994; Shaw et al., 1988; Figure 4a).
Table 1. Consensus elements and the corresponding elements of SecY and YidC. YidC names are listed in column order; the last two YidC columns are Gram-positive and Gram-negative bacteria.
Consensus    SecY                   YidC
h0           N                      TM1, P1
H0                                  EH1; EH1; EH1
H1           TM1                    TM1; TM1; TM2
h1h2         TM2a (plug)            C1
H2           TM2b (lateral gate)    CH1; CH1; CH1
H3           TM3                    CH2; CH2; CH2
H4           TM4                    TM2; TM2; TM3
h4h5                                EH2; TM3/4; TM4/5
H5           TM5                    TM3; TM5
In YidC, the point where H4 and H5 meet forms the hydrophobic end of the hydrophilic groove. This intersection remains hydrophobic in SecY, but is shifted toward the pseudosymmetry axis. The consequence of this shift is that the corresponding amino acids of the N- and C-terminal halves of SecY are now juxtaposed to form a ring of hydrophobic amino acids known as the pore ring (Figure 4b). The pore ring lies close to the center of the membrane and represents the point where the hydrophilic vestibules from each side of the membrane connect. Thus, key structural features of YidC are not only recognizable in SecY, but also match with similar functional roles. This structure-function correspondence satisfies an important prediction for a putative SecY homolog.
Alongside these elements which are both structurally and functionally similar, there is also an element, H0, which is structurally similar despite not being known to have any direct function in translocation (Figure 5a). It is clearly not essential for function in SecY, having been largely eliminated from some bacterial SecY (Figure 3). Archaeal and eukaryotic SecY N.H0 and YidC H0 are similar in their orientation, length, contact with H4, and the position of that contact site (Figure 5a). SecY C.H0 is similarly peripheral but different in length and orientation, a difference which is attributable to the confining effect of its fusion to N.H5.
Figure 4 (caption, partial). The hydropathy of the lipidic and aqueous phases is represented on a separate scale, ranging from hydrophilic (white) to hydrophobic (grey). The hydropathy of the interfacial layers is approximated by linear gradients, each half the width of the hydrophobic layer. The algorithm used to estimate the membrane thickness and relative position does not account for any anisotropic membrane thinning which lipid-exposed hydrophilic residues may induce (see Methods), and thus none is shown. A schematic representation of a substrate signal and translocating polypeptide is superimposed on YidC, indicating the experimentally determined interface across which substrates translocate. The YidC surface and model are clipped to allow a lateral view of the hydrophilic groove which would otherwise be occluded by the non-conserved h4h5 transmembrane hairpin (B. halodurans YidC2 TM3/4). b Left: Lateral view of the SecY/substrate complex, with H1/4/5 shown as a solvent-excluded surface and the translocating substrate shown as a cartoon, with its signal helix hidden. The surface is shown colour-coded by hydropathy (top) or by consensus element (bottom). Right: as at left, except for M. jannaschii Ylp1 (5c8j).
The similarity between SecY N.H0 and YidC H0 is particularly good evidence for homology because without a direct functional role in SecY, it is unlikely to be the result of convergent evolution. Instead, it indicates a conserved structural role. This independent evidence supports homology as an explanation for the structural and functional similarity of their conserved cores. Considering this evidence together, we conclude that SecY and YidC share a structural core composed of a membrane-embedded H1/4/5 bundle and a peripheral H0 brace (Figure 5b).

SecY's structural differences from YidC support its unique secretory function
Whereas the conserved cores of SecY and YidC are similar, their structural differences are concentrated in regions which are hypervariable among YidC homologs: h4h5 and H2/3 (Figure 3). The H2/3 hairpin takes many structural forms among the YidC and SecY families. The relatively compact cytosolic hairpin in bacterial and archaeal YidC is markedly elongated and rigid in GET1, tethered via long flexible loops in EMC3, and retained in a roughly similar architecture in TMCO1. In contrast to each of these examples, the H2/3 hairpin in SecY is folded back toward the H1/4/5 bundle and is embedded in the membrane. Despite all these differences in topology, length, and linker properties, H2/3 appears to uniformly retain strong coupling between its two helices, and their lengths in all but GET1 remain within a relatively narrow range of ~15-30 amino acids. These similarities are consistent with H2/3 in YidC and SecY sharing a common evolutionary origin. As already noted above, transmembrane hairpin acquisition is frequently observed during membrane protein evolution. In addition to the SecE example noted above (Figure 2-Figure supplement 1), YidC h4h5 is a peripheral helix in archaea but a transmembrane hairpin in bacteria (Figure 3). In the same way, H2/3 appears to be a hairpin that inserted alongside the H1/4/5 three-helix bundle. This hairpin is cytosolic or membrane-peripheral in YidC, but could have become more hydrophobic and membrane-embedded to generate the five-TMH fold of proto-SecY. Consistent with this idea of a common evolutionary origin, H2/3 in SecY and YidC displays not only structural but also functional similarity: it participates in signal recognition in both SecY (Figure 6a) and across diverse YidC homologs.
The methionine-rich membrane-facing side of YidC H2/3 is thought to initially engage the TMHs of its substrates, at least in bacteria (Kumazaki et al., 2014a), archaea (Borowska et al., 2015), and eukaryotic TMCO1 and EMC3 (McGilvray et al., 2020; Pleiner et al., 2020). In contrast to direct TMH interaction, the rigid and elongated H2/3 coiled coil of GET1 forms a binding site for GET3 (Stefer et al. 2011). This adaptation may be due to the particularly hydrophobic TMHs inserted by this pathway (Guna et al., 2018), warranting a specialised machinery to shield them in the cytosol.
The migration of H2/3 into the membrane in SecY encloses the translocation channel which in YidC is exposed to the membrane (Figure 6b). This allows SecY to create a significantly more hydrophilic and aqueous environment for its hydrophilic substrates, facilitating their translocation. This is particularly important for SecY's secretory function, which involves translocating much larger hydrophilic domains than those translocated by YidC.
As a secondary consequence, transmembrane insertion of H2/3 makes the site where signals initiate translocation more proteinaceous and hydrophilic (Figure 4a; Gogala et al., 2014; Park et al., 2014; Plath et al., 1998; Voorhees and Hegde, 2016; Weng et al., 2020). Because of this, translocation via SecY can be initiated via signals which are much less hydrophobic than the TMHs which initiate translocation via YidC. This, too, is important for SecY's secretory function, because the signal peptides of secretory proteins are distinguished from TMHs by their relative hydrophilicity (von Heijne, 1985). This biophysical difference allows signal peptidase to specifically recognise and cleave them (Paetzel et al., 2002). Cleavage frees the translocated domain from the membrane to complete secretion.
After H2/3, the next most conspicuous difference between SecY and YidC is in h4h5, which is nearly eliminated in SecY (Figure 3). Whereas the H2/3 transmembrane insertion differentiates how SecY and YidC receive and recognise hydrophobic domains, the h4h5 elimination helped clear the channel through which hydrophilic substrates translocate. As mentioned previously, h4h5 is, like H2/3, hypervariable among YidC homologs, forming a peripheral helix in archaea and eukarya and a transmembrane hairpin in bacteria. If h4h5 were not altered in proto-SecY, its dimerisation would place h4h5 inside the hydrophilic funnel of the opposite monomer, instead of in contact with the membrane (Figure 6b). Thus, atrophy of h4h5 in SecY, driven by a change in chemical environment, would have opened a membrane-spanning hydrophilic pore and facilitated translocation.

Both proto-SecY and YidC use the distal face of H5 for dimerisation
SecY's channel is formed from similar hydrophilic grooves buried on each side of the membrane (Figure 4b), suggesting that proto-SecY functioned as an antiparallel homodimer. This is consistent with how more recent antiparallel fusions are inferred to have evolved, via trajectories which consistently feature antiparallel homodimerisation as an intermediate step (Lolkema et al., 2008; Rapp et al., 2006). Subsequent gene duplication and fusion yields a pseudosymmetric protein, in which each domain can specialise for a single orientation.
Antiparallel homodimerisation requires that the protomer possess two characteristics: a tendency to be produced in opposite topologies, and an interface suitable for dimerisation. Although dual topology is not evident in the YidC proteins which have been assayed, distant ancestors could easily have had this property with relatively few changes. A variety of studies have shown that making only a few changes to charged amino acids flanking the first TMH of an IMP can influence its topology, and that an inverted first TMH can invert an entire IMP containing several TMHs (Beltzer et al., 1991; Brown et al., 2018; Rapp et al., 2006, 2007). Such changes in topology occur naturally in protein evolution (Rapp et al., 2006; Sääf et al., 1999), and YidC is not known to contain any conserved charges in its soluble segments that would impede this evolutionary process.
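The sensitivity of topology to flanking charges can be illustrated with a toy calculation. The sketch below is not a method used in this work: it applies the well-known positive-inside rule (flanks enriched in Lys and Arg tend to remain cytosolic) in its crudest form, and the function names, the 15-residue window, and the example sequences are all hypothetical.

```python
# Toy illustration of how a few charge changes can flip a predicted topology.
# Assumptions: only Lys/Arg are counted, and the flank window is arbitrary.
POSITIVE = set("KR")


def positive_charge(segment: str) -> int:
    """Count Lys and Arg residues in a sequence segment."""
    return sum(residue in POSITIVE for residue in segment.upper())


def predicted_n_terminus(sequence: str, tmh_start: int, tmh_end: int,
                         flank: int = 15) -> str:
    """Guess where the N-terminus sits relative to the membrane.

    tmh_start and tmh_end are 0-based, half-open indices of the first TMH.
    Returns 'in' (cytosolic), 'out' (extracytoplasmic), or 'ambiguous'.
    """
    n_flank = sequence[max(0, tmh_start - flank):tmh_start]
    c_flank = sequence[tmh_end:tmh_end + flank]
    n_pos = positive_charge(n_flank)
    c_pos = positive_charge(c_flank)
    if n_pos > c_pos:
        return "in"
    if c_pos > n_pos:
        return "out"
    return "ambiguous"


if __name__ == "__main__":
    tmh = "L" * 20                      # idealised hydrophobic helix
    original = "MKRK" + tmh + "DESG"    # positive charges before the TMH
    swapped = "MDED" + tmh + "KRKG"     # same length, charges moved after it
    print(predicted_n_terminus(original, 4, 24))
    print(predicted_n_terminus(swapped, 4, 24))
```

In this idealised case, relocating three positive charges from one flank to the other reverses the predicted orientation of the first TMH, which is the kind of small change the cited studies describe.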
More important is the second required characteristic: possession of a proto-SecY-like interface suitable for dimerisation. Because SecY N.H2/3 and C.H2/3 separate during gating, the major interaction between the N and C domains occurs on the opposite side of the channel, via N.H5 and C.H5 (Figure 6b). The face of H5 used for this interaction is the side furthest from the rest of the H1/4/5 bundle, and so we will refer to it as the distal face of H5. From this feature of the SecY structure, we infer that proto-SecY formed antiparallel homodimers via the distal face of H5.
The evolutionary relationship between proto-SecY and YidC suggests that some extant YidC proteins may retain a tendency to form protein-protein interactions via the distal face of H5. This surface forms an intramolecular interaction with the h4h5 hairpin in bacterial YidC, but remains exposed in archaeal and eukaryotic YidC (Figure 3). There are no data available about YidC biochemistry in archaeal cells, but at least two eukaryotic YidC proteins, EMC3 and GET1, form functionally important complexes, and structural models show that they use the distal face of H5 to do so (Figure 3-Figure supplement 2; Pleiner et al. 2020; Bai et al. 2020; O'Donnell et al. 2020; Miller-Vedam et al. 2020; McDowell et al. 2020). These interactions via H5 are heterodimeric, rather than homodimeric, but nonetheless demonstrate that EMC3 and GET1 can dimerise (with EMC6 and GET2, respectively) along the same interface as proto-SecY without impeding their translocation activities.
[...] descended from the cenancestor via archaea. To test this prediction, we queried nine diverse archaeal proteomes for homologs of H. sapiens EMC6 or GET2 using HHpred (Zimmermann et al. 2018). Although none displayed significant similarity with GET2, every proteome queried contained exactly one protein similar to EMC6 (Figure 7b). Among these archaeal proteins, those most similar to eukaryotic EMC6 tend to come from the species most closely related to Eukarya: the Asgard archaean, then the TACK archaean, and then the euryarchaeans. This phylogenetic concordance indicates that the archaeal proteins are homologs of the eukaryotic protein, and that their ubiquity is due to an ancient origin.

Dimerisation via YidC H5 is ancient and widely conserved
Reciprocal queries of H. sapiens and S. cerevisiae proteomes with the putative Asgard EMC6 protein (Lokiarch_50810) identified EMC6 in both cases as high-confidence hits. Unexpectedly, the H. sapiens search also identified a second equally confident hit, C20orf24 (Figure 7b). These proteins all share a highly similar core structure of three TMHs. Although GET2 lacks significant sequence similarity with these EMC6 family proteins, its structural similarity with EMC6 was immediately recognised (McDowell et al., 2020; Pleiner et al., 2020). Our identification of an ancient EMC6 family reveals a plausible origin for GET2. Consistent with this, although our GET2 query of the lokiarchaean proteome did not identify any high-similarity proteins, the most similar was indeed EMC6 (Lokiarch_50810). This similarity was not detected in the archaeal proteomes more distant from Eukarya. Moreover, the aligned columns between GET2 and Lokiarch_50810 correspond exactly to their structurally similar transmembrane domains. The single large gap in this alignment corresponds to the cytosolic extension of GET2 TM3, which brings it into contact with GET3 (Figure 7c). Thus, the major difference between GET2 and EMC6 can be explained as a functional adaptation for GET3 recognition, not unlike GET1's elongation of H2/3. We therefore propose that GET2 is a member of the EMC6 superfamily.
The absence of a similar heterodimer in bacteria suggests that EMC6 was acquired in Archaea after divergence from Bacteria, which instead acquired the H5-occluding transmembrane hairpin in h4h5 (Figure 3). An archaeal origin for EMC6 is consistent with its genomic location, which is distant from the widely conserved cluster of cenancestral ribosomal genes, SecY and YidC (Makarova et al., 2015). In the putative period of YidC evolution prior to EMC6, H5 would not have been occupied by heterodimerisation with EMC6, and instead could have mediated homodimerisation, as in proto-SecY. YidC's universal tendency to occlude the distal face of H5 supports this possibility.
Insight into this question of in vivo sufficiency can be obtained by inspection of the only clade known to have survived SecY deletion: the mitochondrial symbionts. SecY has been lost from all mitochondrial genomes for which sequences are available, except in one group of eukaryotes, and it has not been observed to relocate to the nuclear genome (Janouškovec et al., 2017). The exceptional group is the jakobids, only a subset of which retain SecY. The incomplete presence of SecY in this group implies that it was lost multiple times from the jakobids and their sister groups. SecY deletion is therefore a general tendency of mitochondria, rather than a single deleterious accident.

Reductive selection in symbionts demonstrates the functional range of YidC
Mitochondria retain two YidC homologues, Oxa1 and Oxa2 (Cox18), the genes for which relocated from the mitochondrial genome to the nuclear genome (Bauer et al., 1994; Bonnefoy et al., 1994). As nuclear-encoded mitochondrial proteins, they are translated by cytosolic ribosomes and then imported into mitochondria via channels in the inner and outer mitochondrial membranes (Wiedemann and Pfanner, 2017). These channels are essential for the import of nuclear-encoded proteins, but they are not known to function in the integration of mitochondrially encoded IMPs (meIMPs), which instead requires export from the matrix, where they are synthesized by mitochondrial ribosomes. This export is generally Oxa1-dependent (Hell et al., 2001).
The meIMPs have diverse properties, including 1 to 19 TMHs and exported parts of various sizes and charges (Figure 8a-c). Oxa1's sufficiency for their biogenesis in vivo is consistent with in vitro results showing that E. coli YidC is sufficient for the biogenesis of certain 6- and 12-TMH model substrates (Serdiuk et al., 2019;Welte et al., 2011). The only apparent constraint on the meIMPs is that they tend to have short (~15 a.a.) soluble segments. This is consistent with observations from E. coli that fusing long soluble segments to a YidC-dependent IMP can induce SecY dependence (Andersson and von Heijne, 1993;Kuhn, 1988;Shanmugam et al., 2019). Among the meIMPs, Cox2 is an exception which proves the rule, because Oxa1 cannot efficiently translocate its exceptionally long (~140 a.a.) C-terminal tail; instead it is translocated by Oxa2 in cooperation with two accessory proteins (Saracco and Fox, 2002).
This constraint is less consequential than it may at first appear, because prokaryotic IMPs in general tend to have only short soluble segments (Figure 8c; Wallin and von Heijne, 1998). Thus most prokaryotic IMPs may be amenable to SecY-independent, YidC-dependent biogenesis. Consistent with this, in E. coli, the signal recognition particle (SRP) has been found to promiscuously target nascent IMPs to either SecY or YidC (Welte et al., 2011), and YidC is present at a concentration 1-2x that of SecY (Schmidt et al., 2016). By contrast, IMPs with large translocated domains became much more common in eukaryotes (Wallin and von Heijne, 1998) concomitant with YidC's divergence into three niche paralogs, none of which are essential at the single-cell level (Guna et al., 2018;Jonikas et al., 2009;McGilvray et al., 2020).
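The soluble-segment constraint discussed above can be illustrated with a short sketch. Assuming a topology supplied as a list of TMH residue ranges (for instance, parsed from a TMHMM prediction), the code below lists the lengths of the soluble segments flanking and between the TMHs, and flags those longer than a cutoff. The function names, the example coordinates, and the 50-residue cutoff are hypothetical and chosen only for illustration; they are not taken from this study.

```python
from typing import List, Tuple


def soluble_segment_lengths(seq_len: int, tmhs: List[Tuple[int, int]]) -> List[int]:
    """Return the lengths of the soluble segments flanking and between TMHs.

    tmhs holds 1-based, inclusive (start, end) residue ranges, sorted and
    non-overlapping. The result has len(tmhs) + 1 entries: the N-terminal
    tail, each inter-TMH loop, and the C-terminal tail.
    """
    lengths = []
    prev_end = 0
    for start, end in tmhs:
        if start <= prev_end or end < start or end > seq_len:
            raise ValueError("TMH ranges must be sorted, non-overlapping, "
                             "and within the sequence")
        lengths.append(start - prev_end - 1)  # residues between prev_end and start
        prev_end = end
    lengths.append(seq_len - prev_end)  # C-terminal tail
    return lengths


def long_segments(seq_len: int, tmhs: List[Tuple[int, int]], cutoff: int = 50) -> List[int]:
    """Return the soluble segment lengths exceeding the (hypothetical) cutoff."""
    return [n for n in soluble_segment_lengths(seq_len, tmhs) if n > cutoff]


# Hypothetical 2-TMH protein of 230 residues with a long C-terminal tail.
print(soluble_segment_lengths(230, [(20, 40), (60, 80)]))  # [19, 19, 150]
print(long_segments(230, [(20, 40), (60, 80)]))            # [150]
```

Applied across a proteome, a tally of this kind would show what fraction of IMPs carry any soluble segment above a chosen length, which is the quantity that bears on amenability to SecY-independent biogenesis.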
Even without extrapolating from the meIMPs to other similar IMPs, it is clear that chemiosmotic complexes are amenable to YidC-dependent, SecY-independent biogenesis (Figure 8d). These complexes couple chemical reactions to the transfer of ions across the membrane, and are sufficient for the membrane's core bioenergetic function.
Although the complexes shown participate in aerobic metabolism, which presumably post-dates cyanobacteria's photosynthetic oxygenation of Earth's atmosphere, they have homologs which enable chemiosmosis in anaerobes. In particular, chemiosmosis in methanogenic and acetogenic archaea employs the rotor-stator ATPase, Mrp antiporters, and an energy-converting hydrogenase (Ech; Lane and Martin, 2012), all of which have homologs of their IMP subunits among the meIMPs (Figure 8d). These archaeans' metabolism has been proposed to resemble primordial anaerobic metabolism at alkaline hydrothermal vents (Weiss et al., 2016).
Thus if YidC had preceded SecY, it would have been sufficient for the biogenesis of diverse and important IMPs, but likely not the translocation of large soluble domains. This is supported by the results of reductive selection in chloroplasts, which retain both SecY (cpSecY) and YidC (Alb3) (Xu et al., 2020). cpSecY imports soluble proteins across the chloroplast's third, innermost membrane, the thylakoid membrane (Peltier et al., 2002). This thylakoid membrane was originally part of the chloroplast inner membrane (equivalent to the bacterial plasma membrane), much like the mitochondrial cristae, but subsequently detached and now forms a separate compartment (Vothknecht and Westhoff, 2001). Because the thylakoid membrane is derived from the plasma membrane, import across the thylakoid membrane is homologous to secretion across the plasma membrane. Thus when symbiosis removed the need for secretion, SecY was eliminated from mitochondria, whereas it was retained in chloroplasts for an internal function homologous to secretion.
A primordial YidC-dependent cell may simply not have secreted protein, or may instead have used a different secretion system. Notably one primordial protein secretion system has been proposed: a protein translocase homologous to the rotor-stator ATPases (Mulkidjanian et al., 2007). This putative primordial protein translocase used its ATPase domain to unfold and feed substrates through the homo-oligomeric channel formed by F0c, now occupied by the central stalk (Figure 8d). The strict YidC-dependence of F0c biogenesis in E. coli (Yi et al., 2003) hints that they shared a primordial era of co-evolution, as a laterally closed channel for the secretion of soluble proteins (F0c) and a laterally open channel for the integration of membrane proteins (YidC), including F0c itself. The subsequent advent of a laterally gated channel, SecY, would have enabled the biogenesis of a hybrid class of proteins: IMPs with large translocated domains.

Discussion
Analysis of the SecY structure in the context of the principles of membrane protein folding and evolution led us to re-frame its architecture. In addition to its long-recognised pseudosymmetry, we identified within each half a three-helix bundle abutting a two-helix transmembrane hairpin. The frequent acquisition of transmembrane hairpins during membrane protein evolution argued that the three-helix bundle was its ancestral core. This core element seeded our search for a SecY precursor among cenancestor proteins, leading us to the YidC family. Although the structural similarity of the three-TMH cores of YidC and SecY is striking on its own, the overlaying of key functions onto each structural element strongly reinforced the hypothesis that they are evolutionarily related. Reasoning that YidC may conserve a tendency to dimerise via the same interface as the SecY precursor, we identified a ubiquitous and ancient family of YidC-interacting proteins homologous to EMC6. The surprising conclusion of our study is that an ancestral YidC could have both preceded and evolved into a proto-SecY whose gene duplication and fusion originated the present-day SecY family.

Evaluation of the homology hypothesis
Inferences about early evolution are necessarily made on the principle of maximum parsimony, i.e. by preferring the hypothesis which explains the most evidence while invoking the fewest ad hoc assumptions (Koonin, 2003). By that standard, we weigh the hypothesis that SecY and YidC are homologues against the null hypothesis that they are unrelated. This null hypothesis holds that the similarities shown here all arose by convergent evolution or random chance.
The only positive evidence for the null hypothesis is the presence of certain structural differences between SecY and YidC. We showed, however, that these structural differences are concentrated in the hypervariable H2/3 and h4h5 regions, and can be explained as adaptations which created a bilayer-spanning pore and lateral gate. Moreover the major difference between SecY and YidC, transmembrane insertion of the H2/3 hairpin, is no greater than the difference between archaeal YidC and bacterial YidC (acquisition and transmembrane insertion of the bacterial h4h5 hairpin). The positive evidence for SecY-YidC homology, by contrast, includes the key similarity in both structure and function of their hydrophilic channels. Homology is further supported by the fact that YidC ubiquitously forms proto-SecY-like interactions via the distal face of H5.
Convergence alone cannot explain the structural similarity between SecY and YidC, because it is not a necessary consequence of their functional similarity. This is evident in the fact that there are other laterally open protein channels, such as the symbiont import channels and the ERAD channel, which have not converged on the YidC fold (Figure 9). Furthermore there are structural similarities which have no essential function in SecY, namely in the peripheral helix H0, and thus admit no convergent explanation.
The null hypothesis therefore requires that we invoke random chance to explain the similarity between SecY and YidC. But chance appears unlikely to create two protein channels this similar, because all other laterally open helical protein channel families of known structure are dissimilar to each other (Figure 9). This inference is not sensitive to the set of transporters considered; extending it to include the other laterally open channels such as lipid scramblases (Brunner et al., 2014) and β-barrel assembly factors (Bakelar et al., 2016), or laterally open active transporters such as the P5A-ATPase (McKenna et al., 2020), would still yield no recurrences. Moreover, it is especially unlikely that the YidC fold evolved twice just in the limited time prior to the cenancestor, and then seemingly never again. Weighed against a null hypothesis that is supported by so little evidence and requires these several ad hoc assumptions, we conclude that the homology hypothesis is the most parsimonious.

Implications for the evolution of protein transport
Besides illuminating SecY's origins, identifying YidC as its progenitor also implies that YidC is the oldest known protein channel. This has implications for the evolution of IMPs generally, including other components of the general secretory pathway and YidC itself. The following is a stepwise model for the evolution of YidC and proto-SecY from a spontaneously inserting precursor (Figure 10a), which we propose parsimoniously explains the available data.
Figure 10. Model for the evolution of the SecY/YidC superfamily from a spontaneously-inserting precursor. a Intermediate stages. The spaces on either side of the membrane are marked 'in' and 'out,' but the membrane may not have been continuous enough to make this distinction meaningful at some of the early stages. Charged side chains and termini are indicated at stage 1 by grey symbols. At top, additional components of the secretory pathway label the stage at which they arise: the signal recognition particle (SRP), the SRP receptor (SR), signal peptidase, signal peptides, and SecE. b Archaeal YidC and EMC6, represented schematically at left and by structural models at right (M. jannaschii Ylp1 and Mj0606, as in Figure 7c). Archaeal YidC is connected via an arrow to either stage 4 or 5 of panel a. c Bacterial YidC (B. halodurans YidC2, PDB 3wo6). d SecY (G. thermodenitrificans SecY, 6itc).
Step 1. The precursor to YidC was a membrane-anchored ribosome receptor. This simple function can be achieved with just two low-complexity domains: a hydrophobic anchor and a polybasic extension. Despite its simplicity, this receptor would function to reduce the deleterious aggregation of hydrophobic domains in the aqueous phase by creating a population of membrane-bound ribosomes, from which any nascent IMPs would be more likely to encounter the membrane. Similar polybasic C-terminal tails are known to occur in YidC and can compensate for deletion of SRP or SRP receptor (SR) (Seitl et al., 2014;Szyrach et al., 2003). We assume that the initial anchor was a peripheral helix because a primitive membrane protein derived from a soluble protein or quasi-random sequence would initially lack the hydrophobicity to spontaneously insert (Mulkidjanian et al., 2009).
Step 2. The peripheral helix acquires a transmembrane hairpin, thereby integrating into the membrane. A hairpin is a likelier anchor than a single TMH, because in the absence of any protein transporters insertion would need to proceed spontaneously, and hairpins insert more efficiently than single TMHs (Engelman and Steitz, 1981). We assume that this insertion preceded SRP/SR-dependent targeting because it is simpler than SRP/SR, and at least as simple as any SRP/SR substrate. The proximity of this hairpin to nascent IMPs imposes a selective pressure on the hairpin to evolve features that facilitate IMP integration and folding, such as membrane-buried hydrophilic residues.
Step 3. Acquisition of a second transmembrane hairpin produces a 4-TMH protein containing the conserved three-helix bundle and hydrophilic groove. The loop between the first and second transmembrane hairpins becomes the cytosolic hairpin H2/3.
Step 4. The hydrophilic groove allows hydrophilic termini to efficiently translocate, including the N-terminus of YidC itself, which acquires a new position as the extracytoplasmic peripheral helix H0. Thus the full YidC fold is now acquired, as it is found in archaea. By this time, SRP/SR have evolved, and H2/3 evolves interactions with SR that will be retained in both SecY and YidC (Petriman et al., 2018).
Step 5. A subpopulation of YidC integrates with an inverted topology and forms antiparallel homodimers. We assume that antiparallel dimerisation precedes duplication because this is common in the evolution of antiparallel fusions (Lolkema et al., 2008;Rapp et al., 2006), and because both domains of SecY conserve the transmembrane insertion of H2/3, which appears to be an adaptation to antiparallel dimerisation. We assume that reorientation did not occur while the C-terminal tail was long and positively charged, because this tail would need to be translocated in the inverted orientation. In the presence of SRP/SR, a ribosome-binding tail is redundant and can be eliminated, facilitating reorientation.
Antiparallel homodimerisation positions hydrophilic grooves on both sides of the membrane, leaving at most a thin hydrophobic layer between them, as in the heterodimeric ERAD channel. This facilitates the translocation of IMPs with large soluble domains, including signal peptidase. In the presence of signal peptidase, signal-dependent secretion becomes possible, with the first cleavable signal peptides being the TMHs of IMPs which had previously anchored their now-secreted extracytoplasmic domains. Signal peptides' origin as TMHs explains why both engage SecY in a similar way.
At this stage or later, SecE is acquired and binds symmetrically to each half of the antiparallel homodimer. Its association would stabilise the dimer, particularly when the two halves separate to accommodate substrates. In SecY, SecE remains the axis about which the N-domain pivots during engagement (Voorhees and Hegde, 2016). Evolution of SecE after YidC but before proto-SecY would explain why SecE integration is YidC-dependent (Yi et al., 2003).
Step 6. Transmembrane insertion of H2/3 creates a lateral gate. By inserting between the hydrophilic grooves and the membrane, H2/3 makes those grooves even deeper and more hydrophilic, tantamount to a pore when the lateral gate is closed. This more hydrophilic environment further facilitates translocation. As a secondary consequence, it also creates a more hydrophilic environment for signal recognition. This allows cleavable signal peptides to become less hydrophobic than TMHs, and more easily distinguished by signal peptidase.
Duplication and fusion of the proto-SecY gene would allow each half of this initially symmetric protein to specialise for cytoplasmic and extracytoplasmic functions. For example, the C.h1h2 and C.h4h5 loops would continue to bind ribosomes, whereas these same loops in the N-domain atrophy. One such loop was repurposed as the plug domain.
At step 4 or 5, YidC diverges from SecY (Figure 10). It is not determinable a priori whether SecY and YidC diverged as orthologs or paralogs, since the cenancestor may have been capable of lateral gene transfer (Fournier et al., 2015). However, the essentiality of YidC suggests that the mutations required to generate proto-SecY would have been best tolerated if they occurred in a paralog, alongside YidC. Paralogous origin in a tandem duplication event is consistent with the commonly observed juxtaposition of YidC and SecY in prokaryotic genomes (Makarova et al., 2015). Once diverged from SecY, YidC in archaea evolves to heterodimerise via the distal face of H5 while this same surface in bacteria is covered by the h4h5 transmembrane hairpin.

Outlook
The novel EMC6 superfamily members identified here are intriguing subjects for further study. Details of their origin and distribution, and the context and function of their interactions with archaeal and eukaryotic YidC, remain to be determined. Bacterial YidC, despite being the most-studied YidC clade, also remains uncharacterised in certain important ways. Our analysis highlighted the fact that much of the prokaryotic membrane proteome has characteristics which suggest amenability to either SecY- or YidC-dependent integration, but it is not known how frequently IMPs use each pathway in vivo, and very few substrates have been characterised in vitro.
This analysis of distant evolutionary relationships was enabled by recent advances in structural techniques, both cryo-EM and structure prediction, which have generated a wealth of structural data about membrane proteins and complexes. The rapidly increasing availability of structures from diverse homologs will similarly facilitate the discernment of conserved structural features in other superfamilies. It may even yield new insight into other channels' origins, which have previously proved to be largely undetectable from sequence data (Hennerdal et al., 2010).
It is plausible, however, that the detectability of SecY's origins will prove to be unusual among fused channels, and is a result of the unusual properties of protein as a transport substrate. Unlike most substrates, protein can be sufficiently amphipathic to assist in its own translocation, making an incompletely penetrant channel such as YidC functionally sufficient. Moreover YidC is thought to serve a second function as a chaperone for IMP folding, which makes it non-redundant to SecY. The same hydrophilic groove used for transport is thought to mediate this chaperone function (Kumazaki et al., 2014b;Nagamori et al., 2004;Serdiuk et al., 2016). Other pre-fusion channel precursors may contain similar grooves for transport, but this secondary chaperone function is unique to protein substrates, because other substrates do not fold. Thus pre-fusion precursors to other channels may not be so well conserved.
Although theories about early evolutionary transitions are not experimentally testable, experimental reconstructions can at least demonstrate their plausibility. Toward that end, one could seek to engineer a pseudosymmetric channel from YidC. But it is uncontroversial that pseudosymmetric channels are formed from proteins like YidC, even if YidC duplication did not form SecY in particular. More intriguing is the implication that YidC supported the evolution of protocell membranes. Efforts to reconstruct protocells could capitalise on the synergy detailed above between YidC and the putative primordial protein-secreting translocase (Mulkidjanian et al., 2007). This protein translocase is itself thought to have descended from an RNA translocase, in part because its ATPase domain derives from an RNA helicase. By facilitating the integration of such an RNA translocase, YidC could indirectly facilitate the exchange of genetic information among a community of protocells, and thereby accelerate their evolution.

Methods
Structural models were predicted from amino acid sequences using the trRosetta algorithm (Yang et al., 2020). End-to-end pipelines which automate the intermediate steps of multiple-sequence alignment generation and homology template selection were used, which reduces user input to only the single protein sequence of interest. The Baker lab's web server (for Ylp1) and the Yang lab's web server (for the EMC6 family) were used.
Heterodimeric contacts were predicted from amino acid sequences using the RaptorX ComplexContact algorithm (Zeng et al., 2018) as provided by the Xu group's web server. The multiple-sequence alignments generated by RaptorX ComplexContact for Mj0606 and C20orf24 were reviewed to ensure that they did not include any proteins annotated as EMC6.
All models were aligned and rendered in UCSF ChimeraX (Pettersen et al., 2020). Surface hydrophobicity was computed in ChimeraX by its default method: pyMLP (Broto et al., 1984;Laguerre et al., 1997) with Fauchere propagation and lipophilicity values from Ghose et al. (1998). Models depicted relative to a membrane are positioned and oriented according to the prediction algorithm provided by the Orientations of Proteins in Membranes server (Lomize et al., 2012).
Comparisons of profile hidden Markov models were performed using HHpred (Zimmermann et al., 2018).
Hydropathy and charge were computed from protein sequences using EMBOSS Pepinfo (Madeira et al., 2019), topology was predicted using TMHMM (Krogh et al., 2001), and the results were plotted in Veusz. The 2-D histogram of IMP length vs TMH count was likewise plotted in Veusz.
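For readers who wish to reproduce a comparable hydropathy and charge profile without the web tools, the following is a minimal sketch of a sliding-window calculation. It assumes the Kyte-Doolittle hydropathy scale and a 9-residue window, and it counts charge crudely as +1 for Lys/Arg and -1 for Asp/Glu; these settings, the function names, and the example sequence are assumptions made for illustration and may differ from those used by EMBOSS Pepinfo in this study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_hydropathy(seq, window=9):
    """Return the mean hydropathy of every full window along seq."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KD[aa] for aa in seq]  # raises KeyError on non-standard residues
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def net_charge(seq):
    """Crude net charge: +1 per K/R, -1 per D/E; His and termini are ignored."""
    seq = seq.upper()
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


# Hypothetical sequence: a basic N-terminus, a leucine stretch, an acidic tail.
demo = "MKK" + "L" * 11 + "DE"
profile = sliding_hydropathy(demo, window=9)
print(len(profile), round(max(profile), 2), net_charge(demo))
```

Windows centred on the leucine stretch give the highest mean hydropathy, which is the signature that topology predictors interpret as a candidate TMH.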