Phage-shock-protein ( Psp ) Envelope Stress Response : Evolutionary History & Discovery of Novel Players

The phage shock protein (Psp) stress-response system protects bacteria from envelope stress and stabilizes the cell membrane. Recent work from our group suggests that the psp systems have evolved independently in distinct Gram-positive and Gram-negative bacterial clades to effect similar stress response functions. Despite the prevalence of the key effector, PspA, and the functional Psp system, the various genomic contexts of Psp proteins, as well as their evolution across the kingdoms of life, have not yet been characterized. We have developed a computational pipeline for comparative genomics and protein sequence-structure-function analyses to identify sequence homologs, phyletic patterns, domain architectures, gene neighborhoods, and evolution of the candidates across the tree of life. This integrative pipeline enabled us to make several new discoveries, including the truly universal nature of PspA and the ancestry of the PspA/Snf7 dating all the way back to the Last Universal Common Ancestor. Using contextual information from conserved gene neighborhoods and their domain architectures, we delineated the phyletic patterns of all the Psp members. Next, we systematically identified all possible ‘flavors’ and genomic neighborhoods of the Psp systems. Finally, we traced their evolution, leading us to several interesting findings of their occurrence and co-migration that together point to the functions and roles of Psp in stress-response systems that are often lineage-specific. Conservation of the Psp systems across bacterial phyla emphasizes the established importance of this stress response system in prokaryotes, while the modularity in various lineages is indicative of adaptation to bacteria-specific cell-envelope structures, lifestyles, and adaptation strategies. We also developed an interactive web application that hosts all the data and results in this study that researchers can explore ( https://jravilab.shinyapps.io/psp-evolution ).


Introduction
Membranes are complex dynamical structures enclosing all living cells. They typically comprise lipid bilayers and other non-bilayer lipids as well as proteins [1] . The membrane and its associated elements partake in a variety of critical dynamic processes such as membrane biogenesis, cell division, maintenance of cell shape, transport of small molecules, cell-cell and cell-environment signaling, maintenance of proton motive force (PMF), and other structural processes involving the cytoskeletal proteins [2] . Even though the precise composition and structure of the membranes and associated lipids and proteins could vary widely, most of the membrane functions enumerated above are conserved across the tree of life [1] . Cellular membranes are constantly adapting and surviving in their ever-changing environments, especially when subject to various kinds of surface stress. When the membrane integrity is compromised due to damage, a specialized suite of lipids and proteins participate in sensing and responding to the stress [3,4] , instigating a chain of events to restore membrane function. Extra-cytoplasmic envelope stress responses are common virulence mechanisms employed by bacterial pathogens [5][6][7] . Similarly, antibiotics, antimicrobial peptides (natural or synthetic) could trigger the cell envelope stress responses [8,9] . Some of the pathways involved in the restoration of biophysical properties could be conserved. Others, such as the envelope stress response machinery maintaining membrane homeostasis, could comprise both conserved and lineage-specific molecular systems. Achieving similar solutions to broad-spectrum problems such as environmental stress via phenotypic convergence is a common occurrence, even if it's not always paired with genotypic convergence [10] . Therefore, to fully understand the molecular basis of membrane function and restoration, it is essential to map out how these systems have evolved alongside each other on the evolutionary timescale, an aspect that remains unclear.
Multiple protein systems are intrinsically tied to membrane dynamics, remodeling, and stabilization, such as ESCRT systems. One such respondent is Vipp1, a predominantly plant protein with a cyanobacterial origin [11][12][13][14] . This peripheral membrane protein is involved in the vesicular transport of lipids and thylakoid biogenesis [12,13,15,16] . This inner membrane protein bears similarity to other bacterial and eukaryotic protein families involved in secretory pathways, protein sorting and transport, and stabilization of bent membranes. Either directly via membrane biogenesis and stabilization, or indirectly via regulatory systems, Vipp1 is known to preserve membrane function under stress. Phylogenetic analyses have revealed that Vipp1 has homology to PspA [11,14,17] , a bacterial peripheral-membrane protein with known association to stress response functions. Since the membranes of unicellular organisms interact directly with the external environment and stimuli, sensing of these stress cues and repair/maintenance of membrane integrity becomes paramount. In line with this argument, it has been reported that the primary role of PspA, similar to Vipp1, is the preservation of envelope function in response to a wide range of surface stresses via membrane fusion [15] and prevention of leakage of protons through the membrane [2,18] . These perturbations include changes in membrane potential, ∆pH changes [2,18] , and PMF [19][20][21][22] , (l)antibiotic stress [9] , heat and osmotic shock [23] , and mechanical stress via stored curvature elastic stress [24][25][26][27] , among others [23] .
In this paper, we have taken a generalized and global approach while remaining unbiased to 'the known' to delineate all possible occurrences of PspA and its cognate partners, their associated genomic contexts, and phyletic spreads across the bacterial and archaeal kingdoms. These molecular characterization steps would help ascertain the evolutionary and functional significance by answering the following questions: i) Which are the most distinct and commonly occurring themes in terms of a) domain architectures and b) conserved gene neighborhoods that house the Psp proteins? ii) Have the Psp proteins evolved to function in other (i.e., not one of the known three) genomic contexts? iii) Are there common patterns/themes that emerge from the functional characterization of these Psp systems, and what do we understand about their lineage-specificity? vi) Finally, is the migration and, thereby, the functional implications of these different Psp members more independent than originally anticipated? To address the conservation of PspA and its potential partners, we conducted an evolutionary study across all the major kingdoms of life [64,65] by querying for Vipp1, PspA, and its many distinct paralogs (across bacterial, plant, and other archaeal and eukaryotic clades [11,12,14,31] ), and other PspA-like proteins that function to preserve membranes across the tree of life. We subsequently analyze the different variants of PspA, and PspA-associated proteins (PspBC, PspMN, LiaIGF), based on sequence-structure relationships overlaid on phyletic contexts. The resulting findings help discern the truly universal nature of this envelope stress-response system in an evolutionary light.

Results and Discussion
To understand the evolution of the Psp system and its ability to function across phyla, we needed to reconcile two seemingly disparate facts: the universality of PspA and the distinct genomic neighborhoods containing PspA [34,38,61] in each of the distinct bacterial, archaeal, and eukaryotic lineages. The first step towards this goal was to identify and characterize all close and remote homologs of PspA and cognate partner domains across the tree of life.
To locate and stratify all Psp homologs, we started with the previously characterized Psp proteins from the best-studied operons pspF||ABCDE-G (from E. coli ), liaIHGFSR (from B. subtilis ), and clgRpspAMN (from M. tuberculosis ), and began our analyses by exploring their phyletic spreads [ Fig.  1 ]. To ensure an exhaustive search and identification of all related Psp proteins including remote homologs, we first resolved these Psp proteins into their constituent domains and used each of the individual domains as queries for all our phylogenetic analyses. This approach allows us to find homologies beyond those pertaining to full-length proteins only. Homology searches for each of the constituent domains were performed across the tree of life (all 6500+ completed genomes; see Methods). We used a combination of protein domain and orthology databases, iterative homology searches, and multiple sequence alignment for the detection of domains, signal peptides, transmembrane regions to help construct the domain architecture of each of the query Psp proteins [See Methods]. Due to the prevalence of transcription factor domains and two-component systems in bacterial phyla (PspF and ClgR with the HTH domains, NtrC-AAA-FIS-HTH, XRE_HTH; LiaRS with HAMP-HISKIN, REC-LYTTR domains Fig. 1A ), we did not perform dedicated searches for these proteins/domains; instead, we noted their occurrence in the genomic neighborhood of core Psp genes. Where required, as in the case of the Histidine Kinase, we identified the closest orthologs to study the full extent of the diversity.
The section below describes the delineation of the core Psp proteins in proteobacteria, actinobacteria, and firmicutes, in terms of their underlying domain architectures, and their phyletic spreads [ Fig. 1 ]. Then, we outlined the conservation and evolution of the central protein, PspA, the systematic identification of PspA-containing species (spanning all the completed genomes across the bacterial, archaeal, and eukaryotic kingdoms) [ Fig. 2 ], followed by the molecular characterization of the homologs by determining their domain architecture, genomic contexts, and phyletic spreads [ Fig. 3 ]. Finally, we report the findings from our in-depth analysis of domain architectures [ Fig. 4 ], genomic contexts [ Fig. 5 ], and phyletic spreads to every one of the Psp cognate partners to identify novel components (domains and genomic neighborhoods) of the phage shock protein stress response system across the tree of life [ Fig. 4, 5 ]. All these new findings are reconciled in the form of a proximity network of domains with respect to PspA along with their phyletic spreads and frequencies [ Fig. 6 ]. All our findings (data, results, and visual summaries) are available for in-depth exploration in an interactive and queryable web application https://jravilab.shinyapps.io/psp-evolution .

The known components of Psp and their domain architectures
The Psp system in proteobacteria We started with the Psp operon that was identified in E. coli , and subsequently, in other proteobacteria [14,28,29,[66][67][68] . We resolved the core members of the operon, PspF||ABC, into their domain architectures. PspA, PspB, and PspC each comprising their namesake domains, PspA_IM30 ( IM30 standing for inner membrane 30KDa domain, henceforth called PspA), PspB, and PspC, respectively, and each domain spanning almost the entire protein length [ Fig. 1A ; Table 1 ]. In line with previous studies [46] , we identified that PspF, the transcriptional regulator of the Psp operon in E. coli , contains an enhancer-binding NtrC-AAA (AAA-ATPase) domain in addition to the FIS-HTH (helix-turn-helix) DNA-binding domain [ Fig. 1B ].

The Psp system in actinobacteria
The protein central to the identification of the Psp operon in M. tuberculosis (actinobacteria) is PspA (encoded by rv2744c ). We confirmed that PspA contains the PspA domain, albeit with <30% similarity to the E. coli homolog. However, in contrast to a PspF-like transcriptional regulator in E. coli , we observed that the M. tuberculosis operon comprises an XRE-type HTH transcription factor family member, ClgR, that is unique to actinobacteria [  Table 1 ]. Despite its requirement for the full activity of the Psp system [37] , the molecular nature of the Rv2742c protein remains to be characterized. These findings are consistent with previous studies [37,38,61] . To remain consistent with the nomenclature of Psp cognate proteins, we rename Rv2743c (membrane protein) and Rv2742c (carrying a C-terminal DUF3046) to 'PspM' and 'PspN', respectively [38] .

The Psp system in firmicutes
Finally, we examined the domain architectures of the well-studied Lia operon in B. subtilis , revealing some interesting features. We initially identified, in line with recent studies [39,51,60] , that: LiaI is a small transmembrane protein; LiaF is a protein with N-terminal transmembrane helices (DUF2157) and a C-terminal globular domain (DUF2154); LiaG contains another domain of unknown function (DUF4097 that is similar to DUF2154) [ Fig. 1B ]. Our analysis additionally showed that the Transmembrane domains containing DUF2157 encompass the LiaI-LiaF-TM domain and those containing the DUF4097/ DUF2154 encompass the Toast_rack domain (see below for details on our newly defined nomenclatures, LiaI-LiaF-TM and Toast_rack) [ Fig. 1B ]. As documented previously [39,43] , we found that, in this firmicute operon, the transcriptional response is brought about by the two-component system, LiaRS, comprising the REC-LYTTR and HAMP-HISKIN domains [ Fig. 1B ].
In summary , the above data show that, in addition to domain architectures previously described, such as PspA, PspB, PspC in proteobacteria, and the transcription regulators, we have identified novel domain definitions, such as LiaI-LiaF-TM and Toast_rack, in firmicutes [ Fig. 1B ]. Having deconvolved the Psp member proteins of the 'known' systems, we next focused on where these proteins are found across the tree of life.

PspA
We started with a detailed analysis of the phyletic spread of PspA, the namesake protein of the system. In line with previous PspA evolutionary studies, we found that PspA, containing the PspA domain (previously PspA_IM30; spanning almost the entire length of the E. coli , B. subtilis , and M. tuberculosis PspA proteins; Table 1 ), occurs in most of the major bacterial and archaeal lineages as one or more copies [ Fig. 1C ]. We observed that, within eukaryota, only the SAR group (the clade that includes stramenopiles, alveolates, and rhizaria) and archaeplastids (including plants and algae) carry PspA homologs [ Fig. 1C ]. We then performed a more detailed analysis of the PspA homologs in plants (Vipp1) using homology searches with multiple distinct and distant PspA proteins were collected and collated. This analysis revealed that the PspA_IM30 (now referred to as PspA) domain is homologous to the eukaryotic Snf7 superfamily [ Fig. 1B ], which is part of the membrane remodeling ESCRT system [69][70][71][72][73][74] . Further, our findings showed that iterative searches with PspA/Vipp1 and Snf7 recover each other, suggesting that they are indeed part of the same superfamily. Iterative searches with the Snf7 domain recovered several non-plant eukaryotes and archaeal homologs [ Fig. 1B ]. Taken together, the PspA and Snf7 superfamilies are widely prevalent, extending to almost every distinct phyletic lineage spanning all three kingdoms [ Fig. 1C , with the PspA branch conserved in Bacteria with transfers to archaea and eukaryotes, and the Snf7 branch conserved in Archaea and Eukaryotes with transfers to bacteria. Taken together, we found that PspA/Snf7 is the only 'universal protein' in the Psp system present in all three kingdoms: bacteria, archaea, and eukaryota [ Fig. 1B, Table S1 ].

LiaI-LiaF-TM and Toast_rack
Next, we tracked the presence of LiaF, LiaG, and LiaI across the tree of life using their whole protein sequences and their constituent domains, including our new domain definitions, LiaI-LiaF-TM and Toast_rack. [ Fig. 1B ]. Specifically, we used PSI-BLAST searches from the three sub-sequences of the whole proteins, N-terminal transmembrane region of LiaF, the C-terminus globular domain (DUF2154) of LiaF, and the globular domain in LiaG (DUF4097), followed by structure-informed sequence alignment ['MSA' tab in the webapp ]. These analyses revealed that LiaI and LiaG bear remarkable similarities to the N-terminal transmembrane and C-terminal globular regions of the LiaF protein, respectively. In fact, we discovered that the globular domains of these LiaG-LiaF proteins are homologs of each other and that the profiles detected by PFAM in this region, DUF2154, DUF4097, and DUF2807 are unified into a single domain that has a single-stranded right-handed beta-helix like structure called Toast_rack (PDB: 4QRK ; Table 1 ; webapp ). We will, therefore, collectively refer to these three domains of unknown function as the ' Toast_rack ' domain [ Fig. 1A ; Table 1 ]. Likewise, the high level of similarity between the 4TM (four transmembrane) regions of LiaI and the N-terminal domain, DUF2157, of LiaF led us to rename the 4TM region (previously referred to as Toast_rack_N, Pfam: PF17115 ) to be ' LiaI-LiaF-TM ' [ Fig. 1B ; Table 1 ]. Toast_rack-containing proteins are panbacterial with transfers to archaea and eukaryotes [ Fig. 1B ]. We found a few metazoan, fungal, and SAR-group eukaryotic genomes that carry this globular Toast_rack domain as well [ Fig. 1B ]. LiaI-LiaF-TM is found mainly in firmicutes, actinobacteria, bacteroidetes, unclassified bacteria, thermotogae, acidobacteria, chloroflexi, with sporadic occurrences in other clades and transfers to euryarchaeota [ Fig. 1B ]. Taken together, the results of our analyses define two new domains: LiaI-LiaF-TM and Toast_rack.

PspC
PspC, a membrane protein first identified in the proteobacterial Psp system, is critical in sensing the membrane stress response and restoring envelope integrity [54,67,75] . A few recent studies have shown that PspC may function independently of PspA in some bacterial species in response to membrane stressors such as secretin toxicity [54,56,67,75,76] . Our analysis showed that PspC has two transmembrane helices, the first of them being a cryptic TM. A conserved amino acid residue, R, is found in between the two TM regions, suggesting a role in sensing ['MSA' tab in the webapp ]. The PspC domain is panbacterial and is found in a few archaeal clades [ Fig. 1C ].

PspB
PspB is another membrane protein, often found to be part of the proteobacterial Psp operon, and implicated in stress sensing and virulence [67,76,77] . PspB has an N-terminal anchoring transmembrane helix followed by an alphahelical globular domain. We found that PspB is almost completely restricted to Proteobacteria with a few transfers and with fusions only to PspC [ Fig. 1B ; See the section below on PspC domain architectures for further details].

PspM/PspN
Based on our previous [37,38,61] and this current study, we found no discernible homologs for these proteins (and constituent domains) outside the phylum actinobacteria [ Fig. 1C ].
PspM (encoded by rv2743c ), the corynebacterial integral membrane partner, comprises two TM regions and no other distinct domain. Our phylogenetic analyses of the homologs revealed that PspM has minimum variation as evident from the multiple sequence alignment ['MSA' tab in the webapp ], and a relatively narrow phyletic spread restricted to the genus Mycobacterium and its Corynebacterial neighbors, Rhodococcus and Nocardia, Actinopolysporales, Frankiales, Micromonosporales, Nakamurellales and Pseudonocardiales [ Fig. 1B ] in accordance with our previous findings [37,38,61] .
The fourth member in the Mycobacterial operon, PspN (Rv2742c), contains a short domain at the C-terminus, DUF3046, and a yet uncharacterized N-terminal domain, which we now call PspN_N [ Table 1 ]. As observed in our previous studies [37,38] , we found that the DUF3046 domain is widely prevalent across actinobacteria, but not present in other phyla [ Fig. 1C ]. Moreover, the M. tuberculosis genome shows the existence of a second copy of this domain DUF3046, only four genes downstream ( rv2738c ). We found that DUF3046 is alpha-helical with mostly conserved T and C residues as seen in the multiple sequence alignment.
We examined the DUF3046 homologs in more detail and searched for the connection between the two mycobacterial copies of the domain using the genome sequences (rather than translated protein ORFs) followed by sequence alignment reveals. This analysis revealed a few interesting features [ Fig.  1B ]. First, we found that the DUF3046 domain that is widespread across actinobacteria is closer to the short downstream protein, Rv2738c, rather than the C-terminus of the fourth member of the M. tuberculosis operon, PspN. We thus infer that Rv2738c, rather than PspN, contains the ancestral copy of the DUF3046. The DUF3046 domain that is part of the open reading frame (ORF) in PspN is likely a duplicated copy of the domain that jumped into the PspN ORF within mycobacteria, especially the M. tuberculosis complex. Moreover, unlike the pan-actinobacterial DUF3046, the N-terminal portion of PspN, PspN_N, is conserved only within M. tuberculosis , with remnants of the coding region existing as potential pseudo-genes or degraded sequences in a few closely related mycobacteria ( e.g., M. avium complex).
In summary , we have defined novel domains with careful multiple sequence alignments [ Fig. 1B ; webapp ] and delineated the phyletic spreads of PspA and its cognate partner proteins/domains, PspBC, PspMN (DUF3046), LiaIFG (LiaI-LiaF-TM, Toast_rack) across the tree of life [ Fig. 1C ]. While the membrane domains, PspB, PspM, LiaI-LiaF-TM, are highly lineage-specific (restricted to proteobacteria, actinobacteria, and firmicutes, respectively), the other two domains, PspC and Toast_rack, are much more widely-spread [ Fig. 1C ]. In addition to being present in a wide range of bacterial phyla, PspC is present in other archaeal clades and Toast_rack is present in both archaea and a few eukaryotic lineages [ Fig. 1C ]. The complete list of homologs and phyletic spreads are available under the 'Data' and 'Phylogeny' tabs in our interactive webapp . To further characterize the homologs of PspA, its cognate partner domains, and other novel domains that we found in the proximity of these Psp systems, we next carried out a detailed analysis of domain architectures, genomic contexts, and the phyletic spreads of each of these homologous domains.

Evolution of PspA
We first delved deeper into our finding of the ancestral PspA/Snf7 superfamily with the help of structure-informed multiple sequence alignment and phylogenetic tree of homologs of the PspA/Snf7 superfamily.

Identifying PspA+ and Snf7+ clades
PspA+ : As noted above, to perform an extensive and inclusive search to identify all the close and remote homologs of PspA, we started with six distinct starting points as queries (PspA from E. coli , M. tuberculosis , two copies of B. subtilis , and representatives from cyanobacteria, viridiplantae (Vipp1)). We found that most of the bacterial and archaeal clades contain PspA homologs [ Tracing the phylogeny of PspA/Snf7 to the last common ancestor To resolve the intertwined history and evolution of the PspA-Snf7 superfamily, we started with a comprehensive selection of PspA/Snf7 homologs with archetypical representatives from the distinct clades across the tree of life ['Data' tab of webapp ; [64,65] ] as well as different domain architectures [ Fig. 2, 3 ; detailed in the next section]. We also included PspA/Snf7 paralogs, when available. We further incorporated any available structural information for PspA and Snf7 to better inform the sequence alignment, such as low complexity coiled-coils in both proteins ( E. coli PspA ( 4WHE ) and S. cerevisiae Snf7 ( 5FD7 ) [ Fig. 2 ; Table 1 [80,81] ]. With the representative PspA/Snf7 proteins spanning the three kingdoms, we performed a carefully-curated multiple sequence alignment with the selected ~450 PspA/Snf7 homologs [ webapp ; See Methods ]. Finally, we used this multiple sequence alignment to generate a PspA/Snf7 phylogenetic tree [ Fig. 2 ] to trace the evolutionary history of the superfamily. The multiple sequence alignment of PspA/Snf7 also highlights variation in the position, length, and nature of the alpha-helices, e.g., four helices in most bacterial and archaeal lineages [ webapp ]. Of particular note is a unique insertion of heptad repeats in actinobacteria, likely conferring a membrane-and partner-specific adaptation. We also found a C-terminal extension in a few cyanobacterial PspA homologs that we find to be the source of the plant variant, Vipp1 [31,82] .
To complete our understanding of the phylogeny, we used the sequence alignment of a representative subset of PspA/Snf7 containing species [ Fig. 2 ] to construct a phylogenetic tree [ Fig.  2 ; See Methods ]. The Snf7 homologs serve as the out-group in our PspA tree. The first striking finding is that several bacterial (such as proteobacteria, actinobacteria, firmicutes) and archaeal clades, along with presumably their ancestral cyanobacterial relatives [ Fig. 2 ] self-organize into distinct clusters (each major clade represented by a different color), due to their sequence-structural similarity and, presumably, the natural course of evolution. In addition to the clade-specific segregation of the PspA/Snf7 homologs, the key observations from the tree (and the underlying sequence alignment) [ Fig. 2 ] are: i) The homologs from Actinobacteria, Firmicutes, and Proteobacteria form easily distinguishable clusters; ii) The cyanobacterial and eukaryotic homologs branch out (top left), in agreement with our earlier observation suggesting the PspA-Vipp1 connection in cyanobacteria; and iii) The Snf7 domain-containing homologs from archaea and eukaryota form a well-defined branch away from the PspA containing branches.
In summary , homologs from the PspA and Snf7 superfamilies result in a noteworthy finding that the helical membrane-stress response domain, PspA , is likely ancient with the earliest copy existing in the last universal common ancestor (LUCA). We have constructed a representative phylogenetic tree of PspA/Snf7 ( Fig. 2 ) based on a carefully hand-curated multiple sequence alignment ( webapp ). Next, we proceeded to characterize the PspA/Snf7 homologs in greater detail.

PspA: Novel architectures and neighborhoods
We leveraged our domain-level search to delineate the various domain architectures and genomic contexts of the PspA homologs in diverse genomes.

Domain Architectures
Our comprehensive searches starting with eight representative PspA proteins revealed that most PspA homologs (>98%; ~2500 homologs) do not show much variation in terms of their underlying domain architecture. In most lineages, PspA homologs only contain the characteristic PspA (PspA_IM30) domain without additional fusions [ Fig. 3; Table S1 ]. The remaining small fraction of homologs (<2%), they contain either repeated PspA domains or fusions with domains other than PspA [ Fig. 3 ]. For instance, cyanobacterial PspA homologs show some interesting variations: a few have dyads or triads of PspA domains stringed together [ Fig. 3 ; e.g., BAG06017.1 ] while others have an additional NLPC peptidase domain at the N-terminal end, suggesting a cell wall peptidase role for these inner membrane PspA homologs [ Fig. 3; Table S1 ; AFZ52345.1 ; [83] ].
Similar to the 98% of PspA homologs, a search for the related superfamily, Snf7, again revealed minimal variation in domain architecture, with fusions only in eukaryotes. The homologs carry one or more copies of the Snf7 domain (only found in eukaryota and some archaea) [ Fig. 3b ; Table A2 ], suggestive of a conventional role in oligomerization and membrane stabilization attributed to the eukaryotic ESCRT systems [81] .

Paralogs
We found that in genomes that contain multiple copies of PspA, the paralogs do not always have the same domain architecture and/or genomic contexts (discussed further below), e.g., firmicutes, spirochaetes [ Fig. 3 ; Table S1 ; 'Phylogeny -> Paralog' tab of the webapp ]. We noted a special case in cyanobacteria with 2-3 paralogs, two of which are adjacent proteins in the genome [ Fig. 3 ; Table  S1 ; BAG06017.1 ]. This is likely related to the scenario described above involving cyanobacterial dyads/triads with the variation that an insertion (or deletion) of intergenic sequence could have resulted in neighboring proteins rather than multi-PspA domain fusions (or vice-versa). Similar to the adjoining PspA proteins in cyanobacteria, a few archaeal and eukaryotic species have adjoining Snf7 proteins ( CBY21170.1 ; Oikopleura dioica ) suggestive of local gene duplication. The webapp ('Phylogeny' -> 'Paralog') contains the full list of PspA/Snf7 paralogs along with their domain architectures and genomic contexts with likely insights on their shared history, gene duplication vs horizontal gene transfer.
The domain architectures of PspA and Snf7, although limited in their repertoire, are prevalent in all three kingdoms of life, sometimes containing multiple copies within a genome. The ubiquity of the PspA/Snf7 superfamily indicates their inherent utility as maintainers of envelope integrity.

Novel variations of known genomic contexts
To understand their multifarious roles better, we next explored the gene neighborhoods in the vicinity of each of the Psp members [ Fig. 3 ; Table S1 ]. The genomic context analysis uncovered many distinct and novel themes of the psp system; the major contextual themes are summarized below. The full list of genomic contexts and their phyletic spreads are available in the webapp ('Data' and 'Genomic Contexts' tabs, PspA/Snf7 selection in the dropdown).

PspFABC operon
We found that PspA, PspB, and PspC appear in an operon in alpha, delta, and gammaproteobacteria, and a few spirochaetes and nitrospirae [ Fig. 3 ; Table S1 ]. We also found a few variations to this 'classic' theme. In a few species, PspC is fused to a divergent C-terminal PspB in addition to a solo PspB in the operon [ Fig. 3 ; Table S1 ; e.g., ANW39986.1 , E. coli ], while others have multiple PspB copies in the operon [ Fig. 3 ; Table S1 ; e.g., AOL22920.1 , Erythrobacter litoralis ]. PspD occurs along with this operon only in gammaproteobacteria [ Fig. 3 ; Table S1 ; e.g., ANW39986.1 , E. coli ]. The transcription regulator PspF (NtrC-AAA and HTH) occurs with a common promoter with the PspABC operon and in a few gammaproteobacteria with only PspA. The whole operon has been transferred to a few spirochaetes ( e.g., Spico_0138) with additional variations involving PspB, PspC fusions, and Toast_rack containing proteins. We found that a few operons in gammaproteobacteria ( e.g., Entas_2342 AEN65073.1 ) have an ACT and PAS domain followed by the NtrC-AAA, and in these cases, two proteins, a GTPase and a membrane-associated IIGP, have been inserted into the operon between the NtrC-AAA protein and PspABCD. A few of these have an additional canonical PspF in the operon. In some alphaproteobacteria, the PspC in PspFABC has been replaced by multiple Toast_rack containing proteins [ Fig. 3 ; Table S1 ; e.g., ANQ40502.1 , Gluconobacter oxydans ].

Classical-AAA-ATPase
Snf7 and the VPS4-like AAA-ATPase (with an N-terminal MIT domain, TPR-like) are known to occur together in archaea in the ESCRT system ( e.g., OLS27540.1 ) [84] . We found additional divergent Snf7 genes in the operon, including fusion to HTH ( e.g., OLB70630.1 ) in the archaeal eukaryotic clade, with a few bacterial transfers, mostly without the ATPase in the operon [ Table S1 ]. The corresponding VPS4-like AAA-ATPase transferred from archaea to bacteria in cyanobacteria, bacteroidetes, verrucomicrobia, nitrospirae and planctomycetes ( e.g., ACB74714.1 Opitutus terrae ) have TPR repeats followed by TM region, instead of Snf7. Further, our analysis showed that the bacterial eukaryotic PspA also occurs with an AAA-ATPase, in various bacterial clades. This gene dyad also occurs with either a previously unidentified signal peptide and transmembrane helix containing protein with a divergent Snf7 domain ( OGG56892.1 ) or other coiled-coil or alphahelical domain-containing proteins. Both PspA and the membrane-associated Snf7 along with the AAA-ATPase occur in a longer operon with ABC-ATPase, ABC Transporter, and a PBPB-OmpA containing protein.

Membrane dynamics with PspA and ATPase
Snf7 and VPS4-ATPase have previously been shown to be involved in the ESCRT-III mediated membrane remodeling in archaea [69][70][71]73] . Here, we found a similar operon of PspA and ATPase in bacteria, suggesting that a similar interaction takes place in bacteria, where PspA and ATPase are involved in the bacterial membrane dynamics [ Fig. 3 ]. Sporadically in bacteria, the PspA-AAA-ATPase module is also involved in a novel complex with both cell-surface/membrane-linked transporter and cytoplasmic PspA/Snf7 elements. The OmpA domain interacting with extracellular peptidoglycan and the ABC transporter transport a ligand in response to stress, which in turn interacts with the PspA AAA-ATPase module. Such a system could link extracellular events with PspA internally.

PspA with PspM (ClgRPspAMN) or Thioredoxin
PspA occurs as a dyad with Thioredoxin in cyanobacteria and actinobacteria [38] [ Fig. 3 ]. Analysis of this dyad showed that the actinobacterial PspA is typified by RsmP from Corynebacterium testudinoris ( AKK09942.1 ; Fig. 3 ; Table S1 ), phosphorylation of which has been shown to regulate the rod-shaped morphology in Corynebacterium glutamicum [85] . This actinobacterial PspA homolog belongs to an RsmP family cluster comprising mostly of rod-shaped actinobacteria that have PspA occurring either with Thioredoxin or with a ClgR-HTH , PspM , and an occasional low complexity protein ( AOS62694.1 , Actinoalloteichus) [ Fig. 3 ; Table S1 ]. The Corynebacterial members of the family have paralogs with both versions of PspA contexts ['Paralog' tab of the webapp ]. The association of the ClgR-HTH-PspM and Thioredoxin and their mutual exclusion in the operon suggest that they act as repressors in the case of the HTH and redox agent in the case of the thioredoxin to control the PspA [ Fig. S1 ]. The association of ClgR-HTH-PspAM is also confined to this RsmP family, suggesting that these are also determinants of rod-shaped morphology of the cell.

PspA with TM-Flotillin dyad and Band_7
Further, in bacteroidetes, proteobacteria, firmicutes, and acidobacteria, PspA occurs with another chaperone (Band_7) in the form of a conserved dyad, the TM-Flotillin dyad , with a TM protein (Signal and 2 TM and a C-terminal domain, Pfam: DUF1449), a protein with an anchoring TM, Band_7 domain, coiled-coil linker, and the flotillin domain [ Fig. 3 ; Table S1 ; ANH61663.1 , Dokdonia]. In most bacteroidetes and a few proteobacteria, both CesT/Tir and TM-Flotillin dyads occur together with PspA [ Fig. 3 ; Table S1 ; AAN56746.1 , Shewanella]. In most of these combinations, an additional protein with beta-propeller and AAA-ATPase is seen in the operon.
We found that a few of the Flotillin operon dyads without proximal PspA have the beta-propeller/ATPase in the operon in some actinobacteria, but they all have PspA elsewhere in the genome. An example is the TM protein of the Flotillin operon dyad, belonging to the family typified by yuaF in Bacillus subtilis , which is implicated in the resistance to cell wall antibiotics [86] ; also similar to the yqiJ family in Escherichia coli . Analysis of this yqiJ protein showed that the TM-Flotillin operon dyad is widespread in archaea and bacteria. Even though most of these dyads have PspA elsewhere in the genome and in the same operon, yqiK (Band_7/Flotillin containing protein with two N-terminal TM and operonic partner of yqiJ) has been shown to support the stress control function of PspA [87] . The cardiolipin-rich inner membrane and the Flotillin complex support the PspA components -either the associated members in the operon or the PspFABC operon elsewhere in the genome -to control membrane dynamics [REF].
In some firmicutes, PspA ( AAS43862.1 , Bacillus) occurs with the yjFL-likeTM protein (DUF350 ) and DUF4178. In actinobacteria, the membrane-associated Spermine Synthase (Rv2601 M. tuberculosis H37Rv ) occurs in an operon, without PspA, with the yjFL-like TM and DUF4178 protein (Rv2597, Mycobacterium) and two other uncharacterized domains. Most of these actinobacteria have a PspA elsewhere in the genome. In beta-and deltaproteobacteria, many of which do not have PspA in the genome, the membrane-associated Spermine synthase-yjfL-TM-DUF4178 triad occurs without PspA and with the amino oxidase and adenosylmethionine decarboxylase proteins and a Band_7 chaperone with a C-terminal SHOCT (in betaproteobacteria). Spermidine is a polyamine that is involved in the mechanical stability of membranes, and Spermidine synthase, speE, is implicated in its biosynthesis [88,89] , the amino oxidase and decarboxylase respectively lengthening and shortening the polyamine. Taken together, the membrane-associated Spermine synthase operon regulates the spermidine synthesis and PspA ports into this operon to control the membrane thickness.

PspA with TPM_phosphatase
Another novel operon with a chaperone is found in some firmicutes typified by ydjF ( ANX09535.1 , Bacillus), where PspA is found with a tail-anchored protein ( ydjG ) comprising two ZnR and a globular domain, a membrane-associated TPM_phosphatase domain ( ydjH ) [90] , and a Band_7 with a C-terminal ZnR ( ydjL ) [ Fig. 3 ; Table S1 ]. The ZnR domain in this system is likely involved in membrane binding. A variation occurs in actinobacteria, wherein PspA ( AKL67689.1 , Streptomyces) is seen in an operon with a protein containing extracellular.

Membrane dynamics with Chaperone
The strong association of PspA with the CesT-Tir type chaperone of two divergent families and with the Band_7 chaperone suggests that the chaperones help in assembling the complexes/ superstructures with PspA and/or Beta-propeller, with the AAA-ATPase used for energy, whenever present. This superstructure building can be triggered at times of stress and can be used to stabilize the membrane. A similar theme is followed in the association with the Band_7 chaperone, where the Flotillin complex may be used as a base to build the PspA complex to strengthen the membrane. We found that an additional element in the form of the Beta-propeller is introduced in the complex in some bacteria. The PspA/CesT_Tir dyad and PspA by itself in some cases is recruited by the Spermine synthase operon to control the membrane stability by increasing or decreasing spermine synthesis in response to environmental stimuli [88] .

Classic Two-component Transcriptional signaling system
The two-component signaling system utilizes either the PspA dominant system in firmicutes and other clades or the PspC dominant system mainly in actinobacteria. As in all two-component systems, the membrane-bound Histidine Kinase communicates a signal based on environmental stress to the response regulator REC/HTH containing protein, which regulates the expression of target genes. We note that PspA, LiaI-LiaF-TM, and Toast_rack tie into this system to strengthen the membrane in response to the stress signal. In actinobacteria where the Histidine Kinase is fused to PspC, the signal is sensed by PspC, and Toast_rack can act as the internal sensor. Moreover, other PspA systems elsewhere in the genome could be recruited in addition whenever they are not found in the immediate operonic neighborhood. Following the detailed analyses of PspA that uncovered novel insights into the many variations in its genomic contexts, we proceeded to analyze the domain architectures of Psp partner proteins to get a more holistic understanding of the stress response system.

PspA-free variations of domain architectures
Conducting searches with multiple proteins and their domains also shed light on the existence of several domain architectures and genomic contexts involving Psp components that do not carry the central protein PspA. Hence, we next investigated these instances to determine their possible relevance to the extended phage-shock protein stress response systems. Below we have described some of our major findings related to the Toast_rack domain and its homologs and the PspC domain before outlining a range of novel Psp-A free genomic contexts.

Toast_rack Domain Architectures
We first examined the Toast_rack domain and all its protein homologs obtained from various starting points because we found it to be the most widely-spread domain [ Fig. 4 ; Table S2 ; see Methods]. The first notable finding is that the Toast_rack domains in most proteins are C-terminal with an N-terminal transmembrane region comprising the defined PspC, LiaI-LiaF-TM, other generic TMs, or a combination thereof [ Fig. 4 ]. We also found versions with an N-terminal anchoring TM region or with N-terminal alphahelical domains. Analysis of the Toast_rack containing proteins revealed various fusions including two previously uncharacterized N-terminal alphahelical domains described below.

Toast_rack in actinobacteria and firmicutes: SHOCT and SHOCT-like proteins
Toast_rack containing proteins in actinobacteria ( CCP43715.1 , Mycobacterium) with a few transfers to proteobacteria, spirochaetes, and other bacteria are fused to an N-terminal SHOCT-bihelical domain (Pfam: DUF1707 [91] ) and a C-terminal Toast_rack [ Fig. 4 ]. Based on sequence analysis, we predict that the SHOCT domain will act as an anchoring domain to the sub-membrane region.
We found another novel variant of the SHOCT domain, with a bihelical secondary structure (that we call the SHOCT-like domain to distinguish from the SHOCT domain), fused to the N-terminus of Toast_rack domain-containing protein typified by Bacillus subtilis yvlB ( SHOCT-like is at the C-terminus of the current DUF2089 profile in Pfam. CAB15517.1 , Bacillus) [ Fig. 4 ; Table S2 ]. YvlB has been implicated in the activation of the LiaFSR operon [92] . In a few members of the Lactobacillus and Enterococcus genera, a coiled-coil linker is found between the two domains (SHOCT-like and Toast_rack; AQL53774.1 ) [ Fig. 4 ; Table S2 ]. Analysis of the SHOCT-like domain showed that it has independent and widespread existence with a different C-terminal globular domain ( BMQ_3691 , Bacillus) [ Fig. 4 ; Table S2 ]. Several of the members of the BMQ_3691 family occur in an operon with another divergent version of the domain ( BMQ_3692 ) with an N-terminal ZnR fusion, a DUF2089-HTH in the middle, and a C-terminal SHOCT-like domain. Both the protein families BMQ_3691/3692 are found in firmicutes, chloroflexi, thermotogae, deinococcus, armatimonadetes, a few actinobacteria, and unclassified bacteria [ Fig. 4 ; Table S2 ].

Toast_rack in Bacteroidetes: DUF1700 and DUF1700-like proteins
We find another recurring transmembrane fusion theme in Toast_rack containing proteins in Bacteroidetes ( AOW09537.1 , Flavobacterium), firmicutes, betaproteobacteria, and a few other bacterial clades, with an N-terminal DUF1700-like alphahelical domain (that we call, DUF1700-ahelical), followed by PspC, LiaI-LiaF-TM or other TM regions and a C-terminal Toast_rack domain (Pfam: DUF1700, DUF1129, clan Yip1) [ Fig. 4 ; Table S2 ]. The DUF1700-like alpha-helical domain belongs to the Yip1 clan in PFAM. An HHpred search with the alignment retrieves PDB structure 2O3LA of DUF1048 family, mostly in firmicutes, belonging to the same clan. These members also have PadR-HTH in the operon. Analysis of the sequence alignment shows that the structure has three core helices, and the third helix has a bump in the middle due to a conserved GxP motif, which forms the edge of a binding groove. Other than its fusions to Toast_rack, the DUF1700-like alphahelical domain is fused to either intracellular sensor domain like the Pentapeptide repeats [ AOH56696.1 , Bacillus] (mostly in firmicutes), or to various transmembrane or intramembrane sensor domains, instead of PspC and LiaI-LiaF-TM [ Fig. 4 ; Table S1 ]. The DUF1700-ahelical domain also occurs at the N-terminus of distinct TM proteins such as the MacB_PCD extracellular domain (PDB:3IS6_B -like) containing FtsX proteins (acidobacteria, gemmatimonadetes, verrucomicrobia with transfers to few other clades), certain members of the firmicutes FtsW/RodA/SpoVE cell cycle family proteins [ Fig. 4 ; Table S1 ; CAC98500.1 , Listeria], the vanZ protein (actinobacteria), and various other 6-TM, 4-TM and 2-TM proteins. While the 4-TM family forms a major group and is likely to be a divergent LiaI-LiaF-TM domain, members of the 6-TM family [ Fig. 4 ; Table S1 ; AAO82145.1 , Enterococcus] are fused at the N-terminus either to a wHTH or to the DUF1700-like alphahelical domain, while some paralogs are fused sporadically to the SHOCT-like bihelical domain. The N-terminal TMs or alphahelical domains help attach Toast_rack to the membrane or sub-membrane where it acts as a sensor domain.

Toast_rack in Cyanobacteria
Toast_rack also has sporadic fusions to Caspase in cyanobacteria [ Fig. 4 ; Table S2 ; AFY83227.1 , Oscillatoria]. Our analysis of the topology of the above architectures indicates that Toast_rack is an intracellular domain.

PspC domain architectures
In summary , we have discovered several novel domain architectures containing Psp partner domains, in addition to those previously reported. For instance, we have identified several lineage-specific domain architectures of Toast_rack with N-terminal transmembrane domain fusions, such as DUF1700-ahelical in bacteroidetes, LiaI-LiaF-TM in firmicutes, SHOCT-bihelical and SHOCT-like domains in actinobacteria and firmicutes, and PspC in actinobacteria [ Fig. 4 , Table S2 ]. We also found that, in addition to solo PspC domains being the most prevalent, PspC can be fused to PspB, DUF1700-ahelical, Toast_rack, and LiaI-LiaF-TM domains [ Fig. 4 , Table S2 ]. All variations of domain architectures that do not contain PspA can be explored in the webapp .
Novel PspA-free Genomic contexts Since genomic contexts are critical to the function of bacterial and archaeal proteins, we next explored the neighborhoods of each of the 'PspA-free' homologs of the Psp partner domains.
Unlike the PspA-associated Lia-like two-component systems (detailed in the PspA genomic context section above), in most other actinobacteria, a different type of Histidine Kinase, one with an N-terminal PspC, followed by LiaF-LiaI-TM region and a terminal Histidine Kinase ( e.g., BAB98004.1 , Corynebacterium) occurs with a Receiver/HTH-containing gene [ Fig. 5 ; Table S2 ]. PspA is not seen in these contexts. This dyad shares a promoter region with a protein containing PspC, additional TMs, and Toast_rack [ Fig. 5 ; Table S2 ]. Some actinobacterial genomes ( e.g., BAD60342.1 , Nocardia) have an additional two-component dyad of the firmicute type with only a SHOCT-bihelical and Toast_rack containing protein in the operon. In addition, in a few deinococci, a Toast_rack protein ( ACO46500.1 ) is sandwiched between a two-component dyad [ Fig. 5 ; Table S2 ].

DUF1700 with PadR-HTH
The DUF1700-like alphahelical domain (HHpred 2O3L_A) in all its fusions almost always occurs in a tightly associated operon with a wHTH (Pfam: PadR ; lltR / lstR in Listeria monocytogenes [96] ]). Given the widespread phyletic spread of the domain and its association with this operon, the operon can be traced back to the base of the bacterial lineage. In the Psp system, the DUF1700 domain is either found fused to PspC, LiaI-LiaF-TM, and Toast_rack or occurs in an operon with them [ Fig. 5 ; Table  S2 ; AKD55949.1 , Spirosoma]. In all these cases the PadR-like wHTH is found in the operon. In the family with 6TM domain ( AAO82145.1 , Enterococcus), that is fused to either a wHTH at the N-terminus or to the DUF1700-ahelical domain, the PadR-HTH is found in the operon only in fusions to the DUF1700-ahelical domain [ Fig. 5 ; Table S2 ].
The only instance where wHTH occurs without the DUF1700 alphahelical domain-containing Toast_rack is in a family of intracellular Toast_rack in actinobacteria typified by ( CAC05954.1 ) in Streptomyces coelicolor . The DNA-binding domain LytTR is found in an operon with Toast_rack fused to LiaI-LiaF-TM (LA14_0400) in firmicutes Clostridium and Lactobacillus [ Fig. 5 ; Table S2 ].

DUF2089-HTH
The DUF2089-HTH containing BMQ_3692 family, with N-terminal ZnR fusion and C-terminal SHOCT-like domain is found in an operon with the SHOCT-like and the globular C-terminal domain (BMQ_3691, Bacillus megaterium ) in firmicutes, chloroflexi, thermotogae, deinococcus, armatimonadetes, unclassified bacteria, and a few actinobacteria [ Fig. 5 ; Table S2 ]. Some members ( e.g., DR_0017, Deinococcus radiodurans ) have a Toast_rack with N-terminal SHOCT-like ( AAF09613.1 , Deinococcus) in the operon in addition to or instead of the SHOCT-like and the globular C-terminal domain (BMQ_3691, Bacillus megaterium ) [ Fig. 5 ; Table S2 ]. The versions of the BMQ_3692 family without the N-terminal ZnR ( e.g., ADJ59725.1 LLNZ_03680 of Lactococcus lactis ) are found in an operon with a 2 TM protein [ Fig. 5 ; Table S2 ]. The versions of the BMQ_3692 family without the C-terminal SHOCT-like domain, but retaining an intact ZnR, in Bacteroidetes is found in an operon with a secreted AB-Hydrolase, a membrane protein, and a membrane attached PH domain ( e.g., AIL47037.1 , Elizabethkingia) [ Fig. 5 ; Table S2 ].

SIGMA-HTH
We found that various Bacteroidetes, firmicutes, acidobacteria, verrucomicrobia, gemmatimonadetes, few chloroflexi, and actinobacteria contain a triad of SIGMA-HTH followed by a protein with ZF and TPR connected by a Transmembrane Helix (with some lacking the ZF) and an anchored Toast_rack domain [ Fig. 5 ; Table S2 ]. The third gene in this system may be followed by LiaI-LiaF-TM and Toast_rack combinations. In a few acidobacteria ( AEU34960.1 , Granulicella), a protein with BBOX domain fused to a LiaI-LiaF-TM is found in the operon [ Fig. 5 ; Table S2 ].

GNTR-HTH and TraJ-RHH
A transport operon with ABC-ATPase and a transporter protein along with a protein with GNTR-HTH and a C-terminal alphahelical domain and a protein with an anchored Toast_rack (typified by AAM36414.1 , Xanthomonas) in the operon is found in gammaproteobacteria and a few Bacteroidetes, firmicutes, and spirochaetes [ Fig. 5 ; Table S2 ]. Many of these operons have a DUF2884 domain-containing lipoprotein. A convergent operon is found in actinobacteria where a Toast_rack containing protein occurs with the transport operon along with a TraJ-RHH containing protein instead of the GNTR-HTH, while B737_2034 in Amycolatopsis mediterranei ( AGT82698.1 ) has an additional GNTR-HTH fused to an FCD small molecule binding domain in the operon. The TraJ-RHH containing protein has an N-terminal alphahelical domain and a C-terminal strand.
The transport operon with ABC-ATPase and a transporter protein is found in an operon with an N-terminally anchored Toast_rack containing protein sporadically in firmicutes ( AJI31452.1 Bacillus cereus BF28_3762), actinobacteria (two tandem proteins with Toast_rack ALR80200.1 Streptococcus salivarius ) and euryarchaeota [ Fig. 5 ; Table S2 ]. Some actinobacterial DUF1707-SHOCT-like bihelical domain and Toast_rack containing protein also occur with the ABC transport operon with two permease proteins ( CAB88834.1 Streptomyces coelicolor A3 SCO2893) [ Fig. 5 ; Table S2 ]. Some of the aforementioned ABC transporter operons have a two-component system inserted in the operon.

Lessons from the Novel Transcriptional Signaling systems
The obligate nature of the pairing of the DUF1700-ahelical domain and PadR-HTH suggests that they bind to each other and, upon sensing a stimulus by the PspC or equivalent TM region fused to the DUF1700-ahelical domain, the PadR-HTH is released to induce transcription. This kind of signal transmission is similar to the two-component system, where the signal is transferred by phosphorylation of the REC domain. Thus, the DUF1700-ahelical domain and PadR-HTH define a novel signaling paradigm, and PspC and Toast_rack elements of the Psp system are ported into this novel signaling system, whereby it regulates the membrane dynamics. In the cases where Toast_rack and LytTR-HTH occur in an operon together or as a fusion without the DUF1700-ahelical domain, Toast_rack is predicted to perform a similar function. The DUF2089-HTH, Toast_rack, and SHOCT-like systems define a novel transcriptional regulatory system where the SHOCT-like domain is used to attach to the membrane, and the signal is transmitted by Toast_rack. The GNTR-HTH and TraJ-RHH associated operons port the Toast_rack into an ABC transporter system, which allows control of transport and links it to membrane strengthening. While these operons do not have PspA or PspC, the organisms have these psp elements elsewhere in the genome, and these would be recruited into the system as demonstrated by the PadR-HTH deletion experiments affecting the Lia system in Listeria [96] .
In summary , in addition to novel extensions and variations of the previously known Psp operons (with PspBC, LiaI-LiaF-TM/Toast_rack, and two-component systems), we have discovered and discussed in detail several novel players and related genomic contexts. For example, we found several novel transcriptional signaling systems (such as PadR-HTH, wHTH, DUF2089-HTH, SIGMA-HTH, GNTR-HTH, and TraJ-RHH) co-occurring with Toast_rack, DUF1700-ahelical, and SHOCT-like proteins [ Fig. 5 , Table S2 ]. Our study has identified several new extensions to the membrane stress response system. To conclude our study, we finally proceeded to consolidate all our findings in the context of PspA and Psp partner domains to identify the extended network of stress response players in bacteria, archaea, and eukaryotes.

Psp Consolidated
In the sections above, we have identified a variety of genomic configurations of the Psp systems and novel partner domains that likely function alongside or independent of PspA. We have also determined key neighbors of PspC and Toast_rack, the most wide-spread PspA partners, as well as the domains that frequently co-occur with our initial set of proteins of interest. We next asked how to reconcile the different pieces of the Psp puzzle that we have identified.

The proximity network of PspA and cognate partner domains
To generate a global picture, we generated a network representation that is based on the domain architectures of the larger Psp system reported above [ Fig. 6A ]. The domains are the nodes of this network, and they are connected pair-wise if they co-occur in a single protein. The network thus summarizes the connections between PspA, Psp cognate partner domains (such as PspC and Toast_rack), and other novel domain connections such as DUF1700-ahelical and SHOCT-bihelical/SHOCT-like proteins.

The phyletic spread of Psp members and their prevalent partner domains
To bring together the evolutionary patterns of all the Psp members and their key partners, we next created a single heatmap that displays their phyletic spreads along with their prevalence across lineages [ Fig. 6B ]. This view recapitulates the findings that (i) only PspA, Snf7, and Toast_rack are present in eukaryotes; (ii) while PspC, PspA/Snf7 are present in most archaeal lineages, there are occasional transfers of Toast_rack, LiaI-LiaF-TM, and SHOCT-bihelical to euryarchaeota; (iii) co-migrations in bacteria of domains such as Toast_rack, PspC, PspA, and LiaI-LiaF-TM.

The prevalence of most common neighbors of Psp members
We next adopted the upset plot to summarize the overlap of several sets of domain architectures concurrently. Using this visualization, we quantified the relative occurrences of domain architectures from Psp members and their most common neighbors [ Fig. 6C , blue histogram] as well as combinations [ Fig. 6C ; dots and connections] and their frequencies [ Fig. 6C , red histogram] of co-occurrences of these domain architectures in specific genomic contexts. For example, we found that the Toast_rack and LiaI-LiaF-TM context that contains the two-component system and DUF1700-ahelical co-occurring with PadR-HTH are among the most prevalent neighborhoods closely following singletons such as PspA and PspC without characterized neighborhoods. It is noted that Snf7 appearing alone most frequently is an artifact of the genomic context analysis being restricted (and relevant) to bacteria and archaea.
In summary , these visualizations help us build a consolidated picture of all the aspects of the much-expanded Psp universe as ascertained by our detailed molecular evolution and phylogenetic analysis [ Fig. 6 ].

Conclusions
Recent work with the phage shock protein (Psp) stress response system in bacteria suggests that even though Psp function appears to be maintained across phyla, the sequence and structure of the accessory Psp proteins, the regulatory mechanisms, and the membrane stress response dynamics of the system vary widely among bacterial species [12,14,38,61] . PspA has also been shown to be needed for maintaining the cell shape in Corynebacterium [85] . Studies in archaea show that PspA is involved in cell division [35]. In eukaryotes, PspA is involved in thylakoid biogenesis and vesicular budding [99] . These variations likely reflect different envelope structures and genetic and environmental factors that necessitate the concomitant evolution of these membrane stress response systems.
In this study, we report the results of a comprehensive and detailed analysis of the evolution of the Psp system and all its partner proteins, focusing on the phylogenetic distribution, sequence/structural features of PspA, its known and unknown partners, and their genomic neighborhoods across the tree of life. We used this analysis to explore and identify novel components in the Psp stress response system. Notably, we first establish that PspA is indeed universal and that the ancestry of the PspA/Snf7 superfamily (with the eukaryotic ESCRT system) dates back to the last universal common ancestor (LUCA). We also define novel domains such as LiaI-LiaF-TM and Toast_rack. We then go beyond the global phyletic distribution of PspA and its known partners, to identify distinct and commonly occurring themes (domain architectures and genomic contexts) in Psp systems that are both widely distributed ( e.g., PspA/Snf7, PspC, and Toast_rack) and restricted to specific lineages ( e.g., discovering DUF1700-ahelical and SHOCT-bihelical often in the proximity of Toast_rack, and mostly in firmicutes and actinobacteria, respectively). We also find several novel proximal domains including PspA/Snf7 alongside AAA-ATPase, PspA with PspAA (a novel PspA-associated domain that we find in actinobacteria and archaea), and PspC often co-occurring with Toast_rack and two-component systems in actinobacteria rather the 'classic' proteobacterial context. Finally, in addition to novel extensions and variations of known Psp operons (with PspBC, LiaI-LiaF-TM/Toast_rack, and two-component systems), we have discovered several novel components such new transcriptional signaling systems and their pertinent genomic contexts.
We present all the findings (data, results, and visual summaries) from this study for in-depth exploration in an interactive and queryable web application available at https://jravilab.shinyapps.io/psp-evolution .

Methods
We used a computational evolutionary approach for the molecular characterization and neighborhood analyses of the Psp stress-response system and its partners.

Identification and characterization of protein homologs
To ensure the identification of a comprehensive set of homologs (close and remote) for each queried protein, we performed iterative searches using PSIBLAST [106] with sequences of both full-length proteins and the corresponding constituent domains. For each protein, searches were conducted using homologous copies from multiple species as starting points. Search results were aggregated and the numbers of homologs per species and of genomes carrying each of the query proteins were recorded [ webapp ]. These proteins were clustered into orthologous families using the similarity-based clustering program BLASTCLUST [107] . HHPred, SignalP, TMHMM, Phobius, JPred, Pfam, and custom profile databases [108][109][110][111][112][113][114][115] were used to identify signal peptides, transmembrane regions, known domains, and the secondary protein structures in every genome. Homolog information including domain architectures for each of the Psp member proteins is shown in the webapp ('Data' and 'Domain architectures' tabs).

Neighborhood search
Bacterial gene neighborhoods -± 7 genes flanking each protein homolog -were retrieved using custom scripts from GenBank [100,101] . Gene orientation, domains, and secondary structures of the neighboring proteins were characterized using the same methods applied to query homologs above . Genomic contexts for each of the Psp members are shown in the webapp ('Genomic contexts' tab).

Phylogenetic analysis
Multiple sequence alignment (MSA) of the identified homologs was performed using Kalign [116,117] and MUSCLE [118,119] . The phylogenetic tree was constructed using FastTree 2.1 with default parameters [120] . MSA and phylogenetic trees for each of the Psp members are shown in the webapp ('Phylogeny' tab).

Network reconstruction
TThe Psp proximal neighborhood network was constructed based on the domain architectures and genomic contexts of PspA and its cognate partner proteins. The nodes represented the domains and edges indicated a shared neighborhood (domains of the same protein or neighboring proteins). Proximity networks for each of the Psp member proteins are shown in the webapp ('Domain architectures' tab).

Web application
The interactive and queriable web application was built using R Shiny [121,122] . All data analyses and visualizations were carried out using R/RStudio [123,124] and R-packages [125][126][127][127][128][129][130][131][132][133][134][135][136][137] . All the data and results from our study are available on the web application ( https://jravilab.shinyapps.io/psp-evolution ).  The phylogenetic tree was constructed based on a multiple sequence alignment performed using representative PspA homologs across all the major kingdoms/phyla (see Methods). The key lineages are labeled next to distinct clusters of similar PspA proteins. The insets show the 3D structures for PspA ( 4WHE ) and Snf7 ( 5FD7 ) from the Protein Data Bank. The tree leaves are labeled by lineage, species, and accession numbers. The outgroup at the bottom of the tree includes Snf7 homologs.   The genomic contexts are presented using the same schematic as in Figure 3 . The focus is on Psp partner domains such as Toast_rack (blue), PspC, LiaI-LiaF-TM, DUF1700-ahelical, SHOCT-bihelical (in shades of orange), and the various genomic neighborhoods with SHOCT-like proteins, transcription regulators (PadR-HTH, SIGMA-HTH, GerE-HTH, LytTR, GNTR-HTH), and two-component systems. Further details of the genomic contexts of all PspA-free partner domain homologs, and their phyletic spreads are in the webapp , and representatives indicated in the figure are shown in Table S2 .