Giant viruses encode novel types of actins possibly related to the origin of eukaryotic actin: the viractins

Actin is a major component of the eukaryotic cytoskeleton. Many related actin homologues can be found in eukaryotes 1 , some of them being present in most or all eukaryotic lineages. The gene repertoire of the Last Eukaryotic Common Ancestor (LECA) therefore would have harbored both actin and various actin-related proteins (ARPs). A current hypothesis is that the different ARPs originated by gene duplication in the proto-eukaryotic lineage from an actin gene that was inherited from Asgard archaea. Here, we report the first detection of actin-related genes in viruses (viractins), encoded by 19 genomes belonging to the Imitervirales, a viral order encompassing the giant Mimiviridae. Most viractins were closely related to actin, in contrast to the actin-related genes of Asgard archaea and Bathyarchaea (a newly discovered clade). Our phylogenetic analysis suggests that viractins could have been acquired from proto-eukaryotes and possibly gave rise to the conventional eukaryotic actin after being reintroduced into the pre-LECA eukaryotic lineage.


Introduction
Actin is a major component of the eukaryotic cytoskeleton. Many related actin homologues can be found in eukaryotes 1 , some of them being present in most or all eukaryotic lineages 2 . The gene repertoire of the Last Eukaryotic Common Ancestor (LECA) therefore would have harbored both actin and various actin-related proteins (ARPs) 1,2 . A current hypothesis is that the different ARPs originated by gene duplication in the proto-eukaryotic lineage from an actin gene that was inherited from Asgard archaea 3,4 . Here, we report the first detection of actin-related genes in viruses (viractins), encoded by 19 genomes belonging to the Imitervirales, a viral order encompassing the giant Mimiviridae 5 . Most viractins were closely related to actin, in contrast to the actin-related genes of Asgard archaea and Bathyarchaea (a newly discovered clade). Our phylogenetic analysis suggests that viractins could have been acquired from proto-eukaryotes and possibly gave rise to the conventional eukaryotic actin after being reintroduced into the pre-LECA eukaryotic lineage.

The discovery of viractins
We first detected an actin-like gene (hereafter dubbed viractin) in the giant virus Yasminevirus, recently isolated from sewage water by amoeba coculture 6 . Yasminevirus belongs to the Mimiviridae family 7,8 , within the proposed Klosneuvirinae subfamily 6,9 . Mimiviridae are giant viruses belonging to the viral phylum Nucleocytoviricota 5 , previously known as the Nucleo-Cytoplasmic Large DNA Virus (NCLDV) assemblage 7 . Using the Yasminevirus viractin gene as a query, we detected additional viractins in 16 metagenome-assembled genomes (MAGs) of Nucleocytoviricota originating mostly from marine and freshwater systems 9-11 , as well as in two additional MAGs of Nucleocytoviricota that we characterized from the sunlit ocean (see Method section). Table 1 10,11 . These clades correspond to the newly revealed diversity of Mimiviridae relatives that have recently been included in the Imitervirales order 5 . The position of the viruses encoding viractin within the Imitervirales order was confirmed by a phylogenetic reconstruction of representative sequences using previously studied markers and datasets (Fig S1). Notably, at least two Imitervirales lineages (Yasminevirus-like and MVGL55) are enriched in viractin, suggesting a specific recruitment of actin and actin-related proteins by the viral common ancestors of these clades rather than multiple recent, independent acquisitions in a few Mimiviridae-related genomes. The viractin-encoding MVGL55 viruses were identified in lakes (Lake Lanier and Lake Michigan), oceans (Atlantic, Pacific and Arctic Oceans), and seas (North Sea and Mediterranean Sea) (Table 1, Table S1). Moreover, the two viractin-encoding Yasminevirus-like genomes were characterized from very different environments: sewage water from Jeddah, Saudi Arabia, and the Pacific Ocean.
Thus, while the isolation of Yasminevirus provides proof of the existence of viractin, environmental surveys (including the metagenomic harvest of the Tara Oceans expeditions 12 ) reveal that not one but multiple clades of Mimiviridae-related viruses, within and beyond the Klosneuvirinae, encode viractin, always in single copy.
Table 1 (caption notes). Completion was estimated using HMMs targeting eight NCLDV gene markers 8 . The identity to the human actin is given for a coverage ranging from 99% to 100% (* for viractin 04, the identity is given for a coverage of 73%). Details in Table S1.
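The identity and coverage values reported against the human actin can be illustrated with a minimal sketch. It assumes a pre-computed pairwise alignment given as two gapped strings of equal length, and it assumes one common set of definitions (identity over columns where both sequences have a residue, coverage as the fraction of reference residues aligned to a query residue). The function name and the toy sequences are hypothetical, and the alignment tool and exact definitions used for Table 1 may differ.

```python
def identity_and_coverage(aligned_query: str, aligned_ref: str) -> tuple[float, float]:
    """Return (percent identity, percent coverage of the reference).

    Assumed definitions (not taken from the paper):
      identity = identical residues / columns where both sequences have a residue
      coverage = reference residues aligned to a query residue / reference length
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")

    aligned = 0   # columns where both sequences carry a residue
    matches = 0   # aligned columns with identical residues
    ref_len = 0   # ungapped length of the reference

    for q, r in zip(aligned_query.upper(), aligned_ref.upper()):
        if r == "-":
            continue          # insertion relative to the reference
        ref_len += 1
        if q == "-":
            continue          # reference residue not covered by the query
        aligned += 1
        if q == r:
            matches += 1

    if aligned == 0 or ref_len == 0:
        return 0.0, 0.0
    return 100.0 * matches / aligned, 100.0 * aligned / ref_len


# Toy, made-up alignment (not real actin sequences).
identity, coverage = identity_and_coverage("MDDE--AL", "MDEEIAAL")
print(f"identity={identity:.1f}%  coverage={coverage:.1f}%")
```

Reporting identity together with coverage matters because a short sequence, such as viractin 04, can show a high identity over only part of the reference.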

At least four lineages of viractins in the Imitervirales order
Since viractin was previously unknown in the viral world, we hypothesized that Imitervirales containing viractin recruited this gene from their hosts. We performed a phylogenetic analysis to determine whether this recruitment occurred once or several times independently, and to identify, if possible, the original eukaryotic host(s). We included the 19 newly discovered viractins together with actin sequences and those of various clades of eukaryotic ARPs (ARP-01, also called centractin, to ARP-10). We also included all ARP sequences recently discovered in Asgard archaea (hereafter asgardactins) and a new group of ARPs, hereafter dubbed bathyactins, that we unexpectedly identified in some Bathyarchaea 14,15 (see Table S1). To our knowledge, this is the first time that such a closely related homologue of eukaryotic actin has been detected in Archaea outside the Asgard archaea. These Bathyarchaea additionally encode crenactin, a more distantly related actin-like protein encoded by most Crenarchaea, Bathyarchaea, Aigarchaea and Korarchaea (Fig S2; see Method for the selection of sequences). We used these archaeal crenactins as the outgroup for rooting (Fig 1, Fig S2).
In our phylogenetic tree (Fig 1), no sequence or clade branched close to the crenactin outgroup, suggesting that bathyactins, asgardactins, and viractins are indeed all related to the eukaryotic actins and ARPs, forming a large clade hereafter dubbed the EL-actin (Eukaryotic-Like actin) clade. Actin and each of the eukaryotic ARPs formed individual monophyletic clades that were clearly separated from each other in our tree (Fig 1). The 19 viractins were structured in four clades, viractin 01-04, corresponding to the four different groups of Imitervirales encoding viractins (see Table 1, Fig 1). Among the archaeal sequences, all bathyactins grouped in a single clade, while the asgardactins were separated into five clades (herein called asgardactin 01-05), in line with recent phylogenetic analyses 16 . Comparing the predicted structure models of one reference sequence for each of the different clades of viractins, asgardactins, bathyactins, and ARPs with the actin structure revealed that all predicted structural domains are conserved in EL-actins, indicating that these proteins could share similar biochemical functions (Fig S2).
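Whether a set of sequences forms a monophyletic clade in a rooted tree can be checked mechanically. The sketch below assumes a rooted tree encoded as nested tuples with leaf names as strings. The encoding, the function names and the toy topology are hypothetical and purely illustrative; they do not reproduce Fig 1.

```python
def leaves(node):
    """Return the set of leaf names under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def is_monophyletic(tree, taxa):
    """True if some subtree contains exactly the given taxa and nothing else."""
    target = set(taxa)

    def visit(node):
        if leaves(node) == target:
            return True
        if isinstance(node, str):
            return False
        return any(visit(child) for child in node)

    return visit(tree)


# Toy rooted tree (made-up labels), with the outgroup as the first child of the root.
toy_tree = (
    "outgroup",
    (
        ("arp_a", "arp_b"),
        ("viral_x", ("viral_y", ("actin_1", "actin_2"))),
    ),
)

print(is_monophyletic(toy_tree, {"actin_1", "actin_2"}))   # True
print(is_monophyletic(toy_tree, {"viral_x", "viral_y"}))   # False
```

In the toy tree the two viral leaves branch successively at the base of the actin pair, so they are closely related without forming a single clade of their own.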

The origin of viractins
Surprisingly, viractins did not branch within either the eukaryotic actins or any of the eukaryotic clades of ARPs, as would be expected in the case of a recent transfer from modern eukaryotes to Imitervirales. Instead, these monophyletic viractin clades branched at two positions between the different eukaryotic clades (Fig 1). Viractins 01, 02, and 03 were basal to actin, whereas the shorter viractin 04 (ca. 75% of the average length) branched at the root of a clade grouping ARP-10 and asgardactin 04-05, clearly indicating that viractins were recruited at least twice independently by different Mimiviridae-related clades. Importantly, the eukaryotic actin and all ARP clades except ARP-10 include protists and multicellular eukaryotes from different supergroups, including Amorphea, Archaeplastida, TSAR and Excavates 17 , indicating that they were most likely acquired before the emergence of modern eukaryotes and were hence already present in LECA. Consequently, most nodes at the base of each of these eukaryotic clades correspond to the relative position of LECA (Fig S2).
The basal position of the viractins could indicate that they evolved more rapidly than ARPs and actin and were artificially pulled away, by long branch attraction (LBA), from the cellular clades with which they should branch. However, the lengths of their branches were similar to those of their cellular counterparts. It hence seems more likely that they were recruited by ancient Imitervirales from proto-eukaryotes, before LECA and the diversification of modern eukaryotes. The topology of the phylogenetic tree, with viractins 01 to 03 (corresponding to different Mimiviridae-related lineages; Table 1 and Fig S1) closely related to each other but not forming a single monophyletic clade, suggests a complex evolutionary history of transfers and losses through the co-evolution of Imitervirales and their hosts. Interestingly, this topology implies that the Nucleocytoviricota were already diversified before LECA not only at the family level, as suggested previously from analyses of the DNA-dependent RNA polymerase 8 and of taxon richness and diversity 18 , but also at the subfamily level for the Mimiviridae.
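The branch-length comparison behind the LBA argument can be made explicit with a small sketch. It assumes branch lengths (for example root-to-tip distances, in substitutions per site) have already been extracted from the tree for each group. The function name, the group labels and the numbers are hypothetical and are not values from this study.

```python
from statistics import median


def branch_length_ratio(lengths_by_group, focal, reference):
    """Ratio of median branch lengths between a focal and a reference group.

    A ratio close to 1 is consistent with similar evolutionary rates; a ratio
    well above 1 would flag the focal group as a candidate for long branch
    attraction. Where to draw the line is a judgment call, not set here.
    """
    return median(lengths_by_group[focal]) / median(lengths_by_group[reference])


# Made-up branch lengths for illustration only.
toy_lengths = {
    "viral": [0.40, 0.50, 0.60],
    "cellular": [0.45, 0.50, 0.55],
}

ratio = branch_length_ratio(toy_lengths, "viral", "cellular")
print(f"median branch-length ratio (viral / cellular): {ratio:.2f}")
```

The median is used here rather than the mean so that a single unusually long branch does not dominate the comparison.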

Genome-resolved metagenomics. Two metagenome-assembled genomes (MAGs) of Mimiviridae containing a viractin were characterized from Tara Oceans metagenomes (Pacific Ocean and Mediterranean Sea) by manual binning and curation of large size fractions of surface ocean plankton (0.8-2,000 microns) 32 . Briefly, we used the anvi'o platform 33 and a co-assembly strategy followed by binning using sequence composition and differential coverage, as previously applied to a small size fraction of surface ocean plankton