Incorporation of genome-bound cellular proteins into HIV-1 particles regulates viral infection

The initial steps of the human immunodeficiency virus 1 (HIV-1) lifecycle are regulated by cellular RNA-binding proteins (RBPs). To understand the scope of these early host-virus interactions, we developed in virion RNA interactome capture (ivRIC), which allowed the comprehensive and systematic profiling of the proteins that interact with the HIV-1 genomic (g)RNA inside viral particles. ivRIC identified 104 cellular RBPs within the encapsidated HIV-1 ribonucleoprotein, many of which are typically found in the cellular nucleus. Notably, these nuclear RBPs interact with the HIV-1 RBP Rev, suggesting that they associate with HIV-1 gRNA during its nuclear life. Functional assays show that these in virion RBPs (ivRBPs) are important for HIV-1 infection, including PURA and PURB, which control viral gene expression and infectivity through interaction with critical sequences in the gRNA. Our characterisation of the composition of the encapsidated ribonucleoprotein of HIV-1 uncovers new host-virus interactions that invoke new mechanisms for controlling HIV-1 infection.


INTRODUCTION
The human immunodeficiency virus type 1 (HIV-1) RNA genome (gRNA) is reverse transcribed (RT) into DNA and integrated into the chromosome of the host cell. From this step onwards, cellular RNA-binding proteins (RBPs) become central players in the HIV-1 lifecycle, regulating or mediating virtually all molecular processes involving the viral RNA.
Transcription and processing of HIV-1 RNAs are mediated by a plethora of cellular RBPs, including the RNA polymerase II, its associated co-factors and regulatory complexes, the capping and polyadenylation machinery and the spliceosome, amongst others 1 . HIV-1 produces several viral RNA species that can be divided into singly spliced, fully spliced and unspliced (i.e. genome) RNAs 2 . Fully spliced HIV-1 RNAs are transported to the cytoplasm using the canonical mRNA export pathways involving many cellular RBPs 3 . However, unspliced and partially spliced RNAs are retained in the nucleus. The viral RBP Rev is expressed from a fully spliced RNA and imported into the nucleus, where it binds to the Rev response element (RRE) 4 present in unspliced (gRNA) and singly spliced HIV-1 RNAs. Rev then mediates the nuclear export of these RNAs through the CRM1 pathway, involving several cellular RBPs as co-factors 2 . Many RBPs are known to also contribute to the downstream steps of viral gene expression, including viral RNA stability and decay 5 , translation 6 , RNA transport, and viral particle formation 7 . Little is known about the potential involvement of cellular RBPs in subsequent processes of the HIV-1 lifecycle, such as viral particle formation and RT. However, recent discoveries suggest that RBPs may participate in these processes as well 7 .
It was originally thought that HIV-1 particles disassemble upon cell entry, releasing the genomic (g)RNA molecules into the cytoplasm for reverse transcription (RT). However, recent paradigm-shifting advances have challenged this view, showing that the capsid core remains intact during its transit through the cytosol and can be visualised traversing the nuclear pore and in the nucleoplasm 8,9 . In addition, numerous lines of evidence revealed that RT can occur inside the capsid core, protecting the viral nucleic acids from the hostile cytosol 8,10,11 . Indeed, the capsid core harbours positively charged 8 Å wide pores that allow deoxynucleotides to traverse from the cytosol to feed RT 10 . Interestingly, several individual examples of cellular RBPs regulating HIV-1 particle assembly 12 and RT [13][14][15][16][17][18][19][20] have been reported. However, the scope of cellular RBPs participating in these processes remains unknown. Moreover, we know very little about the mechanisms by which they are recruited to the gRNA and regulate viral particle formation and RT. Since the capsid core remains intact upon cell entry and RT occurs inside it, we hypothesise that relevant cellular RBPs should be packaged inside the HIV-1 particles in the producer cell and carried into the newly infected cell. In agreement, proteomic analyses of purified HIV-1 particles have revealed the presence of hundreds of RBPs within virions 7 . While exciting, these proteomic analyses were affected by biological and technical factors, including i) the uptake of a portion of the cytosol by budding particles, leading to the passive acquisition of bystander proteins; ii) the presence of extracellular vesicles with similar sizes to virions in the preparations; and iii) the lack of appropriate negative controls and/or quantitative information.
Therefore, progress in understanding the RBP composition of virions requires the development of new strategies to differentiate between passive bystanders and functionally relevant proteins.
Here, we applied a new approach to determine the complement of cellular proteins that interact with the HIV-1 gRNA inside the viral particles. We discover that the in virion packaged genomic ribonucleoprotein (ivRNP) contains over one hundred cellular proteins, many of which are nuclear despite virion assembly occurring at the plasma membrane. To test if these nuclear RBPs associate with nuclear HIV-1 ribonucleoproteins (RNPs), we elucidated the first protein interactome of Rev in infected CD4+ T lymphocytic cells. To our surprise, the ivRNP and the Rev interactome heavily overlap, suggesting a previously unknown connection between the nucleus and the formation of HIV-1 particles. Functional analysis revealed that the components of the ivRNPs play important regulatory roles in HIV-1 infection. For example, we found that PURA and PURB bind to critical regulatory elements on the HIV-1 gRNA and regulate viral gene expression and viral particle infectivity. Our study provides a new landscape of host-HIV interactions with regulatory potential.

ivRIC, a new approach to analyse the composition of the RNPs assembled into virions
Proteomic analysis of purified viral particles has revealed that hundreds of cellular proteins are incorporated into virions, including cellular RBPs 7 . Comparison of these datasets revealed sparse overlap (Figure S1A), raising questions about the biological significance of the identified proteins. The limited consistency of these studies is likely because total viral particle analysis is unable to discriminate between bystander proteins captured passively during virion assembly and RBPs actively interacting with the HIV-1 gRNA. To exclusively identify RBPs bound to gRNA in HIV-1 particles, we developed a new approach referred to here as in virion RNA interactome capture (ivRIC). In brief, viral particles are purified in a sucrose cushion, followed by protein-RNA 'zero distance' ultraviolet (UV) crosslinking, lysis under denaturing conditions and isolation of the polyadenylated HIV-1 gRNA with oligo(dT) magnetic beads via denaturing washes (Figure 1A). RBPs crosslinked to the gRNA are then released by RNase treatment and identified by quantitative proteomics. ivRIC was applied to HIV-1 mCherry-Nef particles (Figure S1B and S1C) purified from the supernatant of transfected HEK293T (Figure S1D) or infected CD4+ T lymphocytic cells (SupT1, Figure 1B). Interestingly, ivRIC showed that HIV-1 capsid (CA) is only isolated when forming part of the Gag polyprotein (also p55) and in a UV-dependent manner (Figure 1B and S1D). Free CA (also p24) was not detected in ivRIC eluates despite being very abundant (~2k copies/virion) and proximal to the ivRNP. These results can be explained by Gag's RNA-binding properties, as it is the nucleocapsid (NC) polypeptide that interacts with RNA. Upon proteolytic processing of Gag-p55 by the HIV-1 protease (PR), NC remains attached to the gRNA whereas CA and matrix (MA) are released to form the inner capsid and outer matrix lattice (Figure 1C).
In addition, the integrase (IN, also p31) was enriched in ivRIC eluates in a UV-dependent fashion, which is consistent with its reported RNA-binding activity 21 (Figure 1B). Conversely, the glycoprotein gp120 was present in inputs (i.e. viral particles) but absent in ivRIC eluates as it localises at the envelope, far away from the gRNA. Altogether, these results support the efficiency and selectivity of ivRIC at isolating viral proteins that are known to interact with the HIV-1 gRNA inside virions.
The quality of ivRIC results relies on the efficient and specific isolation of HIV-1 gRNA. We thus quality controlled our results by quantifying the amount of viral RNA in ivRIC eluates.
Notably, HIV-1 gRNA represented ~95% of the isolated RNA ( Figure 1D), which agrees well with the previously described proportion of viral versus cellular polyadenylated RNA found in purified HIV-1 particle preparations 22 . The dominance of HIV-1 gRNA in eluates supports the high efficiency and selectivity of ivRIC, while implying that viral RNA must be the major (if not the sole) contributor to the subsequent proteomic analyses.

Uncovering the composition of the ivRNP
To comprehensively and systematically discover the cellular proteins that compose the HIV-1 ivRNP, we analysed the eluates of ivRIC from CD4+ T lymphocytic cells by  Figure 1F). Importantly, 104 cellular and the 9 viral proteins (113 in total) were consistently enriched over the two controls ( Figure 1G), reflecting that these proteins i) interact with the gRNA isolated from viral particle preparations, and ii) are not derived from cellular vesicles with virus-like sizes.
We refer to these high confidence components of the ivRNP as in virion RNA-binding proteins (ivRBPs).
We next validated our proteomic results using Western blot as an orthogonal approach.
Beyond the excellent recovery of viral proteins known to interact with RNA (see above), the cellular helicase MOV10 was strongly enriched by ivRIC in a UV- and infection-dependent manner (Figure 1B), reinforcing earlier findings showing its incorporation into HIV-1 particles 20,23,24 . Our results further show that MOV10 actively engages with gRNA in the context of the ivRNP. Moreover, other ivRBPs such as PURA, CIRBP and SUB1 were also detected in ivRIC eluates by Western blotting in a UV- and infection-dependent manner (Figure 1F and Figure S2B). Collectively, these results confirm the ability of ivRIC to uncover cellular RBPs that interact with HIV-1 gRNA inside virions.
To gain functional insights into the composition of the HIV-1 ivRNP, we used available literature and gene ontology (GO) annotation. As expected from a high-quality RNA interactome, most ivRBPs were annotated with the 'RNA binding' GO term and were linked to functions associated with RNA metabolism (Figure 1H, 2A and S2C). Interestingly, ~25% of the ivRBPs also have DNA-binding activity (Figure 1H and 2A), which is potentially very significant given the RNA/DNA duality of the HIV-1 genome 10,25 . Moreover, over half of the discovered ivRBPs are annotated with GO terms associated with virus infection, and ~25% of them are also linked to HIV-1 (Figure 1H). Indeed, DHX9 16,17 , EEF1A 18 , UPF1 14 , MOV10 20,24 , PDCD6IP (ALIX) 12,26 , PACSIN2 27 and LCK 28 have been related to viral particle assembly or infectivity. Other proteins such as DDX3 29 , YBX1 30 , PURA 31 and ZC3HAV1 (ZAP) 32,33 have been associated with the HIV-1 lifecycle, but their roles within viral particles (if any) are still unknown. Moreover, dozens of ivRBPs have no prior connection with HIV-1, calling for further investigation.

The total proteome of HIV-1 particles provides new insights into the RNA-binding properties of ivRBPs
To determine the composition of the HIV-1 particles prior to gRNA isolation, we analysed the inputs of the ivRIC experiment by quantitative proteomics (Figure 1A and S1E). 187 proteins were significantly enriched over mock controls, representing the total viral particle proteome ( Figure 2B, S3A-B). In contrast to the HIV-1 ivRNP, the proteome of viral particles was enriched in GO terms and pathways related to the plasma membrane and immunological receptors, likely reflecting the envelope acquired during budding from the plasma membrane ( Figure S3C-E). 'RNA binding' was also enriched, but the odds ratio and p-value were substantially lower than those of the HIV-1 ivRNP ( Figure S3C and E).
The different prevalence of GO terms between the two datasets is expected as the ivRNP is a subset of the total HIV-1 particle.
RBPs can display different modes of RNA binding, ranging from universal to very selective, and from long-lived to transitory RNA binding. The RNA-binding properties of an RBP do not only impact its function (i.e. global vs specific and transitory vs stable) 34 , but also its ability to crosslink to RNA. Indeed, UV-induced, RNA-to-protein crosslinking is enhanced by long-lived and geometrically optimal interactions between nucleotides and amino acids 35 . To assess the binding properties of the ivRBPs, we compared the protein intensity in eluates (ivRNP) and inputs (viral particles) of the ivRIC experiment. This analysis revealed two groups of proteins ( Figure 2C): i) ivRBPs with high eluate/input protein intensity ratios (i.e. high 'crosslink-ability'), and ii) ivRBPs with low eluate/input ratio (i.e. low crosslink-ability). ivRBPs in the first group are expected to establish long-lived and geometrically optimal interactions with the gRNA, while the opposite is expected for the second group. Although strong and specific interactions are expected to be functionally relevant, there are many examples of transitory and low affinity interactions with fundamental roles in RNA metabolism 34 . Indeed, some of the ivRBPs with low crosslinkability to HIV-1 gRNA, such as PDCD6IP 12,26 , LCK 28 , DHX9 16 , have been shown to play important regulatory roles in viral particle formation and RT.
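The eluate/input comparison described above can be illustrated with a minimal Python sketch. It assumes that crosslink-ability is summarised as a log2 ratio of protein intensities and that proteins are split with a fixed cut-off; the function names, the threshold and the intensities are hypothetical and are not taken from the analysis pipeline used in this study.

```python
import math


def crosslinkability(eluate_intensity, input_intensity):
    """Return log2(eluate / input) for proteins quantified in both samples."""
    ratios = {}
    for protein, eluate in eluate_intensity.items():
        inp = input_intensity.get(protein)
        # Skip proteins for which the ratio is undefined.
        if inp is None or inp <= 0 or eluate <= 0:
            continue
        ratios[protein] = math.log2(eluate / inp)
    return ratios


def split_by_ratio(ratios, threshold):
    """Split proteins into high and low crosslink-ability groups."""
    high = sorted(p for p, r in ratios.items() if r >= threshold)
    low = sorted(p for p, r in ratios.items() if r < threshold)
    return high, low


# Hypothetical intensities in arbitrary units (not data from this study).
eluate = {"RBP_A": 800.0, "RBP_B": 50.0, "RBP_C": 400.0, "RBP_D": 10.0}
inputs = {"RBP_A": 1000.0, "RBP_B": 1600.0, "RBP_C": 400.0}

ratios = crosslinkability(eluate, inputs)
# The cut-off of -2.0 is an illustrative assumption, not a published value.
high, low = split_by_ratio(ratios, threshold=-2.0)
print(high, low)
```

In this toy example RBP_D is ignored because it has no input measurement, which mirrors the fact that a ratio can only be computed for proteins quantified in both samples.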

ivRBPs are incorporated selectively into viral particles
To discriminate between abundance-driven passive incorporation of bystander proteins and selective packaging of proteins into virions, we normalised the protein intensity in ivRIC inputs (viral particles) and eluates (ivRNP) against protein intensity in the whole CD4+ T cell proteome (WCP). Interestingly, the intensity of proteins in inputs correlated well with protein abundance (R=~0.4; Figure 2D), suggesting that protein levels do influence overall virion composition. By contrast, protein intensity correlation was substantially lower when comparing the proteins enriched in the ivRNP with the WCP (R=0.15) ( Figure 2E). Non-enriched proteins exhibited similar trends to inputs (R=0.5).
When considering the fold change instead of raw intensity, the result was even clearer, with ivRBPs displaying anticorrelation with the WCP (Figure 2F and S2D). This result indicates that the more strongly a protein is enriched in the ivRNP, the lower its abundance in the cell tends to be. Together, these data strongly favour the notion that ivRBPs are selectively incorporated into viral particles.
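The comparisons against the WCP rest on a correlation coefficient between paired per-protein values. The following Python sketch shows how a Pearson coefficient of this kind can be computed; the function name and all numbers are hypothetical and chosen only to illustrate a positive correlation (inputs versus abundance) and an anticorrelation (fold change versus abundance).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log10 values for five proteins (not data from this study).
wcp_abundance = [5.0, 6.0, 7.0, 8.0, 9.0]
input_intensity = [4.1, 4.9, 6.2, 6.8, 8.0]    # rises with cellular abundance
ivrnp_fold_change = [3.0, 2.6, 2.1, 1.4, 1.0]  # falls as abundance rises

r_input = pearson_r(wcp_abundance, input_intensity)
r_fold = pearson_r(wcp_abundance, ivrnp_fold_change)
print(round(r_input, 2), round(r_fold, 2))
```

A coefficient close to zero or below zero for the enriched proteins is what argues against purely abundance-driven incorporation.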
A remaining question is whether ivRBPs are present in sufficient quantities in ivRNPs to have functional consequences in infection. To test this, we compared the protein intensity and fold change distribution in eluates of ivRIC from mock- and HIV-infected cells.
Interestingly, many cellular ivRBPs exhibited similar intensity levels and fold changes to viral proteins with high stoichiometry in viral particles ( Figure 2G, grey vs purple). Several known regulators of HIV-1 infectivity were amongst the most prevalent ivRBPs in virions, including UPF1 14 and MOV10 23 . However, other prevalent ivRBPs such as PURA, PURB, and ZC3HAV1/ZAP have no known roles inside HIV-1 particles. The high abundance of MOV10 and PURA in particles was confirmed using Western blots ( Figure 1B). Our results are thus compatible with the notion that ivRBPs play potential roles in infection.

Elucidating the Rev interactome in infected CD4+ T cells
The presence of many nuclear RBPs (Figure 2A) in the ivRNP is surprising as HIV-1 particle assembly occurs at the plasma membrane. We hypothesised that these proteins may associate with the gRNA in the nucleus and remain bound to it until its packaging into viral particles. To test our hypothesis, we focused on Rev, a key component of the HIV-1 nuclear genomic ribonucleoprotein (gRNP) 2 . Because Rev binds to the gRNA in the nucleus, the protein interactome of Rev is expected to reflect the composition of the gRNP to a high extent.
There have been several attempts at establishing the Rev interactome, however, these approaches were limited because Rev was expressed in non-infected cells and/or using cell types that are not infected by HIV-1, leading to very poor overlap between datasets ( Figure S4A To assess if tagging affects Rev function, we generated HIV-1 particles by co-transfection of the replicons with the glycoprotein of the vesicular stomatitis virus (VSV G). Infection of SupT1 cells with these viruses led to normal levels of Gag/p55, processed CA/p24 and mCherry-Nef, when compared to the parental HIV-1 -R-E-mCherry-Nef (Figure 3B-C and S4C).
By contrast, nullification of Rev by adding a stop codon (HIV-1 -R-E-∆Rev ) led to undetectable levels of Gag/p55 (Figure 3D). Single molecule in situ RNA hybridisation (smFISH) of cells infected with HIV-1 -R-E-Rev-HaLo showed that the expression of the tagged Rev overcomes the nuclear accumulation of gRNA observed in the absence of Rev (Figure 3E). Rev exhibits a nucleolar localisation when overexpressed in non-infected cells 39 . Conversely, expression of Rev from our construct led to additional localisation in nuclear pores and nucleoplasm (Figure 3E). The presence of Rev at nuclear pores is compatible with its known role in HIV-1 RNA nuclear/cytoplasmic export 2 . Collectively, our results confirmed that Rev is functional when tagged with Flag-Myc or HaLo.
To elucidate the Rev interactome, we infected SupT1 cells with the chimeric viruses, followed by Flag immunoprecipitation (IP) at 48 hours post infection (hpi) in the presence of RNases (Figure 3F). Proteomic analysis revealed high sample correlation for Rev-Flag IPs (average Pearson correlation, R = 0.85; Figure S4D-E), and that Rev was the most enriched protein in the IP (Figure 3G and S4F). In addition to Rev, 284 cellular proteins were significantly enriched in IP eluates at 10% false discovery rate (FDR) (Figure 3G). 81% of the interactors were RBPs themselves (Figure 3H and S4G), reflecting Rev's prominent role in viral RNA metabolism. We noticed the presence of a large number of proteins from the ribosome and spliceosome in the IP eluates (Figure S4H), supporting previous findings that Rev is a regulator of HIV-1 RNA splicing and translation 2 . In addition, Rev interacts with complexes involved in chromatin function and RNA polymerase II transcription (Figure S4H), suggesting that it may engage with viral RNA co-transcriptionally.
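Selecting proteins at a given FDR requires correcting per-protein p-values for multiple testing. The text does not state which procedure was used, so the Python sketch below uses the Benjamini-Hochberg adjustment purely as an assumed, commonly used example; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four proteins (not data from this study).
pvals = [0.01, 0.04, 0.03, 0.5]
qvals = benjamini_hochberg(pvals)

# Proteins passing a 10% FDR threshold, as used for the Rev IP.
significant = [q <= 0.10 for q in qvals]
print(qvals, significant)
```

With these toy values the first three proteins pass the 10% threshold and the fourth does not.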
The previously established Rev interactomes showed very poor overlap between them, raising questions over their biological significance 2 . However, 43% of the proteins identified here were also reported in these earlier studies (Figure 3I). Our results thus reconcile the contradictions in these datasets, while still adding many additional Rev interactors. We also observed a strong overlap between our Rev interactome and the annotation in the NCBI HIV database, with 25% of the proteins being already annotated as Rev interactors and a further 42% previously linked to HIV-1 in some capacity (Figure 3J).
Altogether, these results highlight the consistency of our Rev interactome with previous data, while adding new Rev-host protein interactions with unknown roles in HIV-1 infection.

The HIV-1 ivRNP heavily overlaps with the Rev protein-protein interactome
Nuclear RBPs present in the ivRNP may associate with the HIV-1 gRNA in the nucleus of the infected cell. To determine to what extent the ivRNP resembles the nuclear gRNP, we compared the ivRIC ( Figure 1) and Rev IP results ( Figure 3). Surprisingly, 68% of the components of the ivRNP are also interactors of Rev ( Figure 4A). Indeed, we observed that the proteins identified in both the Rev interactome and the ivRNP displayed similar fold changes in both datasets, showing a striking Pearson correlation of >0.61 ( Figure 4B). This suggests that the stoichiometry of these proteins in the Rev interactome and ivRNP is likely similar. Proteins shared between these two datasets are mostly involved in RNA transcription, splicing, nuclear export, and translation ( Figure 4C). By contrast, proteins only present in the ivRNP are associated mainly with membrane biology, including membrane trafficking, endocytosis, and vesicle transport. Our results are thus compatible with two populations of ivRBPs: one that associates with the gRNA in the nucleus and remains associated with it, and another that engages with the gRNA later at the plasma membrane. By contrast, proteins specific to the Rev interactome are nuclear and are broadly involved in chromosome biology, DNA damage, Tat-mediated transcription, rRNA processing and 3' to 5' RNA decay amongst others ( Figure 4C). Interestingly, we also observed that the previously established Gag 40 and Staufen 41 interactomes (proxies for the cytoplasmic gRNPs) overlap to some extent with the ivRNP and the Rev interactome ( Figure S5A). Moreover, several ivRBPs and Rev interactors were previously found associated with HIV-1 RNA in infected cells ( Figure S5B The observed binding of ivRBPs and Rev interactors to the RNAs of different virus species, together with their potential master-regulatory roles in infection, could be derived from low binding specificity that enables them to engage with a wide range of RNA sequences.
To test this, we examined the RNA-binding preferences of all the RBPs in these two datasets for which enhanced crosslinking and immunoprecipitation (eCLIP) sequencing data are available in ENCODE 46 . Most of the proteins tested exhibited sequence-specificity as well as preferences for defined regions on cellular RNAs (Figure 4F and S5E-F). Despite a large overlap between the ivRNP and the Rev interactome, we observed some differences in the overall binding preferences, with 3' UTRs and coding sequences (CDS) being predominant binding sites for ivRBPs, and introns and 5' UTRs for Rev interactors (Figure 4F). This divergence can be explained by the fact that the Rev interactome is substantially larger and includes additional RBPs, many of which are involved in nuclear processes (Figure 4A and C). Our results therefore revealed binding preferences across the ivRNP and the Rev interactome components that are compatible with specific RNA binding.
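The overlap between the ivRNP and the Rev interactome is, in essence, a set comparison in which the ivRNP is the reference population. A minimal Python sketch of that comparison follows. The toy sets use only the four candidates that this manuscript assigns to each dataset (PURA, PURB and FAM120A in both, MSN in the ivRNP only) plus one hypothetical placeholder, so the resulting fraction is illustrative and unrelated to the 68% reported for the full datasets.

```python
def overlap_fraction(query, reference):
    """Fraction of proteins in `query` that are also present in `reference`."""
    query, reference = set(query), set(reference)
    if not query:
        raise ValueError("query set is empty")
    return len(query & reference) / len(query)


# Toy subsets for illustration only; REV_ONLY_X is a hypothetical placeholder.
ivrnp = {"PURA", "PURB", "FAM120A", "MSN"}
rev_interactome = {"PURA", "PURB", "FAM120A", "REV_ONLY_X"}

shared = sorted(ivrnp & rev_interactome)
ivrnp_only = sorted(ivrnp - rev_interactome)
fraction = overlap_fraction(ivrnp, rev_interactome)
print(shared, ivrnp_only, fraction)
```

Note that the fraction is not symmetric: dividing by the size of the larger Rev interactome instead would give a smaller value for the same set of shared proteins.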

PURA and PURB are regulators of HIV-1 infection
To test if ivRBPs and Rev interactors are functionally relevant to HIV-1 infection, we generated knockout (KO) CD4+ T lymphocytic cells for several candidates. These included the transcriptional activator protein Pur-alpha (PURA) ( Figure 1B), its homologue Pur-beta (PURB) and FAM120A that were identified by both ivRIC and Rev protein-protein interaction analysis. We also selected moesin (MSN) that was present only in the ivRNP, as expected from a membrane-associated protein. KOs were generated using To assess if viral production is affected by the lack of these ivRBPs, the supernatant of KO and WT cells was collected and analysed by RT-qPCR ( Figure 5A  The identification of PURA, PURB, MSN and FAM120A in the ivRNP suggests that they may also play a role in virus infectivity. To assess this, we infected WT cells with the same number of viral particles generated in WT or KO cells, using the RT-qPCR data for normalisation ( Figure 5A and D). Viruses produced in MSN KO cells exhibited no differences in infectivity when compared to those generated in WT cells, while a slight increase was observed for FAM120A KO samples ( Figure 5D). By contrast, viruses generated in PURA and PURB KO cells displayed a substantial decrease in HIV-1 positive cells, with PURA-deficient viruses having the lowest infectivity ( Figure 5E). Altogether, our results reveal that all tested ivRBPs participate in HIV-1 gene expression, while PURA and PURB have additional roles in the infectivity of viral particles.

The PURA and PURB interactome in HIV-1 infected CD4+ T lymphocytic cells
To further characterise PURA and PURB roles in infection, we carried out an IP and proteomics experiment to determine their protein-protein interactomes. To achieve this in a This included a few ivRBPs such as MOV10, that displayed a consistent decrease in PURA/B-binding upon infection. Altogether, these results highlight the engagement of PURA and PURB with a wide range of cellular machineries that are important for the life cycle of HIV-1 RNAs. Moreover, they also interact with Gag and Gag-Pol components that are essential for the formation of HIV-1 particles and early steps of viral infection.

PURA and PURB bind to key regulatory regions in cellular mRNAs
We next aimed to elucidate the RNA-binding preferences of PURA and PURB. For this, we applied iCLIP2 48 to HIV-1 infected and uninfected PURA-eGFP and PURB-eGFP Jurkat cells ( Figure S7C). iCLIP2 employs UV cross-linking, RNase treatment, IP, cDNA library generation and sequencing to determine the footprints of RBPs on their target RNAs in a genome-wide manner and with single nucleotide resolution. Both PURA-eGFP and PURB-eGFP were strongly enriched in IP eluates ( Figure S8A and B). Moreover, ligation of a fluorescently labelled DNA adapter to the co-precipitated RNA revealed a high molecular weight smear that is compatible with PURA/B-RNA complexes ( Figure S8C). Upon cDNA generation and PCR amplification, the sequencing of the libraries revealed that PURA and PURB interact predominantly with protein coding cellular RNAs (i.e. mRNAs), with a bias towards coding sequences (CDS) followed by 3' UTRs ( Figure 6A and B and S8D-E).
Binding sites in CDS, 5' UTRs and introns displayed intriguing distributions. 5' UTR binding sites were prominent at the beginning (near the cap) and end (translation start codon) (Figure 6C). Similarly, the binding site density at CDS displayed two notable peaks near the translation start and stop codons (Figure 6C). The binding sites in introns also had a dual distribution ramping up at the 5' and 3' splice sites (Figure 6D). No bias was observed for 3' UTRs. The striking binding site accumulation at the beginning and end of 5' UTRs, CDS and introns overlapped or was proximal to key regulatory elements such as the cap, the start and stop codons and the splice sites, suggesting that PURA and PURB may play regulatory roles in translation and splicing. Involvement in these processes is also supported by the high incidence of components of the translation and splicing apparatus in the PURA and PURB protein-protein interactomes (Figure 5I). It is also noticeable that both PURA and PURB datasets grouped together in the principal component analysis, with HIV-1 and mock treatment and not the protein identity as the main differential factor (Figure S8D). Moreover, 81% of the identified RNAs are targets for both proteins, displaying near identical binding patterns across their mRNA targets and a high degree of overlap of their binding sites (Figure 6A-F and S8F). These results reinforce the idea that PURA and PURB play cooperative or redundant roles in RNA metabolism.
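Positional distributions of this kind are typically obtained by rescaling each binding site to its relative position within the host region and counting sites per bin. The Python sketch below illustrates the idea under simplifying assumptions: coordinates are 0-based with an exclusive end, regions are already oriented 5' to 3', and the number of bins is arbitrary. The function name and the coordinates are hypothetical and do not reproduce the pipeline used here.

```python
def relative_position_histogram(sites, n_bins=10):
    """Count binding sites per relative-position bin along their region.

    `sites` is an iterable of (site_position, region_start, region_end)
    tuples, with 0-based coordinates and an exclusive region end.
    """
    counts = [0] * n_bins
    for pos, start, end in sites:
        length = end - start
        # Ignore malformed regions and sites falling outside their region.
        if length <= 0 or not (start <= pos < end):
            continue
        rel = (pos - start) / length  # relative position in [0, 1)
        counts[min(int(rel * n_bins), n_bins - 1)] += 1
    return counts


# Hypothetical binding sites within three regions (not data from this study).
sites = [
    (5, 0, 100),      # near the start of region 1
    (95, 0, 100),     # near the end of region 1
    (210, 200, 300),  # near the start of region 2
    (250, 200, 300),  # middle of region 2
    (399, 300, 400),  # last nucleotide of region 3
]
counts = relative_position_histogram(sites, n_bins=4)
print(counts)
```

An accumulation of counts in the first and last bins, as in this toy example, corresponds to the enrichment at both ends of a region described above.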
To test if PURA and PURB recognise specific sequences, we examined the binding sites for enriched motifs. We found two prominent classes, a purine-rich motif that correlates well with the known binding preferences of PURA 49 , and a U-rich motif that was not previously described (Figure 6G and S8G). Detailed analysis of the occurrence of these motifs revealed that the U-rich motif precedes the AG-rich motif (Figure S8H). These motifs are highly similar to the canonical 5' and 3' splice sites 50 , which explains the distribution of PURA and PURB at the two ends of introns (Figure 6D). To rule out that these motifs are an artifact derived from the proximity of PURA/B binding sites to splicing junctions, we searched again for motifs after removal of all intronic binding sites. In these conditions a single motif combining the U-rich sequence followed by the AG-rich sequence was observed (Figure 6G and S8H). Altogether, these data suggest that PURA and PURB interact with a bi-partite motif composed of a U-rich tract followed by an AG-rich sequence that is present at key regulatory elements within mRNAs.
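The arrangement of the bi-partite motif, a U-rich tract followed by an AG-rich sequence, can be illustrated with a simple pattern search. The Python sketch below is only a toy scanner: the minimum tract lengths and the maximum spacer are hypothetical parameters chosen for illustration, since the actual motifs were derived from enrichment analysis and not from a fixed pattern.

```python
import re


def find_bipartite_sites(sequence, min_u=4, min_ag=4, max_gap=3):
    """Return (start, end) spans of a U-rich tract followed by an AG-rich tract.

    The tract lengths and spacer are illustrative assumptions. DNA input
    is accepted and converted to RNA before scanning.
    """
    rna = sequence.upper().replace("T", "U")
    pattern = re.compile(
        rf"U{{{min_u},}}[ACGU]{{0,{max_gap}}}?[AG]{{{min_ag},}}"
    )
    return [match.span() for match in pattern.finditer(rna)]


# Hypothetical sequences (not taken from the HIV-1 genome or any transcript).
forward = "CCUUUUUAGGAGACC"  # U-rich tract, then AG-rich tract
reverse = "CCAGGAGAUUUUUCC"  # AG-rich tract, then U-rich tract

print(find_bipartite_sites(forward))
print(find_bipartite_sites(reverse))
```

Only the first sequence is reported, because the pattern requires the U-rich tract to precede the AG-rich one.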

PURA and PURB bind to functionally important sequences within the HIV-1 gRNA
To determine if PURA and PURB interact with HIV-1 RNA, we searched for the binding sites mapping to the NL4.3 HIV-1 genome. Between 5% and 20% of the reads in HIV-1 infected cells mapped to the HIV-1 genome, which was substantially higher than in controls (size matched inputs, SMI, Figure S8I). There was a high density of binding sites distributed across the gRNA with several high confidence binding sites located at important regulatory elements (Figure 6H). We observed a high density of reads at most canonical splice sites and splicing regulatory sequences in the gRNA (Figure S9), implying a potential involvement of PURA and PURB in splicing regulation. Binding sites in splicing junctions contained, in most cases, the AG-rich and/or U-rich motifs (Figure S9). Moreover, a prominent PURA/B binding site was placed just before the slippery sequence that causes the Gag/Pol frameshift. This site contained two repeats of the AG- and U-rich motifs, which is compatible with the binding of more than one molecule of PURA/B (Figure 6I). However, we did not observe a substantial difference in Gag/Pol ratio in PURA and PURB KO cells, suggesting that these RBPs might not regulate frameshift efficiency (Figure S6G). While PURA was initially related to Tat-dependent regulation of HIV-1 transcription, we did not observe binding sites within the TAR structure that is the substrate for Tat (Figure 6J).
PURA and PURB displayed nearly identical binding distribution across the gRNA, with the notable exception of the binding of PURB to the RRE, which was not as prominent for PURA (Figure 6H). Another binding site downstream of the RRE and overlapping with several 3' splice sites was detected for both PURA and PURB. Binding to the RRE and its proximities is compatible with the efficient co-purification of these proteins in the Rev IP (Figure 3). Altogether, these results confirm that PURA and PURB interact with the HIV-1 gRNA at multiple key regulatory sequences, although the exact roles of these binding preferences in viral gene expression and RT deserve further characterisation.

DISCUSSION
Many approaches have been previously undertaken to reveal the composition of viral particles 7 . However, the fact that virions assemble in the crowded cellular environment allows the passive incorporation of cellular proteins with no roles in the viral life cycle.
Since viral particle proteomes are usually large and prone to false positives, the distinction between bystander and functionally relevant host proteins becomes a challenge. This is clearly shown in our study, as the HIV-1 particle proteome (ivRIC input) correlates well with protein abundance. By contrast, enrichment of the ivRNP by ivRIC allows the identification of proteins directly interacting with the viral RNA inside virions. Since the ivRNP components display a poor correlation with protein abundance, the incorporation of these proteins into virions is likely to occur through active mechanisms. The high performance of ivRIC is supported by the high number of previously established regulators of HIV-1 particle formation and infectivity that are top candidates in our datasets (e.g. are captured and retained in the HIV-1 gRNP and whether they play a role prior to or after virus assembly requires a case-by-case study. In the present work we show that the ivRBP MSN has no effect on virus infectivity but regulates viral gene expression and/or viral particle formation in the producer cell ( Figure 5). However, other ivRBPs such as UPF1 14 , EEF1A 15,19 , MOV10 23,24 , APOBEC 13 , and, as shown here, PURA and PURB ( Figure 5) aid or suppress virus infectivity. In the future, ivRIC can be expanded to different producer cells, experimental conditions, and viruses to provide new insights into the cellular proteins that regulate the biology of viral particles. We applied here ivRIC to a virus with a polyadenylated genome; however, it can potentially be adapted to viruses lacking a poly(A) tail by using specific antisense probe sets or total RNA purification methods 44 .
In the early 2000s, PURA was initially linked to Tat-dependent transcription using LTR-controlled plasmids 31 . However, these early publications were refuted by subsequent studies, as reviewed in detail elsewhere 51 . The function of PURA and PURB is not restricted to the producer cell but extends to the infectivity of the viral particles. The ivRBP EEF1A was proposed to modulate reverse transcription through its interaction with the RT 15 . PURA and PURB also interact with RT, the RNase H domain (p15) and IN. PURA and PURB could therefore contribute to RT directly, through their interaction with the RT, or indirectly, through their activity on the gRNA. The fact that PURA is able to interact with both RNA and DNA 31 is compatible with a regulatory role that also involves the proviral DNA. IN has been proposed to play roles that span different stages of viral infection, particularly integration and Tat-dependent transcription 52 . Our results suggest that the regulatory roles of cellular proteins such as PURA and PURB may also span different stages of HIV-1 infection, affecting HIV-1 gene expression and viral particle infectivity.

Contact for reagent and resource sharing
Further information and requests for resources and reagents should be directed to the Lead Contact, Alfredo Castello (alfredo.Castello@glasgow.ac.uk).

Cell culture
The

Viruses
Full-length, infectious HIV-1 mCherry-Nef was obtained by transfection of the pNL4.3-mCherry-T2A-Nef plasmid into HEK293T cells. For ivRIC, this primary stock was used to infect SupT1 cells so that the viral particles recovered for purification and UV crosslinking were produced in CD4+ T lymphocytic cells. The different pseudotyped HIV-1 viruses were generated by cotransfection of HEK293T cells with pNL4-3.R-E-derived plasmids (Key Resources Table) and a plasmid encoding the vesicular stomatitis virus glycoprotein (pHEF-VSVG, NIH AIDS Reagent Program, #4693). Transfection and virus production methods are described below.

Plasmids and recombinant DNA procedures
To generate a stable mCherry-expressing, replication-competent HIV-1 (named HIV-1 mCherry-Nef), we followed a strategy analogous to Edmonds et al. 53 . A synthetic DNA sequence containing mCherry followed by a T2A peptide coding sequence flanked by HIV-1 sequences was generated (GeneArt, Thermo Fisher Scientific). HIV-1 flanking sequences were derived from clone pNL4.3, starting at the unique BamHI site and extending to the end of the Env gene. A short DNA linker was introduced at the end of Env, including a consensus Kozak sequence before the mCherry start codon. At the end of the mCherry-T2A fusion, a unique XbaI site was introduced before continuing in frame with the start codon of Nef, finishing at the unique XhoI site within Nef. The resulting construct was cloned into pNL4.3 using the BamHI and XhoI unique restriction sites, generating pNL4.3-mCherry-T2A-Nef.
First, pNL4.3.R-E-mCherry-T2A-Nef was digested with BamHI-HF and EcoRI to generate a 2.7 kb template containing the Rev ORF for subsequent PCRs. To generate the T5974C point mutation, PCR was performed first with primers T5974C_Primer_A and T5974C_Primer_D and then with primers T5974C_Primer_B and T5974C_Primer_C. PCR products were extracted from agarose gel, isolating 2 fragments encoding the point mutation. PCR was next carried out without primers or additional template DNA to anneal the extracted fragments. After 7 cycles, T5974C_Primer_A and T5974C_Primer_B were added and the remaining cycles were completed to produce a contiguous fragment harbouring the T5974C mutation. PCR products were gel-purified and ligated into the restricted pNL4.3.R-E-mCherry-T2A-Nef backbone. This process was then repeated with primers targeting the T6041A point mutation.
Plasmids encoding RBP-tagged proteins for the generation of inducible cell lines were cloned by 1) amplification of the protein with specific primers from SupT1 cDNA or available template plasmids, and 2) insertion into a pcDNA5/FRT/TO-eGFP-linker vector containing the eGFP tag before (N-terminus tagging) the multicloning site. eGFP was followed by the flexible linker GGSGGSGG.
Single-guide (sg)RNA expression plasmids for CRISPR/Cas9-mediated knock out were generated by inserting annealed oligos into the BLADE plasmid as described in 55 . All plasmids were validated by sequencing.

Virus titration by RT-qPCR
The number of viral particles employed in ivRIC was calculated as follows. 100 µl of Samples were thawed at room temperature, and 10% of the sample was taken as input. For conventional protein analyses, ivRIC eluates were concentrated on an Amicon Ultra-0.5 centrifugal filter unit with a 3 kDa cut-off (Millipore, #UFC500324) by centrifugation for 20 min at 18000g and 4°C. Filters were washed with 500 µl of 40 mM NaCl followed by centrifugation for 45 min at 18000g and 4°C. Proteins were recovered by placing the filter unit upside down in a new tube and spinning for 2 min at 1000g. Finally, RNA was digested with RNases T1 and A as above.

Conventional protein analyses
Samples were mixed with NuPAGE LDS Sample Buffer 4X (Thermo Fisher Scientific, #NP0008), incubated for 10 min at 70°C, resolved by SDS-PAGE and analyzed by 1) Western Blot using specific antibodies (Key Resources Table), LI-COR Odyssey Fc imaging system for visualization and the Image Studio Software for quantification, or 2) silver staining using the SilverQuest kit (Invitrogen, #LC6070). Next, cells were incubated in a wet chamber for 16 h at 37°C with 125 nM HIV-1 gRNA-specific Stellaris probes (LGC Biosearch Technologies) in hybridization buffer (2x SSC, 10% deionized formamide and 10% dextran sulfate in DEPC water). Cells were subsequently washed twice with pre-hybridization buffer for 10 min at 37°C and incubated with DAPI and mounted as above. In both cases, images were acquired on an API DeltaVision Elite widefield fluorescence microscope using a 100X oil UPlanSApo objective (1.4 NA) and deconvolved with SoftWoRx v6.5.2 (GE Healthcare).

Generation of knock-out SupT1 cells
To generate KO cells, we produced nanoblades loaded with Cas9 and specific sgRNAs targeting two regions for each gene as described before 47 . We then transduced 1x10 5

Analysis of cell viability and proliferation
To evaluate cell growth and viability, WT and KO cells were seeded at a concentration of

Analysis of HIV fitness
To assess if knocking out a protein affects virus gene expression and infectivity, we

Determination of PURA and PURB binding sites on target RNAs by iCLIP2
The original iCLIP2 protocol 48  50% IP library pool, 37.5% SMI library pool, and 12.5% negative control eGFP.

PROTEOMIC AND TRANSCRIPTOMIC ANALYSES
Sample preparation for mass spectrometry
ivRIC eluates and PURA-eGFP and PURB-eGFP IPs were processed by single-pot, solid-phase-enhanced sample preparation (SP3), and peptides were analysed by LC-MS/MS as described in detail in 61 .

LC-MS/MS analysis
Tryptic peptides were analysed using an Ultimate 3000 ultra-HPLC system (Thermo Fisher Scientific) connected to a Q-Exactive mass spectrometer (Thermo Fisher Scientific) through an EASY-Spray nano-electrospray ion source (Thermo Fisher Scientific). The peptides were first trapped on a C18 PepMap100 pre-column (300 µm inner diameter x 5 mm, 100 Å, Thermo Fisher Scientific) in solvent A (0.1% FA in water), and were then separated on the analytical column (

Quantitative analysis of proteomics data
The proteinGroups files of the MaxQuant search results were imported into RStudio (R Project) for further processing. Protein intensities were log2-transformed. In each dataset, proteins with fewer than 2 valid intensity measurements across experimental conditions were removed prior to downstream analysis. Batch effects in each comparison were assessed using principal component analysis (PCA).
Normalisation was applied to ivRIC inputs and PPI analyses using the variance stabilisation normalisation (vsn) method 66 . Missing values were imputed with the deterministic minimum method (R package imputeLCMD, version 2.0; https://CRAN.R-project.org/package=imputeLCMD) using the 1% quantile of global intensities. All missing values were imputed in ivRIC and Rev interactome data. For PURA/PURB interactome data, only proteins with all values missing in one condition were imputed as described in 67 . Linear modelling and Bayesian model-based moderated t-tests were performed using the R package limma 68 . Batch effects in ivRIC eluates and in the Rev and PURA/B interactomes were accounted for by including them in the model using the "block" argument provided in limma. P values obtained from the moderated t-test were adjusted for multiple testing using the Benjamini-Hochberg method.
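The filtering, imputation and multiple-testing steps described above can be illustrated with a minimal Python sketch. It is not the authors' R pipeline: vsn normalisation and the limma moderated t-test are not reproduced, the function names are hypothetical, and the imputation simply fills every missing value with a global low quantile, which is an assumption about how the "1% quantile of global intensities" is applied.

```python
import math

def log2_intensities(row):
    """Log2-transform raw intensities; zeros and None are treated as missing."""
    return [math.log2(v) if v else None for v in row]

def filter_valid(table, min_valid=2):
    """Keep proteins with at least `min_valid` measured (non-missing) values."""
    return {protein: row for protein, row in table.items()
            if sum(v is not None for v in row) >= min_valid}

def quantile(values, q):
    """Quantile of a non-empty list using linear interpolation."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def impute_min_det(table, q=0.01):
    """Replace missing values with the q-quantile of all observed values
    (assumption: one global fill value, per the text's 'global intensities')."""
    observed = [v for row in table.values() for v in row if v is not None]
    fill = quantile(observed, q)
    return {protein: [fill if v is None else v for v in row]
            for protein, row in table.items()}

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # rank of this p-value in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch, a table is a plain dictionary mapping a protein identifier to a list of per-sample intensities, so the three steps can be chained as `impute_min_det(filter_valid(...))` before any statistical test is applied.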

iCLIP2 data processing
The raw FASTQ files were demultiplexed according to the sample barcode using Je Suite The GRCh38 and HIV-1 genomic annotations were pre-processed to generate sliding windows (50nt window, 20nt step size) using HTSeq-clip 70 . Crosslink truncation sites (position -1 relative to the 5' end of the read start) were extracted using BEDTools 71 and quantified against the sliding windows using HTSeq-clip. For peak calling, an R/Bioconductor package, DEW-Seq, was used to identify significantly enriched sliding windows in PURA/PURB immunoprecipitated samples over the corresponding size-matched input control samples (log2FoldChange > 2 and p.adj < 0.01) 70 . The Independent Hypothesis Weighting (IHW) method was used for the multiple hypothesis correction 72 . To remove background signal resulting from non-specific binding of RNA to GFP, significantly enriched sliding windows (log2FoldChange > 2 and p.adj < 0.01) from GFP-immunoprecipitated control samples were removed. Overlapping significant sliding windows were merged to binding regions, and these sites were curated to 8nt long binding sites based on peak width and maxima. Binding sites were queried against the genome annotation (ENSEMBL release 104) using the GenomicRanges R package to extract overlaps with genes and transcript features 73  (https://sourceforge.net/projects/bbmap/), then mapped to the custom transcriptome using STAR. Peak-calling was performed as described above using DEW-Seq except for using smaller sliding windows (10nt window, 2nt step size) and a more lenient threshold (log2FoldChange > 1.5 and p.adj < 0.01).
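The windowing logic described above can be sketched in a few lines of Python. This is an illustration only and does not reproduce HTSeq-clip, BEDTools or DEW-Seq: the function names are hypothetical, coordinates are assumed to be 0-based and half-open (BED-like), the handling of the last window of a feature is an assumption, and strand-specific counting is omitted.

```python
def sliding_windows(start, end, width=50, step=20):
    """Tile the half-open interval [start, end) with overlapping windows.
    The final window is truncated at `end` (an assumption of this sketch)."""
    windows = []
    pos = start
    while pos < end:
        windows.append((pos, min(pos + width, end)))
        if pos + width >= end:
            break
        pos += step
    return windows

def crosslink_site(read_start, read_end, strand):
    """Position -1 relative to the 5' end of a read (0-based, half-open read)."""
    if strand == "+":
        return read_start - 1
    if strand == "-":
        return read_end  # first base upstream of the 5' end on the minus strand
    raise ValueError("strand must be '+' or '-'")

def count_sites(windows, sites):
    """Count crosslink sites falling inside each window."""
    return [sum(s <= x < e for x in sites) for s, e in windows]

def is_enriched(log2_fold_change, p_adj, min_lfc=2.0, max_p_adj=0.01):
    """Thresholds quoted in the text: log2FoldChange > 2 and p.adj < 0.01."""
    return log2_fold_change > min_lfc and p_adj < max_p_adj

def merge_overlapping(windows):
    """Merge overlapping windows into contiguous binding regions."""
    merged = []
    for s, e in sorted(windows):
        if merged and s < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

For the smaller windows used on the custom transcriptome, the same helper would be called with `width=10, step=2` and `is_enriched` with `min_lfc=1.5`.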
For the comparison of different iCLIP libraries, a principal component analysis was initially performed. First, the libraries were size-corrected and transformed using the variance stabilisation method implemented in DESeq2. Then, the 1,000 sliding windows showing the largest variance were used to perform the dimensionality reduction using the prcomp function implemented in base R. To assess the spatial correlation of PURA and PURB binding sites, the relative distance metric was calculated using the BEDTools reldist function.
The same density of PURA and PURB binding sites was randomly simulated across the genome to compare with the observed nearest neighbour distance.
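As a rough illustration of the relative distance metric and of the random control, the sketch below treats each binding site as a single position. It is a simplified stand-in for BEDTools reldist rather than a reimplementation; the function names, the use of point positions and the uniform random placement are all assumptions.

```python
import random
from bisect import bisect_right

def relative_distances(query, reference):
    """For each query position flanked by two reference positions, return
    min(distance to either flank) / (distance between the flanks).
    Values lie in [0, 0.5]; unflanked queries are skipped."""
    ref = sorted(set(reference))
    result = []
    for q in query:
        i = bisect_right(ref, q)
        if i == 0 or i == len(ref):
            continue  # no reference site on one side
        left, right = ref[i - 1], ref[i]
        result.append(min(q - left, right - q) / (right - left))
    return result

def random_sites(n_sites, genome_length, seed=0):
    """Place n_sites distinct positions uniformly at random (simulated control)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(genome_length), n_sites))
```

Under this metric, two unrelated sets of sites give relative distances spread roughly evenly between 0 and 0.5, whereas spatially associated sets are enriched near 0, which is why the observed distribution is compared with a simulation of the same site density.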

Motif analysis in PURA and PURB binding sites
Sequences for motif enrichment analysis were defined for each binding site as a 70-nucleotide region centred on the peak in BigWig signal. The sequence was extracted directly from the UCSC genome hg38 when intron sequence was included in the analysis. When introns were excluded, the sequence was extracted from the CDS sequence of the longest isoform of the gene in which the binding site occurred. For each binding site, a gene- and gene region-matched background sequence was defined to allow for differential enrichment. Enrichment analysis was performed using HOMER. Motifs were processed and plotted using the R packages universalmotif and ggseqlogo.
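The definition of a 70-nucleotide region centred on a peak can be made concrete with a short Python sketch. The function names are hypothetical, coordinates are assumed to be 0-based and half-open, and shifting the window inwards at sequence boundaries (instead of truncating it) is a choice made here, not something stated in the text.

```python
def centred_window(peak, width=70, seq_len=None):
    """Return (start, end) of a `width`-nt window centred on `peak`.
    Near a sequence boundary the window is shifted inwards to keep its width."""
    start = peak - width // 2
    end = start + width
    if start < 0:
        start, end = 0, width
    if seq_len is not None and end > seq_len:
        start, end = max(0, seq_len - width), seq_len
    return start, end

def extract_region(sequence, peak, width=70):
    """Slice the window around `peak` out of a sequence string."""
    start, end = centred_window(peak, width, len(sequence))
    return sequence[start:end]
```

The same helper could be applied to either a genomic sequence or the CDS of the longest isoform, provided the peak position is expressed in the coordinate system of the sequence being sliced.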

Mapping and comparing gene IDs
For all dataset analyses, including mass spectrometry data, genes were mapped to HUGO Gene Nomenclature Committee (HGNC) IDs using the R package biomaRt. Genes that could not be mapped automatically were curated manually, and were removed if they were pseudogenes or if their HGNC entry was missing. For HIV-1 analysis, the HIV-1 NCBI database was downloaded from the NCBI web server and parsed to gene IDs following the same strategy. UpSet plots and Euler diagrams were generated using the R packages UpSetR and venneuler, respectively.

Classification of RNA-binding proteins
We classified proteins as RBPs if they were identified in at least 3 independent RNA interactome studies based on the EMBL RBPbase (https://rbpbase.shiny.embl.de/) resource.

Key Resources Table (fragment)
IRDye 680RD Donkey anti-Rabbit IgG Secondary Antibody, Li-Cor, Cat# 926-68073
IRDye 800CW Donkey anti-Rabbit IgG Secondary Antibody, Li-Cor, Cat# 926-32213

Figure legends (fragments)
We consider as a 'master regulatory' protein any candidate whose perturbation causes phenotypes in viruses from three or more families. F) Analysis of the binding preferences of ivRBPs and Rev interactors using the eCLIP database (ENCODE) 46 . The odds ratio of binding sites in 5' UTRs, CDSs, introns and 3' UTRs was estimated for each protein in the ivRNP and Rev interactome, and the overall preferences of each protein group were represented as a bar plot.
The significant (p<0.01) binding sites are indicated with a grey box, while the AG- and U-rich motifs are indicated with yellow and red boxes, respectively. The secondary structure of these RNA elements is shown using SHAPE data, indicating the fold change in the IP over the SMI control at each nucleotide position.