Identifying HIV-1 RNA splice variant protein interactomes using HyPR-MSSV

HIV-1 generates unspliced (US), partially spliced (PS), and completely spliced (CS) classes of RNAs, each playing distinct roles in viral replication. Elucidating their host protein “interactomes” is crucial to understanding virus-host interplay. Here, we present HyPR-MSSV for isolation of US, PS, and CS transcripts from a single population of infected CD4+ T-cells and mass spectrometric identification of their in vivo protein interactomes. Analysis revealed 212 proteins differentially associated with the unique RNA classes, including preferential association of regulators of RNA stability with US- and PS-transcripts and, unexpectedly, mitochondria-linked proteins with US-transcripts. Remarkably, >80 of these factors screened by siRNA knock-down impacted HIV-1 gene expression. Fluorescence microscopy confirmed that several co-localize with HIV-1 US RNA and exhibit changes in abundance and/or localization over the course of infection. This study validates HyPR-MSSV for discovery of viral splice variant protein interactomes and provides an unprecedented resource of factors and pathways likely important to HIV-1 replication.

S1). Cell lysates were first depleted of the US HIV RNA through hybridization to the intron-1 CO, followed by its capture with streptavidin-coated magnetic beads, and subsequent release using toehold-

subsequently repeated iteratively using first the intron-2 CO and then the 3'-exon CO for isolation of

Heat map depicts relative intensities for each of the 212 proteins (rows) in each of the three biological replicates of the US, PS, and CS (columns) differential interactomes. B. Condensed list of gene ontology biological process or cellular component terms enriched in each of the HIV splice variant interactomes. The "protein #" column indicates the number of proteins in the interactome that are annotated with the biological process indicated. The "p-value" column indicates the likelihood that the proteins of the biological process are present in each interactome by random chance; p-values were provided by GO term enrichment software (Mi et al., 2017). A lower p-value suggests non-random over-representation of a biological process. "NE" = Not Enriched. C. Venn Diagrams of proteins annotated for biological processes or cellular components enriched in the splice variant differential interactomes.

genes (Figure 3C; Table S7). The KD of 33 host proteins affected the expression of US and CS protein products in the same direction (either both increased or both decreased) and with approximately the same magnitude. By comparing mCherry:CFP fluorescence ratios for each protein KD to the negative control, we determined that CS and US protein expression were differentially affected by KD of 51 host proteins: for 26 of the proteins the expression changes were in the same direction but with different magnitudes; for 18 only the expression of the US RNA protein product was affected; and for 7 only the expression of the CS RNA protein product was affected (Figure 3C, Table S7).
Based on the direction of the changes in HIV-1 gene expression (increased or decreased), we categorized 71 host genes as putative "negative" effectors and 15 as putative "positive" effectors (Figure 3C, Table S7). Interestingly, of the 16 negative effectors, 10 were implicated in mitochondria-associated pathways based on GO analysis; of those ten, nine were identified by HyPR-MSSV to preferentially interact with the US HIV RNA (Tables S4 and S7).
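The grouping of knockdown outcomes described above can be sketched in code. The following Python sketch is illustrative only: the record fields, the use of log2 fold changes, the significance flags, and the gene names are assumptions introduced here and are not the authors' analysis pipeline. The "opposite directions" and "no significant change" labels are catch-all categories that the text does not name.

```python
from dataclasses import dataclass


@dataclass
class KnockdownResult:
    gene: str                # hypothetical gene label
    us_log2fc: float         # log2 fold change of the US protein product vs. negative control
    cs_log2fc: float         # log2 fold change of the CS protein product vs. negative control
    us_significant: bool     # US change is statistically significant (assumed input)
    cs_significant: bool     # CS change is statistically significant (assumed input)
    ratio_significant: bool  # US:CS ratio differs from the negative control (assumed input)


def classify(result: KnockdownResult) -> str:
    """Assign a knockdown result to one of the outcome groups described in the text."""
    if not (result.us_significant or result.cs_significant):
        return "no significant change"
    if result.us_significant and not result.cs_significant:
        return "US only"
    if result.cs_significant and not result.us_significant:
        return "CS only"
    # Both products changed: compare direction first, then relative magnitude.
    same_direction = (result.us_log2fc > 0) == (result.cs_log2fc > 0)
    if not same_direction:
        return "opposite directions"
    if result.ratio_significant:
        return "same direction, different magnitude"
    return "same direction, similar magnitude"


if __name__ == "__main__":
    example = KnockdownResult("GENE_A", 1.2, 0.4, True, True, True)
    print(example.gene, "->", classify(example))
```

Separating the direction test from the ratio test mirrors the two-step reasoning in the text: the sign of each change decides the direction, while the US:CS ratio decides whether the two products were affected to a similar degree.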

HyPR-MSSV candidates co-localize with US HIV RNA at distinct subcellular locations.

We selected a subset of 20 HyPR-MSSV identified host proteins for further validation studies. This subset was, in part, chosen based on the commercial availability of antibodies that allowed for immunoblot- and/or immunofluorescence-based detection of the host proteins (Table S8) Table S7). Fifteen of the 20 proteins were detected by immunoblot and siRNA KD was confirmed (31 to 95% relative to negative control siRNA)

In 96-well plates, 293T-ACT-YFP cells were transfected with gene-specific siRNAs for 4 hours; 48 hours later they were transfected again for 4 hours, followed by incubation with the HIV reporter virus. The cells were then fixed at 48 hours post incubation. Fluorescence microscopy was used to quantify CFP and mCherry. C. Heatmap of HIV gene expression changes after siRNA knockdown of host proteins. Eighty-four of 121 proteins showed statistically significant changes in early and/or late HIV gene expression (p-value <0.05). D. Twenty proteins were selected for confirmation of KD efficacy using western blot. The table summarizes the HyPR-MS and KD results for the proteins for which the WB or IF showed significant KD of the targeted host protein (18 proteins). Note: KD detection for proteins IGF2BP3, SRRM2, and DNM2 was unsuccessful by WB, but KD was later shown to be effective using the same antibodies in fixed cell immunofluorescence (Tables S9 and S11).

US HIV RNA-protein interactions may commence as early as production of the nascent HIV transcript in the nucleus or as late as virus particle formation at the plasma membrane. To determine potential

Host proteins were detected using the primary antibodies employed for our immunoblot analysis (Table S8), with 17 of the 18 host proteins (all but DLD) detected by IF and showing greater than 40% decreases in IF signal after host protein siRNA KD. This imaging-based analysis also allowed verification of the efficacy of siRNA KD for three of the host proteins (IGF2BP3, SRRM2, and DNM2) that we had been unable to detect using immunoblot (Table S11).
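As a worked illustration of how a percent decrease in IF signal relative to the negative control could be computed, consider the minimal Python sketch below. The function names and input values are hypothetical, and treating 40% as a pass/fail cutoff is an assumption made here for illustration; the text reports that decreases exceeded 40% but does not state that this value was used as a criterion.

```python
def percent_knockdown(control_signal: float, kd_signal: float) -> float:
    """Percent decrease in IF signal relative to the negative-control siRNA."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return 100.0 * (control_signal - kd_signal) / control_signal


def knockdown_confirmed(control_signal: float, kd_signal: float,
                        threshold_percent: float = 40.0) -> bool:
    """True when the decrease exceeds the (assumed) threshold."""
    return percent_knockdown(control_signal, kd_signal) > threshold_percent


if __name__ == "__main__":
    # Hypothetical mean fluorescence intensities (arbitrary units).
    control, knockdown = 200.0, 90.0
    print(percent_knockdown(control, knockdown))
    print(knockdown_confirmed(control, knockdown))
```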

We quantified the frequency of each co-localization phenotype for 17-52 cells per antibody (Figure 4B, Table S12), excluding cytoplasmic granules that were only rarely observed. The data revealed that proteins that predominantly localize to the nucleus or proximal to the nuclear membrane (HNRNPR, RBMX, RBM4, MBOAT7) had a higher frequency of co-localization with HIV RNA at nuclear puncta (57-96%) relative to proteins that were predominantly localized to the cytoplasm (FAM120A, IGF2BP3,

generally with Gag also present (Figure 4B, Table S12).

A. Representative images of co-localization phenotypes observed using FISH/IF. For each, a merged image of a cell highlighting a site of co-localization (white square) is shown. Enlarged regions of interest (ROI) of each fluorescence channel are displayed in the associated small panels to separate overlapping US HIV RNA, host protein, HIV Gag polyprotein, and DAPI signals. Some images were obtained from experimental replicates that did not include Gag IF and therefore do not include images from the corresponding channel. Note: Brightness and contrast settings were adjusted individually for each color channel of the images to effectively show co-localization. These settings may be different for the ROIs. B. Table showing the

The FISH/IF single cell analyses of US HIV RNA, Gag, and host proteins also allowed for tracking of

primarily localized to the nucleus of cells expressing no, or low amounts of, Gag and US RNA, but exhibited marked shifts from the nucleus to the cytoplasm in cells with high levels of Gag and US RNA expression (Figure 5A). Changes to MBOAT7 were also striking, with much higher levels of expression in cells with abundant Gag and US RNA than in uninfected or early infected cells (Figure 5B).

HIV gene expression (Figure S5, Table S13). For example, in early/uninfected cells we observed linear increases in HNRNPR and MBOAT7 expression, positively correlating with the subtle increases in US

A similar analysis was performed after image-based segmentation of cells into nuclear and cytoplasmic compartments to better discriminate the subcellular location in which host protein changes occurred (Figure S5, Table S13). For HNRNPR, the same trends were observed in the nucleus and cytoplasm as were seen for the total cell (Figure S5). For MBOAT7, nuclear expression plateaued as it did for total cell expression, but the cytoplasmic expression increased slightly as US RNA and Gag abundance increased (Figure S5). In all, the expression of each of the 12 host proteins showed significant correlation with the expression of US HIV RNA in at least one of the following sub-groups: early-

For HNRNPR and MBOAT7, the most evident differences were in cytoplasmic expression (cyto HNRNPR, median increase=21%, p=0.055; cyto MBOAT7, median increase=41%, p=3×10⁻⁵) (Figure 5G-5J, Figure S5, Table S13). In all, changes to nuclear or cytoplasmic abundance were observed for

Table S13). Three of these proteins showed differences in total cellular expression (MBOAT7, RBMX, and TRIM56), with RBMX and TRIM56 only increasing in the cytoplasm. Two proteins did not show net differences in overall expression, but exhibited statistically significant differences (p-value < 0.05) in expression in the nucleus (MOV10) or the cytoplasm (IGF2BP3).
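One plausible reading of the "median increase" values quoted above is the percent change in the median per-cell signal between the early and late cell populations. The Python sketch below illustrates that reading together with a per-cell nuclear-to-cytoplasmic ratio. The function names, the definition of median increase, and the example intensities are assumptions made for illustration; the statistical test behind the reported p-values is not reproduced here.

```python
from statistics import median


def median_percent_change(early: list[float], late: list[float]) -> float:
    """Percent change of the late-cell median relative to the early-cell median."""
    early_median = median(early)
    if early_median == 0:
        raise ValueError("early-cell median must be non-zero")
    return 100.0 * (median(late) - early_median) / early_median


def nuc_cyto_ratio(nuclear: float, cytoplasmic: float) -> float:
    """Per-cell nuclear-to-cytoplasmic ratio of a host protein signal."""
    if cytoplasmic <= 0:
        raise ValueError("cytoplasmic signal must be positive")
    return nuclear / cytoplasmic


if __name__ == "__main__":
    # Hypothetical per-cell cytoplasmic intensities (arbitrary units).
    early_cells = [10.0, 20.0, 30.0]
    late_cells = [24.0, 28.0, 40.0]
    print(median_percent_change(early_cells, late_cells))
    print(nuc_cyto_ratio(30.0, 15.0))
```

Medians are used rather than means so that a few very bright cells do not dominate the comparison between populations.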

To identify potential host protein translocation events, we evaluated single cell nuclear-to-cytoplasmic (nuc/cyto) ratios relative to US RNA abundance and looked for statistically significant differences in early and late cells (Figure S5, Table S13). HNRNPR nuc/cyto ratios ranged from 1 to 3 in early/uninfected cells but only ranged from 0.6 to 0.9 in late infected cells, exhibiting a negative correlation with US HIV RNA expression (Figure 5L). For MBOAT7, the nuc/cyto ratio ranged from 1.7 to 3.9 in early cells and 1.5 to 3.9 in late cells, with no significant correlation with US RNA expression for either phase (Figure 5M). However, overall nuc/cyto ratios were significantly lower for late cells

p=7×10⁻⁶, respectively) (Figure 5N and 5O). In all, six HyPR-MS candidate proteins showed notable changes in nuc/cyto ratio (HNRNPR, MBOAT7, TRIM56, SRRM2, RBMX, and RBM4); all, with the exception of SRRM2, exhibited relative increases in cytoplasmic abundance (Figure 5P, Table S13).

(Table S7). Based on siRNA knockdown, at least 48 represent potential new host regulatory factors (Table S7). Using Gene Ontology (GO) term enrichment analysis we showed that several biological

Table S4), suggesting cellular pathways that may be uniquely involved in the processing of a subset of

After three hours, culture volume was increased to 300 mL using RPMI media and incubated at 3 rpm for 45 hours. Infection was confirmed to be >90% by visualizing CFP expression via epifluorescence microscopy. Cells were centrifuged at 1500 rpm for 10 minutes, washed three times with PBS, then

(Table S1) was added and the bead mixture was nutated at room temperature for 30 minutes. Using a magnet to collect the beads to the side of the tube, the supernatant containing the released RNA-protein complexes was transferred to a clean tube. The resulting sample was divided into two aliquots: 2% for RT-qPCR analysis and 98% for mass spectrometric analysis.

(Table S5)

(Table S8) for 60 minutes followed by four 5-minute washes with blocking buffer. Blocking buffer containing appropriate concentrations of the secondary antibodies and DAPI stain (Table S8) was then incubated with the cells for 40 minutes followed by four 5-minute washes with PBS. Finally, the cells were fixed with 3.7% formaldehyde for 10 minutes followed by 3 washes with PBS.

Fluorescence In Situ Hybridization (FISH). The FISH protocol was conducted using Stellaris designed hybridization probes (Table S10)

(Table S13). For determining correlation of host protein expression with HIV gRNA expression, cells with outlier values in total HIV gRNA fluorescence were excluded from the dataset. An outlier here is defined as a value that is more than 1.5 interquartile ranges (IQRs) below the 1st quartile (Q1) or above the 3rd quartile (Q3). IQR is

protein fluorescence expression in the nucleus, cytoplasm, total cell, and for the Nuc/Cyto ratios using the CORREL function in Excel. R² values were calculated using the chart tools in Excel (Table S13).
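The outlier rule and correlation step described above were carried out in Excel; a rough Python equivalent is sketched below for readers who prefer a scripted version. This is not the authors' workflow. The function names and example values are hypothetical, and the choice of the "inclusive" quartile method is an assumption, since the text does not state how quartiles were computed. For a simple linear fit with an intercept, R² equals the square of the Pearson correlation coefficient.

```python
from math import sqrt
from statistics import quantiles


def iqr_bounds(values: list[float]) -> tuple[float, float]:
    """Return (lower, upper) fences at 1.5 IQRs below Q1 and above Q3."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # quartile method is an assumption
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def drop_outlier_cells(rna: list[float], protein: list[float]) -> tuple[list[float], list[float]]:
    """Exclude cells whose HIV gRNA fluorescence falls outside the IQR fences."""
    low, high = iqr_bounds(rna)
    kept = [(r, p) for r, p in zip(rna, protein) if low <= r <= high]
    return [r for r, _ in kept], [p for _, p in kept]


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-cell fluorescence values; the last cell is an RNA outlier.
    rna = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
    protein = [2.0, 4.0, 6.0, 8.0, 10.0, 0.0]
    rna_kept, protein_kept = drop_outlier_cells(rna, protein)
    r = pearson_r(rna_kept, protein_kept)
    print(len(rna_kept), r, r ** 2)
```

Filtering is applied to the RNA values only, matching the text's statement that cells were excluded based on total HIV gRNA fluorescence, and the paired protein values are dropped alongside them so the two lists stay aligned.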

For determining host protein expression and distribution changes outliers were determined, as

HIV-1 RNA by hybridization capture and mass spectrometry. Sci Rep 7, 16965.