A two-color haploid genetic screen identifies novel host factors involved in HIV latency

To identify novel host factors as putative targets to reverse HIV latency, we performed an insertional mutagenesis genetic screen in a latently HIV-1-infected pseudo-haploid KBM7 cell line (Hap-Lat). Following mutagenesis, insertions were mapped to the genome, and bioinformatic analysis identified 69 candidate host genes involved in maintaining HIV-1 latency. A select set of candidate genes was functionally validated using shRNA-mediated depletion in the latently HIV-1-infected J-Lat A2 and 11.1 T cell lines. We confirmed ADK, CHD9, CMSS1, EVI2B, EXOSC8, FAM19A, GRIK5, IRF2BP2, NF1, and USP15 as novel host factors involved in the maintenance of HIV latency. Chromatin immunoprecipitation assays indicated that CHD9, a Chromodomain Helicase DNA-binding protein, maintains HIV latency via direct association with the HIV 5'LTR, and its depletion results in increased histone acetylation at the HIV-1 promoter, concomitant with HIV-1 latency reversal. FDA-approved inhibitors 5-Iodotubercidin, Trametinib, and Topiramate, targeting ADK, NF1, and GRIK5, respectively, were characterized for their latency reversal potential. While 5-Iodotubercidin exhibited significant cytotoxicity in both J-Lat and primary CD4+ T cells, Trametinib reversed latency in J-Lat cells but not in latently HIV-1-infected primary CD4+ T cells. Crucially, Topiramate reversed latency in cell-line models and latently infected primary CD4+ T cells, without inducing T cell activation or significant toxicity. Thus, using an adaptation of a haploid forward genetic screen, we identified novel and druggable host factors contributing to HIV-1 latency.

Importance
A reservoir of latent HIV-1-infected cells persists in the presence of combination antiretroviral therapy (cART), representing a major obstacle for viral eradication. Reactivation of the latent HIV-1 provirus is part of curative strategies which aim to promote clearance of the infected cells. Using a two-color haploid screen, we identified 69 candidate genes as latency-maintaining host factors and functionally validated a subset of 10 of those in additional T-cell-based cell line models of HIV-1 latency. We further demonstrated that CHD9 is associated with HIV-1's promoter, the 5'LTR, and that this association is lost upon reactivation. Additionally, we characterized the latency reversal potential of FDA-approved compounds targeting ADK, NF1, and GRIK5 and identified the GRIK5 inhibitor Topiramate as a viable latency reversal agent with clinical potential.


Introduction
Combination antiretroviral therapy (cART) has proven to effectively abrogate viral replication in HIV-1-infected patients and has substantially reduced AIDS-related mortality. However, cART is not curative and patients must remain on life-long medication regimens, as interruption of the therapy leads to rapid rebound of viral replication (1). This is due to the persistence of a reservoir of latently infected cells, harboring replication-competent provirus blocked at the level of gene expression, that escape clearance by the immune system (2). Therapeutic strategies toward an HIV-1 cure aim to inactivate, reduce or completely eradicate the latent reservoir, such that, upon cessation of cART, the patient's immune system can effectively control the infection or fully clear it (2). Strategies aiming to reduce or eliminate the reservoir rely on drugs, termed latency-reversing agents (LRAs), capable of inducing latently HIV-infected cells to express viral genes, rendering the infected cells susceptible to cytopathic effects and/or recognition and clearance by the immune system (3). Much focus has therefore been placed on finding small molecules that activate transcription of the latent HIV-1 provirus, are not cytotoxic, and do not induce harmful T cell activation or proliferation.
The identification of molecules capable of inducing HIV gene expression has been largely accomplished using candidate approaches, which build on existing knowledge of transcription factors and co-factors that bind to and regulate transcription at the HIV LTR (4)(5)(6). One of the most clinically studied classes of molecules in HIV latency reversal, HDAC inhibitors (HDACis) such as SAHA (7), valproic acid (8), romidepsin (9) and M344 (10), target HDACs, which have been shown to be recruited to the HIV LTR by multiple transcription factors (11)(12)(13) to deacetylate histones and repress transcription. Similarly, agonists of the PKC pathway, such as prostratin and bryostatin, have been studied as activators of the HIV LTR, as they induce nuclear localization and LTR binding of the NFκB (p65/p50) heterodimer, a potent activator of HIV transcription (14,15). While this candidate approach has led to the identification of druggable targets and potential candidate LRAs, none of the LRAs currently under clinical investigation are capable of strong latency reversal in patients or lead to a reduction in the size of the latent HIV reservoir (16,17).
Thus, to identify more potent and clinically relevant LRAs, it is critical to identify the full repertoire of functionally relevant host factors and pathways that play a role in the maintenance of latency. Complementary to the approach of targeting candidate LTR-bound transcription factors and complexes for inhibition or activation, other studies have embarked on unbiased screens of small molecule libraries to identify compounds capable of reversing HIV latency (18). Alternatively, recent large-scale unbiased gene knock-out/knock-down screens have been employed to unravel the molecular mechanisms of latent HIV (19)(20)(21)(22)(23)(24)(25)(26)(27)(28). RNA interference methods, which rely on the reduction of mRNA expression levels, have been widely used as screening platforms in mammalian cells (29). However, this method has been shown to suffer from serious limitations, including the presence of false positives due to off-target effects and the persistence of residual gene expression, which results in false negatives (30,31). For example, several RNAi screens have been performed to identify host cell factors essential for HIV infection and replication (21)(32)(33)(34); surprisingly, very little overlap was found between the lists of genes identified in these screens, pointing to the need for other screening methods that achieve complete inactivation of genes. Another large-scale unbiased gene disruption approach uses the precision of CRISPR-Cas9 targeting for complete gene knock-outs (35). A recent screen using a lentiviral sgRNA sub-library targeting nuclear proteins identified MINA53 as a possible latency-promoting gene (LPG) (28).
In mammalian cells, functional analysis via forward genetic screens and mutational analysis has largely been hampered by diploidy: in somatic cells, when one copy of a gene essential for a cellular process is inactivated, the second copy often remains active and compensates for the partial loss. Pseudo-haploid screens are based on KBM7, a chronic myelogenous leukemia (CML) cell line that is haploid for all chromosomes except chromosome 8 and a 30 Mb stretch of chromosome 15 (36), allowing for forward genetics in mammalian cells (37). Using Gene-Trap (GT) retrovirus-mediated mutagenesis to generate a library of gene knockouts in KBM7 cells enables unbiased loss-of-function screening in mutant cells for the identification of host genes essential to a specific cell function. Haploid screens have proven to be a powerful method to identify genes involved in drug import (38,39), druggable target genes to treat cancer (40-42), key components of cellular pathways (43)(44)(45)(46)(47), and genes involved in the pathogenesis of various viruses (48)(49)(50)(51)(52)(53)(54)(55) and susceptibility to toxins (37,56,57).
To identify novel host genes that could potentially serve as molecular targets for HIV latency reversal, we performed insertional mutagenesis in latently HIV-1-infected KBM7 cells. First, using a flow cytometry-based cell sorting strategy described previously (58,59), we generated a latent pseudo-haploid KBM7 (Hap-Lat) cell line that harbors an integrated, transcriptionally silent HIV-1 5'LTR controlling the expression of a GFP reporter gene. Hap-Lat cells were then subjected to GT insertional mutagenesis, using an mCherry reporter GT virus, and Hap-Lat cells expressing both GFP (as a reporter of HIV-1 expression) and mCherry (confirming the presence of a GT integration within an active gene) were then sorted using fluorescence-activated cell sorting (FACS). GT integration sites within these two populations were then amplified by inverse PCR and mapped to the genome by high-throughput sequencing.
Computational analysis identified 69 candidate genes whose function is required for maintenance of HIV latency but which are not essential to cell survival.
For functional follow-up experiments, we selected a subset of 16 GT-identified putative HIV latency re-enforcing target genes, which are also expressed in primary CD4+ T cells, and examined the effect of their depletion via shRNA knock-down on the maintenance of latency in J-Lat T cell lines A2 and 11.1. Functional validation identified 10 putative novel latency-promoting genes: ADK, CHD9, CMSS1, GRIK5, USP15, IRF2BP2, EXOSC8, NF1, EVI2B, and FAM19A. Because of the obvious mechanistic potential of CHD9, a Chromodomain Helicase DNA-binding protein, we explored its association with the 5'LTR using ChIP-qPCR analysis. We found CHD9 to be enriched at the 5'LTR in the latent state but to dissociate from the HIV-1 promoter after PMA treatment. Loss of CHD9 enrichment upon re-activation was accompanied by a relative increase in H3 acetylation at the HIV-1 LTR, indicating a functional shift in chromatin organization from repressed to transcriptionally active. We also found three genes in our candidate list, ADK, GRIK5, and NF1, to be targetable by existing small molecule inhibitors. The latency reversal potential of these compounds was evaluated in J-Lat T cell lines A2 and 11.1, and in a primary cell model of HIV latency. We found 5-Iodotubercidin, which inhibits ADK, not to be a viable LRA due to its toxicity in A2, 11.1, and primary CD4+ T cells. While the NF1 inhibitor Trametinib reactivated latent HIV in the J-Lat cell lines, it was not effective in the primary cell model of latency. On the other hand, the GRIK5 inhibitor Topiramate reversed HIV-1 latency in both J-Lat T cell lines as well as in primary CD4+ T cells harboring latent HIV-1, without significant associated cytotoxicity or T cell activation, and thus presents an interesting potential novel LRA.

Establishment of a latent pseudo-haploid cell line
To identify potentially druggable host genes as putative molecular targets for HIV latency reversal, we generated pseudo-haploid latent HIV-1-infected cells in which we performed insertional mutagenesis according to a strategy schematically depicted in Figure 1A. A latent HIV-1-infected pseudo-haploid KBM7 cell line was generated according to a previously described strategy used in Jurkat cells (58,59) (Figure 1B). Subsequently, haploid latent (Hap-Lat) cells, harboring a latent integrated HIV-1-derived virus containing a GFP reporter, were subjected to insertional mutagenesis using a gene trap virus carrying an mCherry reporter. Instead of using a lethality-selecting scheme, our system relies on fluorescence-activated cell sorting (FACS) to select for reactivated cells, marked by elevated GFP expression resulting from insertional mutagenesis of genes essential for maintenance of HIV latency. Cells expressing both GFP and mCherry were FACS-sorted and expanded for multiple rounds, after which GT insertion sites were mapped and identified by inverse PCR and high-throughput sequencing. To establish a latent HIV infection in the pseudo-haploid KBM7 system, near-haploid KBM7 cells were infected at low MOI with the single infectious cycle HIV-derived virus LTR-GFP-LTR, in which GFP reporter expression is controlled by the activity of the HIV-1 promoter or 5'LTR (Figure 1A). After infection, the population of GFP-negative cells, comprising mainly uninfected cells and putative latently infected cells, was sorted by FACS and subsequently stimulated with 5 ng/μl TNF-α and 350 μM Vorinostat.
In response, a small percentage of existing latent HIV-infected cells were transcriptionally reactivated and expressed GFP. These GFP-positive cells were sorted by FACS as single cells and expanded (Figure 1B). The resulting clonal latent KBM7 haploid lines were characterized by flow cytometry to determine GFP expression at basal and stimulated states. From the clonal lines established, Hap-Lat#1 was selected for low basal activity and significant reactivation upon stimulation (Figure 1C). To ensure maintenance of haploidy, Hap-Lat#1 cells were periodically sorted to enrich for the 5% of smallest cells (Supplemental figure 1A).

Gene-trap mutagenesis of Hap-Lat cells
Hap-Lat#1 cells were mutagenized by infection with a murine stem cell virus (MSCV)-derived viral gene-trap (GT) vector containing an inactivated 3' LTR, an adenoviral splice-acceptor site, an mCherry reporter cassette and a polyA terminator tail (37). Dendritic and myeloid cells are notoriously refractory to retroviral infection (60,61). We indeed found that infectivity of KBM7 is poor compared to T-cell-derived cell lines SupT1 and Jurkat, as well as other myeloid-derived cell lines (Supplemental Figure 1B). SAMHD1, a nucleotide scavenger, has been identified as a causative restricting host factor that limits the free available pool of nucleotides for reverse transcription (62). To bypass SAMHD1-mediated restriction, we supplemented cells with 2μM nucleosides (dNs), which increased GT infectivity by approximately two-fold (Supplemental figure 1C). GT preferentially inserts in the 5' regions of genes (63), effectively knocking out gene expression by truncating the native transcript (Supplemental figure 1D). For a full-scale mutagenesis experiment, approximately 200 million Hap-Lat #1 cells were mutagenized using two rounds of infection with GT-mCherry in the presence of exogenously supplied nucleosides.
Infection of Hap-Lat#1 with GT-mCherry effectively caused reactivation of a subpopulation of latent KBM7 cells (Figure 1D-E). We reasoned that insertional mutagenesis in genes essential for maintenance of HIV latency would result in Hap-Lat#1 cells expressing the GFP reporter. We determined GT integrations in individual GFP/mCherry double-positive clones and estimated that the sequential infection resulted in 1 to 4 GT integrations per cell, with the majority of cells containing one integration (data not shown). By gating conservatively, approximately 1-4% GFP/mCherry double-positive cells were then sorted and expanded (Figure 1E). After expansion, reactivated cells tend to revert to a latent state (Supplemental figure 1E). To enrich for a more stable GFP-expressing, mCherry GT-containing double-positive cell population, cell sorting was repeated for multiple rounds (Figure 1F). Sequential rounds of sorting led to the appearance of a stable subpopulation within the total double-positive population expressing high levels of mCherry (Figure 1F). To examine any potential biological differences between the two, we separately sorted the total double-positive population and the mCherry-high subpopulation, which we designated GFP Total and GFP Sub, respectively, for a final (5th) round (Figure 1F). Genomic DNA extracted from GFP Total and GFP Sub obtained in the 5th round of sorting was used to determine GT integrations, while that of a pool of unsorted GT-infected cells was used as a reference.

Mapping of insertion sites to identify host factors maintaining HIV latency
To determine the host sequences flanking the GT insertion sites, inverse PCR with primers annealing to internal sequences in the gene trap vector followed by amplification was performed.
The amplified products were processed for high-throughput sequencing (Figure 1F). For GFP Sub, 2 biological replicates, samples A and B, were generated. For GFP Total, 3 biological replicates were generated, samples C, D and E. To estimate the sampling depth of our GT, we re-sequenced GFP Sub sample B and GFP Total sample D at greater depth. The resulting NGS datasets were processed for candidate gene identification. A previously described method to analyze GT data, HaSAPPy (Haploid Screen Analysis Package in Python), was rigorously re-implemented and appended with additional steps for quality control, library normalization, and optimized resolution for the selection of integration sites (64). HaSAPPy assigns a Local Outlier Factor (LOF) score to each gene in a sample based on a triplet score derived from the number of putative GT integrations in the sample compared to the reference. For each population, we compiled all genes with an LOF score >3 from each replicate and obtained 686 hits for GFP Total and 382 hits for GFP Sub. 183 genes were common to both populations (Supplementary figure S2A). Next, we investigated any potential biological basis for the difference between GFP Total and GFP Sub. Since expression levels of the integrated mCherry reporter appear to be on average higher in the GFP Sub population than in the GFP Total population, we wondered if expression levels of the targeted genes were higher in GFP Sub. We obtained recently published KBM7 gene expression data (65) and found no substantial difference in the average level of expression between the two populations (Supplementary figure S2B).
To determine if there was any difference in the functionality of the GT target genes found in GFP Total and GFP Sub, we performed enrichment analysis using GO terms and found no substantial differences in enrichment for biological process and molecular function ontologies. Finally, we cross-referenced the GT target genes found in the GFP Total and GFP Sub populations to the HIV interaction database (https://www.ncbi.nlm.nih.gov/genome/viruses/retroviruses/hiv-1/interactions/) and found that both the GFP Total and the GFP Sub populations contain similar fractions (21.9% and 20.2%, respectively) of genes previously reported to be involved in HIV biology, fractions which are substantially higher than the 7.4% found for the complete list of ENSEMBL genes (Supplementary figure S2F). In order to limit our extensive list of candidate genes for follow-up functional validation, we applied more stringent thresholds for each population and defined candidate genes as having an LOF score equal to or greater than 3 in at least 2 biological replicate samples. We thus identified 19 candidate genes in the GFP Sub population and 55 in the GFP Total population (Figures 2A and B
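
The replicate-based selection rule described above (an LOF score of at least 3 in at least 2 biological replicates) and the cross-referencing against a gene list can be illustrated with a minimal Python sketch. This is not the HaSAPPy code or the authors' pipeline; the function names, the gene labels, and the LOF values are hypothetical and serve only to show the logic.

```python
from collections import Counter

def hits_in_replicate(lof_scores, threshold=3.0):
    """Genes whose LOF score reaches the threshold in a single replicate."""
    return {gene for gene, score in lof_scores.items() if score >= threshold}

def stringent_candidates(replicates, threshold=3.0, min_replicates=2):
    """Genes reaching the threshold in at least `min_replicates` replicates."""
    counts = Counter()
    for lof_scores in replicates:
        counts.update(hits_in_replicate(lof_scores, threshold))
    return {gene for gene, n in counts.items() if n >= min_replicates}

def fraction_in_database(genes, database):
    """Fraction of `genes` that also appear in a reference gene set."""
    return len(genes & database) / len(genes) if genes else 0.0

# Hypothetical LOF scores for two replicates of one sorted population.
replicates = [
    {"GENE_A": 5.2, "GENE_B": 3.1, "GENE_C": 1.0},
    {"GENE_A": 4.0, "GENE_B": 2.5, "GENE_C": 3.4},
]

# Permissive list: a hit in any replicate.
any_hit = set().union(*(hits_in_replicate(r) for r in replicates))
# Stringent list: a hit in at least two replicates.
stringent = stringent_candidates(replicates)
# Hypothetical reference set standing in for a curated interaction database.
overlap = fraction_in_database(any_hit, {"GENE_A", "GENE_X"})
```

In this toy example only GENE_A passes the stringent rule, whereas all three genes appear in the permissive list, which mirrors how the stricter threshold shortens the candidate list.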

Candidate list validation
Since our bioinformatics analysis did not reveal a defining difference between the GFP Total and the GFP Sub populations, we decided to proceed with functional assays using shRNA-mediated depletion of candidate genes obtained from both populations. To prioritize the candidate genes found in the KBM7 haploid screen for functional validation in the more biologically relevant Jurkat T-cell-based HIV-1 latency models J-Lat A2 and 11.1, we focused on protein-coding genes and took into account LOF scores as well as gene expression in white blood cells (as extracted  Table 1). Knockdown of SCN9A, RHOF, SPN, COPS5 and EVL did not result in significant latency reversal in one or both J-Lat models (Supplemental figure S4). These results demonstrate that a significant proportion of the candidate genes found in our myeloid-derived KBM7 haploid screen play a role in maintenance of HIV-1 latency in the more relevant T cell-derived J-Lat A2 and J-Lat 11.1 HIV latency models.

CHD9 is an LTR-associated repressor of HIV-1 transcription
Interestingly, two genes, CIITA and CHD9, from our candidate list are associated with the GO term DNA binding (GO:0003677). CIITA is a well-established factor involved in HIV expression and has been previously shown to inhibit Tat function and hence viral replication (68,69). The Chromodomain helicase DNA binding protein 9 (CHD9) is a member of an ATP-dependent chromatin remodeler family, the members of which modulate DNA-histone interactions and positioning of nucleosomes and play key roles in stem cell regulation, development, and disease (70). Previously, we have shown that chromatin remodeling by another ATP-dependent remodeler, the BAF complex, plays a crucial role in maintenance of HIV-1 latency and its re-activation (59). We therefore decided to further characterize the role of CHD9 in regulating HIV-1 gene expression. We knocked down CHD9 using a lentivirally transduced shRNA and verified its depletion in both J-Lat A2 and J-Lat 11.1 cells at the protein level by Western blotting (Figure 3A). Depletion of CHD9 led to a significant reversal of latency, as shown by an increase in the percentages of GFP-positive cells (Figure 3B and C). Latency reversal was also confirmed by increased expression of viral genes Gag, Pol, and Tat in J-Lat 11.1 (Figure 3D). To characterize a potential direct association of CHD9 with the latent HIV-1 5'LTR, we performed chromatin immunoprecipitation (ChIP) in latent and PMA-activated 11.1 J-Lat cells. The positions of nucleosomes within the latent HIV-1 LTR are rigid, with positioned nucleosomes Nuc-0, Nuc-1, and Nuc-2 separated by DNase I-sensitive regions HSS1 and HSS2, respectively, visually summarized in Figure 3E. CHD9 was found to be enriched throughout the HIV-1 LTR in latent 11.1 J-Lat cells, predominantly present over the Nuc0-HSS1 region, and this association was significantly decreased after LTR activation by PMA treatment (Figure 3F

Pharmacological targeting of ADK, GRIK5 and NF1
With the aim of identifying potential latency reversing agents (LRAs), we performed a literature search and identified three candidate genes, ADK, GRIK5, and NF1, all present in the GFP Sub list, for which FDA-approved small molecule inhibitors are available. We therefore examined the latency reversal potential of the ADK inhibitor 5-Iodotubercidin, the GRIK5 inhibitor Topiramate, and the NF1 inhibitor Trametinib. Adenosine kinase (ADK) is a phosphotransferase that converts adenosine into 5'-adenosine-monophosphate and thus plays a major role in regulating the intracellular and extracellular concentrations of adenosine, activation of specific signaling pathways, and bioenergetic and epigenetic functions (71,72). 5-Iodotubercidin is a purine derivative that inhibits adenosine kinase by competing with adenosine for binding to the enzyme (73). Glutamate ionotropic receptor kainate type subunit 5 (GRIK5) is a subunit of the tetrameric kainate receptor (KAR), a subgroup of ionotropic glutamate receptors. GRIK5, together with GRIK4, binds glutamate, whereas subunits GRIK1-3 form functional ion-channels (74).
Topiramate is an FDA-approved GRIK5 inhibitor employed as an anti-epileptic drug and is used to manage seizures and prevent migraines (75). Neurofibromin 1 (NF1) is ubiquitously expressed; however, its highest levels are found in cells of the central nervous system, and it has been described to function as a negative regulator of the Ras signal transduction pathway (76,77). Trametinib is an NF1 inhibitor which is also known as a mitogen-activated protein kinase (MAPK) kinase (MEK) inhibitor with anticancer activity and is FDA-approved for use in metastatic malignant melanoma (78).
We examined the effects of treatment with 5-Iodotubercidin, Topiramate, and Trametinib in J-Lat  figure S6). Taken together, our data indicate that the FDA-approved GRIK5 inhibitor Topiramate reverses HIV-1 latency in a variety of T cell models of latency, without induction of T cell activation and with limited cytotoxicity, and, therefore, can be considered an attractive LRA for further mechanistic and pre-clinical investigation.

Discussion
In search of potentially novel host factors and pathways that play a role in the maintenance of HIV-1 latency, we performed a two-color haploid genetic screen in latent HIV-1-infected KBM7 cells.
An important advantage of this approach is that identification of putative functionally relevant latency-promoting host target genes does not require a priori knowledge of the molecular determinants of latency and is thus completely unbiased. Additionally, gene-trap insertional mutagenesis has enabled identification of previously unappreciated latency genes and cellular pathways, which likely modulate latency not only via direct physical association with the HIV-1 promoter, but also indirectly, through involvement in cellular signaling. We produced a list of 69 candidate genes and proceeded to validate ten candidates, the depletion of which in various in vitro models of latency, including primary CD4+ T-cell models, led to latency reversal.
The haploid KBM7 cell line, while a powerful system for GT-mediated forward genetic screens, is a myeloid cell line (36). HIV can infect and establish latent infection in monocytes and macrophages (81,82), but these are not considered to be the prime source of the latent reservoir; this raises the question of how relevant the myeloid nuclear environment may be for HIV latency in lymphocytic cells. Nevertheless, when we tested a selected subset of the candidate genes by shRNA-mediated knockdown in a T-cell-derived model of HIV latency, we found that knockdown of 67% of candidates (10 out of 15) led to latency reversal, which demonstrates the validity of our approach. Moreover, several of the genes identified in our gene trap screen have previously been implicated in HIV susceptibility: 15 out of 69 genes are listed in the HIV-1 human interaction database (https://www.ncbi.nlm.nih.gov/genome/viruses/retroviruses/hiv-1/interactions/), a detailed database of all known interactions between HIV-1 and the human host. Furthermore, one of the candidates, IRF2BP2, is a potential target of HIV-associated nucleotide polymorphisms within a cluster of regulatory DNA elements (66) which loop to and potentially regulate the IRF2BP2 promoter in CD4+ T cells (83,84). In the current study, we demonstrate that knockdown of IRF2BP2 results in latency reversal. IRF2BP2 has been shown to interact with NFAT1 and to repress transcriptional activity (85), providing a plausible mechanism for its role in latency maintenance.
Among the candidate genes, we also identified CHD9, a member of the chromodomain helicase DNA-binding (CHD) family of ATP-dependent chromatin remodelers. Members of this family are involved in various cellular processes and in normal development and disease; however, CHD9 is one of the least-studied members. We found that CHD9 is associated with the latent HIV-1 5'LTR and is displaced upon promoter activation by PMA stimulation, suggesting that it acts as a repressor of HIV transcription. Indeed, depletion of CHD9 by shRNA-mediated knockdown led to de-repression of HIV, as observed by an increase in the expression of the HIV-1 LTR-driven reporter GFP as well as the HIV-1 genes Gag, Pol and Tat. Future studies will determine the mode of recruitment of CHD9 to the HIV-1 LTR and the molecular interplay between CHD9 and other chromatin remodeling and modifying complexes associated with the latent HIV-1 5'LTR (4).
A large subset of the list of 69 candidate genes are non-coding RNAs (N=22); upon closer inspection we found that 16 of those are at least in part oriented in an anti-sense direction with respect to known protein-coding genes. We currently do not know if these non-coding transcripts fulfill a biological function in maintenance of HIV-1 latency, directly through their transcripts, through regulatory effects on other (protein-coding) genes in cis or in trans, through other effects, or if they represent mapping artefacts. Further investigation into these possibilities is ongoing.
An interesting observation emerging from our experimental set-up is the appearance of a subpopulation of cells after multiple rounds of sorting that is more stable in its GFP expression. This subpopulation is also higher in mCherry expression, as compared to the total double-positive population. The reversion of double-positive cells after sorting may reflect the intrinsically stochastic transcription of HIV-1 (86-88). Latent cell lines are notoriously sensitive to cellular stresses, which cause reactivation (89)(90)(91). It is, therefore, possible that the less stable GFP-positive cells represent cells that are temporarily activated, and which slowly revert back to a latent state. Therefore, we focused our analysis and validation experiments mainly on the stable GFP-high population. Nevertheless, the few candidate genes unique to the total population that we tested in our shRNA knockdown validation experiment had a similar false positive ratio (i.e. 33.3%) as we found for the candidate genes from the stable GFP-positive population.
For our analysis, we set a strict criterion of the LOF score being ≥3 in at least two biological replicates, although a cut-off at lower LOF scores of 2 or above has been used previously (92). We chose this stringent criterion to confidently identify 69 candidate genes. We believe, however, that our dataset likely contains additional candidate genes that we missed due to the stringency applied.
This is exemplified by our observation that STRING analysis indicates that many of the 598 protein-coding genes with an LOF score of 3 and higher functionally interact (Supplemental Figure S7). Moreover, 32% of these genes (195) appear in the HIV-1 human interaction database, pointing to their potential roles in HIV-1 biology. Importantly, proteins with well-described functions in HIV biology, such as, e.g., IL32 (LOF=9.5; (66)

Our data point to the GRIK5 inhibitor Topiramate as a potentially promising compound for latency reversal. Topiramate effectively reversed latency in primary HIV-1-infected CD4+ T cells without inducing significant T cell activation or cytotoxicity; this makes Topiramate a potentially clinically promising LRA and a target for further investigation. GRIK5 (glutamate ionotropic receptor kainate type subunit 5), primarily studied in neurons, is a subunit of the tetrameric kainate receptor (KAR), a subgroup of ionotropic glutamate receptors. GRIK5, together with GRIK4, binds glutamate, whereas subunits GRIK1-3 form functional ion-channels (74). In B cells, KAR activation by glutamate increases ADAM10 levels, leading to increased B cell proliferation and immunoglobulin production (95). Topiramate is primarily used as an anticonvulsant or antiepileptic drug. While the exact mechanism by which Topiramate exerts anticonvulsant or antiepileptic properties is unclear, it has been shown to block voltage-dependent sodium and calcium channels (73,96), to inhibit the excitatory glutamate pathway and to enhance inhibition by GABA (97). Interestingly, Topiramate can induce cytochrome P450 family member CYP3A4 activity and potentially negatively affect the metabolism of many drugs (98), which should be taken into account when considering a potential therapeutic combination of LRAs in future studies.

Conflicts of interest
The authors have no conflict of interest.

Establishment of haploid latent (Hap-Lat) HIV infected cell lines
We used a strategy described previously (58,59)

Gene Trap virus production and mutagenesis of Hap-Lat cell lines
We adapted a strategy described previously (37)  After recovery, genomic DNA was isolated from the sorted cell populations, as well as from a population of mutagenized unsorted cells.

Mapping insertion sites through inverse PCR
A previously described inverse PCR protocol was adapted to determine host sequences flanking the proviral insertion sites (37). Briefly, genomic DNA was isolated from 5 million cells using the DNeasy kit (Qiagen). 4 μg gDNA was digested with NlaIII or MseI. After PCR spin column purification (Qiagen), 50 μl of eluted digested DNA was ligated using T4 DNA ligase (Roche) in a volume of 2 ml. The reaction mix was purified using spin columns and used as template in a PCR reaction with primers annealing to internal sequences in the gene trap vector (5'-CTGCAGCATCGTTCTGTGTT-3' and 5'-TCTCCAAATCTCGGTGGAAC-3'). The resulting PCR products were used for library preparation.
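
To illustrate why the choice of restriction enzyme matters in inverse PCR, the following minimal Python sketch estimates how much host sequence would be captured next to an insertion: after digestion and self-ligation, the amplified flank extends from the vector/host junction to the nearest recognition site. The recognition sequences (CATG for NlaIII, TTAA for MseI) are standard; the function name and the example sequence are hypothetical, and the exact cut position within the site is ignored for simplicity. This is an illustration only, not part of the protocol used here.

```python
# Recognition sequences of the two enzymes used for digestion.
# Both sites are palindromic, so scanning one strand is sufficient.
ENZYME_SITES = {"NlaIII": "CATG", "MseI": "TTAA"}

def captured_flank_length(host_flank, site):
    """Bases of host DNA between the junction (index 0) and the first site.

    Returns None when the flank contains no recognition site, in which
    case the insertion would not be recovered with this enzyme.
    """
    position = host_flank.upper().find(site)
    return None if position == -1 else position

# Hypothetical host sequence immediately downstream of a GT insertion.
flank = "AAACATGCCCTTAAGGG"
lengths = {
    enzyme: captured_flank_length(flank, site)
    for enzyme, site in ENZYME_SITES.items()
}
```

Using two enzymes in parallel increases the chance that at least one digest yields a flank that is long enough to map uniquely yet short enough to amplify.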

High-throughput sequencing and identification of integration sites
Sequencing libraries were created using the Ion Plus Fragment Library Kit (ThermoFisher Scientific) according to manufacturer's instructions, with minor modifications: briefly, 15ng of the PCR products were diluted with ddH2O to a final volume of 39 μl. The samples were end-repaired, adaptor ligated and amplified; this was followed by 2 rounds of purification using Agencourt

Flow cytometry for GFP expression in the J-Lat cell lines
Cells were collected in PBS. The GFP fluorescent signal indicating latency reversal was monitored using an LSRFortessa (BD Biosciences). Viability was determined using the forward scatter area versus side scatter area profiles (FSC-A, SSC-A). Data were analyzed using FlowJo software (version 9.7.4, Tree Star).

Total RNA Isolation and Quantitative RT-PCR (RT-qPCR)
Total RNA was isolated from transduced A2 and 11.1 cells using  Table 3. Expression data were calculated using the 2^-ΔΔCt method (101). β-2-microglobulin (B2M) and GAPDH were used as housekeeping genes for the analysis.
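
For readers unfamiliar with the 2^-ΔΔCt calculation, a minimal Python sketch is given below. It assumes that the Ct values of the two housekeeping genes are averaged before normalization; how B2M and GAPDH were combined is not stated in the text, so this averaging, the function name, and the Ct values are assumptions made purely for illustration.

```python
from statistics import mean

def fold_change(ct_target_sample, ct_refs_sample,
                ct_target_control, ct_refs_control):
    """Relative expression by the 2^-ddCt method.

    dCt  = Ct(target) - mean Ct(reference genes), per condition
    ddCt = dCt(sample) - dCt(control)
    """
    delta_sample = ct_target_sample - mean(ct_refs_sample)
    delta_control = ct_target_control - mean(ct_refs_control)
    return 2 ** -(delta_sample - delta_control)

# Hypothetical Ct values: the target amplifies 2 cycles earlier in the
# treated sample while both reference genes are unchanged.
example = fold_change(24.0, [18.0, 20.0], 26.0, [18.0, 20.0])
```

In this example, dCt is 5 in the sample and 7 in the control, so ddCt is -2 and the target is expressed four-fold higher than in the control.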

Isolation and ex vivo infection of primary CD4+ T cells
An ex vivo model of HIV-1 latency was generated based on the Lassen and Greene method by spinoculation

precipitated and subjected to qPCR analysis using the primer sets summarized in Table 3.

Efficacy of knockdown by shRNA was quantitated by RT-qPCR. Re-activation of HIV-1 was assessed by measuring the percentage of cells expressing GFP (green bars) and cell viability (gray bars). RT-qPCR data are presented as mean ± SD normalized to the control. Statistical significance was calculated using ratio-paired t-test and multiple comparison t-test on Log2 transformed fold changes (* p < 0.05, ** p < 0.01, *** p < 0.001). Background color indicates if the Gene Trap target gene is found in the GFP Total, in the GFP Sub or in both populations.

Table 1 Prioritized candidate list for the GFP Total and the GFP Sub populations.