Date: 2019-03-01

Simultaneous analysis of HIV-1 proviral sequences and integration sites. To investigate mechanisms of viral latency in HIV-1 patients treated with suppressive antiretroviral therapy, an analysis of proviral sequences in conjunction with corresponding chromosomal integration sites for each provirus would be highly informative; however, such an analysis has been precluded in the past by technical approaches that permit only isolated assessments of either proviral sequences or viral integration sites (12–14). To address this, we here developed an experimental approach to concurrently analyze pairs of proviral HIV-1 sequences and their respective chromosomal integration sites using a combined assay system, termed MIP-Seq. First, genomic DNA was isolated from CD4+ T cells of 3 HIV-1–infected patients treated with suppressive antiretroviral therapy for approximately 10 years (Supplemental Table 1; supplemental material available online with this article; https://doi.org/10.1172/JCI124291DS1), subjected to quantification of viral gag copies by droplet digital PCR (ddPCR), and diluted to single proviral genomes based on ddPCR results and Poisson distribution statistics. Afterward, cells were exposed to multiple displacement amplification (MDA) mediated by phi29 polymerase; this whole-genome amplification (WGA) process generates 1000–10,000 identical copies of an individual cell’s genome, including any proviral sequence possibly harbored by a given cell. 
Subsequently, material from each individual MDA reaction was split and separately subjected to viral sequence amplification with primers spanning near-full-length HIV-1 (15, 16) and to chromosomal integration site analysis based on integration site loop amplification (ISLA) (13), ligation-mediated PCR (LM-PCR) (17), or nonrestrictive linear amplification–mediated PCR (nrLAM-PCR) (18); frequently, a combination of these integration site assays was used, yielding identical results. Amplified near-full-length viral sequences and viral-host junctions were analyzed by Illumina MiSeq next-generation sequencing. Although intact proviruses constitute only a small minority of total HIV-1 DNA sequences, we sought to analyze roughly equal numbers of intact and defective sequences by prioritizing the investigation of proviral sequences that approximated the size of full HIV-1 genomes (>8 kb) based on gel electrophoresis analysis. Using this approach, we identified 100 intact proviral sequences and their corresponding integration sites from the 3 study patients; of these 100 intact sequences, we detected n = 73 distinct pairs of proviral sequences and integration sites. A total of 84 defective proviral sequences (with hypermutations, major deletions, or internal inversions) and their respective integration sites were also identified, of which n = 76 represented distinct combinations of proviral sequences and corresponding integration sites (Figure 1, Figure 2, Supplemental Tables 2 and 3, and Supplemental Figures 1 and 2). Notably, intact proviruses generated after MDA were phylogenetically intermingled with sequences identified without prior WGA, demonstrating that cell-free cloning of proviral sequences by MDA is not associated with a selection bias for individual proviruses (Supplemental Figure 1). 
Moreover, we observed intact proviral sequences after MDA that were highly similar or identical to near-full-length proviral sequences retrieved from viral outgrowth assays, indicating that genome-intact sequences can indeed be fully replication- and infection-competent (Supplemental Figure 1), as shown in our prior work (15). Within all amplified sequences, we detected 8 clusters of intact sequences, each consisting of multiple identical proviruses paired with identical chromosomal integration sites; one large cluster encompassed 20 individual identical intact sequences in study participant 1, all located at the same position in the zinc finger protein 721–encoding gene (ZNF721) on chromosome 4 (Figures 1 and 2). Together, these clusters of identical intact proviral sequences accounted for n = 35 (35%) of all n = 100 intact proviral sequences analyzed. The identification of such identical proviral sequences matched with identical viral integration sites strongly supports the role of clonal proliferation for maintaining and stabilizing a pool of viral reservoir cells encoding for intact HIV-1 (19–21). In addition to intact proviral sequences derived from such clonally expanded CD4+ T cells, we also noted 6 clusters of defective proviruses exhibiting identical viral sequences with identical viral integration sites in each cluster; these clusters involved n = 14 (16.7%) sequences of the entire pool of n = 84 defective sequences analyzed. 
Although the amplification of identical viral sequences, coupled with identical corresponding integration sites from distinct single proviruses, supported the technical consistency of our experimental approach, we conducted additional experiments to further validate our method: for those intact proviral sequences from which sufficient material was available, we analyzed the viral-host junction sequence at both the 5′ long terminal repeat (5′-LTR) and the 3′-LTR border regions, which verified the identity of the respective chromosomal integration site (Supplemental Table 3 and Supplemental Figure 3). Moreover, our experimental approach allowed us to investigate viral sequence variations in the viral 5′-LTR and/or 3′-LTR promoter regions, which are not covered by the near-full-length sequencing assays used previously for identification of genome-intact proviruses (5, 15, 22). These additional studies demonstrated that relative to the functionally intact promoter regions in HXB2, patient-derived HIV-1 promoters were highly conserved and diversity was mostly attributable to single base substitution mutations (Supplemental Figure 4).
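The limiting-dilution step described above rests on Poisson statistics: if proviral templates are distributed across wells at a mean of λ copies per well, the probability of exactly k templates in a well is e^(−λ)·λ^k/k!. The following is a minimal sketch of that calculation and not the study's actual workflow; the λ values and function names are hypothetical and chosen only for illustration.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k proviral templates in a well at mean loading lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def single_template_fraction(lam: float) -> float:
    """Among wells with at least one template, the fraction holding exactly one."""
    p_positive = 1.0 - poisson_pmf(0, lam)
    return poisson_pmf(1, lam) / p_positive

# Hypothetical mean loadings (proviral copies per well); not values from the study.
for lam in (0.1, 0.3, 1.0):
    print(
        f"lambda={lam}: positive wells={1.0 - poisson_pmf(0, lam):.3f}, "
        f"single-template share={single_template_fraction(lam):.3f}"
    )
```

Lower loadings raise the share of positive wells that contain a single provirus, at the cost of more empty wells; this is the trade-off that a dilution guided by ddPCR copy numbers has to balance.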

Figure 1 Simultaneous analysis of near-full-length HIV-1 proviral sequences and corresponding HIV-1 integration sites. (A–C) Horizontal phylogenetic trees of all intact, near-full-length HIV-1 sequences from 3 study participants (P1–P3). Clonal sequences are listed only once; the number of clones is indicated by circular symbols. Chromosomal integration site coordinates (3′-LTR border) for each sequence are indicated.
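The way clonal sequences are collapsed can be illustrated with a short sketch, assuming each provirus is represented as a (sequence identifier, integration site) pair; the identifiers and coordinates below are hypothetical placeholders and not data from the study. The number of distinct pairs equals the total number of sequences minus the sequences that belong to clusters plus the number of clusters, which agrees with the counts reported in the text (100 − 35 + 8 = 73 for intact and 84 − 14 + 6 = 76 for defective sequences).

```python
from collections import Counter

def summarize_clones(pairs):
    """Collapse identical (sequence, integration site) pairs into clonal clusters."""
    counts = Counter(pairs)
    # A cluster is any pair observed more than once.
    clusters = {pair: n for pair, n in counts.items() if n > 1}
    return {
        "total": len(pairs),
        "distinct": len(counts),
        "clusters": len(clusters),
        "clonal_sequences": sum(clusters.values()),
    }

# Hypothetical example: one clone seen three times, one seen twice, one singleton.
example = (
    [("seqA", "chr4:100")] * 3
    + [("seqB", "chr1:200")]
    + [("seqC", "chr2:300")] * 2
)
summary = summarize_clones(example)
print(summary)
```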

Figure 2 Chromosomal positions of intact and defective HIV-1 proviruses. Circos plots demonstrating chromosomal integration site positioning of intact and defective proviruses from 3 study participants (A, patient 1; B, patient 2; C, patient 3). Color and line coding indicate viral sequence characteristics (intact vs. defective) and orientation of integrated provirus relative to host gene. Targeted genes were identified using Ensembl (v86); gene names are shown according to HUGO classification (https://www.genenames.org). Colored dots indicate the number of clones detected. *Sequences in chromosomal regions associated with multiple genes; †mixed orientation among these genes; #integration sites in pseudogenes.

Chromosomal integration site features of intact proviruses. We subsequently focused on identifying distinguishing features of intact proviral sequences and their chromosomal locations. In order to avoid bias due to clonal expansion, integration sites for each cluster of clonally identical sequences were counted only once for these studies. We observed that relative to defective proviruses, a larger fraction of intact proviruses was located in non-genic or pseudogenic genomic regions (Figure 3A), which were previously associated with a deeper level of viral latency in tissue culture models of HIV-1 infection (23–26). In addition, among proviruses integrated in genes, we found a higher proportion of intact proviruses integrated in opposite orientation to the host gene, which can support viral latency by increasing susceptibility to transcriptional interference (27) (Figure 3B). This trend for an enrichment of intact proviral sequences in non-genic/pseudogenic regions and in opposite orientation to host genes was also observable when the 3 study participants were analyzed individually (Supplemental Figure 5) and when clonal sequences were counted as individual sequences (Supplemental Figure 6). There were no differences between intact and defective proviruses with regard to their location in introns, exons, or repetitive genetic elements (Figure 3, C and D). Notably, we identified viral integration sites in a number of host genes that have frequently been associated with viral integration, including BACH2 and STAT5B (13, 14, 28). In each of these 2 genes, we noted a defective proviral sequence integrated in the same direction as the host gene, in addition to an intact sequence in STAT5B integrated in the opposite orientation. 
Moreover, computational gene ontology analysis indicated that genes harboring proviral sequences were frequently encoding for T cell transcription factors or otherwise involved in the regulation of T cell behavior; however, there were no distinct differences in the predicted functional profile between genes hosting intact versus defective proviruses (Figure 3, E–G). Chromosomal locations of proviral sequences derived from clonally expanded CD4+ T cells were genic for all analyzed intact and defective proviruses, consistent with prior results (13, 14), and slightly more frequently positioned in opposite orientation to host genes (intact proviruses: 4 opposite orientation, 3 same orientation, 1 mixed orientation; defective proviruses: 4 opposite orientation, 2 same orientation), but there was no evidence that genes harboring intact or defective sequences isolated from clonally expanded CD4+ T cells were enriched for cancer-associated functions (Figure 3G). These data suggest that during prolonged antiretroviral therapy, intact viral sequences located in non-genic regions and in opposite orientation to host genes are preferentially selected for, likely as a result of immune-mediated mechanisms.

Figure 3 Chromosomal annotations of HIV-1 integration sites associated with intact and defective proviral sequences. (A and B) Pie charts showing proportion of intact and defective HIV-1 sequences located in genic versus non-genic/pseudogenic regions (A), and with the same or opposite orientation relative to host genes (among sequences integrated in genes; B). Integration sites associated with multiple genes and mixed orientations to host genes were not considered for the analysis in B. Significance was tested using 2-tailed χ2 tests; nominal P values are reported. (C and D) Pie charts indicating the proportion of intact and defective HIV-1 sequences located in regions with defined repetitive genetic elements (C) (SINE, short interspersed nuclear element; LINE, long interspersed nuclear element; LTR, LTR retrotransposon; DNA, DNA transposon) and in exons or introns (D). (E–G) Ontology analysis of genes harboring defective and intact HIV-1 sequences. Data represent a categorization of genes harboring intact or defective HIV-1 sequences according to defined formal functional entities (E). (F) Top ten canonical pathways predicted by Ingenuity Pathway Analysis for genes containing intact or defective proviruses; x axis shows corresponding –log(P value) for each pathway using right-tailed Fisher’s exact tests, with a threshold of –log(0.05) marked as a dotted line. RhoGDI, Rho GDP dissociation inhibitor. (G) Positioning of intact and defective HIV-1 proviruses in cancer-related genes. Left y axis shows upper limit of the –log(P value) for each indicated category (Ingenuity Pathway Analysis–based right-tailed Fisher’s exact tests); right y axis depicts the number of sites identified in the “Cancer” category in the different gene groups. For A–G, clonal sequences were counted only once.
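For readers who wish to reproduce the type of comparison shown in panels A and B, the following is a minimal sketch of a 2-tailed χ2 test on a 2 × 2 table (for example, intact versus defective by genic versus non-genic). It assumes no continuity correction, which the legend does not specify, and the counts are hypothetical placeholders and not the study's data.

```python
import math

def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (1 degree of freedom) for the table [[a, b], [c, d]].

    Assumption: no Yates continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: rows = intact/defective, columns = genic/non-genic.
chi2, p_value = chi_square_2x2(30, 10, 35, 5)
print(f"chi2={chi2:.3f}, nominal P={p_value:.3f}")
```

With small expected cell counts, Fisher's exact test would usually be the safer choice; the sketch only illustrates the arithmetic of the χ2 statistic.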

Chromatin accessibility and transcriptional activity at integration sites of intact proviruses. For a closer analysis of chromosomal integration site features, we used RNA-Seq for genome-wide transcriptional profiling of autologous CD4+ T cells and of sorted central memory and effector memory CD4+ T cells, the two CD4+ T cell subpopulations most frequently harboring HIV-1 proviral sequences (29, 30). These studies allowed us to calculate the chromosomal distance between each proviral integration site and the most proximal transcriptional start site (TSS), and to determine the transcriptional activity of the respective host genes containing proviral sequences. Simultaneously, the chromatin accessibility of genomic DNA regions from our study participants was assessed in total, central memory, and effector memory CD4+ T cells, applying the assay for transposase-accessible chromatin using sequencing (ATAC-Seq); these data were used to determine the chromosomal distance between each proviral integration site and the center of the most proximal ATAC-Seq peak. In study participant 1, these experiments indicated that, relative to defective proviruses, intact proviral sequences showed an increased distance to the nearest active TSSs, coupled with an increased distance to the most proximal accessible chromatin regions; this was true when clonal sequences were counted only once (Figure 4, A–E) or when included as individual sequences in this analysis (Supplemental Figure 7, A–C). The distances to the TSSs and the ATAC-Seq peaks were closely correlated with each other (Figure 4F), but there was no marked difference between the intact proviral sequences integrated in opposite orientation to the host genes and the few intact sequences integrated in the same direction. 
Consistent with prior data (1, 31), integration of intact and defective proviruses was biased toward highly expressed host genes (Supplemental Figure 8), with a small trend toward lower gene expression intensity in genes harboring intact proviruses compared with those containing defective viral sequences (Figure 4B). Moreover, a composite analysis of the transcriptional activity of host genes harboring integration sites, normalized to the distance between integration sites and the most proximal TSSs, was compatible with preferential persistence of intact proviruses in regions with more limited transcriptional activity (Figure 4E and Supplemental Figure 7C). Together, data from this patient suggest selection of intact proviruses located in less-accessible chromatin and with increased distance to active TSSs; these features were associated with deeper levels of HIV-1 latency in previous in vitro studies (23, 25, 32).
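The distance measurements used in this section (integration site to the most proximal TSS, or to the center of the most proximal ATAC-Seq peak) amount to a nearest-neighbor search along one chromosome. The following is a minimal sketch, assuming the reference positions on that chromosome are available as a sorted list of coordinates; all coordinates and names are hypothetical and the authors' actual pipeline is not described at this level of detail.

```python
from bisect import bisect_left

def nearest_distance(site: int, sorted_positions: list) -> int:
    """Distance in bp from an integration site to the closest reference position.

    sorted_positions must be sorted in ascending order and lie on the same
    chromosome as the site (for example, TSS coordinates or ATAC-Seq peak centers).
    """
    if not sorted_positions:
        raise ValueError("no reference positions on this chromosome")
    i = bisect_left(sorted_positions, site)
    candidates = []
    if i < len(sorted_positions):
        candidates.append(sorted_positions[i])      # first position at or after the site
    if i > 0:
        candidates.append(sorted_positions[i - 1])  # last position before the site
    return min(abs(site - pos) for pos in candidates)

# Hypothetical TSS coordinates and integration sites on a single chromosome.
tss_positions = [100, 400, 900]
for site in (50, 150, 400, 1000):
    print(site, nearest_distance(site, tss_positions))
```

The binary search keeps each lookup logarithmic in the number of reference positions, which matters when every integration site is compared against a genome-wide TSS or peak list.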

Figure 4 Distinct chromosomal locations of intact HIV-1 proviruses in study participant 1. (A and B) Circos plots highlighting ATAC-Seq and RNA-Seq reads in proximity (ATAC-Seq: ±8000 bp; RNA-Seq: ±5000 bp) to integration sites of intact and defective HIV-1 proviruses. (C) Combined individual-value/box-and-whisker plots indicating the chromosomal distance between HIV-1 integration sites and the most proximal TSS listed in Ensembl v86 (databank), or identified through analysis of expressed RNA species located within the boundaries of the host gene, using autologous RNA-Seq data from the indicated cell populations and limiting the analysis to proviruses integrated in expressed genes. (D) Combined individual-value/box-and-whisker plots showing the chromosomal distance between integration sites and the center of the most proximal ATAC-Seq peaks in indicated CD4+ T cell populations. (E) Gene expression intensity of host genes harboring intact or defective proviral integration sites, normalized to the chromosomal distance between integration sites and the most proximal TSSs determined using autologous RNA-Seq data as described in C. In C–E, boxes and whiskers represent median, 25th and 75th percentiles, and minimum/maximum levels. Significance was calculated using 2-tailed Mann-Whitney U tests; nominal P values are reported. (F) Distance between each integration site and most proximal TSS, plotted against corresponding distance between each integration site and the center of the nearest ATAC-Seq peak. Spearman’s correlation coefficients are shown for each cell population. In A–F, clonal sequences were shown/counted only once. EM, effector memory; CM, central memory.

In study patients 2 and 3, our results demonstrated an opposite pattern for chromosomal integration site features of intact proviruses: in both of these participants, we noted that relative to defective proviral sequences, intact HIV-1 proviruses appeared to be located in closer proximity to active TSSs and to accessible chromatin (Figure 5, A–E); this trend was also observed when clonal sequences were considered as individual sequences (Supplemental Figure 7, D–F). Distances to TSSs or ATAC-Seq peaks, again closely correlated with each other (Figure 5F), did not differ notably between intact proviruses integrated in the same orientation and those integrated in the opposite orientation to their respective host genes, but our statistical power to detect such differences was low given the comparatively small number of intact proviruses with the same directional configuration as the host gene. There was no significant difference in expression intensity between genes harboring intact versus defective proviruses in these two patients (Supplemental Figure 8); however, the transcriptional activity of host genes, normalized to the chromosomal distance between integration sites and the most proximal TSSs, suggested preferential enrichment of intact proviruses in closer proximity to host transcriptional activity after long-term antiretroviral therapy (Figure 5E and Supplemental Figure 7F). This pattern supports the presence of transcriptional interference between host and proviral gene expression, which has been previously described in in vitro models of HIV-1 latency (26, 31, 33), as a predominant mechanism for maintaining HIV-1 latency in participants 2 and 3. Transcriptional interference can effectively inhibit gene expression of proviruses (31, 33) and may explain the otherwise paradoxical finding that HIV-1 can remain transcriptionally silent despite integration in actively transcribed and typically highly expressed host genes. 
Our data suggest that a greater susceptibility to transcriptional interference, due to closer proximity to active transcriptional units and accessible chromatin sites of the host, provided a selection advantage for intact proviruses during long-term antiretroviral therapy in participants 2 and 3.