Sex as a biological variable. Sex as a biological variable was not investigated in our study since only one enrollee in our parent study of Primary HIV Infection in Seattle self-identified as a woman. This participant did not donate a leukapheresis specimen, which was required for this project.

Study population. This study included 11 participants, 7 of whom initiated ART during acute infection (ART-acute-HIV), defined as 1.5 months or less between estimated time of infection and ART initiation, and 4 of whom initiated ART during chronic infection (ART-chronic-HIV), defined as more than 6 months between estimated time of infection and ART initiation. All participants were males aged 39–61 (median age 53) from the Primary Infection Clinic cohort based in Seattle, which enrolled very few women (23–25). Participants were selected based on the following criteria: (a) ART-acute-HIV individuals initiated ART within 6 weeks and ART-chronic-HIV individuals initiated ART more than 6 months from estimated time of infection, (b) HIV replication was ART-suppressed for more than 2 years, with plasma viral RNA levels that were undetectable (<50 copies/mL) or with rare viremias of up to 200 HIV RNA copies/mL, and (c) all had banked PBMC aliquots from leukapheresis (note: no participants identifying as women donated leukapheresis specimens). History of viral loads, CD4+ and CD8+ T cell counts, and drug regimens for all individuals are provided in Supplemental Figure 2. All participants had either acute or early HIV infection at the time of cohort entry. ART was initiated based on clinical guidelines at the time participants enrolled and their personal preferences.

Serologic testing. The presence of serum antibodies for HIV, EBV, HSV1, HSV2, and CMV infection was determined in the University of Washington Clinical Virology Lab, using plasma banked from the time closest to primary HIV infection.

Screening of participants’ CD4+ T cells for peptide reactivity. PBMCs were thawed and rested overnight at 37°C. Following incubation, CD8+ T cells were depleted according to the manufacturer’s instructions (EasySep Human CD8 Positive Selection Kit II, StemCell Technologies). CD8+ T cell–depleted PBMCs were plated in a 96-well round-bottom plate at 200,000 cells per well in a final volume of 250 μL of RPMI with 10% heat-inactivated human serum containing 10 ng/mL recombinant human IL-7 (Peprotech), 1 μM raltegravir (integrase inhibitor; https://www.beiresources.org/), and 15 nM efavirenz (non-nucleoside reverse transcriptase inhibitor; https://www.beiresources.org/). Peptide pools (Supplemental Table 3) were added to reach a final concentration of 2 μg/mL. Plates were incubated at 37°C. On days 3, 5, 7, and 10, half of the culture media was replaced with fresh RPMI/10% human sera containing a final concentration of 10 IU/mL IL-2 (Peprotech), 10 ng/mL IL-7, 1 μM raltegravir, and 15 nM efavirenz. On day 10, cells were restimulated with 2 μg/mL of specific peptide pools, Brefeldin A (MilliporeSigma), and GolgiStop (BD Biosciences) for intracellular cytokine staining. Staphylococcal enterotoxin B (MilliporeSigma) was used as a positive control at a concentration of 0.4 μg/mL and for compensation and fluorescence-minus-one (FMO) controls, with the latter used to distinguish positive and negative cell populations. Plates were incubated at 37°C for 6 hours and placed at 4°C until staining. Plates were washed twice with PBS prior to staining with Live Dead IR (Thermo Fisher Scientific). Cells were surface stained with optimized concentrations of anti-CD3 PECy7 (clone SK7), anti-CD8 BV421 (clone RPA T8), and anti-CD137 PE (clone 4B4-1) (all BD Biosciences) for 30 minutes. 
Cells were fixed and permeabilized according to the manufacturer’s instructions (Foxp3/Transcription Factor Fixation/Permeabilization, eBioscience) and stained with optimized concentrations of the following antibodies from BD Biosciences: anti–IFN-γ BV605 (clone B27), anti–TNF-α APC (clone 6401.1111), and anti–IL-2 BV711 (clone 5344.111) for 30 minutes prior to washing and fixing in PBS/1% paraformaldehyde (Electron Microscopy Sciences). For all flow cytometric analyses, fluorescence was measured using an LSRII (BD Biosciences) and all analyses were performed using FlowJo (Tree Star, Inc.).

In vitro culturing of CD8+ T cell–depleted PBMCs. PBMCs were thawed and rested overnight at 37°C. CD8+ T cell–depleted PBMCs were plated in a 24-well plate at 2 × 10⁶ cells per well in a final volume of 2 mL of RPMI supplemented with 10% heat-inactivated human serum containing 10 ng/mL recombinant human IL-7, 1 μM raltegravir, and 15 nM efavirenz. Peptide pools to which individuals were reactive (see Table 2) were added to reach a final concentration of 2 μg/mL and plates were incubated at 37°C. On days 3, 5, and 7, half of the culture media was replaced with fresh RPMI supplemented with 10% human sera containing a final concentration of 10 IU/mL IL-2, 10 ng/mL IL-7, 1 μM raltegravir, and 15 nM efavirenz. On day 10, cells were restimulated by addition of 2 μg/mL of peptide pools. Following a 30-hour incubation, cells were harvested and stained with Live Dead IR followed by optimized concentrations of the following antibodies from BD Biosciences: anti-CD3 PECy7 (clone SK7), anti-CD8 BV421 (clone RPA T8), and anti-CD137 PE (clone 4B4-1) for 30 minutes prior to washing. Live CD3+CD8–CD137+ cells were sorted on a BD Biosciences FACSAria cell sorter using FMO controls to set the sorting gates.

VODA. Standard curves for DNA quantitation were generated from HIV vector pNL4-3 (NIH AIDS Reagent Program) diluted from 10,000 copies/μL to 1 copy/μL at 1:10 serial dilution and 3 μL was used in a 20 μL qPCR assay. Human genomic DNA (Bioline) was diluted from 66.7 ng/μL to 0.0067 ng/μL at 1:10 serial dilution and 3 μL was used in a 20 μL qPCR assay. Both standards were added in the same wells in triplicate. DNA was aliquoted in 0.2 mL 8-strip tubes for single use and stored at –80°C for no more than 1 month. qPCR was performed in either 96-well or 384-well plates. PCR master mix consisted of 1× TaqMan Fast Advanced Master Mix (Thermo Fisher Scientific), 140 nM of each HIV-1 probe (probeV1-LTR104-19, gag-B1, env-B2), 300 nM of each HIV-1 primer (NEC152, 5R633alt1, 5F1372alt1, 5R1504, 5F7724, and 5R7851), 120 nM human transferrin receptor probe (hTFR-exon-Cy5), 100 nM of each human primer (hTFR-exon-F and hTFR-exon-R2), 6 μL template or standards, and H2O to a total volume of 20 μL. H2O and 100 ng human genomic DNA were used as negative controls. PCR cycling parameters were as follows: initial denaturation at 95°C for 3 minutes, 45 cycles of 95°C for 5 seconds, 58°C for 15 seconds, and 60°C for 30 seconds on a QuantStudio 6 (Thermo Fisher Scientific). Primers and probe for LTR, probeV1-LTR104-19, and NEC152 were published previously (55). Primers and probes for HIV env and gag and hTFR were designed using ABI Primer Express software; sequences are available in Supplemental Table 5.
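To illustrate how a 1:10 dilution series of this kind is commonly turned into a quantitation curve, the following Python sketch fits CT against log10 input copies by ordinary least squares and inverts the fit for an unknown well. It is a minimal sketch, not the analysis code used in this study: the function names are hypothetical, the CT values are invented for illustration, and only the input copy numbers (3 μL of 10,000 to 1 copies/μL) come from the description above.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of CT = intercept + slope * log10(copies)."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Invert the fitted curve to estimate input copies for an observed CT."""
    return 10 ** ((ct - intercept) / slope)

def amplification_efficiency(slope):
    """Conventional efficiency estimate; 1.0 corresponds to perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0

# 3 uL of each 1:10 dilution (10,000 down to 1 copies/uL) per reaction
standard_copies = [30000, 3000, 300, 30, 3]
# Hypothetical CT values, invented for illustration only
standard_ct = [22.1, 25.5, 28.8, 32.2, 35.6]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
```

Calling `copies_from_ct(observed_ct, slope, intercept)` then gives the estimated HIV copies in a test well. The same fit applied to the human genomic DNA series would give the hTFR-based estimate.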

Cell proliferation measurements. CD8+ T cell–depleted PBMCs were stained with 0.25 μL of CellTrace CFSE (Life Technologies) per 1 × 10⁷ cells in 1 mL PBS for 7 minutes and quenched with 2 mL FBS. After staining, PBMCs were plated in a 24-well plate at 2 × 10⁶ cells per well and stimulated with peptide pools at a concentration of 2 μg/mL in RPMI culture medium containing 1% penicillin-streptomycin, 10% human serum, 1 μM raltegravir, and 15 nM efavirenz. Anti-CD3/anti-CD28 Dynabeads, a human T cell activator, served as a positive control (Life Technologies). Stimulated cells were incubated for 5 days, after which cells were transferred to a 96-well round-bottom plate for staining. Live cells were identified by staining with the amine-reactive VIVID Pacific Blue viability marker (Life Technologies) for 20 minutes at room temperature. Following 2 washes in PBS, cells were stained with anti-CD3 PECy7 (BD Biosciences, clone SK7) and anti-CD8 PerCpCy5.5 (BD Biosciences, clone SK1). Fluorescence was measured using an LSRII (BD Biosciences) and all analyses were performed using FlowJo (Tree Star, Inc.).

IS looping assay. ISs were determined using an IS looping assay (ISLA), as previously described (16). In some cases, multiple displacement amplification (MDA) was performed prior to ISLA to amplify one HIV copy using HIV-specific primers (Supplemental Table 6).

Sequence analyses. Single genome sequences derived from ISLA or MDA-ISLA were edited using Geneious R8.1 (https://www.geneious.com/updates/geneious-prime-r8-1) to remove poor quality data, manually call ambiguous bases, and extract any mixed sequences (nucleotide sequences are available in the Supporting Data Values file). Subsequently, sequences were mapped to the human reference genome GRCh38.p2 with the IS pipeline developed in the Mullins Lab at the University of Washington (https://indra.mullins.microbiol.washington.edu/integrationsites/). The analysis pipeline utilized the final 40 bases of the HIV 3′ LTR to identify the site of provirus integration into the human genome. Sequences that mapped to an ambiguous location in the human genome due to HIV integration into a repetitive region were excluded. Gene names and genome locations were derived from Ensembl version 101 (http://aug2020.archive.ensembl.org/index.html) corresponding to GENCODE release 35 through annotations extracted from Ensembl’s BioMart data service. Genes were associated with ISs by computing the overlaps with IS locations. ISs falling within 10 kb upstream of a gene were considered within the promoter region for the gene. Unique ISs were determined by deduplicating on the tuple of (subject, chromosome/landmark, location, orientation). The multiplicity of an IS was defined as the number of times that exact (landmark, location, orientation) tuple is observed from independent amplification reactions within a participant. ISs with a multiplicity greater than 1 are assumed to originate from proliferating cells.
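The deduplication, multiplicity, and promoter rules described above can be sketched in Python as follows. This is an illustrative sketch rather than the Mullins Lab pipeline: the record type and function names are hypothetical, each list element is assumed to represent one independent amplification reaction, and treating "upstream" as strand-aware with inclusive gene bounds is an assumption not stated in the text.

```python
from collections import Counter
from typing import NamedTuple, Optional

class IntegrationSite(NamedTuple):
    subject: str      # participant identifier
    landmark: str     # chromosome or other reference landmark
    location: int     # zero-origin, interbase coordinate
    orientation: str  # "+" or "-"

def unique_sites(observations):
    """Deduplicate on (subject, landmark, location, orientation)."""
    return sorted(set(observations))

def multiplicities(observations):
    """Number of independent observations of each IS within a participant."""
    return Counter(observations)

def proliferating_sites(observations):
    """ISs seen more than once, taken here as evidence of proliferating cells."""
    return {site: n for site, n in Counter(observations).items() if n > 1}

PROMOTER_WINDOW = 10_000  # 10 kb upstream of the gene

def classify_against_gene(location: int, gene_start: int, gene_end: int,
                          gene_strand: str) -> Optional[str]:
    """Return 'gene', 'promoter', or None for one IS and one gene."""
    if gene_start <= location <= gene_end:
        return "gene"
    if gene_strand == "+" and gene_start - PROMOTER_WINDOW <= location < gene_start:
        return "promoter"
    if gene_strand == "-" and gene_end < location <= gene_end + PROMOTER_WINDOW:
        return "promoter"
    return None
```

Because the subject is part of the key, the same genomic position recovered from two participants counts as two unique ISs, each with its own multiplicity.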

The locations of ISs are reported in zero-origin, interbase coordinates; thus, each location was identified between 2 nucleotides rather than at a nucleotide (56). The top strand coordinate of the match was used as the location of integration. When the sequence matched the negative strand, the location of integration was defined as the value that is obtained by subtracting 4 from the top strand coordinate of the matched sequence, as previously described (57, 58). A total of 1,083 ISs from 7 participants in the ART-acute-HIV group and 632 ISs from 4 participants in the ART-chronic-HIV group were curated. After collapsing the ISs with the same location and orientation, we obtained 500 unique ISs in the ART-acute-HIV group and 520 unique ISs in the ART-chronic-HIV group. HIV-3′-HIV ISs are available in the Retrovirus Integration Database (https://rid.cancer.gov/bibliography.php). For our comparative analyses, we combined approximately 66,000 ISs from unstimulated primary CD4+ T cells infected with HIV-1 BaL for 48 hours (11) and approximately 3,000 ISs from primary resting CD4+ T cells infected with HIV NL4-3 for 96 hours (27) for a total of 69,184 unique in vitro ISs.
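The strand-dependent coordinate rule above reduces to a small function. The sketch below is illustrative only; the function name is hypothetical and the offset of 4 is taken directly from the text rather than derived here.

```python
NEGATIVE_STRAND_OFFSET = 4  # offset stated in the text for negative-strand matches

def integration_location(top_strand_coordinate: int, strand: str) -> int:
    """Zero-origin, interbase location of integration for one mapped read."""
    if strand == "+":
        return top_strand_coordinate
    if strand == "-":
        return top_strand_coordinate - NEGATIVE_STRAND_OFFSET
    raise ValueError(f"unexpected strand: {strand!r}")
```

Applying one rule to both orientations keeps locations comparable when ISs are later collapsed on location and orientation.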

Statistical analyses: IS comparisons. All statistical analyses were conducted in R (https://www.R-project.org/). IS analyses were conducted following a prespecified tiered analysis plan. We curated gene sets from 3 MSigDB v7.2 collections, H-“Hallmark” (50 gene sets), C2-“Canonical Pathways” (2,871 gene sets), and C2-“Chemical and genetic perturbations” (3,358 gene sets), and filtered them using the following criteria: (a) gene sets should contain at least 4 ISs from the in vivo data set, (b) ISs with assigned genes in the gene set must be from at least 2 unique participants, and (c) the minimum P value achievable a priori, based on all possible permutations of ART-acute-HIV and ART-chronic-HIV labels among participants, must be 0.05 or less. Using these criteria, the number of gene sets within each of the 3 MSigDB collections was reduced to 22, 543, and 692 for H, C2.cp, and C2.cgp, respectively. We then used Fisher’s exact test to determine whether the number of unique ISs within genes and gene promoters in each gene set was independent of the source of the IS. Gene sets with significance at tier 1 (Holm-adjusted P ≤ 0.05) or tier 2 (FDR q ≤ 0.20 and unadjusted P ≤ 0.05) in either the ART-acute-HIV versus in vitro or ART-chronic-HIV versus in vitro analyses (Supplemental Table 4) were then considered for an in vivo–only analysis comparing ISs from participants in the ART-acute-HIV and ART-chronic-HIV groups. These gene sets are shown in Supplemental Table 4, along with the number of in vivo ISs overlapping or nonoverlapping with the genes in the gene set. Supplemental Table 4 also shows the odds ratio for ISs being associated with genes in the gene set, unadjusted Fisher’s exact test P values, Holm-adjusted P values, and FDR q values. For this in vivo–only analysis, we adjusted for multiple comparisons only among the gene sets shown in Supplemental Table 4.
This final test was not prespecified but will be prespecified in our future comparisons of IS analyses as co-primary with the unfiltered analysis that considers all gene sets. Limiting the in vivo comparison to gene sets already known to differ between one or both of the in vivo groups and the in vitro data improves power for identifying relevant differences in vivo, because only gene sets with evidence of some enrichment or depletion relative to what we expect from the in vitro integration experiments are considered a priori.
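The tiered testing scheme can be illustrated with a short, dependency-free Python sketch. The analyses in this study were run in R; the code below is not that code. It assumes a 2 × 2 table per gene set (IS source by gene-set membership), uses the common "sum of tables no more probable than the observed one" definition of the two-sided Fisher P value, takes the FDR q value to be the Benjamini-Hochberg adjustment, and reports the simple sample odds ratio, which differs from the conditional estimate that R's `fisher.test` returns. All function names are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables at most as probable as the observed one
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)

def sample_odds_ratio(a, b, c, d):
    """Unconditional odds ratio; infinite when b * c is zero."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)

def holm_adjust(pvals):
    """Holm step-down adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running_max = [0.0] * m, 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted

def bh_adjust(pvals):
    """Benjamini-Hochberg q values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted, running_min = [0.0] * m, 1.0
    for k, i in enumerate(order):
        running_min = min(running_min, pvals[i] * m / (m - k))
        adjusted[i] = running_min
    return adjusted

def tier(p, p_holm, q):
    """Tier 1: Holm P <= 0.05. Tier 2: q <= 0.20 and unadjusted P <= 0.05."""
    if p_holm <= 0.05:
        return 1
    if q <= 0.20 and p <= 0.05:
        return 2
    return None
```

In use, one P value would be computed per retained gene set within a collection, the two adjustments applied across that collection, and `tier` evaluated for each gene set to decide whether it advances to the in vivo–only comparison.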

Statistics: VODA. The estimates of HIV copies per 10⁶ cells depicted in Figure 1 are based on logs of ratios of estimated numbers of HIV copies to estimated number of cells (using the hTFR housekeeping gene), averaged over 2 replicates. In some cases, only one replicate exceeded the limit of detection (see below), and in these cases the estimate is based on this replicate only (indicated as open circles). When neither replicate exceeded the limit of detection, the value is shown as an open square at 1.
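The replicate-averaging rule can be sketched as below. This is a hedged illustration, not the code used for Figure 1: the function name is hypothetical, base-10 logarithms are assumed, the per-replicate cell numbers are taken as already derived from hTFR, and a replicate is treated as usable when its estimated HIV copy number exceeds a caller-supplied threshold (1 copy per reaction by default, following the limit-of-detection description below).

```python
import math

def copies_per_million_cells(replicates, min_hiv_copies=1.0):
    """Estimate HIV copies per 10^6 cells from (hiv_copies, cells) replicate pairs.

    Returns (estimate, n_replicates_used). With no usable replicate the value
    is placed at 1, mirroring the open-square convention in the text.
    """
    usable = [(hiv, cells) for hiv, cells in replicates if hiv > min_hiv_copies]
    if not usable:
        return 1.0, 0
    # Average the log ratios, then convert back and scale to 10^6 cells
    mean_log_ratio = sum(math.log10(hiv / cells) for hiv, cells in usable) / len(usable)
    return 10 ** (mean_log_ratio + 6), len(usable)
```

Averaging on the log scale makes the estimate a geometric mean of the per-replicate ratios, which is why the accompanying error bars are also constructed on the log scale.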

We leverage the independence of the replicates and of the measurement error in PCR across reactions to obtain a pooled variance estimate for the uncertainty in these estimators. Error bars are used to indicate 2 standard errors, estimated using the following procedure. First, we estimate variances of the normally distributed error of the difference of log concentration estimates for numerator (HIV LTR) and denominator (hTFR) for each replicate separately; using each fitted standard curve (for predicting log concentration from CT), one for HIV LTR and the other for hTFR, we estimate the variance of the mean of the 2 replicates by considering the variance of each replicate as the sum of the estimated residual variance from the fitted standard curve simple linear regression models for the corresponding plate. The final standard error of the mean is computed as one-half the square root of the sum of the variance estimates for each replicate. A complete set of such curves is given in Supplemental Figure 4. These curves give a normally distributed prediction of log concentration for any given observed CT value. Since the difference of these independent log concentration estimates (for HIV LTR and for hTFR) is approximately normally distributed with variance given by the sum of the variances of each component, this formula yields the estimated error in the log ratio.
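The variance bookkeeping in this procedure can be written out as a brief Python sketch. The function names are hypothetical and the residual variances are placeholders to be taken from the fitted standard curves. The text specifies only the 2-replicate case (one-half the square root of the summed variances); dividing by the number of replicates for other counts is an assumption that reduces to that formula when there are 2.

```python
import math

def replicate_variance(resid_var_hiv_ltr, resid_var_htfr):
    """Variance of one replicate's log ratio: sum of the two curve residual variances."""
    return resid_var_hiv_ltr + resid_var_htfr

def standard_error_of_mean(replicate_variances):
    """SE of the mean log ratio over independent replicates."""
    n = len(replicate_variances)
    return math.sqrt(sum(replicate_variances)) / n

def error_bar(mean_log_ratio, se, width=2.0):
    """Lower and upper limits at +/- width standard errors on the log scale."""
    return mean_log_ratio - width * se, mean_log_ratio + width * se
```

For example, `standard_error_of_mean([replicate_variance(v1_ltr, v1_tfr), replicate_variance(v2_ltr, v2_tfr)])` gives the quantity that is doubled to draw the error bars.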

The effective limit of detection varied across these results. To aid in interpretation of these estimated values and confidence limits, we have included dotted lines on each panel in Supplemental Figure 4 to indicate the value of copies per million cells that would be estimated if a CT value corresponding to 1 viral copy in the reaction were observed. Sometimes CT values beyond the threshold of 1 copy may be considered reliable; however, these lines can be considered effective limits of detection in that values below this line are as reliable as CT values beyond that threshold. Specifically, we compute these values as 10⁶ per the geometric mean of the log hTFR copy values (over the 1–2 available replicate values).
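One reading of this calculation is sketched below in Python. It is an interpretation rather than the authors' code: the geometric mean is computed by averaging the logs of the hTFR-derived values and exponentiating, those values are assumed to be on the same scale as the cell estimates used for the ratios, and the function names are hypothetical.

```python
import math

def geometric_mean(values):
    """Exponential of the mean log value."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def effective_lod_per_million(htfr_values, viral_copies_at_threshold=1.0):
    """Copies per 10^6 cells implied by a single viral copy in the reaction."""
    return viral_copies_at_threshold * 1e6 / geometric_mean(htfr_values)
```

Under this reading, wells with more input cells have a lower effective limit of detection, which is consistent with the limit varying across results.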

Data availability. Data used to derive Figures 1–3 are included as a Supporting Data Values file.

Study approval. Specimens from participants were obtained after written informed consent was given, following a protocol approved by the University of Washington’s Human Subjects’ Institutional Review Board.