Date: 2020-04-01

Study cohort and sampling

PBMCs and tissue samples were obtained from the first 6 people with HIV (PWH) enrolled in the Last Gift cohort (92, 93). Study participants were followed closely near the end of their lives (approximately every other week, with shorter intervals as death approached). Over the course of the study, participants provided: (a) detailed clinical and sociodemographic information before their death (use of ART, chemotherapy and other therapies, surgical procedures, coinfections, etc.); (b) blood samples while they were alive; and (c) their entire bodies after they died for a rapid autopsy. Clinical characteristics of the study participants are summarized in Supplemental Table 1 and Supplemental Figure 2.

Rapid autopsy

The Last Gift rapid autopsy protocol was designed to collect tissues within 6 hours of death to minimize postmortem tissue degradation. At the time of death, the body was rapidly transported to the UCSD morgue, where the team performed a complete autopsy of all organs to obtain tissue samples, which were either formalin-fixed and paraffin-embedded for histological analysis or snap-frozen in liquid nitrogen. Fluids collected included CSF and blood drawn by cardiac puncture.

HIV DNA quantification and sequencing

DNA extraction, purification, and quantification. Genomic DNA was extracted from 5 million PBMCs and snap-frozen tissues using a QIAamp DNA Mini Kit (QIAGEN, catalog 51306) according to the manufacturer’s protocol. After extraction, precipitation was performed to concentrate DNA. Concentrations of DNA were determined using NanoDrop One (Thermo Fisher Scientific). Levels of extracted HIV DNA were quantified by droplet digital PCR (ddPCR) using the Bio-Rad QX200 Droplet Reader (94). Copy numbers were calculated as the mean of 3 replicate PCR measurements and normalized to 1 million cells, as determined by RPP30 assay (total cell count) (94, 95).
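The normalization described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name and the example counts are hypothetical, and the conversion from RPP30 copies to cells assumes 2 RPP30 copies per diploid cell.

```python
from statistics import mean


def hiv_copies_per_million_cells(hiv_copies, rpp30_copies, rpp30_per_cell=2):
    """Return HIV DNA copies per 10^6 cells from replicate ddPCR measurements.

    hiv_copies and rpp30_copies are replicate copy counts measured in the
    same input volume. rpp30_per_cell=2 is an assumption (diploid genome).
    """
    cells = mean(rpp30_copies) / rpp30_per_cell
    if cells <= 0:
        raise ValueError("RPP30 signal must be positive to estimate cell input")
    return mean(hiv_copies) / cells * 1_000_000


# Hypothetical triplicate measurements for one sample.
example = hiv_copies_per_million_cells([12, 15, 9], [40_000, 42_000, 38_000])
```

With these made-up inputs, the mean of 12 HIV copies is divided by an estimated 20,000 cells, giving 600 copies per million cells.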

RNA extraction and quantification. RNA was extracted from blood plasma by layering 500–700 μL plasma on top of 200 μL 20% sterile filtered sucrose solution. Samples were spun at 23,500 ×g for 1 hour at 4°C to pellet the virus. Supernatant was removed and the pellet resuspended in 140 μL PBS. RNA was extracted using a QIAamp Viral RNA Mini Kit (QIAGEN, catalog 52904) according to the manufacturer’s recommendation. cDNA from HIV RNA was generated using the One-Step RT-ddPCR Advanced Kit for Probes (Bio-Rad, catalog 186-4021), and levels were quantified by ddPCR using the Bio-Rad QX200 Droplet Reader. Copy numbers were calculated as the mean of 3 replicates.

Nested PCR. To amplify single-genome FL env, DNA extracted from antemortem PBMCs and postmortem tissues was diluted using ddPCR quantification data. This limiting-dilution PCR approach reduces PCR recombination and ambiguous base calls and allows the amplification of single viral genomes (87, 96). For HIV RNA in blood plasma, cDNA was generated from RNA using the SuperScript III First Strand Synthesis System (Thermo Fisher Scientific, catalog 18080-051). Template cDNA and HIV DNA extracted from tissues were diluted until approximately 30% of the second-round reactions were positive for the correctly sized amplification product. Primers used for the first round were as follows: 5′-FENVouter (forward) TTAGGCATCTCCTATGGCAGGAA and 3′-RENVouter (reverse) TCTTAAAGGTACCTGAGGTCTGACTGG. First-round PCRs were performed using the Advantage 2 PCR Kit (Takara, catalog 639206) following the manufacturer’s recommendations and 10× SA buffer (Takara, catalog 639206). Cycling conditions were as follows: 95°C for 1 minute, 35 cycles of 95°C for 15 seconds, 57°C for 30 seconds, 68°C for 3 minutes, with a final extension at 68°C for 10 minutes. The second-round PCRs were done using 5′-FENVinner (forward) GAGCAGAAGACAGTGGCAATGA and 3′-RENVinner (reverse) CCACTTGCCACCCATBTTATAGCA. The cycling conditions were as follows: 95°C for 1 minute, 30 cycles of 95°C for 15 seconds, 64°C for 30 seconds, and 68°C for 3 minutes, with a final extension at 68°C for 10 minutes. PCR cleanups were performed on the second-round reaction products using a QIAquick PCR Purification Kit (QIAGEN, catalog 28106). DNA was quantified using a Qubit dsDNA HS Assay Kit (Invitrogen, Thermo Fisher Scientific, catalog Q32854). Quality and integrity were measured using Genomic DNA ScreenTape (Agilent Technologies, catalog 5067-5365) in combination with the 2200 TapeStation System (Agilent Technologies, Genomic DNA Reagents, catalog 5067-5366).
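The rationale for diluting to roughly 30% positive reactions can be illustrated with a short calculation. The sketch below assumes that templates are distributed across reactions following a Poisson law, which is the usual model for limiting dilution but is not stated in the protocol itself; the function name is hypothetical.

```python
import math


def single_template_fraction(positive_fraction):
    """Expected fraction of positive reactions seeded by exactly one template.

    Assumes the number of templates per reaction is Poisson distributed.
    """
    if not 0 < positive_fraction < 1:
        raise ValueError("positive_fraction must lie strictly between 0 and 1")
    # P(no template) = exp(-lam) = 1 - positive_fraction
    lam = -math.log(1 - positive_fraction)
    p_exactly_one = lam * math.exp(-lam)
    return p_exactly_one / positive_fraction


frac = single_template_fraction(0.30)
```

Under this assumption, about 83% of positive reactions at a 30% positivity rate would derive from a single template, which is why the remaining reactions still need to be screened for mixtures downstream.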

Nextera XT library preparation. Single-genome amplicons were prepared for deep sequencing using the Nextera XT DNA Library Preparation Kit (Illumina, FC-131-1096) with indexing of 96 samples per run (Nextera XT Index Kit, set A FC-131-2001) according to the manufacturer’s protocols.

Assembly of FL HIV env proviruses. We used a custom-designed pipeline to recover FL env HIV sequences from the paired-end reads. The pipeline included a preliminary quality control step, which involved trimming reads to a PHRED quality score of 30 or higher and removing Illumina adapters. Next, overlapping identical paired forward and reverse reads were merged and premapped to the HXB2 reference genome. Cleaned reads were remapped to the de novo assembled near–FL env sequence before generation of the final consensus sequence. The minimum acceptable coverage was set to 10,000 reads. All generated FL env contigs were screened for mixtures (i.e., contigs suggesting that multiple HIV templates had been amplified). Mixtures were identified on the basis of read coverage and variant calling. Contigs with evidence of SNPs at a frequency greater than 1% were considered mixtures and excluded from further analyses.
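The coverage and mixture filters can be sketched as below. This is an illustrative reading of the thresholds, not the custom pipeline itself: the per-position interpretation of the 10,000-read minimum, the use of the second most frequent base as the SNP frequency, the data layout, and all names are assumptions.

```python
def classify_contig(pileup, min_coverage=10_000, max_minor_freq=0.01):
    """Classify a contig from per-position base counts.

    pileup is a list with one dict per position, mapping base -> read count.
    Returns "low_coverage", "mixture", or "single_template".
    """
    if not pileup:
        raise ValueError("pileup must contain at least one position")
    depths = [sum(counts.values()) for counts in pileup]
    # Coverage filter (assumed here to apply at every position).
    if min(depths) < min_coverage:
        return "low_coverage"
    for counts, depth in zip(pileup, depths):
        ranked = sorted(counts.values(), reverse=True)
        second = ranked[1] if len(ranked) > 1 else 0
        # A minor variant above the threshold suggests more than one template.
        if second / depth > max_minor_freq:
            return "mixture"
    return "single_template"


# Hypothetical pileups.
clean = [{"A": 12_000, "G": 30}, {"C": 15_000}]
mixed = [{"A": 9_000, "G": 3_000}, {"C": 15_000}]
shallow = [{"A": 500}]
```

Here `clean` passes both filters, `mixed` is flagged because one position carries a 25% minor variant, and `shallow` fails the coverage requirement.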

Test for cross-contamination. A maximum likelihood (ML) phylogeny including all sequences from the 6 participants was estimated using IQ-TREE (http://www.iqtree.org/) with the general time-reversible (GTR) substitution model (97) to test for contamination, which would appear as intermixed clustering of taxa from different participants.

Identification of defective or hypermutant sequences. FL envelopes containing large deletions (>100 bp) were considered defective (86, 98). Deleterious stop codons were identified using the Gene Cutter tool (Los Alamos HIV Database; https://www.hiv.lanl.gov/content/sequence/GENE_CUTTER/cutter.html). Any contigs containing a stop codon were considered defective. APOBEC-induced (apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like–induced) G-to-A hypermutations were identified using the Los Alamos HIV Database Hypermut 2 program (https://www.hiv.lanl.gov/content/sequence/HYPERMUT/hypermut.html) and the participant’s consensus sequence (99, 100). Proviruses with a number of mutations significantly higher than those in the participant’s consensus (P < 0.05, Fisher’s exact test) were considered hypermutant and were not included in the downstream analyses described below.
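The hypermutation call rests on a one-sided Fisher's exact test. The sketch below shows that test on a 2×2 table of G-to-A changes in APOBEC-favored versus control sequence contexts. It does not reproduce Hypermut 2 itself: how sites are assigned to each context is left out, and the function names and counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of observing a or more counts
    in the top-left cell, given fixed row and column totals.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return tail / total


def is_hypermutant(mut_apobec, unmut_apobec, mut_control, unmut_control, alpha=0.05):
    """Flag a sequence whose APOBEC-context G-to-A changes are enriched."""
    p = fisher_exact_greater(mut_apobec, unmut_apobec, mut_control, unmut_control)
    return p < alpha


# Hypothetical site counts for two proviruses.
enriched = is_hypermutant(30, 70, 5, 95)
balanced = is_hypermutant(3, 97, 3, 97)
```

In the first example the APOBEC-context sites are mutated far more often than the control sites, so the sequence would be excluded; in the second the two contexts are mutated equally and the sequence is retained.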

Sequence analyses

Identification of identical FL env sequences. To identify sequences that were identical (100% genetic identity) or nearly identical (99% identity or greater), we used the ElimDupes tool from the Los Alamos HIV database, with the genetic identity threshold set to 100% or to 99% or greater, respectively (101). A sequence was classified as identical if it was a 100% match to another sequence sampled from the same participant. The proportion of identical and nearly identical sequences was then calculated by dividing the number of sequences classified as identical or nearly identical for each participant or compartment by the total number of sequences for that group.
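The proportion calculation can be illustrated with a small sketch. It is not a reimplementation of ElimDupes: it assumes pre-aligned sequences of equal length, compares positions naively (including any gap characters), and uses hypothetical function names and toy sequences.

```python
def pairwise_identity(seq_a, seq_b):
    """Fraction of matching positions between two aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and non-empty")
    return sum(x == y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def proportion_with_match(sequences, threshold=1.0):
    """Proportion of sequences matching at least one other at >= threshold."""
    if not sequences:
        return 0.0
    flagged = 0
    for i, seq in enumerate(sequences):
        others = (s for j, s in enumerate(sequences) if j != i)
        if any(pairwise_identity(seq, other) >= threshold for other in others):
            flagged += 1
    return flagged / len(sequences)


# Toy sequences standing in for one participant's compartment.
seqs = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"]
identical = proportion_with_match(seqs, threshold=1.0)
nearly_identical = proportion_with_match(seqs, threshold=0.9)
```

With these toy 10-base sequences, two of four are exact duplicates, and a third joins the group once the threshold is relaxed to 90%; for real FL env sequences the 99% threshold from the text would be used instead.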

Diversity and divergence. Viral diversity was defined as the average pairwise genetic distance between sequences from a compartment using the TN93 correction for multiple hits (46). Viral divergence was assessed by computing the mean pairwise distance (TN93) between viral populations sampled across anatomical sites.
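The distinction between the two summaries is only in which pairs are averaged: within one compartment for diversity, across two compartments for divergence. The sketch below makes that explicit. To stay short it uses an uncorrected p-distance as a stand-in; the TN93 correction used in the study would be supplied through the `dist` argument. All names and sequences are hypothetical.

```python
from itertools import combinations, product


def p_distance(a, b):
    """Uncorrected proportion of differing sites (stand-in for TN93)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def diversity(seqs, dist=p_distance):
    """Mean pairwise distance among sequences from one compartment."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def divergence(seqs_x, seqs_y, dist=p_distance):
    """Mean pairwise distance between sequences of two compartments."""
    pairs = list(product(seqs_x, seqs_y))
    if not pairs:
        raise ValueError("both compartments need at least one sequence")
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


# Toy aligned sequences for two compartments.
blood = ["ACGT", "ACGA", "ACGT"]
node = ["TCGT", "TCGT"]
```

For these toy data, `diversity(blood)` averages 3 within-compartment pairs, whereas `divergence(blood, node)` averages all 6 between-compartment pairs.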

Coreceptor tropism. Viral tropism of each variant was inferred from the V3 amino acid sequence using Geno2pheno (https://coreceptor.geno2pheno.org/) (102). We applied a conservative 10% false-positive rate threshold for coreceptor CXCR4 usage on the basis of the European Consensus Group’s recommendation on clinical management of HIV-1 tropism testing.

Viral migration. Before evaluating the within-host migration processes, the level of spatial structure was quantified using the Simmonds association index (AI) implemented in BaTS, version 1.0 (103). We used the BEAST software package, version 1.10 (https://beast.community/) (104) for all evolutionary analyses. All sequences were considered isochronous, i.e., branch lengths were estimated in units of substitutions per site. For this, a strict molecular clock was specified, and the clock rate was fixed to 1. The substitution process was described with an HKY+Γ model (105, 106), and a constant population size was assumed. Discrete trait analyses were performed using the asymmetric diffusion model (107, 108). To identify the subset of migration rates that were most informative for reconstructing the dispersal history, we used a model averaging procedure (Bayesian stochastic search variable selection [BSSVS]) (107). Bayes factor (BF) support for all possible types of location exchanges was calculated using SpreaD3 (https://rega.kuleuven.be/cev/ecv/software/SpreaD3) (109). BFs between 3 and 20, between 20 and 150, and above 150 were considered positive, strong, and decisive support, respectively (110). Estimates of the posterior expected number of migration events between all pairs of locations (Markov jumps) were computed using stochastic mapping techniques (105, 111). To identify individual-level differences in within-host viral diffusion patterns, the trait analyses were performed on a patient-specific basis.

To investigate whether support for migration links followed from the relative abundance of the involved trait states, analyses were repeated while randomly permuting the compartment states between tips during the Markov chain Monte Carlo (MCMC) sampling (112), a technique analogous to the tip-date randomization procedure for testing the significance of the temporal signal (113, 114). Indeed, if the support for a particular migration rate persisted after randomizing the tip-to-location assignments, one cannot rule out the possibility that the support is due to sampling intensity differences. Furthermore, of the migration links that passed the above filter (good BF support in the “as is” analysis and poor BF support in the “tip-state-swap” analysis), only those that remained significant after accounting for the sampling heterogeneity were taken into account to further reduce the false-positive rate. To this end, the inclusion frequencies of the “tip-state-swap” analysis instead of those of the “as is” analysis were treated as the prior inclusion probabilities when recomputing the BF support, hereafter referred to as the adjusted BF. Only the results with an adjusted BF with positive support (BFs ≥3) are further discussed in this study. This approach was also used to investigate the sensitivity of the Simmonds AI to the sampling heterogeneity.
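The adjusted BF can be written as a ratio of odds, with the inclusion frequency from the "tip-state-swap" analysis taking the place of the prior inclusion probability. The sketch below illustrates that calculation and the support categories given earlier. It is not the SpreaD3 implementation; the function names, compartment labels, and inclusion frequencies are hypothetical, and only the adjusted BF step is shown (not the preceding filter on the unadjusted BFs).

```python
def odds(p):
    """Convert a probability strictly between 0 and 1 into odds."""
    if not 0 < p < 1:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1 - p)


def bayes_factor(posterior_inclusion, prior_inclusion):
    """Posterior odds of including a rate divided by its prior odds."""
    return odds(posterior_inclusion) / odds(prior_inclusion)


def support_category(bf):
    """Map a BF onto the categories used in the text."""
    if bf > 150:
        return "decisive"
    if bf >= 20:
        return "strong"
    if bf >= 3:
        return "positive"
    return "unsupported"


# Hypothetical inclusion frequencies: ("as is", "tip-state-swap").
links = {
    ("blood", "lymph node"): (0.95, 0.40),
    ("gut", "brain"): (0.80, 0.70),
}
adjusted = {
    link: bayes_factor(as_is, swapped) for link, (as_is, swapped) in links.items()
}
retained = sorted(link for link, bf in adjusted.items() if bf >= 3)
```

In this made-up example the first link keeps an adjusted BF of about 28.5 (strong support), whereas the second drops to about 1.7 because its inclusion frequency is nearly as high after randomization, so only the first would be discussed.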

Multiple MCMC chains were run for a sufficient length of time to ensure convergence and adequate mixing (effective sample size [ESS] >200), which was inspected using Tracer, version 1.7 (https://beast.community/tracer) (115). The chains were sampled every 500,000 generations and combined after removal of the burn-in. Maximum clade credibility (MCC) trees were obtained with TreeAnnotator, version 1.10 (https://beast.community/programs) (104).

Identification of predictors of within-host spread. The generalized linear model (GLM) extension of the discrete trait model implemented in BEAST (116) was used to investigate the relevance of potential explanatory variables (predictors) for explaining the dispersal across body compartments. The following variables were included: (a) the number of FL intact env proviruses recovered in each compartment to control for sampling bias effects (116); (b) the level of HIV DNA (ddPCR gag copies per 10^6 cells for all antemortem PBMC and postmortem tissue samples) or the level of HIV RNA for antemortem blood plasma samples (number of copies/μL); (c) viral diversity, estimated as the average pairwise genetic distance between sequences; and (d) the proportion of X4-tropic proviruses in each compartment. For each of these variables, the values in the source and recipient compartments were considered. We also included a matrix of pairwise measurements of TN93 distance between compartments (i.e., viral divergence). Collinearity between the variables included in the GLM was evaluated. When 2 variables showed a collinearity coefficient of 0.8 or higher, the analysis was repeated with only 1 of the 2 variables in the model.
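The collinearity screen can be sketched as a pairwise correlation check across predictors. The text does not specify which coefficient was used, so the Pearson correlation and the use of its absolute value are assumptions here, as are the function names, predictor labels, and values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(predictors, cutoff=0.8):
    """Return predictor pairs whose absolute correlation reaches the cutoff."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(predictors.items(), 2):
        if abs(pearson(xa, xb)) >= cutoff:
            flagged.append((name_a, name_b))
    return flagged


# Hypothetical per-compartment predictor values.
predictors = {
    "hiv_dna": [1, 2, 3, 4, 5],
    "diversity": [2, 4, 6, 8, 11],
    "x4_fraction": [5, 1, 4, 2, 3],
}
flagged = collinear_pairs(predictors)
```

For any flagged pair, the GLM would be rerun with only one of the two predictors, as described above.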

Statistics

Multivariable logistic regression was carried out in R, version 3.6.1, using the glm function with a binomial family, and was used to compare the proportion of sequences that were intact and identical (clonal). The independent variables in these analyses were participant and anatomical location. For the continuous average pairwise distance (diversity) outcome, multiple linear regression was used, with assumptions of constant variance and normality of residuals checked and met. Mixed models for both binary and continuous outcomes were analyzed with glmer and lmer from the lme4 R library. These mixed models used participant as the grouping factor and included a random term for either intercept alone or intercept and compartment. Given the sparse nature of the data, all mixed models had difficulty converging and showed poor model fits. As such, the results and P values of these models were not reliable and are not presented. Given the large number of tissue compartments, the sparse sampling across participants, and the expected effect modification by participant and compartment, comparisons across specific compartments were not informative.

Study approval

All study participants were at least 18 years of age and provided written, informed consent. This study was approved by the UCSD Office of Human Research Protections Program (protocol 160563). One of the participants exercised his legal right-to-die option in the state of California (117). No government funds were used as part of the option.