HIV proviral sequence database: a new public database for near full-length HIV proviral sequences and their meta-analyses

W Shao, J Shan, WS Hu, EK Halvas, JW Mellors, JM Coffin, MF Kearney
AIDS Research and Human Retroviruses, 2020 - liebertpub.com
Editor: Despite the success of antiretroviral therapy (ART), HIV persists and viremia rebounds if ART is interrupted [1]. Characterizing proviruses in infected cells that persist on ART is critical to developing strategies to eliminate them. Important studies from a limited number of donors have revealed that the vast majority of proviruses that persist on ART are defective, leaving only a small fraction that are potentially replication competent or "intact" [2–4]. Despite the importance of characterizing the proviral landscape, there is no centralized public database dedicated to proviral sequences, nor are there bioinformatics tools available to differentiate defective from sequence-intact proviruses.
To facilitate our understanding of the genetic structure and dynamics of the HIV reservoir, we developed the Proviral Sequence Database (PSD; https://psd.cancer.gov) for the storage and meta-analyses of HIV sequences obtained before ART, that persist on ART, or that rebounded after ART was interrupted (Fig. 1). The PSD described in this study includes only HIV sequences from donors who provided informed consent in studies with institutional review board approval.
The PSD was constructed with MySQL (https://www.mysql.com), an open-source relational database management system. It stores near full-length DNA and RNA HIV sequences from publications along with characteristics of the donor and sample type. HIV sequences are obtained by single-genome techniques and thus each is derived from a single template. Proviral data can be queried in the PSD by donor characteristics (e.g., on or off ART and plasma HIV RNA levels), sample type (e.g., PBMC, lymph node, or other tissue or cell type), or sequence characteristics (e.g., intact, infectious, and defective).
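To make the kind of query described above concrete, the following is a minimal sketch of how donor, sample, and sequence records might be related and filtered. It is not the PSD's actual schema: every table name, column name, and the `find_sequences` helper are hypothetical, the rows are placeholder toy data, and Python's built-in `sqlite3` module stands in for MySQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema for illustration only; the real PSD tables are not
# described in the letter. SQLite is used here in place of MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE donor (
    donor_id INTEGER PRIMARY KEY,
    on_art   INTEGER NOT NULL,           -- 1 = on ART, 0 = off ART
    plasma_rna_copies_ml REAL            -- plasma HIV RNA level
);
CREATE TABLE sample (
    sample_id   INTEGER PRIMARY KEY,
    donor_id    INTEGER NOT NULL REFERENCES donor(donor_id),
    sample_type TEXT NOT NULL            -- e.g. 'PBMC', 'lymph node'
);
CREATE TABLE sequence (
    sequence_id    INTEGER PRIMARY KEY,
    sample_id      INTEGER NOT NULL REFERENCES sample(sample_id),
    nucleic_acid   TEXT NOT NULL CHECK (nucleic_acid IN ('DNA', 'RNA')),
    classification TEXT NOT NULL,        -- e.g. 'intact', 'defective'
    nt_sequence    TEXT NOT NULL
);
""")

# Placeholder toy rows (not real donor or sequence data).
conn.executemany("INSERT INTO donor VALUES (?, ?, ?)",
                 [(1, 1, 40.0), (2, 0, 25000.0)])
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)",
                 [(1, 1, "PBMC"), (2, 1, "lymph node"), (3, 2, "PBMC")])
conn.executemany("INSERT INTO sequence VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, "DNA", "intact", "ACGT"),
                  (2, 1, "DNA", "defective", "ACG"),
                  (3, 2, "DNA", "intact", "ACGT"),
                  (4, 3, "RNA", "intact", "ACGT")])


def find_sequences(conn, on_art, sample_type, classification):
    """Return sequence IDs matching donor, sample, and sequence filters."""
    rows = conn.execute(
        """
        SELECT s.sequence_id
        FROM sequence AS s
        JOIN sample AS sa ON sa.sample_id = s.sample_id
        JOIN donor  AS d  ON d.donor_id = sa.donor_id
        WHERE d.on_art = ? AND sa.sample_type = ? AND s.classification = ?
        ORDER BY s.sequence_id
        """,
        (int(on_art), sample_type, classification),
    ).fetchall()
    return [row[0] for row in rows]
```

With this layout, `find_sequences(conn, True, "PBMC", "intact")` selects intact sequences from PBMC samples of donors on ART. Keeping donor, sample, and sequence in separate tables is one reasonable way to let each filter apply at its own level.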
Query results include downloadable sequence FASTA files with extensive annotation of each sequence, including insertions, deletions, frameshifts, premature stop codons in coding regions, and mutations likely to disrupt function in LTR regulatory elements [5]. A built-in tool called the Proviral Sequence Annotation & Intactness Test (Pro-Seq IT) allows users to submit their own sequences, obtain the sequence annotations, and computationally infer proviral intactness. Host integration sites of proviruses are included when available.
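As a rough illustration of what annotating a coding region can involve, the sketch below flags two of the defect types mentioned above (frameshifts and premature stop codons) in a single open reading frame and writes the result into a FASTA header. It is not the Pro-Seq IT algorithm, which the letter does not describe. The function names, the header fields, and the deliberately crude rules (length not divisible by three as a frameshift proxy, any in-frame stop before the final codon as premature) are all assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def annotate_orf(nt_seq):
    """Return a list of defect descriptions for one coding region.

    Simplified, hypothetical checks; real intactness inference would need
    alignment to a reference and handling of every gene and regulatory site.
    """
    seq = nt_seq.upper().replace("-", "")  # drop alignment gap characters
    defects = []
    if not seq.startswith("ATG"):
        defects.append("missing start codon")
    if len(seq) % 3 != 0:
        defects.append("frameshift (length not a multiple of 3)")
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # A stop codon anywhere before the final codon is treated as premature.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            defects.append(f"premature stop codon at codon {index + 1}")
            break
    return defects


def fasta_record(name, nt_seq):
    """Format one sequence as FASTA with its annotation in the header."""
    defects = annotate_orf(nt_seq)
    status = "open" if not defects else "disrupted"
    notes = ";".join(defects) if defects else "none"
    return f">{name} orf_status={status} defects={notes}\n{nt_seq}\n"
```

For example, `annotate_orf("ATGTAAGCCTAA")` reports a premature stop at codon 2, while `annotate_orf("ATGGCCTAA")` returns an empty list. Putting the annotation in the header line keeps each record readable by standard FASTA parsers.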
In summary, the HIV PSD and accompanying tools are available for sequence storage and meta-analyses of HIV RNA and DNA genomes at https://psd.cancer.gov. This resource and the analyses derived from it should contribute to the understanding of HIV persistence on ART and may reveal new curative strategies. The PSD will be regularly updated and extended to include other retroviruses and tools as they become available.
Mary Ann Liebert