Endogenous retroviral signatures predict immunotherapy response in clear cell renal cell carcinoma

Human endogenous retroviruses (hERVs) are remnants of exogenous retroviruses that have integrated into the genome throughout evolution. We developed a computational workflow, hervQuant, which identified more than 3,000 transcriptionally active hERVs within The Cancer Genome Atlas (TCGA) pan-cancer RNA-Seq database. hERV expression was associated with clinical prognosis in several tumor types, most significantly clear cell renal cell carcinoma (ccRCC). We explored two mechanisms by which hERV expression may influence the tumor immune microenvironment in ccRCC: (i) RIG-I–like signaling and (ii) retroviral antigen activation of adaptive immunity. We demonstrated the ability of hERV signatures associated with these immune mechanisms to predict patient survival in ccRCC, independent of clinical staging and molecular subtyping. We identified potential tumor-specific hERV epitopes with evidence of translational activity through the use of a ccRCC ribosome profiling (Ribo-Seq) dataset, validated their ability to bind HLA in vitro, and identified the presence of MHC tetramer–positive T cells against predicted epitopes. hERV sequences identified through this screening approach were significantly more highly expressed in ccRCC tumors responsive to treatment with programmed death receptor 1 (PD-1) inhibition. hervQuant provides insights into the role of hERVs within the tumor immune microenvironment, as well as evidence that hERV expression could serve as a biomarker for patient prognosis and response to immunotherapy.


Introduction
Human endogenous retroviruses (hERVs) are remnants of exogenous retroviruses integrated into the primate genome over evolutionary time (1). hERVs share genomic similarities to other retroviruses, including the presence of functional and remnant 5′ and 3′ long terminal repeats (LTRs), and gag, pro, pol, and env genes. Subsets of recently integrated hERVs still maintain limited translation under physiological and pathological conditions (2)(3)(4)(5)(6), including evidence for modulation of melanoma, lymphomas, leukemias, and ovarian, breast, prostate, urothelial, and renal carcinomas (5,(7)(8)(9)(10)(11)(12)(13)(14). Although studies have identified the role of specific hERVs in the pathogenesis and progression of these cancers, to date there have been a limited number of pan-cancer studies elucidating the landscape and impact of hERV expression. A recent study by Rooney et al. analyzed features associated with genes important for immune cytolytic activity, finding that one of these associated features was expression of a small subset of hERVs (15). While this study provided evidence that hERV expression associated with an immune phenotype, the exploration of hERVs was limited by a small reference set, no reported mechanism of association or prognostic impact of hERV expression, and no confirmation of a hERV-specific immune population within any tumor type. Thus, the role of hERVs in modulating the tumor immune microenvironment remains largely unexplored, predominately due to a lack of tools for identification of full-length, intact hERVs from sequencing data. To fully understand the role of hERVs in antitumor immunity, a more comprehensive database containing greater numbers of individual full-length hERVs is required. Understanding patterns of hERV expression will allow for greater knowledge of the impact of hERVs on tumorimmune interactions, the design of new prognostic models based on hERV signatures, and further identification of tumor-specific hERV epitopes for targeted tumor vaccinations.
Currently, a limited repertoire of tools are available for hERV quantification. There exist several databases of hERV elements, including HERVd, which contains hERV-like elements, and their genomic locations that have been used for analysis of RNA-Seq data (16)(17)(18). Additionally, there are several tools for identification of intra-and intergenic hERV-like elements (19), related transposable elements (20), and interspersed repeats (RepeatMasker) among human transcripts (21). While these resources provide Human endogenous retroviruses (hERVs) are remnants of exogenous retroviruses that have integrated into the genome throughout evolution. We developed a computational workflow, hervQuant, which identified more than 3,000 transcriptionally active hERVs within The Cancer Genome Atlas (TCGA) pan-cancer RNA-Seq database. hERV expression was associated with clinical prognosis in several tumor types, most significantly clear cell renal cell carcinoma (ccRCC). We explored two mechanisms by which hERV expression may influence the tumor immune microenvironment in ccRCC: (i) RIG-I-like signaling and (ii) retroviral antigen activation of adaptive immunity. We demonstrated the ability of hERV signatures associated with these immune mechanisms to predict patient survival in ccRCC, independent of clinical staging and molecular subtyping. We identified potential tumor-specific hERV epitopes with evidence of translational activity through the use of a ccRCC ribosome profiling (Ribo-Seq) dataset, validated their ability to bind HLA in vitro, and identified the presence of MHC tetramer-positive T cells against predicted epitopes. hERV sequences identified through this screening approach were significantly more highly expressed in ccRCC tumors responsive to treatment with programmed death receptor 1 (PD-1) inhibition. hervQuant provides insights into the role of hERVs within the tumor immune microenvironment, as well as evidence that hERV expression could serve as a biomarker for patient prognosis and response to immunotherapy.
Endogenous retroviral signatures predict immunotherapy response in clear cell renal cell carcinoma Euclidean distance of mean hERV expression between each cancer type ( Figure 1B and Supplemental Figure 5). Tumor types with lowest overall hERV expression (LIHC, ACC, UVM) were closely related by unsupervised clustering and shared very low similarity with all other tumor types. Two large clusters comprised 10 (PCPG,  SKCM, CHOL, SARC, THYM, DLBC, PRAD, THCA, KICH, and LGG) and 8 (LUAD, PAAD, BLCA, CESC, MESO, UCEC, UCS, and BRCA) cancer types. While several cancer types demonstrated similar hERV expression patterns based on tissue location (UCEC and UCS, HNSC and LUSC, KIRC and KIRP, and READ and COAD), the clustering observed between various tumor types suggests that hERV expression may be conserved among cancers across a variety of tissues. Notably, two tumor types with immune-privileged tissues of origin (TGCT and UVM) demonstrated lower similarities to all other cancers. Lack of immune interactions within these native tissues may potentially result in unique hERV expression profiles in these tumors, suggesting that shared hERV expression profiles within other tumor types may be shaped by the presence of related tumor immune responses.
We next examined the association between hERV expression and immune features, age, and survival among tumor types. We first performed multivariable linear regression of hERV expression by cancer type with 46 immune gene signatures (IGS) previously described in the literature (28-33) ( Figure 1D and Supplemental Figure 6). A small population of hERVs demonstrated near ubiquitous positive or negative association with all IGS, with the majority of hERVs showing a split association pattern. Included among IGS that demonstrated positive association with the majority of significant hERVs (GLM FDR-corrected P < 0.05) were those associated with immune cells known to have antitumor effector function, including effector and central memory T cells and NK cells. Additionally, a signature of anti-PD-1 (aPD1) responsiveness (IPRES_aPD1_responder) was positively associated with hERV expression in 79.2% (1,472 of 1,858) of significantly associated hERVs, while a signature for nonresponder tumor biopsies (IPRES_aPD1_nonresponder) was negatively associated with all hERV expression in 83.0% (1,679 of 2,024) of significantly associated hERVs (34). We next examined the association between hERV expression and age, controlling for tumor type, and observed that the majority of significantly associated hERVs demonstrated negative association between expression and patient age (GLM FDR-corrected P < 0.05; 150 with coefficient <0; 13 with coefficient >0; Supplemental Figure 7). To elucidate whether hERV expression associated with clinical outcome, we performed Cox's proportional hazard regression (CoxPH) for hERV expression across all cancer types. Association of survival with mean hERV expression identified 3 tumor types with prognostic mean hERV expression (KICH, COAD, and KIRC). In all 3 tumor types, mean methods to quantify expression of hERV-like elements among transcripts, they do not provide quantification based on an intact, full-length hERV proviral reference. This capability to distinguish and quantify individual hERVs provides a useful tool to class ify hERVs into distinct groups based on biological associations in various cancers.
Recently, Vargiu et al. compiled a database of 3,173 intact, fulllength hERV sequences and developed a comprehensive method for classifying these sequences into 11 superfamilies (Supplemental Table 1; supplemental material available online with this article; https://doi.org/10.1172/JCI121476DS1) (3). Using this database as a reference, we designed a computational workflow for identifying the expression of specific hERVs from RNA sequencing (RNA-Seq), hervQuant, and quantified hERV expression within the Cancer Genome Atlas (TCGA) pan-cancer dataset. We assessed interactions of specific hERVs with immune and clinical features. Among all cancer types encompassed within the pan-cancer dataset, clear cell renal cell carcinoma (ccRCC, designated by TCGA as KIRC) contained the greatest number of prognostic hERVs. Thus, we explored two mechanisms by which hERV expression may influence the tumor immune microenvironment in ccRCC: (i) activation of RIG-I-like pathway signaling and (ii) hERV epitope-triggered T and B cell activation. Using biological classes of hERV signatures derived from these two mechanisms, we further demonstrated the ability of hERV expression to predict patient survival in a multivariate regression model, independent of traditional clinical staging and molecular subtyping. Last, we used a publicly available ccRCC ribosome profiling (Ribo-Seq) dataset (22) to screen for translation of tumor-specific hERV epitopes, validated their capacity to bind HLA in vitro, and demonstrated the presence of tetramer-positive epitope-specific T cells within ccRCC tumors. We found tumorspecific hERV expression to be associated with clinical response to PD-1 axis inhibition in ccRCC patients, suggesting that hERV expression may provide a biomarker for immunotherapy responsiveness and hERV viral proteins may provide targetable, tumorspecific epitopes. The information gained from hERV expression profiling gives new insight into the role of hERVs within tumorimmune microenvironment interactions and provides evidence for hERV expression-based molecular models for patient prognosis and responsiveness to immunotherapy.

Results
Expression and association of hERVs in TCGA pan-cancer. TCGA pan-cancer hERV expression was determined using hervQuant, described in detail in the Supplemental Notes ( Figure 1A and Supplemental Figures 1 and 2). For consistency, only samples sequenced by Illumina NextSeq at 2 × 50 bp were analyzed, resulting in complete removal of ESCA, GBM, OV, and STAD and partial removal of COAD, UCEC, and READ subtypes (see Supplemental Table 2 for tumor abbreviations). All 3,173 reference hERVs were expressed in at least one sample, encompassing all 11 superfamilies and 3 lineages (Supplemental Table 1). Relative hERV expression patterns were strikingly homogenous across all cancer types ( Figure 1B and Supplemental Figure 3). Among all cancer types, TGCT demonstrated the greatest mean and median hERV expression, while LIHC, ACC, and UVM ranked last (Supplemental Figure 4). To identify similar hERV expression patterns across models, we calculated the negative association to key agonistic genes in NF-κB signaling (e.g. TBK1, TANK, and AZI2). CoxPH of hERV expression within TCGA KIRC provided further evidence that these groups are biologically distinct, with the majority of group 1 and 2 hERVs providing association with longer and shorter overall survival, respectively (Figure 2C). In addition, group 2 and non-prognostic group 1 hERVs (CoxPH Bonferroni-corrected P > 0.05) demonstrated a significant positive association with the majority of IGS (93%, 57%, and 60%, respectively), while prognostic group 1 hERVs (Bonferronicorrected P ≤ 0.05; majority associated with longer overall survival) largely demonstrated a negative association with IGS (33%), including those for T cells, B cells, dendritic cells, macrophages, and NK cells ( Figure 2D and Supplemental Figure 11). Despite these negative association patterns with IGS observed in prognostic group 1 hERVs, TCGA KIRC samples with greater expression of these hERVs had decreased ratios of Treg to CD8 + IGS (Treg IGS divided by the mean of 3 CD8 + IGS) compared with any other hERV group, suggesting the immune infiltrate associated with prognostic group 1 hERVs was less immunosuppressive than that of non-prognostic group 1 and group 2 hERVs (Supplemental Figure 12). Additionally, prognostic group 1 hERVs demonstrated positive association with signatures for Th17 T cells, which have been associated with a more favorable prognosis in ccRCC (39). Overall, this analysis provided the first evidence to our knowledge for biologically distinct hERV groups that differentially interact with innate immune sensing, with differential downstream prognostic and immunological effects and prognostic associations.
hERV expression in ccRCC demonstrates evidence of B cell activation. In addition to innate immune sensor signaling, hERVs can trigger antitumor immunity through tumor-specific expression of viral epitopes. In cancer patients, high antibody titers have been known to develop against hERV proteins with specificity of expression within the tumor, with little else known regarding the role of this B cell response (40). To determine whether hERVs show evidence of an adaptive immune response in ccRCC, we identified T/B cell clonotype repertoires in TCGA KIRC using MiXCR and filtered on T/B cell receptors (TCRs/BCRs; defined as shared CDR3 amino acid sequence) observed in ≥10% of patients (41). These filtering criteria resulted in no shared TCR clonotypes, suggesting potentially low sensitivity of detection for MiXCR-derived TCR data in RNA-Seq data. In contrast, 437 shared BCRs were identified, of which 397 were significantly associated with expression of ≥1 hERV ( Figure 3A, left). Within this pool, 4 clones had significant positive association with the expression of 1,207 hERVs, suggesting a potential hERV epitope-driven B cell response (Figure 3A, right, and Supplemental Table 3). Differential superfamily distribution patterns were observed between BCR-associated and non-BCR-associated hERVs, suggesting certain superfamilies may have a greater propensity for triggering B cell activation (HERVERI, HML, HSERVIII, and HERVW9; FDR-corrected χ 2 test P ≤ 0.05; Supplemental Figure 13). Furthermore, multiple sequence alignment (Clustal Omega) of proviral sequences from these BCR-associated hERVs identified large regions of high sequence identity (Supplemental Figure 14). Filtering on sequence identity of ≥25% of all BCR-associated hERVs with a sequence length ≥21 base pairs (the approximate minimal length necessary for immunoglobulin CDR3 region specificity) (42), we observed 8  Table 2 for number of samples per TCGA cancer cohort.
hERV expression was negatively prognostic (Supplemental Figure 8). Additionally, we examined Kaplan-Meier survival curves for each TCGA cancer type split by upper versus lower 50th percentile mean hERV expression, and observed 5 cancer types with significant separation of survival curves (Supplemental Figure  9; BLCA, COAD, KICH, KIRC, and PCPG; log-ranked P < 0.05). Among these 5 cancer types, KIRC was the most associated with survival. All cancer types except BLCA demonstrated shorter survival in patients with greater mean hERV expression. To perform a more detailed analysis, we associated survival with expression of each individual hERV ( Figure 1E and Supplemental Figure 10). TCGA KIRC (ccRCC), a tumor type in which several hERVs have been shown to be actively translated (14,(35)(36)(37), constituted 25.1% of all significantly prognostic hERVs, with over 1.5× more significant hERVs than the next highest cancer, LGG (KIRC: 362; LGG: 230; Figure 1E). To elucidate the immune mechanisms behind this enrichment of prognostic hERVs in ccRCC, we focused on this cancer type for the remainder of our analyses.
hERV expression in ccRCC demonstrates evidence of immune stimulation through RIG-I-like signaling. Several groups have demonstrated that activation of select endogenous retroviral elements can trigger signaling through innate immune sensors, including double-stranded RNAs (dsRNA) that subsequently signal through cytosolic RIG-I-like receptors (26,27). To elucidate a more comprehensive role for hERVs in the RIG-I-like pathway in ccRCC, we studied the association between hERV expression and genes in the RIG-I-like receptor signature (Molecular Signatures Database) (38), observing marked separation of genes into 2 groups by hierarchical clustering (Figure 2A). We defined 2 hERV groups (1 and 2; Supplemental Table 3) based on the ratio between each hERV's mean linear regression coefficients within each gene cluster (>1 or < 1) and validated their definitions using principal component analysis ( Figure 2B). While both groups demonstrated significant positive association between hERV expression and genes that activate the RIG-I-like pathway, group 2 hERVs demonstrated a significant positive association with several key antagonist genes downstream of NF-κB signaling (most notably NFKBIB), along with a significant ing of disease-free interval (DFI) (44). Of these metrics, DSS and PFI trended similarly to curves observed with overall survival, providing further evidence that these hERV signatures are specifically associated with disease burden (Supplemental Figure 15). We performed multivariable CoxPH modeling with clinical stage and with or without molecular subtype (M1-M4) and hERV signatures as predictors for patient outcome in TCGA KIRC. Comparing a full model against an all-but-one-feature model, all 3 signatures provided significant prognostic value in addition to stage and molecular subtype, with the RIG-I-like down signature contributing nearly as much prognostic power as traditional staging and each of the 3 signatures providing greater prognostic power than molecular subtyping ( Figure 4D and Supplemental Table 4). To establish whether these hERV signatures were prognostic in other tumors, we performed univariable CoxPH for each signature within all TCGA cancer types ( Figure 4E). Among these 3 signatures, BCR-associated hERVs were additionally prognostic in COAD and LGG, while RIG-I-like down hERVs were additionally prognostic in BLCA, COAD, KIRP, LGG, and LIHC, suggesting these additional cancer types may have hERV-immune microenvironment interactions similar to those in ccRCC. Included among these cancer types were KIRP and COAD, both of which were closely related to KIRC by hierarchical clustering of hERV expression patterns (Supplemental Figure 5), and LGG, which contained the second greatest number of prognostic hERVs after KIRC ( Figure 1E).
hERVs demonstrate evidence of tumor-specific presentation of targetable viral epitopes. Previous studies have identified select tumor-specific hERV epitopes in ccRCC that trigger in vitro antitumor responses with limited in vivo efficacy (35)(36)(37). Studies regarding neoantigens have suggested that a large number of potential epitopes are required for screening in order to identify a few clinically relevant peptides with significant in vivo antitumor efficacy (45)(46)(47)(48). We examined hERV expression patterns between tumors and matched normal tissue within TCGA KIRC and observed that normal samples clustered together (Supplemental Figure 16). The majority of hERVs were heavily upregulated in tumor compared with matched normal samples, leading us to hypothesize that there may be many more differentially expressed and targetable hERVs within tumor than previously described. In an attempt to expand the potentially targetable hERV epitope pool in ccRCC, we first ranked hERVs based on fold change in expression between tumor and matched normal samples (Supplemental Figure 17) (49). Notably, CT-RCC hERV-E (HERVERI/gammaretrovirus-like, desig nated as hERV 2256 in the reference database, also known as ERVE-4), one of the few hERVs demonstrated to be capable of eliciting a vaccine-inducible CD8 + T cell response, ranked second highest in tumor versus normal fold change in expression (35)(36)(37). This same hERV was previously described by  and was found to be significantly upregulated in ccRCC and associated with a signature of cytotoxicity (15). To ensure that our analyses were consistent with these previously published findings, we performed linear regression between CT-RCC hERV-E and IGS expression including the Rooney signature for cytotoxicity (CYT), and observed a significant association between expression of this hERV and the majority of IGS in our set, including CYT (Supplemental Figure 18). regions of conserved DNA similarity ( Figure 3B). NIH Retrovirus Protein BLAST (https://www.ncbi.nlm.nih.gov/genome/viruses/ retroviruses/) of these sequences showed similarity to known hERV env genes in 8 of 8 sequences, with additional similarity to other retroviral genes in 2 of 8 sequences. While suggestive of potentially targetable antigens within the hERV env region, CoxPH demonstrated significantly higher hazard ratios among BCRassociated compared with non-BCR-associated hERVs (Welch's t test P = 2.4 × 10 -3 ; Figure 3C). Differential expression analysis (DESeq2) of BCR-associated hERVs demonstrated a balanced proportion of hERVs with both higher tumor-to-matched normal and matched normal-to-tumor expression (tumor: n = 542; matched: n = 72; Figure 3D), suggesting an overall lack of tumor specificity among BCR-associated hERVs.
hERV signatures of innate and adaptive immune activation provides prognostic value in ccRCC. Currently, clinical stage is the most robust prognostic variable for ccRCC. While molecular features such as M1-M4 molecular subtyping have been shown to be potentially prognostic, no molecular markers have been widely adapted for clinical decision making in ccRCC, making identification of a robust molecular marker for prognosis an appealing goal (43). Throughout this study, we identified pools of hERVs with evidence of both RIG-I-like-mediated innate immune activation and inhibition, as well as B cell-mediated adaptive immunity ( Figure 4, A and B). To provide evidence that these classes can be used to generate a model of clinical outcome in ccRCC, we derived signatures corresponding to the mean expression of prognostic hERVs (CoxPH Bonferroni-corrected P ≤ 0.05) within each class. According to log-rank test, Kaplan-Meier overall survival curves for patients within the upper versus lower 50th percentiles for each of the 3 signatures were significantly different (RIG-Ilike upregulated [up]: P = 4.5 × 10 -10 ; RIG-I-like downregulated [down]: P = 6.3 × 10 -14 ; BCR-associated: P = 1.1 × 10 -5 ; Figure 4C). Patients with both higher expression of RIG-I-like down and BCRassociated signatures had significantly shorter overall survival, while those with higher expression of the RIG-I-like up signature had longer overall survival. Recent analyses also provided metrics for disease-specific survival (DSS) and progression-free interval (PFI) in TCGA KIRC, additionally with an underpowered report-provided evidence for translation of CT-RCC hERV-E in several human lymphoblastic cell lines but minimal translation in all other sets, including normal human tissues, suggesting that CT-RCC hERV-E had the capacity for translation within tumor-like tissues (Supplemental Figure 20). hERV 4700 (HERVERI/gammaretrovirus-like), which demonstrated the highest tumor versus normal expression by RNA-Seq, was identified as the most differentially expressed hERV with greatest evidence of translation. Additionally, hERV 4700 was expressed at low levels in matched normal tissues from all other tumor subtypes (Supplemental Figure 21) and demonstrated additional evidence of translation among GWIPS tumor cell line samples (Supplemental Figure 22). Although Ribo-Seq coverage of hERV 4700 within ccRCC samples was relatively low, coverage patterns were similar to those observed by RNA-Seq ( Figure 5B). Areas of coverage within the hERV 4700 proviral reference corresponded to viral gag (red), pol (blue), and env (green) genes. Protein-BLAST of these regions translated across each reading frame provided high sequence similarity with known reference hERV sequences across all 3 frames of pol and env, and frame 2 of gag ( Figure 5C and Supplemental Figure 23). Using the longest sequence identified within each protein reading frame, we performed NetMHCPan4.0 epitope prediction, identifying 30 predicted HLA-A*02:01 binders (binding affinity ≤500 nM; Supplemental Table 5) (52). To ensure these predicted epitopes were hERV specific, we searched for overlap between amino acid sequences of each peptide with known human proteins in the GENCODE hg19 protein-coding transcript translated sequences, observing no overlap between epitopes and non-hERV proteins. Using an HLA-A*02:01 monomer UV exchange assay and HLA ELISA readout (53-58), we validated the binding of 30 of 30 predicted epitopes to HLA-A*02:01 with exchange efficiencies ranging from 16.1% to 73.1% ( Figure 5D).
hERV epitopes associate with aPD1 response with evidence of epitope-specific T cells in ccRCC. To explore whether hERV 4700 expression is predictive for patient response to aPD1 therapy, we performed quantitative real-time PCR (RT-qPCR) quantification of hERV 4700 with 2 of each gag-, pol-, and env-specific primer/ probe sets on ccRCC tumor biopsy RNA in aPD1-treated patients (responders: n = 7, nonresponders: n = 6; Figure 5E and Supplemental Tables 6-8). We observed greater mean RT-qPCR signal in aPD1 responders in all primer/probe sets (Mann-Whitney U test P < 0.05; Supplemental Table 9), as well as hervQuant-derived hERV 4700 expression from the same set with added samples (responders: n = 10, nonresponders: n = 10; Mann-Whitney U test P = 0.0455), suggesting that transcription of hERV 4700 is associated with greater responsiveness to immunotherapy. Additionally, multivariable linear regression (GLM) provided perfect fit of primer/probe sets as a predictor for response. To demonstrate the presence of an anti-hERV 4700 T cell immune response in ccRCC, we performed tetramer staining of an HLA-A*02:01 ccRCC tumor sample using the 30 MHC tetramers described above (Figure 6, A and B). Using a stepwise approach, we first screened the tumor using 5 pools of 6 tetramers, which demonstrated that pool 4 had the largest tetramer-positive CD8 + T cell population (11.3% tetramer-positive). Running the 6 individual tetramers, we observed tetramers 2 and 3 to have the greatest staining, which corresponded to peptides derived from frame 2 of the gag (10.9% positive) and pol (13.5%) protein regions, respec- Similar to the pattern observed in CT-RCC hERV-E, hERVs that were overexpressed within tumors were ubiquitously positively associated with IGS, while those that demonstrated overexpression within matched normal tissue demonstrated a mixed association pattern (FDR-corrected P ≤ 0.05; Figure 5A), suggesting that preferential hERV expression in the tumor may facilitate immune activation. Interestingly, none of the top 10 hERVs by tumor versus normal expression were significantly associated with TCR/BCR clonotype expression or with survival. Given that (i) these hERVs were significantly associated with immune activation and (ii) there is evidence of functional epitopes and public hERV-specific T cells in at least one of these hERVs (CT-RCC hERV-E), the inability to computationally detect TCRs/BCRs significantly associated with these hERVs suggests we lacked the sensitivity necessary to identify these hERV-specific TCR/BCR clones. This lack of detectable public adaptive immune response is also characteristic of neoantigens, which despite failing to show association with TCR/BCR expression and survival in the absence of immunotherapy in ccRCC, have been recently demonstrated to provide vaccine-induced efficacy in melanoma (46,50).
Tumor-specific transcription is necessary for epitope generation but is not sufficient without downstream translation. Since the majority of hERVs are translationally inactive, we ran hervQuant on a publicly available Ribo-Seq dataset comprising several regions from 2 ccRCC and matched normal kidney nephrectomy samples (4 regions per tumor; 2 regions per matched normal) (22). To filter for hERVs with the strongest evidence of differential expression by both Ribo-Seq and RNA-Seq, we ranked hERVs by the sum of RNA-Seq and Ribo-Seq fold change in expression in tumor versus normal samples (Supplemental Figure 19). Despite evidence of translation in the literature, CT-RCC hERV-E did not demonstrate coverage by Ribo-Seq in this ccRCC dataset, suggesting the relative insensitivity of Ribo-Seq-compared with RNA-Seqbased hERV identification. However, analysis of the GWIPS database (51) containing aggregate data from >30 Ribo-Seq datasets from different publications (CD8_T_Cell, CD8_Cluster, CD8) (30,32,33), with CD8_T_Cell showing an association pattern different from the other 2 signatures. The CD8_T_Cell signature contained a set of 8 genes that accounted for its variation from the other 2 signatures -HAUS3 (cytokinesis and mitosis), SF1 (pre-mRNA splicing), SFRS7 (pre-mRNA splicing), ZNF91 (protein coding), ZNF609 (protein coding), THUMPD1 (gene expression/rRNA processing), MYST3 (histone acetyltransferase), and CDKN2A (cell cycle regulator) -all of which are nonspecific to CD8 + T cells in function (Supplemental Figure 25). Nevertheless, we included the CD8_T_Cell signature within all analyses (including Tregto-CD8 + ratio) because it remains a commonly used signature for CD8 + T cells within the literature.
In contrast to IGS, CoxPH analysis with UQN hERV data contained a greater number of positively prognostic hERVs compared with read-normalized data, suggesting that the proportional expression of hERVs may also influence overall survival. We additionally observed that the majority of hERVs were associated with younger patient age. Since most tumor types show an association between older age and worse outcome, and the majority of significantly prognostic hERVs were associated with worse outcome, these results suggest that the association between hERVs and patient outcome was not simply due to an association with age.
Due to the diverse tumor-immune interactions observed among different cancer types, we narrowed down further the role of hERVs upon the tumor immune microenvironment to one cancer type. We focused on ccRCC to further study the role of hERVs in shaping the tumor immune microenvironment because (i) it contained the greatest number of prognostic hERVs and (ii) hERV proteins are known to be expressed and immunogenic in ccRCC (14,(35)(36)(37).
Within ccRCC, we considered the potential for hERVs to impact both arms of the immune system. The role of hERVs in triggering an innate immune response is underscored by several recent reports noting that epigenetic-modifying agents that promote greater DNA demethylation -decitabine (methyltransferase inhibitor) and abemaciclib (CDK4/6 inhibitor) (26, 27) -increased expression of retroviral elements and triggered subsequent antitumor responses through innate sensor signaling, including induction of RIG-I-like pathway detection of viral dsRNAs. While these previous reports demonstrated only the proinflammatory nature of selected hERV elements, we were surprised to find two strikingly distinct patterns of association between hERV expression in ccRCC and expression of genes associated with the RIG-I-like family. The implication of this clustering pattern (along with the significantly different patterns of association between these hERV groups with survival and IGS expression) is that hERVs may play both agonistic and antagonistic roles in innate sensor immunity. Potentially, group 2 hERVs (RIG-I-like down) may interfere with RIG-I-like signaling through a currently unknown mechanism, ultimately skewing the tumor immune microenvironment in favor of an immunosuppressive phenotype with greater Treg-to-CD8 + T cell ratios and negatively impacting patient prognosis.
Next, we studied the role of hERVs in triggering an adaptive immune response through hERV-mediated immune activation of retroviral epitope-driven T and B cell responses. MiXCR analysis of TCGA KIRC failed to identify TCR clones that were shared across at least 10% of samples, suggesting that while hERV tively. We validated the presence of these T cell populations in 3 additional ccRCC tumors (gag: 10.9%-24.8%; pol: 13.5%-22.3%), as well as observing staining within the range of negative control tetramers in 4 healthy donor peripheral blood mononuclear cells (PBMC) samples (gag: 0.12%-1.51%, pol: 0.13%-0.76%; Figure 6C and Supplemental Figure 24). Overall, these data validate our epitope prediction method and provide evidence for the presence of hERV 4700-specific T cells within ccRCC.

Discussion
We report here a hierarchical analysis of hERV-immune microenvironment interactions within the TCGA pan-cancer dataset, integrated with Ribo-Seq data, RNA-Seq data from immunotherapytreated patients, and functional biological assays, to provide insight into hERV immunobiology in cancer. Our broad survey of hERV expression and association patterns provided multiple lines of evidence that hERVs shape the tumor immune microenvironment in several cancer types. Conditioning on cancer type, we observed that gene signatures of immune responsiveness (aPD1-responsive signature, effector immune cells) were positively associated with hERV expression, suggesting that hERVs may either directly interact with antitumor immunity through immune activation or provide a biomarker for an active antitumor immune response. In agreement with this view, we observed that hERVs were significantly prognostic in multiple cancer types, with the greatest enrichment of prognostic hERVs observed in ccRCC. Interestingly, BLCA was the only cancer type in which greater average hERV expression resulted in significantly longer survival times. This finding suggests potentially different hERV-mediated tumor immunobiology in BLCA and should be further explored in future studies. For IGS and CoxPH analyses, hERV expression data were normalized either (i) to total RNA-Seq read count (reads per million; RPM) to determine the impact of absolute hERV expression or (ii) to upper quartile normalization (UQN) of hERV reads within each sample to determine the impact of relative hERV proportions (Supplemental Tables 10 and 11). IGS patterns of association were strongly conserved between hERV expression by UQN and read normalization. We observed variability in hERV association patterns with 3 CD8 + T cell signatures derived both be potentially targeted for immune activation through the use of nonspecific (e.g., checkpoint blockade therapy, innate immune agonists) or epitope-specific (vaccination, adoptive T cell therapy) immunotherapies. Further time-course immune profiling studies should be performed to study the mechanisms of hERV-mediated immune surveillance in a developing tumor.
With evidence of hERV-mediated activation of both innate and adaptive immune responses, we sought to examine whether these responses could be used to develop a model for patient prognosis in ccRCC. Apart from molecular subtyping, no molecular markers have improved the prognostic capabilities of current clinical predictive systems in ccRCC, suggesting the potential for development of hERV-based signatures as a biomarker for survival. In attempt to identify such a prognostic biomarker, we created hERV signatures derived from our previous analysis of hERV interactions with the innate and adaptive immune response. Based on these signatures, we developed a model that provided significantly greater prognostic power than M1-M4 molecular subtyping and levels of prognostic information similar to those of traditional clinical staging. Additionally, while these hERV signatures were derived and optimized for ccRCC, we showed 2 signatures to provide prognosis in several other tumor models related to ccRCC by hERV expression patterns, level of prognostic hERVs, and tissue of origin, implying that additional hERV signatures for patient prognosis can be independently developed for other cancer types.
Last, we sought to develop a screening method for detection of hERVs actively undergoing translation. The implication of such a tool is the potential for development of immune response biomarkers and antitumor T cell vaccine therapies, similar to those developed in neoantigen-based vaccine studies. Our analysis of tumor-specific hERVs in ccRCC identified CT-RCC hERV-E as the second highest differentially expressed hERV by RNA-Seq expression. This particular hERV has been well described in the literature as a ccRCC tumor-specific provirus with evidence of hERVspecific T cell responses (35)(36)(37). Within our Ribo-Seq analysis, we were underpowered to detect evidence of CT-RCC hERV-E translation among 2 ccRCC samples. However, our analysis of the GWIPS database provided evidence for the translation of CT-RCC hERV-E in human tumor cells but not in normal blood, fibroblasts, or muscle tissue. This conforms to the view that CT-RCC hERV-E has the capacity for translation under tumor-specific conditions and suggests that deeper Ribo-Seq coverage in ccRCC may be needed to increase the sensitivity of our computational screening to broaden the set of potentially targetable hERV epitopes. Our analysis of CT-RCC hERV-E RNA-Seq expression in TCGA KIRC data supports the previous report by Rooney et al. identifying this hERV as being upregulated in ccRCC and associated with a gene expression index of cytotoxicity (15). We observed the same significant association with their cytotoxicity signature and additionally identified a large proportion of other IGS strongly associated with its expression. Among these, the most significantly associated was the Treg signature, suggesting that expression of CT-RCC hERV-E may be also associated with immunosuppression. This strong association with immunosuppressive signatures suggests CT-RCC hERV-E may be another potential marker of response for immunotherapies such as aPD1 checkpoint blockade therapy. epitopes have the capacity to trigger a T cell-driven antitumor response (35)(36)(37), we lacked the sensitivity to computationally identify public hERV-specific TCR clones. In agreement with this, comparison of MiXCR-derived TCR expression with previously described TCRs derived from amplicon-based adaptive TCR repertoire profiling in 3 TCGA KIRC samples demonstrated low total TCR counts of MiXCR data with low frequencies of overlapping clones (Supplemental Figure 26). In contrast, we observed a large pool of shared BCRs. It is important to note that BCR repertoires are likely more completely sampled from RNA-Seq data than are TCR repertoires, as we observed increased BCR sequence reads, consistent with the greater transcription of immunoglobulin mRNA from cells of the B cell lineage compared with TCR mRNA transcription from activated T cells. Thus, our study had greater power to detect BCR than TCR repertoire associations. Multiple sequence alignment of BCR-associated hERVs demonstrated clustering of proviral sequences by superfamily, suggesting that a B cell response generated against shared hERV epitopes is likely to occur within one or several closely related superfamilies. The higher hazard ratios among BCR-associated hERVs may be related to the lack of tumor specificity for these hERVs. The majority of IGS in ccRCC, including those for B cells, have been shown to be associated with worse prognosis (59). While the mechanism for this finding is currently undetermined, a potential contributor to this pattern may be a B cell response in which hERVs are generated in the tumor with epitopes shared by hERVS upregulated within the surrounding normal tissues. Further investigation should be performed to study the importance of this potential anti-hERV B cell response in ccRCC.
Evidence for hERV-mediated activation of the innate and adaptive immune responses suggests that expression of these proviruses within tumors may contribute to immune editing of tumor cell populations. Highly immunogenic hERVs with the capacity to be recognized by endogenous T and B cell responses are likely cleared by the immune system or otherwise expressed under a heavily immunosuppressed microenvironment. There may also exist additional hERV epitopes that generate immune responses too weak to promote antitumor immunity. These two groups can the BCR-associated signature and 34 with all prognostic hERVs, suggesting relatively low overlap between the set of predictive and prognostic hERVs. Overall, hervQuant is the first described method to our knowledge for comprehensive identification of potentially targetable hERV epitopes. Further validation should be performed to confirm the capacity of these potential hERV epitopes as therapeutic vaccine targets and to develop a robust hERV-based biomarker for immunotherapy response in ccRCC.
In summary, we describe a computational workflow, hervQuant, for robust quantification of individual hERVs using RNA-Seq data. The data gained through hervQuant provide insights into the pan-cancer landscape of hERV expression and immune modulation. Within ccRCC, we found a distinct group of hERVs that were inversely associated with RIG-I-like signaling genes, prognosis, and IGS expression. Additionally, we examined the interaction between hERV expression in ccRCC and activation of B cell clonotypes, and demonstrated the capacity of the above-mentioned hERV classes to provide a multivariable model of patient prognosis that significantly outperforms traditional clinical staging and molecular subtype prognosis models in ccRCC. We provide evidence for a new method of hERV epitope prediction based on differential hERV expression in the tumor, Ribo-Seq screening for translation, computational epitope prediction, in vitro validation for HLA binding, and in vivo detection of epitope-specific T cells in a ccRCC tumor. Importantly, we observed that hERV sequences identified through this approach were significantly associated with aPD1 responsiveness in ccRCC tumors, supporting continued research into hERVs as biomarkers and therapeutic targets for immunotherapy. With the recent increasing interest in the role of hERVs in modulating the tumor immune microenvironment, we believe the work presented here substantially expands our understanding of hERV biology and opens the way for future development of technologies to exploit hERV biology for new therapeutic tools.

Methods
Alignment and quantification of hERV expression from RNA-Seq data. hERV genomic coordinates were derived from a previously a published study by Vargiu et al. (3). Full-length hERV sequences were masked for low complexity reads (9 or more repeating single nt; 7 or more repeating double nt; 4 or more repeating nt patterns of 3; 3 or more repeating nt patterns of 4; 2 or more repeating patterns of 5; 2 or more repeating nt patterns of 5) and compiled alongside human hg19 transcriptome reads into a reference file for downstream alignment. RNA-Seq FASTQ files were aligned to the hERV reference using STAR v2.5.3 (multimaps ≤10, mismatch ≤7) (60). BAM output files were filtered for reads that mapped to hERV reference using SAMtools (v1.4) (61), then quantified using Salmon v0.8.2 (Quant mode, -1 ISF) (62). Raw expression matrices were either normalized to hERV counts per million total FASTQ reads and log 2 transformed, or normalized to the upper quartile hERV expression value among non-zero values within each sample and log 2 transformed (Supplemental Tables 12-14). Only TCGA pan-cancer samples sequenced with Illumina HiSeq 2 × 50 bp were analyzed. See the supplemental material for optimization details and input parameters.
RNA-Seq expression, IGS analysis, and survival analysis. MapSplicealigned, RSEM-quantified RNA-Seq expression matrices and survival data were downloaded from FireBrowse (http://firebrowse. RNA-Seq analysis of hERV 4700 demonstrated preferential expression within ccRCC, with modest expression in normal kidney and liver. This preferential expression underscores the potential for hERV 4700-targeted immunotherapies, with the caveat that a particularly robust anti-hERV 4700 immune response could potentially result in on-target/on-tissue and on-target/off-tissue toxicity. We provided additionally validation for the transcription of this hERV through RT-qPCR and hervQuant analysis of an aPD1-treated ccRCC dataset, and showed that expression of hERV 4700 is associated with responsiveness to immunotherapy. Ribo-Seq screening provided evidence for translation of hERV 4700, supporting translation of epitopes that we further validated to bind MHC. Additionally, tetramer staining of predicted hERV 4700 epitopes in 4 ccRCC tumors demonstrated the presence of infiltrating T cells with receptors specific for gag-and pol-derived epitopes, supporting the idea that (i) hERV 4700 may act as a direct target in ccRCC, whereby aPD1 could trigger an antitumor response against hERV 4700-derived epitopes, and (ii) hERV 4700 expression may be a new biomarker of aPD1 responsiveness in ccRCC. These same T cell populations were scarce to absent in healthy donor PBMCs, confirming the specificity of these T cells in ccRCC tumors. Tetramer-specific T cell frequencies were particularly high among ccRCC tumors (NSWQEMPV, 10.9%-24.8%; MVFPW-PRPV, 13.5%-22.3%), suggesting that as much as 40% of tumorinfiltrating CD8 + T cells may be specific for these 2 hERV 4700 epitopes. We recognize that these frequencies are particularly high for a tumor-infiltrating population, and several caveats exist for our analyses. First is the potential for T cell cross-reactivity against these tetramers, as well as peptide impurities that recognize other infiltrating T cell populations. Additionally, tetramer-positive populations contained a large range of fluorescence intensities, suggesting these T cells do not necessarily comprise a single clone but likely several different clones with different TCR affinities. Future studies to characterize the TCR sequences and phenotypic characteristics of these tetramer-positive populations should be performed to further elucidate the role of these populations and determine the basis for these and other potential caveats.
A minimum of 1,000,000 events were collected for each sample on a BD LSRFortessa flow cytometer. FlowJo flow cytometry software version 10 was used for analyses of all flow cytometric data. Tumors were derived from viably frozen nephrectomy samples from UNC Chapel Hill and Vanderbilt University hospital patients with clear cell histology. Healthy donor PBMCs were screened by and purchased from Gulf Coast Regional Blood Center, Houston, Texas, USA.
Data availability. TCGA analyses were performed on data collected and generated by the TCGA Research Network -expression matrices can be accessed at http://firebrowse.org/; TCGA raw data can be accessed in the database of Genotypes and Phenotypes (dbGaP, accession phs000178). Ribo-Seq analysis was performed on data collected by Loayza-Puch et al. and can be accessed in the NCBI's Gene Expression Omnibus database (GEO GSE59821) (22). hervQuant expression matrices for TCGA pan-cancer (UQN and RPM) and aPD1-treated ccRCC (raw reads) RNA-Seq datasets are available in Supplemental Tables 12-14. The GWIPS ribosomal profiling database is available at https://gwips.ucc.ie/. The hervQuant workflow reference and instructions are available for download at https://unclineberger.org/vincent/resources.
Statistics. GLM using the R "glm" package was used for all univariable regression, unless otherwise stated. Univariable and multivariable CoxPH was performed with the R "survival" package. Multiple sequence alignment was performed with Clustal Omega through the R "msa" package (64). Differential hERV expression was calculated using the DESeq2 R package (49). For all CoxPH analyses, P value correction was performed using Bonferroni's correction to maintain a conservative cutoff of significance. For all other analyses, 5% FDR multiple testing correction for P values was performed unless otherwise stated. Welch's t test was performed for statistical calculation in Figure 3C. Log rank test was performed for statistical calculation in Figure 4C, with no multiple testing correction. Multivariable CoxPH and χ 2 test were performed for statistical calculation in Figure 4D, with no multiple testing correction. Mann-Whitney U test was performed for statistical calculation in Figure 5E, with no multiple testing correction. P < 0.05 was considered significant for all statistical tests performed.
Study approval and sample acquisition. The present studies in humans were reviewed and approved by the Vanderbilt University Human Research Protections Program, and the University of North Carolina at Chapel Hill IRB and the Office of Human Research Ethics (CB 7097). Subjects provided written informed consent prior to their participation in the study. Biopsy samples were collected according to a protocol approved by the Vanderbilt University IRB (no. 160979), and the UNC IRB approved the biorepository protocol (LCCC 1212). Patients were identified through an IRB-approved protocol and identified using a pharmacy-based list. Line org/). Expression matrices were merged between all cancer types, upper quartile normalized within each sample, and log 2 transformed. IGS were derived from previously described signatures (28)(29)(30)(31)(32), with expression calculated as the mean expression of each gene within the signature. TCGA LAML samples were omitted from analysis in order to prevent skewing of IGS patterns.
TCR/BCR alignment. MiXCR (v2.1.1) was used for identification of TCR and BCR sequences with TCGA KIRC (41). Following suggested run methods provided by MiXCR's documentation for RNA-Seq data (https://mixcr.readthedocs.io/en/latest/rnaseq.html), paired-end FASTQ files were run through alignment in RNA-Seq mode, 2 rounds of contig assembly, extension of incomplete CDR3s, assembly, and export. Data were subsequently converted into an expression matrix, dropping all clones (defined as conserved amino acid CDR3 sequence) with expression in fewer than 10% of all TCGA KIRC samples, and scaled to counts per billion total FASTQ reads.
HLA-A*02:01 monomer UV exchange and β2-microglobulin ELISA. Epitope prediction was performed with the NetMHCpan 4.0 Server interface, defining predicted HLA binders as those with binding affinity ≤500 nM (52). Predicted hERV epitopes were synthesized through New England Peptide array technology. Monomer exchange reaction was carried out using the BioLegend Flex-T HLA-A*02:01 monomer UV exchange protocol (57). Peptide exchange efficiency was performed using the BioLegend HLA class I ELISA protocol (58).
RT-qPCR validation of hERV 4700. Expression levels of hERV 4700 were assessed by RT-qPCR in a collection of ccRCC formalinfixed, paraffin-embedded (FFPE) archival tissue from responders (n = 7 patients; 9 samples) and nonresponders (n = 6 patients; 6 samples). RT-qPCR was performed on all available samples, with no further selection process. Total RNA isolation was performed using the RNAeasy FFPE Kit (QIAGEN). DNAse treatment was performed during RNA isolation using RNase-free DNase I (QIAGEN). RNA quality and concentration were assessed using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies).
First-strand cDNA synthesis was performed using 250 ng total RNA, random hexamers, and the SuperScript IV Reverse Transcriptase Kit (Life Technologies). RT-qPCR was performed on a CFX96 Touch Real-Time PCR Detection System (Bio-Rad) using TaqMan Universal PCR Master Mix (Applied Biosystems). RT-qPCR primer and probe sequences are shown in Supplemental Table 7. All analyses were performed in triplicate, and relative RNA levels were determined using hypoxanthine phosphoribosyltransferase 1 (HPRT1) as an endogenous internal control (Applied Biosystems, catalog 4333768). A HeLa control RNA sample was included for inter-plate calibration. hERV 4700 expression levels were calculated using the ΔΔCt method. Expression levels for 2 sample pairs derived from the same patients were averaged for statistical analyses in Figure 5E.