Date: 2019-06-03

Kinetics of human cell development in humanized mice and the histology of grafted thymi. To study the formation of the human thymic and peripheral TCR repertoire, we generated 3 batches of mice. As shown in Figure 1A, the first experiment consisted of 3 mice (1autoA, 1autoB, and 1autoC) that were generated by transplantation with the same fetal liver HSCs and autologous fetal thymus. Therefore, they had the same genetic background, and selection took place in the same thymus (Figure 1A). The second batch consisted of 6 mice that were generated by transplantation with the same fetal liver HSCs (different from experiment 1). Mice designated 2autoA, 2autoB, and 2autoC received an autologous fetal thymus, while the mice designated 2alloA, 2alloB, and 2alloC received an allogeneic fetal thymus, so that thymic selection occurred in a different thymus, whereas the thymocyte genetic backgrounds were the same as those of the other 3 mice (Figure 1B). Experiment 3 consisted of 2 mice (3autoA and 3autoB) that were thymectomized and transplanted with the same fetal liver HSCs and autologous fetal thymus. We also analyzed and sequenced peripheral CD4+ and CD8+ cells in these mice, in addition to thymic single-positive CD4+ (SP-CD4) and SP-CD8 cells. The mice in the first experiment were euthanized 14 weeks after transplantation, whereas the mice in the second and third experiments were euthanized 20 and 22 weeks after transplantation, respectively. Supplemental Figure 1A (supplemental material available online with this article; https://doi.org/10.1172/JCI124358DS1) shows the gross appearance of the spleen, lymph nodes (LNs), and the grafted thymus under the kidney capsule of a representative humanized mouse at the time of harvest. Supplemental Figure 1B shows H&E staining of a representative grafted thymus and a thymus from a 13-year-old child.
Cortical (hypercellular) and medullary (hypocellular) areas and Hassall’s corpuscles in the medullary areas are noticeable in the H&E stains. Immunofluorescence staining of a representative grafted thymus and a thymus from a 13-year-old child stained for HLA-DR, cytokeratin 8 (CK8) and CK14 is shown in Supplemental Figure 1C. HLA-DR+ cells that were not stained for CKs were HSC-derived antigen-presenting cells (APCs) that were mainly concentrated in medullary areas. We further characterized the APCs in grafted thymi by flow cytometric analysis (FCM). B cells (CD19+), monocytes (CD14+), and DCs (CD11c+) collectively constituted approximately 30% of the double-negative (CD4–CD8–) cells in grafted thymi (Supplemental Figure 1D).

Figure 1 Experimental design and clonality scores. (A–C) Generation of humanized mice for experiments 1, 2, and 3. Cell populations were sorted for sequencing at 14, 20, and 22 weeks after transplantation, respectively. (D–F) Clonality scores for cell populations in experiments 1, 2, and 3 at the nucleotide nonproductive (Nt nonproductive), nucleotide productive (Nt productive), and aa levels (mean ± SEM, except for experiment 3, which shows individual animals). Paired t tests were performed to compare the clonality of each sequence set within each cell population. Paired t tests with Bonferroni’s multiple testing correction were performed to compare different cell populations in experiment 2. *P < 0.05 and **P < 0.01, by paired t test (paired by mouse, with Bonferroni’s multiple-testing correction). P. CD8, peripheral CD8+; P. CD4, peripheral CD4+. (G) Scores for aa clonality of grafted thymi and the original autologous thymus in experiment 2. (H) Expression of TdT in DP thymocytes of fetal (n = 3, gestational ages of 17, 20, and 21 weeks), postnatal (n = 4, age 4 months, 6 months, 13 years, and 17 years), and grafted human thymi in humanized mice (n = 3, at 18, 26, and 33 weeks after transplantation). *P < 0.05, by unpaired t test.

The kinetics of the peripheral appearance of human immune cells (hCD45+), B cells (CD19+), and T cells (CD3+), as well as the T cell naive/memory phenotype are shown in Supplemental Figure 1, E–H. The majority of T cells in peripheral blood at weeks 14–16 were naive.

Our method for constructing humanized mice included several measures to eliminate preexisting thymocytes and their progeny from the transplanted fetal thymic tissue. These measures included freezing and thawing the thymus tissues as described previously (16), pipetting up and down to physically release thymocytes, and injecting 2 weekly doses of a depleting anti-CD2 antibody as described previously (16). To assess the role of cells carried in the thymic tissue in producing peripheral and intrathymic T cell populations in this model, we generated a batch of mice with allogeneic fetal HSCs and thymus tissue. The fetal thymic cells were HLA-A3–, whereas the fetal HSCs were HLA-A3+. Twenty-four weeks after transplantation, we euthanized the animals and evaluated the origin of T cells in grafted thymi and peripheral lymphoid tissues. Approximately 3% of double-positive (DP) and SP-CD8 thymocytes and 2% of SP-CD4 cells were thymus graft derived (HLA-A3–) (Supplemental Figure 1I). Approximately 0.5% of CD4+ and CD8+ cells in the spleen were thymus graft derived (Supplemental Figure 1J). Therefore, the majority of T cells in the grafted thymi and spleens of these animals were derived from the HSCs that were given intravenously.

Effect of selection on diversity. The cell counts of grafted thymi in addition to the sorted cell numbers are summarized in Supplemental Table 1. For each sample, we obtained template counts, clonality scores, and unique clone counts at the nucleotide level (for both productive rearrangements and nonproductive rearrangements that include frame shifts or premature stop codons) and the aa level. These data are shown in Supplemental Table 1. Template counts for CD69– DP cells were lower than expected from the number of cells, probably reflecting the rearrangement of TCRβ after acquisition of the DP phenotype in a significant fraction of cells (17). Clonality (a normalized measure of inverse diversity based on CDR3β sequences) in all thymic samples was very low, demonstrating production and selection of a highly diverse repertoire in the human thymus grafts. Clonality scores are typically much higher for both CD4+ and CD8+ T cells in human peripheral blood, most markedly for CD8+ T cells, presumably reflecting antigen-driven expansions (18). Accordingly, clonality of peripheral CD4+ and CD8+ cells was markedly higher than that of thymic SP-CD4 and SP-CD8 cells in experiment 3 (Figure 1F). Although only some differences achieved statistical significance, all thymocyte subsets (CD8+ SP, CD4+ SP non-Tregs, CD4+ Tregs) showed increased clonality scores for aa compared with nucleotide sequences (Figure 1, D and E, and Supplemental Table 1). Collectively, these results show the effect of selection on narrowing the TCR repertoire, since selection is applied to TCR protein and multiple productive nucleotide sequences can produce the same peptide sequence.
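The clonality score used above can be illustrated with a minimal sketch. It assumes the common definition of clonality as 1 minus the normalized Shannon entropy of clone frequencies; the text describes clonality only as a normalized measure of inverse diversity, so this formula, the function names, and the record layout are illustrative assumptions rather than the authors' pipeline.

```python
import math


def clonality(template_counts):
    """Clonality = 1 - (Shannon entropy / log(number of clones)).

    Assumed definition (not stated in the text): 0 means perfectly even
    clone frequencies, 1 means a single dominant clone.
    """
    counts = [c for c in template_counts if c > 0]
    if not counts:
        raise ValueError("at least one clone with a positive count is required")
    if len(counts) == 1:
        return 1.0  # a single clone is treated as fully clonal
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return 1.0 - entropy / math.log(len(counts))


def collapse_counts(records):
    """Sum template counts per aa sequence from (nt_seq, aa_seq, count) records."""
    aa_counts = {}
    for _nt_seq, aa_seq, count in records:
        aa_counts[aa_seq] = aa_counts.get(aa_seq, 0) + count
    return aa_counts


# Toy records (hypothetical): two nucleotide sequences encode the same aa sequence.
toy_records = [("TGTGCC", "CA", 2), ("TGCGCC", "CA", 3), ("TGTTCT", "CS", 1)]
nt_clonality = clonality([count for _, _, count in toy_records])
aa_clonality = clonality(collapse_counts(toy_records).values())
```

Collapsing nucleotide clones onto their aa sequence before scoring mirrors the nucleotide-versus-aa comparison in the text, in which several productive nucleotide sequences can encode the same CDR3β peptide.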

In experiment 2, in which CD69– and CD69+ DP thymocytes were sequenced in addition to the 3 SP subsets, each animal showed very low clonality scores for the CD69– DP cell population. With the exception of 1 animal (2alloB), we were unable to detect a positive selection–induced increase in clonality in the CD69– to CD69+ transition. However, a comparison of CD69– DP cell populations and all 3 SP cell subsets (SP-CD4 non-Treg [referred to hereafter as SP-CD4 for simplicity], SP-CD8, and CD4+ Tregs [referred to hereafter as Tregs]) revealed an increase in aa sequence clonality (Figure 1E and Supplemental Table 1). Collectively, these data demonstrate a narrowing of the T cell repertoire due to thymic selection.

Compared with the original fetal thymus, clonality scores were lower in the grafted thymi for SP-CD8 and Treg populations (Figure 1G). Thymocytes generated from the human HSCs used to construct humanized mice were therefore more diverse than those in the original fetal thymus, which had a gestational age of 17 weeks, a stage at which thymic development and generation of a fully diversified repertoire are not complete. It has been previously reported that the TCR repertoire of mouse neonates (day 1 after birth) is much narrower than that of adult mice because of the lack of random nucleotide insertions in the CDR3s (19). As shown in Figure 1H, TdT was not expressed in DP cells of fetal human thymus. However, it was expressed in DP cells of postnatal thymi as well as grafted human thymus in humanized mice, thus explaining the greater diversity of TCR repertoires in grafted human thymi in our study compared with that in the fetal human thymus.

Role of stochastic rearrangement and selection in TCR repertoire formation. To obtain an understanding of the impact of stochastic TCR rearrangement versus background genetics on the TCR repertoire, we compared repertoires generated under the same conditions from the same progenitor pool across identical as well as allogeneic, extensively HLA-mismatched (Supplemental Table 2) thymi by measuring the Jensen-Shannon divergence (JSD) and the number of shared CDR3β TCR sequences. The shared CDR3β fraction quantifies sharing of unique sequences, whereas JSD additionally accounts for the frequency of shared sequences. In experiment 1, even though all 3 mice received the same HSCs and thymus from the same human fetal donor, their TCR repertoires at the level of CDR3β were highly divergent at both the nucleotide and aa levels (Figure 2A). In experiment 2, in which 6 mice received the same HSCs, with 2autoA, 2autoB, and 2autoC mice receiving autologous thymus tissues and 2alloA, 2alloB, and 2alloC mice receiving allogeneic thymus tissues, we observed a similarly high divergence among all thymi in different cell populations (Figure 2B). Furthermore, there was no difference in divergence between pairs of mice whose T cells developed in the same thymus versus those whose T cells developed in the allogeneic thymi (Supplemental Table 3). In addition, the observed divergence between mice for both aa and nucleotide repertoires was significantly higher than the baseline generated from repeated undersamplings of identical repertoires for all thymic subpopulations, as determined by both JSD (Supplemental Figure 3A) and the shared CDR3β fraction (Supplemental Figure 3B). All of these findings emphasize the highly stochastic nature of TCR repertoire formation at the level of CDR3β.
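As an illustration of the divergence measure, the following sketch computes the JSD between two repertoires represented as dictionaries of template counts. The base-2 logarithm (which bounds the score between 0 and 1), the input layout, and the function name are assumptions made for this example and are not taken from the article.

```python
import math


def jensen_shannon_divergence(counts_p, counts_q):
    """JSD between two repertoires given as {sequence: template_count} dicts.

    Uses log base 2, so 0 means identical frequency distributions and
    1 means no sequence in common.
    """
    total_p = sum(counts_p.values())
    total_q = sum(counts_q.values())
    if total_p <= 0 or total_q <= 0:
        raise ValueError("both repertoires need a positive total count")
    jsd = 0.0
    for seq in set(counts_p) | set(counts_q):
        p = counts_p.get(seq, 0) / total_p
        q = counts_q.get(seq, 0) / total_q
        m = (p + q) / 2  # frequency in the mixture distribution
        if p > 0:
            jsd += 0.5 * p * math.log2(p / m)
        if q > 0:
            jsd += 0.5 * q * math.log2(q / m)
    return jsd


# Toy repertoires (hypothetical sequences and counts).
mouse_a = {"CASSLGQ": 5, "CASSPGT": 3, "CASRDNE": 2}
mouse_b = {"CASSLGQ": 1, "CASSQET": 6, "CASTGGA": 3}
divergence = jensen_shannon_divergence(mouse_a, mouse_b)
```

Because the score weights each sequence by its frequency, two repertoires that share only rare clones remain close to 1, which is why it complements the count-based shared fraction.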

Figure 2 Repertoire divergence between animals in each experiment. (A–C) JSD scores at nucleotide and aa levels for each cell population in experiment 1 (n = 3 comparisons), experiment 2 (n = 15 comparisons), and experiment 3 (n = 1 comparison), respectively. JSD scores for each possible pair of mice were calculated and are presented as box-and-whisker plots, which show the median, range, and interquartile range, as well as outliers (except experiment 3, for which only 1 comparison per cell subset is shown, because there were only 2 mice). (D) JSD aa scores across different cell populations in experiment 2 for all sequences versus the 100 most frequent sequences. (E) JSD aa scores for TCR repertoires of different cell populations from grafted thymi of the 6 mice in experiment 2 compared with the original autologous fetal thymus. *P < 0.05, **P < 0.01, and ***P < 0.001, by paired t test with Bonferroni’s multiple testing correction for all comparisons.

In all experiments, and in every thymocyte and peripheral subset, the divergence was lower at the aa level compared with the nucleotide level, and the JSD decreased for selected (CD69+ DP and SP cell populations) compared with unselected (CD69– DP) cell populations at the aa but not the nucleotide level (Figure 2, A–C). This finding suggested that, despite the stochastic nature of repertoire formation, thymic selection in identical thymi results in selection of some shared sequences between individuals.

We compared the fraction of CDR3βs that were shared between every possible pair of mice for each thymocyte population. As shown in Figure 3, the fraction of shared CDR3βs between paired mice in all 3 experiments was less than 4% of all thymic TCRβs for each mouse. The proportion of shared CDR3βs was highest at the aa level and also was higher at the productive/nucleotide level compared with the nonproductive/nucleotide level (Figure 3, A–C, and Supplemental Table 4). In addition, the proportion of shared CDR3βs increased significantly during transition from the CD69– DP to the CD69+ DP stage, indicating positive selection for these shared CDR3βs. The proportion of shared CDR3βs further increased for the sorted mature (CD3hiCD5hi) SP-CD4 and -CD8 cell populations, which had completed positive selection and partially undergone negative selection, compared with the positively selected CD69+ DP cell population (Figure 3B). CDR3β sharing was even higher in peripheral CD4+ and CD8+ cell subsets compared with thymic SP-CD4 and SP-CD8 samples in experiment 3 (Figure 3C), possibly due to repertoire narrowing via completion of negative selection and post-thymic selection and expansion of certain clones. Pairwise divergence analyses by JSD yielded similar results, with significant decreases in DP-CD69+ compared with DP-CD69– cells and further decreases in the mature SP cell populations (SP-CD4, SP-CD8) (Figure 2B) and peripheral CD4+ and CD8+ cell populations (Figure 2C). Together, these findings demonstrate that both positive and negative selection of human thymocytes increases CDR3β overlap between individual T cell repertoires. However, the highest proportion of sequences shared between 2 replicate thymic repertoires found in the SP-CD4 subset accounted for only 3.5% of the repertoire.
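The shared CDR3β fraction can be sketched as a directed (asymmetric) comparison, which is consistent with the 6, 30, and 2 comparisons reported for 3, 6, and 2 mice. The exact denominator used by the authors is not given in this section; the sketch assumes it is the number of unique sequences in the first mouse of each ordered pair, and all names are illustrative.

```python
from itertools import permutations


def shared_fraction(seqs_a, seqs_b):
    """Fraction of unique sequences in repertoire A that also occur in repertoire B."""
    unique_a = set(seqs_a)
    if not unique_a:
        raise ValueError("repertoire A is empty")
    return len(unique_a & set(seqs_b)) / len(unique_a)


def pairwise_shared_fractions(repertoires):
    """Shared fraction for every ordered pair of mice in {mouse_id: sequences}."""
    return {
        (a, b): shared_fraction(repertoires[a], repertoires[b])
        for a, b in permutations(repertoires, 2)
    }


# Toy data (hypothetical sequences); mouse identifiers follow the naming in the text.
toy = {
    "1autoA": ["CASSL", "CASSP", "CASRD", "CASTG"],
    "1autoB": ["CASSL", "CASSQ"],
    "1autoC": ["CASNE"],
}
fractions = pairwise_shared_fractions(toy)
```

Running the same function on nucleotide strings and on their aa translations gives the nucleotide-level and aa-level fractions, respectively.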

Figure 3 Proportion of shared CDR3βs between animals and experiments. (A–C) Box-and-whisker plots (dot plot for experiment 3 because of the smaller sample size) comparing proportions of shared CDR3βs between each asymmetric mouse pair in experiment 1 (n = 6 comparisons), experiment 2 (n = 30 comparisons), and experiment 3 (n = 2 comparisons) for each cell population at the nucleotide nonproductive, nucleotide productive, and aa levels. (D) Box-and-whisker plot distributions of the proportion of shared CDR3βs comparing all versus the top 100 sequences by frequency in experiment 2. (E) Comparisons in both directions between each pair of mice in experiment 2, depending on whether the mice received the same (autologous thymus, n = 12 comparisons) or a different thymus (allogeneic thymus, n = 18 comparisons). (F) Distributions of the proportion of shared CDR3βs between each pair of mice within and across experiments. Exp, experiment. (G) Ratio of unique CDR3β nucleotide sequences per aa sequence (Nt/aa ratio) in shared versus unshared sequences for each pair of mice in experiment 2. Supplemental Table 5 shows the mean Nt/aa ratio for each subset and P values comparing different subsets. Box-and-whisker plots show the median, range, interquartile range, and outliers. ***P < 0.001, by paired t test with Bonferroni’s multiple testing correction.

As another readout of the effect of selection, we compared aa sequence convergence of nucleotide sequences for shared and unshared CDR3βs. For each pair of mice in experiment 2, we measured the number of unique CDR3β nucleotide sequences corresponding to each aa sequence shared between the same cell populations in both mice (shared) compared with the number of unique nucleotide sequences corresponding to each aa sequence present in at least 1 of the mice but not shared between both (unshared). Although the average nucleotide-per-aa sequence ratio was close to 1 for unshared sequences, it was significantly higher for the shared CDR3βs in all cell populations, indicating preferential selection of shared aa sequences (Figure 3G and Supplemental Table 5). Within the population of shared CDR3βs, this ratio was significantly higher in DP-CD69+ cells compared with that in DP-CD69– cells and in SP cell populations (except Tregs) compared with the ratio in DP-CD69+ cells, indicating selection for the shared sequences (Supplemental Table 6).
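The nucleotide-per-aa ratio described above can be sketched as follows. The sketch assumes that nucleotide sequences are pooled across both mice of a pair before counting and that the reported value is the mean over aa sequences; neither detail is specified in the text, and the function and variable names are hypothetical.

```python
from collections import defaultdict


def _group_by_aa(records):
    """Map each aa sequence to the set of nucleotide sequences encoding it."""
    grouped = defaultdict(set)
    for nt_seq, aa_seq in records:
        grouped[aa_seq].add(nt_seq)
    return grouped


def nt_per_aa_ratios(mouse_a, mouse_b):
    """Mean number of unique nt sequences per aa sequence, shared vs. unshared.

    Each mouse is an iterable of (nt_seq, aa_seq) pairs. Returns a
    (shared_ratio, unshared_ratio) tuple; a ratio is None if its class is empty.
    """
    by_aa_a = _group_by_aa(mouse_a)
    by_aa_b = _group_by_aa(mouse_b)
    shared, unshared = [], []
    for aa_seq in set(by_aa_a) | set(by_aa_b):
        # Pool nucleotide variants from both mice (an assumption of this sketch).
        pooled = by_aa_a.get(aa_seq, set()) | by_aa_b.get(aa_seq, set())
        if aa_seq in by_aa_a and aa_seq in by_aa_b:
            shared.append(len(pooled))
        else:
            unshared.append(len(pooled))

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(shared), mean(unshared)


# Toy records (hypothetical): TGT/TGC encode C, GCC/GCA encode A, TCT encodes S, GGT encodes G.
toy_a = [("TGTGCC", "CA"), ("TGCGCC", "CA"), ("TGTTCT", "CS")]
toy_b = [("TGTGCA", "CA"), ("TGTGGT", "CG")]
shared_ratio, unshared_ratio = nt_per_aa_ratios(toy_a, toy_b)
```

A ratio above 1 for shared aa sequences means that several distinct rearrangements converge on the same peptide, which is the convergence signal the paragraph interprets as evidence of selection.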

The JSD between different mice was lower among the 100 most frequent sequences compared with all sequences for each of the 5 selected and nonselected cell populations, indicating greater overlap among the more abundant sequences (Figure 2D). Consistently, the fraction of CDR3β sequences overlapping between animals was greater among the top 100 sequences compared with the entire cell population (Figure 3D). Although this finding may reflect the greater likelihood of detecting abundant sequences in general, it is also consistent with the possibility that the shared CDR3βs are preferentially selected.

Surprisingly, for the 5 different selected and nonselected cell populations, the proportion of shared CDR3βs was not different between mice with allogeneic versus autologous thymi (Figure 3E). Furthermore, we detected no dramatic increase in shared CDR3βs among mice within an experiment compared with those between experiments, despite the different genetic backgrounds of the HSCs and thymi used to generate the T cells in each experiment (Figure 3F). In addition, different cell subsets in allogeneic and autologous thymi in experiment 2 had similar divergences compared with the original autologous fetal thymus (Figure 2E). Supplemental Tables 7 and 8 show the numbers of unique and shared CDR3βs between the 3 mice that were generated by transplantation with the same thymus and HSCs in experiment 1 and the 6 mice transplanted with allogeneic and autologous thymi in experiment 2, respectively. Supplemental Table 9 shows the total number of shared and nonshared CDR3βs for each cell population in each experiment at both aa and nucleotide levels. Consistent with the results described above, the number of overlapping CDR3βs increased as the selection progressed, and we detected dramatically larger numbers of overlapping CDR3βs at the aa sequence level compared with numbers at the nucleotide sequence level.

In order to address variable template counts across samples, we validated the results of the repertoire divergence analysis by randomly subsampling each sample to the same template count and then repeating the analysis. With 3 subsamples of 1000 templates each, we observed the same trends as in whole-sample comparisons with regard to the shared CDR3β fraction (Supplemental Figure 3C). Specifically, we observed consistently increased sharing at the aa level compared with the nucleotide level and increased sharing in DP-CD69+ samples compared with DP-CD69– samples, and in SP compared with DP samples. Therefore, our results are stable across random subsamples of the data, regardless of the variable sample sizes (Supplemental Figure 3C).
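The subsampling validation can be sketched as drawing a fixed number of templates without replacement from each sample before repeating the comparison. The depth of 1000 and the 3 replicates come from the text; the sampling scheme without replacement, the seed handling, and the names are assumptions of this example.

```python
import random
from collections import Counter


def subsample_templates(clone_counts, depth, seed=None):
    """Draw `depth` templates without replacement from {sequence: count}."""
    pool = [seq for seq, n in clone_counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of templates in the sample")
    rng = random.Random(seed)
    return Counter(rng.sample(pool, depth))


def replicate_subsamples(clone_counts, depth=1000, replicates=3, seed=0):
    """Return several independent subsamples, each with the same template count."""
    return [
        subsample_templates(clone_counts, depth, seed=seed + i)
        for i in range(replicates)
    ]


# Toy sample (hypothetical): 1500 templates spread over three clones.
toy_sample = {"CASSL": 900, "CASSP": 500, "CASRD": 100}
subsamples = replicate_subsamples(toy_sample)
```

Each subsample can then be passed to the same divergence or sharing calculation as the full sample, so that differences in sequencing depth between samples do not drive the comparison.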

Shared CDR3βs have a shorter length due to fewer N insertions than do unique CDR3βs and often use different V genes. Further characterization of the CDR3βs that were shared between any 2 thymi versus those that were detected in only 1 thymus (unique sequences) revealed that the shared CDR3βs were significantly shorter than the unique CDR3βs. The shared CDR3βs had an average length of approximately 40 nucleotides, whereas the unique CDR3βs had an average CDR3β length of approximately 44 nucleotides (Figure 4A). The number of inserted nucleotides at V-D and D-J junctions was significantly lower for the shared CDR3βs compared with numbers for the unshared CDR3βs (Figure 4B). As the number of V and J nucleotide deletions was slightly higher in unshared CDR3βs (Figure 4, C and D), the shorter length of shared CDR3βs was thus attributable to the lower number of nucleotide insertions in these sequences. The shorter length of shared CDR3βs did not simply reflect the fact that they tended to be relatively abundant, as the average CDR3 length of the 1000 most abundant CDR3βs overall was significantly greater than that of the 1000 least abundant CDR3βs (Figure 4E and Supplemental Table 6). However, each animal showed CDR3β shortening as selection progressed from the CD69+ DP to the SP stage among the 1000 most abundant sequences, but not among the least abundant sequences (Figure 4E and Supplemental Table 10), indicating a selective preference for shorter shared CDR3βs. Shortening of CDR3βs continued further in the transition from thymic SP to peripheral CD4+ and CD8+ cells (Figure 4F).

Figure 4 Characteristics of shared versus unshared CDR3βs. (A) Nucleotide length distribution of shared versus unshared SP-CD4 CDR3βs in experiment 2. (B) Number of nontemplate nucleotide insertions at V-D plus D-J junctions for shared versus unshared SP-CD4 CDR3βs. (C and D) Number of nucleotides that are deleted from the 3′ end of V genes and the 5′ end of J genes at V-D and D-J junctions of SP-CD4 CDR3βs, respectively. (E) Distribution of combined (all 6 mice in experiment 2) CDR3β length for the 1000 most frequent CDR3βs and the 1000 CDR3βs with the lowest frequencies across different thymic cell populations. Supplemental Table 6 shows P values comparing different cell subsets (unpaired t test). (F) Nucleotide CDR3β length for all thymic and peripheral T cell subsets in experiment 3. (G) Proportion of shared CDR3βs (aa level) using the same Vβ gene, Jβ gene, and Vβ-Jβ pair for SP-CD4 and SP-CD8 T cell populations, comparing mice with the same (autologous) versus allogeneic thymus in experiment 2. ***P < 0.001, by paired t test with Bonferroni’s multiple testing correction. Box-and-whisker plots show the median, range, interquartile range, and outliers.

We characterized the V and J gene usage of shared CDR3βs among SP-CD4 cells. Only approximately 20%–25% of CDR3βs shared between different thymi used the same V gene, whereas almost all shared CDR3βs used the same J gene. The percentage of shared CDR3βs that used the same V-J pair, and hence the same TCRβ chain, was therefore also 20%–25% (Figure 4G). These percentages did not differ between allogeneic and autologous thymi.

TCRβ chain overlap between different cell subsets in individual human thymus grafts. To better understand the selection of shared CDR3β sequences, we compared CDR3β sequences of SP T cell populations within each mouse in experiment 2. As shown in Figure 5A, there was overlap in CDR3β sequences between SP thymocyte populations from each mouse, especially among the 100 most frequent sequences. Among the sequences with identical CDR3βs in different mature thymocyte subsets, approximately 60% used the same V gene within individual mice (Figure 5B), whereas approximately 40% used different V genes. Almost all of the shared CDR3βs were associated with the same J gene, so approximately 60% of TCRs with shared CDR3βs used the same V-J pair and shared the entire TCRβ chain (Figure 5B). Since the HLAs that SP-CD4 and SP-CD8 cells are selected on are different, these results suggest that cross-reactive TCR β chains can be selected on different MHCs. CDR3βs that were shared in both SP-CD4 and SP-CD8 cells of each mouse in experiment 2 had an average nucleotide-per-aa sequence ratio of approximately 2, whereas the CDR3βs that were not shared had an average ratio close to 1, pointing to preferential selection of the shared CDR3βs in both T cell subsets (Supplemental Table 11).

Figure 5 Overlap between different cell subsets and enrichment for cross-reactive/autoreactive CDR3βs among shared sequences. (A) Proportions of shared CDR3βs between paired cell populations in each thymus graft in experiment 2 (aa level) among all versus the 100 most frequent CDR3βs (n = 6). Potentially ambiguous sequences present in more than 1 cell population were not removed from this analysis. Supplemental Table 11 shows the average number of unique nucleotide sequences per aa sequence for shared versus unshared CDR3βs between SP-CD4 and SP-CD8 cells. (B) Proportion of shared CDR3βs with a shared Vβ gene, Jβ gene, and Vβ-Jβ pair, comparing each pair of SP cell populations in each mouse in experiment 2. (C and D) ORs of cross-reactivity in shared versus unshared sequences, sharing in cross-reactive versus allo–non–cross-reactive sequences, and T1D reactivity in shared versus unshared sequences for experiments 1, 2, and 3. P values are shown in Supplemental Table 12. (E) Clone fraction and cumulative frequency of T1D-reactive CDR3βs in different cell subsets in experiment 2. *P < 0.05 and ***P < 0.001, by unpaired t test with Bonferroni’s correction (A) and paired t test with Bonferroni’s correction (paired by mouse) (E). Box-and-whisker plots show the median, range, interquartile range, and outliers.

Shared CDR3βs are more likely to be cross-reactive than are unshared sequences. The data above suggested that shared CDR3 sequences might be highly cross-reactive against disparate specificities. To address this possibility, we compared the repertoires of shared and unshared CDR3βs from SP cell populations in both experiments with a list of cross-reactive CDR3βs defined by a greater than 2-fold frequency expansion in mixed lymphocyte reactions of a human peripheral blood sample against 2 different allogeneic donors sharing no HLA alleles. Among 100,112 and 29,033 alloreactive CDR3β sequences, 1,019 sequences expanded to both stimulators and were therefore identified as cross-reactive. Fisher’s exact test revealed a highly significant increase (Supplemental Table 12) in the rate at which shared versus unshared CDR3β sequences from experiments 1, 2, and 3 were cross-reactive against 2 different sets of alloantigens (Figure 5C). Conversely, we observed a highly significant increase in the odds of allo–cross-reactive sequences compared with alloreactive but non–cross-reactive sequences being shared between the mice in experiments 1 and 2 (Figure 5C). P values by Fisher’s exact test for the OR of cross-reactivity in shared versus unshared sequences as well as for the OR of sharing in cross-reactive sequences versus allo–non–cross-reactive sequences are listed in Supplemental Table 12. These data demonstrate that shared CDR3 sequences are more cross-reactive than are unshared sequences.
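The odds ratio and Fisher's exact test on a 2 × 2 table (shared or unshared versus cross-reactive or not) can be sketched in plain Python. The text does not state whether the test was one- or two-sided; the sketch assumes a one-sided test for enrichment, and the toy counts are invented for illustration only.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    a: shared and cross-reactive      b: shared, not cross-reactive
    c: unshared and cross-reactive    d: unshared, not cross-reactive
    """
    if b * c == 0:
        return math.inf
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact P value for enrichment (odds ratio > 1).

    Sums hypergeometric probabilities of tables with the same margins whose
    top-left cell is at least as large as the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = math.comb(row1 + row2, col1)
    p_value = 0.0
    for k in range(a, min(row1, col1) + 1):
        p_value += math.comb(row1, k) * math.comb(row2, col1 - k) / denominator
    return p_value


# Toy table (hypothetical counts, not data from the study).
toy_or = odds_ratio(3, 1, 1, 3)
toy_p = fisher_exact_greater(3, 1, 1, 3)
```

The same table layout applies to the converse comparison in the paragraph by swapping the roles of rows and columns, and to the T1D-reactivity comparison by replacing the cross-reactive label with the T1D-reactive label.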

Selection of autoreactive TCRs. In view of the evidence for cross-reactivity of shared sequences selected between disparate thymi and subsets, we hypothesized that shared sequences might be enriched for autoreactivity. We interrogated a previously described list of 1655 T1D-associated autoreactive CDR3βs (20), along with some newer unique CDR3β aa sequences (total of 2208 sequences) associated with T1D, largely from peripheral blood but also found in pancreas, LNs, and spleen of T1D donors from the network for Pancreatic Organ donors with Diabetes (nPOD) program (21). These sequences were derived from a number of assays including sequencing of T cells following FACS proliferation of dye-labeled responding T cells harvested following culture with autoantigens (22), direct MHC tetramer isolation of autoreactive T cells (22–25), or following isolation and examination of peptide reactivities from islet-infiltrating T cells (26). T1D reactivity for these sequences was defined as reactivity to islet antigens such as GAD65 and insulin as described previously (21).

Comparison of these autoreactive TCRs with the TCR repertoires of grafted thymi in experiment 2 revealed a significant increase in both the cumulative frequency and clone fraction of T1D-associated sequences in SP-CD8 versus DP-CD69– cell populations (Figure 5E). Remarkably, the odds that a CDR3β shared between SP subsets in any 2 mice in experiments 1, 2, or 3 was T1D reactive were highly significantly greater than those for nonshared CDR3βs (Figure 5D), suggesting that shared CDR3s were enriched for autoreactivity. The P values for the odds of T1D reactivity in shared versus unshared sequences are listed in Supplemental Table 12.

CDR3α and TCR sharing from single-cell sequencing. To determine the extent to which CDR3β sharing was associated with sharing of the entire TCR, including the α chain, we performed single-cell TCR sequencing of thymic SP-CD4 cells from the same mice whose cells were bulk-sequenced in experiment 2 (except mouse 2autoA, due to a technical failure). Comparing each pair of mice, we found that the level of CDR3α sharing was significantly higher than that of CDR3β sharing (Figure 6A). However, the level of sharing for paired CDR3α-CDR3β was near zero and significantly lower than for either TCR chain on its own (Figure 6A), showing that the TCRs were almost always different among clones with a shared CDR3 α or β sequence. Consistent with the findings from bulk sequencing, the levels of shared CDR3s were not different between mice with allogeneic versus autologous thymi for TCRα, TCRβ, or paired TCRα-TCRβ (Figure 6B). The numbers of unique CDR3αs, CDR3βs, and paired CDR3α-CDR3βs, the fraction of cells with a β chain that have at least 1 paired α chain or 2 paired α chains, and the fraction of cells with an α chain that have a paired β chain are shown in Supplemental Table 13.

Figure 6 Fraction of shared CDR3αs, CDR3βs, and paired CDR3α-CDR3βs revealed by single-cell T cell sequencing. (A) Fraction of shared CDR3αs, CDR3βs, and paired CDR3α-CDR3βs for SP-CD4 cells between each pair of mice in experiment 2 (except 2autoA mice) at the aa level (comparisons in both directions, n = 20 comparisons). ***P < 0.001, by unpaired t test. (B) Comparisons in both directions between each pair of mice, depending on whether the mice received the same (autologous thymus, n = 8 comparisons) or a different thymus (allogeneic thymus, n = 12 comparisons).

Sub-sequence features are conserved in shared CDR3βs. Methods from Greiff et al. (27), which successfully distinguished between public and private antibody repertoires, were applied to this data set to determine whether sub-sequence-level features can distinguish between shared and unshared sequences. This method uses a normalized gapped k-mer (2 sub-sequences of length k, separated by a gap of up to m aa) count as an input to a support vector machine (SVM) to determine whether a shared or unshared status can be predicted. Optimal parameters determined by Greiff et al. (k = 1, m = 1, and cost = 100) were used for SVM analysis, and 10-fold cross validation was performed to assess the performance of the classifier, using balanced accuracy (mean of sensitivity and specificity) as a performance metric. This was repeated on 100 length-matched shared and unshared sequence data sets generated as described above. As shown in Supplemental Figure 4A, these features can be used to predict a shared or unshared status of sequences with a median balanced accuracy of approximately 62% to 78% for all cell subsets, in which 50% would be equivalent to a random classifier. The frequency of gapped k-mers in shared sequences plotted against the frequency in unshared sequences further supported the hypothesis that there are sub-sequence features that are conserved in shared sequences (Supplemental Figure 4B). We also found a notable enrichment in the “CASSL” motif at the 5′ end of shared CDR3βs relative to unshared sequences, even in the unselected CD69– DP cell population (Supplemental Figure 5), though that motif was highly represented in both shared and unshared sequences (Supplemental Figure 5).
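The feature extraction and performance metric described above can be sketched without the classifier itself. The sketch assumes that "a gap of up to m aa" includes a gap of 0 and that normalization divides each count by the total number of gapped k-mers in the sequence; both are assumptions, and the SVM training step is omitted here. The function names are hypothetical.

```python
from collections import Counter


def gapped_kmer_features(seq, k=1, m=1):
    """Normalized counts of (left k-mer, gap length, right k-mer) in one sequence.

    Gap lengths from 0 to m are enumerated (an assumption of this sketch).
    """
    counts = Counter()
    for gap in range(m + 1):
        span = 2 * k + gap
        for i in range(len(seq) - span + 1):
            left = seq[i:i + k]
            right = seq[i + k + gap:i + span]
            counts[(left, gap, right)] += 1
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()} if total else {}


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = shared)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    positives = sum(1 for t in y_true if t == 1)
    negatives = sum(1 for t in y_true if t == 0)
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    return (tp / positives + tn / negatives) / 2


features = gapped_kmer_features("CASSL")  # k = 1, m = 1 as in the text
```

With k = 1 and m = 1, each feature is a pair of residues that are either adjacent or separated by one position, so the feature vector stays small enough for a linear classifier even on short CDR3β sequences.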

Evidence suggesting a role for self-peptides in human thymocyte selection. In preselection murine thymocytes, TCRβ CDR3 interfacial hydrophobicity at position 6 and position 7 (P6 and P7), the residues that interface with the peptide and MHC, correlated with the ability to be activated by self-peptide and MHC (28). Stadinski et al. developed a self-reactivity index based on the hydrophobicity of aa at CDR3β P6 and P7 and showed that this index correlates well with increased and decreased self-reactivity during positive and negative selection, respectively. We performed a similar analysis on human thymocyte and peripheral T cell subsets from experiments 2 and 3, focusing on CDR3β lengths of the greatest frequency in all thymocyte subsets (13–16 aa). For each mouse, we analyzed P6 and P7 aa frequencies in the thymic (DP-CD69–, DP-CD69+, SP-CD4+, SP-CD8+, and Treg) and peripheral (CD4+ and CD8+) cell populations, and normalized the frequencies within each cell population. For each animal, a fold-change in frequency of P6 and P7 aa residues between cell populations was recorded, and these values were averaged across the mice within each grafted thymus group (i.e., experiment 2, allogeneic versus autologous thymus, and experiment 3). For experiment 2 samples, we compared SP-CD4, SP-CD8, and Tregs against DP-CD69– thymocytes to evaluate the entire thymic selection process, DP-CD69+ thymocytes against DP-CD69– thymocytes, and SP-CD4, SP-CD8, and Tregs against DP-CD69+ thymocytes. For experiment 3 samples, we compared peripheral CD4+ cells and peripheral CD8+ cells against SP-CD4 and SP-CD8 cells, respectively. As shown for P6 in Figure 7A and Supplemental Figure 6A and for P7 in Figure 7B and Supplemental Figure 6B, we observed a trend toward enrichment of hydrophobic aa (as defined in Figure 8C in ref. 29) at both positions as thymic selection progressed. Results of the Spearman’s nonparametric rank test for the mice in experiments 2 and 3 are shown in Figure 7, A and B. 
We observed statistically significant correlations between the fold changes of the aa residue at P6 (Figure 7A) or P7 (Figure 7B) and its hydrophobicity during selection from the DP-CD69– stage to the mature SP-CD4, SP-CD8, and Treg populations. Both autologous and allogeneic thymi showed a similar trend toward increasing hydrophobicity at P6 and P7 as selection progressed (Supplemental Figure 6). Positive selection from the CD69– to the CD69+ DP stage was associated with significantly increased P6 hydrophobicity. Overall selection from the CD69– DP stage was associated with significantly increased hydrophobicity at P6 and P7 for both the CD4+ and CD8+ SP populations, and at P6 for Tregs. The CD69+ DP to SP transition was associated with significantly increased hydrophobicity only for SP-CD8 cells at P6 and for SP-CD8 cells and Tregs at P7. Overall, we found that the increase in hydrophobicity was more pronounced with the transition from DP-CD69– to SP cells than with the transition from DP-CD69+ to SP cells. This trend was stopped or reversed in the transition from SP-CD4 and SP-CD8 cells to peripheral CD4+ and CD8+ cells, at both P6 and P7 (Figure 7, A and B, and Supplemental Figure 6). In sum, our data demonstrate an increase in hydrophobic aa usage at P6 and P7 in association with selection of human thymocytes (more strongly associated with positive selection) and an arrest or reversal of this trend in the transition from SP thymocytes to peripheral T cells, possibly in association with completion of negative selection.
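
The fold-change and rank-correlation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the Kyte-Doolittle hydropathy scale stands in for the free energy-based scale (so the sign convention differs: here, higher values are more hydrophobic), P6 is taken simply as the sixth residue of the CDR3β string, and the sequences and helper names (`toy`, `position_freqs`, `fold_changes`, `ranks`, `spearman`) are hypothetical.

```python
from collections import Counter

# Kyte-Doolittle hydropathy (higher = more hydrophobic). ASSUMPTION: used here
# as a stand-in; the study used a free energy-based hydrophobicity scale.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def position_freqs(cdr3s, pos, min_len=13, max_len=16):
    """Normalized aa frequencies at 1-based position `pos` (CDR3 length 13-16 aa)."""
    counts = Counter(s[pos - 1] for s in cdr3s if min_len <= len(s) <= max_len)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

def fold_changes(before, after, pos):
    """Per-residue fold change in frequency from population `before` to `after`."""
    f0, f1 = position_freqs(before, pos), position_freqs(after, pos)
    # A fold change is only defined for residues seen in both populations.
    return {aa: f1[aa] / f0[aa] for aa in f0 if aa in f1}

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def toy(aa):
    """Hypothetical 13-aa CDR3 with residue `aa` at position 6."""
    return "CASSL" + aa + "GTDTQYF"

# Toy pre- and post-selection populations (illustrative only, not study data).
toy_before = [toy("I")] + [toy("L")] + [toy("G")] * 2 + [toy("Q")] * 2
toy_after = [toy("I")] * 3 + [toy("L")] * 2 + [toy("G")] * 2 + [toy("Q")]

fc = fold_changes(toy_before, toy_after, pos=6)
residues = sorted(fc)
rho = spearman([KD[aa] for aa in residues], [fc[aa] for aa in residues])
print(fc, rho)
```

In this toy input the hydrophobic residues are enriched after selection, so rho is strongly positive on the Kyte-Doolittle scale; on a free energy scale in which lower values denote greater hydrophobicity, the same enrichment would yield a negative coefficient.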

Figure 7 Interaction with self-peptides in the selection of shared and unshared sequences. Fold changes (mean ± SEM) in the relative aa frequencies versus hydrophobicity of the aa based on Gibbs free energy at P6 (A) or P7 (B) for transition from DP-CD69– to DP-CD69+ cells and from there to SP cell subsets in experiment 2, and also in transition from SP-CD8 and SP-CD4 to peripheral CD8+ and CD4+ cells for experiment 3. Spearman’s correlation coefficient R and P values from the nonparametric Spearman’s correlation test are shown. Negative R values imply that, as hydrophobicity increases, so does the fold change in the relative aa frequency across the 2 cell populations. *P < 0.05, **P < 0.01, and ***P < 0.001, by unpaired t test. (C) Differential abundance of each aa at each position in CDR3β, computed by random selection of a length-matched unshared sequence for each shared sequence. Shared sequences are those present in at least 2 mice, and unshared sequences are unique to a single mouse. Only results for aa producing a Benjamini-Hochberg–adjusted P value of less than 0.05 by Fisher’s exact test are shown. The aa plotted at a frequency of greater than 0 were preferentially used at that position in shared sequences, whereas those with a frequency of less than 0 were preferentially used in unshared sequences.

Shared sequences might escape negative selection. To analyze differential usage of aa at each position as defined by the international ImMunoGeneTics (IMGT) information system, we performed a Fisher’s exact test for all sequences in each of the 100 length-matched data sets of shared and unshared sequences. Differentially used aa were plotted only if the Benjamini-Hochberg–adjusted P value for the Fisher’s exact test was less than 0.05 (Figure 7C). Only aa that appeared in at least 75 of the 100 downsamples were annotated. We noted a significant enrichment for the neutral aa G and hydrophilic aa (e.g., Q) and a significant decrease in hydrophobic aa (e.g., W) at P6 and P7 (equal to positions 109 and 110 on the plots, respectively) in shared sequences among CD69+ DP cells, most thymic SP cell populations, and peripheral cell populations (Figure 7C). We did not observe this pattern for shared sequences among CD69– DP cells. The reduced hydrophobicity at P6 and P7 in shared sequences among selected but not unselected cell populations suggests that selected shared sequences may have weaker interactions with self-peptides than unshared sequences and that this may allow them to escape negative selection.
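
The testing scheme described above (a Fisher’s exact test per aa and position, Benjamini-Hochberg adjustment, and retention of aa that recur in at least 75 of 100 downsamples) can be sketched as follows. This is an illustrative outline only, not the code used in this study; the function names and the way hits are encoded as (aa, position) tuples are assumptions made for the example.

```python
from collections import Counter
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Example layout (assumed): rows = shared / unshared sequences,
    columns = aa present / absent at a given CDR3 position.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg-adjusted P values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def recurrent_hits(hits_per_downsample, min_count=75):
    """Keep (aa, position) hits significant in at least `min_count` downsamples."""
    counts = Counter(h for hits in hits_per_downsample for h in set(hits))
    return {h for h, n in counts.items() if n >= min_count}

# Toy usage (illustrative numbers only, not study data).
p_raw = [fisher_exact_two_sided(8, 2, 1, 5), fisher_exact_two_sided(5, 5, 5, 5)]
p_adj = benjamini_hochberg(p_raw)
print(p_raw, p_adj)
```

Requiring a hit to recur across most downsamples guards against aa that reach significance only because of one particular random draw of length-matched unshared sequences.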

V and J gene usage. As shown in Supplemental Figure 7, the pattern of V and J gene usage for the SP-CD4 cell population was very similar between the mice that received autologous thymus tissue and those that received allogeneic thymus tissue in experiment 2, with no significant differences in V or J gene usage, arguing against a major role for selection in determining V and J gene usage. Overall, we detected a similar pattern of V and J gene usage for SP-CD4 cells when comparing mice in experiments 1, 2, and 3, which received different HSCs as well as different thymi (Supplemental Figure 7). We also observed a similar pattern of V and J gene usage between thymic SP-CD4 and peripheral CD4+ cells in the mice in experiment 3 (Supplemental Figure 7). Likewise, V and J gene usage patterns were similar across all cell populations for the mice in experiment 2 (Supplemental Figure 8). The few statistically significant differences are indicated in Supplemental Figure 7 and Supplemental Figure 8. Thus, we detected only small effects of genetic background and/or thymic selection on the overall pattern of V and J gene usage. These findings were confirmed by single-cell TCR-sequencing data, which showed a similar pattern of V and J gene usage for both α and β chains when comparing SP-CD4 cells in mice with allogeneic versus autologous thymi in experiment 2 (Supplemental Figure 9).

Supplemental Figure 10A shows plots of VJ usage for the CD4+ repertoires of the 1autoA, 1autoB, and 1autoC mice, which received the same thymic tissue and HSCs. These representative plots show no disproportionately favored V-J pairing in the repertoire. We detected a strong correlation between the observed VJ usage and the VJ usage expected from the stochastic combination of V genes with J genes according to the background frequency of each V and J (Supplemental Figure 10B). The observed and expected VJ distributions were compared by Mann-Whitney U test, which failed to reject the null hypothesis that VJ pairing is stochastic.
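
The expected VJ usage under stochastic pairing is the product of the background V and J frequencies, which can be sketched as follows. This is a minimal illustration, not the analysis code used in this study; the function name `vj_observed_expected` is hypothetical, the gene pairs are toy values, and whether each clonotype is counted once or weighted by its abundance is left as an assumption.

```python
from collections import Counter

def vj_observed_expected(pairs):
    """Map each (V, J) combination to (observed frequency, expected frequency).

    `pairs` is a list of (V gene, J gene) tuples, one per sequence.
    Expected frequency assumes independent pairing: f(V) * f(J).
    """
    n = len(pairs)
    v_counts = Counter(v for v, _ in pairs)
    j_counts = Counter(j for _, j in pairs)
    vj_counts = Counter(pairs)
    return {
        (v, j): (vj_counts[(v, j)] / n, (v_counts[v] / n) * (j_counts[j] / n))
        for v in v_counts
        for j in j_counts
    }

# Toy repertoire in which V and J are paired independently (not study data).
toy_pairs = (
    [("TRBV20-1", "TRBJ2-1")] * 4 + [("TRBV20-1", "TRBJ2-7")] * 4
    + [("TRBV5-1", "TRBJ2-1")] + [("TRBV5-1", "TRBJ2-7")]
)

for combo, (observed, expected) in sorted(vj_observed_expected(toy_pairs).items()):
    print(combo, observed, expected)
```

Combinations whose observed frequency departs markedly from the expected value would indicate a favored or disfavored V-J pairing; the observed and expected values can then be passed to a correlation or rank-based test.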