Development and validation of an MHPA strategy for sequencing error suppression and detection of low-frequency mutations in TSC2 and TP53. Detection of low–allele frequency (<1%) mutations is challenging because of the high error rate/intrinsic noise in standard MPS (20–22). To obtain reliable detection of extremely-low-VAF (<0.1%) mutations, we developed an ultrasensitive MHPA strategy derived from similar efforts (22–24) (Figure 2A; and see Methods). MHPA is a multiplex amplicon-based strategy that employs barcoding of single DNA molecules with unique molecular identifiers (UMIs), which are random 14-nt sequences, to enable error suppression. MPS read data are compressed to paired-end consensus reads from all the reads for each individual UMI. UMI compressed consensus sequences are retained when there is no variation in the sequencing data among different reads, and converted to the reference sequence when there is inconsistent variation among reads suggestive of PCR error (Figure 2B).
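The consensus rule described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `umi_consensus`, the toy reads, and the simplification that all reads in a UMI family are already aligned to the reference with equal length (no indels, no base-quality weighting) are assumptions made here for illustration only.

```python
from collections import defaultdict

def umi_consensus(reads, reference):
    """Collapse reads sharing a UMI into one consensus sequence.

    reads: iterable of (umi, sequence) pairs. All sequences are assumed
    (hypothetically) to be aligned to `reference` and of equal length.
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        bases = []
        for pos, ref_base in enumerate(reference):
            observed = {seq[pos] for seq in seqs}
            if len(observed) == 1:
                # All reads in the family agree: keep the base,
                # whether it is a variant or the reference base.
                bases.append(observed.pop())
            else:
                # Reads disagree: treat as a likely PCR/sequencing
                # error and revert to the reference base.
                bases.append(ref_base)
        consensus[umi] = "".join(bases)
    return consensus

# Toy data with hypothetical 14-nt UMIs.
reference = "ACGTACGT"
reads = [
    ("AACCGGTTAACCGG", "ACGTACGT"),
    ("AACCGGTTAACCGG", "ACGAACGT"),  # discordant base: suppressed
    ("TTGGCCAATTGGCC", "ACTTACGT"),
    ("TTGGCCAATTGGCC", "ACTTACGT"),  # concordant variant: retained
]
consensus = umi_consensus(reads, reference)
print(consensus)
```

In this toy input the first UMI family collapses to the reference sequence, whereas the second retains its variant because every read in the family carries it.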

Figure 2 MHPA strategy. (A) Schematic representation of the major MHPA steps. MHPA consists of PCR amplification of short DNA segments using primers that include unique molecular identifier (UMI) barcodes, followed by library preparation and sequencing. In the first reaction (step 1), a multiplex linear amplification of each genomic region occurs using reverse primers only with inclusion of a random 14-nt UMI. Following purification, another linear amplification (step 2) of each genomic region occurs using forward primers only. Following purification, amplification of the UMI-barcoded molecules occurs using universal primers (UP) (steps 3 and 4). Step 3 is performed for optimization of MHPA assays, in which amplicons are labeled using a fluorescent dye, 6-carboxyfluorescein (FAM), followed by capillary electrophoresis to assess abundance of each amplicon. Step 4 is used to generate the MHPA libraries, which are purified and subjected to MPS. (B) Comparison of conventional and UMI-based MPS variant calling strategies. Barcoding of single DNA molecules with UMIs enables compression of the MHPA data to consensus reads that permits sequencing error suppression. (C) Map of deleterious germline TSC2 variants reported in the LOVD database. The y axis indicates the number of TSC2 variants at a single nucleotide position. Recurrent variants appear as vertical lines. The color of the line indicates the type of variant, as shown in the inset legend. Hotspots with variants reported more than 30 times are labeled with coding sequence nucleotide (c.) and amino acid (p.) position. Splice mutations are summed and shown as a single bar at each exon-exon junction. Genomic regions covered by MHPA amplicons are marked in gray, with indication of the fraction of the deleterious germline variants covered by each of the amplicons (%).

To implement MHPA analysis of TSC2, we reviewed the Leiden Open Variation Database (LOVD) (25) and prepared a comprehensive list of pathogenic variants in TSC2 (total n = 4402; unique n = 1595) (1). Our goal was to maximize coverage of TSC2 with relatively uniform read depth, while minimizing the amount of DNA needed for the MHPA reactions. Forty short (48–117 nt) amplicons (queried sequence, Supplemental Table 1.1; supplemental material available online with this article; https://doi.org/10.1172/JCI155858DS1) were designed to amplify TSC2 exon and exon-intron junction regions with the most mutations, covering in aggregate 74% of reported germline pathogenic TSC2 variants (Figure 2C and Supplemental Table 1.1). PCR amplification was performed in two 20-plex reactions, to conserve DNA and effectively amplify adjacent genomic regions. After initial first-strand and second-strand linear amplification reactions, universal primer amplification was performed (Figure 2A).

An MHPA assay for TP53 was also generated, using 12 primer pairs to amplify similar short segments (90–126 nt; queried sequence, Supplemental Table 1.2) of the exons and exon-intron junctions of TP53, covering all hotspot regions and 93% of somatic nonsynonymous TP53 variants, identified previously in normal keratinocytes and skin cancers (Supplemental Table 1.2 and Supplemental Figure 1). Similarly to PCR amplification for the TSC2-MHPA assay, PCR amplification for the TP53-MHPA assay was performed in 2 separate reactions (each 6-plex).

The TSC2-MHPA and TP53-MHPA assays were used for the analysis of 81 and 39 DNA samples, respectively (Table 1). MHPA enabled extremely high depth of coverage, with median read depths before and after UMI consensus compression of 185,935× and 19,942×, respectively, with TSC2-MHPA and 217,679× and 35,997×, respectively, with TP53-MHPA, for a median UMI compression of 9-fold for TSC2 and 6-fold for TP53 (Supplemental Figure 2A). DNA input was generally 10–50 ng for each of the 20-plex (TSC2-MHPA)/6-plex (TP53-MHPA) reactions. Since 10–50 ng is equivalent to 2000–8000 haploid genomes, our maximum sensitivity is 0.05% to 0.01%.
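The arithmetic behind these figures can be sketched as follows, assuming that the maximum sensitivity corresponds to detecting a single variant molecule among the input haploid genomes. The function names are hypothetical. Note that the ratio of the median depths computed here only approximates the reported median compression, which may have been calculated per sample.

```python
def max_sensitivity_percent(haploid_genomes):
    """Lowest detectable VAF (%) if one variant molecule among the
    input haploid genomes is detected (an idealized assumption)."""
    return 100.0 / haploid_genomes

def fold_compression(depth_before, depth_after):
    """Ratio of read depth before and after UMI consensus compression."""
    return depth_before / depth_after

# Input range quoted in the text: 2000-8000 haploid genomes.
for genomes in (2000, 8000):
    print(genomes, max_sensitivity_percent(genomes))

# Ratios of the reported median depths (TSC2-MHPA, then TP53-MHPA).
print(round(fold_compression(185_935, 19_942), 1))
print(round(fold_compression(217_679, 35_997), 1))
```

Under this assumption, 2000 and 8000 haploid genomes give 0.05% and 0.0125%, respectively, and the depth ratios are approximately 9 and 6.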

Table 1 Summary of the samples analyzed using TSC2-MHPA and TP53-MHPA assays in this study

During optimization of TSC2-MHPA and TP53-MHPA assays, amplicons were labeled using a fluorescent dye (6-carboxyfluorescein [6-FAM]) followed by capillary electrophoresis to assess uniformity of amplification (Figure 2A, quality control step 3; and Supplemental Figure 2B). As expected, a significant positive correlation was seen between capillary electrophoresis peak heights and amplicon depth of coverage (TSC2-MHPA read depth before and after UMI consensus compression: r = 0.71, P < 0.0001, and r = 0.40, P < 0.0001, respectively), indicating that this was a robust method to predict read depth and to enable optimization of the MHPA primer setup (Figure 2A and Supplemental Figure 2C).

Read depth in the TSC2- and TP53-MHPA assays was relatively uniform across all amplicons, with a maximum fold change (comparing median coverage of the amplicons with highest and lowest read depth) of 4.6 for TSC2-MHPA and 4.2 for TP53-MHPA (Supplemental Figure 2D). As expected, UMI consensus compression caused a greater reduction in sequencing depth for lower amounts of input DNA for MHPA (for TSC2-MHPA, r = 0.63, P < 0.0001; for TP53-MHPA, r = 0.65, P < 0.0001) (Supplemental Figure 2E).

Forty-seven of 81 samples included in this study were analyzed previously in our laboratory using hybrid capture (MPS of the entire exonic and intronic extent of TSC1 and TSC2; VAF limit of detection: ~0.5%) and/or amplicon MPS (high–read depth targeted validation of the findings from hybrid-capture MPS) (11, 26). These samples were derived from individuals with TSC with low-level systemic TSC2 mosaicism detected previously in various fluids and tissues, including blood, saliva, semen, normal skin, and TSC skin lesions, available from each patient (including 19 FAF, 10 TSC normal skin, 13 blood, 1 semen, and 1 UF samples) or heterozygous TSC2 mutations (3 TSC nipple angiofibroma [AF] samples) (Table 1 and Supplemental Table 2). Thirty-five of 36 findings (97%) identified in our previous MPS analyses were validated by our TSC2-MHPA assay, with very high positive correlation of the observed VAFs (r = 0.99, P < 0.0001), confirming the reliability of MHPA (Supplemental Figure 3). The only finding that was not validated had been identified in a FAF, and also seen in a matched blood sample at low VAF (0.17%); the variant was not validated in MHPA analysis of the blood sample, suggesting that the earlier finding had been overcalled. In addition, MHPA done independently twice on the same sample for 8 samples showed extremely high concordance for mutation detection, 100% for VAF ≥ 0.08% (r = 0.99, P < 0.0001; Supplemental Table 3 and Supplemental Figure 4). At a VAF lower than 0.08%, many but not all variants were seen in both samples, consistent with random effects on inclusion/amplification of a variant allele when it occurs at very low VAF.

MHPA provides at least a 10-fold improvement in sensitivity (detection of variants with VAF 0.01%–0.05%) in comparison with our previous MPS approach for TSC mutation detection (11, 12).

Results of MHPA analysis of TSC FAFs and other TSC samples. MHPA was used for the analysis of several different types of samples: (a) TSC facial angiofibroma biopsies (“TSC-FAF”); (b) TSC nipple angiofibroma biopsies (“TSC-nipple AF”); (c) TSC normal skin, mostly from upper arm (“TSC-NS”); (d) samples from TSC blood, semen, and buccal swab (“TSC-blood/semen/buccal swab”); (e) TSC ungual fibroma biopsy (“TSC-UF”); (f) normal skin samples from individuals without TSC, which were adjacent to resected basal cell carcinoma (BCC) lesions on different sun-exposed body areas (“nonTSC-BCCadj NS”); (g) normal skin samples from individuals without TSC, from inner upper lateral arm (reduced sun exposure) (“nonTSC-NS”); and (h) neonatal foreskin samples from individuals without TSC (no sun exposure) (“nonTSC-foreskin”) (Table 1 and Supplemental Table 4). Note that 9 of 13 TSC blood and 6 of 10 TSC normal skin samples were matched (derived from the same TSC individual) with the respective samples from the TSC-FAF set.

Shower of UV-related TSC2 mutations within FAFs, defining thousands of subclinical “micro-FAFs” in TSC facial skin. TSC2-MHPA analysis of 24 FAFs (TSC-FAF set) confirmed all 20 variants identified in 19 of the 24 FAFs by our previous analyses, and identified 108 new variants, for a total of 112 low-VAF (0.01%–8.02%, median 0.08%) somatic single-nucleotide variants (SNVs), dinucleotide variants (DNVs), and indels (Figure 3). None of the somatic TSC2 mutations seen in FAFs were seen in blood (n = 9) or normal skin samples (n = 6) from the same patient analyzed by TSC2-MHPA (Supplemental Table 2.1).

Figure 3 Summary of TSC2 and TP53 mutations identified using MHPA in TSC skin samples. (A) Top: TSC2. Bottom: TP53. Blue diamonds indicate systemic mosaic or heterozygous germline (VAF = 50%) mutations, while circles indicate somatic mutations; orange-filled circles correspond to CC:GG>TT:AA mutations. The y axis indicates the VAFs, while the x axis indicates sample labels; the colors of sample labels correspond to different TSC sample subgroups, as indicated in the inset legend. No. somatic muts: number of somatic mutations observed in each of the samples; Epidermis+dermis: whole-skin biopsies; Dermis: biopsies with removed epidermis. (B) Comparison of the number of somatic TP53 mutations in FAF whole-skin biopsies (Epi + derm FAF) and FAF biopsies with removed epidermis (Derm-only FAF). The comparison was performed using Mann-Whitney test. The horizontal bars indicate median values. **P < 0.01. (C) Correlation analysis between the number of somatic mutations in TSC2 and that in TP53 in the respective whole-skin biopsies; r represents Pearson’s correlation coefficient. The curve was generated using linear regression.

Since the majority of these new low-VAF somatic variants were functionally inactivating (see below), we hypothesized that they reflected the occurrence of additional subclinical clonal fibroblast populations in these facial biopsies, which could be considered micro-FAF tumors, and set out to examine this hypothesis. The VAF of the TSC2 systemic mosaic/germline variant was higher than any of the somatic TSC2 mutation VAFs in each FAF (Figure 3A). In addition, when multiple FAFs derived from a single person were studied (n = 8 subjects with ≥2 FAFs analyzed), each FAF had a different spectrum of low-VAF TSC2 mutations, indicating that they had arisen independently, likely in different fibroblast clones (Supplemental Table 2.1).

Among the 112 somatic TSC2 mutations identified in the TSC-FAF set, there were 60 (54%) SNVs, 37 (33%) DNVs, and 15 (13%) indels (Figure 4A). Thirty-four of 37 (92%) DNVs were CC:GG>TT:AA, indicative of UV causation. Fifty-one of 60 (85%) SNVs were also very likely due to UV irradiation based on prior studies (15), with 34 C:G>T:A and 17 G:C>T:A. Fourteen of 15 (93%) somatic TSC2 indels in TSC-FAFs were deletions, of size 1–18 nt; 1 somatic small TSC2 insertion was also identified (Supplemental Table 2.1).

Figure 4 Spectrum of somatic TSC2 and TP53 mutations identified using MHPA in skin samples. (A and B) Summary of somatic TSC2 (A) and TP53 (B) mutations. The number of samples analyzed is indicated in parentheses next to the label for each subgroup of samples (TSC-FAF, TSC-NS, and nonTSC-BCCadj NS); n indicates the number of mutations identified in each subgroup. The top pie charts indicate proportions of the identified DNVs, SNVs, and indels. The bottom pie charts indicate proportions of different mutation subtypes color-coded according to the inset legend at left. For missense variants, the functional significance is provided according to the consensus assessment included in Supplemental Tables 2 and 5. P values are based on Fisher’s exact test for comparison of nonsynonymous and synonymous variant fractions in the respective subgroups of samples. NS/S, ratio of the number of coding nonsynonymous to synonymous variants. *P < 0.05.

Most of the identified somatic non-indel TSC2 mutations were missense (44 of 97, 45%) or nonsense (23 of 97, 24%) (Figure 4A). Twenty-one of 44 missense variants (48%) were either likely pathogenic or pathogenic, while 21 (48%) were variants of unknown significance (VUS), and 2 (4%) were likely benign (Supplemental Table 2.1 and Figure 4A; and see Methods). The ratio of nonsynonymous to synonymous (NS/S) TSC2 variants was 5.9 in TSC-FAF, suggesting that they were not background “noise” or passenger events, but rather were inactivating mutations in TSC2 for the most part, fitting our hypothesis that these low-VAF mutations were driving clonal proliferation of micro-FAFs present in these biopsies.

Further evidence that these mutations were functional came from the differences observed in mutation frequency and pattern in TSC-FAF versus TSC-NS and other samples. The NS/S ratio was higher in TSC-FAF (5.9) than in TSC-NS (3.1) and nonTSC-BCCadj NS (2.1) (Figure 4A), suggesting that TSC2 mutations in nonTSC-BCCadj NS may in many cases be passenger events in keratinocyte clones whose growth is driven by TP53 and other mutations, while, in TSC-NS, some TSC2 mutations may be functional and others not.

In addition, the number of all TSC2 somatic mutations per sample analyzed and the number of TSC2 CC:GG>TT:AA mutations per sample analyzed were higher in TSC-FAF than in TSC-NS samples (n = 10), and higher in TSC-NS than in TSC-UF/TSC-nipple AF (n = 4) and TSC-blood/semen samples (n = 14) (P = 0.0002 and P < 0.0001, Kruskal-Wallis test; Figure 5A). At least one CC:GG>TT:AA mutation in TSC2 was seen in 17 of 24 TSC-FAFs (71%), while no such mutations were observed in TSC-blood/semen/buccal, TSC-UF, or TSC-nipple AF samples (0 of 19, P < 0.0001, Fisher’s exact test) (Figure 5A). These findings further confirm the important role of UV radiation in these mutational events occurring predominantly in TSC facial skin, in contrast to other sites.

Figure 5 Comparison of the number of mutations identified by MHPA in different groups of TSC samples. (A) Left: Each dot represents the number of all somatic TSC2 mutations in an analyzed sample. Right: Each dot represents the number of all CC:GG>TT:AA TSC2 mutations in an analyzed sample. (B) Left: Each dot represents the number of all somatic TP53 mutations in an analyzed whole-skin biopsy sample. Right: Each dot represents the number of all CC:GG>TT:AA TP53 mutations in an analyzed whole-skin biopsy sample. The comparisons were performed using Kruskal-Wallis test. P values for the pairwise comparisons within multiple groups were adjusted for multiple comparisons using post hoc Dunn’s test, performed along with Kruskal-Wallis test. Significant Dunn’s P values (<0.05) are indicated above the respective plots. The horizontal bars indicate median values. *P < 0.05, ***P < 0.001, ****P < 0.0001.

The distribution of nonsynonymous/intronic somatic mutations identified in TSC-FAFs among the exons of TSC2 was similar to the distribution of pathogenic germline mutations reported before in the LOVD database (r = 0.66, P < 0.0001; Supplemental Figure 5A and Supplemental Table 1.1). Most FAF somatic variants were observed just once, but some TSC2 aa positions appeared to be relative hotspots (Figure 6). Five of the mutated aa positions had been observed to be affected by CC:GG>TT:AA mutation in our previous study of FAF fibroblast cultures (Figure 6 and ref. 14). Thirty of 522 (6%) CC:GG sites in the TSC2 region sequenced using TSC2-MHPA were affected by CC:GG>TT:AA mutation in 1 or more FAFs (Supplemental Table 2.1).

Figure 6 Map of somatic TSC2 and TP53 mutations in TSC FAFs. Top: TSC2. Bottom: TP53. Each lollipop indicates mutation at the amino acid (aa) position indicated below the plot. Circles indicate mutations identified by MHPA, while triangles indicate mutations identified in the MPS study of FAF fibroblast cell cultures performed previously in our group (Tyburczy et al., 2014, ref. 14). The number of circles/triangles and corresponding height of the lollipop correspond to the number of mutations observed at each aa position. Mutations seen at least 4 times are labeled with the nucleotide (c.) and amino acid (p.) position. Types of mutations are color-coded as indicated in the inset legend. Larger symbols correspond to CC:GG>TT:AA UV-related mutations, while smaller symbols indicate all other mutation subtypes.

In 11 instances in 7 FAFs, 2 somatic TSC2 mutations or a somatic and a germline TSC2 mutation were located in the same amplicon. Scrutiny of the reads using Integrative Genomics Viewer (IGV) indicated that all mutations occurred in trans, affecting different alleles (Supplemental Figure 6). In one FAF (P2_FAF) there were 6 somatic low-VAF indels and SNVs in trans (VAF 0.017%–0.14%) in the same TSC2 exon (Supplemental Figure 6).

Considering the number of mutations identified, the mosaic/germline VAF in these biopsies, and the size of the total facial skin (see Methods), we estimate that approximately 150,000 independent clonal fibroblast proliferations due to second-hit mutations in TSC2 occur in the skin of TSC2 patients (micro-FAFs), a small proportion of which develop into observable FAF lesions (Figure 7). As our MHPA strategy did not assess all exons of TSC2, and LOH events are not detected by our methods, this figure may be considered a minimal estimate, and the true number of micro-FAFs may approach 500,000 to 1,000,000 per TSC patient.

Figure 7 Diagram of UV effects on TSC FAF development. Our findings suggest that thousands of independent clonal fibroblast proliferations (subclinical micro-FAFs) due to second-hit mutations in TSC2 occur in the skin of TSC2 patients, a small proportion of which develop into observable FAF lesions.

We observed that a single, clinically visible FAF biopsy contained on average approximately 5 second-hit mutations. Each of the identified second hits may represent either the clinically visible FAF or micro-FAFs nearby. For many of the mosaic TSC-FAFs analyzed in this study (e.g., P1_FAF2, P3_FAF, and P6_FAF1; Supplemental Table 2.1), we were able to identify 1 second-hit mutation with VAF close to the VAF of the systemic mosaic variant, which we think is likely derived from the clinically visible FAF. The multiple additional somatic mutations seen in most samples had significantly lower VAF (0.01%–0.5%), and they are likely derived from other, subclinical FAF lesions in the specimen (micro-FAFs).

Validation of new MHPA TSC2 findings by prior MPS. Forty-one of 110 TSC2 somatic variants newly identified by MHPA in TSC-FAF or TSC-nipple AF samples were compared with prior hybrid-capture MPS data (mean depth of coverage: ~500×) (11, 26). Ten of 41 (24%) variants with median 0.12% VAF were seen in 1–3 reads, consistent with the MHPA findings (Supplemental Figure 7A). The median VAF of the remaining 31 of 41 (76%) MHPA variants (0.07%) was significantly lower (P = 0.002, Mann-Whitney test), explaining their absence in our prior hybrid-capture MPS analysis (Supplemental Figure 7B).

Comparison of TP53 mutations with TSC2 mutations in TSC-FAF. TP53-MHPA analysis of the TSC-FAF set (Table 1) led to the identification of 188 low-VAF (0.01%–3.50%, median 0.13%) variants: 119 (63%) SNVs, 55 (29%) DNVs, 9 (5%) indels, and 5 (3%) adjacent indel-SNVs (see below for further discussion of the last category). Fifty-two of 55 (95%) DNVs were CC:GG>TT:AA, due to UV mutagenesis. TP53 SNVs in FAFs were predominantly C:G>T:A (n = 63 of 119, 53%) and G:C>T:A (n = 36 of 119, 30%), similar to TSC2 SNVs in FAFs. All 9 TP53 indels in FAFs were deletions, also similar to TSC2 indels in FAFs (all but one deletions) (Supplemental Table 5.1).

Most of the non-indel TP53 mutations in FAF were missense (123 of 174, 71%), the vast majority of which, 107 of 123 (87%), were reported to be likely pathogenic or pathogenic (Figure 4B and Supplemental Table 5.1). The NS/S ratio for TP53 mutations was extremely high in TSC-FAF (29.8), indicating that TP53 mutations are under strong selective pressure in TSC facial skin (Figure 4B).

TP53 mutations were much more common in whole-skin FAF biopsies (dermis + epidermis; n = 18, range 1–36 mutations, median 5 mutations) than in FAF biopsies that were dermis only (n = 5, range 0–2 mutations, median 1 mutation) (P = 0.004, Mann-Whitney test; Figure 3B), suggesting that TP53 mutations were occurring mainly in keratinocytes in these samples, rather than fibroblasts in TSC facial skin, consistent with previous studies in individuals without TSC (15, 18). In addition, the VAF of somatic TP53 mutations was significantly higher than the TSC2 VAF (median VAFs: 0.13% and 0.07%, respectively; P = 0.0001, Mann-Whitney test) in whole-skin TSC-FAF (Supplemental Figure 8), also suggesting a different cell of origin for the TP53 mutations. An average of 10.1 (range 1–36) TP53 mutations were identified per whole-skin FAF, and this number correlated with the number of TSC2 mutations in the same sample (n = 18, r = 0.63, P = 0.005; Figure 3C), likely reflecting the role of UV irradiation in the generation of both genes’ mutations, albeit in different cell types.

The number of TP53 mutations was significantly higher in whole-skin TSC-FAF (median 5) than in TSC-NS (median 2) and TSC-UF/TSC-nipple AF (median 0) samples (Kruskal-Wallis test, P = 0.02), owing to expected differences in UV exposure in these biopsy sites (Figure 5B). No CC:GG>TT:AA TP53 mutations were seen in TSC-nipple AF or TSC-UF samples, in contrast to whole-skin TSC-FAF, where 14 of 18 (78%) biopsies had at least one CC:GG>TT:AA mutation, similar to our findings for UV-induced mutations in TSC2 in these samples.

The distribution of the nonsynonymous/intronic somatic TP53 mutations in FAFs mirrors the distribution of somatic mutations reported previously in normal keratinocytes/skin cancers (15–17, 27–39) (r = 0.92, P < 0.0001; Supplemental Figure 5B and Supplemental Table 1.2), including multiple mutations at well-known hotspots (e.g., aa 248) (Figure 6). Twenty-seven of 175 (15.4%) CC:GG sites in the region targeted by TP53-MHPA were affected by CC:GG>TT:AA mutation in 1 or more FAFs (Figure 6 and Supplemental Table 5.1).

TSC2 and TP53 mutations in normal-appearing skin. TSC2-MHPA and TP53-MHPA analysis of 8 samples from sun-exposed nonTSC-NS adjacent to BCC revealed a large number of somatic TSC2 and TP53 mutations (7.3 and 32.0 mutations per sample, respectively) (Supplemental Figure 9). A large fraction of the mutations were CC:GG>TT:AA for both genes (TSC2: 22 of 58 [38%]; TP53: 101 of 256 [39%]) (Supplemental Figure 9A and Supplemental Tables 2.2 and 5.2).

We also performed TSC2-MHPA analysis of 2 additional panels of normal skin biopsies from individuals without TSC: (a) a panel of 10 normal skin biopsies from the inner upper lateral arm (area with reduced sun exposure) (nonTSC-NS) and (b) a panel of 10 biopsies from newborn foreskin (nonTSC-foreskin). In contrast to the sun-exposed nonTSC-BCCadj NS samples, these samples had an average of 0.7 and 0.5 TSC2 variants per sample, respectively, with a very low VAF (median 0.09%) (Supplemental Figure 9 and Supplemental Table 2.2). As expected, none of the nonTSC-foreskin DNA samples showed a CC:GG>TT:AA mutation, and just a single such mutation was seen in the sun-protected nonTSC-NS.

UV-induced-mutation signatures in TSC-FAF and normal skin. We combined all somatic TSC2 and TP53 mutations from all samples in the respective TSC-FAF and nonTSC-BCCadj NS sets, to enable comparison with canonical mutation signatures from the Catalogue of Somatic Mutations in Cancer (COSMIC) (refs. 40–42 and Figure 8, A–C). The SNV signatures for the TSC-FAF and nonTSC-BCCadj NS sets showed the highest cosine similarity to UV-induced-mutation signature SBS7b, which has a predominance of C>T substitutions, with cosine similarity scores 0.65 and 0.83, respectively (Figure 8A and ref. 40). However, there were larger numbers of C>T substitutions in the CCG and GCG contexts in TSC-FAF, and in the CCG context in nonTSC-BCCadj NS (Figure 8A). A modest enrichment for G:C>T:A substitutions in variable sequence contexts was also noted in the TSC-FAF set, less so for nonTSC-BCCadj NS. This may be due to ROS generated by sunlight, as reported previously (Figure 8A and ref. 15).
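Cosine similarity, the metric used above to compare observed mutation spectra with reference signatures, can be sketched as follows. The 4-channel vectors are hypothetical toy values chosen for brevity. A real SBS comparison would use the 96 trinucleotide-context channels of the COSMIC signature, and this sketch is not the authors' implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy spectra: observed mutation counts per channel and a
# reference signature given as channel probabilities.
observed_counts = [12, 3, 1, 0]
reference_signature = [0.7, 0.2, 0.1, 0.0]
print(cosine_similarity(observed_counts, reference_signature))
```

Because the measure depends only on the direction of the vectors, raw counts can be compared directly with a normalized reference signature. A score of 1 indicates identical channel proportions.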

Figure 8 Mutation signatures in TSC-FAFs and nonTSC-BCCadj normal skin samples. (A) Comparison of the reference SNV COSMIC UV-induced-mutation signature SBS7b (indicated in black) and the SNV mutation signature identified using MHPA (colors of the bars correspond to the colors of different single-nucleotide substitutions indicated above the plot). The signatures are summarized separately for TSC-FAFs (top plot) and nonTSC-BCCadj normal skin (bottom plot). (B) Comparison of the reference DNV COSMIC UV-induced-mutation signature DBS-1 (indicated in black) and the DNV mutation signature identified using MHPA (indicated in red) in TSC-FAFs (top plot) and nonTSC-BCCadj normal skin (bottom plot). (C) Comparison of the reference indel COSMIC UV signature ID13 (indicated in black), the indel UV signature by Saini et al. (42) (indicated in gray), and the indel signature identified using MHPA (colors of the bars correspond to colors of different single-nucleotide substitutions indicated above the plot). The indel signature summarizes combined MHPA results for TSC-FAFs and nonTSC-BCCadj normal skin.

DNV signatures for both TSC-FAF and nonTSC-BCCadj NS each matched very well with the canonical COSMIC DBS-1 UV signature, with identical cosine similarity scores of 0.999 (40), with CC:GG>TT:AA substitutions accounting for 85.8%, 93.5%, and 93.9% of all DNV substitutions in the DBS-1, TSC-FAF, and nonTSC-BCCadj NS, respectively (Figure 8B). For each C:G>T:A SNV and CC:GG>TT:AA DNV, more C than G (TSC-FAF: P = 0.001, binomial distribution test) and more CC than GG (TSC-FAF: P = 0.009, binomial distribution test) were mutated, consistent with an untranscribed strand bias resulting from transcription-coupled nucleotide excision repair. A strand bias for G:C>T:A mutations was also noted, with more G than C mutated (TSC-FAF: P = 0.002, binomial distribution test), as reported previously for UV-exposed eyelid epidermis (Figure 9 and ref. 15).
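A strand-bias comparison of this kind can be sketched with an exact binomial test against the null hypothesis that a mutation is equally likely to fall on either strand. The counts below are hypothetical (the actual counts are shown in Figure 9), the function name is introduced here for illustration, and the use of a two-sided test is an assumption, since the sidedness is not stated in the text.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial P value: the summed probability of all
    outcomes no more likely than the observed count k out of n."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    # Small relative tolerance so that ties with the observed outcome
    # are included despite floating-point rounding.
    threshold = pmf[k] * (1 + 1e-9)
    return min(1.0, sum(x for x in pmf if x <= threshold))

# Hypothetical example: 30 of 40 C:G>T:A mutations have the mutated C
# on the untranscribed strand.
print(binomial_two_sided_p(30, 40))
```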

Figure 9 Orientation of SNVs and DNVs relative to transcription. Total number of TSC2 and TP53 mutations in the coding (untranscribed) versus the noncoding (transcribed) strand for all detected SNVs (top row) and DNVs (bottom row). The labels for different subgroups of the analyzed samples (TSC-FAF, TSC-NS, and nonTSC-BCCadj NS) are given next to the y axis. P values are indicated for untranscribed/transcribed strand bias, for C>A/G>T, C>T/G>A, and CC>TT/GG>AA (binomial distribution test), above each of the compared pairs. **P < 0.01; ***P < 0.001.

The indel signature observed for all indels identified in TSC-FAF and nonTSC-BCCadj NS skin samples combined (n = 38 indels) was different from the reference COSMIC UV-related ID13 signature, with a cosine similarity score of 0.24 (Figure 8C). It was more similar to the recently reported indel signature identified from whole-genome MPS of single-cell-derived clonal lineages from primary normal skin cells (42), with a cosine similarity score of 0.46. Similar to the findings of Saini et al. (42), most single-nucleotide deletions, 23 of 38 (61%), occurred in a 2- to 5-nt poly-C:T/G:A homopolymer tract. Deletions with microhomology at the breakpoints were also identified, accounting for 9 of 38 indels (24%).

A recurrent complex mutation type in TP53 only. Interestingly, 18 complex adjacent indel-SNV/DNV mutations in TP53 were identified, likely generated by UV radiation, with 12 in nonTSC-BCCadj NS, 5 in TSC-FAF, and 1 in TSC-NS (Figure 10 and Supplemental Figure 10). These adjacent indel-SNV/DNV mutations were observed in TP53 only (TP53: 18 of 455 somatic mutations in skin; TSC2: 0 of 210 somatic mutations in skin; P = 0.001, Fisher’s exact test), which suggests that these mutations occur in keratinocytes only and may be due to effects of UVB radiation, which does not reach the dermis (43, 44). Eight of 18 (44%) indel-SNV/DNV mutations occurred within a tract of at least 3 adjacent Cs or Gs (Supplemental Figure 10). Thirteen of 18 (72%) indel-SNV/DNVs contained either C:G>T:A or CC:GG>TT:AA UV-related substitutions; for 9 of these 13 (69%) the UV-related SNV/DNV was upstream of the deletion, while for 4 it was downstream of the deletion (Figure 10).
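The comparison of complex mutation counts between the two genes (18 of 455 in TP53 versus 0 of 210 in TSC2) can be reproduced approximately with a two-sided Fisher's exact test. The sketch below uses only the Python standard library. The function name is hypothetical, and the two-sided rule shown (summing all tables no more probable than the observed one) is one common convention that may differ from the software used in the study.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "successes" in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the
    # observed one (with a small tolerance for floating-point ties).
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Rows: TP53, TSC2. Columns: complex indel-SNV/DNV, other somatic mutations.
p_value = fisher_exact_two_sided(18, 455 - 18, 0, 210 - 0)
print(p_value)
```

With these counts the sketch yields a P value on the order of 10^-3, in line with the reported value.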

Figure 10 A complex mutation type in TP53 only. Diagram of the adjacent indels and SNVs/DNVs, occurring in cis. Column width is 1 nt. UV-related SNVs/DNVs are aligned according to the primary UV-related cyclobutyl pyrimidine dimers reflected by C>T/CC>TT DNA damage.

Somatic mutation prevalence and clinical characteristics of skin biopsies and their donors. We examined the possibility that the prevalence of somatic TSC2/TP53 mutations in skin biopsies might be associated with different clinical characteristics, including FAF grade (Facial Angiofibroma Severity Index score; ref. 45), age at biopsy, pigmentation, latitude at which donors had lived for most of their lives, and degree of sun exposure. We did not find a strong correlation between most of these factors and number of somatic mutations (Supplemental Figure 11), likely owing to complex interactions among these factors in determining mutation occurrence. However, there was a trend toward fewer somatic mutations in either TSC2 (P = 0.03, Kruskal-Wallis test) or TP53 in skin types with higher amounts of pigment (Supplemental Figure 11D). There was also a significant association between the number of either TSC2 (P < 0.0001, Kruskal-Wallis test) or TP53 (P = 0.02, Kruskal-Wallis test) somatic mutations and degree of sun exposure at the specific sites of different skin biopsies (Supplemental Figure 11E).