Date: 2022-04-01

We have discussed types of genomic changes missed with NGS and have provided case examples in which other approaches were required to make a diagnosis (Figure 1 and Table 1). Further advancements in NGS technologies will continue to improve diagnostic utility. For example, the ACMG released a practice guideline in 2021 recommending that ES/GS be considered a first- or second-tier test in patients with congenital anomalies, developmental delay, or intellectual disability based on clinical utility for providers and families (35). Prior to this guideline, there was variability in the diagnostic approach for these patients; first-tier testing could include CMA, fragile X testing, and biochemical studies. Single-gene or gene-panel testing could be incorporated at any point in the diagnostic workup. The new guidelines are based on evidence demonstrating utility in performing ES or GS after CMA or focused testing. Even so, testing options are influenced by medical insurance, provider preferences, parental desires, and health care system policies and practices (35).

Short-read versus long-read sequencing. DNA sequencing technology has changed greatly in the more than 40 years since Sanger sequencing was developed. Over this time, sequencing has transitioned from gel electrophoresis–based approaches like Sanger sequencing to shotgun sequencing to the NGS approach that uses massively parallel sequencing (45). NGS has greatly decreased the cost and increased the efficiency of DNA sequencing. These technologies are primarily “sequence by synthesis,” where complementary nucleotides are added sequentially, causing nucleotide-specific fluorescence that is read by a camera and results in sequences that are 100 to 200 bp in length (46). While this approach has revolutionized the fields of genetics and genomics, short reads can be difficult to computationally align to the reference genome, which makes resolution of complex and repetitive regions of genomes difficult and severely limits detection of structural variants (Table 1, Figure 1, and refs. 47, 48). In contrast, evolving long-read sequencing approaches generally use alternative sequence by synthesis chemistry or measurement of changes in electrical current caused by the DNA molecule as it passes through a nanopore. These reads can be 10 kb to several Mb in length (46). Further advances to increase the accuracy and decrease the cost of long-read sequencing technologies could lead to a genetic testing approach that would allow for de novo genome assembly; identify sequence variation, copy number variation, REs, and structural variants; and provide more accurate phasing of variants (Table 1, Figure 1, and refs. 46, 49, 50–53). However, obstacles to realizing the full potential of long-read sequencing include that it remains relatively expensive (6- to 12-fold more expensive by some estimates) and does not have the base-calling accuracy of short-read sequencing technology (50, 53). 
Data storage and scalability are also issues: large genomes require storing large amounts of data, and the efficiency of genome assembly decreases as genome size increases. While long-read sequencing can identify epigenetic changes, these technologies have not yet realized that potential (54). Short-read sequencing covers a majority of the known disease-causing structural variants, and consequently it is thought that the addition of long-read sequencing technology, in its current state, is unlikely to substantially increase diagnostic yield (50). As the sequencing technology continues to improve, the cost decreases, and the bioinformatics pipelines become more accurate and efficient, long-read sequencing may eventually contribute to an increased diagnostic yield from genetic testing (46, 50, 53, 54).
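
The alignment difficulty described above for short reads in repetitive regions can be illustrated with a toy example. The following is a minimal sketch, not a real aligner: it uses exact string matching on a made-up reference, and the sequences, read lengths, and the helper name `find_placements` are all assumptions chosen for illustration (real short reads are 100 to 200 bp and real long reads are 10 kb or more). It shows only that a read lying entirely within a tandem repeat has several equally good placements, whereas a read long enough to reach unique flanking sequence has one.

```python
def find_placements(reference: str, read: str) -> list[int]:
    """Return every 0-based start position where the read matches exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Step forward by one base so overlapping placements are also found.
        start = reference.find(read, start + 1)
    return positions


# Hypothetical toy reference: unique flank, tandem repeat, unique flank.
LEFT_FLANK = "ACGTTGCA"
REPEAT_UNIT = "CAG"
RIGHT_FLANK = "TTGACCGT"
reference = LEFT_FLANK + REPEAT_UNIT * 10 + RIGHT_FLANK

# A "short" read that lies entirely inside the repeat.
short_read = REPEAT_UNIT * 3
# A "long" read that spans the whole repeat plus unique sequence on both sides.
long_read = reference[4:-4]

short_hits = find_placements(reference, short_read)
long_hits = find_placements(reference, long_read)

print(f"short read ({len(short_read)} bp): {len(short_hits)} possible placements")
print(f"long read ({len(long_read)} bp): {len(long_hits)} possible placement(s)")
```

In this sketch the short read cannot be assigned to a single position, and it carries no information about how many repeat units are present; the long read is anchored by the flanks and therefore captures the full repeat length.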

Data sharing and cloud computing. Another opportunity to improve our understanding of genomic function and dysfunction comes from leveraging large genomic databases that are coupled with phenotypic data. Currently, ClinVar and gnomAD are two of the most widely used of these databases. ClinVar has partnered with a collaborative program called the Clinical Genome Resource (ClinGen) to improve the curation, sharing, and archiving of genomic variation data as well as their clinical interpretation or relevance. ClinGen curates data for ClinVar from other databases and structures data submissions into proper format and nomenclature. The program also developed a system to define the review level of submissions. To aid in reviewing submissions, ClinGen develops expert teams in various clinical realms to validate variant pathogenicity and gene-disease relationships (55).

An example of the power of using large databases comes from Brokamp et al., who reported a patient in whom they identified a de novo frameshift variant that had not been observed previously (56). There were no matches in the available matching tools (GeneMatcher, MyGene2, Matchmaker Exchange; refs. 57–59) or in ClinVar and gnomAD. However, by utilizing their in-house database of more than 3 million individuals’ electronic health records, many of which had accompanying genomic data (i.e., BioVU; ref. 60), they found two other individuals with de novo variants in the same gene and overlapping phenotypes. By identifying multiple, unrelated individuals with variants in the same gene and with very similar phenotypes, they were able to change the designation of the variant from one of uncertain significance to pathogenic and discover a new genetic disorder (56).

The case above illustrates the potential of leveraging large data sets to identify other exceptionally rare cases and make diagnoses. NGS has rapidly increased the amount of available genomic data, which presents both opportunities and difficulties, as solving cases can mean working with petabytes (1 petabyte = 1 million gigabytes) of data. Unfortunately, the infrastructure necessary to utilize a data set of this size is prohibitive for most clinicians and independent laboratories. Cloud computing is a system in which computing resources are rented, which mitigates the need to establish both the hardware and software necessary for data analysis of this magnitude (61). Addressing the data sharing protocols and patient privacy concerns that come with cloud computing will be necessary to utilize these platforms to their full potential.
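
To give a sense of the scale involved, the arithmetic can be sketched as follows. The only figure taken from the text is the conversion of 1 petabyte to 1 million gigabytes; the per-genome storage size and the cohort size are hypothetical placeholders, not measured values, and the function names are introduced here purely for illustration.

```python
GIGABYTES_PER_PETABYTE = 1_000_000  # from the text: 1 petabyte = 1 million gigabytes


def genomes_per_petabyte(gigabytes_per_genome: float) -> int:
    """Number of whole genomes that fit in one petabyte of storage."""
    return int(GIGABYTES_PER_PETABYTE // gigabytes_per_genome)


def petabytes_needed(n_genomes: int, gigabytes_per_genome: float) -> float:
    """Storage, in petabytes, required for a cohort of n_genomes."""
    return n_genomes * gigabytes_per_genome / GIGABYTES_PER_PETABYTE


# ASSUMPTION: 100 GB of stored data per sequenced genome. This is a
# placeholder; actual sizes depend on coverage, file format, and compression.
ASSUMED_GB_PER_GENOME = 100
# ASSUMPTION: a hypothetical cohort of one million sequenced individuals.
ASSUMED_COHORT_SIZE = 1_000_000

print(genomes_per_petabyte(ASSUMED_GB_PER_GENOME))
print(petabytes_needed(ASSUMED_COHORT_SIZE, ASSUMED_GB_PER_GENOME))
```

Under these assumed values, a single petabyte holds on the order of ten thousand genomes, and a cohort of a million genomes requires on the order of a hundred petabytes, which is why rented cloud infrastructure becomes attractive for individual laboratories.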

The All of Us Research Program is an example of using cloud computing to facilitate the application of genomics in health care. This program plans to provide a resource of genomic and phenotypic data of at least 1 million people, most of whom are from backgrounds underrepresented in biomedical research. The goal of the All of Us Research Program is to create a resource of health questionnaires, electronic health record data, physical measurements, and both digital data and biospecimens for a variety of applications including characterizing natural histories of diseases, identifying disease risk factors, and revealing new biomarkers. The design of the program should mitigate the small sample sizes and lack of diversity in many genomic data sets that limit medical discovery (62). While the intention is not directly for the diagnosis of rare and/or undiagnosed disease, study participants will have the option of learning about pharmacogenomic findings as well as actionable, highly penetrant, disease-causing variants (62).

Sequencing critically ill pediatric patients. Much of this Review has discussed the application of genetic testing to improve the diagnostic rate in patients for whom there is a concern regarding an underlying genetic disorder. However, critically ill patients, who may not yet present the classic signs or symptoms of a rare and unfamiliar genetic disorder, present another opportunity for the application of NGS to detect an undiagnosed genetic disease. Many severe genetic conditions present in the neonatal period or in early childhood, but the onset of characteristic signs and symptoms is delayed because they are age dependent. Multiple studies of the utility of NGS in critically ill neonates have shown increased diagnostic rates, decreased costs associated with hospitalization, changes in management, and increased patient and family satisfaction. Studies that performed NGS on critically ill pediatric patients with concerns regarding an underlying genetic disorder yielded diagnostic rates of 21% to 58% depending on patient selection, year the study was conducted, and NGS methodology (63–72). These studies defined clinical utility as changes in medical or surgical management, testing family members for related genotypes or phenotypes, informing recurrence risk, suggesting a potential pharmaceutical, and/or involving palliative care. They reported that, in 21%–83% of cases, NGS led to a change in management regardless of whether a diagnosis was made (63–68, 70–72). For those patients in whom there was a suspicion of a genetic disease, studies showed that either ES or GS yielded an increased diagnostic rate when compared with gene panel alone or standard genetic testing approaches: 58% with trio-based ES versus 12.5%–25% with gene panels (64), 57% with ES versus 13.75% with standard approaches (66), and 57% with GS versus 9% with standard approaches (70).
The Newborn Sequencing in Genomic Medicine and Public Health randomized controlled trial 1 (NSIGHT1) was a program that tested the hypothesis that rapid GS “increased the proportion of [critically ill] infants receiving a genetic diagnosis within 28 days.” NSIGHT1 was terminated early because GS demonstrated an obvious clinical benefit compared with the standard approaches (73). Finally, one study showed that, in patients with a low suspicion of an underlying genetic disorder, NGS achieved a genetic diagnosis in 53% of cases (71).