Using the transcriptome to annotate the genome

S Saha, AB Sparks, C Rago, V Akmaev, CJ Wang, B Vogelstein, KW Kinzler, VE Velculescu
Nature Biotechnology, 2002
Abstract
A remaining challenge for the human genome project involves the identification and annotation of expressed genes. The public and private sequencing efforts have identified ∼15,000 sequences that meet stringent criteria for genes, such as correspondence with known genes from humans or other species, and have made another ∼10,000–20,000 gene predictions of lower confidence, supported by various types of in silico evidence, including homology studies, domain searches, and ab initio gene predictions. These computational methods have limitations, both because they are unable to identify a significant fraction of genes and exons and because they are unable to provide definitive evidence about whether a hypothetical gene is actually expressed. As the in silico approaches identified a smaller number of genes than anticipated, we wondered whether high-throughput experimental analyses could be used to provide evidence for the expression of hypothetical genes and to reveal previously undiscovered genes. We describe here the development of such a method, called long serial analysis of gene expression (LongSAGE), an adaptation of the original SAGE approach, that can be used to rapidly identify novel genes and exons.