Using the transcriptome to annotate the genome

Saurabh Saha1, 2, 5, Andrew B. Sparks1, 3, 5, Carlo Rago1, Viatcheslav Akmaev4, Clarence J. Wang4, Bert Vogelstein1, Kenneth W. Kinzler1 & Victor E. Velculescu1

1Howard Hughes Medical Institute and the Sidney Kimmel Comprehensive Cancer Center, Baltimore, MD 21231.
2Program in Cellular and Molecular Medicine, Johns Hopkins Medical Institutions, Baltimore, MD 21231.
3Current address: GMP Genetics, 200 Prospect Street, Waltham, MA 02451.
4Genzyme Molecular Oncology, P.O. Box 9322, Framingham, MA 01701.
5These authors contributed equally to this work.

Nature Biotechnology May 2002 Volume 20 Number 5 pp 508 - 512

LongSAGE protocol

A remaining challenge for the human genome project involves the identification and annotation of expressed genes. The public and private sequencing efforts have identified ~15,000 sequences that meet stringent criteria for genes, such as correspondence with known genes from humans or other species, and have made another ~10,000–20,000 gene predictions of lower confidence, supported by various types of in silico evidence, including homology studies, domain searches, and ab initio gene predictions1, 2. These computational methods have limitations, both because they are unable to identify a significant fraction of genes and exons and because they are unable to provide definitive evidence about whether a hypothetical gene is actually expressed3, 4. As the in silico approaches identified a smaller number of genes than anticipated5–9, we wondered whether high-throughput experimental analyses could be used to provide evidence for the expression of hypothetical genes and to reveal previously undiscovered genes. We describe here the development of such a method, called long serial analysis of gene expression (LongSAGE), an adaptation of the original SAGE approach10, that can be used to rapidly identify novel genes and exons.

