Using the transcriptome to annotate
Saurabh Saha1, 2, 5, Andrew B. Sparks1, 3, 5, Carlo Rago1, Viatcheslav Akmaev4,
Clarence J. Wang4, Bert Vogelstein1, Kenneth W. Kinzler1 & Victor E. Velculescu1
1Howard Hughes Medical Institute and the Sidney Kimmel Comprehensive Cancer Center, Baltimore,
2Program in Cellular and Molecular Medicine, Johns Hopkins Medical Institutions, Baltimore, MD 21231.
3Current address: GMP Genetics, 200 Prospect Street, Waltham, MA 02451.
4Genzyme Molecular Oncology, P.O. Box 9322, Framingham, MA 01701.
5These authors contributed equally to this work.
Nature Biotechnology May 2002 Volume 20 Number 5 pp 508 -
A remaining challenge for the human genome project
involves the identification and annotation of expressed genes.
The public and private sequencing efforts have identified
~15,000 sequences that meet stringent criteria for genes,
such as correspondence with known genes from humans or other
species, and have made another ~10,000–20,000 gene predictions
of lower confidence, supported by various types of in silico
evidence, including homology studies, domain searches, and
ab initio gene predictions1, 2. These computational methods
have limitations, both because they are unable to identify
a significant fraction of genes and exons and because they
are unable to provide definitive evidence about whether a
hypothetical gene is actually expressed3, 4. As the in silico
approaches identified a smaller number of genes than anticipated5-9,
we wondered whether high-throughput experimental analyses
could be used to provide evidence for the expression of hypothetical
genes and to reveal previously undiscovered genes. We describe
here the development of such a method, called long serial
analysis of gene expression (LongSAGE), an adaptation of the
original SAGE approach10, that can be used to rapidly
identify novel genes and exons.