Laurent Bianchetti, Yan Wu, Eric Guerin, Frédéric Plewniak, Olivier Poch.
Abstract
SAGE (Serial Analysis of Gene Expression) experiments generate short nucleotide sequences called 'tags', which are assumed to map unambiguously to their original transcripts (one tag to one transcript). Nevertheless, many tags either do not map to any transcript or map to multiple transcripts. Current bioinformatics resources, such as SAGEmap and TAGmapper, have focused on reducing the number of unmapped tags. Here, we describe SAGETTARIUS, a new high-throughput program that performs successive, precise Nla3 and Sau3A tag-to-transcript mapping based on specifically designed Virtual Tag (VT) libraries. First, SAGETTARIUS decreases the number of tags mapped to multiple transcripts. Among the mapping resources compared, SAGETTARIUS performed best in this respect, decreasing the number of multiply mapped tags by up to 11%. Second, SAGETTARIUS allows a guideline for SAGE sequencing efforts to be established through efficient mapping of the CRT (Cytoplasmic Ribosomal protein Transcripts)-specific tags. Using all publicly available human and mouse Nla3 SAGE experiments, we show that sequencing 100,000 tags is sufficient to map almost all CRT-specific tags and that four sequencing stages can be identified when carrying out a human or mouse SAGE project. SAGETTARIUS has a web interface and is freely accessible to academic users.
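
To make the notion of a virtual tag concrete, the following is a minimal sketch and not the SAGETTARIUS implementation. It assumes the usual SAGE convention that a tag is the sequence immediately downstream of the 3'-most anchoring enzyme site, with CATG as the Nla3 site and GATC as the Sau3A site. The function name `virtual_tag`, the toy transcript, and the choice to return `None` for a truncated tag are illustrative assumptions.

```python
# Recognition sites of the two anchoring enzymes named in the abstract.
ANCHOR_SITES = {"Nla3": "CATG", "Sau3A": "GATC"}


def virtual_tag(transcript, enzyme="Nla3", tag_length=10):
    """Return the tag downstream of the 3'-most anchoring site, or None.

    tag_length is 10 for SAGE and 17 for LongSAGE. Returning None when the
    site is absent or the tag would be truncated is an assumption made here.
    """
    site = ANCHOR_SITES[enzyme]
    seq = transcript.upper()
    pos = seq.rfind(site)  # 3'-most occurrence of the site
    if pos == -1:
        return None
    start = pos + len(site)
    tag = seq[start:start + tag_length]
    return tag if len(tag) == tag_length else None


# Hypothetical transcript with two CATG sites, one GATC site and a poly-A tail.
toy_transcript = (
    "GGCATGTTTTCCCCGGGGAA" + "CATG" + "ACGTACGTACG" + "GATC" + "TTT" + "A" * 12
)

print(virtual_tag(toy_transcript, "Nla3"))       # 10 nt SAGE tag
print(virtual_tag(toy_transcript, "Sau3A"))      # tag running into the poly-A tail
print(virtual_tag(toy_transcript, "Nla3", 17))   # 17 nt LongSAGE tag
```

A transcript whose 3'-most site sits directly in front of the poly-A tail would yield the uninformative tag AAAAAAAAAA under this convention.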
Year: 2007 PMID: 17884916 PMCID: PMC2094080 DOI: 10.1093/nar/gkm648
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Generation of SAGETTARIUS database information. The final results of the procedure are VT to transcript associations. Bold: procedure steps.
Figure 2. SAGETTARIUS progressive and reductive ET to transcript mapping process.
Characteristics of Nla3 and Sau3A VT sequences associated with human and mouse CRT. t.v.: splicing transcript variant. Asterisk (*): no CRT presents this characteristic.
| Characteristic | Human, Nla3 | Human, Sau3A | Mouse, Nla3 | Mouse, Sau3A |
|---|---|---|---|---|
| Total number of genes encoding ribosomal proteins | 80 | 80 | 79 | 79 |
| Detectable CRT (splicing variants excluded) | 76 | 74 | 76 | 74 |
| AAAAAAAAAA tag | L4, L13 | * | * | * |
| Undetectable CRT | S21, L7A | S10, S12, S21, S25, L35A, L36 | S21, L6, L35A | S21, S25, L31, L37, L37A |
| Distinguishable CRT splicing variants | S29 t.v. 1 & 2 | S24 t.v. 1 & 2; S29 t.v. 1 & 2 | * | * |
| Indistinguishable CRT splicing variants (variants 1 and 2) | S15A, S24, L3, L6, L8, L9, L14, L17, L34, L38, UBA52, L41, P0 | S15A, L3, L6, L8, L9, L14, L17, L34, L38, UBA52, L41, P0, SA | * | * |
| Indistinguishable CRT splicing variants (variants 1, 2 and 3) | L32 | S14, L32 | * | * |
| VT derives from an IRE integrated in the CRT | * | L32 displays the same VT as the ERO1-like mRNA (F081886) | * | * |
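
The counts in this table appear to be internally consistent: in every column, the number of detectable CRT equals the total number of ribosomal protein genes minus the undetectable CRT and minus the CRT that yield the AAAAAAAAAA tag. The short check below illustrates this observation. It is not a rule stated by the authors, and it assumes that the first two data columns are human and the last two are mouse, following the order in the caption.

```python
# Counts read from the table; "undetectable" and "poly_a_tag" are the numbers
# of CRT listed in the corresponding rows (an asterisk counts as 0).
# The human/mouse column assignment is an assumption based on the caption.
columns = {
    ("human", "Nla3"): {"total": 80, "undetectable": 2, "poly_a_tag": 2, "detectable": 76},
    ("human", "Sau3A"): {"total": 80, "undetectable": 6, "poly_a_tag": 0, "detectable": 74},
    ("mouse", "Nla3"): {"total": 79, "undetectable": 3, "poly_a_tag": 0, "detectable": 76},
    ("mouse", "Sau3A"): {"total": 79, "undetectable": 5, "poly_a_tag": 0, "detectable": 74},
}


def expected_detectable(col):
    """Total genes minus undetectable CRT minus CRT with an all-A tag."""
    return col["total"] - col["undetectable"] - col["poly_a_tag"]


for key, col in columns.items():
    print(key, expected_detectable(col), col["detectable"])
```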
VT to cDNA, HTC and EST associations in the SAGETTARIUS database. This assessment is based on the database of VT to transcript associations built from GenBank release 156. H.s.: Homo sapiens, M.m.: Mus musculus. tr. seq.: transcript sequence. n.d.: not determined.
| | cDNA (H.s.) | cDNA (M.m.) | HTC (H.s.) | HTC (M.m.) | EST (H.s.) | EST (M.m.) |
|---|---|---|---|---|---|---|
| GenBank tr. seq. records | 31 331 | 15 153 | 75 275 | 169 332 | 6 771 069 | 4 059 938 |
| GenBank 3′ poly-A tr. seq. records | 6206 | 2955 | 18 099 | 15 703 | 693 858 | 196 048 |
| VT to tr. seq. associations (Nla3) | 6155 | 2936 | 13 537 | 12 541 | 135 793 | 60 535 |
| VT to tr. seq. associations (Sau3A) | 6025 | 2898 | 13 099 | 12 190 | 105 297 | 43 985 |
| tr. seq. without Nla3 site | 51 | 19 | 101 | 86 | n.d. | n.d. |
| tr. seq. without Sau3A site | 181 | 57 | 485 | 325 | n.d. | n.d. |
| tr. seq. without Nla3/Sau3A site | 4 | 1 | 15 | 10 | n.d. | n.d. |
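
For the two cDNA columns, the table rows can be cross-checked with simple arithmetic: the number of 3′ poly-A records minus the number of VT associations equals the number of records lacking the corresponding site. The sketch below reproduces that check using only values from the table. The dictionary keys and the helper name `records_without_site` are illustrative, and the same identity is not claimed here for the HTC or EST columns.

```python
# Values copied from the two cDNA columns of the table, in table order.
cdna_columns = {
    "first cDNA column": {
        "poly_a": 6206, "nla3": 6155, "sau3a": 6025,
        "no_nla3": 51, "no_sau3a": 181,
    },
    "second cDNA column": {
        "poly_a": 2955, "nla3": 2936, "sau3a": 2898,
        "no_nla3": 19, "no_sau3a": 57,
    },
}


def records_without_site(poly_a_records, vt_associations):
    """Poly-A records that did not yield a VT for a given anchoring enzyme."""
    return poly_a_records - vt_associations


for name, c in cdna_columns.items():
    nla3_ok = records_without_site(c["poly_a"], c["nla3"]) == c["no_nla3"]
    sau3a_ok = records_without_site(c["poly_a"], c["sau3a"]) == c["no_sau3a"]
    coverage = 100 * c["nla3"] / c["poly_a"]
    print(f"{name}: Nla3 check {nla3_ok}, Sau3A check {sau3a_ok}, "
          f"Nla3 VT coverage {coverage:.1f}%")
```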
Unique VT sequences associated with a single vs multiple transcripts in the SAGETTARIUS database (Nla3). H.s.: Homo sapiens, M.m.: Mus musculus. SAGE: 10 nt VT, LongSAGE: 17 nt VT. tr.: transcript.
| VT to transcript (cDNA, HTC, EST) associations | SAGE (H.s.) | LongSAGE (H.s.) | SAGE (M.m.) | LongSAGE (M.m.) |
|---|---|---|---|---|
| unique VT seq. | 120 234 | 149 853 | 65 497 | 72 800 |
| unique VT seq. associated to a single tr. | 101 117 | 141 526 | 59 440 | 69 781 |
| unique VT seq. associated to multiple tr. | 19 117 | 8327 | 6057 | 3019 |
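
The effect of tag length on ambiguity can be read off this table by dividing the multiply associated VT count by the total number of unique VT sequences in each column. The sketch below does this with the table values only. The column labels assume that the first SAGE/LongSAGE pair is H.s. and the second is M.m., following the caption order, and the function name `ambiguous_fraction` is illustrative.

```python
# (unique VT, VT associated to a single tr., VT associated to multiple tr.)
# Species labels are an assumption based on the order given in the caption.
vt_counts = {
    "H.s. SAGE": (120234, 101117, 19117),
    "H.s. LongSAGE": (149853, 141526, 8327),
    "M.m. SAGE": (65497, 59440, 6057),
    "M.m. LongSAGE": (72800, 69781, 3019),
}


def ambiguous_fraction(unique_vt, multiple_tr):
    """Share of unique VT sequences that match more than one transcript."""
    return multiple_tr / unique_vt


for label, (unique_vt, single_tr, multiple_tr) in vt_counts.items():
    # The single and multiple rows should add up to the unique VT row.
    assert single_tr + multiple_tr == unique_vt
    share = 100 * ambiguous_fraction(unique_vt, multiple_tr)
    print(f"{label}: {share:.1f}% of unique VT map to multiple transcripts")
```

In both species the 17 nt LongSAGE columns show a markedly lower ambiguous share than the 10 nt SAGE columns.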
Figure 3. Progressive detection of CRT-specific tags in (a) 371 human Nla3 SAGE experiments with the number of sequenced ET ranging from 1430 to 308 589 and (b) 123 mouse Nla3 SAGE experiments with the number of sequenced ET ranging from 464 to 194 345. In both human and mouse, SAGE experiments can be divided into four major sequencing stages (- -) based on the detection of CRT-specific tags.
Probabilities (P) of individual CRT-specific tag detections correlated with the number of sequenced ET in human and mouse SAGE experiments. Group A contains 63 human CRT, namely S2, S3, S3A, S4X, S5, S6, S7, S8, S9, S10, S11, S12, S13, S14, S15, S15A, S16, S17, S18, S19, S20, S25, S26, S27, S27A, S28, S29, S30, L3, L5, L6, L7, L8, L9, L10A, L11, L12, L13A, L15, L17, L18, L18A, L19, L21, L23, L24, L26, L27, L27A, L29, L30, L31, L35, L35A, L36, L36A, L37A, L38, L39, L41, P0, P1 and P2. Group B contains 62 mouse CRT, namely SA, S2, S3, S3A, S4, S5, S6, S7, S9, S10, S11, S12, S13, S14, S15, S18, S19, S20, S23, S24, S26, S27, S27A, S28, S29, FBR-MuSV, L3, L4, L5, L7A, L8, L9, L10A, L11, L12, L13, L13A, L15, L17, L18, L18A, L19, L21, L23, L23A, L24, L27, L30, L31, L32, L34, L35, L36A, L37, L37A, L38, L39, UBA52, L41, P0, P1 and P2. Bold: invariably low detectable CRT
| Sequenced ET | Human | Human | Human | Mouse | Mouse | Mouse |
|---|---|---|---|---|---|---|
| 10 000 ±10% | Group A | S24, L14 | SA, S4Y, S23, | Group B, L26 | S8, S16, S17, L7, L14, L27A, L29 | S15A, S25, |
| 50 000 ±10% | Group A, S24, L14 | S4Y, S23, L22, L23A | SA, | Group B, S8, S16, S17, L7, L14, L27A, L29 | S15A, L26, | S25, |
| 100 000 ±10% | Group A, S23, S24, L14, L22, L23A, UBA52 | SA, S4Y, L32, L34 | | Group B, S8, S16, S17, L7, L14, L22, L27A, L29 | S15A | S25, |
Figure 4. Number of unmapped, multiply mapped and single-mapped ET sequences from the GSM14740 SAGE experiment (40 027 unique ET sequences). ET mappings have been carried out by SAGETTARIUS, TAGmapper, SAGEmap-reliable and SAGEmap-full resources.
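
The three categories counted in Figure 4 can be illustrated with a small sketch that looks up each unique experimental tag (ET) in a VT to transcript table and counts how many transcripts it hits. This is a simplified illustration under stated assumptions, not the SAGETTARIUS procedure: the function name `classify_tags`, the toy tags and the transcript identifiers `tx1` to `tx3` are all hypothetical.

```python
from collections import Counter


def classify_tags(experimental_tags, vt_to_transcripts):
    """Count unique ET sequences that are unmapped, single-mapped or multiply mapped."""
    counts = Counter({"unmapped": 0, "single-mapped": 0, "multiply mapped": 0})
    for tag in set(experimental_tags):  # each unique ET sequence is counted once
        n_transcripts = len(vt_to_transcripts.get(tag, ()))
        if n_transcripts == 0:
            counts["unmapped"] += 1
        elif n_transcripts == 1:
            counts["single-mapped"] += 1
        else:
            counts["multiply mapped"] += 1
    return counts


# Hypothetical VT library: virtual tag -> set of transcript identifiers.
toy_library = {
    "ACGTACGTAC": {"tx1"},
    "TTTAAAAAAA": {"tx2", "tx3"},
}
toy_experiment = ["ACGTACGTAC", "ACGTACGTAC", "TTTAAAAAAA", "GGGGGGGGGG"]

print(classify_tags(toy_experiment, toy_library))
```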