Foo Cheung, Brian J Haas, Susanne M D Goldberg, Gregory D May, Yongli Xiao, Christopher D Town.
Abstract
BACKGROUND: In this study, we addressed whether a single 454 Life Sciences GS20 sequencing run provides new gene discovery from a normalized cDNA library, and whether the short reads produced via this technology are of value in gene structure annotation.
Year: 2006 PMID: 17062153 PMCID: PMC1635983 DOI: 10.1186/1471-2164-7-272
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Size distribution of the full-length Medicago ESTs from conventional sequencing using SMART technology
| Protein length (amino acids) | 0–50 | 51–100 | 101–150 | 151–200 | 201–250 | 251–300 | 301–350 | 351–400 | 401–450 | 451–500 | >500 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| No. of unique sequences | 0 | 45 | 188 | 298 | 320 | 241 | 257 | 194 | 124 | 88 | 161 |
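Because protein lengths are reported only as binned counts, summary statistics have to be read off the bins. The sketch below totals the unique full-length sequences and finds the bin that holds the median protein length. It assumes 50-amino-acid bins ending in an open-ended ">500" class; the names `BINS`, `COUNTS` and `median_bin` are illustrative and not taken from the paper.

```python
# Counts of unique full-length Medicago sequences per protein-length bin,
# copied from the table above. Bin labels assume 50-aa steps ending in an
# open-ended ">500" class (an assumption, not stated in the source).
BINS = ["0-50", "51-100", "101-150", "151-200", "201-250", "251-300",
        "301-350", "351-400", "401-450", "451-500", ">500"]
COUNTS = [0, 45, 188, 298, 320, 241, 257, 194, 124, 88, 161]


def median_bin(labels, counts):
    """Return the label of the bin containing the median sequence."""
    half = sum(counts) / 2
    running = 0
    for label, count in zip(labels, counts):
        running += count          # cumulative count up to this bin
        if running >= half:
            return label
    raise ValueError("empty distribution")


if __name__ == "__main__":
    print("total sequences:", sum(COUNTS))          # 1916
    print("median bin:", median_bin(BINS, COUNTS))  # 251-300
```

Binned data only locate the median to within one bin; the exact median length would need the per-sequence values.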
Sequence length distribution before and after assembly of sequence reads from a single 454 run of a normalized cDNA library.
| Length bin | Before assembly | After assembly |
|---|---|---|
| | 161,161 | 113,139 |
| | 91,203 | 67,715 |
| | 19 | 3,194 |
| | 1 | 448 |
| | | 85 |
| | | 16 |
| | | 2 |
| Total | 252,384 | 184,599 |
Size distribution of 454 assemblies
| 150,734 | |
| 20,118 | |
| 11,209 | |
| 2,122 | |
| 369 | |
| 33 | |
| 12 | |
| 2 | |
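The two 454 tables can be cross-checked against each other: the per-bin counts should add up to the stated totals, and the size classes of the 454 assemblies should add up to the after-assembly total. A minimal Python sketch follows; it assumes the last three length bins have no before-assembly entry (the totals imply this), and the helper `column_total` and the list names are hypothetical.

```python
# Per-bin counts from the 454 before/after-assembly table; None marks bins
# with no before-assembly entry. Bin labels were not recovered, so rows are
# identified by position only.
BEFORE = [161_161, 91_203, 19, 1, None, None, None]
AFTER = [113_139, 67_715, 3_194, 448, 85, 16, 2]
# Counts per size class from "Size distribution of 454 assemblies".
ASSEMBLY_CLASSES = [150_734, 20_118, 11_209, 2_122, 369, 33, 12, 2]


def column_total(values):
    """Sum a table column, treating empty cells (None) as zero."""
    return sum(v for v in values if v is not None)


reads_in = column_total(BEFORE)
sequences_out = column_total(AFTER)

if __name__ == "__main__":
    print(reads_in, sequences_out)                 # 252384 184599
    print(sum(ASSEMBLY_CLASSES) == sequences_out)  # True
    print(f"{sequences_out / reads_in:.3f}")       # 0.731
```

The last line is the fraction of input reads that remain as distinct sequences after assembly.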
Mapping 454-derived ESTs against the five most abundant TCs in the MtGI
| TC106570 | 0.53 | 0.02 | RuBisCO small subunit |
| TC106485 | 0.47 | 0.01 | Elongation factor-1 |
| TC106598 | 0.34 | 0.04 | Methionine synthase |
| TC100296 | 0.34 | 0.01 | Cytochrome |
| TC100133 | 0.33 | 0.01 | Chlorophyll a/b binding protein |
Figure 1. Representation of Gene Ontology assignments for M. truncatula ESTs derived from 454 sequencing and the MtGI.
Sequence length distribution before and after assembly of 23 Mb of randomly selected ESTs.
| Length bin | Before assembly | After assembly |
|---|---|---|
| | 12 | 6 |
| | 1,717 | 714 |
| | 2,475 | 829 |
| | 4,802 | 1,458 |
| | 8,130 | 2,373 |
| | 26,742 | 11,518 |
| | 107 | 1,450 |
| | | 41 |
| Total | 43,985 | 18,389 |
Size distribution of assemblies produced from 23 Mb of conventional ESTs
| 12,057 | |
| 2,785 | |
| 2,260 | |
| 780 | |
| 321 | |
| 83 | |
| 56 | |
| 38 | |
| 9 | |
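The same consistency check applies to the 23 Mb of conventional ESTs. The per-bin after-assembly counts and the size classes of the conventional-EST assembly table both sum to 18,389, which this sketch confirms. It assumes the longest length bin has no before-assembly entry; the list names and `column_total` helper are illustrative only.

```python
# Per-bin counts from the 23 Mb conventional-EST table; None marks the bin
# with no before-assembly entry. Bin labels were not recovered.
BEFORE = [12, 1_717, 2_475, 4_802, 8_130, 26_742, 107, None]
AFTER = [6, 714, 829, 1_458, 2_373, 11_518, 1_450, 41]
# Counts per size class from the conventional-EST assembly table.
ASSEMBLY_CLASSES = [12_057, 2_785, 2_260, 780, 321, 83, 56, 38, 9]


def column_total(values):
    """Sum a table column, treating empty cells (None) as zero."""
    return sum(v for v in values if v is not None)


reads_in = column_total(BEFORE)
sequences_out = column_total(AFTER)

if __name__ == "__main__":
    print(reads_in, sequences_out)                 # 43985 18389
    print(sum(ASSEMBLY_CLASSES) == sequences_out)  # True
    print(f"{sequences_out / reads_in:.3f}")       # 0.418
```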
Figure 2. Matches of 53,796 unique sequences without an MtGI hit to other TIGR Plant Gene Indices.
Figure 3. Gene discovery as a function of EST sequencing coverage.
Figure 4. Examples of annotation updates and alignment assemblies. The protein-coding segments of gene structures (top track) are shown in red and UTRs are shown in black. Alignments of 454 sequences are shown in black below the gene models. Boundaries consistent with the original gene structure annotation are highlighted in blue.
Statistics of PASA alignments of M. truncatula 454 cDNA reads on finished Medicago BACs
| Total 454 reads (mapped to the genome using BLAT) | 70,026 |
| Valid BLAT alignments | 40,537 |
| Valid Sim4 alignments | 3,643 |
| Total valid alignments | 44,180 |
| Number of assemblies | 16,183 |
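The PASA statistics support a few derived figures. The valid BLAT and Sim4 alignments add up exactly to the stated total, consistent with the two sets being disjoint. The sketch below also computes an alignment rate and the mean number of valid alignments per assembly; treating the rate as per-read assumes each read contributes at most one valid alignment, which the table does not state. The variable names are illustrative.

```python
# Counts copied from the PASA alignment statistics table above.
TOTAL_READS = 70_026   # 454 reads mapped to the genome with BLAT
VALID_BLAT = 40_537
VALID_SIM4 = 3_643
ASSEMBLIES = 16_183

# The table's total equals this sum, so the two sets do not overlap.
valid_total = VALID_BLAT + VALID_SIM4
# Assumption: at most one valid alignment per read, making this a per-read rate.
alignment_rate = valid_total / TOTAL_READS
alignments_per_assembly = valid_total / ASSEMBLIES

if __name__ == "__main__":
    print(valid_total)                       # 44180
    print(f"{alignment_rate:.1%}")           # 63.1%
    print(f"{alignments_per_assembly:.2f}")  # 2.73
```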
Gene structure updates generated by PASA alignments of 454 cDNA reads on finished M. truncatula BACs
| EST assembly extends UTRs | 1,061 |
| EST assembly alters protein sequence, passes validation | 278 |
| EST assembly found capable of merging multiple genes | 20 |
| EST assembly stitched into gene model requires an alternative splicing isoform | 39 |