Hiroyuki Wakaguri, Yutaka Suzuki, Masahide Sasaki, Sumio Sugano, Junichi Watanabe.
Abstract
BACKGROUND: Apicomplexan parasites are causative agents of various diseases, including malaria, and have been targets of extensive genomic sequencing. We generated 5'-EST collections for six apicomplexan parasites using our full-length oligo-capping cDNA library method. To improve upon the current genome annotations, as well as to validate the importance of physical cDNA clone resources, we generated a large-scale collection of full-length cDNAs for several apicomplexan parasites.
Year: 2009 PMID: 19602295 PMCID: PMC2722674 DOI: 10.1186/1471-2164-10-312
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Example of the evaluation and merging of a 5'-end cDNA sequence with its annotated gene model. (a) Genomic regions that were covered by both an annotated gene model and a cDNA were used for evaluation purposes (yellow boxes indicate exons, gray lines indicate introns). Inconsistency is illustrated here at the base level (1) and at the exon level (2). Blue dashed boxes and red dashed boxes represent consistent and inconsistent parts, respectively. Results of the evaluation are shown on the left. Because of the inconsistencies shown in (1) and (2), this annotated gene model was categorized as inconsistent at the transcript level in (3). (b) Example of a cDNA that corresponds to two annotated genes. The 5'-EST of the oligo-capped cDNA (XPFm1517; first line) and annotated gene models (PF11_0401 and PF11_0100; second line) are shown. XPFm1517 represents the three exons at the 5'-end, with an undetermined 3'-end.
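The three evaluation levels described in the Figure 1 caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 1-based inclusive coordinate convention, the clipping of exons to the shared region, and the choice of denominator for the base-level rate are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive coordinates (assumed convention)


def clip(exons: List[Exon], lo: int, hi: int) -> List[Exon]:
    """Keep only the parts of each exon that fall inside [lo, hi]."""
    return [(max(s, lo), min(e, hi)) for s, e in exons if e >= lo and s <= hi]


def exonic_bases(exons: List[Exon]) -> Set[int]:
    """Return the set of genomic positions covered by the exons."""
    bases: Set[int] = set()
    for start, end in exons:
        bases.update(range(start, end + 1))
    return bases


def evaluate(cdna_exons: List[Exon], gene_exons: List[Exon]) -> Dict[str, object]:
    """Compare a cDNA with a gene model over the region covered by both."""
    lo = max(min(s for s, _ in cdna_exons), min(s for s, _ in gene_exons))
    hi = min(max(e for _, e in cdna_exons), max(e for _, e in gene_exons))
    if lo > hi:
        raise ValueError("cDNA and gene model do not overlap")

    cdna = clip(cdna_exons, lo, hi)
    gene = clip(gene_exons, lo, hi)

    # Base level: positions that are exonic in one structure but not the other.
    cdna_bases, gene_bases = exonic_bases(cdna), exonic_bases(gene)
    mismatched = cdna_bases ^ gene_bases
    union = cdna_bases | gene_bases  # assumed denominator

    # Exon level: annotated exons whose boundaries have no exact cDNA counterpart.
    cdna_set = set(cdna)
    discrepant = [ex for ex in gene if ex not in cdna_set]

    return {
        "base_discrepancy": len(mismatched) / len(union) if union else 0.0,
        "discrepant_exons": len(discrepant),
        "total_exons": len(gene),
        # Transcript level: any base or exon discrepancy makes the model inconsistent.
        "transcript_consistent": not discrepant and not mismatched,
    }


# Hypothetical coordinates: the second annotated exon extends 20 bases past the cDNA exon.
result = evaluate(
    cdna_exons=[(100, 200), (300, 400), (500, 550)],
    gene_exons=[(100, 200), (300, 420), (500, 600)],
)
```

In this hypothetical case the 3'-most exon is clipped to the shared region, so only the second exon counts as discrepant, and that single discrepancy is enough to mark the gene model as inconsistent at the transcript level.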
Discrepancies between oligo-capped cDNAs and annotated gene models
| Species | Nucleotide level (%) | Exon level Discrepant No./Total No. (%) | Gene level Discrepant No./Total No. (%) |
| Pf | 2.6% | 175/2,075 (8%) | 133/1,543 (9%) |
| Pv | 3.9% | 320/2,371 (13%) | 258/1,457 (18%) |
| Py | 7.5% | 302/1,939 (16%) | 233/1,340 (17%) |
| Pb | 3.0% | 94/377 (25%) | 53/254 (21%) |
| Cp | 1.1% | 33/669 (5%) | 32/658 (5%) |
| Tg | 7.0% | 245/1,556 (16%) | 191/780 (24%) |
| Average | 4.2% | 14% | 16% |
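Pf, Plasmodium falciparum; Pv, P. vivax; Py, P. yoelii; Pb, P. berghei; Cp, Cryptosporidium parvum; Tg, Toxoplasma gondii. For further details, see the text.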
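As a cross-check on the table above, the following sketch recomputes the exon-level and gene-level percentages from the tabulated counts. The counts are copied from the table; the helper names are hypothetical. The Average row is consistent with an unweighted mean of the per-species percentages, whereas pooling all counts gives a somewhat lower figure because the species with the most data have lower discrepancy rates.

```python
from typing import Dict, Tuple

# (discrepant, total) counts copied from the table above
exon_counts: Dict[str, Tuple[int, int]] = {
    "Pf": (175, 2075), "Pv": (320, 2371), "Py": (302, 1939),
    "Pb": (94, 377), "Cp": (33, 669), "Tg": (245, 1556),
}
gene_counts: Dict[str, Tuple[int, int]] = {
    "Pf": (133, 1543), "Pv": (258, 1457), "Py": (233, 1340),
    "Pb": (53, 254), "Cp": (32, 658), "Tg": (191, 780),
}


def percent(discrepant: int, total: int) -> float:
    """Discrepancy rate as a percentage."""
    return 100.0 * discrepant / total


def unweighted_mean(counts: Dict[str, Tuple[int, int]]) -> float:
    """Mean of the per-species percentages (each species weighted equally)."""
    return sum(percent(d, t) for d, t in counts.values()) / len(counts)


def pooled(counts: Dict[str, Tuple[int, int]]) -> float:
    """Percentage computed after summing counts across all species."""
    discrepant = sum(d for d, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return percent(discrepant, total)


for label, counts in (("exon", exon_counts), ("gene", gene_counts)):
    print(f"{label}: unweighted {unweighted_mean(counts):.1f}%, pooled {pooled(counts):.1f}%")
```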
Characteristic features of the 5'-UTRs
| Species | Frequency of genes containing intron(s) in the 5'-UTRs (%) | Average 5'-UTR length (bp) | Standard deviation of 5'-UTR length (bp) |
| Pf | 22% | 303 | 155 |
| Pv | 20% | 304 | 199 |
| Py | 28% | 345 | 174 |
| Pb | 22% | 299 | 166 |
| Cp | 3% | 137 | 116 |
| Tg | 13% | 288 | 172 |
| Average | 18% | 279 | 164 |
See Methods for further details.
Characteristic features of the TSSs
| Species | Average number of cDNA members per cluster | Average number of TSS positions per cluster | Average of TSS standard deviation (bp) |
| Pf | 7 | 5 | 80 |
| Pv | 7 | 4 | 61 |
| Py | 9 | 6 | 61 |
| Pb | 3 | 2 | 27 |
| Cp | 22 | 5 | 16 |
| Tg | 11 | 4 | 34 |
Fourth column: standard deviation of the TSS positions within each cluster, averaged over all clusters. For further information, see Methods.
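The three per-cluster statistics in the table above can be sketched as follows. This is an illustration only: the cluster names and positions are invented, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since this section does not state which estimator was used.

```python
from statistics import mean, pstdev
from typing import Dict, List


def cluster_stats(tss_positions: List[int]) -> Dict[str, float]:
    """Summarize one cDNA cluster from the TSS position of each member cDNA."""
    return {
        "members": len(tss_positions),             # cDNAs in the cluster
        "tss_positions": len(set(tss_positions)),  # distinct TSS positions
        "tss_sd": pstdev(tss_positions),           # spread of TSS positions (bp)
    }


def summarize(clusters: Dict[str, List[int]]) -> Dict[str, float]:
    """Average each per-cluster statistic over all clusters of a species."""
    stats = [cluster_stats(positions) for positions in clusters.values()]
    return {key: mean(s[key] for s in stats) for key in stats[0]}


# Hypothetical clusters: genomic TSS coordinate of each member cDNA
example_clusters = {
    "clusterA": [100, 100, 110, 90],
    "clusterB": [500, 500],
}
summary = summarize(example_clusters)
```

Averaging the per-cluster standard deviations, rather than pooling all positions, keeps clusters with many members from dominating the species-level figure.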
Figure 2. Length distribution of the deduced amino acids. Length distribution of the deduced amino acids in Toxoplasma full-length cDNAs. Length distribution of the deduced amino acids in annotated genes corresponding to full-length cDNAs. Length distribution of the inconsistent part of the amino acid sequences. Negative values on the horizontal axis indicate that the amino acid sequences were shorter in the cDNAs. Data for which there was no difference in amino acid length are not shown.
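The sign convention in the Figure 2 caption can be made concrete with a short sketch: the difference is taken as cDNA length minus gene-model length, so negative values mean the cDNA-deduced protein is shorter, and pairs with no difference are dropped. The function names, the bin width, and the example lengths are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def length_differences(pairs: List[Tuple[int, int]]) -> List[int]:
    """Return cDNA length minus gene-model length (amino acids), skipping ties."""
    return [cdna - model for cdna, model in pairs if cdna != model]


def histogram(diffs: List[int], bin_width: int = 100) -> Dict[int, int]:
    """Count differences per bin, keyed by each bin's lower edge."""
    counts = Counter((d // bin_width) * bin_width for d in diffs)
    return dict(sorted(counts.items()))


# Hypothetical (cDNA, gene model) amino acid lengths
pairs = [(260, 260), (257, 519), (300, 250), (150, 510)]
diffs = length_differences(pairs)
bins = histogram(diffs)
```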
Mismatch types in Toxoplasma
| | Complete match | Type i | Type ii | Type iii | Type iv |
| Number of cDNAs | 343 | 112 | 47 | 14 | 70 |
| CDS (cDNA) average length (bp) | 780 | 772 | 693 | 951 | 452 |
| CDS (Gene model) average length (bp) | 780 | 1,559 | 1,097 | 980 | 1,531 |
| Average number of cDNA cluster members | 9 | 3 | 13 | 6 | 5 |
For details about the categorization of types i–iv see text.
Figure 3. Common patterns of inconsistencies in the CDS of the completely determined Toxoplasma full-length cDNAs. Typical examples for categories i–iv as described in the text are shown. Yellow or purple boxes indicate the CDS.