Ashley Vaughan, Sum-Ying Chiu, Gowthaman Ramasamy, Ling Li, Malcolm J Gardner, Alice S Tarun, Stefan H I Kappe, Xinxia Peng
Abstract
MOTIVATION: Sequencing the genome of Plasmodium yoelii, a model rodent malaria parasite, has greatly facilitated research toward new drug and vaccine candidates against malaria. Unfortunately, only preliminary gene models were annotated on the partially sequenced genome, mostly by in silico gene prediction, and the annotation has not been substantially improved since 2002.
Year: 2008 PMID: 18586738 PMCID: PMC2718618 DOI: 10.1093/bioinformatics/btn140
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Summary characteristics of re-annotated orthologous P. yoelii genes
| Category | Re-annotated genes |
|---|---|
| Total number | 510 |
| Spanning two or more contigs | 109 |
| Protein length in aa (mean, min, max) | 310, 19, 4783 |
| With transmembrane domain | 116 |
| With signal peptide | 80 |
| With EST hit | 452 |
| With Pfam domain | 157 |
Summary of the enrichment and pre-processing of Plasmodium EST data from GenBank
| Organism | ESTs in GenBank, 10/2002 | ESTs in GenBank, 8/2007 | Cleaned | Contigs (no. of ESTs) | Singlets | Total | Genome publication year | Genome coverage | No. of full-length genes |
|---|---|---|---|---|---|---|---|---|---|
| All four | 39 848 | 138 827 | 121 308 | 15 605 (93 558) | 27 750 | 43 355 | | | |
| P. yoelii | 15 562 | 18 932 | 18 319 | 2401 (12 293) | 6026 | 8427 | 2002 | 5× | 5878 |
| P. berghei | 5544 | 58 955 | 43 665 | 5379 (32 528) | 11 137 | 16 516 | 2005 | 4× | 5864 |
| P. falciparum | 18 742 | 38 702 | 37 959 | 5118 (30 875) | 7084 | 12 202 | 2002 | 14.5× | 5268 |
| P. chabaudi | 0 | 22 238 | 21 365 | 2707 (17 862) | 3503 | 6210 | 2005 | 4× | 5698 |
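The columns of this table are linked by simple bookkeeping: every cleaned EST ends up either in a contig or as a singlet, and the non-redundant total is the number of contigs plus the number of singlets. The sketch below checks those identities on the per-species rows (taken in table order) and confirms that they sum to the "All four" row. It only re-derives relationships implied by the published counts; the `EstRow` structure is a hypothetical representation, not the assembly pipeline used in the study.

```python
from typing import NamedTuple

class EstRow(NamedTuple):
    raw_2002: int         # ESTs in GenBank, 10/2002
    raw_2007: int         # ESTs in GenBank, 8/2007
    cleaned: int          # ESTs remaining after cleaning
    contigs: int          # number of assembled contigs
    ests_in_contigs: int  # cleaned ESTs assembled into those contigs
    singlets: int         # cleaned ESTs not assembled into any contig
    total: int            # non-redundant sequences reported in the table

# Per-species rows, in the order they appear in the table.
SPECIES_ROWS = [
    EstRow(15_562, 18_932, 18_319, 2401, 12_293, 6026, 8427),
    EstRow(5544, 58_955, 43_665, 5379, 32_528, 11_137, 16_516),
    EstRow(18_742, 38_702, 37_959, 5118, 30_875, 7084, 12_202),
    EstRow(0, 22_238, 21_365, 2707, 17_862, 3503, 6210),
]
ALL_FOUR = EstRow(39_848, 138_827, 121_308, 15_605, 93_558, 27_750, 43_355)

def check_row(row: EstRow) -> None:
    # Every cleaned EST is either assembled into a contig or left as a singlet.
    assert row.ests_in_contigs + row.singlets == row.cleaned
    # Each contig and each singlet counts as one non-redundant sequence.
    assert row.contigs + row.singlets == row.total

for row in SPECIES_ROWS + [ALL_FOUR]:
    check_row(row)

# The "All four" row is the column-wise sum of the species rows.
summed = EstRow(*(sum(column) for column in zip(*SPECIES_ROWS)))
assert summed == ALL_FOUR
print("EST table is internally consistent")
```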
Overview of the available EST evidence for the 7861 currently annotated (5904 complete and 1957 partial) and 510 re-annotated P. yoelii CDSs
| CDS length | Complete: total | Complete: with EST | Complete: % with EST | Partial: total | Partial: with EST | Partial: % with EST | Re-annotated: total | Re-annotated: with EST | Re-annotated: % with EST |
|---|---|---|---|---|---|---|---|---|---|
| All | 5904 | 4766 | 81 | 1957 | 1095 | 56 | 510 | 452 | 89 |
| < 200 aa | 1826 | 962 | 53 | 946 | 237 | 25 | 248 | 220 | 89 |
| < 180 aa | 1676 | 832 | 50 | 910 | 214 | 24 | 213 | 189 | 89 |
| < 160 aa | 1509 | 690 | 46 | 874 | 182 | 21 | 182 | 159 | 87 |
| < 140 aa | 1355 | 558 | 41 | 844 | 161 | 19 | 149 | 128 | 86 |
| < 120 aa | 1180 | 412 | 35 | 797 | 126 | 16 | 107 | 92 | 86 |
| < 100 aa | 1015 | 286 | 28 | 760 | 99 | 13 | 74 | 64 | 86 |
| < 80 aa | 861 | 180 | 21 | 706 | 66 | 9 | 42 | 35 | 83 |
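Each "shorter than" row is cumulative: a CDS shorter than 80 aa is also counted in every row above it, and the percentage is the share of CDSs with at least one EST hit, rounded to a whole number. A minimal sketch of how such a breakdown could be tabulated, assuming each CDS is represented as a hypothetical (length in aa, has EST) pair; the toy input is invented for illustration, while the final loop re-derives the published percentages of the "Complete" columns.

```python
from collections.abc import Iterable

THRESHOLDS = [200, 180, 160, 140, 120, 100, 80]  # "shorter than" cut-offs in aa

Row = tuple[str, int, int, int]  # (label, total, with EST, % with EST)

def summarize(label: str, subset: list[tuple[int, bool]]) -> Row:
    """Count CDSs and those with EST support; percentage rounded to an integer."""
    total = len(subset)
    with_est = sum(1 for _, has_est in subset if has_est)
    percent = round(100 * with_est / total) if total else 0
    return label, total, with_est, percent

def est_support_by_length(cds: Iterable[tuple[int, bool]]) -> list[Row]:
    """Cumulative rows: a CDS counts in every row whose cut-off exceeds its length."""
    records = list(cds)
    rows = [summarize("All", records)]
    for cutoff in THRESHOLDS:
        rows.append(summarize(f"< {cutoff} aa", [c for c in records if c[0] < cutoff]))
    return rows

# Hypothetical toy input: (protein length in aa, has at least one EST hit).
toy = [(450, True), (190, True), (150, False), (95, True), (70, False)]
for row in est_support_by_length(toy):
    print(row)

# Published "Complete" columns: (total, with EST, reported %).
COMPLETE = [(5904, 4766, 81), (1826, 962, 53), (1676, 832, 50), (1509, 690, 46),
            (1355, 558, 41), (1180, 412, 35), (1015, 286, 28), (861, 180, 21)]
for total, with_est, reported in COMPLETE:
    assert round(100 * with_est / total) == reported
```

Keeping the rows cumulative (rather than binning lengths into disjoint ranges) matches the "shorter than" wording of the table, which is why the row totals shrink monotonically.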
Summary of mapping P. yoelii ESTs to annotated P. yoelii CDSs
| cDNA library | Total | Cleaned | Mapped | Unmapped (% of cleaned) |
|---|---|---|---|---|
| Asexual blood stages | 12 471 | 12 043 | 10 474 | 1569 (13) |
| Salivary gland sporozoite | 3091 | 3072 | 2005 | 1067 (34.7) |
| Axenic early liver stages (<24 h; trophozoite) | 1452 | 1387 | 681 | 706 (**50.9**) |
| Liver stage (40 h; schizont) | 1916 | 1815 | 845 | 970 (**53.4**) |
Bold: percentage of ESTs from the liver stage libraries that could not be mapped to any annotated P. yoelii CDS.
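The reported percentages are consistent with unmapped = cleaned - mapped, expressed as a share of the cleaned ESTs rather than of the raw totals (for the blood stage library, 1569/12 043 gives 13.0%, whereas 1569/12 471 would give 12.6%). A short sketch that re-derives the column from the published counts; `unmapped_summary` is an illustrative helper name, not part of the study.

```python
# (library, total, cleaned, mapped) as reported in the table.
LIBRARIES = [
    ("Asexual blood stages", 12_471, 12_043, 10_474),
    ("Salivary gland sporozoite", 3091, 3072, 2005),
    ("Axenic early liver stages (<24 h; trophozoite)", 1452, 1387, 681),
    ("Liver stage (40 h; schizont)", 1916, 1815, 845),
]

def unmapped_summary(cleaned: int, mapped: int) -> tuple[int, float]:
    """Unmapped ESTs and their share of the cleaned ESTs, to one decimal place."""
    unmapped = cleaned - mapped
    return unmapped, round(100 * unmapped / cleaned, 1)

for name, _total, cleaned, mapped in LIBRARIES:
    unmapped, pct = unmapped_summary(cleaned, mapped)
    print(f"{name}: {unmapped} unmapped ({pct}% of {cleaned} cleaned ESTs)")
```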
Analysis of liver stage peptides identified by proteomics using different search databases
| Database searched | Total peptides | Not in Py, n (%)ᵇ | Mapped to PyCDS, n (%) | Mapped to PyGdna, n (%) | Mapped to PyEST, n (%) | Mapped to PlasEST, n (%) | Mapped to PlasESTLS | Mapped to PyLCM | Mapped to PyAxenic | Mapped to PbLS31 |
|---|---|---|---|---|---|---|---|---|---|---|
| PbCDS | 2644 | 87 (3.3) | 2308 (87.3) | 2331 (88.2) | 1666 (63.0) | 2249 (85.1) | 952 | 240 | 156 | 862 |
| 6f | 3512 | 244 (6.9) | 2813 (80.1) | 3006 (85.6) | 2002 (57.0) | 2763 (78.7) | 993 | 284 | 199 | 805 |
| PyCDS | 3459 | NA | 3459 (NA) | 3331 (96.3) | 2191 (63.3) | 2687 (77.7) | 1005 | 300 | 196 | 810 |
| Combinedᶜ | 4234 | 309 (7.3) | 3547 (83.8) | 3717 (87.8) | 2453 (57.9) | 3264 (77.1) | 1190 | 355 | 238 | 963 |
ᵃ Numbers (and percentages) highlighted in bold italics are peptides that can be mapped to ESTs from the indicated liver stage library but not to any annotated P. yoelii protein sequence.
ᵇ Number (and percentage) of peptides identified in each database search that cannot be mapped to any P. yoelii sequence in PyCDS, PyGdna or PyEST.
ᶜ Total number of non-redundant peptides identified across all three database searches.
PyCDS: annotated P. yoelii protein sequences. 6f: protein sequences obtained by six-frame translation of P. yoelii genomic sequences and all Plasmodium EST sequences. PbCDS: annotated P. berghei protein sequences. PyGdna: P. yoelii genomic sequences. PyEST: P. yoelii EST sequences. PlasEST: all Plasmodium EST sequences. PyLCM: P. yoelii EST sequences from Sacci et al. (2005). PyAxenic: P. yoelii EST sequences from Wang et al. (2004). PbLS31: P. berghei EST sequences from Ishino et al. (GenBank accessions DC195411-DC201252). PlasESTLS: EST sequences from the three liver stage libraries PyLCM, PyAxenic and PbLS31. NA: not applicable. See text for details.
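The 6f database is built by translating nucleotide sequences in all six reading frames: three offsets on the given strand and three on its reverse complement. A minimal sketch of six-frame translation, assuming the standard genetic code (NCBI translation table 1) and leaving stop codons as `*` instead of splitting each frame into ORFs as a production pipeline typically would; the function names are illustrative, not taken from the study.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order,
# so TTT, TTC, TTA, TTG, TCT, ... map to F, F, L, L, S, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]

def translate(seq: str) -> str:
    """Translate whole codons from position 0; '*' marks stops, 'X' unknown codons."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )

def six_frame_translation(seq: str) -> dict[str, str]:
    """Frames +1..+3 on the given strand and -1..-3 on its reverse complement."""
    forward = seq.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames

print(six_frame_translation("ATGGCC"))
```

An incomplete trailing codon is dropped in every frame, so a sequence of length n yields at most n // 3 residues per frame.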
Fig. 1. Venn diagram of peptides identified using different sequence databases. (A) Total number of peptides identified in each database search. (B) The subset of these peptides that can be mapped to a P. yoelii sequence in PyCDS, PyGdna or PyEST. PyCDS, 6f, PbCDS, PyGdna and PyEST: see Table 4 legend.
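Panel B and the "Not in Py" counts reduce to set operations on peptide sequences: a peptide counts as P. yoelii-supported if it occurs in any of PyCDS, PyGdna or PyEST, and the combined total is the union of the peptides from all searches. A sketch with invented peptides and sequences; exact substring matching against already-translated sequences is a simplifying assumption (genomic and EST sequences would first need translating), and the helper names are hypothetical.

```python
def maps_to(peptide: str, sequences: list[str]) -> bool:
    """True if the peptide occurs verbatim in any (translated) sequence."""
    return any(peptide in seq for seq in sequences)

# Hypothetical translated sequence databases (toy data, not from the study).
PY_SOURCES = {
    "PyCDS": ["MKTAYIAKQRQISFVK"],
    "PyGdna": ["MKTAYIAKQRQ*LLNEGR"],
    "PyEST": ["GSHMLEDPVK"],
}

# Hypothetical peptides identified by each database search.
hits = {
    "PbCDS": {"TAYIAK", "LLNEGR", "WWPQDE"},
    "6f": {"TAYIAK", "LEDPVK"},
    "PyCDS": {"TAYIAK", "QISFVK"},
}

def not_in_py(peptides: set[str]) -> set[str]:
    """Peptides that match none of PyCDS, PyGdna or PyEST."""
    return {
        p for p in peptides
        if not any(maps_to(p, seqs) for seqs in PY_SOURCES.values())
    }

combined = set().union(*hits.values())  # non-redundant peptides over all searches
for db, peptides in hits.items():
    missing = not_in_py(peptides)
    share = 100 * len(missing) / len(peptides)
    print(f"{db}: {len(peptides)} peptides, {len(missing)} not in Py ({share:.1f}%)")
print(f"Combined: {len(combined)} non-redundant peptides")
```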
Fig. 2. Strategy for identifying, by comparative analysis, orthologous P. yoelii genes potentially missing from the current annotation. Orthologous genes from four Plasmodium species were identified with the reciprocal best BLAST hit approach and classified into six categories. Py: P. yoelii. Pb: P. berghei. Pf: P. falciparum. Pc: P. chabaudi. Solid double-headed arrows connect proteins from two species that are reciprocal best BLAST hits. The number on the left is the number of orthologous groups in each category. The asterisk marks the orthologous groups from which potential P. yoelii orthologs of P. falciparum genes were identified (also highlighted in bold).
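Under the reciprocal best hit criterion, two proteins are paired only when each is the other's top-scoring BLAST match. A minimal pairwise sketch, assuming the BLAST results have already been parsed into hypothetical (query, subject, bit score) tuples for each search direction; IDs and scores are invented, e-value cut-offs are omitted, ties keep the first hit seen, and grouping across four species as in the figure would combine several such pairwise results.

```python
from collections.abc import Iterable

Hit = tuple[str, str, float]  # (query id, subject id, bit score)

def best_hits(hits: Iterable[Hit]) -> dict[str, str]:
    """Map each query to its highest-scoring subject (first one wins on ties)."""
    best: dict[str, tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> set[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if backward.get(b) == a}

# Hypothetical hits (IDs and scores invented for illustration).
pf_vs_py = [("PF_A", "PY_1", 250.0), ("PF_A", "PY_2", 120.0),
            ("PF_B", "PY_2", 90.0), ("PF_C", "PY_3", 60.0)]
py_vs_pf = [("PY_1", "PF_A", 245.0), ("PY_2", "PF_A", 100.0),
            ("PY_2", "PF_B", 110.0), ("PY_3", "PF_D", 80.0), ("PY_3", "PF_C", 55.0)]
print(sorted(reciprocal_best_hits(pf_vs_py, py_vs_pf)))
```

Here PF_C's best hit is PY_3, but PY_3's best hit is PF_D, so that pair is rejected: a one-directional best hit is not enough.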
PCR verification of newly annotated genes
| Py gene ID | Forward primer | Reverse primer | Genom | No. of introns | cDNA |
|---|---|---|---|---|---|
| PY_PFL0415w | CACAATCGTTATGCGAAAATG | TCCCATGCTCTATTATCTTTGG | 356 | 0 | 356 |
| PY_PF14_0205 | AAAAACCTTCATTTTATTTTATCTCCA | TTTTTGAGAGTGACTTTGAATGC | 651 | 1 | 317ᵃ |
| PY_PF14_0623 | GCGAACTTAACGGGATCTCA | GCACACCGATCCTTTCTCTT | 1388 | 4 | 756 |
| PY_PF14_0612 | CGGCGGGCTTATATTAAAAA | AGCAGCTCGTAATGCATCCT | 809 | 3 | 310ᵃ |
| PY_PFD0775c | CCCCCAAAGATTTGTCTGAA | TGCTCCTAAAACATTTCCCATA | 1775 | 4 | 1415ᵃ |
| PY_PFB0885w | TGGGTAAGTTTAAAGCGATTTTT | ACTTTTGAGTTAGGCCCTTTTT | 516 | 1 | 175ᵃ |
| 1. PY_PFB0505cᵇ | TGGTCATTCTTATCCTTCACATGAA | ACATTCCAGCCCCAAAACC | ∼1500 | ∼5 | ∼900ᵃ |
| 2. PY_PFB0505cᵇ | GATAATTTAGATGCCCCAAACCAA | ACATTCCAGCCCCAAAACC | 645 | 2 | 350 |
| 3. PY_PFB0505cᵇ | AAACACATCAGCAGCTTCAATACC | ACATTCCAGCCCCAAAACC | 252 | 1 | 92ᵃ |
ᵃ Multiple PCR products were observed from cDNA, suggesting alternative splicing.
ᵇ Three different forward primers were used to verify the intron positions and to confirm that the gene spans two contigs (see text).
Genom: expected product size (bp) from genomic DNA; cDNA: expected product size (bp) from cDNA.
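Because each primer pair spans the predicted introns, the difference between the expected genomic and cDNA product sizes equals the total predicted intron length between the primers, and an intronless gene should give identical products. A small sketch over the rows with exact sizes (the approximate primer set 1 row for PY_PFB0505c is omitted); gene IDs and sizes come from the table, while the helper name is illustrative. Observed products can differ where alternative splicing occurs (footnote a).

```python
# (gene, expected gDNA product bp, predicted introns, expected cDNA product bp)
PCR_ROWS = [
    ("PY_PFL0415w", 356, 0, 356),
    ("PY_PF14_0205", 651, 1, 317),
    ("PY_PF14_0623", 1388, 4, 756),
    ("PY_PF14_0612", 809, 3, 310),
    ("PY_PFD0775c", 1775, 4, 1415),
    ("PY_PFB0885w", 516, 1, 175),
    ("PY_PFB0505c (primer set 2)", 645, 2, 350),
    ("PY_PFB0505c (primer set 3)", 252, 1, 92),
]

def spliced_out(gdna_bp: int, cdna_bp: int) -> int:
    """Total intron length removed by splicing between the two primers."""
    return gdna_bp - cdna_bp

for gene, gdna, n_introns, cdna in PCR_ROWS:
    removed = spliced_out(gdna, cdna)
    # Intronless genes must give the same product size from gDNA and cDNA.
    assert (removed == 0) == (n_introns == 0), gene
    print(f"{gene}: {removed} bp of intron sequence across {n_introns} intron(s)")
```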
Fig. 3. PCR products confirming the expression of newly annotated genes. Selected predicted genes were amplified with flanking oligonucleotide primers from genomic DNA (gDNA, G) and from reverse-transcribed mRNA of mixed asexual stages (cDNA, C).