Fangli Lu, Hongying Jiang, Jinhui Ding, Jianbing Mu, Jesus G Valenzuela, José M C Ribeiro, Xin-zhuan Su.
Abstract
BACKGROUND: The completion of the Plasmodium falciparum genome represents a milestone in malaria research. The genome sequence allows for the development of genome-wide approaches such as microarray and proteomics that will greatly facilitate our understanding of the parasite biology and accelerate new drug and vaccine development. The design and application of these genome-wide assays, however, require accurate information on gene prediction and genome annotation. Unfortunately, the genes in the parasite genome databases were mostly identified using computer software that could make some erroneous predictions.
Year: 2007 PMID: 17662120 PMCID: PMC1978503 DOI: 10.1186/1471-2164-8-255
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Diagram of the 14 P. falciparum chromosomes showing positions of potentially expressed genes. Expressed sequence tags (EST) from our libraries or from public databases were assembled against predicted coding sequences in PlasmoDB; genes that matched our EST only (green), EST already in public databases (red), or both (yellow) are displayed according to gene order on the chromosomes. Those in white are CDS that were not covered by any EST. Approximately 70% of the 5485 predicted CDS were matched with one or more EST.
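The "approximately 70%" figure can be checked with simple arithmetic. The sketch below assumes that the 3862 EST-matched genes reported in the Figure 4 caption are the same set counted against the 5485 predicted CDS in Figure 1; the record does not state this explicitly, and the function name is illustrative only.

```python
# Counts copied from the figure captions in this record.
PREDICTED_CDS = 5485  # predicted coding sequences (Figure 1 caption)
EST_MATCHED = 3862    # genes matched by EST (Figure 4 caption); assumed to be the same set


def coverage_percent(matched: int, total: int) -> float:
    """Return matched/total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * matched / total


coverage = coverage_percent(EST_MATCHED, PREDICTED_CDS)
print(f"{coverage:.1f}% of predicted CDS matched by at least one EST")
```

Under that assumption the ratio comes out just above 70%, in line with the caption.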
Predicted coding regions that were covered fully by cDNA and their mismatched introns
| Ch | No. genes | Mis intr | New intr | Lost intr | Size change | AS intr |
| 1 | 11 | 1 | 1 | 0 | 0 | 0 |
| 2 | 18 | 6 | 7 | 0 | 1 | 1 |
| 3 | 14 | 4 | 5 | 0 | 0 | 1 |
| 4 | 14 | 2 | 0 | 1 | 0 | 1 |
| 5 | 23 | 4 | 5 | 0 | 1 | 0 |
| 6 | 17 | 3 | 4 | 0 | 0 | 0 |
| 7 | 14 | 1 | 1 | 0 | 0 | 0 |
| 8 | 14 | 6 | 7 | 0 | 1 | 1 |
| 9 | 24 | 4 | 4 | 0 | 1 | 0 |
| 10 | 23 | 5 | 5 | 0 | 0 | 1 |
| 11 | 39 | 13 | 16 | 1 | 0 | 1 |
| 12 | 22 | 2 | 1 | 0 | 1 | 0 |
| 13 | 61 | 17 | 20 | 1 | 3 | 6 |
| 14 | 62 | 17 | 24 | 2 | 2 | 2 |
| Total | 356 | 85 | 100 | 5 | 10 | 14 |
Ch, chromosome; No. genes, numbers of genes; Mis intr, numbers of genes with introns not matching those predicted; New intr, numbers of new introns found; Lost intr, numbers of introns that may not exist as predicted; Size change, numbers of introns with sizes that do not match those predicted; AS intr, numbers of introns that may be alternatively spliced. Most known genes are housekeeping genes, consistent with expression profiles.
Genes having introns that do not match those predicted in public databases
| Ch | No Genes | New intr | Lost intr | Larger intr | Smaller intr | AS intr | Antisense |
| 1 | 7 | 4 | 3 | 1 | 3 | 0 | 0 |
| 2 | 14 | 11 | 3 | 7 | 2 | 4 | 1 |
| 3 | 10 | 1 | 3 | 2 | 2 | 2 | 0 |
| 4 | 18 | 2 | 12 | 8 | 4 | 2 | 2 |
| 5 | 17 | 5 | 3 | 8 | 6 | 2 | 1 |
| 6 | 18 | 4 | 9 | 10 | 4 | 0 | 0 |
| 7 | 13 | 6 | 6 | 6 | 3 | 2 | 1 |
| 8 | 4 | 1 | 1 | 1 | 0 | 0 | 0 |
| 9 | 3 | 1 | 0 | 3 | 0 | 0 | 0 |
| 10 | 41 | 18 | 18 | 12 | 12 | 5 | 1 |
| 11 | 46 | 46 | 10 | 17 | 7 | 3 | 2 |
| 12 | 37 | 6 | 13 | 10 | 7 | 5 | 4 |
| 13 | 36 | 3 | 29 | 10 | 8 | 5 | 0 |
| 14 | 41 | 44 | 12 | 19 | 10 | 9 | 0 |
| Total | 305 | 152 | 122 | 114 | 68 | 39 | 12 |
Ch, chromosome; No Genes, numbers of genes; New intr, numbers of genes with new introns; Lost intr, numbers of genes with predicted introns that were not confirmed with cDNA; Larger intr, numbers of genes with introns larger than predicted; Smaller intr, numbers of genes with introns smaller than predicted; AS intr, introns potentially alternatively spliced; Antisense, numbers of antisense transcripts based on the presence of conserved intron splice sites (GT-AG) in antisense orientation.
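The antisense call described in the legend rests on the orientation of the GT-AG splice sites. A minimal sketch of that idea follows; it is not the authors' pipeline. The function names and the toy sequences are hypothetical, and the gap is assumed to be given as read on the plus strand of the genome.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def intron_strand(plus_strand_gap: str) -> str:
    """Classify a gap read on the plus strand by its splice-site orientation.

    Returns '+' if the gap starts with GT and ends with AG as written,
    '-' if its reverse complement does, and 'none' otherwise.
    """
    seq = plus_strand_gap.upper()
    if len(seq) < 4:
        return "none"
    if seq.startswith("GT") and seq.endswith("AG"):
        return "+"
    rc = reverse_complement(seq)
    if rc.startswith("GT") and rc.endswith("AG"):
        return "-"
    return "none"


def looks_antisense(gene_strand: str, plus_strand_gap: str) -> bool:
    """True when the splice sites point opposite to the predicted gene strand."""
    strand = intron_strand(plus_strand_gap)
    return strand in ("+", "-") and strand != gene_strand


# Toy sequences for illustration only, not taken from the paper.
print(intron_strand("GTAAATTTTTAG"))         # GT...AG as written on the plus strand
print(intron_strand("CTAAAAATTTAC"))         # GT...AG only after reverse complement
print(looks_antisense("+", "CTAAAAATTTAC"))  # plus-strand gene, minus-strand splice sites
```

A gap without GT-AG sites in either orientation is left unclassified here, since splice-site orientation alone cannot assign it to a strand.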
PCR verification of selected introns that were alternatively spliced
| Gene ID | Ch | Forward primer | Reverse primer | Gen | Spl | Comments |
| PFB0260w | 2 | TCAAACACACGTTACACCT | ATGACAATACCTTCTAAGG | 242 | 102 | Confirmed |
| PFB0305c | 2 | ACCTTTTGTTAATTATGGA | CCACCTTCTCCTTTTTCG | 255 | 135 | Confirmed |
| PFB0177c | 2 | ACTAATGGTAGAATAGGTG | TTTCTCCATTTTGTATATCG | 373 | 161 | Confirmed |
| PFB0535w | 2 | CAAAGATAAAATGGTAATGTT | ATTCCTATTATAGTGTGTGT | 507 | 297/243 | No 243-bp band |
| PFC0371w | 3 | CCTACCTTCTATTTACAAAT | ACTTGTTGCTCTGATATAAT | 301 | 204 | No 204-bp band |
| PFD0810w | 4 | GCTGTGAAAAAAGAAAACAA | TTGTTTTCTTTTTTCACAGC | 320 | 174 | No products |
| PFD0895c | 4 | TTGATAACAATCCTTTAAGC | AATTCGTAATAATCATCTCC | 374 | 205 | Confirmed |
| PFE1540w | 5 | GATCCTGAAATTGTTTGTG | ATGGCCAAAATGTTTCACA | 393 | 328/283/210 | Confirmed |
| MAL8P1.81 | 8 | GCTGACATATTTATCTTATG | CATATAAGTATTCATGCATG | 303 | 147 | Confirmed |
| PF10_0096 | 10 | ATATTATCGATATTGTCTATATTC | CTTGCTTTGTTTGGCTTCCA | 441 | 182 | Confirmed/antisense |
| PF10_0170 | 10 | TATATTTGTCCTCAGTGC | CTTCCATATCAGATGCCA | 300 | 135 | 90-bp band, not 135 |
| PF10_0017 | 10 | GGATAAATAGTTTTTTGCTT | CTCAGACAATGTACGCATA | 410 | 263 | Confirmed |
| PF10_0117 | 10 | ATTGGAATTTAACTAGCAAC | TTCATAAGAGTGTTGTTCG | 330 | 134 | Confirmed |
| PF10_0213 | 10 | GGTGCGAATAATAAAGTAG | CTACTTTGTTATTATCTCC | 349 | 229 | Add'l 150-bp band |
| PF10_0247 | 10 | AATTACAAACAATTTGAGGG | TTCATTTTTCAAAAATGCGG | 383 | 152 | No 383-bp band |
| PF10_0258 | 10 | AAAGACGAGGAACTTAATAC | CTCTGATTCTTTTATGAAAG | 270 | 150 | Confirmed |
| PF10_0415 | 10 | CACCAATTTATAAAAGAAGAA | GGCAATAAAAAAGCCTGTTA | 370 | 183 | Confirmed |
| PF11_0292 | 11 | AAGATGACCAACAAGAAGAA | TTATAGTACTCAATAACCTG | 340 | 153 | Confirmed |
| PF11_0377 | 11 | CCGAAAAGGATAAGAAGAAG | TGATTATATGCTGCATATAC | 1425 | 168 | No 1425-bp band |
| PF11_0167 | 11 | TAAGAAATTATGTTCCCAAT | TTTTTCTCCTACACAAGTGC | 354 | 152 | Confirmed |
| PF11_0405 | 11 | TGAACTTAATACACATACGT | ACAGTATCTGAAGGATCTGT | 201 | 130 | No 130-bp band |
| PFL0020w | 12 | TTCGATATATCATTCCATTC | AAACAGCTACTAGTTGTCC | 261 | 78 | No 78-bp band |
| PFL0290w | 12 | CTTTATATTATCCAACAACAC | TTGTAATTACTTATAGGAGC | 454 | 167 | No 167-bp band |
| PFL0580w | 12 | GATGCAATATTAGGTAGACT | ACTAAAGATTAGGTTAACAC | 294 | 193 | Confirmed |
| PFL0890c | 12 | GAAATGCTCAACAAATTTGA | ACAGATATTATGGGAATTTC | 255 | 130 | Confirmed |
| MAL13P1.130 | 13 | GTATCCAGAAATATTTTTTAC | GTATCAAAAATCCAACACGTA | 800 | 303 | No 303-bp band |
| MAL13P1.183 | 13 | CTCCTAGAAATCCTAGATAT | GACTATGCAGTTTTTTTTATC | 315 | 311 | Add'l ~90-bp band |
| MAL13P1.51 | 13 | CATTTATTGAATGCTCAGC | GTAGTAATATTCTCTCCTG | 180 | 46 | Confirmed |
| MAL13P1.80 | 13 | CCAAAAAAGGACCTAATAAA | TATATATATGCACACGACAT | 376 | 219/150 | No 150-bp band |
| PF13_0082 | 13 | CGAAGTGACAAAAAAAAGGA | CAGAATTTTTCCTATTATCG | 294 | 118 | Confirmed |
| PF13_0224 | 13 | CTGATTTGTTTTTTCAACAAT | GAGTTATCTATTTTTTTAACC | 351 | 152 | Confirmed |
| MAL13P1.195 | 13 | GAAAATGTCTGTCTTGTCAA | GCGTTCATATCGTCAAAAGA | 297 | 179 | Confirmed |
| MAL13P1.253 | 13 | TTTTTACGAACAAAACGGTT | CTTTTGTTTGATCTAATACC | 215 | 117 | Confirmed |
| PF13_0220 | 13 | AGTCATATCAAAAAATAGCT | GTACTTGTCTGATCTTTCTT | 284 | 123/167 | Confirmed |
| PF13_0301 | 13 | AAAAATGAATGGAGTCCAGC | GCTGTTTTTAAATAAAGGGA | 243 | 146 | Confirmed |
| PF14_0434 | 14 | GGATAGAAGAAACTATAACC | ATGCTATCATACTTACTGG | 206 | 104 | Confirmed |
| PF14_0779 | 14 | CCTGATATGCGTGAAATT | TTTTTTCAATATTGTCGTACC | 525 | 90 | No 525-bp band |
| PF14_0338 | 14 | AAAACAAGAATTTATCACGG | GATTCATTCCTGAATGGTCT | 727 | 116 | Confirmed |
| PF14_0488 | 14 | AAAAAAAGGTCTACAAAAGC | TTGTTAAAATATTCCAAGGC | 230 | 92 | Confirmed |
| PF14_0576 | 14 | GCACAATTTGAAAGAAAATT | ACTCGTGATGTAAATTTTCA | 629 | 230 | No 629-bp band |
| PF14_0787 | 14 | CCTTTATTCATATGTGGAAT | GCAAGAGAAAATGGTTTAATAC | 585 | 120 | Add'l genomic bands |
| PF14_0790 | 14 | GAATAGGAAAATATGCCAAG | GAATTATTACTATTCATCAC | 239 | 111 | Confirmed |
Ch, chromosome; Gen, expected sizes in base pairs from genomic DNA; Spl, expected sizes in base pairs if introns are spliced out. PFL0020w has a cDNA with a 78-bp intron having GT-AG sites inside an ORF, but no spliced band was detected using PCR. Similarly, PFL0290w has a 287-bp gap without GT-AG sites; this gap was not confirmed. Primers for PFE1540w cover two introns, so multiple forms were expected. Indeed, three transcripts of different sizes were found in one of the introns.
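The Gen and Spl columns together imply how much sequence is removed by splicing: the difference between the genomic and spliced amplicon sizes. The sketch below works this out for three rows copied from the table above. The class and function names are illustrative, and rows listing several spliced sizes (for example 297/243) are left out for simplicity.

```python
from typing import NamedTuple


class Amplicon(NamedTuple):
    gene_id: str
    genomic_bp: int  # "Gen" column: expected product from genomic DNA
    spliced_bp: int  # "Spl" column: expected product if introns are spliced out


# Three rows copied from the PCR verification table.
ROWS = [
    Amplicon("PFB0260w", 242, 102),
    Amplicon("PFL0290w", 454, 167),
    Amplicon("PF14_0338", 727, 116),
]


def removed_bp(amplicon: Amplicon) -> int:
    """Length of sequence absent from the spliced product (Gen minus Spl)."""
    if amplicon.spliced_bp > amplicon.genomic_bp:
        raise ValueError("spliced product cannot exceed the genomic product")
    return amplicon.genomic_bp - amplicon.spliced_bp


for row in ROWS:
    print(f"{row.gene_id}: {removed_bp(row)} bp removed")
```

For PFL0290w the difference is 454 - 167 = 287 bp, which agrees with the 287-bp gap mentioned in the legend.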
Figure 2. PCR products confirming alternatively spliced introns. Oligonucleotide primers flanking selected predicted introns that might be alternatively spliced were used to amplify products from genomic DNA (G lanes), reverse-transcribed mRNA of mixed asexual stages (C lanes), and mRNA controls of mixed asexual stages (without reverse transcriptase, R lanes). Genes with alternatively spliced introns are as marked; M, 100 bp DNA ladder. Note that more than two bands were amplified from PFE1540w, PF13_0220, and PF13_0224.
Figure 3. Diagram of exon/intron structures of predicted gene PFL1420w and cDNA contigs covering the gene. FC (forward contig) is a sense transcript with an intron matching the predicted intron. RC (reverse contig) is an antisense transcript having a smaller intron with GT-AG sites in the opposite direction. The line on top represents plus strand genomic DNA. Dashed lines are introns; heavy lines are predicted exons or ORF.
Figure 4. Functional categories of expressed genes covered by all EST. A total of 3862 genes matched by EST were sorted according to GO molecular functions with P values < 0.0001 on sequence matches. The majority of the genes encode housekeeping proteins involved in DNA/RNA and protein binding, enzyme catalytic activities, transcription, translation, signal transduction, and transport activities.