Abhinay Ramaprasad1, Tobias Mourier2, Raeece Naeem1, Tareq B Malas1, Ehab Moussa1, Aswini Panigrahi3, Sarah J Vermont4, Thomas D Otto5, Jonathan Wastling4, Arnab Pain1.
Abstract
Toxoplasma gondii is an important protozoan parasite that infects all warm-blooded animals and causes opportunistic infections in immunocompromised humans. Its closest relative, Neospora caninum, is an important veterinary pathogen that causes spontaneous abortion in livestock. Comparative genomics of these two closely related coccidians has been of particular interest for identifying genes that contribute to differences in host cell specificity and disease. Here, we describe a manual evaluation of these genomes based on strand-specific RNA sequencing and shotgun proteomics from the invasive tachyzoite stages of the two parasites. We have corrected the predicted structures of over one third of the previously annotated gene models and have annotated untranslated regions (UTRs) in over half of the predicted protein-coding genes. We observe distinctly long UTRs in both organisms, almost four times longer than those of other model eukaryotes. We have also identified a putative set of cis-natural antisense transcripts (cis-NATs) and long intergenic non-coding RNAs (lincRNAs). We have significantly improved the annotation quality of these genomes, which should serve as a manually curated dataset for the Toxoplasma and Neospora research communities.
Year: 2015 PMID: 25875305 PMCID: PMC4395442 DOI: 10.1371/journal.pone.0124473
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Summary of the manual curation of TgVEG and NcLIV (ToxoDB v8.0) genes.
A gene model was “corrected” by adding/deleting exons or altering their exon-intron boundaries to conform to the transcript and peptide evidence. The corrected genes also include the models that were either “split” into two separate genes or “merged” into a single gene based on transcript splice-site evidence. “New” genes were annotated in open reading frames with clear expression evidence. Genes that lacked expression evidence and overlapped with an expressed gene model were considered spurious and “deleted”.
Fig 2. Qualitative and quantitative assessment of manually re-evaluated genomes.
(A) Qualitative assessment of manual curation. We took the confidence scores of Pfam domain hits (Pfam database v26.0) as a rough indicator of annotation quality and compared the e-values of Pfam domains before and after curation. Repetitive domains were omitted, and the copy with the lowest e-value was used for comparison. A 10-fold change in e-value was considered significant. While the e-values of Pfam domains in unchanged genes remain the same, we find a general increase in Pfam domain hit significance (decreasing e-values) after our curation (orange). We also find new Pfam domain hits appearing in the corrected genes (green) and in newly created genes (red). (B) Quantitative assessment of manual curation. The functional domain content of a genome is a crude indicator of annotation quality; we therefore compared the proportion of genes having a domain hit from InterProScan (grey) with the proportion of genes without any domains (black). We find a ~20% increase in genes with functional domains after curation in the TgVEG genome and a ~10% increase in the NcLIV genome.
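The comparison described in panel (A) can be sketched in a few lines. This is a minimal illustration only, not the authors' pipeline: the function names, the category labels and the input layout (a list of `(domain, e-value)` pairs per gene) are assumptions made for the example. Only the two rules taken from the caption are fixed: keep the lowest e-value per domain, and treat a change of at least 10-fold as significant.

```python
from typing import Dict, Iterable, Optional, Tuple


def best_evalue_per_domain(hits: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Collapse repeated domain hits, keeping the lowest e-value per domain."""
    best: Dict[str, float] = {}
    for domain, evalue in hits:
        if domain not in best or evalue < best[domain]:
            best[domain] = evalue
    return best


def classify_evalue_change(before: Optional[float],
                           after: Optional[float],
                           fold: float = 10.0) -> str:
    """Label a domain hit by comparing its e-value before and after curation.

    `None` means the domain was not hit in that annotation version.
    The labels are hypothetical names chosen for this sketch.
    """
    if before is None and after is None:
        return "no_hit"
    if before is None:
        return "new_hit"
    if after is None:
        return "lost_hit"
    if before == after:
        return "unchanged"
    if after * fold <= before:   # e-value dropped at least 10-fold
        return "improved"
    if before * fold <= after:   # e-value rose at least 10-fold
        return "worsened"
    return "unchanged"


# Hypothetical hits for one gene, before and after curation.
old_hits = best_evalue_per_domain([("PF00069", 1e-5), ("PF00069", 1e-3)])
new_hits = best_evalue_per_domain([("PF00069", 1e-20), ("PF00847", 1e-8)])
for domain in sorted(set(old_hits) | set(new_hits)):
    print(domain, classify_evalue_change(old_hits.get(domain), new_hits.get(domain)))
```

Because lower e-values indicate more significant hits, "improved" here corresponds to the decreasing e-values highlighted in the caption.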
Fig 3. Long untranslated regions and putative antisense non-coding RNAs in TgVEG and NcLIV.
(A) Length distribution of 5’UTRs, 3’UTRs and CDS in Toxoplasma gondii, Neospora caninum, Schizosaccharomyces pombe, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens. 5’UTRs are strikingly long in the parasites, almost 4 times longer than in the other eukaryotes. 3’UTRs are comparable to those in human and longer than those in the other eukaryotes. (B) Sequence conservation across UTRs and their flanking intergenic regions. UTRs are generally more conserved than their flanking intergenic regions. (C) Log abundance ratio of each antisense non-coding RNA (ancRNA) to its paired sense coding mRNA, plotted against the abundance of the sense coding mRNA. There is an inverse relationship between the abundance of an ancRNA and that of its sense mRNA counterpart.
Comparison of UTR sizes of specific gene families in T. gondii VEG and N. caninum LIV.
| Organism | Gene group | Number of genes with an annotated UTR | 5'UTR median size | 5'UTR average size | 5'UTRs larger than genome median (b) | 5'UTRs smaller than genome median | 5'UTR probability (a) | 3'UTR median size | 3'UTR average size | 3'UTRs larger than genome median (b) | 3'UTRs smaller than genome median | 3'UTR probability (a) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | |
| | All genes | 4232 | 857 | 969.2 | - | - | | 741.5 | 934.5 | - | - | |
| | Rhoptry proteins | 30 | 926 | 1031.5 | 16 | 14 | | 615 | 888.8 | 14 | 16 | |
| | Microneme related | 15 | 716 | 697.1 | 7 | 8 | | 609 | 880.5 | 6 | 9 | |
| | Dense granules | 5 | 448 | 720.6 | 1 | 4 | | 370 | 1224.6 | 2 | 3 | |
| | AP2 transcription factors | 46 | 1445 | 1644.3 | 37 (28) | 9 | 2 × 10⁻⁵ | 1490.5 | 1663.9 | 38 (28) | 8 | 4.6 × 10⁻⁶ |
| | | | | | | | | | | | | |
| | All genes | 4333 | 816 | 919.9 | - | - | | 755 | 902.4 | - | - | |
| | Rhoptry proteins | 29 | 792 | 949.5 | 14 | 15 | | 857 | 1006.9 | 16 | 13 | |
| | Microneme related | 13 | 392 | 485.2 | 2 | 11 | 0.011 | 653 | 781.9 | 6 | 7 | |
| | Dense granules | 4 | 271.5 | 630.2 | 1 | 3 | | 279 | 338.8 | 1 | 3 | |
| | AP2 transcription factors | 48 | 1465.5 | 1644.3 | 41 (28) | 7 | 3.1 × 10⁻⁷ | 1438 | 1591.4 | 41 (28) | 7 | 3.1 × 10⁻⁷ |
For each gene family, the median UTR size of the genes with an annotated UTR was calculated and compared against the median UTR size of all genes with an annotated UTR in the genome. Probability values are shown wherever a significant difference was found. AP2 transcription factors have significantly longer UTRs than the rest of the genes.
a Probability that the observed number of UTRs larger (or smaller) than the median size for all genes would arise just by chance, assuming a binomial distribution.
b Numbers within brackets are the number of TgVEG-NcLIV orthologs.
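The probabilities described in footnote a can be sanity-checked with a short calculation. The sketch below assumes a one-sided tail under a binomial model in which each UTR is equally likely to fall above or below the genome-wide median (p = 0.5). The source states only "a binomial distribution", so the one-sidedness, the value of p and the function name `binomial_tail` are assumptions of this example, not the authors' documented procedure.

```python
from math import comb


def binomial_tail(n_larger: int, n_smaller: int) -> float:
    """One-sided probability of a split at least as lopsided as observed.

    Assumes X ~ Binomial(n, 0.5), where n is the number of genes in the
    family with an annotated UTR, and returns P(X <= k) with k the size of
    the smaller group.
    """
    n = n_larger + n_smaller
    k = min(n_larger, n_smaller)
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n


# Counts taken from the table rows that report a probability.
observed = {
    "AP2 5'UTR, first block (37 vs 9)": (37, 9),
    "AP2 3'UTR, first block (38 vs 8)": (38, 8),
    "Microneme 5'UTR, second block (2 vs 11)": (2, 11),
    "AP2 UTRs, second block (41 vs 7)": (41, 7),
}
for label, (larger, smaller) in observed.items():
    print(f"{label}: {binomial_tail(larger, smaller):.2g}")
```

Under these assumptions, `binomial_tail(37, 9)` comes out at roughly 2 × 10⁻⁵ and `binomial_tail(2, 11)` at roughly 0.011, in line with the values listed in the table.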