Abdel Belkorchia, Cyrielle Gasc, Valérie Polonais, Nicolas Parisot, Nicolas Gallois, Céline Ribière, Emmanuelle Lerat, Christine Gaspin, Jean-François Pombert, Pierre Peyret, Eric Peyretaillade.
Abstract
The proper prediction of the gene catalogue of an organism is essential to obtain a representative snapshot of its overall lifestyle, especially when it is not amenable to culturing. Microsporidia are obligate intracellular, sometimes hard to culture, eukaryotic parasites known to infect members of every animal phylum. To date, sequencing and annotation of microsporidian genomes have revealed a poor gene complement with highly reduced gene sizes. In the present paper, we investigated whether such gene sizes may have induced biases in the methodologies used for genome annotation, with an emphasis on small coding sequence (CDS) gene prediction. Using better delineated intergenic regions from four Encephalitozoon genomes, we predicted de novo new small CDSs with sizes ranging from 78 to 255 bp (median 168) and corroborated these predictions by RACE-PCR experiments in Encephalitozoon cuniculi. Most of the newly found genes are present in other distantly related microsporidian species, suggesting their biological relevance. The present study provides a better framework for annotating microsporidian genomes and for training and evaluating new computational methods dedicated to detecting ultra-small genes in various organisms.
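To make the size window in the abstract concrete, here is a minimal Python sketch of a six-frame scan for ATG-to-stop open reading frames of 78 to 255 bp in an intergenic sequence. It is not the authors' prediction pipeline: the function names, the use of the standard stop codons, and the choice to count the stop codon in the length are all assumptions made for illustration.

```python
from typing import Iterator, List, NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}          # assumption: standard stop codons
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class SmallOrf(NamedTuple):
    strand: str   # "+" or "-"
    start: int    # 0-based offset of the ATG on the scanned strand
    length: int   # length in bp, stop codon included (assumption)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def scan_strand(seq: str, strand: str, min_bp: int, max_bp: int) -> Iterator[SmallOrf]:
    """Yield ATG..stop ORFs within the size window, in all three frames."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start
                if min_bp <= length <= max_bp:
                    yield SmallOrf(strand, start, length)
                start = None                   # look for the next ORF


def find_small_orfs(region: str, min_bp: int = 78, max_bp: int = 255) -> List[SmallOrf]:
    """Scan both strands of a (hypothetical) intergenic region."""
    region = region.upper()
    hits = list(scan_strand(region, "+", min_bp, max_bp))
    hits += scan_strand(reverse_complement(region), "-", min_bp, max_bp)
    return hits


if __name__ == "__main__":
    toy_region = "CC" + "ATG" + "GCT" * 24 + "TAA" + "GG"   # synthetic example
    for orf in find_small_orfs(toy_region):
        print(orf)
```

A scan like this only produces candidates. The study additionally relies on transcriptional signals, conservation in other species, and RACE-PCR to support its predictions.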
Year: 2015 PMID: 26421846 PMCID: PMC4589312 DOI: 10.1371/journal.pone.0139075
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Example of the genomic context of previously annotated genes and newly identified sCDSs in Encephalitozoon genomes.
The transcriptional signals of the newly predicted genes are highlighted in red (promoter signal) and green (polyadenylation signal), respectively. The putative polyadenylation signals of the genes flanking the new sCDSs are highlighted in light blue.
Predicted small protein-coding gene orthologs in the four Encephalitozoon species.
Orthologs in other microsporidian genomes were predicted using PSI-BLAST and manual validation. Additional functional inferences were performed using InterProScan 5 (conserved amino acid motifs), TMHMM (transmembrane helices) and SignalP (signal peptides). Bold: Genes present in independent RNA-Seq datasets [48].
| Locus tag (E. cuniculi) | Locus tag (E. intestinalis) | Locus tag (E. hellem) | Locus tag (E. romaleae) | Gene product size (aa) | Microsporidian species with orthologs | AAATTT or Adenine/Thymine rich signals for | Interpro domain | TMHMM | SignalP |
|---|---|---|---|---|---|---|---|---|---|
| ECU01_1065 | Eint_010975 | EHEL_010945 | EROM_010865 | 73 | Th | | | | |
| ECU02_0235 | Eint_020155 | EHEL_020165 | EROM_020145 | 57 | Ea, Na, Nb, Nc, Th | | | | |
| ECU02_0425 | Eint_020355 | EHEL_020345 | EROM_020335 | 57 | + (1) | | | | |
| ECU02_0885 | Eint_020835 | EHEL_020805 | EROM_020795 | 59 | Oc | + (1) | | | |
| ECU04_0123 | Eint_040045 | EHEL_040035 | EROM_040065 | 55 | Oc | + (1) | | | |
| ECU04_0152 | Eint_040082 | EHEL_040072 | EROM_040102 | 54 | Th, Vc, Np | | | | |
| ECU04_1622 | Eint_041635 | EHEL_041595 | EROM_041652 | 28 | Na, Nb, Aa, Ea, Nc, Th, Vc, Oc, Vco | | | | |
| ECU04_1635 | — | — | EROM_041665 | 55 | | | | | |
| ECU05_0115 | Eint_050105 | EHEL_050165 | EROM_050085 | 65 | | | | | |
| ECU05_1185 | Eint_051235 | EHEL_051295 | EROM_051225 | 51 | Nc, Ea, Na, Nb, Oc | | | | |
| ECU05_1275 | Eint_051335 | EHEL_051395 | EROM_051335 | 42 | Oc | | | | |
| ECU06_0285 | Eint_060185 | EHEL_060205 | EROM_060195 | 33 | Eb, Na, Nb, Nc, Ea | + (1) | | | |
| ECU07_0862 | Eint_070802 | EHEL_070832 | EROM_070812 | 41 | | | | | |
| ECU07_1385 | Eint_071345 | EHEL_071365 | EROM_071325 | 84 | Aa, Ea, Eb | | IPR024766 | | |
| ECU07_1645 | Eint_071493 | EHEL_071625 | EROM_071565 | 69 | | | | | |
| ECU07_1775 | Eint_071493 | EHEL_071755 | EROM_071695 | 75 | + | + (1) | | | |
| ECU09_0465 | Eint_090475 | EHEL_090465 | EROM_090475 | 42 | | | | | |
| ECU09_1665 | Eint_091675 | EHEL_091675 | EROM_091655 | 49 | Aa, Sl, Th, Vc, Oc | | | | |
| ECU09_1755 | Eint_091775 | EHEL_091775 | EROM_091755 | 43 | Nc, Oc, Na, Ea | | | | |
| ECU11_0185 | Eint_110055 | EHEL_110065 | EROM_110055 | 42 | Nb, Nc, Eb | | | | |
| ECU11_0525 | Eint_110375 | EHEL_110395 | EROM_110385 | 25 | | | | | |
| ECU11_0575 | Eint_110425 | EHEL_110445 | EROM_110435 | 49 | Oc | | | | |
| ECU11_1175 | Eint_111055 | EHEL_111055 | EROM_111055 | 84 | Oc | | | | |
a Previously predicted
(1) Accession numbers, positions and locus tags are listed in the S3 Table.
Aa (Anncaliia algerae); Ea (Edhazardia aedis); Eb (Enterocytozoon bieneusi); Ht (Hamiltosporidium tvaerminnensis); Oc (Ordospora colligata); Na (Nosema apis); Nb (Nosema bombycis); Nc (Nosema ceranae); Np (Nematocida parisii); Sl (Spraguea lophii); Th (Trachipleistophora hominis); Vc (Vavraia culicis); Vco (Vittaforma cornae)
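The product sizes in the table can be related to the CDS lengths quoted in the abstract (78 to 255 bp). The short Python sketch below does the conversion under one stated assumption: that the reported CDS length counts the stop codon, so a product of n amino acids corresponds to 3 × (n + 1) bp. The function name is hypothetical. Under that assumption, the smallest and largest product sizes among the rows listed here (25 and 84 aa) map to 78 and 255 bp.

```python
def cds_length_bp(protein_aa: int, include_stop: bool = True) -> int:
    """Convert a protein length in amino acids to a CDS length in bp.

    Assumption: the CDS length includes the stop codon unless
    include_stop is set to False.
    """
    codons = protein_aa + (1 if include_stop else 0)
    return 3 * codons


# Product sizes copied from the table rows above.
sizes_aa = {"ECU11_0525": 25, "ECU07_1385": 84, "ECU11_1175": 84}

for locus, aa in sizes_aa.items():
    print(f"{locus}: {aa} aa -> {cds_length_bp(aa)} bp")
```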
Fig 2. Validation example of the newly predicted orthologs using both protein and nucleotide sequence alignments.
Protein and nucleotide alignments were performed using MUSCLE and Clustal Omega, respectively.
Fig 3. Identification of the 5' and 3' maturation sites of the newly predicted small CDSs.
Translation initiation codons and stop codons are highlighted in light grey for all genes. Putative polyadenylation signals are underlined and shown in bold. Distances between putative polyadenylation signals and polyadenylation sites are indicated in parentheses. Putative microsporidian-specific promoter signals, located upstream of the transcription start sites, are highlighted in dark grey. For brevity, the complete CDS sequences were not included and are represented instead by the corresponding gene names. ND: not defined.
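The distance reported in parentheses in this figure can be illustrated with a small Python sketch. The figure does not say how the distance was measured, so the code makes several assumptions: the signal is taken to be the exact hexamer AAATTT (the table header also mentions adenine/thymine-rich signals, which are not handled here), the polyadenylation site is given as a 0-based index into the sequence, and the distance is counted as the number of bases between the end of the signal and the site. The function name and the example sequence are hypothetical.

```python
from typing import Optional


def signal_to_site_distance(seq: str, polya_site: int, motif: str = "AAATTT") -> Optional[int]:
    """Distance in bases from the closest upstream motif to the polyadenylation site.

    Returns None when the motif does not occur entirely upstream of the site.
    Assumption: distance = bases between the motif end and the site index.
    """
    pos = seq.upper().rfind(motif, 0, polya_site)   # last full match before the site
    if pos == -1:
        return None
    return polya_site - (pos + len(motif))


# Synthetic 3' flanking sequence, for illustration only.
flank = "GGCAAATTTGCGCGCATCG"
print(signal_to_site_distance(flank, polya_site=15))
```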
Fig 4. Phylogenetic distribution of the newly predicted small protein-coding genes across 17 sequenced microsporidian species.
Left: The HKY85 maximum likelihood phylogenetic tree shown here is derived from the small ribosomal RNA-encoding gene. Bootstrap support for each cluster is indicated on the corresponding node; only bootstrap values greater than 50% are shown. Right: The presence and absence of the newly identified sCDSs in the corresponding species are denoted by filled and empty circles, respectively. The two grey circles indicate genes that fall within unsequenced regions in the E. intestinalis and E. hellem genomes and whose presence could not be confirmed. Locus names of the new sCDSs (on top) are derived from the E. cuniculi accessions.
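A presence/absence layout like the right-hand panel can be derived from the ortholog column of the table. The Python sketch below is only an illustration of that bookkeeping, not the authors' plotting code. It covers the 13 abbreviated species from the table key (not the four Encephalitozoon genomes), uses a few rows copied from the table, and its function name and species ordering are assumptions.

```python
from typing import Dict, List

# Abbreviations from the table key, in the order they are listed there.
SPECIES = ["Aa", "Ea", "Eb", "Ht", "Oc", "Na", "Nb", "Nc", "Np", "Sl", "Th", "Vc", "Vco"]

# Ortholog cells copied from a few table rows (empty string = none listed).
orthologs: Dict[str, str] = {
    "ECU01_1065": "Th",
    "ECU02_0235": "Ea, Na, Nb, Nc, Th",
    "ECU04_1622": "Na, Nb, Aa, Ea, Nc, Th, Vc, Oc, Vco",
    "ECU07_1645": "",
}


def presence_row(cell: str) -> List[bool]:
    """Turn a comma-separated abbreviation list into one boolean per species."""
    found = {abbr.strip() for abbr in cell.split(",") if abbr.strip()}
    unknown = found - set(SPECIES)
    if unknown:
        raise ValueError(f"unknown species abbreviation(s): {sorted(unknown)}")
    return [abbr in found for abbr in SPECIES]


if __name__ == "__main__":
    print(" " * 12 + " ".join(f"{s:<3}" for s in SPECIES))
    for locus, cell in orthologs.items():
        # Filled circle = ortholog listed, empty circle = none listed.
        marks = " ".join(f"{'●' if p else '○':<3}" for p in presence_row(cell))
        print(f"{locus:<12}{marks}")
```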