Larissa Lopes Silva, Marina Marcet-Houben, Laila Alves Nahum, Adhemar Zerlotini, Toni Gabaldón, Guilherme Oliveira.
Abstract
BACKGROUND: Schistosoma mansoni is one of the causative agents of schistosomiasis, a neglected tropical disease that affects about 237 million people worldwide. Despite recent efforts, we still lack a general understanding of the relevant host-parasite interactions, and treatment options are limited by the emergence of resistant strains and the absence of a vaccine. The S. mansoni genome has been completely sequenced and is still under continuous annotation. Nevertheless, more than 45% of the encoded proteins remain without experimental characterization or even functional prediction. To improve our knowledge of the biology of this parasite, we conducted a proteome-wide evolutionary analysis to provide a broad view of the evolution of the S. mansoni proteome and to improve its functional annotation.
Year: 2012 PMID: 23148687 PMCID: PMC3534613 DOI: 10.1186/1471-2164-13-617
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Table 1. Proteomes selected for the phylome reconstruction
| Code | TaxID | Proteins | Source | Date |
| --- | --- | --- | --- | --- |
| MONBE | 81824 | 9,170 | JGI | 2011-06-01 |
| CIOIN | 7719 | 14,048 | UniProt Reference Proteomes | 2011-07-09 |
| NEMVE | 45351 | 24,424 | UniProt Reference Proteomes | 2011-07-09 |
| SCHHA | 6185 | 12,767 | SchistoDB | 2012-03-09 |
| SCHMA | 6183 | 11,103 | SchistoDB | 2012-03-09 |
| SCHJA | 6182 | 12,636 | SchistoDB | 2012-03-09 |
| CAEEL | 6239 | 19,758 | UniProt Reference Proteomes | 2011-07-09 |
| ASCSU | 6253 | 18,430 | WormBase | 2012-03-09 |
| BRUMA | 6279 | 19,916 | WormBase | 2012-03-09 |
| TRISP | 6334 | 15,878 | WormBase | 2012-03-09 |
| DROME | 7227 | 11,794 | FlyBase | 2011-09-13 |
| TRICA | 7070 | 16,533 | BeetleBASE - HGSC | 2011-12-16 |
| HUMAN | 9606 | 20,965 | UniProt Reference Proteomes | 2011-07-09 |
Code: code assigned to each species in the S. mansoni phylome. TaxID: taxonomic identifier at NCBI. Proteins: number of proteins analyzed per species. Source: database from which the protein data were retrieved.
Figure 1. Pipeline used to reconstruct and analyze the phylome. Each protein sequence encoded in the parasite genome was compared against a database of proteins from 12 other fully sequenced eukaryotic proteomes (Table 1) to select putative homologous proteins. Groups of potential homologs were aligned, and the alignments were trimmed to remove gap-rich regions. The refined alignment was used to build a neighbor-joining (NJ) tree, which then served as the “seed” tree for a maximum likelihood (ML) analysis as implemented in PhyML. In the ML analysis, up to five different evolutionary models were tested, and the model that best fit the data was selected by the Akaike Information Criterion (AIC). Different algorithms were used to identify homology relationships and lineage-specific duplications. To extract and interpret the large resulting data set, a Structured Query Language (SQL) relational database was built. This database was the main resource for data mining in this work. Adapted from [39].
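The AIC-based model selection step in the pipeline can be illustrated with a minimal sketch. It uses the standard definition AIC = 2k - 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood; the model with the lowest AIC is preferred. The model names other than WAG, the log-likelihoods, and the parameter counts below are hypothetical placeholders, since this record does not list the five models tested or their fits, and the helper names `aic` and `best_model_by_aic` are illustrative only.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model_by_aic(fits: dict) -> tuple:
    """Return (best_model_name, {model_name: aic_score}).

    `fits` maps a model name to (log_likelihood, n_free_parameters).
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical ML fits for one alignment (values invented for illustration).
fits = {
    "JTT": (-10250.4, 25),
    "WAG": (-10231.7, 25),
    "LG": (-10240.2, 26),
}

best, scores = best_model_by_aic(fits)
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {score:.1f}")
print("Selected model:", best)
```

In a real run, the log-likelihoods would come from the ML tree inference for each candidate model, and the tree inferred under the selected model would be the one carried forward.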
Figure 2. Homology relationships and evolutionary events inferred from the analysis of a protein. A) Phylogenetic tree reconstructed for the parasite “seed” protein Phy000V0I5_SCHMA (Smp_175750). B) Homology relationships identified between the “seed” protein and its homologs in the other species.
Figure 3. Example of functional prediction based on phylogenetic analysis. The protein sequences are represented by their internal identifiers in PhylomeDB. Relationships among the parasite “seed” protein Phy000V14T_SCHMA (Smp_170950) and its homologs in other species (Table 1) were inferred by the maximum likelihood method implemented in PhyML. Support values were computed by the approximate likelihood ratio test (aLRT). Curly brackets hold Gene Ontology (GO) terms for proteins in this dataset.
Figure 4. Phylogenetic relationships of schistosome lineage-specific duplicated tetraspanins. The analysis was performed on the trimmed sequence alignment using the maximum likelihood method as implemented in PhyML. The best-fit model was WAG, and support values for each node were estimated by the approximate likelihood ratio test (aLRT). Sequence labels follow the PhylomeDB internal identifier. For details, see supplementary data (Additional file 1 Table S3).