Pietro Liò, Claudia Angelini, Italia De Feis, Viet-Anh Nguyen.
Abstract
A major goal of bioinformatics is the characterization of transcription factors and the transcriptional programs they regulate. Given the speed of genome sequencing, we would like to quickly annotate regulatory sequences in newly sequenced genomes. In such cases, it would be helpful to predict sequence motifs by using experimental data from a closely related model organism. Here we present a general algorithm that identifies transcription factor binding sites in a newly sequenced species by performing Bayesian regression on the annotated species. First we set out the rationale of our method by applying it within the same species; then we extend it to use data available in closely related species. Finally, we generalise the method to handle the case in which a number of experiments are available from several species close to the species on which inference is to be made. To show the performance of the method, we analyse three functionally related networks in the Ascomycota. Two gene network case studies are related to the G2/M phase of the Ascomycota cell cycle; the third is related to morphogenesis. We also compare the method with MatrixREDUCE and discuss other types of validation and tests. The first network is well known and provides a biological validation test of the method. The two cell cycle case studies, where the gene network size is conserved, demonstrate the utility of the method in annotating new species sequences using all the available replicates from model species. The third case, where the gene network size varies among species, shows that the combination of information is less powerful but still informative. Our methodology is quite general and could be extended to integrate other high-throughput data from model organisms.
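The abstract describes Bayesian regression with variable selection over candidate motifs. As a rough illustration of what a marginal inclusion probability means, the sketch below enumerates every subset of a few candidate predictors and weights each subset by its posterior probability. It assumes a Zellner g-prior with g = n and an independent Bernoulli prior on inclusion; these choices, the function name `inclusion_probabilities`, and the synthetic motif counts are illustrative assumptions, not the authors' implementation, and exhaustive enumeration is only practical for a handful of predictors.

```python
import itertools
import numpy as np

def inclusion_probabilities(X, y, g=None, prior_inclusion=0.5):
    """Marginal posterior inclusion probability of each column of X.

    Assumption: Zellner g-prior on the coefficients, so the Bayes factor of a
    model with p predictors against the intercept-only model is
    (1 + g)^((n - 1 - p) / 2) / (1 + g * (1 - R^2))^((n - 1) / 2).
    """
    n, k = X.shape
    g = float(n) if g is None else float(g)   # unit-information choice
    Xc = X - X.mean(axis=0)                   # centring absorbs the intercept
    yc = y - y.mean()
    tss = yc @ yc
    models, log_post = [], []
    for size in range(k + 1):
        for subset in itertools.combinations(range(k), size):
            if size == 0:
                r2 = 0.0
            else:
                Xs = Xc[:, list(subset)]
                beta, *_ = np.linalg.lstsq(Xs, yc, rcond=None)
                resid = yc - Xs @ beta
                r2 = 1.0 - (resid @ resid) / tss
            log_bf = (0.5 * (n - 1 - size) * np.log1p(g)
                      - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2)))
            log_prior = (size * np.log(prior_inclusion)
                         + (k - size) * np.log1p(-prior_inclusion))
            models.append(subset)
            log_post.append(log_bf + log_prior)
    log_post = np.array(log_post)
    weights = np.exp(log_post - log_post.max())  # stabilised normalisation
    weights /= weights.sum()
    probs = np.zeros(k)
    for subset, w in zip(models, weights):
        for j in subset:
            probs[j] += w                     # sum over models containing j
    return probs

# Synthetic example (hypothetical data): columns are motif counts per promoter,
# y is an expression measurement driven by motifs 0 and 2 only.
rng = np.random.default_rng(0)
n, k = 120, 4
X = rng.poisson(1.0, size=(n, k)).astype(float)
y = 1.0 * X[:, 0] - 0.8 * X[:, 2] + rng.normal(0.0, 0.5, size=n)
probs = inclusion_probabilities(X, y)
print(np.round(probs, 3))
```

With many candidate motifs the model space is too large to enumerate, and a stochastic search over subsets would take the place of the double loop.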
Year: 2012 PMID: 22984403 PMCID: PMC3439465 DOI: 10.1371/journal.pone.0042489
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Motifs detected for the septation transcriptional network in the fission yeast clade (case study 1).
| Motif detected by Bayesian variable selection | Marginal probability | Motif detected by MatrixREDUCE | t-value |
|---|---|---|---|
| | 0.995872 | | −8.495 |
| | 0.992299833 | | 7.854 |
| | 0.990841403 | | −7.146 |
| | 0.966439608 | | −6.868 |
| | 0.964823733 | | −6.416 |
| | 0.964302733 | | −6.119 |
| | 0.952851275 | | −6.017 |
| | 0.930580914 | | 5.857 |
| | 0.930109523 | | −5.723 |
| | 0.913970665 | | −5.602 |
| | 0.911840485 | | |
| | 0.899786547 | | |
| | 0.892315745 | | |
| | 0.8920356 | | |
| | 0.868056013 | | |
| | 0.773856698 | | |
| | 0.728913078 | | |
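The t-values in the MatrixREDUCE column measure how strongly a motif's presence in promoters tracks expression. As a much simplified stand-in (MatrixREDUCE itself fits a position-specific affinity matrix, which is not reproduced here), the sketch below counts exact occurrences of a motif in each promoter and reports the t-statistic of the slope in a univariate least-squares fit of expression on those counts. The motif, sequences, and expression values are hypothetical toy data.

```python
import numpy as np

def count_motif(seq, motif):
    """Count (possibly overlapping) exact occurrences of motif in seq."""
    m = len(motif)
    return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == motif)

def slope_t_value(x, y):
    """t-statistic of the slope in the simple regression y = a + b * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    xc, yc = x - x.mean(), y - y.mean()
    sxx = xc @ xc
    slope = (xc @ yc) / sxx
    resid = yc - slope * xc
    sigma2 = (resid @ resid) / (n - 2)        # residual variance estimate
    return slope / np.sqrt(sigma2 / sxx)      # slope divided by its std. error

# Hypothetical toy promoters and expression log-ratios.
motif = "ACGCGT"
promoters = [
    "TTTTGGGTTTAA",
    "TTACGCGTTTAA",
    "ACGCGTTTACGCGT",
    "ACGCGTACGCGTACGCGT",
    "GGACGCGTGG",
]
expression = [0.1, 1.2, 1.9, 3.2, 0.8]

counts = [count_motif(p, motif) for p in promoters]
t_value = slope_t_value(counts, expression)
print(counts, round(float(t_value), 3))
```

A positive t-value indicates that genes with more copies of the motif tend to be more highly expressed in the experiment, and a negative one the opposite.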
Figure 1. ENG1 ML tree.
Maximum Likelihood tree, based on the JTT model of evolution, inferred using the Eng1 protein sequence from the following species: S. japonicus, S. octosporus, S. cerevisiae, S. pombe, Kluyveromyces lactis, Debaryomyces hansenii, Candida albicans, Yarrowia lipolytica, Aspergillus oryzae, Phaeosphaeria nodorum, Neurospora crassa, Vanderwaltozyma polyspora, Neosartorya fischeri, Pichia guilliermondii, Coccidioides posadasii, Gibberella zeae, Ashbya gossypii, Sclerotinia sclerotiorum, Magnaporthe grisea, Ajellomyces capsulatus, Aspergillus clavatus, Aspergillus niger, Pichia stipitis, Lodderomyces elongisporus, Candida glabrata, Candida tropicalis, Candida dubliniensis, Candida parapsilosis. Brassica napus and Sorangium cellulosum are plant sequences used as outgroups, i.e. to facilitate the rooting of the fungal phylogeny. We also include the S. japonicus Eng1 and Eng2 proteins and the S. cerevisiae Acf1 and Acf2 proteins. For methodological purposes, we validate this phylogeny against a phylogeny with the same number of species, based on cdc5, a regulator of the G2/M transition of the mitotic cell cycle, using the same visualisation as in [28]; the width corresponds to phylogenetic agreement.
Figure 2. RAM ML tree.
Maximum Likelihood tree, based on the JTT model of evolution, inferred using RAM protein sequences from the species listed in Figure 1. We validate this phylogeny against a phylogeny with the same number of species, based on cdc5, a regulator of the G2/M transition of the mitotic cell cycle, using the same visualisation as in [28].
Motifs detected for the cytokinesis transcriptional network in the Candida clade (case study 2).
| Motif detected by Bayesian variable selection | Marginal probability | Motif detected by MatrixREDUCE | t-value |
|---|---|---|---|
| | 0.978005 | | 10.56 |
| | 0.900631 | | −8.362 |
| | 0.666508 | | 8.143 |
| | 0.663159 | | 7.691 |
| | 0.650006 | | 7.129 |
| | 0.637142 | | 6.78 |
| | 0.626425 | | 6.09 |
| | | | 6.09 |
| | | | 6.086 |
| | | | 6.086 |
Motifs detected for RAM transcriptional network in the Ascomycota (case study 3).
| Motif detected by Bayesian variable selection | Marginal probability | Motif detected by MatrixREDUCE | t-value |
|---|---|---|---|
| | 1 | | 10.494 |
| | 0.7266 | | −10.281 |
| | 0.7263 | | 8.597 |
| | 0.6174 | | 8.591 |
| | 0.4965 | | 8.591 |
| | 0.4731 | | −8.383 |
| | 0.4458 | | −7.47 |
| | 0.4028 | | 7.47 |
| | 0.3884 | | 7.104 |
| | 0.3768 | | −6.759 |
| | 0.3666 | | |
| | 0.3254 | | |
| | 0.319 | | |