Faouzi Jaziri, Eric Peyretaillade, Mohieddine Missaoui, Nicolas Parisot, Sébastien Cipière, Jérémie Denonfoux, Antoine Mahul, Pierre Peyret, David R C Hill.
Abstract
Phylogenetic Oligonucleotide Arrays (POAs) were recently adapted for studying large microbial communities in a flexible and easy-to-use way. POAs coupled with explorative probes, which detect the unknown part of a community, are now one of the most powerful approaches for understanding how microbial communities function. However, probe selection remains a very difficult task. The rapid growth of environmental databases has led to an exponential increase in the data that must be managed for an efficient design. Consequently, the use of high performance computing facilities is mandatory. In this paper, we present an efficient parallelization method to select known and explorative oligonucleotide probes at large scale using computing grids. We implemented software that generates and monitors thousands of jobs over the European Grid Infrastructure (EGI). We also developed a new algorithm for the construction of a high-quality curated phylogenetic database to avoid erroneous designs caused by incorrect sequence affiliation. We present the performance and statistics of our method on real biological datasets, based on a phylogenetic prokaryotic database at the genus level and a complete design of about 20,000 probes for 2,069 genera of prokaryotes.
Year: 2014 PMID: 24516366 PMCID: PMC3913353 DOI: 10.1155/2014/350487
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1. Summary of algorithm steps.
Figure 2. Parallelization strategy to define and submit jobs over the grid.
A comparison of the performance of the alignment method used in our software with that used in PhylArray [29], using 100 cores.
| Aligned group | Number of sequences | Number of subgroups | Alignment time, PhylArray (seconds) | Alignment time, PhylGrid 2.0 (seconds) | Speedup |
|---|---|---|---|---|---|
| | 1,174 | 37 | 2,542 | 1,247 | 2.03 |
| | 3,947 | 58 | 12,586 | 3,130 | 4.02 |
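The Speedup column is consistent with dividing the PhylArray alignment time by the PhylGrid 2.0 alignment time. The following is a minimal sketch under that assumption; the record does not state the formula, and the names `rows` and `speedup` are illustrative only. The first ratio comes out at about 2.04, which the table lists as 2.03, so the published values may be truncated rather than rounded.

```python
# Alignment times in seconds, copied from the table above.
# The group names are not available in this record, so rows are keyed by size.
rows = [
    {"sequences": 1174, "phylarray_s": 2542, "phylgrid_s": 1247},
    {"sequences": 3947, "phylarray_s": 12586, "phylgrid_s": 3130},
]


def speedup(baseline_seconds, new_seconds):
    """Ratio of baseline time to new time; values above 1 mean the new method is faster.

    Assumption: this is the definition behind the Speedup column.
    """
    return baseline_seconds / new_seconds


speedups = [speedup(r["phylarray_s"], r["phylgrid_s"]) for r in rows]

for r, s in zip(rows, speedups):
    print(f"{r['sequences']} sequences: speedup {s:.3f}")
```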
Figure 3. A comparison of our load balancing method with PhylArray [29] using 16 processors to select probes for the “Citrobacter” group.
Figure 4. A comparison of our load balancing method with PhylArray [29] using 8 processors to select probes for the “Haemophilus” group.
A comparison of our load balancing method with PhylArray [29], using 4 processors to select probes for 3 genus groups.
| Group | | | | | | |
|---|---|---|---|---|---|---|
| Software | PhylArray | PhylGrid 2.0 | PhylArray | PhylGrid 2.0 | PhylArray | PhylGrid 2.0 |
| Mean degeneracy | 26,722.75 | 26,722.75 | 37,132.25 | 37,132.25 | 20,100.75 | 20,100.75 |
| Degeneracy job 1 | 13,068 | 26,723 | 41,435 | 37,133 | 28,600 | 20,101 |
| Degeneracy job 2 | 41,782 | 26,723 | 43,466 | 37,132 | 32,335 | 20,101 |
| Degeneracy job 3 | 16,381 | 26,723 | 10,273 | 37,132 | 4,314 | 20,101 |
| Degeneracy job 4 | 35,660 | 26,722 | 53,355 | 37,132 | 15,154 | 20,100 |
| Standard deviation | | | | | | |
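In each group, the PhylGrid 2.0 per-job degeneracies sum to the same total as the PhylArray ones and differ from each other by at most 1. The sketch below reproduces that arithmetic for the first group in the table. It is an illustration only: `even_split` is a hypothetical helper, not the authors' algorithm, and this record does not describe how PhylGrid 2.0 actually assigns probes to jobs. The population standard deviation (`statistics.pstdev`) is used as one possible measure of spread; the estimator behind the table's Standard deviation row is not known.

```python
import statistics


def even_split(total, n_jobs):
    """Split an integer total into n_jobs shares that differ by at most 1.

    Hypothetical helper for illustration; not taken from PhylGrid 2.0.
    """
    base, remainder = divmod(total, n_jobs)
    return [base + 1] * remainder + [base] * (n_jobs - remainder)


# Per-job degeneracy for the first group in the table (PhylArray column).
phylarray_jobs = [13068, 41782, 16381, 35660]

total = sum(phylarray_jobs)
balanced_jobs = even_split(total, len(phylarray_jobs))

print("mean degeneracy:", statistics.mean(phylarray_jobs))
print("balanced shares:", balanced_jobs)
# Population standard deviation, chosen here as an assumption.
print("spread before balancing:", statistics.pstdev(phylarray_jobs))
print("spread after balancing:", statistics.pstdev(balanced_jobs))
```

Applied to the other two groups, the same split gives shares that match the PhylGrid 2.0 columns of the table as well.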
Figure 5. The median execution result of probe selection for 10 genus groups on the EGI grid using 586 jobs.