Quan Zou, Shixiang Wan, Xiangxiang Zeng, Zhanshan Sam Ma.
Abstract
BACKGROUND: Building evolutionary trees for massive sets of unaligned DNA sequences is both crucial and challenging. Reconstructing an evolutionary tree from ultra-large sequence sets is hard, and massive multiple sequence alignment (MSA) is itself time- and space-consuming. The recently developed distributed frameworks Hadoop and Spark offer new hope for these classical computational biology problems. In this paper, we address multiple sequence alignment and evolutionary tree reconstruction in parallel.
Keywords: Algorithm; Computational biology; Evolutionary tree; Hadoop; Multiple sequence alignment; Spark
Year: 2017 PMID: 29297337 PMCID: PMC5751538 DOI: 10.1186/s12918-017-0476-3
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Fig. 1 Running time on mt genome datasets with different numbers of Hadoop nodes. Running time of different software tools on mtDNA datasets.
Fig. 2 Running time on mt genome datasets with different numbers of Spark nodes. Running time with HPTree on 16S rRNA datasets.
Running time with different numbers of nodes on the human mitochondrial genome datasets (unit: seconds)
| Nodes (platform) | 1× | 20× | 50× | 100× |
|---|---|---|---|---|
| 4-nodes(Hadoop) | 72 | 198 | 988 | 2657 |
| 3-nodes(Hadoop) | 110 | 324 | 1631 | 3487 |
| 2-nodes(Hadoop) | 157 | 494 | 2235 | 5384 |
| 4-nodes(Spark) | 27 | 65 | 423 | 1095 |
| 3-nodes(Spark) | 35 | 96 | 765 | 1770 |
| 2-nodes(Spark) | 67 | 189 | 1232 | 2586 |
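The table is easier to compare as ratios. The short Python sketch below recomputes, from the values above only, the Spark-over-Hadoop speedup at each node count and the speedup each platform gains when going from 2 to 4 nodes. The `TIMES` layout and the `speedup` helper are illustrative names chosen here, not part of the original work.

```python
# Running times in seconds, copied from the node-count table above
# (human mitochondrial genome datasets at 1x, 20x, 50x and 100x).
TIMES = {
    ("Hadoop", 4): [72, 198, 988, 2657],
    ("Hadoop", 3): [110, 324, 1631, 3487],
    ("Hadoop", 2): [157, 494, 2235, 5384],
    ("Spark", 4): [27, 65, 423, 1095],
    ("Spark", 3): [35, 96, 765, 1770],
    ("Spark", 2): [67, 189, 1232, 2586],
}


def speedup(slow, fast):
    """Element-wise ratio slow / fast, rounded to two decimals (illustrative helper)."""
    return [round(s / f, 2) for s, f in zip(slow, fast)]


# Spark vs. Hadoop on the same number of nodes.
for nodes in (4, 3, 2):
    print(f"{nodes} nodes, Spark over Hadoop:",
          speedup(TIMES[("Hadoop", nodes)], TIMES[("Spark", nodes)]))

# Each platform going from 2 nodes to 4 nodes.
for platform in ("Hadoop", "Spark"):
    print(f"{platform}, 4 nodes over 2 nodes:",
          speedup(TIMES[(platform, 2)], TIMES[(platform, 4)]))
```

With these numbers, Spark on 4 nodes is roughly 2.3 to 3.0 times faster than Hadoop on 4 nodes, and going from 2 to 4 nodes shortens Hadoop's running time by a factor of about 2.0 to 2.5 and Spark's by about 2.4 to 2.9.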
Running time on the human mitochondrial genome datasets for aligned versus unaligned sequences (unit: seconds)
| Input (platform) | 1× | 20× | 50× | 100× |
|---|---|---|---|---|
| Unaligned(Hadoop) | 213 | 851 | 1722 | 4365 |
| Aligned(Hadoop) | 72 | 198 | 868 | 2657 |
| Unaligned(Spark) | 56 | 238 | 846 | 1720 |
| Aligned(Spark) | 27 | 65 | 423 | 1095 |
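Comparing the two input modes in the table above is again plain arithmetic. In the sketch below, the gap between the unaligned and aligned runs is computed by a helper named `extra_seconds`; reading that gap as the cost of the alignment step is an assumption, since the table reports only total running times. All names are illustrative.

```python
# Running times in seconds, copied from the aligned vs. unaligned table above
# (human mitochondrial genome datasets at 1x, 20x, 50x and 100x).
RUNS = {
    "Hadoop": {"unaligned": [213, 851, 1722, 4365], "aligned": [72, 198, 868, 2657]},
    "Spark": {"unaligned": [56, 238, 846, 1720], "aligned": [27, 65, 423, 1095]},
}


def extra_seconds(unaligned, aligned):
    """Additional seconds needed when starting from unaligned input."""
    return [u - a for u, a in zip(unaligned, aligned)]


def slowdown(unaligned, aligned):
    """How many times longer the unaligned run takes, rounded to two decimals."""
    return [round(u / a, 2) for u, a in zip(unaligned, aligned)]


for platform, runs in RUNS.items():
    print(platform, "extra seconds:", extra_seconds(runs["unaligned"], runs["aligned"]))
    print(platform, "slowdown:", slowdown(runs["unaligned"], runs["aligned"]))
```

For example, at the 100× size the unaligned input adds 1708 s on Hadoop and 625 s on Spark.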
Running time on the 16S rRNA datasets for aligned versus unaligned sequences (unit: seconds)
| Input (platform) | Small | Big |
|---|---|---|
| Unaligned(Hadoop) | 15,736 | 106,400 |
| Aligned(Hadoop) | 12,464 | 86,400 |
| Unaligned(Spark) | 4739 | 35,869 |
| Aligned(Spark) | 3159 | 30,012 |
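The 16S rRNA times are large enough that hours read more easily than seconds. The sketch below converts the table values to hours and computes the Spark-over-Hadoop speedup for each input type and dataset size; the names `SECONDS`, `to_hours` and `spark_speedup` are illustrative, and the numbers come only from the table above.

```python
# Running times in seconds, copied from the 16S rRNA table above.
SECONDS = {
    ("Hadoop", "unaligned"): {"small": 15736, "big": 106400},
    ("Hadoop", "aligned"): {"small": 12464, "big": 86400},
    ("Spark", "unaligned"): {"small": 4739, "big": 35869},
    ("Spark", "aligned"): {"small": 3159, "big": 30012},
}


def to_hours(seconds):
    """Convert seconds to hours, rounded to two decimals."""
    return round(seconds / 3600, 2)


def spark_speedup(mode, size):
    """Hadoop time divided by Spark time for one input mode and dataset size."""
    return round(SECONDS[("Hadoop", mode)][size] / SECONDS[("Spark", mode)][size], 2)


# Every run expressed in hours.
for (platform, mode), row in SECONDS.items():
    print(platform, mode, {size: to_hours(s) for size, s in row.items()}, "h")

# Spark vs. Hadoop for each input mode and dataset size.
for mode in ("unaligned", "aligned"):
    for size in ("small", "big"):
        print(f"{mode}/{size}: Spark is {spark_speedup(mode, size)}x faster")
```

The aligned big dataset, for instance, takes exactly 24 hours on Hadoop versus about 8.3 hours on Spark.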
Fig. 3 MSA procedures based on the Spark distributed framework