Abstract
Rapid improvements in DNA sequencing technology have produced long genome sequences for large numbers of similar isolates with a wide range of single nucleotide polymorphism (SNP) rates, where some isolates can have SNP rates thousands of times lower than others. Genome sequences of this kind are a challenge for existing methods of phylogenetic tree construction. We address these issues by developing a hierarchical approach to phylogeny construction. In this method, construction is performed at multiple levels: at each level, groups of isolates with comparable levels of similarity are identified and their phylogenetic trees are constructed. Time savings are achieved by using a sufficiently large number of columns from the input alignment instead of all of its columns. Our results show that the new approach is 20-60 times more efficient than existing programs and more accurate in situations where highly similar isolates have a wide range of SNP rates.
Year: 2019 PMID: 31437210 PMCID: PMC6705828 DOI: 10.1371/journal.pone.0221357
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Illustration of the partition step.
A collection of isolates (colored ovals) is partitioned into a group of moderately similar isolates and two subgroups of highly similar isolates.
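The partition step can be illustrated with a small sketch. This is not the authors' algorithm: it assumes, purely for illustration, that "highly similar" means a pairwise SNP count at or below a hypothetical `threshold`, and that subgroups are formed by single-linkage clustering. The names `snp_distance` and `partition` are invented for this example.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count alignment columns where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    return sum(x != y for x, y in zip(a, b))


def partition(seqs, threshold):
    """Split isolates into (moderately similar, highly similar subgroups).

    Assumption: isolates joined by a chain of pairwise distances
    <= threshold form one highly similar subgroup (single linkage).
    Isolates left on their own are reported as moderately similar.
    """
    names = list(seqs)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)

    subgroups = sorted(sorted(c) for c in clusters.values() if len(c) > 1)
    moderate = sorted(c[0] for c in clusters.values() if len(c) == 1)
    return moderate, subgroups


# Toy alignment (made-up sequences, not data from the paper).
toy = {
    "A": "AAAAAAAAAA",
    "B": "AAAAAAAAAC",
    "C": "GGGGGAAAAA",
    "D": "GGGGGAAAAT",
    "E": "AAAAATTTTT",
}
moderate, subgroups = partition(toy, threshold=1)
```

On the toy alignment, isolates A/B and C/D each differ by one SNP and fall into two highly similar subgroups, while E stays in the moderately similar group, mirroring the layout described in the caption.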
Time required by three versions of HPC and three existing programs.
| Data | RAxML | HPC (RAxML) | IQ-TREE | HPC (IQ-TREE) | FastTree | HPC (FastTree) |
|---|---|---|---|---|---|---|
| Narrow | > 168 hr | 12 hr | 31 hr | 35 min | 5 hr | 15 min |
| Wide | > 168 hr | 15 hr | 60 hr | 1 hr | 8 hr | 20 min |
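As a quick check of the reported 20-60 times speedup, the timings in this table can be converted to minutes and divided. The sketch below assumes that each unlabeled column holds the HPC version paired with the program immediately to its left; that pairing is an inference from the caption, not something the table states. RAxML is omitted because it did not finish, so only a lower bound on its speedup is available.

```python
# Timings from the table above, in minutes: (existing program, paired HPC version).
# Assumption: the column after each program is its HPC counterpart.
times_min = {
    ("Narrow", "IQ-TREE"): (31 * 60, 35),
    ("Narrow", "FastTree"): (5 * 60, 15),
    ("Wide", "IQ-TREE"): (60 * 60, 60),
    ("Wide", "FastTree"): (8 * 60, 20),
}

# Speedup = time of the existing program / time of the HPC version.
speedups = {key: base / hpc for key, (base, hpc) in times_min.items()}

for (data, program), ratio in sorted(speedups.items()):
    print(f"{data:6s} {program:8s} {ratio:5.1f}x")
```

Under that pairing, the ratios range from 20 (FastTree, Narrow) to 60 (IQ-TREE, Wide), which is consistent with the range quoted in the abstract.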
Fig 2. Accuracy assessment of IQ-TREE and HPC on 300 Wide datasets.
Fig 3. Accuracy assessment of FastTree and HPC on 300 Wide datasets.
Fig 4. Accuracy assessment of IQ-TREE and HPC on 200 Narrow datasets.
Fig 5. Accuracy assessment of FastTree and HPC on 200 Narrow datasets.
Average normalized Robinson-Foulds distance for each program.
| Data | RAxML | HPC (RAxML) | IQ-TREE | HPC (IQ-TREE) | FastTree | HPC (FastTree) |
|---|---|---|---|---|---|---|
| Narrow | N/A | 0.0151 | 0.0167 | 0.0164 | 0.3807 | 0.0314 |
| Wide | N/A | 1.5137 | 2.3819 | 1.7151 | 2.4674 | 1.5938 |
Note: RAxML did not run to completion in the allocated amount of time.
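For readers unfamiliar with the metric, the sketch below shows one common way to compute a normalized Robinson-Foulds distance between two trees on the same leaves. It is an illustration only: trees are represented as nested tuples of leaf names, and the distance is divided by 2(n - 3), the maximum for unrooted binary trees with n leaves, which gives values between 0 and 1. The normalization used for the table above is not stated in this record and may differ. The function names are invented for this example.

```python
def leaf_set(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Collect the nontrivial bipartitions (splits) induced by internal edges.

    Each split is stored as the side that does NOT contain a fixed
    reference leaf, so the same split always has the same representation.
    """
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if ref in clade else clade
        # Skip trivial splits that separate fewer than two leaves.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found


def normalized_rf(tree_a, tree_b):
    """Robinson-Foulds distance divided by 2 * (n - 3) (assumed normalization)."""
    leaves = leaf_set(tree_a)
    if leaves != leaf_set(tree_b):
        raise ValueError("trees must have identical leaf sets")
    if len(leaves) < 4:
        raise ValueError("need at least four leaves")
    differing = splits(tree_a) ^ splits(tree_b)
    return len(differing) / (2 * (len(leaves) - 3))


# Two toy trees on five leaves that disagree on one internal edge.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
d = normalized_rf(t1, t2)
```

Here the two toy trees share the split separating D and E from the rest but differ on the other internal edge, so two of the four possible splits disagree.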