C Randal Linder, Rahul Suri, Kevin Liu, Tandy Warnow.
Abstract
We have assembled a collection of web pages that contain benchmark datasets and software tools to enable the evaluation of the accuracy and scalability of computational methods for estimating evolutionary relationships. They provide a resource to the scientific community for development of new alignment and tree inference methods on very difficult datasets. The datasets are intended to help address three problems: multiple sequence alignment, phylogeny estimation given aligned sequences, and supertree estimation. Datasets from our work include empirical datasets with carefully curated alignments suitable for testing alignment and phylogenetic methods for large-scale systematics studies. Links to other empirical datasets, lacking curated alignments, are also provided. We also include simulated datasets with properties typical of large-scale systematics studies, including high rates of substitutions and indels, and we include the true alignment and tree for each simulated dataset. Finally, we provide links to software tools for generating simulated datasets, and for evaluating the accuracy of alignments and trees estimated on these datasets. We welcome contributions to the benchmark datasets from other researchers.
Year: 2010 PMID: 21113335 PMCID: PMC2989560 DOI: 10.1371/currents.RRN1195
Source DB: PubMed Journal: PLoS Curr ISSN: 2157-3999
| 16S.B.ALL | 16S rRNA | Bacteria | 27,643 | 6,857 | 80.0 | 4.9 |
| 16S.T | 16S rRNA | The three domains of life plus mitochondria and chloroplasts | 7,350 | 11,856 | 87.4 | 12.1 |
| 16S.3 | 16S rRNA | The three domains of life | 6,323 | 8,716 | 82.1 | 9.4 |
| 16S.M.aa_agc | 16S rRNA | Mitochondria | 1,028 | 4,907 | 82.6 | 22.0 |
| 16S.M | 16S rRNA | Mitochondria | 901 | 4,722 | 78.1 | 17.2 |
| 23S.M | 23S rRNA | Mitochondria | 278 | 10,738 | 83.7 | 31.9 |
| 23S.M.aa_agc | 23S rRNA | Mitochondria | 263 | 10,305 | 83.5 | 34.2 |
| 23S.E.aa_agc | 23S rRNA | Eukaryotes nuclear | 144 | 8,619 | 61.1 | 13.5 |
| 23S.E | 23S rRNA | Eukaryotes | 117 | 9,079 | 59.7 | 12.6 |
a Unless otherwise noted, all datasets in this table are taken from Cannone et al.
| FastTree | Price et al. | AA | 250; 1,250; 5,000 | N/A | Rose |
| SATé | Liu et al. | NA | 100; 500; 1,000 | 1,000 | SeqGen |
| RNASim | | NA (SSU rRNA) | 128; 256; 512; 1,024; 2,048; 4,096; 8,192; 16,384; 1,000,000 | 1,542 | RNASim |
a AA = amino acid; NA = nucleic acid
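
Because each simulated dataset ships with its true alignment, an estimated alignment can be scored against it directly. The sketch below is one illustrative way to do that and is not taken from the benchmark's own tools: it computes a sum-of-pairs style score, the fraction of residue pairs aligned in the true alignment that are also aligned in the estimate. The function names `homologous_pairs` and `sp_score`, the dictionary input format, and the use of `-` as the gap character are all assumptions made for this example.

```python
from itertools import combinations


def homologous_pairs(alignment):
    """Return the set of residue pairs that share a column.

    `alignment` maps sequence name -> gapped string (assumed format);
    every row must have the same length. A residue is identified by
    (sequence name, index among that sequence's ungapped residues).
    """
    names = sorted(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all rows must have the same length")

    next_index = {name: 0 for name in names}
    pairs = set()
    for col in range(length):
        present = []
        for name in names:
            if alignment[name][col] != "-":  # "-" is the assumed gap symbol
                present.append((name, next_index[name]))
                next_index[name] += 1
        # Names are visited in sorted order, so each pair has a
        # canonical orientation across different alignments.
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(true_alignment, estimated_alignment):
    """Fraction of true homologous pairs recovered by the estimate."""
    true_pairs = homologous_pairs(true_alignment)
    if not true_pairs:
        return 0.0
    estimated_pairs = homologous_pairs(estimated_alignment)
    return len(true_pairs & estimated_pairs) / len(true_pairs)


true_aln = {"s1": "AC-GT", "s2": "ACGGT"}
est_aln = {"s1": "ACG-T", "s2": "ACGGT"}
print(sp_score(true_aln, est_aln))
```

This enumerates every pair in every column, so it is only practical for small alignments; datasets at the scale listed above would need a more economical representation.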