Jorge Barriuso, Jose R Valverde, Rafael P Mellado.
Abstract
BACKGROUND: Next generation sequencing (NGS) enables a more comprehensive analysis of bacterial diversity from complex environmental samples. NGS data can be analysed using a variety of workflows. We test several simple and complex workflows, including frequently used as well as recently published tools, and report on their respective accuracy and efficiency under various conditions covering different sequence lengths, numbers of sequences and real-world experimental data from rhizobacterial populations of glyphosate-tolerant maize treated or untreated with two different herbicides, representative of differential diversity studies.
Year: 2011 PMID: 22168258 PMCID: PMC3258296 DOI: 10.1186/1471-2105-12-473
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Accuracy test
| Workflow | 60 seqs, 3% | 60 seqs, 5% | 60 seqs, 10% | 50 × 60 seqs, 3% | 50 × 60 seqs, 5% | 50 × 60 seqs, 10% |
|---|---|---|---|---|---|---|
| Reference expected values | 59 | 57 | 52 | 59 | 57 | 52 |
| CROP | 59 | 59 | 59 | 60 | 60 | 60 |
| ESPRIT | 59 | 57 | 52 | 59 | 57 | 52 |
| MAFFT+JC | 58 | 56 | 50 | 58 | 57 | 51 |
| MAFFT+MAFFT | 59 | 59 | 59 | 59 | 59 | 59 |
| MAFFT+Mothur | 59 | 56 | 51 | 59 | 57 | 54 |
| Mothur+JC | 55 | 55 | 55 | 2992 | 2992 | 2992 |
| Mothur+MAFFT | 41 | 40 | 36 | 252 | 251 | 243 |
| Mothur+Mothur | 48 | 48 | 48 | 48 | 48 | 48 |
| Mothur+PreC+JC | 44 | 44 | 44 | 43 | 43 | 43 |
| Mothur+PreC+MAFFT | 47 | 47 | 47 | 47 | 47 | 47 |
| Mothur+PreC+Mothur | 48 | 48 | 48 | 48 | 48 | 48 |
| MUSCLE+JC | 58 | 56 | 50 | 63 | 59 | 59 |
| MUSCLE+MAFFT | 59 | 59 | 59 | 59 | 59 | 59 |
| MUSCLE+Mothur | 59 | 56 | 55 | 66 | 62 | 56 |
| Otupipe | 59 | 57 | 52 | 59 | 57 | 52 |
| RDP | 59 | 57 | 52 | 59 | 57 | 52 |
OTUs observed with each of the workflows analysed at distances of 3, 5 and 10% for the datasets containing 60 test sequences and 50 replicas of the same (50 × 60). JC stands for Jukes-Cantor and PreC for the pre-clustering step applied after Mothur MSA. Combined workflows are named by the alignment method first (MAFFT, MUSCLE, Mothur or Mothur with pre-clustering), followed by the distance method (Jukes-Cantor, MAFFT or Mothur). Clustering was performed with Mothur for all combined workflows.
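One way to read the accuracy table is to sum, per workflow, how far the observed OTU counts fall from the reference expected values across the three distances. The sketch below does this for four rows of the 60-sequence dataset, with values copied from the table. The summed absolute deviation and the names `REFERENCE`, `ACCURACY_60` and `total_deviation` are illustrative assumptions, not a measure or code taken from the paper.

```python
# Values copied from the accuracy table (60-sequence dataset) at 3%, 5% and 10%.
REFERENCE = (59, 57, 52)

# Hypothetical container name; only four of the workflows are included here.
ACCURACY_60 = {
    "CROP": (59, 59, 59),
    "ESPRIT": (59, 57, 52),
    "MAFFT+JC": (58, 56, 50),
    "Mothur+Mothur": (48, 48, 48),
}


def total_deviation(observed, reference=REFERENCE):
    """Sum of absolute differences between observed and expected OTU counts."""
    return sum(abs(o - r) for o, r in zip(observed, reference))


if __name__ == "__main__":
    # List workflows from closest to furthest from the reference values.
    ranked = sorted(ACCURACY_60.items(), key=lambda kv: total_deviation(kv[1]))
    for name, counts in ranked:
        print(f"{name}: {total_deviation(counts)}")
```

A score of 0 means the workflow reproduced the reference counts at all three distances. Larger scores do not distinguish over-splitting from over-merging, so the sign of each difference would need to be inspected separately.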
Alignment test
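| Workflow | 50 × 60 mutated (interleaved), 3% | 50 × 60 mutated (interleaved), 5% | 50 × 60 mutated (interleaved), 10% | 50 × 60 mutated (stacked), 3% | 50 × 60 mutated (stacked), 5% | 50 × 60 mutated (stacked), 10% |
|---|---|---|---|---|---|---|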
| Reference expected values | 59 | 57 | 52 | 59 | 57 | 52 |
| CROP | 1959 | 1954 | 1955 | 1865 | 1850 | 1850 |
| ESPRIT | 193 | 59 | 56 | 205 | 59 | 56 |
| MAFFT+JC | 141 | 101 | 85 | 132 | 113 | 113 |
| MAFFT+MAFFT | 2289 | 1708 | 80 | 2289 | 1777 | 1270 |
| MAFFT+Mothur | 261 | 119 | 96 | 279 | 121 | 121 |
| Mothur+JC | 2947 | 2947 | 2948 | 2985 | 2985 | 2985 |
| Mothur+MAFFT | 899 | 685 | 477 | 2999 | 2999 | 2999 |
| Mothur+Mothur | 1087 | 923 | 736 | 1100 | 930 | 888 |
| Mothur+PreC+JC | 1198 | 1198 | 1198 | 1333 | 1333 | 1333 |
| Mothur+PreC+MAFFT | 1328 | 1328 | 1328 | 1346 | 1346 | 1346 |
| Mothur+PreC+Mothur | 1080 | 938 | 938 | 1087 | 940 | 915 |
| MUSCLE+JC | 70 | 59 | 44 | 2999 | 2999 | 2999 |
| MUSCLE+MAFFT | 2287 | 1707 | 80 | 2288 | 1785 | 1269 |
| MUSCLE+Mothur | 264 | 64 | 57 | 571 | 466 | 466 |
| Otupipe | 139 | 91 | 59 | 144 | 95 | 61 |
| RDP | 59 | 58 | 53 | 59 | 57 | 52 |
OTUs observed with each of the workflows analysed at distances of 3, 5 and 10% for the datasets containing 50 different mutated replicas of the 60 test sequences stacked or interleaved.
Quince's 454 data
| Workflow | Artificial, 3% | Artificial, 5% | Artificial, 10% | Priest Pot, 3% | Priest Pot, 5% | Priest Pot, 10% |
|---|---|---|---|---|---|---|
| CROP | 41 | 25 | 15 | 562 | 246 | 42 |
| ESPRIT | 248 | 77 | 38 | 1115 | 773 | 394 |
| MAFFT+JC | 686 | 686 | 686 | 3764 | 3764 | 3764 |
| MAFFT+MAFFT | 31933 | 31933 | 31933 | 15984 | 15984 | 15984 |
| MAFFT+Mothur | 1756 | 1756 | 1756 | 6672 | 6672 | 6672 |
| Mothur+JC | 49 | 36 | 33 | 640 | 537 | 537 |
| Mothur+MAFFT | 4276 | 4276 | 4276 | 2824 | 2824 | 2824 |
| Mothur+Mothur | 113 | 53 | 53 | 766 | 642 | 642 |
| Mothur+PreC+JC | 61 | 40 | 36 | 662 | 482 | 482 |
| Mothur+PreC+MAFFT | 3864 | 3141 | 3141 | 1905 | 1625 | 1625 |
| Mothur+PreC+Mothur | 136 | 65 | 46 | 810 | 575 | 575 |
| MUSCLE+JC | 146 | 146 | 146 | 4059 | 4059 | 4059 |
| MUSCLE+MAFFT | 33491 | 33491 | 33491 | 15718 | 15718 | 15718 |
| MUSCLE+Mothur | 258 | 258 | 258 | 6433 | 6433 | 6433 |
| Otupipe | 66 | 39 | 24 | 793 | 570 | 302 |
| RDP | 250 | 94 | 43 | 1209 | 862 | 456 |
OTUs observed with each of the workflows analysed at distances of 3, 5 and 10% for Quince's Artificial and Priest Pot datasets. For the Priest Pot data, 855 OTUs at 3% and 699 at 5% were previously estimated by Quince et al. [26].
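The Priest Pot counts can be put on a common scale by expressing each one as a signed percent difference from the earlier estimates quoted in the caption (855 OTUs at 3%, 699 at 5%). The following is a minimal sketch using four rows copied from the table. The function name `percent_difference` and the dictionary names are assumptions made for illustration, and the paper may compare these values differently.

```python
# Earlier Priest Pot estimates quoted in the table caption.
PRIOR_ESTIMATE = {"3%": 855, "5%": 699}

# Observed Priest Pot OTU counts copied from the table (subset of workflows).
PRIEST_POT = {
    "ESPRIT": {"3%": 1115, "5%": 773},
    "Mothur+Mothur": {"3%": 766, "5%": 642},
    "Mothur+PreC+Mothur": {"3%": 810, "5%": 575},
    "Otupipe": {"3%": 793, "5%": 570},
}


def percent_difference(observed, estimate):
    """Signed difference of observed from estimate, as a percentage of estimate."""
    return 100.0 * (observed - estimate) / estimate


if __name__ == "__main__":
    for workflow, counts in PRIEST_POT.items():
        parts = [
            f"{dist}: {percent_difference(counts[dist], PRIOR_ESTIMATE[dist]):+.1f}%"
            for dist in ("3%", "5%")
        ]
        print(workflow, ", ".join(parts))
```

Positive values indicate more OTUs than the earlier estimate, negative values fewer.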
Huse short read data
| Workflow | 3% | 5% | 10% |
|---|---|---|---|
| CROP | NC | NC | NC |
| ESPRIT | 6464 | 3308 | 1402 |
| Unique: MAFFT+JC | 23442 | 23442 | 23442 |
| Unique: MAFFT+MAFFT | 23445 | 23445 | 23445 |
| Unique: MAFFT+Mothur | 23445 | 23445 | 23445 |
| Unique: Mothur+JC | 23441 | 23441 | 23441 |
| Unique: Mothur+MAFFT | 23441 | 23441 | 23441 |
| Unique: Mothur+Mothur | 18210 | 18210 | 18210 |
| Mothur+PreC+JC | 15594 | 15594 | 15594 |
| Mothur+PreC+MAFFT | 15601 | 15601 | 15601 |
| Mothur+PreC+Mothur | 14776 | 14776 | 14776 |
| Unique: MUSCLE+JC | 22816 | 22816 | 22816 |
| Unique: MUSCLE+MAFFT | 23444 | 23444 | 23444 |
| Unique: MUSCLE+Mothur | 21318 | 21318 | 21318 |
| Otupipe | 2149 | 1422 | 878 |
| RDP | 4228 | 2932 | 1777 |
OTUs observed with each of the workflows analysed at distances of 3, 5 and 10% for a dataset containing a large number of short reads [24]. NC = not computable.
Near-full length sequences
| Workflow | Skin axillary, 3% | Skin axillary, 5% | Skin axillary, 10% | Prairie soil, 3% | Prairie soil, 5% | Prairie soil, 10% |
|---|---|---|---|---|---|---|
| CROP | 1009 | 1009 | 1009 | 1128 | 1128 | 1128 |
| ESPRIT | 59 | 43 | 26 | 504 | 340 | 162 |
| MAFFT+JC | 47 | 39 | 39 | 490 | 346 | 183 |
| MAFFT+MAFFT | 266 | 126 | 109 | 1007 | 895 | 714 |
| MAFFT+Mothur | 50 | 39 | 36 | 503 | 351 | 181 |
| Mothur+JC | 47 | 39 | 36 | 491 | 352 | 184 |
| Mothur+MAFFT | 269 | 126 | 109 | 977 | 872 | 689 |
| Mothur+Mothur | 54 | 40 | 38 | 535 | 396 | 204 |
| Mothur+PreC+JC | 47 | 39 | 36 | 491 | 353 | 268 |
| Mothur+PreC+MAFFT | 287 | 127 | 109 | 977 | 877 | 836 |
| Mothur+PreC+Mothur | 55 | 40 | 38 | 536 | 397 | 305 |
| MUSCLE+JC | 50 | 39 | 37 | 480 | 337 | 175 |
| MUSCLE+MAFFT | 266 | 126 | 109 | 1007 | 895 | 714 |
| MUSCLE+Mothur | 58 | 40 | 38 | 487 | 339 | 173 |
| Otupipe | 49 | 37 | 26 | 490 | 336 | 160 |
| RDP | 60 | 43 | 27 | 504 | 348 | 175 |
OTUs observed with each of the workflows analysed at distances of 3, 5 and 10% for the skin axillary microbiome data described by Hao et al. [23], and tall grass prairie soil data [37].
Comparative diversity analysis
| Workflow | Control, t1 | Control, t2 | GTZ, t1 | GTZ, t2 | Glyphosate, t1 | Glyphosate, t2 |
|---|---|---|---|---|---|---|
| CROP | 812 | 286 | 414 | 625 | 454 | 477 |
| ESPRIT | 1631 | 1053 | 1227 | 922 | 1951 | 1102 |
| MAFFT+JC | 1936 | 1087 | 1025 | 975 | 1977 | 1088 |
| MAFFT+MAFFT | 2792 | 1588 | 1971 | 1329 | 3296 | 1577 |
| MAFFT+Mothur | 2112 | 1215 | 1655 | 1163 | 2646 | 1326 |
| Mothur+JC | 3464 | 1622 | 3842 | 823 | 5024 | 1090 |
| Mothur+MAFFT | 2789 | 1589 | 1969 | 1328 | 3323 | 1574 |
| Mothur+Mothur | 1730 | 1128 | 1310 | 982 | 2122 | 1209 |
| Mothur+PreC+JC | 2479 | 1460 | 1958 | 826 | 3134 | 1498 |
| Mothur+PreC+MAFFT | 2471 | 1459 | 1961 | 1252 | 3135 | 1495 |
| Mothur+PreC+Mothur | 2063 | 1143 | 1890 | 1010 | 2516 | 1221 |
| MUSCLE+JC | 1762 | 1247 | 1257 | 950 | 2020 | 1238 |
| MUSCLE+MAFFT | 2790 | 1587 | 1970 | 1330 | 3296 | 1577 |
| MUSCLE+Mothur | 2278 | 1352 | 1800 | 1059 | 2772 | 1477 |
| Otupipe | 1314 | 881 | 938 | 795 | 1428 | 948 |
| RDP | 1762 | 1106 | 1236 | 901 | 1932 | 1094 |
OTUs observed with each of the workflows analysed. For simplicity, only results at 3% dissimilarity are shown, from pooled samples collected at two different times from control, GTZ and glyphosate treated soils [32]. Full data at 3, 5 and 10% are provided in the comprehensive OTU results table of Additional file 1, Tables S1.1, S1.2 and S1.3.
Observed output variability
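| Workflow and distance | N | 50 × 60 mut stacked μ | 50 × 60 mut stacked Err | 50 × 60 mut stacked Min | 50 × 60 mut stacked Max | 50 × 60 mut interleaved μ | 50 × 60 mut interleaved Err | 50 × 60 mut interleaved Min | 50 × 60 mut interleaved Max | Prairie soil μ | Prairie soil Err | Prairie soil Min | Prairie soil Max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|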
| CROP 3% | 20 | 1852.2 | 2.16 | 1847.6 | 1856.7 | 1954.6 | 0.83 | 1952.8 | 1956.3 | 1128 | 0 | 1128 | 1128 |
| CROP 5% | 20 | 1853.5 | 2.43 | 1848.4 | 1858.5 | 1952.4 | 0.66 | 1951 | 1953.7 | 1128 | 0 | 1128 | 1128 |
| CROP 10% | 20 | 1850.9 | 1.65 | 1847.4 | 1854.4 | 1951.7 | 0.62 | 1950.4 | 1953 | 1128 | 0 | 1128 | 1128 |
| ESPRIT 3% | 20 | 205 | 0 | 205 | 205 | 193 | 0 | 193 | 193 | 565 | 0 | 565 | 565 |
| ESPRIT 5% | 20 | 59 | 0 | 59 | 59 | 59 | 0 | 59 | 59 | 381 | 0 | 381 | 381 |
| ESPRIT 10% | 20 | 56 | 0 | 56 | 56 | 56 | 0 | 56 | 56 | 180 | 0 | 180 | 180 |
| Mothur 3% | 20 | 1465.3 | 0.23 | 1464.8 | 1465.8 | 1173 | 0 | 1173 | 1173 | 541 | 0 | 541 | 541 |
| Mothur 5% | 20 | 1465.3 | 0.23 | 1464.8 | 1465.8 | 1173 | 0 | 1173 | 1173 | 400 | 0 | 400 | 400 |
| Mothur 10% | 20 | 1465.3 | 0.23 | 1464.8 | 1465.8 | 1173 | 0 | 1173 | 1173 | 347 | 0 | 347 | 347 |
| Mothur+PreC 3% | 20 | 1490.6 | 0.16 | 1490.3 | 1490.9 | 1197.3 | 0.12 | 1197 | 1197.6 | 541 | 0 | 541 | 541 |
| Mothur+PreC 5% | 20 | 1490.6 | 0.16 | 1490.3 | 1490.9 | 1197.3 | 0.12 | 1197 | 1197.6 | 400 | 0 | 400 | 400 |
| Mothur+PreC 10% | 20 | 1490.6 | 0.16 | 1490.3 | 1490.9 | 1197.3 | 0.12 | 1197 | 1197.6 | 348 | 0 | 348 | 348 |
Observed output variability over 20 identical runs of CROP, ESPRIT and Mothur (with and without preclustering, PreC). N is the number of observations for all samples; μ is the observed mean of the sample. Standard error (Err) and confidence intervals (Min and Max) were calculated from these values, as described in Methods. Many of the CROP calculations failed to complete in this test, as reflected by N (Prairie soil), and statistics were adjusted accordingly.
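The columns of the variability table can be illustrated with a short calculation of the mean, standard error and a confidence interval from repeated OTU counts. The sketch below assumes the standard error is the sample standard deviation divided by √N and that the interval is mean ± t × Err with a Student t critical value (2.093 is the two-sided 95% value for 19 degrees of freedom, i.e. N = 20). The Methods section is not reproduced here, so the authors' exact procedure may differ, and `summarize_runs` is a hypothetical helper name.

```python
import math
import statistics


def summarize_runs(otu_counts, t_critical):
    """Return N, mean, standard error and a confidence interval for repeated runs.

    t_critical is supplied by the caller (assumption: a Student t value that
    matches N - 1 degrees of freedom and the desired confidence level).
    """
    n = len(otu_counts)
    if n < 2:
        raise ValueError("at least two runs are needed to estimate a standard error")
    mean = statistics.fmean(otu_counts)
    err = statistics.stdev(otu_counts) / math.sqrt(n)  # standard error of the mean
    half_width = t_critical * err
    return {
        "n": n,
        "mean": mean,
        "err": err,
        "min": mean - half_width,
        "max": mean + half_width,
    }


if __name__ == "__main__":
    # Twenty identical results give Err = 0 and Min = Max = mean, which is the
    # pattern shown for ESPRIT at 3% on the stacked dataset (205 OTUs).
    print(summarize_runs([205] * 20, t_critical=2.093))
```

A tool that returns the same count on every run collapses the interval to a single value, whereas any run-to-run variation widens it in proportion to the standard error.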