Ke Yu, Tong Zhang.
Abstract
We developed a fast method to construct local sub-databases from the NCBI-nr database for quick similarity searching and annotation of huge metagenomic datasets based on the BLAST-MEGAN approach. A three-step sub-database annotation pipeline (SAP) was further proposed to conduct the annotation in a much more time-efficient way, requiring far less computational capacity than the direct NCBI-nr database BLAST-MEGAN approach. The first BLAST of SAP was conducted using the original metagenomic dataset against the constructed sub-database for a quick screening of candidate target sequences. The candidate target sequences identified in the first BLAST were then subjected to the second BLAST against the whole NCBI-nr database. Finally, the BLAST results were annotated using MEGAN to filter out sequences mistakenly selected in the first BLAST and so guarantee the accuracy of the results. Based on the tests conducted in this study, SAP achieved a speedup of ~150-385 times at a BLAST e-value of 1e-5 compared to the direct BLAST against the NCBI-nr database. The annotation results of SAP agree exactly with those of the direct NCBI-nr database BLAST-MEGAN approach, which is very time-consuming and computationally intensive. Selecting more rigorous thresholds (e.g. an e-value of 1e-10) would further accelerate the SAP process. The SAP pipeline may also be coupled with novel similarity search tools other than BLAST (e.g. RAPsearch) to achieve even faster annotation of huge metagenomic datasets. Above all, this sub-database construction method and the SAP pipeline provide a new time-efficient and convenient similarity search and annotation strategy for laboratories without access to high performance computing facilities. SAP also offers high performance computing facilities a way to process more similarity search tasks.
Year: 2013 PMID: 23573212 PMCID: PMC3613424 DOI: 10.1371/journal.pone.0059831
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
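The hand-off between the first and second BLAST described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the first BLAST was written in the common 12-column tab-separated layout (query ID in column 1, e-value in column 11), and the function names `candidate_read_ids`, `parse_fasta`, and `select_candidates` are hypothetical. The selected reads would then be searched against the whole NCBI-nr database and annotated with MEGAN.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def candidate_read_ids(blast_tab_lines: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """Collect IDs of reads with at least one sub-database hit at or below max_evalue.

    Assumption: 12-column tabular output, query ID in column 1, e-value in column 11.
    """
    ids: Set[str] = set()
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            ids.add(fields[0])
    return ids


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line and header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def select_candidates(fasta_lines: Iterable[str], keep_ids: Set[str]) -> List[Tuple[str, str]]:
    """Keep only the reads whose ID (first word of the header) is a candidate."""
    return [(h, s) for h, s in parse_fasta(fasta_lines) if h.split()[0] in keep_ids]


# Toy input (made-up reads and hits, for illustration only).
first_blast = [
    "read1\tsubj9\t95.0\t33\t1\t0\t1\t99\t10\t42\t1e-12\t70.1",
    "read2\tsubj3\t40.0\t30\t18\t0\t1\t90\t5\t34\t0.5\t25.0",
    "read3\tsubj7\t88.0\t33\t4\t0\t1\t99\t1\t33\t2e-6\t50.2",
]
reads = [">read1", "ACGT", ">read2", "GGCC", ">read3", "TTAA", ">read4", "CCGG"]

ids = candidate_read_ids(first_blast, max_evalue=1e-5)
second_blast_input = select_candidates(reads, ids)
```

Because only reads with a sub-database hit are passed on, the expensive search against the full database runs on a small fraction of the original dataset, which is where the reported time savings would come from.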
Figure 1. Maps of the sub-databases constructed by the proposed method.
A total of 16,089, 1,023, and 4,318 NCBI-nr database sequences were annotated to the fatty acid metabolism pathway, the bisphenol A degradation pathway, and the four processes in nitrogen metabolism, respectively. (A) Sub-database of the fatty acid metabolism pathway; (B) sub-database of the bisphenol A degradation pathway; (C) sub-database of the four processes in nitrogen metabolism. The bars in the figures show the number of sequences annotated to each EC number. EC numbers with relatively high counts are highlighted in purple.
Number of sequences from the NCBI-nr database annotated by MEGAN to the ammonification, de-nitrification, nitrification, and nitrogen fixation processes in nitrogen metabolism.
| Ammonification, de-nitrification, nitrification, and nitrogen fixation processes in nitrogen metabolism | ||||
| EC number | KO number | Name | Definition | Number of sequences derived from the NCBI-nr database |
| 1.13.12.- | K10944 | amoA | Ammonia monooxygenase subunit A | 3 |
| 1.13.12.- | K10945 | amoB | Ammonia monooxygenase subunit B | 2 |
| 1.13.12.- | K10946 | amoC | Ammonia monooxygenase subunit C | 7 |
| 1.13.12.16 | K00459 | E1.13.12.16 | Nitronate monooxygenase | 267 |
| 1.18.6.1 | K00531 | anfG | Nitrogenase | 13 |
| 1.18.6.1 | K02588 | nifH | Nitrogenase iron protein NifH | 247 |
| 1.18.6.1 | K02586 | nifD | Nitrogenase molybdenum-iron protein alpha chain | 195 |
| 1.18.6.1 | K02591 | nifK | Nitrogenase molybdenum-iron protein beta chain | 209 |
| 1.7.1.1 | K00360 | E1.7.1.1 | Nitrate reductase (NADH) | 21 |
| 1.7.1.3 | K10534 | NIAD | Nitrate reductase (NADPH) | 6 |
| 1.7.1.4 | K00362 | nirB | Nitrite reductase (NAD(P)H) large subunit | 457 |
| 1.7.1.4 | K00363 | nirD | Nitrite reductase (NAD(P)H) small subunit | 303 |
| 1.7.2.1 | K00368 | E1.7.2.1 | Nitrite reductase (NO-forming) | 96 |
| 1.7.2.2 | K03385 | nrfA | Cytochrome c-552 | 147 |
| 1.7.3.4 | K10535 | hao | Hydroxylamine oxidase | 16 |
| 1.7.7.1 | K00366 | nirA | Ferredoxin-nitrite reductase | 216 |
| 1.7.7.2 | K00367 | narB | Ferredoxin-nitrate reductase | 20 |
| 1.7.99.4 | K00369 | narX | Nitrate reductase-like protein | 14 |
| 1.7.99.4 | K00370 | narG | Nitrate reductase 1, alpha subunit | 260 |
| 1.7.99.4 | K00371 | narH | Nitrate reductase 1, beta subunit | 239 |
| 1.7.99.4 | K00372 | E1.7.99.4C | Nitrate reductase catalytic subunit | 220 |
| 1.7.99.4 | K00373 | narJ | Nitrate reductase 1, delta subunit | 226 |
| 1.7.99.4 | K00374 | narI | Nitrate reductase 1, gamma subunit | 221 |
| 1.7.99.4 | K02567 | napA | Periplasmic nitrate reductase NapA | 183 |
| 1.7.99.4 | K08345 | narZ | Nitrate reductase 2, alpha subunit | 35 |
| 1.7.99.4 | K08346 | narY | Nitrate reductase 2, beta subunit | 28 |
| 1.7.99.4 | K08347 | narV | Nitrate reductase 2, gamma subunit | 24 |
| 1.7.99.6 | K00376 | nosZ | Nitrous-oxide reductase | 98 |
| 1.7.99.7 | K02164 | norE | Nitric oxide reductase NorE protein | 53 |
| 1.7.99.7 | K02305 | norC | Nitric oxide reductase subunit C | 67 |
| 1.7.99.7 | K02448 | norD | Nitric oxide reductase NorD protein | 67 |
| 1.7.99.7 | K04561 | norB | Nitric oxide reductase subunit B | 210 |
| 1.7.99.7 | K04747 | norF | Nitric oxide reductase NorF protein | 4 |
| 1.7.99.7 | K04748 | norQ | Nitric oxide reductase NorQ protein | 123 |
Comparison of the number of sequences annotated by SAP with direct BLAST against the NCBI-nr database.
| EC number | SAP 1st BLAST | SAP 2nd BLAST | BLAST against nr | EC number | SAP 1st BLAST | SAP 2nd BLAST | BLAST against nr |
| Fatty acid metabolism pathway | |||||||
| 1.1.1.1 | 37 | 24 | 24 | 1.14.15.3 | 9 | 3 | 3 |
| 1.1.1.35 | 55 | 43 | 43 | 1.18.1.3 | 1 | 0 | 0 |
| 1.2.1.3 | 91 | 53 | 53 | 2.3.1.16 | 11 | 8 | 8 |
| 1.3.3.6 | 2 | 2 | 2 | 2.3.1.9 | 67 | 52 | 52 |
| 1.3.99.- | 3 | 3 | 3 | 4.2.1.17 | 59 | 48 | 48 |
| 1.3.99.2 | 45 | 11 | 11 | 5.1.2.3 | 16 | 13 | 13 |
| 1.3.99.3 | 90 | 40 | 40 | 5.3.3.8 | 17 | 13 | 13 |
| 1.3.99.7 | 24 | 20 | 20 | 6.2.1.3 | 100 | 50 | 50 |
| 1.3.99.13 | 10 | 7 | 7 | 6.2.1.20 | 3 | 3 | 3 |
| 1.14.14.1 | 5 | 2 | 2 | | | | |
| Bisphenol A degradation pathway | |||||||
| 1.1.-.- | 7 | 4 | 4 | 1.14.13.- | 21 | 5 | 5 |
| 1.1.1.- | 61 | 16 | 16 | | | | |
| Ammonification, de-nitrification, nitrification, and nitrogen fixation processes in nitrogen metabolism | |||||||
| 1.13.12.- | 6 | 2 | 2 | 1.7.1.4 | 14 | 13 | 13 |
| 1.13.12.16 | 5 | 1 | 1 | 1.7.2.1 | 11 | 5 | 5 |
| 1.18.6.1 | 5 | 5 | 5 | 1.7.99.6 | 15 | 10 | 10 |
| 1.7.1.1 | 1 | 1 | 1 | 1.7.99.7 | 26 | 19 | 19 |
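The comparison above can be checked mechanically. The sketch below copies a few rows from the table and verifies two things: that the counts after the second BLAST equal the counts from the direct BLAST against nr, and how many first-BLAST candidates were discarded by the second step. The helper names `agreement` and `discarded_after_first_blast` are hypothetical, and only a subset of rows is included for brevity.

```python
# (EC number, SAP 1st BLAST, SAP 2nd BLAST, direct BLAST against nr),
# copied from a subset of rows in the comparison table.
rows = [
    ("1.1.1.1", 37, 24, 24),
    ("1.3.99.2", 45, 11, 11),
    ("6.2.1.3", 100, 50, 50),
    ("1.14.13.-", 21, 5, 5),
    ("1.7.99.7", 26, 19, 19),
]


def agreement(table):
    """True if every SAP 2nd-BLAST count equals the direct-BLAST count."""
    return all(second == direct for _, _, second, direct in table)


def discarded_after_first_blast(table):
    """Number of first-BLAST candidates per EC number removed by the second step."""
    return {ec: first - second for ec, first, second, _ in table}


print(agreement(rows))
print(discarded_after_first_blast(rows))
```

For these rows the first BLAST always over-selects, and the second step brings every count down to the direct-BLAST value, which is consistent with the filtering role the abstract gives to the second BLAST and MEGAN.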
Time consumption of SAP annotation and direct NCBI-nr annotation.
| Test dataset | Number of reads | Read length (nt) | Sub-database | BLAST e-value | Number of subjects in sub-database | SAP 1st step (CPU hours) | SAP 2nd step (CPU hours) | SAP total (CPU hours) | NCBI-nr annotation (CPU hours) | Fold increase in speed |
| SL_DNA | 300,000 | 100 | FA | 1e-5 | 16,089 | 0.13±0.01 | 0.86±0.05 | 0.99±0.05 | 150±0.4 | 152 |
| SL_DNA | 300,000 | 100 | N | 1e-5 | 4,318 | 0.16±0.01 | 0.42±0.03 | 0.58±0.04 | 150±0.4 | 259 |
| SL_DNA | 300,000 | 100 | BPA | 1e-5 | 1,023 | 0.03±0 | 0.36±0.02 | 0.39±0.02 | 150±0.4 | 385 |
| SL_DNA | 300,000 | 100 | FA | 1e-10 | 16,089 | 0.13±0.01 | 0.22±0.01 | 0.35±0.02 | 150±0.4 | 429 |
| SL_DNA | 300,000 | 100 | N | 1e-10 | 4,318 | 0.16±0.01 | 0.08±0 | 0.24±0.01 | 150±0.4 | 625 |
| SL_DNA | 300,000 | 100 | BPA | 1e-10 | 1,023 | 0.01±0 | 0.04±0 | 0.05±0 | 150±0.4 | 3000 |
Fold increase in speed refers to the speedup achieved by SAP compared to the direct NCBI-nr BLAST.
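The fold increase can be reproduced from the running times in the timing table: it is the direct NCBI-nr running time divided by the SAP total running time, rounded to the nearest integer. The sketch below uses the mean values from the table and ignores the reported ± deviations; the function name `fold_speedup` is hypothetical.

```python
# Mean running times in CPU hours, taken from the timing table.
NR_CPU_HOURS = 150.0

# (sub-database, BLAST e-value) -> SAP total running time
sap_total_hours = {
    ("FA", "1e-5"): 0.99,
    ("N", "1e-5"): 0.58,
    ("BPA", "1e-5"): 0.39,
    ("FA", "1e-10"): 0.35,
    ("N", "1e-10"): 0.24,
    ("BPA", "1e-10"): 0.05,
}


def fold_speedup(direct_hours: float, sap_hours: float) -> int:
    """Speedup of SAP over the direct search, rounded to the nearest integer."""
    return round(direct_hours / sap_hours)


speedups = {key: fold_speedup(NR_CPU_HOURS, hours) for key, hours in sap_total_hours.items()}
for (subdb, evalue), fold in speedups.items():
    print(f"{subdb:>3} at e-value {evalue}: {fold}x")
```

The three values at an e-value of 1e-5 (152, 259, and 385) correspond to the ~150-385 times range quoted in the abstract.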
Figure 2. Procedure of sub-database construction using MEGAN.
Figure 3. Sub-database annotation pipeline.
(A) SAP pipeline: two-step BLAST using the coupled sub-database and the NCBI-nr database; (B) direct BLAST against the NCBI-nr database.