Sitao Wu, Zhengwei Zhu, Liming Fu, Beifang Niu, Weizhong Li.
Abstract
BACKGROUND: The new field of metagenomics studies microorganism communities by culture-independent sequencing. With advances in next-generation sequencing techniques, researchers face tremendous challenges in metagenomic data analysis because of the huge quantity and high complexity of sequence data. Analyzing large datasets is extremely time-consuming, and metagenomic annotation involves a wide range of computational tools that are difficult for typical users to install and maintain. The tools provided by the few available web servers are also limited and have various constraints, such as login requirements, long waiting times, and the inability to configure pipelines.
Year: 2011 PMID: 21899761 PMCID: PMC3180703 DOI: 10.1186/1471-2164-12-444
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Illustration of WebMGA and its metagenomic analysis functions. The major input of WebMGA is either a DNA sequence file or a protein sequence file. A user can run a single analysis at a time, such as predicting ORFs from the uploaded DNA sequences. A user can also use a script to call WebMGA to run multiple analyses, or to run a pipeline in which one job uses the output of another job.
Figure 2. A screenshot and example outputs of WebMGA. (a) A screenshot of the WebMGA server. (b) A plot of the distribution of clusters produced by CD-HIT. (c) COG annotation results, provided as several tab-delimited text files that can easily be viewed locally. (d) A plot of the length distribution produced by the sequence statistics tool.
Figure 3. A simple example pipeline configured with tools in WebMGA.
Computational time and throughput for each tool of WebMGA
| Category | Tool | Data (a) | Wall time (hh:mm:ss) | Total CPU time (hh:mm:ss) | Daily throughput (b) |
|---|---|---|---|---|---|
| Clustering | CD-HIT-EST | 1 | 00:08:53 | 00:34:08 | 3,113 |
| | CD-HIT | 2 | 00:00:58 | 00:02:52 | 23,040 |
| | H-CD-HIT | 2 | 00:20:06 | 01:10:26 | 1,600 |
| | CD-HIT-454 | 1 | 00:05:40 | 00:21:54 | 4,800 |
| rRNA | BLASTN-rRNA | 1 | 00:12:43 | 13:44:53 | 139 |
| | hmm-rRNA | 1 | 00:01:56 | 00:20:35 | 5,008 |
| tRNA | tRNA-scan | 1 | 00:02:29 | 02:01:50 | 936 |
| ORF calling | ORF-finder | 1 | 00:02:06 | 00:02:06 | 23,040 |
| | Metagene | 1 | 00:16:21 | 00:15:21 | 6,400 |
| | FragGeneScan | 1 | 01:27:50 | 01:27:50 | 1,294 |
| Function | COG | 2 | 00:14:55 | 15:12:50 | 126 |
| | KOG | 2 | 00:15:16 | 16:25:31 | 116 |
| | PRK | 2 | 00:28:38 | 32:03:16 | 59 |
| | PFAM | 2 | 01:33:44 | 115:30:23 | 16 |
| | TIGRFAM | 2 | 00:53:23 | 62:31:51 | 30 |
| Pathway | KEGG | 2 | 20:24:33 | 553:32:48 | 3 |
| Statistics | FNA-stat | 1 | 00:00:38 | 00:00:38 | 43,746 |
| | FAA-stat | 2 | 00:00:12 | 00:00:12 | 52,363 |
| Quality control | QC-filter-FASTQ | 1 | 00:03:13 | 00:03:13 | 19,200 |
| | QC-filter-FASTA-qual | 1 | 00:02:47 | 00:02:47 | 23,040 |
| | Trim | 1 | 00:04:00 | 00:04:00 | 16,457 |
| Filtering | Filter-human | 1 | 00:40:28 | 02:29:57 | 762 |
| Binning | RDP-binning | 1 | 01:16:30 | 01:20:00 | 1,404 |
| | FR-HIT-binning | 1 | 00:36:59 | 02:13:53 | 853 |
| OTU clustering | CD-HIT-OTU | 3 | 00:05:10 | 00:10:23 | 8,861 |
| File conversion | FASTQ2FASTA | 1 | 00:02:24 | 00:02:24 | 23,040 |
a See text for descriptions of the 3 datasets tested.
b Daily throughput is calculated as the daily CPU time of the WebMGA cluster (80 cores) divided by the total CPU time of a job, assuming 2 minutes of administrative CPU cost per job (job queuing, file copying, etc.).
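The throughput definition in footnote b can be written out as a short calculation. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and it assumes the 2 minutes of administrative cost are added to each job's total CPU time before dividing. It applies no rounding, so its results can differ somewhat from the tabulated values.

```python
CLUSTER_CORES = 80            # cluster size stated in footnote b
ADMIN_OVERHEAD_S = 2 * 60     # 2 minutes of administrative CPU cost per job


def parse_hms(text: str) -> int:
    """Convert an 'hh:mm:ss' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def daily_throughput(total_cpu_time: str) -> float:
    """Jobs per day: daily CPU seconds of the cluster / CPU seconds per job.

    Assumption: the administrative overhead is added to the job's CPU time.
    """
    daily_cpu_s = CLUSTER_CORES * 24 * 3600
    return daily_cpu_s / (parse_hms(total_cpu_time) + ADMIN_OVERHEAD_S)


if __name__ == "__main__":
    # Total CPU times taken from the KEGG and FAA-stat rows of the table.
    for tool, cpu_time in [("KEGG", "553:32:48"), ("FAA-stat", "00:00:12")]:
        print(f"{tool}: about {daily_throughput(cpu_time):.1f} jobs per day")
```

Because the overhead is fixed per job, it dominates the estimate for very short jobs such as FAA-stat and is negligible for long ones such as KEGG.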
Annotation summary for example dataset
| Tool | Annotation summary (a) |
|---|---|
| FNA-stat | Total reads: 555853, Length: 45~607, Average length: 251, |
| CD-HIT-EST | Parameters: "-d 0 -n 10 -l 11 -r 1 -p 1 -g 1 -G 0 -c 0.95 -aS 0.8" |
| HMM-rRNA | rRNA sequences identified: 3858, |
| tRNA-SCAN | tRNA sequences identified: 1378 |
| Metagene | ORFs: 571261 |
| FAA-stat | Total ORFs: 555853, Length: 20-121, Average length: 66, |
| CD-HIT | Parameters: "-d 0 -n 5 -p 1 -g 1 -G 0 -c 0.9 -aS 0.8" |
| COG | Parameters: "-e 0.001" |
| PFAM | Parameters: "-e 0.001" |
| TIGRFAM | Parameters: "-e 0.001" |
a Detailed parameters are explained at WebMGA website.
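For readers who want to reproduce the clustering step locally rather than through the web server, the CD-HIT-EST parameter string from the table can be passed to the `cd-hit-est` command-line program. The sketch below only assembles the command line. The function name and the file names `reads.fna` and `reads_nr95` are hypothetical, and `-i` and `-o` are CD-HIT's input and output options, which do not appear in the table.

```python
import shlex

# Parameter string copied from the CD-HIT-EST row of the table.
CD_HIT_EST_PARAMS = "-d 0 -n 10 -l 11 -r 1 -p 1 -g 1 -G 0 -c 0.95 -aS 0.8"


def build_cd_hit_est_command(input_fasta, output_prefix, params=CD_HIT_EST_PARAMS):
    """Return an argument list for cd-hit-est using the tabulated parameters.

    input_fasta and output_prefix are supplied by the caller; the names used
    in the example below are placeholders, not files from the paper.
    """
    return ["cd-hit-est", "-i", input_fasta, "-o", output_prefix,
            *shlex.split(params)]


if __name__ == "__main__":
    command = build_cd_hit_est_command("reads.fna", "reads_nr95")
    # Print the command; pass the list to subprocess.run() to execute it
    # on a machine where CD-HIT is installed.
    print(shlex.join(command))
```

Keeping the parameters as a single string makes it easy to check them against the table, while `shlex.split` turns them into separate arguments so that no shell is needed to run the command.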