Wenyu Shi1, Heyuan Qi1, Qinglan Sun1, Guomei Fan1, Shuangjiang Liu1,2, Jun Wang3, Baoli Zhu3,4,5,6, Hongwei Liu7, Fangqing Zhao8, Xiaochen Wang1, Xiaoxuan Hu1, Wei Li1, Jia Liu9, Ye Tian9, Linhuan Wu1,2, Juncai Ma1,2.
Abstract
Meta-omics approaches are increasingly used to study the structure and function of microbial communities. A variety of large-scale collaborative projects are being conducted to encompass samples from diverse environments and habitats. This growth has created enormous demand for long-term data maintenance and for data analysis capacity. The Global Catalogue of Metagenomics (gcMeta) is part of the 'Chinese Academy of Sciences Initiative of Microbiome (CAS-CMI)', which focuses on studying the human and environmental microbiome, establishing depositories of samples, strains and data, and promoting international collaboration. To accommodate and rationally organize massive datasets derived from several thousand human and environmental microbiome samples, gcMeta features a database management system for archiving and publishing data in a standardized way. Another main feature is the integration of more than ninety web-based data analysis tools and workflows through a Docker platform, which enables data analysis across various operating systems. The platform has been expanding rapidly and now hosts data from the CAS-CMI and a number of other ongoing research projects. In conclusion, it provides a powerful and user-friendly service to support worldwide collaborative efforts in meta-omics research. The platform is freely accessible at https://gcmeta.wdcm.org/.
Year: 2019 PMID: 30365027 PMCID: PMC6324004 DOI: 10.1093/nar/gky1008
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. General pipeline of the gcMeta platform. The functional services of gcMeta fall into three parts: data management, data analysis and data publication. Users submit metadata and primary raw data to the system under their own accounts and can analyze the data with preinstalled tools and workflows. Data and results can be downloaded for further analysis. A unique identifier (PID) is assigned to each record before the data are made public. If the data are later cited in other resources using the PID, the citation can be traced automatically.
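The submit-then-publish lifecycle described for Figure 1 can be pictured with a minimal Python sketch. The `Record` class, the `publish` function and the `PID-000001` identifier format are hypothetical names chosen for illustration; they are assumptions and do not reflect gcMeta's actual data model or API.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

# Hypothetical in-memory PID counter; a real system would use a persistent registry.
_pid_counter = count(1)


@dataclass
class Record:
    """Illustrative stand-in for a submitted dataset (not gcMeta's schema)."""
    title: str
    visibility: str = "private"  # records start under the submitter's own account
    pid: Optional[str] = None    # assigned before the record becomes public


def publish(record: Record) -> Record:
    """Assign an identifier first, then mark the record as public."""
    if record.pid is None:
        record.pid = f"PID-{next(_pid_counter):06d}"  # assumed format
    record.visibility = "public"
    return record


rec = Record(title="soil metagenome run")
publish(rec)
print(rec.pid, rec.visibility)
```

Assigning the identifier inside `publish`, and only when none exists yet, keeps the PID stable if a record is published more than once, which is what makes later citation tracing by PID possible.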
Figure 2. Database schema of gcMeta. The main data structures and the relationships between the tables are illustrated.
Figure 3. Screenshots and example use cases in gcMeta. (A) Homepage of gcMeta. The numbers of public and private studies, samples, experiments and runs are shown on the homepage. (B) A screenshot of data submission via web table. Each entry can be set to ‘private’ or ‘public’, as highlighted in the red box. (C) A screenshot of the database browser. In the search interface, results can be filtered by ‘experiment type’, ‘sample environment’ and ‘data sources’.
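The three-way filtering in panel (C) amounts to matching every record against whichever filters are set. The sketch below is a minimal illustration under stated assumptions: the study entries, the field names and the `filter_studies` function are hypothetical, and only the name "CAS-CMI" comes from the text.

```python
from typing import Dict, List, Optional

# Hypothetical study entries; only "CAS-CMI" is a name taken from the text.
STUDIES: List[Dict[str, str]] = [
    {"title": "Gut 16S survey", "experiment_type": "amplicon",
     "sample_environment": "human gut", "data_source": "CAS-CMI"},
    {"title": "Soil shotgun run", "experiment_type": "metagenome",
     "sample_environment": "soil", "data_source": "CAS-CMI"},
    {"title": "Lake shotgun run", "experiment_type": "metagenome",
     "sample_environment": "freshwater", "data_source": "other project"},
]


def filter_studies(
    studies: List[Dict[str, str]],
    experiment_type: Optional[str] = None,
    sample_environment: Optional[str] = None,
    data_source: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Keep studies that match every filter that was actually supplied."""
    wanted = {
        "experiment_type": experiment_type,
        "sample_environment": sample_environment,
        "data_source": data_source,
    }
    # Filters left as None are ignored, so an empty query returns everything.
    active = {key: value for key, value in wanted.items() if value is not None}
    return [s for s in studies if all(s.get(k) == v for k, v in active.items())]


hits = filter_studies(STUDIES, experiment_type="metagenome", data_source="CAS-CMI")
print([s["title"] for s in hits])
```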
Tools embedded in the gcMeta platform. Tools belonging to the groups raw reads preprocessing, sequence assembly, genome structural analysis, database annotation, community profiling and sequence alignment are shown in red, blue, purple, orange, green and yellow, respectively. The BBtools software suite (http://jgi.doe.gov/data-and-tools/bbtools/), FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/), fastp (https://github.com/OpenGene/fastp/), Trim Galore (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/), minced (https://github.com/ctSkennerton/minced/tree/master) and RepeatMasker (http://ftp.genome.washington.edu/cgi-bin/RepeatMasker) are referenced by their websites.
Figure 4. Integrated workflows on gcMeta. The tools are grouped into six clusters shown in different colors (metagenome binning, taxonomic assignment and downstream analysis all belong to the community profiling group, shown in green). Tools from different functional groups are connected in the proper sequence to create workflows. Five main workflows, each covering different tools according to the analysis aim, are accessible from a unified user interface, as exemplified. Comparative analysis tools (shown in yellow) are used in all the workflows. NGS and TGS stand for next-generation sequencing and third-generation sequencing, respectively.
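The idea of connecting tools from different functional groups in sequence can be sketched as a list of steps whose outputs feed the next step. This is a toy illustration only: the functions `trim_reads`, `drop_short`, `summarize` and `run_workflow` are hypothetical stand-ins written in plain Python, not the Docker-based tools that gcMeta actually integrates.

```python
from typing import Any, Callable, List, Tuple

# A step pairs a functional group label with a callable (both hypothetical here).
Step = Tuple[str, Callable[[Any], Any]]


def trim_reads(reads: List[str]) -> List[str]:
    """Toy preprocessing: strip trailing ambiguous bases (N)."""
    return [r.rstrip("N") for r in reads]


def drop_short(reads: List[str]) -> List[str]:
    """Toy preprocessing: discard reads shorter than 4 bases."""
    return [r for r in reads if len(r) >= 4]


def summarize(reads: List[str]) -> dict:
    """Toy stand-in for a downstream profiling step."""
    return {"reads": len(reads), "bases": sum(len(r) for r in reads)}


def run_workflow(steps: List[Step], data: Any) -> Tuple[Any, List[str]]:
    """Run steps in order, passing each output on; return the result and a trace."""
    trace = []
    for group, tool in steps:
        data = tool(data)
        trace.append(f"{group}: {tool.__name__}")
    return data, trace


workflow: List[Step] = [
    ("raw reads preprocessing", trim_reads),
    ("raw reads preprocessing", drop_short),
    ("community profiling", summarize),
]
result, trace = run_workflow(workflow, ["ACGTNN", "ACN", "GGGTTT"])
print(result)
print(trace)
```

Keeping the group label next to each step makes it easy to see, in the trace, which functional group contributed at each point of a workflow.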
Figure 5. Screenshots illustrating the use of a tool and a workflow. (A, B) The ANI (average nucleotide identity) and dDDH (digital DNA-DNA hybridization) calculation tool, which can be used by guest users. (A) Screenshots of job submission, including the file upload module and the required argument settings. (B) The results of the job. (C–F) Metagenomic 16S rRNA sequencing taxonomic assignment workflow. (C) A screenshot of the workflow sketch. (D) Screenshots of the input, output and argument settings. (E) The result of the workflow. (F) Screenshots of the visualization of the analysis result. The example shows a PCoA plot generated by the ggplot package.
Figure 6. System structure of gcMeta. The platform integrates storage cluster and computing cluster resources with a database management system and Docker-based tools and workflows to provide comprehensive data archiving, publication and analysis services to users.