Meili Chen¹, Yingke Ma¹, Song Wu², Xinchang Zheng¹, Hongen Kang², Jian Sang², Xingjian Xu², Lili Hao¹, Zhaohua Li², Zheng Gong², Jingfa Xiao², Zhang Zhang², Wenming Zhao², Yiming Bao³.
Abstract
The Genome Warehouse (GWH) is a public repository that houses genome assembly data for a wide range of species and provides a series of web services for genome data submission, storage, release, and sharing. As one of the core resources of the National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB; https://ngdc.cncb.ac.cn), GWH accepts both full and partial (chloroplast, mitochondrion, and plasmid) genome sequences at different assembly levels, as well as updates to existing genome assemblies. For each assembly, GWH collects detailed metadata on the biological project, biological sample, and genome assembly, in addition to the genome sequence and annotation. To archive high-quality genome sequences and annotations, GWH applies a uniform, standardized quality control procedure. Besides basic browse and search functions, all released genome sequences and annotations can be visualized with JBrowse. As of May 21, 2021, GWH had received 19,124 direct submissions covering 1108 species and had released 8772 of them. Collectively, GWH serves as an important resource for genome-scale data management and provides free, publicly accessible data to support research worldwide. GWH is publicly accessible at https://ngdc.cncb.ac.cn/gwh.
Keywords: Genome Warehouse; Genome annotation; Genome sequence; Genome submission; Quality control
Year: 2021 PMID: 34175476 PMCID: PMC9039550 DOI: 10.1016/j.gpb.2021.04.001
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 6.409
Fig. 1. Data model in GWH. Genome assembly accession numbers take the form “GWHAAAA00000000”, in which “AAAA” stands for a combination of four capital English letters that differs between genome assemblies. The first genome sequence under a genome assembly is assigned “GWHAAAA00000001”, and subsequent genome sequences under the same assembly are numbered by incrementing the last eight digits (“GWHAAAA00000002”, “GWHAAAA00000003”, etc.). The first gene, transcript, and protein sequences under the genome assembly are assigned “GWHGAAAA000001”, “GWHTAAAA000001”, and “GWHPAAAA000001”, respectively, and the last six digits are incremented for subsequent genes, transcripts, and proteins.
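The accession layout in Fig. 1 is regular enough to generate and check mechanically. The Python sketch below is an illustration only, not GWH's own implementation: the helper names (`assembly_accession`, `sequence_accession`, `feature_accession`, `parse_accession`) are hypothetical, the four-letter assembly code is taken as a given input because the caption does not say how GWH assigns it, and the index ranges are inferred from the digit widths (eight digits for genome sequences, six for genes, transcripts, and proteins).

```python
import re

# Hypothetical helpers mirroring the accession layout described in Fig. 1.
# The four-letter assembly code is an input here; how GWH assigns it is not described.
CODE_RE = re.compile(r"[A-Z]{4}")
ASSEMBLY_RE = re.compile(r"GWH([A-Z]{4})([0-9]{8})")      # assembly or genome sequence
FEATURE_RE = re.compile(r"GWH([GTP])([A-Z]{4})([0-9]{6})")  # gene, transcript, protein
FEATURE_KINDS = {"G": "gene", "T": "transcript", "P": "protein"}


def _check_code(code: str) -> None:
    if not CODE_RE.fullmatch(code):
        raise ValueError(f"assembly code must be four capital letters: {code!r}")


def assembly_accession(code: str) -> str:
    """The assembly itself carries eight zero digits, e.g. GWHAAAA00000000."""
    _check_code(code)
    return f"GWH{code}{0:08d}"


def sequence_accession(code: str, index: int) -> str:
    """The index-th genome sequence (1-based) under an assembly."""
    _check_code(code)
    if not 1 <= index <= 99_999_999:
        raise ValueError(f"sequence index out of range: {index}")
    return f"GWH{code}{index:08d}"


def feature_accession(kind: str, code: str, index: int) -> str:
    """The index-th gene ('G'), transcript ('T') or protein ('P'), 1-based."""
    if kind not in FEATURE_KINDS:
        raise ValueError(f"kind must be 'G', 'T' or 'P': {kind!r}")
    _check_code(code)
    if not 1 <= index <= 999_999:
        raise ValueError(f"feature index out of range: {index}")
    return f"GWH{kind}{code}{index:06d}"


def parse_accession(acc: str) -> dict:
    """Classify an accession and split it into its assembly code and index."""
    m = ASSEMBLY_RE.fullmatch(acc)
    if m:
        index = int(m.group(2))
        kind = "assembly" if index == 0 else "sequence"
        return {"type": kind, "code": m.group(1), "index": index}
    m = FEATURE_RE.fullmatch(acc)
    if m:
        return {"type": FEATURE_KINDS[m.group(1)], "code": m.group(2),
                "index": int(m.group(3))}
    raise ValueError(f"not a GWH accession: {acc!r}")
```

For example, `parse_accession("GWHGAAAA000001")` returns `{'type': 'gene', 'code': 'AAAA', 'index': 1}`. Assembly-level and feature-level accessions never collide because they differ in length (15 versus 14 characters), so the two patterns can be tried in either order.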
Fig. 2. Major components of the GWH data processing workflow.
Total data holdings in GWH.
| Status | Type | Animal | Plant | Fungus | Bacterium | Archaea | Virus | Metagenome | Others | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Released | Assembly | 531 | 251 | 16 | 291 | 103 | 915 | 6651 | 14 | 8772 |
| | Species | 90 | 159 | 14 | 109 | 11 | 23 | 5 | 12 | 423 |
| Unreleased | Assembly | 7490 | 1334 | 104 | 76 | 19 | 858 | 10 | 461 | 10,352 |
| | Species | 38 | 642 | 7 | 8 | 5 | 4 | 3 | 9 | 716 |
| Total | Assembly | 8021 | 1585 | 120 | 367 | 122 | 1773 | 6661 | 475 | 19,124 |
| | Species | 125 | 786 | 20 | 113 | 13 | 25 | 7 | 19 | 1108 |
Note: The numbers of genome assemblies and species covered are those directly submitted to GWH. GWH, Genome Warehouse.
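The assembly counts in the table are internally consistent: for each organism group, the released and not-yet-released counts add up to the total. The minimal Python check below uses only numbers copied from the table; the helper name `check_and_summarize` is hypothetical. It verifies the per-group sums and computes the released share of assemblies overall and per group.

```python
# Assembly counts per organism group, copied from the table above.
GROUPS = ["Animal", "Plant", "Fungus", "Bacterium", "Archaea",
          "Virus", "Metagenome", "Others"]
RELEASED = [531, 251, 16, 291, 103, 915, 6651, 14]
UNRELEASED = [7490, 1334, 104, 76, 19, 858, 10, 461]
TOTAL = [8021, 1585, 120, 367, 122, 1773, 6661, 475]


def check_and_summarize():
    """Check released + unreleased == total per group; return release statistics."""
    # Each assembly is either released or not, so the assembly rows must add up.
    for name, r, u, t in zip(GROUPS, RELEASED, UNRELEASED, TOTAL):
        assert r + u == t, f"{name}: {r} + {u} != {t}"
    released, total = sum(RELEASED), sum(TOTAL)
    # Fraction of each group's assemblies that has been publicly released.
    shares = {name: r / t for name, r, t in zip(GROUPS, RELEASED, TOTAL)}
    return released, total, released / total, shares


if __name__ == "__main__":
    released, total, frac, shares = check_and_summarize()
    print(f"{released}/{total} assemblies released ({frac:.1%})")
```

Running the script prints `8772/19124 assemblies released (45.9%)`. The species rows are deliberately left out of the check: their released and unreleased counts sum to more than the total (423 + 716 = 1139 versus 1108), presumably because a single species can have both released and unreleased assemblies.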
Fig. 3. Statistics of genome assemblies in GWH (as of May 21, 2021). A. All assemblies. B. Publicly released assemblies. Assemblies at the contig, scaffold, chromosome, and complete levels are shown in different colors.