Yanqing Wang, Fuhai Song, Junwei Zhu, Sisi Zhang, Yadong Yang, Tingting Chen, Bixia Tang, Lili Dong, Nan Ding, Qian Zhang, Zhouxian Bai, Xunong Dong, Huanxin Chen, Mingyuan Sun, Shuang Zhai, Yubin Sun, Lei Yu, Li Lan, Jingfa Xiao, Xiangdong Fang, Hongxing Lei, Zhang Zhang, Wenming Zhao.
Abstract
With sequencing technologies advancing rapidly towards higher throughput and lower cost, sequence data are being generated at an unprecedented rate. To provide an efficient and easy-to-use platform for managing these large volumes of sequence data, we present the Genome Sequence Archive (GSA; http://bigd.big.ac.cn/gsa or http://gsa.big.ac.cn), a data repository for archiving raw sequence data. In compliance with the data standards and structures of the International Nucleotide Sequence Database Collaboration (INSDC), GSA adopts four data objects (BioProject, BioSample, Experiment, and Run) for data organization, accepts raw sequence reads produced by a variety of sequencing platforms, stores both sequence reads and metadata submitted from all over the world, and makes all these data publicly available to the worldwide scientific community. In the era of big data, GSA not only complements existing INSDC members by easing the growing burden of handling the sequence data deluge, but also takes on responsibility for archiving big data globally and provides free, unrestricted access to all publicly available data in support of research activities throughout the world.
Keywords: Big data; GSA; Genome Sequence Archive; INSDC; Raw sequence data
Year: 2017 PMID: 28387199 PMCID: PMC5339404 DOI: 10.1016/j.gpb.2017.01.001
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Figure 1. Data model in GSA. Prefixes of accession numbers for data objects, including BioProject, BioSample, Experiment, and Run, are indicated in red. Data objects Experiment and Run constitute China Read Archive.
Figure 2. Data statistics of GSA. A. Numbers of BioProjects and BioSamples in GSA. B. Numbers of Experiments and Runs, as well as file size in GSA. All statistics are based on data submissions ranging from December 2015 to December 2016.
Figure 3. Graphic illustration of data submissions to GSA. Two representative studies are provided here as examples to depict the data objects involved in data submission.
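
The four data objects named in the abstract (BioProject, BioSample, Experiment, and Run) can be pictured as a small nested data structure. The following Python sketch is illustrative only and is not GSA's actual schema: the field names, the links between objects (an Experiment pointing to a BioSample, Runs grouped under an Experiment), the helper methods, and the identifiers such as `project-1` are all assumptions made for this example, and none of them are real accession numbers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Run:
    run_id: str  # hypothetical identifier, not a real accession number
    files: List[str] = field(default_factory=list)  # raw read file names


@dataclass
class Experiment:
    experiment_id: str
    platform: str   # sequencing platform used for this experiment
    sample_id: str  # assumed link to the BioSample that was sequenced
    runs: List[Run] = field(default_factory=list)


@dataclass
class BioSample:
    sample_id: str
    organism: str


@dataclass
class BioProject:
    project_id: str
    title: str
    samples: List[BioSample] = field(default_factory=list)
    experiments: List[Experiment] = field(default_factory=list)

    def run_count(self) -> int:
        """Total number of runs across all experiments in this project."""
        return sum(len(e.runs) for e in self.experiments)

    def dangling_experiments(self) -> List[str]:
        """IDs of experiments whose sample_id matches no sample in the project."""
        known = {s.sample_id for s in self.samples}
        return [e.experiment_id for e in self.experiments
                if e.sample_id not in known]


# Build one illustrative submission: a project with one sample,
# one experiment on that sample, and one paired-end run.
project = BioProject(project_id="project-1", title="Example study")
project.samples.append(BioSample(sample_id="sample-1", organism="Oryza sativa"))
experiment = Experiment(experiment_id="experiment-1",
                        platform="Illumina HiSeq 2000",
                        sample_id="sample-1")
experiment.runs.append(Run(run_id="run-1",
                           files=["reads_1.fastq.gz", "reads_2.fastq.gz"]))
project.experiments.append(experiment)

print(project.run_count())
print(project.dangling_experiments())
```

Keeping metadata (project, sample, experiment) separate from the read files attached to each run mirrors the abstract's distinction between sequence reads and the metadata stored alongside them; a check such as `dangling_experiments` is one way a submission could be validated before archiving under these assumptions.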