Guy Cochrane, Ilene Karsch-Mizrachi, Yasukazu Nakamura.
Abstract
Under the International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org), globally comprehensive public-domain nucleotide sequence data are captured, preserved and presented. The partners in this long-standing collaboration work closely together to provide data formats and conventions that enable consistent data submission to their databases and support regular data exchange around the globe. Clearly defined policy and governance on free access to data, and on relationships with journal publishers, have positioned the INSDC databases as a key provider of the scientific record and a core foundation of the global bioinformatics data infrastructure. While growth in sequence data volumes no longer comes as a surprise to INSDC partners, the uptake of next-generation sequencing technology by mainstream science in recent years has brought a step-change in growth that necessarily leaves a clear mark on INSDC strategy. In this article, we introduce the INSDC, outline data growth patterns and comment on the challenges of increased growth.
Year: 2010 PMID: 21106499 PMCID: PMC3013722 DOI: 10.1093/nar/gkq1150
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (a) Base pairs in INSDC over time, excluding the Trace Archive (raw data from capillary sequencing platforms): cumulative data volume in base pairs over time. (b) Base pairs in INSDC over time since 2002, broken down into selected data components: cumulative data volume in base pairs for assembled sequence (whole-genome shotgun and other methods) and for raw next-generation sequence data.
Figure 2. Growth in complete genomes. The layered chart shows the number of complete genomes available from INSDC databases over time. The end-of-2010 time point is conservatively (linearly) extrapolated from October 2010 figures, the latest available at the time of submission.
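The caption does not spell out how the linear extrapolation was done. A minimal sketch, assuming (hypothetically) that the count grew at a constant monthly rate between the previous year-end and the latest figure, and that "October 2010 figures" means ten months of the year had elapsed. The function name `extrapolate_year_end` and the input numbers are illustrative only and are not INSDC data.

```python
def extrapolate_year_end(count_prev_year_end: float, count_latest: float, months_elapsed: int) -> float:
    """Linearly extrapolate a cumulative count to the end of the year.

    Hypothetical assumption: growth since the previous year-end proceeded at a
    constant monthly rate, and that rate continues for the remaining months.
    """
    if not 0 < months_elapsed <= 12:
        raise ValueError("months_elapsed must be between 1 and 12")
    # Average monthly increase observed so far this year.
    monthly_rate = (count_latest - count_prev_year_end) / months_elapsed
    # Carry that same rate forward over the months left in the year.
    return count_latest + monthly_rate * (12 - months_elapsed)


# Illustrative numbers only, not actual INSDC genome counts.
estimate = extrapolate_year_end(count_prev_year_end=1000, count_latest=1500, months_elapsed=10)
print(estimate)  # 1600.0
```

A straight-line projection like this is conservative when growth is accelerating, because it ignores any further increase in the rate over the final months.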
Figure 3. Taxonomic coverage. Growth over time in the number of taxa with associated sequence (or with subordinate taxa that have associated sequence).