Nadezda V Kovalevskaya, Charlotte Whicher, Timothy D Richardson, Craig Smith, Jana Grajciarova, Xocas Cardama, José Moreira, Adrian Alexa, Amanda A McMurray, Fiona G G Nielsen.
Abstract
There is no unified place where genomics researchers can search through all available raw genomic data in a way similar to OMIM for genes or UniProt for proteins. With the recent increase in the amount of genomic data being produced and the ever-growing promises of precision medicine, this is becoming an increasingly pressing problem. DNAdigest is a charity working to promote efficient sharing of human genomic data to improve the outcome of genomic research and diagnostics for the benefit of patients. Repositive, a social enterprise spin-out of DNAdigest, is building an online platform that indexes genomic data stored in repositories and thus enables researchers to search for and access a range of human genomic data sources through a single, easy-to-use interface, free of charge.
Year: 2016 PMID: 27011302 PMCID: PMC4807091 DOI: 10.1371/journal.pbio.1002418
Source DB: PubMed Journal: PLoS Biol ISSN: 1544-9173 Impact factor: 8.029
Fig 1. Estimated minimum annual human genome sequencing capacity, based on the annual throughput of the Illumina HiSeq X systems sold (at least 16 systems worldwide), compared with the amount of data available up to 2014 via dbGaP, one of the largest repositories for clinical human genomic data [12].
Assuming that a whole genome sequence is ~200 GB in size, these correspond to ~360,000 and ~3,000 human genomes, respectively [9,10,13].
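The conversion behind these figures can be sketched in a few lines of Python. This is an illustrative calculation only: it takes the ~200 GB per genome and the two genome counts from the caption, assumes decimal units (1 PB = 1,000,000 GB), and uses hypothetical function names that do not come from the article.

```python
# Approximate size of one whole genome sequence, as stated in the caption.
GB_PER_GENOME = 200

# Assumption: decimal units, so 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000


def genomes_to_petabytes(n_genomes, gb_per_genome=GB_PER_GENOME):
    """Convert a number of genomes to an approximate data volume in PB."""
    return n_genomes * gb_per_genome / GB_PER_PB


def gigabytes_to_genomes(total_gb, gb_per_genome=GB_PER_GENOME):
    """Convert a data volume in GB to an approximate number of genomes."""
    return total_gb / gb_per_genome


# Genome counts quoted in the caption.
sequencing_capacity_genomes = 360_000  # estimated annual sequencing capacity
dbgap_genomes = 3_000                  # data available via dbGaP up to 2014

capacity_pb = genomes_to_petabytes(sequencing_capacity_genomes)
dbgap_pb = genomes_to_petabytes(dbgap_genomes)
ratio = sequencing_capacity_genomes / dbgap_genomes

print(f"Annual sequencing capacity: ~{capacity_pb:g} PB")
print(f"Available via dbGaP (to 2014): ~{dbgap_pb:g} PB")
print(f"Capacity is ~{ratio:g}x the shared data volume")
```

Under these assumptions, one year of estimated sequencing capacity is roughly two orders of magnitude larger than the data volume shared through dbGaP up to 2014.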
A list of repositories where researchers can download or upload genomic data.
| Repository | Data Types | Description | URL |
|---|---|---|---|
| | Raw sequence data and phenotypic data | Database of Genotypes and Phenotypes, developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype. | |
| | Variant data | Database of genomic structural variation; it contains insertions, deletions, duplications, inversions, multinucleotide substitutions, mobile element insertions, translocations, and complex chromosomal rearrangements. | |
| | Variant data | Database of single nucleotide polymorphisms (SNPs) and multiple small-scale variations that include insertions and deletions, microsatellites, and non-polymorphic variants. | |
| | Raw sequencing data | Public functional genomics data repository supporting Minimum Information About a Microarray Experiment (MIAME)-compliant data submissions. Tools are provided to help users query and download experiments and curated gene expression profiles. | |
| | Raw sequencing data | Stores raw sequencing data and alignment information from high-throughput sequencing platforms. | |
| | Variant data | Aggregates information about genomic variation and its relationship to human health. | |
| | Raw sequence data and phenotypic data | Allows you to explore datasets from genomic studies, provided by a range of data providers. | |
| | Raw sequencing data | A comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. | |
| | Variant data | An open-access database of all types of genetic variation data from all species. | |
| | Raw sequencing data | Archive of Functional Genomics Data stores data from high-throughput functional genomics experiments, and provides these data for reuse to the research community. | |
| | Raw sequencing data | Collects nucleotide sequence data as a member of the International Nucleotide Sequence Database Collaboration (INSDC) and provides freely available nucleotide sequence data and supercomputer system, to support research activities in life science. | |
| | Raw sequencing data | A service for permanent archiving and sharing of all types of individual-level genetic and de-identified phenotypic data resulting from biomedical research projects. The JGA contains exclusive data collected from individuals whose consent agreements authorize data release only for specific research use or to bona fide researchers. | |
| | Variant data | Stores and displays somatic mutation information and related details and contains information relating to human cancers. There are two types of data in COSMIC: expert manual curation data and systematic screen data. | |
| | Variant data and phenotypic data | Database contains data from >17,800 patients who have given consent for broad data-sharing. Used by the clinical community to share and compare phenotypic and genotypic data. | |
| | Raw sequencing data | A repository where users can make all of their research outputs available in a citable, shareable, and discoverable manner. | |
| | Raw sequencing data | A curated resource that makes the data underlying scientific publications discoverable, freely reusable, and citable. Dryad provides a general-purpose home for a wide diversity of datatypes. | |
| | Variant data | A free, flexible, Web-based, open source database developed and designed to collect and display variants in the DNA sequence. | |
| | Raw sequencing data | Associated with the journal GigaScience, contains discoverable, trackable, and citable datasets that are available for public download and use. | |
| | Variant data and phenotypic data | A repository of biomaterials and phenotypic and genotypic data to aid research on autism spectrum disorders. | |
| | Raw sequencing data | A collaborative project aiming to provide genetic testing customers with the knowledge and tools they need to make the most of their own genetic data. As part of the project, members are taking commercial genetic tests and making the raw data publicly available for others to download, analyse, and reuse. | |
| | Raw sequencing data | Allows individuals to publish their genetic test results, find others with similar genetic variations, learn more about their results, get the latest primary literature on their variations, and help scientists find new associations. | |
a Restricted access repositories.
A list of downloadable genomic data collections.
| Repository | Data Types | Description | URL |
|---|---|---|---|
| | Raw sequencing data | A coalition of investigators seeking to aggregate exome sequencing data from a wide variety of large-scale sequencing projects and to make summary data available for the wider scientific community. | |
| | Raw sequencing and phenotypic data | Comprehensive genomic characterisation and analysis of various cancers. | |
| | Variant data | Comprehensive description of genomic, transcriptomic, and epigenomic changes in 50 different tumour types and/or subtypes that are of clinical and societal importance across the globe. | |
| | Raw sequencing data | The first project to sequence the genomes of a large number of people, to provide a comprehensive resource on human genetic variation. | |
| | Raw sequencing data | Aiming to build a comprehensive parts list of functional elements in the human genome. | |
| | Raw sequencing data | National Heart, Lung and Blood Institute (NHLBI) Exome Sequencing Project (ESP) aims to discover novel genes and mechanisms contributing to heart, lung, and blood disorders by applying next-generation sequencing of the protein coding regions of the human genome and to share these datasets and findings with the scientific community. | |
| | Raw sequencing data | A group of research studies creating freely available scientific resources that bring together genomic, environmental, and human trait data donated by volunteers. | |
| | Raw sequencing data | The Dutch biobank collaboration BBMRI-NL has initiated the extensive Rainbow Project “Genome of the Netherlands” (GoNL) to build a global genetic profile of large numbers of Dutch. | |
| | Raw sequencing data | A dataset of diverse, high-quality human genome sequences. | |
| | Raw sequencing data | This site contains the reference sequence and working draft assemblies for a large collection of genomes. | |
a Restricted access repositories.
Fig 2. Repositive is an online platform indexing public human genomic data repositories.
It enables registered users to find, access, and share genomic data that is consented for research use.
Fig 3. The Repositive platform provides benefits for both sides of the data exchange.