Ilkka Lappalainen, John Lopez, Lisa Skipper, Timothy Hefferon, J Dylan Spalding, John Garner, Chao Chen, Michael Maguire, Matt Corbett, George Zhou, Justin Paschall, Victor Ananiev, Paul Flicek, Deanna M Church.
Abstract
Much has changed in the last two years at DGVa (http://www.ebi.ac.uk/dgva) and dbVar (http://www.ncbi.nlm.nih.gov/dbvar). We are now processing direct submissions rather than only curating data from the literature, and our joint study catalog includes data from over 100 studies in 11 organisms. Human studies dominate, with data from control and case populations and tumor samples, as well as three large curated studies derived from multiple sources. During the processing of these data, we have made improvements to our data model, submission process and data representation. Additionally, we have made significant improvements in providing access to these data via web and FTP interfaces.
Year: 2012 PMID: 23193291 PMCID: PMC3531204 DOI: 10.1093/nar/gks1213
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Data growth since the DGVa and dbVar services were launched. The graph shows the accumulation of variant calls, stratified by organism. Large datasets in the archives include the 1000 Genomes project pilot (estd59) and phase I (estd199), structural variation data from 17 in-bred mouse strains (estd118), the first releases of somatic structural variation from the COSMIC database (estd192), case-control and case-only studies on developmental delay (nstd54) and the International Standard Cytogenetic Array (ISCA) consortium data (nstd37). In addition to human and mouse data, the archives include data from dog, pig, fruit fly, macaque, cow, horse, zebrafish, sorghum and chimp.
Figure 2. Graphical representation of the archive data model. The three accessioned objects (studies, calls and regions) are prefixed by an ‘n’ if submitted to dbVar and an ‘e’ if submitted to DGVa. Variation in individual sample genomes is aggregated into a variant region with respect to a reference genome. The genomic positions of the aggregated calls (indicated by green arrows) do not necessarily overlap completely. Study authors describe the aggregation process in the Assertion method attribute. Discovery and validation methods for each call are stored in the Experiment attribute. This facilitates cross-study analysis of genomic structural variation (GSV) identified using different techniques. Studies point to any external resources that provide access to the raw data used in the experiment or to the publication describing the data.
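The accession scheme described in this caption can be illustrated with a small parser. This is a minimal sketch, not the archives' own code: the `std` object code is taken from study identifiers such as estd59 and nstd54, while the `sv` (variant region) and `ssv` (variant call) codes are an assumption based on the SV/SSV labels used in the archive displays. The names `parse_accession` and `Accession` are hypothetical.

```python
import re
from typing import NamedTuple

# Archive prefix, as stated in the caption: 'n' = dbVar, 'e' = DGVa.
ARCHIVES = {"n": "dbVar", "e": "DGVa"}

# Object codes: 'std' is seen in study identifiers such as estd59;
# 'sv' and 'ssv' are assumed from the SV / SSV labels for regions and calls.
OBJECT_TYPES = {"std": "study", "sv": "variant region", "ssv": "variant call"}

_ACCESSION = re.compile(r"^([ne])(std|ssv|sv)(\d+)$")


class Accession(NamedTuple):
    archive: str
    object_type: str
    number: int


def parse_accession(text: str) -> Accession:
    """Split an accession into archive, object type and numeric part."""
    match = _ACCESSION.match(text.strip().lower())
    if match is None:
        raise ValueError(f"not a recognised accession: {text!r}")
    prefix, code, number = match.groups()
    return Accession(ARCHIVES[prefix], OBJECT_TYPES[code], int(number))
```

For example, `parse_accession("nstd54")` identifies a dbVar study, and `parse_accession("estd59")` a DGVa study.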
Variant call types and variant region types
| Variant call type | Associated variant region type |
|---|---|
| Copy number gain | CNV |
| Copy number loss | CNV |
| Deletion | CNV |
| Duplication | CNV |
| Insertion | Insertion |
| Mobile element insertion | Mobile element insertion |
| Novel sequence insertion | Novel sequence insertion |
| Tandem duplication | Tandem duplication |
| Translocation | Translocation |
| Interchromosomal breakpoint | Interchromosomal breakpoint |
| Intrachromosomal breakpoint | Intrachromosomal breakpoint |
| Complex | Complex |
| Unknown | Unknown |
The complex region type can be used for any region where calls of different types (other than CNV) have been called and aggregated into a region by the user. CNV = Copy Number Variation.
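The table can be expressed as a lookup, with a helper that suggests a region type for a set of aggregated calls. This is only a sketch under one reading of the note above: calls that all map to the same region type keep that type, and any mixture falls back to Complex. In the archives the region type is assigned by the submitter, so the function name `region_type_for` and its fallback rule are illustrative assumptions rather than the archives' validation logic.

```python
# Variant call type -> associated variant region type (from the table).
CALL_TO_REGION = {
    "Copy number gain": "CNV",
    "Copy number loss": "CNV",
    "Deletion": "CNV",
    "Duplication": "CNV",
    "Insertion": "Insertion",
    "Mobile element insertion": "Mobile element insertion",
    "Novel sequence insertion": "Novel sequence insertion",
    "Tandem duplication": "Tandem duplication",
    "Translocation": "Translocation",
    "Interchromosomal breakpoint": "Interchromosomal breakpoint",
    "Intrachromosomal breakpoint": "Intrachromosomal breakpoint",
    "Complex": "Complex",
    "Unknown": "Unknown",
}


def region_type_for(call_types):
    """Suggest a region type for the calls aggregated into one region."""
    region_types = {CALL_TO_REGION[call] for call in call_types}
    if not region_types:
        raise ValueError("a region needs at least one supporting call")
    if len(region_types) == 1:
        # All calls agree, e.g. gains and losses both map to CNV.
        return next(iter(region_types))
    # Assumed fallback: mixed call types are aggregated as Complex.
    return "Complex"
```

Under this reading, a region built from a copy number gain and a deletion remains a CNV, while a region mixing a deletion and an insertion would be labelled Complex.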
Figure 3. Rendering of breakpoint ambiguity is shown in (A). Variants with breakpoint resolution are shown in fully saturated color. Breakpoints defined by a range (using inner/outer starts and stops) are shown as fully saturated for the high-confidence interval (the region defined by the inner start and stop), while the region of breakpoint ambiguity is shown as transparent. In many cases, an undefined breakpoint is submitted but no likelihood range is provided; in these cases, triangles are drawn pointing towards each other (when only outer coordinates are provided) or pointing outwards (when inner coordinates are provided). Call and region type (B) is usually designated by color. SV corresponds to variant regions and SSV corresponds to variant calls.
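The rendering rules in this caption can be sketched as a function that turns a variant's coordinates into drawable segments. The `Placement` class, its field names and the style labels are hypothetical and chosen only for illustration; the browsers' actual rendering code is not described in the source.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start, stop, drawing style)


@dataclass
class Placement:
    # Exact breakpoints, when the variant has breakpoint resolution.
    start: Optional[int] = None
    stop: Optional[int] = None
    # Range-defined breakpoints.
    outer_start: Optional[int] = None
    inner_start: Optional[int] = None
    inner_stop: Optional[int] = None
    outer_stop: Optional[int] = None


def render_segments(p: Placement) -> List[Segment]:
    """Map coordinates to segments following the caption's rules."""
    has_inner = p.inner_start is not None and p.inner_stop is not None
    has_outer = p.outer_start is not None and p.outer_stop is not None
    if p.start is not None and p.stop is not None:
        return [(p.start, p.stop, "solid")]
    if has_inner and has_outer:
        # Solid high-confidence core flanked by transparent ambiguity.
        return [
            (p.outer_start, p.inner_start, "transparent"),
            (p.inner_start, p.inner_stop, "solid"),
            (p.inner_stop, p.outer_stop, "transparent"),
        ]
    if has_outer:
        # Only outer coordinates: triangles point towards each other.
        return [(p.outer_start, p.outer_stop, "triangles-inward")]
    if has_inner:
        # Only inner coordinates: triangles point outwards.
        return [(p.inner_start, p.inner_stop, "triangles-outward")]
    raise ValueError("placement has no usable coordinates")
```

A placement with all four range coordinates yields three segments, so the solid core and its transparent flanks can be drawn independently.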