Laura Clarke, Susan Fairley, Xiangqun Zheng-Bradley, Ian Streeter, Emily Perry, Ernesto Lowy, Anne-Marie Tassé, Paul Flicek.
Abstract
The International Genome Sample Resource (IGSR; http://www.internationalgenome.org) expands the resources of the 1000 Genomes Project in both data type and population diversity. IGSR represents the largest open collection of human variation data and provides easy access to these resources. IGSR was established in 2015 to maintain and extend the 1000 Genomes Project data, which have been widely used as a reference set of human variation and by researchers developing analysis methods. IGSR has mapped all of the 1000 Genomes sequence data to the newest human reference (GRCh38), and will release updated variant calls to ensure maximal usefulness of the existing data. IGSR is collecting new structural variation data on the 1000 Genomes samples from long read sequencing and other technologies, and will collect relevant functional data into a single comprehensive resource. IGSR is extending coverage with new populations sequenced by collaborating groups. Here, we present the new data and analysis that IGSR has made available. We have also introduced a new data portal that increases discoverability of our data (previously only browseable through our FTP site) by focusing on particular samples, populations or data sets of interest.
Year: 2016 PMID: 27638885 PMCID: PMC5210610 DOI: 10.1093/nar/gkw829
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
IGSR Data Element definitions
| Data Element | Descriptive Example |
|---|---|
| Data Collection | Study or project level grouping of data such as the 1000 Genomes Project Phase 3 or the Human Genome Structural Variation (HGSV) Consortium. |
| Analysis group | The experimental strategy used to generate data. Data from the same analysis group and data collection can generally be analyzed as a coherent unit. Examples include exome sequencing or high coverage whole genome Illumina sequencing. Data from different data collections can have the same analysis group. |
| Data Type | The specific kind of data; currently one of Sequence, Alignment, Variants or Other. |
| Population | A defined group of samples normally collected from the same geographic location and ethnic group. |
| Sample | An individual who donated material to a project. |
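The definitions above can be read as a small data model. The following is a minimal sketch of that reading, not IGSR's actual schema: the `DataFile` record, the `coherent_units` helper, the sample and population names, and the file paths are all hypothetical placeholders introduced for illustration. It groups files by data collection and analysis group, the combination the table says can generally be analyzed as a coherent unit.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    """The four data types named in the table above."""
    SEQUENCE = "Sequence"
    ALIGNMENT = "Alignment"
    VARIANTS = "Variants"
    OTHER = "Other"


@dataclass(frozen=True)
class Population:
    name: str


@dataclass(frozen=True)
class Sample:
    name: str
    population: Population


@dataclass(frozen=True)
class DataFile:
    """Hypothetical record tying a file to the data elements (an assumption)."""
    path: str
    sample: Sample
    data_collection: str
    analysis_group: str
    data_type: DataType


def coherent_units(files):
    """Group files by (data collection, analysis group)."""
    units = defaultdict(list)
    for f in files:
        units[(f.data_collection, f.analysis_group)].append(f)
    return dict(units)


# Placeholder population, samples and paths; none of these are real IGSR records.
pop = Population("Example population")
s1 = Sample("SAMPLE_A", pop)
s2 = Sample("SAMPLE_B", pop)
files = [
    DataFile("a.exome.cram", s1, "The 1000 Genomes Project Phase 3", "Exome", DataType.ALIGNMENT),
    DataFile("b.exome.cram", s2, "The 1000 Genomes Project Phase 3", "Exome", DataType.ALIGNMENT),
    DataFile("a.smrt.fastq", s1, "HGSV", "SV SMRT", DataType.SEQUENCE),
]

units = coherent_units(files)
for key, members in units.items():
    print(key, [m.path for m in members])
```

Keeping the analysis group separate from the data collection reflects the table's note that different data collections can share the same analysis group.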
IGSR Data Collections. The number of samples, number of populations and available data types in each data collection
| Data Collection | Samples | Populations | Sequence | Alignment | Variants |
|---|---|---|---|---|---|
| The 1000 Genomes Project Phase 1 | 1092 | 14 | Y | Y | Y |
| The 1000 Genomes Project Phase 3 | 2504 | 26 | Y | Y | Y |
| The 1000 Genomes Project GRCh38 | 2706 | 26 | Y | Y | |
| Illumina Platinum Pedigree | 17 | 1 | Y | Y | |
| HGSV | 9 | 3 | Y | Y | |
| The Gambian Genome Variation Project | 394 | 4 | Y | | |
| Simons Diversity Project | 279 | 130 | Y | | |
| Human Genome Diversity Project | 177 | 52 | Y | | |
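As an illustration of how the table above might be queried, here is a small sketch that encodes its rows as records and selects the collections offering a given data type. The `Collection` type and `with_data_type` helper are hypothetical names, and the sketch assumes that the single "Y" in each of the last three rows belongs to the Sequence column, as the row layout suggests.

```python
from typing import NamedTuple


class Collection(NamedTuple):
    name: str
    samples: int
    populations: int
    data_types: frozenset


SEQ, ALN, VAR = "Sequence", "Alignment", "Variants"

# Rows transcribed from the table above.
COLLECTIONS = [
    Collection("The 1000 Genomes Project Phase 1", 1092, 14, frozenset({SEQ, ALN, VAR})),
    Collection("The 1000 Genomes Project Phase 3", 2504, 26, frozenset({SEQ, ALN, VAR})),
    Collection("The 1000 Genomes Project GRCh38", 2706, 26, frozenset({SEQ, ALN})),
    Collection("Illumina Platinum Pedigree", 17, 1, frozenset({SEQ, ALN})),
    Collection("HGSV", 9, 3, frozenset({SEQ, ALN})),
    # Assumption: the lone "Y" in these rows is in the Sequence column.
    Collection("The Gambian Genome Variation Project", 394, 4, frozenset({SEQ})),
    Collection("Simons Diversity Project", 279, 130, frozenset({SEQ})),
    Collection("Human Genome Diversity Project", 177, 52, frozenset({SEQ})),
]


def with_data_type(collections, data_type):
    """Return the names of collections that provide the given data type."""
    return [c.name for c in collections if data_type in c.data_types]


variant_collections = with_data_type(COLLECTIONS, VAR)
alignment_collections = with_data_type(COLLECTIONS, ALN)
print(variant_collections)
print(len(alignment_collections))
```

The sample counts are deliberately not summed here, since the table does not say whether samples are shared between collections.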
Analysis groups. The description and project for each of the IGSR analysis groups
| Analysis group | Description |
|---|---|
| Exome | The 1000 Genomes Project exome sequencing |
| High Coverage WGS | The 1000 Genomes Project high coverage sequencing |
| Low Coverage WGS | The 1000 Genomes Project low coverage sequencing |
| Strand specific RNA | HGSV strand specific Illumina RNA-Seq |
| 3.5 kb jump | HGSV Illumina 3.5 kb jumping library sequence |
| HiC | HGSV HiC chromatin conformation sequencing |
| PCR free high | HGSV PCR free high coverage sequencing |
| Strand Seq | HGSV Illumina Strand Seq sequencing |
| SV 7 kb mate | HGSV Illumina 7 kb insert mate pair library sequencing |
| SV SMRT | HGSV Single Molecule Real Time Sequencing |
| Illumina platinum ped | Illumina Platinum Pedigree |
| CG | The 1000 Genomes Project Complete Genomics sequencing |
| Integrated calls | The 1000 Genomes Project integrated variant calls |
| HD genotype chip | The 1000 Genomes Project high density genotype chip results |
Figure 1. Using the IGSR Data Portal. The IGSR Data Portal is a powerful and flexible way to explore the IGSR data. (A) The main page of the portal is a tabular listing of all of the available samples, with options for filtering by population, analysis group and data collection. Both sample reference numbers and population names are linked to further specific information, and there is a link to the search interface at the top of the page. (B) The search page enables direct access to samples and populations. (C) An individual sample page gives descriptive data about a specific sample, including related individuals and, if available, how to find more information about it in other databases or purchase cell lines for further experiments. Sample pages also allow for filtering by data collection, data type and analysis group, with the resulting data files listed on the bottom right of the page for immediate download. (D) Pages for each population list all of the samples within the population, as well as options to filter by data collection, data type and analysis group. As for the sample page, all available data files are linked from the list on the bottom right of the page.