Elise Ruark1, Márton Münz2, Anthony Renwick1, Matthew Clarke1, Emma Ramsay1, Sandra Hanks1, Shazia Mahamdallie1, Anna Elliott1, Sheila Seal1, Ann Strydom1, Gerton Lunter2, Nazneen Rahman3.
Abstract
To enhance knowledge of gene variation in outbred populations, and to provide a dataset with utility in research and clinical genomics, we performed exome sequencing of 1,000 UK individuals from the general population and applied a high-quality analysis pipeline that includes high sensitivity and specificity for indel detection. Each UK individual has, on average, 21,978 gene variants including 160 rare (0.1%) variants not present in any other individual in the series. These data provide a baseline expectation for gene variation in an outbred population. Summary data of all 295,391 variants we detected are included here and the individual exome sequences are available from the European Genome-phenome Archive as the ICR1000 UK exome series. Furthermore, samples and other phenotype and experimental data for these individuals are obtainable through application to the 1958 Birth Cohort committee.
Keywords: NGS; exome; exome sequencing; gene variation; next-generation sequencing; population genetics; variant
Year: 2015 PMID: 26834991 PMCID: PMC4706061 DOI: 10.12688/f1000research.7049.1
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. Population structure of the ICR1000 UK exome series.
Plot of first and second principal components from PCA using HapMap populations and the ICR1000 UK exome series showing that the series clusters with the Central European population, with no ethnic outliers.
Figure 2. Summary of gene variation in the ICR1000 UK exome series.
Protein-altering variation includes nonsynonymous variants, inframe indels and protein-truncating variants (i.e. frameshifting indels or variants that alter essential splice-site residues). The majority of genes include variants across the frequency spectrum.
Supplementary Figure 1. Coverage and coding length of genes.
Scatterplot of gene coverage in 90% of samples vs the coding length in base pairs.
Figure 3. Summary of variants in the ICR1000 UK exome series by type and frequency.
The number and percentage of variants in each category is shown in white text. For all variant types, rare variants predominate, and the distribution of variants of different frequencies is similar.
Figure 4. Summary of indel characteristics in the ICR1000 UK exome series.
Variant frequency varies with type and length of indel. Deletions are more common than insertions, particularly for rare variants. There is enrichment of indels of 3 bp, 6 bp and 9 bp in coding but not non-coding sequence, because these cause inframe variants.
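The reading-frame reasoning behind the 3 bp, 6 bp and 9 bp enrichment can be illustrated with a minimal sketch. The function name `indel_frame_effect` is hypothetical and is not part of CAVA or the authors' pipeline. It assumes a simple indel lying wholly within coding sequence and classifies it by net length change only.

```python
def indel_frame_effect(ref: str, alt: str) -> str:
    """Classify a coding indel as 'inframe' or 'frameshift' by net length change.

    Assumption (illustrative only): the indel lies entirely within coding
    sequence and does not affect a splice site. Real annotation tools take
    transcript context into account.
    """
    net = abs(len(alt) - len(ref))
    if net == 0:
        raise ValueError("not an indel: ref and alt alleles have equal length")
    # A net change that is a multiple of 3 preserves the codon reading frame.
    return "inframe" if net % 3 == 0 else "frameshift"


if __name__ == "__main__":
    for length in range(1, 10):
        # Model a deletion of `length` bases after a shared anchor base.
        ref = "A" + "C" * length
        alt = "A"
        print(f"{length} bp deletion: {indel_frame_effect(ref, alt)}")
```

Under these assumptions only the 3 bp, 6 bp and 9 bp deletions are reported as inframe. These are the lengths described as enriched in coding sequence.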
Average number of exome variants per UK individual.
| | Total | Common | Low Frequency | Rare |
|---|---|---|---|---|
| All Variants | 21,978 | 20,970 | 848 | 160 |
| Base substitution | 21,270 | 20,302 | 814 | 154 |
| Deletion | 418 | 392 | 22 | 4 |
| Insertion | 289 | 276 | 12 | 2 |
| Synonymous (SY) | 9,993 | 9,632 | 313 | 48 |
| Nonsynonymous (NSY) | 8,718 | 8,221 | 407 | 89 |
| Splice-site (SS, SS5) | 2,494 | 2,396 | 86 | 12 |
| Exon end (EE) | 396 | 375 | 17 | 4 |
| Inframe (IF) | 147 | 137 | 8 | 1 |
| Frameshifting indel (FS) | 99 | 90 | 6 | 3 |
| Stop-gain (SG) | 57 | 49 | 5 | 2 |
| Essential splice-site (ESS) | 55 | 50 | 3 | 1 |
| Initiating methionine (IM) | 10 | 10 | 1 | 0 |
| Stop-loss (SL) | 10 | 9 | 0 | 0 |
The ranges are given in Supplementary Table 3. Values are rounded to the nearest whole number. The functional impact class supplied by CAVA is given in parentheses. Full details of the CAVA classification system are given in Münz et al. [11].
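Because the values are rounded, the sub-categories need not sum exactly to the overall total. The sketch below is a simple consistency check on the "Total" column, with figures copied from the table. The `discrepancy` helper and the dictionary names are hypothetical. The check assumes that each variant is counted in exactly one variant type and exactly one functional class, which is not stated in the text.

```python
TOTAL = 21_978  # "All Variants", Total column

# Average per-individual counts copied from the table (Total column,
# except by_frequency, which is the "All Variants" row).
by_frequency = {"Common": 20_970, "Low Frequency": 848, "Rare": 160}
by_type = {"Base substitution": 21_270, "Deletion": 418, "Insertion": 289}
by_class = {
    "SY": 9_993, "NSY": 8_718, "SS, SS5": 2_494, "EE": 396, "IF": 147,
    "FS": 99, "SG": 57, "ESS": 55, "IM": 10, "SL": 10,
}


def discrepancy(parts: dict, total: int = TOTAL) -> int:
    """Return (sum of parts) minus total; small non-zero values reflect rounding."""
    return sum(parts.values()) - total


if __name__ == "__main__":
    for name, parts in [
        ("frequency", by_frequency),
        ("variant type", by_type),
        ("functional class", by_class),
    ]:
        print(f"by {name}: sum={sum(parts.values())}, diff={discrepancy(parts)}")
```

The frequency breakdown sums exactly to 21,978. The variant-type and functional-class breakdowns differ from it by one variant each, in opposite directions, which is consistent with rounding of averaged values.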