Carrie C Buchanan, Eric S Torstenson, William S Bush, Marylyn D Ritchie.
Abstract
BACKGROUND: Since publication of the human genome in 2003, geneticists have been interested in risk variant associations to resolve the etiology of traits and complex diseases. The International HapMap Consortium undertook an effort to catalog all common variation across the genome (variants with a minor allele frequency (MAF) of at least 5% in one or more ethnic groups). HapMap, along with advances in genotyping technology, led to genome-wide association studies, which have identified common variants associated with many traits and diseases. In 2008 the 1000 Genomes Project aimed to sequence 2500 individuals and identify rare variants and 99% of variants with a MAF of <1%.
Year: 2012 PMID: 22319179 PMCID: PMC3277631 DOI: 10.1136/amiajnl-2011-000652
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
HapMap details
| Phase | No. of SNPs genotyped | Targeted SNPs | Populations studied |
| Phase I | 1 million | Prioritized coding SNPs to attain 1 SNP for each 5 kb region | CEU, YRI, CHB, JPT |
| Phase II | 3 million | Prioritized non-synonymous SNPs in coding regions | CEU, YRI, CHB, JPT |
| Phase III | 1.4 million | Prioritized rare variants | CEU, YRI, CHB, JPT, ASW, CHD, GIH, LWK, MXL, MKK, TSI |
SNP, single nucleotide polymorphism.
Details for three pilot projects initiated by the 1000 Genomes Project
| Pilot data sets | Populations | Samples | Coverage |
| Trio | 2 | 6 | 20–40× |
| Low coverage | 4 | 179 | 2–4× |
| Exon (8140 exons, ∼5% of exome) | 7 | 697 | 20–50× |
Details for 1000 Genomes Project full project data (sequence index 2010.08.04)
| Continental groups | Ethnicity breakdown | Total |
| AFR | 78 YRI+67 LWK+24 ASW+5 PUR | 174 |
| EUR | 90 CEU+92 TSI+43 GBR+36 FIN+17 MXL+5 PUR | 283 |
| ASN | 68 CHB+25 CHS+84 JPT+17 MXL | 194 |
| Total number of unique individuals | | 629 |
AFR, African; ASN, Asian; EUR, European.
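The three group totals sum to 651, not 629. A minimal sketch of one way to reconcile the figures, assuming (this is inferred from the table, not stated in it) that the 17 MXL and 5 PUR samples are each listed under two continental groups and are the same individuals in both. The variable names are illustrative only.

```python
# Per-group sample counts copied from the table above.
group_counts = {
    "AFR": {"YRI": 78, "LWK": 67, "ASW": 24, "PUR": 5},
    "EUR": {"CEU": 90, "TSI": 92, "GBR": 43, "FIN": 36, "MXL": 17, "PUR": 5},
    "ASN": {"CHB": 68, "CHS": 25, "JPT": 84, "MXL": 17},
}

# Row totals as reported in the "Total" column.
group_totals = {group: sum(counts.values()) for group, counts in group_counts.items()}

# Assumption: a population code that appears under two groups refers to the
# same samples, so each population is counted once for the unique total.
unique_counts = {}
for counts in group_counts.values():
    for population, n in counts.items():
        unique_counts[population] = n

summed_total = sum(group_totals.values())
unique_total = sum(unique_counts.values())

print(group_totals)
print(summed_total, unique_total)
```

Under that assumption the difference between the summed and unique totals is exactly the 22 doubly listed MXL and PUR samples.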
Figure 1. Variants in HapMap and 1000 Genomes Project data. The left box shows an enhanced screenshot from the NCBI browser. rs2072413 shows that variants in HapMap are not always found in 1000 Genomes Project data. For reference, the validation status descriptions are shown in the box on the right. SNP, single nucleotide polymorphism.
Figure 2. Reference allele frequency in the HapMap exclusive data, by chromosome in CEU (A) and YRI (B).
Figure 3. Distribution of reference allele frequencies (most often the major allele frequency) of HapMap (HM) exclusive variants after filtering out any fixed alleles in CEU (A) and YRI (B) populations.
Comparison of HapMap and 1000 Genomes Project pilot data
| Population | Raw data | Filtered out fixed alleles (fixed: AF=0 or 1) | Filtered out uncommon alleles (uncommon: AF<0.05 or >0.95) |
| CEU | 0.6858 | 0.9314 | 0.9894 |
| YRI | 0.7543 | 0.9234 | 0.9841 |
The table shows the proportion of HapMap variants that were also represented in 1000 Genomes (and were therefore common to both). The first column indicates the two populations used in this study; the second column shows the proportion of HapMap variants common to both databases using the raw data; the third and fourth columns show the same proportion after the HapMap list of variants was filtered by allele frequency (AF) before comparison.
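The comparison described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the function names, the `mode` labels, and the toy rs identifiers and frequencies are all hypothetical, and the handling of the boundary values 0.05 and 0.95 follows the table header (uncommon means AF<0.05 or >0.95).

```python
def keep_variant(af, mode):
    """Return True if a HapMap variant survives the given allele frequency filter."""
    if mode == "raw":
        return True                    # no filtering
    if mode == "no_fixed":
        return 0.0 < af < 1.0          # drop fixed alleles (AF = 0 or 1)
    if mode == "common":
        return 0.05 <= af <= 0.95      # drop uncommon alleles (AF < 0.05 or > 0.95)
    raise ValueError(f"unknown filter mode: {mode}")


def overlap_fraction(hapmap_af, thousand_genomes_ids, mode="raw"):
    """Fraction of (filtered) HapMap variants that also appear in 1000 Genomes."""
    kept = [rsid for rsid, af in hapmap_af.items() if keep_variant(af, mode)]
    if not kept:
        return float("nan")
    shared = sum(1 for rsid in kept if rsid in thousand_genomes_ids)
    return shared / len(kept)


# Toy data (hypothetical identifiers and frequencies, for illustration only).
hapmap_af = {"rs1": 0.0, "rs2": 0.5, "rs3": 0.97, "rs4": 0.30, "rs5": 1.0}
thousand_genomes_ids = {"rs2", "rs4", "rs5"}

for mode in ("raw", "no_fixed", "common"):
    print(mode, overlap_fraction(hapmap_af, thousand_genomes_ids, mode))
```

The denominator is always the filtered HapMap list, so a stricter filter can raise the proportion without any change to the 1000 Genomes data.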
Figure 4. Total number of HapMap variants before and after filtering using CEU samples on chromosome 1. The y-axis shows the total number of variants (in hundreds of thousands). The tan bars indicate the number of HapMap variants left after an allele frequency filter is applied (if applied). The green bars indicate how many of those variants are present in 1000 Genomes Project pilot data. The numbers above each bar indicate the bar height, that is, the number of variants. For reference, the light gray line demonstrates the total number of variants on chromosome 1 in 1000 Genomes Project pilot data (approximately 605 000).
Comparison of HapMap and 1000 Genomes Project full project data
| Population | Raw data | Filtered out fixed alleles (fixed: AF=0 or 1) | Filtered out uncommon alleles (uncommon: AF<0.05 or >0.95) |
| CEU | 0.8784 | 0.9884 | 0.9930 |
| YRI | 0.8778 | 0.9859 | 0.9932 |
The table shows the proportion of HapMap variants that were also represented in 1000 Genomes (and were therefore common to both). The first column indicates the two populations used in this study; the second column shows the proportion of HapMap variants common to both databases using the downloaded data; the third and fourth columns show the same proportion after the HapMap list of variants was filtered by allele frequency (AF) before comparison.