Doron M Behar, Saharon Rosset, Jason Blue-Smith, Oleg Balanovsky, Shay Tzur, David Comas, R John Mitchell, Lluis Quintana-Murci, Chris Tyler-Smith, R Spencer Wells.
Abstract
The Genographic Project is studying the genetic signatures of ancient human migrations and creating an open-source research database. It allows members of the public to participate in a real-time anthropological genetics study by submitting personal samples for analysis and donating the genetic results to the database. We report our experience from the first 18 months of public participation in the Genographic Project, during which we have created the largest standardized human mitochondrial DNA (mtDNA) database ever collected, comprising 78,590 genotypes. Here, we detail our genotyping and quality assurance protocols including direct sequencing of the mtDNA HVS-I, genotyping of 22 coding-region SNPs, and a series of computational quality checks based on phylogenetic principles. This database is very informative with respect to mtDNA phylogeny and mutational dynamics, and its size allows us to develop a nearest neighbor-based methodology for mtDNA haplogroup prediction based on HVS-I motifs that is superior to classic rule-based approaches. We make available to the scientific community and general public two new resources: a periodically updated database comprising all data donated by participants, and the nearest neighbor haplogroup prediction tool.
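The idea of nearest neighbor haplogroup prediction from HVS-I motifs can be illustrated with a minimal sketch. It is not the Genographic Project's implementation, whose details are not given in this abstract. It assumes that each motif is a set of variant positions, that the distance between two motifs is the number of positions present in only one of them, and that ties among equally close reference motifs are settled by majority vote. The function names, the labels `Hg1` and `Hg2`, and the positions are hypothetical toy values, not data from the database.

```python
from collections import Counter


def motif_distance(a, b):
    """Count the variant positions present in exactly one of the two motifs."""
    return len(a ^ b)


def predict_haplogroup(query, reference):
    """Return (label, distance) for the majority label among the closest motifs.

    `reference` is a list of (motif, haplogroup_label) pairs. This is a
    hypothetical layout chosen for illustration only.
    """
    best = min(motif_distance(query, motif) for motif, _ in reference)
    votes = Counter(
        hg for motif, hg in reference if motif_distance(query, motif) == best
    )
    return votes.most_common(1)[0][0], best


# Toy reference panel: positions and labels are invented for illustration.
reference = [
    (frozenset({16100, 16200}), "Hg1"),
    (frozenset({16100, 16200, 16300}), "Hg1"),
    (frozenset({16150, 16250}), "Hg2"),
    (frozenset({16150}), "Hg2"),
]

query = frozenset({16100, 16200, 16350})
label, distance = predict_haplogroup(query, reference)
print(label, distance)
```

Unlike a rule-based classifier, which needs an explicit diagnostic motif for every haplogroup, this kind of lookup relies only on the labeled reference set, so its coverage grows as the reference set grows.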
Year: 2007 PMID: 17604454 PMCID: PMC1904368 DOI: 10.1371/journal.pgen.0030104
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Genotyping Parameters of the Reference Database
Hg Frequencies
Figure 1. HVS-I Identity by Descent or by State
A theoretically evolving tree is presented. Coding-region polymorphisms are in black. HVS-I polymorphisms are in red. Samples A and B share HVS-I haplotype 16303 by descent. Samples A and D, or B and D, share HVS-I haplotype 16303 by state, as a result of homoplasy. Samples C and E are identical by state as a result of a back mutation in position 16303 in sample C, as marked by the “BM” designation.
Figure 2. Saturation Curves
The number of accumulated mtDNA HVS-I haplotypes (A and B) and polymorphic sites (C and D) as a function of the number of accumulating samples is shown. The analysis is presented once for the entire database (A and C) and once for a limited number of samples (B and D), allowing a better comparison with the less well-represented geographic groups. The Hgs were broadly divided into four geographic groups as follows. Africa: L, M1, and U6; East Asia-Americas: A, B, C, D, F, N9a, and R9; South Asia: M*, R1, R2, R5, and R6; and West Eurasia: N1, R, W, and X. Saturation curves for Hg H are also presented.
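A saturation curve of this kind can be computed by walking through the samples and counting how many distinct haplotypes and polymorphic sites have been seen so far. The sketch below is an illustration only. It assumes each sample is a set of variant positions and that samples are accumulated in the order given, since the caption does not state the ordering used. The function name and toy data are hypothetical.

```python
def saturation_curves(samples):
    """Return cumulative counts of distinct haplotypes and polymorphic sites.

    `samples` is an ordered sequence of motifs, each a set of variant
    positions. The order of accumulation is an assumption of this sketch.
    """
    seen_haplotypes = set()
    seen_sites = set()
    haplotype_curve = []
    site_curve = []
    for motif in samples:
        seen_haplotypes.add(frozenset(motif))  # identical motifs count once
        seen_sites.update(motif)               # each position counts once
        haplotype_curve.append(len(seen_haplotypes))
        site_curve.append(len(seen_sites))
    return haplotype_curve, site_curve


# Toy input: positions are invented for illustration, not taken from the database.
samples = [{16100}, {16100}, {16100, 16200}, set(), {16200}, {16100, 16200}]
haplotype_curve, site_curve = saturation_curves(samples)
print(haplotype_curve)
print(site_curve)
```

In this toy input the site curve flattens after the third sample while the haplotype curve keeps rising, because new combinations of already-seen positions still count as new haplotypes.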
Figure 3. Physical Map of HVS-I
The figure presents a simple map of all polymorphic sites observed in the sequenced region 16024–16569, without denoting their frequencies. Conclusions regarding the number of times each observed position was hit during Homo sapiens' evolution cannot be inferred.
Figure 4. The Phylogeny of mtDNA Haplogroups Inferred from the Panel of 22 Coding-Region SNPs Used in the Genographic Project
The coding-region mutations are shown on the branches. The frequencies of the haplogroups found among the Genographic participants are shown in brackets beside the Hg assignments and correspond to Table 2. Note that the figure discriminates between haplogroups L0 and L1, whereas the coding-region SNPs used during genotyping do not distinguish the two; they are therefore labeled L0/L1 throughout the paper.