Sandra Smieszek, Sabrina L Mitchell, Eric H Farber-Eger, Olivia J Veatch, Nicholas R Wheeler, Robert J Goodloe, Quinn S Wells, Deborah G Murdock, Dana C Crawford.
Abstract
Effective approaches for assessing mitochondrial DNA (mtDNA) variation are important to multiple scientific disciplines. Mitochondrial haplogroups characterize branch points in the phylogeny of mtDNA. Several tools exist for mitochondrial haplogroup classification. However, most require full or partial mtDNA sequence, which is often cost-prohibitive for studies with large sample sizes. The purpose of this study was to develop Hi-MC, a high-throughput method for mitochondrial haplogroup classification that is cost-effective and applicable to large sample sizes, making mitochondrial analysis more accessible in genetic studies. Using rigorous selection criteria, we defined and validated a custom panel of mtDNA single nucleotide polymorphisms that allows for accurate classification of European, African, and Native American mitochondrial haplogroups at broad resolution with minimal genotyping and cost. We demonstrate that Hi-MC performs well in samples of European, African, and Native American ancestries, and that Hi-MC performs comparably to a commonly used classifier. Implementation as a software package in R enables users to download and run the program locally, grants greater flexibility in the number of samples that can be run, and allows for easy expansion in future revisions. Hi-MC is available in the CRAN repository and the source code is freely available at https://github.com/vserch/himc.
Keywords: Classifier; Genotype; Haplogroup; Mitochondria; mtDNA variation
Year: 2018 PMID: 29967758 PMCID: PMC6022720 DOI: 10.7717/peerj.5149
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Hi-MC algorithm structure.
Input for the algorithm is a list of sample IDs and corresponding SNP genotype data in pedigree (PED/MAP) format. These genotypes are recursively analyzed through a node-based tree structure. Each successive genotype classification is passed on to the Accumulator. The accumulated classifications are then ranked by specificity (a longer path through the tree means more SNPs were checked, so the call is more specific), and the most specific haplogroup is the final output. MRCA, most recent common ancestor.
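The tree walk described in this caption can be illustrated with a minimal Python sketch. It is not the Hi-MC implementation (which is an R package): the `Node` class, the function names, the SNP identifiers (`snp1`, `snp2`, `snp3`) and the haplogroup labels (`X`, `X1`, `Y`) are all hypothetical placeholders, and treating a sample that matches only the root as "unclassified" is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str        # haplogroup label (hypothetical)
    required: dict   # SNP id -> allele that must be observed at this node
    children: list = field(default_factory=list)


def classify(node, genotypes, depth=0, accumulator=None):
    """Recursively walk the tree, recording every node whose SNPs all match."""
    if accumulator is None:
        accumulator = []
    # A missing SNP (genotypes.get returns None) never matches a required allele.
    if all(genotypes.get(snp) == allele for snp, allele in node.required.items()):
        accumulator.append((depth, node.name))
        for child in node.children:
            classify(child, genotypes, depth + 1, accumulator)
    return accumulator


def most_specific(accumulator):
    """Rank accumulated calls by path length and return the deepest one."""
    depth, name = max(accumulator, key=lambda hit: hit[0], default=(0, None))
    # Assumption: a sample that matches nothing below the root is unclassified.
    return name if depth > 0 else "unclassified"


# Toy tree rooted at the most recent common ancestor (MRCA).
TREE = Node("MRCA", {}, [
    Node("X", {"snp1": "A"}, [Node("X1", {"snp2": "G"})]),
    Node("Y", {"snp3": "T"}),
])

sample = {"snp1": "A", "snp2": "G", "snp3": "C"}
print(most_specific(classify(TREE, sample)))
```

In this toy tree, a sample that carries the allele for `X` but lacks a genotype for `snp2` is called `X` rather than `X1`, which mirrors the idea that fewer informative SNPs give a less specific call.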
Percent concordance in CEU, YRI, and CHB/JPT populations for pair-wise comparisons of mitochondrial haplogroup classifications.
| (A) | CEU | YRI | CHB/JPT |
|---|---|---|---|
| Hi-MC-assigned vs HapMap-reported | 94.4 | 90.8 | 45.9 |
| Hi-MC-assigned vs Haplogrep2-assigned | 93.3 | 59.2 | 84.9 |
| Haplogrep2-assigned vs HapMap-reported | 97.8 | 60.5 | 40.0 |
Notes:
We assigned haplogroups using Hi-MC and Haplogrep2 based on (A) genotype data in CEU, YRI, and CHB/JPT HapMap samples generated by a custom SNP panel targeting 63 mitochondrial SNPs. We compared these haplogroup assignments with HapMap-reported haplogroups available at ftp://ftp.ncbi.nlm.nih.gov/hapmap/mtDNA_and_chrY_haplogroups/. (B) We also performed the same comparisons using Haplogrep2 on mitochondrial data available on overlapping HapMap samples from the 1000 Genomes Project. HapMap MXL samples are not included here as no HapMap-reported haplogroups are available for comparison.
Fourteen YRI samples were “unclassified” by Hi-MC due to missing SNPs and were removed from these comparisons.
Four CHB/JPT samples were “unclassified” by Hi-MC due to missing SNPs and one CHB/JPT sample was labeled “unknown” as the HapMap-reported haplogroup. All five were removed from these comparisons.
Four YRI samples were “unclassified” by Hi-MC due to missing SNPs and were removed from these comparisons.
Four CHB/JPT samples were “unclassified” by Hi-MC due to missing SNPs and one CHB/JPT sample was labeled “unknown” as the HapMap-reported haplogroup. All five were removed from these comparisons.
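As an illustration of how a pair-wise percent concordance of this kind can be computed, the sketch below compares two sets of haplogroup calls after dropping samples labeled "unclassified" or "unknown", as the notes describe. The function name, sample IDs, and calls are hypothetical, and the exact-label comparison is an assumption; the source does not state the resolution at which labels were matched.

```python
def percent_concordance(calls_a, calls_b, excluded=("unclassified", "unknown")):
    """Percent of comparable samples that receive the same haplogroup label.

    calls_a, calls_b: dicts mapping sample ID -> haplogroup label.
    Samples missing from either dict, or carrying an excluded label in
    either dict, are removed before the comparison.
    """
    shared = [
        s for s in calls_a
        if s in calls_b
        and calls_a[s] not in excluded
        and calls_b[s] not in excluded
    ]
    if not shared:
        raise ValueError("no comparable samples")
    matches = sum(calls_a[s] == calls_b[s] for s in shared)
    return 100.0 * matches / len(shared)


# Hypothetical toy calls, not data from the study.
method_a = {"s1": "H", "s2": "L2", "s3": "unclassified", "s4": "U", "s5": "J"}
method_b = {"s1": "H", "s2": "L3", "s3": "H", "s4": "U", "s5": "J"}

print(percent_concordance(method_a, method_b))  # 3 of 4 comparable samples agree
```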
Figure 2. Distribution of mitochondrial haplogroups in the HapMap Phase III samples of Mexican ancestry in Los Angeles, CA.
Hi-MC-assigned haplogroups based on the custom SNP panel for MXL samples and their frequencies are plotted on the X- and Y-axes, respectively. We genotyped the custom SNP panel and applied Hi-MC to all MXL samples from Phase III of the HapMap Project (n = 90). Given that the mitochondrial haplogroup of the offspring is the same as that of the mother, offspring were excluded when determining the frequency distribution of haplogroups (n = 30). We further excluded three samples with unclassified haplogroups and two samples that Hi-MC likely misclassified because key SNPs were missing. A total of 55 unrelated MXL samples are represented here.
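The sample accounting in this caption can be checked with a few lines of Python. The variable names are illustrative; the counts are the ones stated in the caption.

```python
total_mxl = 90          # all MXL samples genotyped
offspring = 30          # excluded: share the mother's haplogroup
unclassified = 3        # excluded: no haplogroup assigned
misclassified = 2       # excluded: likely misclassified, key SNPs missing

unrelated_classified = total_mxl - offspring - unclassified - misclassified
print(unrelated_classified)  # 55, matching the caption
```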