Cassandra R Taylor, Kevin M Kiesler, Kimberly Sturk-Andreaggi, Joseph D Ring, Walther Parson, Moses Schanfield, Peter M Vallone, Charla Marshall.
Abstract
A total of 1327 platinum-quality mitochondrial DNA haplotypes from United States (U.S.) populations were generated using a robust, semi-automated next-generation sequencing (NGS) workflow with rigorous quality control (QC). The laboratory workflow involved long-range PCR to minimize the co-amplification of nuclear mitochondrial DNA segments (NUMTs), PCR-free library preparation to reduce amplification bias, and high-coverage Illumina MiSeq sequencing to produce an average per-sample read depth of 1000× for low-frequency (5%) variant detection. Point heteroplasmies below 10% frequency were confirmed through replicate amplification, and length heteroplasmy was quantitatively assessed using a custom read count analysis tool. Data analysis involved a redundant, dual-analyst review to minimize errors in haplotype reporting, with additional QC checks performed by EMPOP. Applying these methods, eight sample sets were processed from five U.S. metapopulations (African American, Caucasian, Hispanic, Asian American, and Native American) corresponding to self-reported identity at the time of sample collection. Population analyses (e.g., haplotype frequencies, random match probabilities, and genetic distance estimates) were performed to evaluate the eight datasets, with over 95% of haplotypes unique per dataset. The platinum-quality mitogenome haplotypes presented in this study will enable forensic statistical calculations and thereby support the use of mitogenome sequencing in forensic laboratories.
Keywords: haplogroup; haplotype; mitogenome; mtDNA; next-generation sequencing; population statistics
Year: 2020 PMID: 33138247 PMCID: PMC7716222 DOI: 10.3390/genes11111290
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Sample set information for all 1363 samples.
| Sample Set | Source | U.S. Geographic Origin | Metapopulation | Sample Type | Count |
|---|---|---|---|---|---|
| COAF | Analytical Genetic Testing Center (Denver, CO) | Colorado * | African American | Whole blood, buccal swabs | 123 |
| COCN | Analytical Genetic Testing Center (Denver, CO) | Colorado * | Caucasian | Whole blood, buccal swabs | 118 |
| COHS | Analytical Genetic Testing Center (Denver, CO) | Colorado * | Hispanic | Whole blood, buccal swabs | 113 |
| NTAF | National Institute of Standards and Technology (Gaithersburg, MD) | Multiple States | African American | Whole blood | 258 |
| NTCN | National Institute of Standards and Technology (Gaithersburg, MD) | Multiple States | Caucasian | Whole blood | 262 |
| NTHS | National Institute of Standards and Technology (Gaithersburg, MD) | Multiple States | Hispanic | Whole blood | 139 |
| DSAS | Department of Defense Serum Repository (Silver Spring, MD) | Multiple States/Territories | Asian American | Serum | 175 |
| DSNA | Department of Defense Serum Repository (Silver Spring, MD) | Multiple States | Native American | Serum | 175 |
COAF = Colorado African American; COCN = Colorado Caucasian; COHS = Colorado Hispanic; NTAF = National Institute of Standards and Technology (NIST) African American; NTCN = NIST Caucasian; NTHS = NIST Hispanic; DSAS = Department of Defense Serum Repository (DoDSR) Asian American; DSNA = DoDSR Native American. * The majority of samples were collected from individuals living in Colorado, though some samples in the Analytical Genetic Testing Center sets may have originated from other U.S. states.
Summary of laboratory processing methods for each sample source: the Analytical Genetic Testing Center in Colorado (AGTC-CO), the National Institute of Standards and Technology (NIST), and the Department of Defense Serum Repository (DoDSR). Samples were processed either at the Armed Forces Medical Examiner System’s Armed Forces DNA Identification Laboratory (AFMES-AFDIL) or at the Applied Genetics laboratory at NIST.
| Sample Source | Processing Laboratory | Amplification Input (µL) | Amplicon Purification | Library Preparation: Input (ng) | Library Preparation: Reaction | Library Preparation: Method | Sequencing: Input (pM) | Sequencing: Reagent Kit | Sequencing: Read Type |
|---|---|---|---|---|---|---|---|---|---|
| AGTC-CO | AFMES-AFDIL | 3 | Yes | 150 | Half-reaction | Manual | 12 | 150 cycle v3 | Single end |
| NIST | NIST | 2 | No | 350 | Full-reaction | Manual | 20 | 600 cycle v3 | Paired end |
| DoDSR | AFMES-AFDIL | 5 | Yes | 50 | Half-reaction | Automated | 12 | 600 cycle v3 | Paired end |
Figure 1. Summary of methods and quality control (QC) process for the generation of platinum-quality mitochondrial genomes. AFMES-AFDIL = Armed Forces Medical Examiner System’s Armed Forces DNA Identification Laboratory; EMPOP = The European DNA Profiling Group (EDNAP) mtDNA Population Database.
Breakdown of sample success rate by dataset.
| Dataset | Samples Attempted | Finalized Samples | Passing: Two Amplicon | Passing: Four Amplicon | Excluded: Failed | Excluded: Mixed | Excluded: Duplicate | Excluded: Related |
|---|---|---|---|---|---|---|---|---|
| COAF | 123 | 112 | 112 | 0 | 6 | 3 | 2 | 0 |
| COCN | 118 | 112 | 112 | 0 | 5 | 1 | 0 | 0 |
| COHS | 113 | 109 | 109 | 0 | 1 | 3 | 0 | 0 |
| NTAF | 258 | 256 | 251 | 5 | 1 | 1 | 0 | 0 |
| NTCN | 262 | 260 | 258 | 2 | 1 | 0 | 0 | 1 |
| NTHS | 139 | 138 | 138 | 0 | 0 | 0 | 0 | 1 |
| DSAS | 175 | 169 | 165 | 4 | 3 | 3 | 0 | 0 |
| DSNA | 175 | 171 | 158 | 13 | 1 | 3 | 0 | 0 |
| Total | 1363 | 1327 | 1303 | 24 | 18 | 14 | 2 | 2 |
COAF = Colorado African American; COCN = Colorado Caucasian; COHS = Colorado Hispanic; NTAF = National Institute of Standards and Technology (NIST) African American; NTCN = NIST Caucasian; NTHS = NIST Hispanic; DSAS = Department of Defense Serum Repository (DoDSR) Asian American; DSNA = DoDSR Native American.
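The table reports counts rather than rates. As a minimal sketch, the per-dataset and overall success rates can be derived from the "Samples Attempted" and "Finalized Samples" columns. The names `counts` and `success_rate` are illustrative and do not come from the study.

```python
# Counts copied from the table above: (samples attempted, finalized samples).
counts = {
    "COAF": (123, 112),
    "COCN": (118, 112),
    "COHS": (113, 109),
    "NTAF": (258, 256),
    "NTCN": (262, 260),
    "NTHS": (139, 138),
    "DSAS": (175, 169),
    "DSNA": (175, 171),
}


def success_rate(attempted, finalized):
    """Return the percentage of attempted samples that were finalized."""
    return 100.0 * finalized / attempted


# Per-dataset success rates.
rates = {name: success_rate(a, f) for name, (a, f) in counts.items()}

# Overall rate, computed from the column totals rather than by averaging rates.
total_attempted = sum(a for a, _ in counts.values())
total_finalized = sum(f for _, f in counts.values())
overall = success_rate(total_attempted, total_finalized)

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
print(f"Overall: {overall:.1f}% ({total_finalized}/{total_attempted})")
```

The column sums reproduce the "Total" row of the table (1363 attempted, 1327 finalized).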
Analysis metrics for each data source. The metric values from each set were averaged.
| Data Source | Total Reads | Reads After Trim | Reads Mapped | Trimmed Reads Mapped (%) | Average Read Depth | Average Major Base Frequency (%) | Average Major Base Frequency Excluding Heteroplasmy (%) | Average Variant Position Read Depth |
|---|---|---|---|---|---|---|---|---|
| AGTC-CO | 352,136 | 313,022 | 293,667 | 94 | 1658.8 | 98.6 | 99.5 | 1499.6 |
| NIST | 383,834 | 272,090 | 256,309 | 95 | 1558.4 | 98.0 | 99.1 | 1466.0 |
| DoDSR | 688,530 | 566,775 | 504,600 | 90 | 2385.4 | 97.5 | 99.2 | 2198.7 |
AGTC-CO = Analytical Genetic Testing Center in Colorado; NIST = National Institute of Standards and Technology; DoDSR = Department of Defense Serum Repository.
Number of individuals with observed point heteroplasmies (PHPs) in each dataset.
| Dataset | Total Individuals | Total PHPs | Individuals with PHPs | Individuals with 1 PHP | Individuals with 2 PHPs | Individuals with 3 PHPs |
|---|---|---|---|---|---|---|
| COAF | 112 | 37 | 31 (28%) | 26 | 4 | 1 |
| COCN | 112 | 41 | 30 (27%) | 20 | 9 | 1 |
| COHS | 109 | 36 | 27 (25%) | 20 | 5 | 2 |
| NTAF | 256 | 77 | 60 (23%) | 43 | 17 | 0 |
| NTCN | 260 | 92 | 77 (30%) | 65 | 10 | 2 |
| NTHS | 138 | 53 | 43 (31%) | 34 | 8 | 1 |
| DSAS | 169 | 62 | 54 (32%) | 49 | 2 | 3 |
| DSNA | 171 | 48 | 43 (25%) | 38 | 5 | 0 |
| All | 1327 | 446 | 365 (28%) | 295 | 60 | 10 |
COAF = Colorado African American; COCN = Colorado Caucasian; COHS = Colorado Hispanic; NTAF = National Institute of Standards and Technology (NIST) African American; NTCN = NIST Caucasian; NTHS = NIST Hispanic; DSAS = Department of Defense Serum Repository (DoDSR) Asian American; DSNA = DoDSR Native American.
Figure 2. Distribution and count of the 446 observed point heteroplasmies across the mitochondrial genome, including the two hypervariable (HVI and HVII) regions of the control region and the coding region/sequence (CDS).
Figure 3. Variant frequencies (VF) of observed point heteroplasmy (PHP) in the control region (CR) (blue) and the coding region/sequence (CDS) (green). For PHPs with VFs higher than 50%, the frequency was subtracted from one (1 − VF) and is shaded in light blue or light green.
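The folding rule described in the Figure 3 caption (VFs above 50% are plotted as 1 − VF) can be sketched as follows. The function name `fold_variant_frequency` and the example values are hypothetical and are not taken from the study's data.

```python
def fold_variant_frequency(vf):
    """Fold a variant frequency (as a fraction in [0, 1]) onto [0, 0.5].

    Frequencies above 0.5 are replaced by 1 - VF, so the returned value
    always describes the minor component of the heteroplasmy.
    """
    if not 0.0 <= vf <= 1.0:
        raise ValueError("variant frequency must be between 0 and 1")
    return vf if vf <= 0.5 else 1.0 - vf


# Hypothetical example values, for illustration only.
for vf in (0.08, 0.50, 0.93):
    print(f"VF = {vf:.2f} -> plotted as {fold_variant_frequency(vf):.2f}")
```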
Summary statistics. Observed and empirical random match probabilities (RMPs) and haplotype diversity were calculated for the entire mitogenome with point heteroplasmies (PHPs) included and excluded. Length heteroplasmy at nucleotide positions (nps) 16,193, 309, 315, 455, 463, 573, 960, 5899, 8276, and 8285 was ignored for these calculations.
| Dataset | Sample Size | Including PHP: Total Haplotypes | Including PHP: Unique Haplotypes | Including PHP: Observed RMP (%) | Including PHP: Empirical RMP (%) | Including PHP: Haplotype Diversity | Excluding PHP: Total Haplotypes | Excluding PHP: Unique Haplotypes | Excluding PHP: Observed RMP (%) | Excluding PHP: Empirical RMP (%) | Excluding PHP: Haplotype Diversity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| COAF | 112 | 112 | 112 | 0.89 | 0.00 | 1 | 110 | 108 | 0.92 | 0.03 | 0.9997 |
| COCN | 112 | 112 | 112 | 0.89 | 0.00 | 1 | 112 | 112 | 0.89 | 0.00 | 1 |
| COHS | 109 | 102 | 97 | 1.09 | 0.17 | 0.9983 | 94 | 83 | 1.34 | 0.42 | 0.9958 |
| NTAF | 256 | 251 | 247 | 0.41 | 0.02 | 0.9998 | 246 | 237 | 0.42 | 0.03 | 0.9997 |
| NTCN | 260 | 254 | 250 | 0.41 | 0.02 | 0.9998 | 250 | 244 | 0.43 | 0.05 | 0.9995 |
| NTHS | 138 | 131 | 127 | 0.86 | 0.14 | 0.9986 | 125 | 116 | 0.97 | 0.24 | 0.9976 |
| DSAS | 169 | 167 | 165 | 0.61 | 0.01 | 0.9999 | 165 | 161 | 0.62 | 0.03 | 0.9997 |
| DSNA | 171 | 167 | 163 | 0.61 | 0.03 | 0.9997 | 164 | 159 | 0.65 | 0.07 | 0.9993 |
COAF = Colorado African American; COCN = Colorado Caucasian; COHS = Colorado Hispanic; NTAF = National Institute of Standards and Technology (NIST) African American; NTCN = NIST Caucasian; NTHS = NIST Hispanic; DSAS = Department of Defense Serum Repository (DoDSR) Asian American; DSNA = DoDSR Native American.
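The formulas behind these statistics are not given in this section. The sketch below assumes commonly used definitions: observed RMP as the sum of squared haplotype frequencies, empirical RMP as the proportion of matching pairs among all pairs of distinct samples, and haplotype diversity as one minus the empirical RMP (equivalently n/(n − 1) × (1 − Σp²)). The function name `match_statistics` is illustrative. The example counts for COAF excluding PHPs are inferred from the table rather than reported directly: 112 samples, 110 haplotypes, and 108 unique haplotypes imply that the remaining two haplotypes were each observed twice.

```python
def match_statistics(haplotype_counts):
    """Return (observed RMP, empirical RMP, haplotype diversity).

    haplotype_counts holds one entry per distinct haplotype, giving the
    number of samples carrying it. The definitions used are assumptions
    (see the lead-in), not formulas quoted from the study.
    """
    n = sum(haplotype_counts)
    if n < 2:
        raise ValueError("at least two samples are required")
    sum_sq = sum(c * c for c in haplotype_counts)
    observed_rmp = sum_sq / n**2                  # sum of squared frequencies
    empirical_rmp = (sum_sq - n) / (n * (n - 1))  # matching pairs / all pairs
    haplotype_diversity = 1.0 - empirical_rmp     # = n/(n-1) * (1 - observed)
    return observed_rmp, empirical_rmp, haplotype_diversity


# COAF excluding PHPs, inferred from the table:
# 108 haplotypes seen once and 2 haplotypes seen twice (112 samples in total).
coaf_excluding_php = [1] * 108 + [2] * 2
obs, emp, hd = match_statistics(coaf_excluding_php)
print(f"Observed RMP: {100 * obs:.2f}%")
print(f"Empirical RMP: {100 * emp:.2f}%")
print(f"Haplotype diversity: {hd:.4f}")
```

Under these assumptions the sketch reproduces the COAF "Excluding PHP" row (0.92%, 0.03%, 0.9997), which suggests, but does not confirm, that the tabulated values follow the same definitions.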
Figure 4. Principal coordinate analyses of the eight datasets. Each metapopulation is labeled in a different color (African American in green, Caucasian in blue, Hispanic in red, Asian American in orange, and Native American in purple). COAF = Colorado African American; COCN = Colorado Caucasian; COHS = Colorado Hispanic; NTAF = National Institute of Standards and Technology (NIST) African American; NTCN = NIST Caucasian; NTHS = NIST Hispanic; DSAS = Department of Defense Serum Repository (DoDSR) Asian American; DSNA = DoDSR Native American.
Figure 5. Proportions of haplogroups observed within each dataset. African haplogroups are green, European haplogroups are blue, Native American haplogroups are purple, and Asian haplogroups are orange. Haplogroups representing less than 5% of haplotypes within each dataset have been grouped together and are shaded in grey. For complete haplogroup distribution and frequencies, see Table S8. COAF = Colorado African American; NTAF = National Institute of Standards and Technology (NIST) African American; COCN = Colorado Caucasian; NTCN = NIST Caucasian; COHS = Colorado Hispanic; NTHS = NIST Hispanic; DSNA = Department of Defense Serum Repository (DoDSR) Native American; DSAS = DoDSR Asian American.
Figure 6. Mitochondrial DNA ancestry proportions in (a) the Analytical Genetic Testing Center in Colorado (AGTC-CO) Hispanic (COHS), (b) the National Institute of Standards and Technology (NIST) Hispanic (NTHS), and (c) the Department of Defense Serum Repository (DoDSR) Native American (DSNA) datasets. Ancestry was classified on a continental level based on the assigned mitochondrial DNA haplogroup.