Abstract
With the recent increase in the available number of high-quality, full-length mitochondrial sequences, it is now possible to construct and analyze a comprehensive human mitochondrial consensus sequence. Using a data set of 827 carefully selected sequences, it is shown that modern humans contain extremely low levels of divergence from the mitochondrial consensus sequence, differing by a mere 21.6 nt sites on average. Fully 84.1% of the mitochondrial genome was found to be invariant and 'private' mutations accounted for 43.8% of the variable sites. Ninety-eight percent of the variant sites had a primary nucleotide with an allele frequency of 0.90 or greater. Interestingly, the few truly ambiguous nucleotide sites could all be reliably assigned to either a purine or pyrimidine ancestral state. A comparison of this consensus sequence to several ancestral sequences derived from phylogenetic studies reveals a great deal of similarity, where, as expected, the most phylogenetically informative nucleotides in the ancestral studies tended to be the most variable nucleotides in the consensus. Allowing for this fact, the consensus approach provides variation data on the positions that do not contribute to phylogenetic reconstructions, and these data provide a baseline for measuring human mitochondrial variation in populations worldwide.
Year: 2007 PMID: 17439969 PMCID: PMC1888801 DOI: 10.1093/nar/gkm207
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
‘Poly-x’ sites
| Site | Type | Number of alleles | Primary allele frequency (p) |
|---|---|---|---|
| 290–291 | Poly-A | 3 | 0.998 |
| 309 | Poly-C | 4 | 0.522 |
| 315 | Poly-C | 3 | 0.935 |
| 498 | Poly-C | 3 | 0.995 |
| 520–523 | CA repeat | 5 | 0.746 |
| 571–573 | Poly-C | 8 | 0.971 |
| 960 | Poly-C | 7 | 0.989 |
| 5899 | Poly-C | 4 | 0.987 |
| 8276 | Poly-C | 5 | 0.990 |
| 8281–8289 | ‘9-bp deletion’ | 6 | 0.993 |
| 16,192–16,193 | Poly-C | 5 | 0.945 |
Figure 1. Pairwise nucleotide differences between all individuals within the data set and the worldwide consensus.
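The quantity summarized in Figure 1 can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length and builds the consensus by a simple per-site majority vote; the toy sequences and the function names `majority_consensus` and `differences_from` are hypothetical and are not taken from the study.

```python
from collections import Counter

def majority_consensus(aligned):
    """Return the most common character at each column of an alignment."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # most_common(1) yields the top (allele, count) pair for the column;
    # ties are resolved in first-seen order (an arbitrary choice here).
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))

def differences_from(reference, seq):
    """Count positions at which seq differs from the reference."""
    return sum(a != b for a, b in zip(reference, seq))

# Hypothetical toy alignment standing in for full-length mtDNA sequences.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGTTCGT", "ACGTACGT"]

consensus = majority_consensus(toy_alignment)
diffs = [differences_from(consensus, s) for s in toy_alignment]
mean_diff = sum(diffs) / len(diffs)  # analogous to the average divergence from the consensus
```

On real data, gaps and the poly-x sites would need explicit handling before counting differences; this sketch ignores both.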
Variation within specific regions of the mitochondrial genome
| Region | Locus | Number of variant sites | % Variant sites |
|---|---|---|---|
| D-loop | D-loop total | 392 | 14.6 |
| | HVS 1 | 181 | 6.8 |
| | HVS 2 | 114 | 4.3 |
| | 7S DNA | 238 | 8.9 |
| Coding region | Coding region total | 2272 | 85.3 |
| | Non-coding nucleotides | 23 | 0.9 |
| | tRNAs | 158 | 5.9 |
| | 12S Ribosomal RNA | 99 | 3.7 |
| | 16S Ribosomal RNA | 125 | 4.7 |
| | NADH dehydrogenase subunits | 996 | 37.4 |
| | Cytochrome c oxidase subunits | 431 | 16.2 |
| | Cytochrome b | 217 | 8.1 |
| | ATP synthase subunits | 211 | 7.9 |
Loci positions were obtained from MitoMap (www.MitoMap.org). Percent calculations included poly-x sites (Table 1) as single events in nucleotide counts. Note: The overlap between several of these categories means that the number of variant sites will be higher than reported in the text and the percentages will sum to greater than unity.
Figure 2. Primary allele frequency (p) distribution, where p is defined as the frequency of the most common allele at each position. Data for the 11 poly-x sites are treated separately (Table 2).
Figure 3. Allele-count frequency histogram. The allele count is a measure of the number of alleles (i.e. A, T, G or C) found at each position. Invariant sites have an allele count of 1. Data for the 11 poly-x sites are treated separately (Table 2).
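The two per-site measures defined in the captions above, the allele count and the primary allele frequency p, can be sketched as follows. This is an illustrative example under the assumption of a gap-free, pre-aligned toy data set; `site_stats` and the input sequences are hypothetical names and values, not the authors' code or data.

```python
from collections import Counter

def site_stats(aligned):
    """For each alignment column, return (allele count, primary allele frequency p)."""
    stats = []
    for col in zip(*aligned):
        counts = Counter(col)
        # p is the frequency of the most common allele at this position.
        p = counts.most_common(1)[0][1] / len(col)
        stats.append((len(counts), p))
    return stats

# Hypothetical toy alignment: four sequences, four sites.
toy = ["ACGT", "ACGA", "ATGC", "ACGT"]
stats = site_stats(toy)

# Invariant sites are those with an allele count of 1.
invariant_fraction = sum(1 for n, _ in stats if n == 1) / len(stats)
```

Here the second site has two alleles with p = 0.75, and the fourth has three alleles with p = 0.5, while the remaining two sites are invariant.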
Summary statistics for the number of alleles and primary allele frequency (p) data on a site-by-site basis
| Statistic | Number of alleles | Primary allele frequency (p) |
|---|---|---|
| Average | 1.169 | 0.998 |
| SD | 0.406 | 0.012 |
| Min | 1 | 0.521 |
| Max | 8 | 1.000 |
Relative to the rCRS, there are 31 insertions within the data set that are not within the designated poly-x sites. Fourteen of these are private alleles. Reported values are the same (to three decimal places) whether or not one includes the insertions.
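Summary rows of the kind shown in the table above (average, SD, min, max) can be derived from per-site values as in the sketch below. The input lists are hypothetical toy values, and the choice of the population standard deviation (`statistics.pstdev`) is an assumption, since the source does not state which estimator was used.

```python
import statistics

def summarize(values):
    """Return average, SD, min and max for a list of per-site values."""
    return {
        "Average": statistics.mean(values),
        # Assumption: population SD; the source does not specify the estimator.
        "SD": statistics.pstdev(values),
        "Min": min(values),
        "Max": max(values),
    }

# Hypothetical per-site values for a four-site toy alignment.
allele_counts = [1, 2, 1, 3]
primary_freqs = [1.0, 0.75, 1.0, 0.5]

count_summary = summarize(allele_counts)
freq_summary = summarize(primary_freqs)
```

Swapping in `statistics.stdev` would give the sample standard deviation instead; for a genome-length list of sites the two differ only negligibly.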