Shengting Li, Soren Besenbacher, Yingrui Li, Karsten Kristiansen, Niels Grarup, Anders Albrechtsen, Thomas Sparsø, Thorfinn Korneliussen, Torben Hansen, Jun Wang, Rasmus Nielsen, Oluf Pedersen, Lars Bolund, Mikkel H Schierup.
Abstract
In this paper, we mine full mtDNA sequences from an exome capture data set of 2000 Danes, showing that it is possible to obtain high-quality full-genome sequences of the mitochondrion from this resource. The sample includes 1000 individuals with type 2 diabetes and 1000 controls. We characterise the variation found in the mtDNA sequence in Danes and relate it to diabetes risk as well as to several blood phenotypes of the controls, but find no significant associations. We report 2025 polymorphisms, of which 393 have not been reported previously. These 393 variants are both very rare and estimated to have arisen from very recent mutations, but individuals with type 2 diabetes do not carry more of them. Population genetic analysis using a Bayesian skyline plot shows a recent history of rapid population growth in the Danish population, in accordance with the fact that >40% of variable sites are observed as singletons.
Year: 2014 PMID: 24448545 PMCID: PMC4350597 DOI: 10.1038/ejhg.2013.282
Source DB: PubMed Journal: Eur J Hum Genet ISSN: 1018-4813 Impact factor: 4.246
Figure 1. (a) The total coverage of mitochondrial sequence from 2000 samples over the length of the mitochondrion. (b) The number of mitochondrial sequences with missing data (out of 2000) along the length of the mitochondrion.
Overview of variation in the sample of Danish mtDNA
| Control region | 304 | 289 | 45 | | | | 27.1 |
| Other non-coding | 27 | 21 | 7 | | | | |
| 12S rRNA | 68 | 62 | 9 | | | | 6.7 |
| 16S rRNA | 94 | 90 | 5 | | | | 6.0 |
| tRNAs | 117 | 116 | 2 | | | | 7.8 |
| | 128 | 121 | 8 | 70 | 58 | 0.52 | 18.8 |
| | 27 | 26 | 1 | 14 | 13 | 0.46 | 13.0 |
| | 164 | 154 | 15 | 40 | 124 | 0.14 | 10.4 |
| | 80 | 74 | 8 | 23 | 57 | 0.17 | 11.7 |
| | 97 | 93 | 5 | 34 | 63 | 0.23 | 12.4 |
| | 155 | 144 | 14 | 59 | 96 | 0.26 | 13.6 |
| | 121 | 114 | 9 | 37 | 84 | 0.19 | 12.7 |
| | 131 | 126 | 7 | 36 | 95 | 0.16 | 12.5 |
| | 39 | 37 | 2 | 13 | 26 | 0.21 | 11.3 |
| | 147 | 140 | 8 | 30 | 117 | 0.11 | 10.7 |
| | 30 | 30 | 2 | 6 | 24 | 0.11 | 9.8 |
| | 221 | 209 | 15 | 59 | 162 | 0.16 | 12.1 |
| | 75 | 74 | 4 | 30 | 46 | 0.28 | 14.3 |
| Total | 2025 | 1920 | 166 | 451 | 965 | 0.20 | |
The site frequency of variants divided into coding variants (synonymous and non-synonymous), variants in RNA genes, intergenic variants and new variants
| 1 | 193 | 42.8 | 399 | 41.3 | 128 | 45.9 | 73 | 24.0 | 286 | 72.0 |
| 2 | 71 | 15.7 | 163 | 16.9 | 47 | 16.8 | 27 | 8.9 | 68 | 17.1 |
| 3 | 41 | 9.1 | 73 | 7.6 | 20 | 7.2 | 28 | 9.2 | 18 | 4.5 |
| 4 | 26 | 5.8 | 55 | 5.7 | 15 | 5.4 | 15 | 4.9 | 9 | 2.3 |
| 5 | 21 | 4.7 | 40 | 4.1 | 12 | 4.3 | 10 | 3.3 | 6 | 1.5 |
| 6–10 | 37 | 8.2 | 111 | 11.5 | 20 | 7.2 | 40 | 13.2 | 5 | 1.3 |
| 11–20 | 28 | 6.2 | 38 | 3.9 | 7 | 2.5 | 28 | 9.2 | 1 | 0.3 |
| 21–100 | 22 | 4.9 | 66 | 6.8 | 17 | 6.1 | 55 | 18.1 | 0 | 0.0 |
| 100–2000 | 12 | 2.7 | 20 | 2.1 | 13 | 4.7 | 28 | 9.2 | 0 | 0.0 |
Figure 2. The low range of the folded SFS for the variation observed, divided into synonymous, non-synonymous, RNA coding, intergenic and non-sense. Variants observed up to 10 times are shown.
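A folded site frequency spectrum (SFS) of the kind summarised in Figure 2 can be sketched as follows. This is only a minimal illustration, not the authors' pipeline: the function name `folded_sfs` and the toy allele counts are hypothetical, and the cut-off of 10 mirrors the range shown in the figure.

```python
from collections import Counter

def folded_sfs(alt_counts, n_samples, max_count=10):
    """Tally sites by minor-allele count (folded SFS), up to max_count.

    alt_counts: per-site count of the non-reference allele (hypothetical input).
    n_samples:  number of sequences sampled.
    """
    sfs = Counter()
    for k in alt_counts:
        # Folding: the rarer of the two alleles defines the frequency class.
        minor = min(k, n_samples - k)
        if 1 <= minor <= max_count:
            sfs[minor] += 1
    # Entry i of the result is the number of sites with minor-allele count i + 1.
    return [sfs[c] for c in range(1, max_count + 1)]

# Toy data (assumed, not from the paper): non-reference allele counts at 9 sites.
toy_counts = [1, 1, 1999, 2, 5, 1998, 10, 250, 1]
spectrum = folded_sfs(toy_counts, n_samples=2000)
print(spectrum)
```

Folding is used when the ancestral state of each site is not known, so a variant seen in 1999 of 2000 sequences falls in the same class as a singleton.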
The number of potentially functional variants (non-synonymous or changing an RNA gene) not previously recorded in mitomap, divided into genes and with the number of cases and controls having such variants
| NA | 3 | 6 | 0 | 0.031 | |
| 0.35 | 11 | 5 | 14 | 0.062 | |
| 3.02 | 12 | 12 | 4 | 0.076 | |
| 1.43 | 35 | 34 | 24 | 0.230 | |
| 0.66 | 28 | 16 | 24 | 0.263 | |
| 0.58 | 12 | 7 | 12 | 0.357 | |
| 4.01 | 4 | 4 | 1 | 0.374 | |
| 0.50 | 6 | 4 | 8 | 0.386 | |
| 1.67 | 14 | 10 | 6 | 0.453 | |
| 0.00 | 2 | 0 | 2 | 0.500 | |
| 0.00 | 2 | 0 | 2 | 0.500 | |
| 1.45 | 18 | 13 | 9 | 0.521 | |
| 0.83 | 14 | 15 | 18 | 0.726 | |
| 0.78 | 4 | 7 | 9 | 0.803 | |
| 1.29 | 11 | 9 | 7 | 0.803 |
P-values are from Fisher's exact test of independence of case/control status.
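The test named in this footnote can be sketched with the standard library alone. The sketch assumes that the case and control columns count carriers among the 1000 cases and 1000 controls given in the abstract, and that the test is two-sided; the function name `fisher_exact_two_sided` is illustrative and not taken from the paper.

```python
from math import comb

def fisher_exact_two_sided(case_carriers, control_carriers,
                           n_cases=1000, n_controls=1000):
    """Two-sided Fisher's exact test on a 2x2 carrier/non-carrier table.

    Group sizes default to the 1000 cases and 1000 controls of the study
    (an assumption about how the table was built).
    """
    carriers = case_carriers + control_carriers
    denom = comb(n_cases + n_controls, carriers)

    def prob(k):
        # Hypergeometric probability that k of the carriers are cases.
        return comb(n_cases, k) * comb(n_controls, carriers - k) / denom

    p_obs = prob(case_carriers)
    lo = max(0, carriers - n_controls)
    hi = min(carriers, n_cases)
    # Sum every table that is no more probable than the observed one.
    p = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

# First row of the table: 6 cases and 0 controls carry such a variant.
p_first_row = fisher_exact_two_sided(6, 0)
print(round(p_first_row, 3))
```

Under these assumptions the first row (6 cases, 0 controls) gives a P-value of about 0.031, in line with the value listed in the table.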
Variants found in the present data set that have been confirmed (according to mitomap.org) to be involved in diseases, with number of cases and controls harbouring them.
| A1555G | 3 | 4 | DEAF | 102/107 |
| G11778A | 4 | 0 | Progressive dystonia, LHON | 22/22 |
| T14484C | 1 | 0 | LHON | 49/49 |
| T14674C | 2 | 0 | Reversible COX deficiency myopathy | 47/47 |
Figure 3. The phylogenetic tree relating the haplogroups with >30 representatives among the 2000 mtDNA sequences sampled in the current study. Haplogroup designation and number of samples are shown for internal branches and tips.
Figure 4. The haplogroup distribution in cases and controls. None of the differences is significant after Bonferroni correction for multiple testing.
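The Bonferroni correction mentioned in this caption divides the significance level by the number of tests. The haplogroup-level P-values are not listed in this record, so the sketch below uses the 15 per-gene P-values from the table of novel potentially functional variants as stand-in input. The significance level of 0.05 and the function name are assumptions.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the Bonferroni threshold and a per-test significance flag."""
    threshold = alpha / len(p_values)  # family-wise alpha split over all tests
    return threshold, [p < threshold for p in p_values]

# Stand-in input: per-gene P-values from the table of novel variants.
gene_p_values = [0.031, 0.062, 0.076, 0.230, 0.263, 0.357, 0.374, 0.386,
                 0.453, 0.500, 0.500, 0.521, 0.726, 0.803, 0.803]

threshold, flags = bonferroni_significant(gene_p_values)
print(threshold, any(flags))
```

With 15 tests the threshold falls to roughly 0.0033, so even the smallest listed P-value (0.031) does not pass it.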
Figure 5. Bayesian skyline plot of the 2000 sequences. A mutation rate of 1.7 × 10⁻⁸ per year was used to convert substitution rates into years (x-axis) and coalescent intensities into effective population sizes (y-axis).
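The axis conversion described in this caption amounts to dividing by the mutation rate. The sketch below assumes the rate is per site per year and that the skyline's population parameter, when the tree is measured in substitutions per site, equals effective size × generation time × mutation rate (the usual coalescent scaling). The generation time of 25 years and the input values are hypothetical and not taken from the paper.

```python
from math import isclose

MUTATION_RATE = 1.7e-8  # per year, from the caption (assumed to be per site)

def substitutions_to_years(subs_per_site, rate=MUTATION_RATE):
    """Convert a node height in substitutions per site into years."""
    return subs_per_site / rate

def scaled_size_to_ne(scaled_size, rate=MUTATION_RATE, generation_years=25.0):
    """Convert a scaled population parameter (Ne * generation time * rate)
    into an effective population size.

    generation_years is a hypothetical value chosen for illustration.
    """
    return scaled_size / (rate * generation_years)

# Hypothetical inputs, chosen only to show the arithmetic.
years = substitutions_to_years(1.7e-4)
ne = scaled_size_to_ne(0.0085)
print(years, ne)
```

Because both conversions are linear in the assumed rate, a different rate rescales both axes of the plot without changing the shape of the curve.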