M Thomas P Gilbert, Jonas Binladen, Webb Miller, Carsten Wiuf, Eske Willerslev, Hendrik Poinar, John E Carlson, James H Leebens-Mack, Stephan C Schuster.
Abstract
Although ancient DNA (aDNA) miscoding lesions have been studied since the earliest days of the field, their nature remains a source of debate. A variety of conflicting hypotheses exist about which miscoding lesions constitute true aDNA damage as opposed to PCR polymerase amplification error. Furthermore, considerable disagreement and speculation exist about which specific damage events underlie observed miscoding lesions. The root of the problem is that it has previously been difficult to assemble sufficient data to test the hypotheses, and near-impossible to accurately determine the specific strand of origin of observed damage events. With the advent of emulsion-based clonal amplification (emPCR) and sequencing-by-synthesis technology, this has changed. In this paper we demonstrate how data produced on the Roche GS20 genome sequencer can determine miscoding lesion strands of origin, and subsequently be interpreted to enable characterization of the aDNA damage behind the observed phenotypes. Through comparative analyses on 390,965 bp of modern chloroplast and 131,474 bp of ancient woolly mammoth GS20 sequence data we conclusively demonstrate that, at least in this permafrost-preserved specimen, Type 2 (cytosine→thymine/guanine→adenine) miscoding lesions represent the overwhelming majority of damage-derived miscoding lesions. Additionally, we show that an as yet unidentified guanine→adenine analogue modification, not the conventionally argued cytosine→uracil deamination, underpins a significant proportion of Type 2 damage. How widespread these implications are for aDNA will become apparent as future studies analyse data recovered from a wider range of substrates.
Year: 2006 PMID: 16920744 PMCID: PMC1802572 DOI: 10.1093/nar/gkl483
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Number of miscoding lesions observed within chloroplast and mammoth datasets
The first three lesion columns list miscoding lesions originally derived from A and T nucleotides; the last three list miscoding lesions originally derived from C and G nucleotides.
| | A→G / T→C | A→C / T→G | A→T / T→A | Total A+T (a) | C→A / G→T | C→G / G→C | C→T / G→A | Total C+G (b) |
|---|---|---|---|---|---|---|---|---|
| Chloroplast | 78 | 24 | 89 | 244 230 | 33 | 9 | 52 | 146 735 |
| Mammoth | 39 | 7 | 9 | 81 790 | 16 | 8 | 597 | 49 684 |
| Corrected Mammoth (c) | 116 | 21 | 27 | | 47 | 24 | 1763 | |
| Nucleotide ratio | | | | 2.99 | | | | 2.95 |
(a) Total number of adenine and thymine nucleotides in dataset.
(b) Total number of cytosine and guanine nucleotides in dataset.
(c) Corrected Mammoth: the number of observed lesions among the mammoth sequence data, scaled to match the total chloroplast nucleotides sequenced. For example, the corrected mammoth count for the A→G/T→C pair was calculated as (Observed Mammoth A→G/T→C) × (Total Chloroplast A+T) / (Total Mammoth A+T) = 39 × 244 230 / 81 790 = 116.
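The scaling described in the Corrected Mammoth footnote can be reproduced with a few lines of Python. This is only a sketch: the dictionary names and the `corrected_count` helper are illustrative assumptions, and the numbers are transcribed from the table above.

```python
# Totals and observed counts transcribed from the table above.
CHLOROPLAST_TOTAL = {"A+T": 244_230, "C+G": 146_735}
MAMMOTH_TOTAL = {"A+T": 81_790, "C+G": 49_684}

# Each lesion pair maps to (nucleotide class of origin, observed mammoth count).
MAMMOTH_OBSERVED = {
    "A>G/T>C": ("A+T", 39),
    "A>C/T>G": ("A+T", 7),
    "A>T/T>A": ("A+T", 9),
    "C>A/G>T": ("C+G", 16),
    "C>G/G>C": ("C+G", 8),
    "C>T/G>A": ("C+G", 597),
}


def corrected_count(observed, origin):
    """Scale a mammoth count to the chloroplast sequencing depth (hypothetical helper)."""
    return observed * CHLOROPLAST_TOTAL[origin] / MAMMOTH_TOTAL[origin]


corrected = {
    pair: round(corrected_count(count, origin))
    for pair, (origin, count) in MAMMOTH_OBSERVED.items()
}

for pair, value in corrected.items():
    print(f"{pair}: {value}")
```

The two scaling factors used here, 244 230 / 81 790 and 146 735 / 49 684, are the values reported in the Nucleotide ratio row (2.99 and 2.95).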
Number of observed and expected miscoding lesions in mammoth dataset
| | A→G / T→C | A→C / T→G | A→T / T→A | C→A / G→T | C→G / G→C | C→T / G→A |
|---|---|---|---|---|---|---|
| Observed (a) | 39 | 7 | 9 | 16 | 8 | 597 |
| Expected (b) | 26.12 | 8.04 | 29.81 | 11.17 | 3.05 | 17.61 |
| P-value (c) | 0.011 | 0.86 | 8 × 10⁻⁴ | 0.17 | 0.013 | <1 × 10⁻⁵ |
| Occurrence per bp sequenced | 1.5 × 10⁻⁴ | 0 | 0 | 9.7 × 10⁻⁵ | 9.9 × 10⁻⁵ | 0.01 |
(a) Absolute number of miscoding lesions observed.
(b) Expected number of miscoding lesions, modelled using the Poisson distribution with rates derived from the chloroplast data.
(c) When using a 5% Bonferroni-corrected significance level, P-values below 5%/6 = 0.0084 are significant, leaving only (A→T/T→A) and (C→T/G→A) significant.
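As a rough illustration of how the Expected row and the Bonferroni decision fit together, the sketch below derives the Poisson means by scaling the chloroplast counts to the mammoth sequencing depth, then compares the reported P-values with 0.05/6. The variable names are assumptions, the P-values are copied from the table rather than recomputed (the exact test used is not stated here), and the `<1 × 10⁻⁵` entry is represented by its upper bound.

```python
# Counts and totals transcribed from the two tables above.
CHLOROPLAST_COUNTS = {
    "A>G/T>C": ("A+T", 78),
    "A>C/T>G": ("A+T", 24),
    "A>T/T>A": ("A+T", 89),
    "C>A/G>T": ("C+G", 33),
    "C>G/G>C": ("C+G", 9),
    "C>T/G>A": ("C+G", 52),
}
CHLOROPLAST_TOTAL = {"A+T": 244_230, "C+G": 146_735}
MAMMOTH_TOTAL = {"A+T": 81_790, "C+G": 49_684}

# Poisson mean for the mammoth data: chloroplast rate times mammoth nucleotides.
expected = {
    pair: round(count * MAMMOTH_TOTAL[origin] / CHLOROPLAST_TOTAL[origin], 2)
    for pair, (origin, count) in CHLOROPLAST_COUNTS.items()
}

# P-values as reported in the table (not recomputed here).
REPORTED_P = {
    "A>G/T>C": 0.011,
    "A>C/T>G": 0.86,
    "A>T/T>A": 8e-4,
    "C>A/G>T": 0.17,
    "C>G/G>C": 0.013,
    "C>T/G>A": 1e-5,  # reported as "<1e-5"; upper bound used as a stand-in
}

# Bonferroni correction: divide the 5% level by the number of tests.
threshold = 0.05 / len(REPORTED_P)
significant = sorted(pair for pair, p in REPORTED_P.items() if p < threshold)

print(expected)
print(f"threshold = {threshold:.4f}", significant)
```

Under these assumptions the scaled means match the Expected row, and only the A→T/T→A and C→T/G→A pairs fall below the corrected threshold, in agreement with the footnote.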
Figure 1. Orientation of the DNA molecule at different steps of the data production process, and the algorithm used to subsequently segregate and analyse the sequence data. The GS20 emPCR and pyrosequencing process occurs in isolated, parallel reactions (up to 0.8 million per GS20 run). This figure illustrates key stages of the data-generation and analytical process for each individual reaction within a single GS20 run. (1a) Before emPCR, original single-stranded DNA molecules are isolated. (1b) Post-emPCR, only descendants of the original molecule that are in the complementary orientation are retained. (1c) These molecules are pyrosequenced, generating sequence in the identical orientation to the originally isolated single-stranded DNA molecule. (1d) A simple analytical algorithm is then applied to the sequence to identify the orientation of, and any miscoding lesions in, the original single-stranded DNA molecule.
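The final step of the figure, identifying the orientation of a read and the lesions it carries, might be sketched as follows. This is not the authors' algorithm: it is a minimal illustration that assumes each read is already aligned, gap-free and the same length as a reference segment, and all function names are hypothetical. Because each read is in the orientation of the original single-stranded molecule, a mismatch can be reported as a change on that strand rather than as an ambiguous complementary pair.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(reference, read):
    """List (position, reference base, read base) for every differing site."""
    return [(i, r, q) for i, (r, q) in enumerate(zip(reference, read)) if r != q]


def classify_read(read, light_reference):
    """Assign a read to the light or heavy strand and list its lesions.

    Assumption: the read is pre-aligned and gap-free, so the strand is
    simply the orientation of the reference with fewer mismatches.
    """
    heavy_reference = reverse_complement(light_reference)
    as_light = mismatches(light_reference, read)
    as_heavy = mismatches(heavy_reference, read)
    if len(as_light) <= len(as_heavy):
        strand, diffs = "light", as_light
    else:
        strand, diffs = "heavy", as_heavy
    # Lesions are written original>derived in the orientation of the template.
    return strand, [f"{ref}>{obs}" for _, ref, obs in diffs]


light_reference = "AACCGGTTAC"
print(classify_read("AATCGGTTAC", light_reference))  # read from the light strand
print(classify_read("ATAACCGGTT", light_reference))  # read from the heavy strand
```

In this toy example the first read is assigned to the light strand with a C>T change, and the second to the heavy strand with a G>A change; without strand information both would be reported as the same C→T/G→A pair.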
Absolute number of damage events underlying observed miscoding lesions, subdivided by Light and Heavy template molecules. Columns are grouped by original nucleotide as A→N, C→N, G→N and T→N (a).
| | A→A | A→C | A→G | A→T | C→A | C→C | C→G | C→T | G→A | G→C | G→G | G→T | T→A | T→C | T→G | T→T |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Light | 22 613 | 1 | 9 | 0 | 2 | 15 808 | 1 | 290 | 54 | 2 | 8727 | 6 | 0 | 4 | 2 | 19 116 |
| Heavy | 18 816 | 0 | 15 | 3 | 0 | 8417 | 0 | 141 | 112 | 5 | 16 111 | 8 | 6 | 11 | 4 | 21 190 |
| Total | 41 429 | 1 | 24 | 3 | 2 | 24 225 | 1 | 431 | 166 | 7 | 24 838 | 14 | 6 | 15 | 6 | 40 306 |
(a) Where N refers to the four possible derived nucleotide states, given after the arrow in each column header.
Contribution of individual damage events to observed miscoding lesion pairs
| Original damage (a) | A→G / T→C | A→C / T→G | A→T / T→A | C→A / G→T | C→G / G→C | C→T / G→A |
|---|---|---|---|---|---|---|
| Mammoth dataset, i | 24 | 1 | 3 | 2 | 1 | 431 |
| Mammoth dataset, j | 15 | 6 | 6 | 14 | 7 | 166 |
| Complementary pair total (i + j) | 39 | 7 | 9 | 16 | 8 | 597 |
| Per cent of total mammoth observations | 5.8 | 1.0 | 1.3 | 2.4 | 1.2 | 88.3 |
(a) Constituent damage events within each of the six complementary miscoding lesion pairs, identified as i and j, respectively (for example, i = A→G and j = T→C in the first pair). Subsequent rows of the table describe the observed number of i and j for each dataset, plus the total (i + j).
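The pair totals and percentages in this table follow directly from the i and j counts. The short sketch below recomputes them; the variable names are illustrative and the counts are transcribed from the table.

```python
# (i, j) damage-event counts for each complementary pair, from the table above.
PAIR_COUNTS = {
    "A>G/T>C": (24, 15),
    "A>C/T>G": (1, 6),
    "A>T/T>A": (3, 6),
    "C>A/G>T": (2, 14),
    "C>G/G>C": (1, 7),
    "C>T/G>A": (431, 166),
}

pair_totals = {pair: i + j for pair, (i, j) in PAIR_COUNTS.items()}
grand_total = sum(pair_totals.values())

# Share of all observed mammoth miscoding lesions, to one decimal place.
percent = {
    pair: round(100 * total / grand_total, 1)
    for pair, total in pair_totals.items()
}

for pair in PAIR_COUNTS:
    print(f"{pair}: total={pair_totals[pair]}, percent={percent[pair]}")
```

The C→T/G→A pair accounts for 597 of the 676 lesions in total, which is the 88.3% figure behind the statement that Type 2 lesions form the overwhelming majority.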
Type 2 damage rate, nucleotides per unit time
| | Per year | Per second |
|---|---|---|
| Type 2 damage | 4.2 × 10⁻⁷ | 1.3 × 10⁻¹⁴ |
| C→T | 6.2 × 10⁻⁷ | 2.0 × 10⁻¹⁴ |
| G→A | 2.3 × 10⁻⁷ | 7.4 × 10⁻¹⁵ |
(a) Adjusted to account for enzyme contribution to miscoding lesions.
(b) Unadjusted for enzyme contribution, therefore overestimate of true rate.
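The per-second column is the per-year rate divided by the number of seconds in a year. The sketch below assumes a 365.25-day year, which is not stated in the table, and uses the rounded per-year values as printed.

```python
# Assumption: one year = 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31 557 600 s

# Per-year rates as printed in the table above.
RATE_PER_YEAR = {
    "Type 2 damage": 4.2e-7,
    "C>T": 6.2e-7,
    "G>A": 2.3e-7,
}

rate_per_second = {
    name: rate / SECONDS_PER_YEAR for name, rate in RATE_PER_YEAR.items()
}

for name, rate in rate_per_second.items():
    print(f"{name}: {rate:.1e} per second")
```

With these inputs the first two rows reproduce the tabulated values (1.3 × 10⁻¹⁴ and 2.0 × 10⁻¹⁴). The G→A row comes out at about 7.3 × 10⁻¹⁵ rather than 7.4 × 10⁻¹⁵, presumably because the printed per-year value is itself rounded.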