Irina Morozova, Pavel Flegontov, Alexander S Mikheyev, Sergey Bruskin, Hosseinali Asgharian, Petr Ponomarenko, Vladimir Klyuchnikov, GaneshPrasad ArunKumar, Egor Prokhortchouk, Yuriy Gankin, Evgeny Rogaev, Yuri Nikolsky, Ancha Baranova, Eran Elhaik, Tatiana V Tatarinova.
Abstract
The term 'ancient DNA' (aDNA) is coming of age, with over 1,200 hits in the PubMed database, beginning in the early 1980s with the studies of 'molecular paleontology'. Rooted in cloning and limited sequencing of DNA from ancient remains during the pre-PCR era, the field has made incredible progress since the introduction of PCR and next-generation sequencing. Over the last decade, aDNA analysis has ushered in a new era in genomics and become the method of choice for reconstructing the history of organisms, their biogeography, and migration routes, with applications in evolutionary biology, population genetics, archaeogenetics, paleo-epidemiology, and many other areas. This change was brought about by the development of new strategies for coping with the challenges of studying aDNA: damage and fragmentation, scarce samples, significant historical gaps, and limited applicability of population genetics methods. In this review, we describe the state-of-the-art achievements in aDNA studies, with particular focus on human evolution and demographic history. We present the current experimental and theoretical procedures for handling and analysing highly degraded aDNA. We also review the challenges in the rapidly growing field of ancient epigenomics. Advancement of aDNA tools and methods signifies a new era in population genetics and evolutionary medicine research.
Keywords: ancient DNA; bioinformatics; epigenetics; next-generation sequencing; population genetics
Year: 2016 PMID: 27436340 PMCID: PMC4991838 DOI: 10.1093/dnares/dsw029
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Figure 1. Major milestones of development of high-resolution ancient human genomics.
Figure 2. Geographic distribution of existing whole genome aDNA sequences.
Difficulties of working with ancient DNA and specialized methods developed to address them
| Problem | Experimental solutions | Bioinformatics solutions |
|---|---|---|
| Degradation | Improved extraction protocols; using NGS approach | Algorithms based on genotype likelihoods rather than a single best genotype for low-coverage genomic positions |
| Base damage | Using a DNA polymerase which does not amplify through uracils (removes uracil-containing fragments from the reaction); treatment with uracil-DNA glycosylase plus endonuclease VIII (removes uracil, then cleaves abasic sites); single-primer extension PCR (analyses separate DNA strands) | Trimming 5-7 bases from read ends; counting and excluding C→T and G→A mutations at ultra-conserved positions; comparing frequencies of different classes of mutations in modern-modern and modern-ancient alignments; estimation of contamination or divergence based on indels and transversions only, not transitions; exclusion of common ancestor-ancient sample branches from calculation of divergence |
| Contamination | Special protocols for sample collection, transport and storage; custom pre-digestion steps (including mechanical and chemical decontamination, short-time pre-incubation); independent replication in two labs; PCR-capture with species-specific primers | Exclusion of long reads or alignments (in case of 454 or Sanger sequencing), as aDNA fragments are very short, usually <100 nt; phylogenetic correctness correction (exclusion of reads based on similarity with non-target species; inclusion of reads based on similarity with the target species or a close relative); conformity to species- or ethnicity-specific variants or haplotypes; checking homozygosity of X and Y positions in male specimens, absence of Y reads in female specimens, homozygosity of mtDNA positions; absence of haplotypes present in research team members; distinguishing mtDNA sequences from NUMTs |
The solutions aimed at one or more of the problems are not mutually exclusive and are often used in combination for better results. In addition, various bioinformatics ideas for tackling contamination and base damage are sometimes integrated into a single Maximum Likelihood framework for base and genotype calling.
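As a rough illustration of the genotype-likelihood idea in the table, and of folding base damage into the same likelihood, the following minimal sketch scores the three diploid genotypes at one biallelic site. It is not the method of any specific tool: the function names, the default `error` rate, and the single `damage` parameter (the probability that a true C is read as T through deamination) are assumptions made for this example, and G→A damage on the opposite strand is ignored for brevity.

```python
import math


def base_prob(obs, true, error=0.01, damage=0.0):
    """P(observed base | true allele), assuming a hypothetical C->T damage rate
    followed by a uniform sequencing error rate (error must be > 0)."""
    # After damage, a true C is still C with prob 1 - damage, or looks like T.
    states = {"C": 1.0 - damage, "T": damage} if true == "C" else {true: 1.0}
    return sum(
        weight * ((1.0 - error) if obs == state else error / 3.0)
        for state, weight in states.items()
    )


def genotype_log_likelihoods(reads, ref, alt, error=0.01, damage=0.0):
    """Log-likelihood of each diploid genotype given the bases observed at one site."""
    out = {}
    for a1, a2 in ((ref, ref), (ref, alt), (alt, alt)):
        ll = 0.0
        for obs in reads:
            # Each read is assumed to sample either allele with probability 1/2.
            p = 0.5 * base_prob(obs, a1, error, damage) + 0.5 * base_prob(obs, a2, error, damage)
            ll += math.log(p)
        out[f"{a1}/{a2}"] = ll
    return out


if __name__ == "__main__":
    # One T read over a reference C: damage makes C/C far less implausible.
    print(genotype_log_likelihoods(["T"], "C", "T", damage=0.0))
    print(genotype_log_likelihoods(["T"], "C", "T", damage=0.3))
```

Keeping all three likelihoods, rather than committing to the best-scoring genotype, lets downstream analyses weight a low-coverage site by its uncertainty. In practice the damage rate would depend on the position within the read, which this sketch does not model.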
Figure 3. Flowchart of a typical bioinformatics pipeline for aDNA analysis using NGS data.
Figure 4. Epigenetic analysis of aDNA. As a result of cytosine and methyl-cytosine deamination in a postmortem sample, we observe C→U and mC→T conversions. When Taq polymerase is used for DNA amplification, both C→U and mC→T will be recorded as T (this is the major difference between ancient and bisulphite-treated samples, in which only unmethylated cytosine is converted to U while mC remains unchanged). When Pfu polymerase is used, U will not be amplified, while those T that appeared as a result of mC→T conversion will be read as T. The pie charts demonstrate the ratio of sequenced C to T. This C/T ratio with Taq and Pfu, along with comparison with the reference genome, allows detection of methylated cytosines: in the case of postmortem C→U deamination and PCR with Pfu, the frequency of T will be decreased.
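The comparison described in the Figure 4 caption can be sketched as a simple per-site rule. This is only an illustration under stated assumptions: the function names, the `(C, T)` count tuples, and the `min_ratio` threshold are hypothetical and do not come from the review. Real analyses would typically pool counts over regions, because single-site coverage in aDNA is usually too low for a reliable call.

```python
def t_fraction(c_count, t_count):
    """Fraction of reads showing T at a reference cytosine, or None without coverage."""
    total = c_count + t_count
    return t_count / total if total else None


def classify_site(taq_counts, pfu_counts, min_ratio=0.5):
    """Compare T fractions from Taq- and Pfu-amplified libraries at one reference C.

    taq_counts and pfu_counts are (C reads, T reads) tuples.
    min_ratio is an arbitrary illustrative threshold, not a published value.
    """
    taq_t = t_fraction(*taq_counts)
    pfu_t = t_fraction(*pfu_counts)
    if taq_t is None or pfu_t is None or taq_t == 0:
        return "uninformative"
    # Pfu does not amplify uracil-containing templates, so T reads that persist
    # with Pfu are attributed to deaminated methyl-cytosine (mC->T).
    if pfu_t / taq_t >= min_ratio:
        return "likely methylated"
    return "likely unmethylated"


if __name__ == "__main__":
    print(classify_site((80, 20), (82, 18)))  # T signal persists with Pfu
    print(classify_site((80, 20), (99, 1)))   # T signal largely lost with Pfu
```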