Richard E Green, Adrian W Briggs, Johannes Krause, Kay Prüfer, Hernán A Burbano, Michael Siebauer, Michael Lachmann, Svante Pääbo.
Abstract
Recent advances in high-throughput DNA sequencing have made genome-scale analyses of genomes of extinct organisms possible. With these new opportunities come new difficulties in assessing the authenticity of the DNA sequences retrieved. We discuss how these difficulties can be addressed, particularly with regard to analyses of the Neandertal genome. We argue that only direct assays of DNA sequence positions in which Neandertals differ from all contemporary humans can serve as a reliable means to estimate human contamination. Indirect measures, such as the extent of DNA fragmentation, nucleotide misincorporations, or comparison of derived allele frequencies in different fragment size classes, are unreliable. Fortunately, interim approaches based on mtDNA differences between Neandertals and current humans, detection of male contamination through Y chromosomal sequences, and repeated sequencing from the same fossil to detect autosomal contamination allow initial large-scale sequencing of Neandertal genomes. This will result in the discovery of fixed differences in the nuclear genome between Neandertals and current humans that can serve as future direct assays for contamination. For analyses of other fossil hominins, which may become possible in the future, we suggest a similar 'boot-strap' approach in which interim approaches are applied until sufficient data for more definitive direct assays are acquired.
Year: 2009 PMID: 19661919 PMCID: PMC2725275 DOI: 10.1038/emboj.2009.222
Source DB: PubMed Journal: EMBO J ISSN: 0261-4189 Impact factor: 11.598
Figure 1. Estimates of human mtDNA contamination in Neandertal extracts. DNA extracts of Neandertal bones contain a large excess of microbial DNA (brown), at most a few percent of Neandertal DNA (blue) and generally variable amounts of contaminating DNA from current humans (red). Traditionally, contamination has been assayed through PCR directly from DNA extract from fossil bone (left lower panel). Accumulation of large numbers of reads from high-throughput sequencing allows a direct estimate of mtDNA contamination in the sequencing library (right lower panel). Once human/Neandertal diagnostic nuclear genome positions are learned, this strategy can be extended to nuclear DNA sequences.
MtDNA contamination and mtDNA to nuclear DNA ratios in some DNA extracts and sequencing libraries used to study the Neandertal genome
| Extract | N | H | Extract contamination | Library | N | H | Library contamination | Nuclear-mtDNA ratio |
|---|---|---|---|---|---|---|---|---|
| A | 111 | 1 | 0.8% (0.0–4.9%) | A.1 | 67 | 8 | 10.7% (4.7–19.9%) | 375 |
| | | | | A.2 | 4 | 0 | 0% (0.0–60.2%) | 222 |
| B | 103 | 0 | 0.0% (0.0–3.5%) | BC.1 | 22 | 0 | 0.0% (0.0–15.4%) | 186 |
| C | 112 | 0 | 0.0% (0.0–3.2%) | BC.2 | 1822 | 7 | 0.4% (0.2–0.8%) | 157 |
| D | 152 | 8 | 5.0% (2.2–9.6%) | DEF.1 | 30 | 1 | 3.2% (0.1–16.7%) | 419 |
| E | 100 | 1 | 1.0% (0.0–5.4%) | | | | | |
| F | 174 | 8 | 4.4% (1.9–8.5%) | | | | | |

Six extracts of Neandertal bone Vindija Vi33.16 (A–F) were prepared and analysed with respect to mtDNA contamination using PCR. N and H refer to Neandertal- and current human-like clones of mtDNA amplification products, respectively. These extracts were used to construct libraries used for sequencing. Library A.1 was constructed outside the clean room facility using standard 454 sequencing adapters and is published in
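The contamination percentages in the table are the share of human-like clones or reads, H / (N + H). The record does not say how the bracketed ranges were computed. The sketch below assumes they are 95% exact binomial (Clopper-Pearson) intervals, which the rows with zero human observations are consistent with. The function names are illustrative and not taken from the paper.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term evaluated in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def _solve_decreasing(f, target, iters=100):
    """Bisection on [0, 1] for f(p) == target, where f decreases as p grows."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def contamination_estimate(n_neandertal, n_human, alpha=0.05):
    """Return (point estimate, lower, upper) for the human contamination fraction.

    Assumption: a Clopper-Pearson interval, not a method confirmed by the source.
    """
    n = n_neandertal + n_human
    if n == 0:
        raise ValueError("need at least one observation")
    k = n_human
    point = k / n
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1.0 - alpha / 2.0)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2.0)
    return point, lower, upper


if __name__ == "__main__":
    # (Neandertal-like, human-like) counts taken from the table rows above.
    for label, n_count, h_count in [("B", 103, 0), ("A.1", 67, 8), ("BC.2", 1822, 7)]:
        point, lower, upper = contamination_estimate(n_count, h_count)
        print(f"{label}: {100 * point:.1f}% ({100 * lower:.1f}-{100 * upper:.1f}%)")
```

For extract B (103 Neandertal-like clones, 0 human-like), the upper bound is 1 − 0.025^(1/103), about 3.5%, in line with the table. The same calculation shows why library A.2 is so uncertain: with only four informative reads, observing no human reads still leaves an upper bound near 60%.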
Figure 2. Lengths of Neandertal and human mtDNA fragments. Distributions of mtDNA fragments carrying Neandertal diagnostic positions are shown in blue for three Neandertal fossils. Each red dot represents a single contaminating human mtDNA fragment of the indicated length (data from Briggs ).
Figure 3. Neandertal/human divergence estimated from sequences of increasing length and score filtering. (A) Sequences in each length bin were used to calculate human/Neandertal divergence, given as the percentage of the human lineage back to the human/chimpanzee common ancestor in which the Neandertal sequences diverged. Sequences were filtered for uniqueness in the human and chimpanzee genomes by comparing the best alignment score to the second best score. Sequences whose best alignment is at least 1 bit better than the second best are shown in red, and those with a difference of 5 bits or more in green. Bars show the 95% confidence interval from 1000 bootstrap replicates of the sequences in each bin. (B) Percentage of the sequences in each bin removed when increasing the alignment score filter from 1 to 5 bits. Shorter sequences are more likely to be removed by stricter filtering as they carry less information to place them uniquely in the human and chimpanzee genomes.
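The bootstrap intervals described in the Figure 3 caption can be illustrated with a short sketch. It assumes each sequence in a length bin is summarised by two hypothetical counts: substitutions on the human lineage that arose after the Neandertal split, and all substitutions on the human lineage since the human/chimpanzee ancestor. It also assumes a percentile bootstrap over sequences. Neither the data layout nor the percentile method is confirmed by the record.

```python
import math
import random


def divergence_percent(counts):
    """Pooled divergence in percent.

    counts: list of (after_split, human_lineage) substitution counts per sequence
    (a hypothetical layout chosen for illustration).
    """
    after_split = sum(a for a, _ in counts)
    human_lineage = sum(t for _, t in counts)
    if human_lineage == 0:
        return float("nan")
    return 100.0 * after_split / human_lineage


def bootstrap_ci(counts, replicates=1000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling whole sequences with replacement."""
    rng = random.Random(seed)
    n = len(counts)
    values = []
    for _ in range(replicates):
        sample = rng.choices(counts, k=n)
        value = divergence_percent(sample)
        if not math.isnan(value):  # skip replicates with no informative positions
            values.append(value)
    if not values:
        raise ValueError("no informative bootstrap replicates")
    values.sort()
    tail = (1.0 - level) / 2.0
    lower = values[int(tail * (len(values) - 1))]
    upper = values[int((1.0 - tail) * (len(values) - 1))]
    return lower, upper


if __name__ == "__main__":
    # Made-up counts for one length bin, purely to show the call pattern.
    bin_counts = [(1, 9), (0, 7), (2, 11), (1, 8), (0, 10), (1, 12)]
    print(divergence_percent(bin_counts), bootstrap_ci(bin_counts))
```

Resampling whole sequences, not individual positions, keeps positions from the same fragment together, which matters if errors cluster within fragments.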
Figure 4. Fraction of human polymorphic positions carrying derived alleles in Neandertal and human DNA sequences. (A) Neandertal sequences of increasing length that overlap human polymorphic positions were assessed for having the derived or ancestral (chimpanzee-like) allele. Blue points are for Neandertal data, red points for the corresponding sequences in the human reference genome (hg18). (B) Sequences of length 60–78 nucleotides were split in half and re-analysed ('30–39'). Derived alleles are preferentially lost when fragment size is reduced.