Ceiridwen J Edwards, David A Magee, Stephen D E Park, Paul A McGettigan, Amanda J Lohan, Alison Murphy, Emma K Finlay, Beth Shapiro, Andrew T Chamberlain, Martin B Richards, Daniel G Bradley, Brendan J Loftus, David E MacHugh.
Abstract
BACKGROUND: The derivation of domestic cattle from the extinct wild aurochs (Bos primigenius) has been well documented by archaeological and genetic studies. Genetic studies point towards the Neolithic Near East as the centre of origin for Bos taurus, with some lines of evidence suggesting possible, albeit rare, genetic contributions from locally domesticated wild aurochsen across Eurasia. Inferences from these investigations have been based largely on the analysis of partial mitochondrial DNA sequences generated from modern animals, with limited sequence data from ancient aurochsen samples. Recent developments in DNA sequencing technologies, however, are affording new opportunities for the examination of genetic material retrieved from extinct species, providing new insight into their evolutionary history. Here we present DNA sequence analysis of the first complete mitochondrial genome (16,338 base pairs) from an archaeologically verified and exceptionally well-preserved aurochs bone sample.
Year: 2010 PMID: 20174668 PMCID: PMC2822870 DOI: 10.1371/journal.pone.0009255
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of Illumina GA sequencing data for the CPC98 aurochs sample.
| Statistic (CPC98 aurochs femur bone) | Value |
| Total number of sequence reads generated from CPC98 | 49,125,583 |
| Total number of partial/complete Illumina GA adaptor sequences detected and excluded from analysis | 13,292,821 |
| Total number of non-adaptor Illumina GA reads generated from CPC98 | 35,832,762 |
| Total number of base pairs (bp) sequenced from CPC98 (excluding Illumina GA adaptor sequences) | 1,289,979,432 bp |
| Total number of non-adaptor sequence reads mapping to the bovine genome (% of total non-adaptor CPC98 reads) | 8,053,754 (22.48%) |
| Total number of base pairs mapping to bovine genome | 289,935,144 bp |
| Total number of non-adaptor reads mapping to the bovine genome and not to human genome (% of total non-adaptor CPC98 reads) | 7,868,524 (21.96%) |
| Total number of base pairs mapping to bovine genome and not to human genome | 283,266,864 bp |
| Total number of sequence reads mapping to the bovine and human genomes (% of total non-adaptor CPC98 reads) | 185,097 (0.52%) |
| Total number of base pairs mapping to bovine and human genomes | 6,663,492 bp |
| Total number of sequence reads mapping to the human genome and not the bovine genome (% of total non-adaptor CPC98 reads) | 48,555 (0.14%) |
| Total number of base pairs mapping to human genome and not the bovine genome | 1,747,980 bp |
| mtDNA haplogroup of CPC98 | P |
| Total number of reads mapping to bovine haplogroup P mtDNA reference sequence DQ124389 (% of total non-adaptor CPC98 reads) | 5,144 (0.06%) |
| Total number of potential duplicate reads mapping to bovine haplogroup P mtDNA sequence | 1,036 |
| Total number of non-duplicate reads mapping to bovine haplogroup P mtDNA sequence (% of total non-adaptor CPC98 reads) | 4,108 (0.05%) |
| Total number of non-duplicate base pairs mapping to bovine haplogroup P mtDNA reference sequence | 147,888 bp |
| Size of Illumina GA-generated CPC98 mtDNA genome (where ≥2× sequencing coverage obtained) | 15,339 bp |
| Mean sequencing depth of Illumina GA-generated CPC98 mtDNA genome | 9.6× |
| Size of Illumina GA and Sanger consensus mtDNA genome | 16,338 bp |
| Mean sequencing depth of combined Illumina GA-generated CPC98 mtDNA genome | 16.9× |
| Number of nucleotide differences between CPC98 and V00654 mtDNA sequences (ti/tv/indels) | 71 (62/7/2) |
| Number of nucleotide differences between CPC98 and DQ124389 mtDNA sequences (ti/tv/indels/undetermined*) | 22 (19/0/2/1) |
*This includes a putative substitution at nucleotide position 15,714 which was called as an ‘N’ in sample DQ124389. ti (transitions); tv (transversions); indels (insertion/deletions).
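Several entries in the summary table are derived from others, so they can be cross-checked with simple arithmetic. The following is a minimal sketch of such a check, using only figures copied from the table. The uniform read length is an assumption inferred from the ratio of base pairs to reads; it is not stated in the table itself, and the variable names are illustrative.

```python
# Values copied from the summary table above.
total_reads = 49_125_583
adaptor_reads = 13_292_821
non_adaptor_reads = 35_832_762
non_adaptor_bp = 1_289_979_432
bovine_reads = 8_053_754
mt_nondup_bp = 147_888      # non-duplicate bp mapping to the haplogroup P reference
mt_covered_bp = 15_339      # mtDNA positions with >= 2x coverage

# Assumption: every read has the same length, inferred here from bp / reads.
read_length, remainder = divmod(non_adaptor_bp, non_adaptor_reads)

def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)

reads_after_filter = total_reads - adaptor_reads
bovine_pct = percent(bovine_reads, non_adaptor_reads)
bovine_bp = bovine_reads * read_length
mean_mt_depth = round(mt_nondup_bp / mt_covered_bp, 1)

print(reads_after_filter, read_length, remainder)
print(bovine_pct, bovine_bp, mean_mt_depth)
```

Under that assumption the recomputed values agree with the tabulated read count after adaptor removal, the 22.48% bovine mapping rate, the 289,935,144 bp mapped to the bovine genome, and the 9.6× mean depth.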
Estimates of contamination with modern bovine DNA sequences.
| Nucleotide position | Macro-haplogroup I (no. of sequences) | Haplogroups Q, R and T (no. of sequences) | Haplogroup P (no. of sequences) | Illumina GA read depth | CPC98 consensus allele | Mismatch among Illumina GA reads | Possible source of discrepancy |
| 301 | 7 | 140 | 2 | 4 x T | T | No | |
| 1,128 | 7 | 140 | 2 | 10 x G | G | No | |
| 2,585 | 7 | 140 | 2 | 5 x C | C | No | |
| 4,293 | 7 | 140 | 2 | 3 x T | T | No | |
| 4,676 | 7 | 140 | 2 | 9 x G | G | No | |
| 5,899 | 7 | 140 | 2 | 8 x G | G | No | |
| 7,952 | 7 | 140 | 2 | 17 x T; 1 x C | T | Yes | Sequencing error/contamination |
| 8,236 | 7 | 140 | 2 | 4 x C | C | No | |
| 8,358 | 7 | 140 | 2 | 8 x T | T | No | |
| 10,126 | 7 | 140 | 2 | 9 x T | T | No | |
| 11,140 | 7 | 140 | 2 | 2 x G | G | No | |
| 12,016 | 7 | 140 | 2 | 1 x C | C | No | |
| 13,821 | 7 | 140 | 2 | 8 x G | G | No | |
| 14,129 | 7 | 140 | 2 | 11 x T | T | No | |
| 14,873 | 7 | 140 | 2 | 6 x A | A | No | |
| 15,673 | 7 | 140 | 2 | 5 x C | C | No | |
The nucleotide position (as per the bovine mtDNA reference sequence, GenBank accession no. V00654) of each of the haplogroup P-diagnostic mtDNA SNPs is given in the left-hand column. The SNP allele identities for the I, R, Q and T haplogroups are shown. The numbers represent the number of times that an allele is observed in each of the bovine macro-haplogroups/haplogroups. The haplogroup P allele for the consensus sequence is provided along with the allele and read depth for each of the individual Illumina GA reads spanning the haplogroup P-diagnostic SNPs.
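One way to read the table is to tally how many individual reads at the haplogroup P-diagnostic positions disagree with the CPC98 consensus allele. The sketch below does this from the read-depth cells as printed. It is an illustration only, not necessarily the authors' calculation: it treats every discordant read as potential contamination, although the table notes that sequencing error is an equally possible source. The helper name `parse_depth` is hypothetical.

```python
import re

# Read-depth cells and consensus alleles copied from the table above,
# keyed by nucleotide position (V00654 numbering).
diagnostic_sites = {
    301: ("4 x T", "T"),
    1128: ("10 x G", "G"),
    2585: ("5 x C", "C"),
    4293: ("3 x T", "T"),
    4676: ("9 x G", "G"),
    5899: ("8 x G", "G"),
    7952: ("17 x T; 1 x C", "T"),
    8236: ("4 x C", "C"),
    8358: ("8 x T", "T"),
    10126: ("9 x T", "T"),
    11140: ("2 x G", "G"),
    12016: ("1 x C", "C"),
    13821: ("8 x G", "G"),
    14129: ("11 x T", "T"),
    14873: ("6 x A", "A"),
    15673: ("5 x C", "C"),
}

def parse_depth(cell):
    """Turn a cell such as '17 x T; 1 x C' into {'T': 17, 'C': 1}."""
    counts = {}
    for count, allele in re.findall(r"(\d+)\s*x\s*([ACGT])", cell):
        counts[allele] = counts.get(allele, 0) + int(count)
    return counts

total = 0
discordant = 0
for position, (cell, consensus) in diagnostic_sites.items():
    counts = parse_depth(cell)
    total += sum(counts.values())
    # Reads carrying any allele other than the consensus count as discordant.
    discordant += sum(n for allele, n in counts.items() if allele != consensus)

discordant_fraction = discordant / total
print(f"{discordant} of {total} reads discordant ({discordant_fraction:.1%})")
```

Because read depth at several positions is low (a single read at position 12,016, for example), a tally like this gives only a coarse upper bound on the discordant fraction.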
Figure 1. The identity and distribution of DNA nucleotide mismatches in individual Illumina GA reads compared to the CPC98 consensus mtDNA genome.
(A) The number and proportion of each nucleotide called in the Illumina GA reads (vertical column) compared to the consensus mtDNA sequence (horizontal column) are presented. (B) Mean percentage of discordant nucleotides for each position across all individual Illumina GA sequence reads.
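The two panels correspond to two simple tallies over reads aligned to the consensus: a consensus-base by read-base count matrix, and a per-read-position mismatch percentage. The following is a minimal sketch on toy data, assuming ungapped reads given as hypothetical `(start, sequence)` pairs with 0-based start coordinates. The toy consensus and reads are invented for illustration and are not data from the study.

```python
from collections import Counter

def mismatch_profile(consensus, aligned_reads):
    """Tally base calls against the consensus.

    aligned_reads is a list of (start, sequence) pairs, ungapped,
    with start being the 0-based position on the consensus.
    """
    pair_counts = Counter()   # (consensus base, read base) -> count
    mismatches = Counter()    # position within read -> mismatching bases
    coverage = Counter()      # position within read -> bases observed
    for start, seq in aligned_reads:
        for offset, read_base in enumerate(seq):
            ref_base = consensus[start + offset]
            pair_counts[(ref_base, read_base)] += 1
            coverage[offset] += 1
            if read_base != ref_base:
                mismatches[offset] += 1
    percent_discordant = {
        pos: 100 * mismatches[pos] / coverage[pos] for pos in sorted(coverage)
    }
    return pair_counts, percent_discordant

# Toy data, invented for illustration.
toy_consensus = "ACGTACGTAC"
toy_reads = [(0, "ACGT"), (2, "GTAT"), (4, "ACGA")]

pairs, profile = mismatch_profile(toy_consensus, toy_reads)
print(pairs[("C", "T")], pairs[("T", "A")])
print(profile)
```

In the toy data both mismatches fall on the last base of a read, so the discordance percentage is zero everywhere except the final read position.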
Figure 2. Location of substitutions between the B. taurus reference and the B. primigenius (CPC98) mtDNA genome sequences and evidence of mtDNA heteroplasmy at nucleotide position 16,121 in the CPC98 aurochs sample.
(A) Location of substitutions between the B. taurus reference and the B. primigenius (CPC98) mtDNA genome sequences. (B) Heteroplasmy detected from analysis of individual Illumina GA reads spanning nucleotide position 16,121. (C) Heteroplasmy at nucleotide position 16,121 detected from analysis of Sanger chromatograms. Nucleotide positions according to the bovine mtDNA reference sequence (GenBank accession no. V00654).
Figure 3. Rooted Neighbor-Joining (N–J) phylogenetic tree detailing the relationships among all available complete bovine haplogroup I, P, Q, R and T mtDNA genome sequences and five yak (B. grunniens) mtDNA genome sequences.
Evolutionary distances were computed using the Maximum Composite Likelihood method and are in units of the number of base substitutions per site. Only coding region sequences of the mtDNA genome were used for tree construction (mtDNA nucleotide positions 364–15,791). Bootstrap values (1000 replicates) are shown next to the branches. The number of mtDNA sequences within each of the haplogroups is indicated. The haplogroup to which the CPC98 mtDNA genome sequence belongs is highlighted. Five complete mtDNA genome sequences from yak (B. grunniens) were used as outgroups.
Figure 4. Location of substitutions distinguishing the complete CPC98 consensus mtDNA genome sequence and the other complete haplogroup P sequence (GenBank accession no. DQ124389).
Nucleotide positions according to the bovine mtDNA reference sequence (GenBank accession no. V00654).
Nucleotide diversity statistics for each of the major Bos mtDNA haplogroups.
| Super-haplogroup/Macro-haplogroup/Haplogroup | No. of mtDNAs | Coding region (nucleotide positions 364–15,791): no. polymorphic sites | Coding region: π | Coding region: σ | Coding region: tv/ti | Whole mtDNA genome: no. polymorphic sites | Whole mtDNA genome: π | Whole mtDNA genome: σ | Whole mtDNA genome: tv/ti |
| Super-haplogroup IRPQT | 149 | 627 | 0.002129 | 0.001033 | 0.07 | 766 | 0.002792 | 0.001347 | 0.07 |
| Super-haplogroup RPQT | 142 | 474 | 0.001058 | 0.000525 | 0.07 | 599 | 0.001567 | 0.000766 | 0.08 |
| Super-haplogroup PQT | 138 | 412 | 0.000793 | 0.000399 | 0.08 | 526 | 0.001214 | 0.000598 | 0.09 |
| Super-haplogroup QT | 136 | 385 | 0.000724 | 0.000366 | 0.07 | 492 | 0.001111 | 0.000549 | 0.08 |
| Macro-haplogroup I | 7 | 32 | 0.000926 | 0.000543 | 0.14 | 47 | 0.001259 | 0.000727 | 0.21 |
| Haplogroup P | 2 | 11 | 0.000713 | 0.000745 | 0.00 | 22 | 0.001346 | 0.001377 | 0.00 |
| Haplogroup R | 4 | 28 | 0.000907 | 0.000619 | 0.00 | 36 | 0.001101 | 0.000744 | 0.03 |
| Haplogroup Q | 6 | 15 | 0.000480 | 0.000302 | 0.00 | 27 | 0.000845 | 0.000512 | 0.15 |
| Macro-haplogroup T | 130 | 357 | 0.000616 | 0.000315 | 0.08 | 460 | 0.000986 | 0.000490 | 0.08 |
Nucleotide diversity estimates (π) and standard deviations (σ), together with the total number of polymorphic sites, are presented for haplogroups I, R, P, Q and T, based on coding-region and complete mtDNA sequences. Transversion-to-transition ratios (tv/ti) within each haplogroup are also given.
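Nucleotide diversity (π) is conventionally the mean number of pairwise differences per site among aligned sequences. The sketch below implements that textbook definition on toy sequences and then checks the haplogroup P coding-region entry, where only two sequences are compared and π reduces to differing sites divided by sites compared. It assumes gap-free alignments and counts any character mismatch as a difference. The software and settings used to produce the table are not stated here, so this is an illustration rather than a reproduction.

```python
from itertools import combinations

def nucleotide_diversity(sequences):
    """Mean pairwise differences per site for equal-length aligned sequences."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return total_diffs / (len(pairs) * length)

# Toy alignment, invented for illustration.
toy_alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
toy_pi = nucleotide_diversity(toy_alignment)

# Haplogroup P, coding region: two sequences, 11 polymorphic sites
# across positions 364-15,791 inclusive (figures from the table above).
coding_sites = 15_791 - 364 + 1
pi_p_coding = 11 / coding_sites

print(round(toy_pi, 4), coding_sites, round(pi_p_coding, 6))
```

The two-sequence check rounds to 0.000713, the coding-region π listed for haplogroup P.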