Sarah M C Flynn, Steven M Carr.
Abstract
BACKGROUND: Iterative DNA "resequencing" on oligonucleotide microarrays offers a high-throughput method to measure intraspecific biodiversity, one that is especially suited to SNP-dense gene regions such as vertebrate mitochondrial (mtDNA) genomes. However, costs of single-species design and microarray fabrication are prohibitive. A cost-effective, multi-species strategy is to hybridize experimental DNAs from diverse species to a common microarray that is tiled with oligonucleotide sets from multiple, homologous reference genomes. Such a strategy requires that cross-hybridization between the experimental DNAs and reference oligos from the different species not interfere with the accurate recovery of species-specific data. To determine the pattern and limits of such interspecific hybridization, we compared the efficiency of sequence recovery and accuracy of SNP identification by a 15,452-base human-specific microarray challenged with human, chimpanzee, gorilla, and codfish mtDNA genomes.
Year: 2007 PMID: 17894875 PMCID: PMC2211321 DOI: 10.1186/1471-2164-8-339
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. SNP density per 100 bp between the tiled human mtDNA sequence and the chimpanzee and gorilla mtDNA genomes, as identified by dideoxy DNA sequencing. SNP densities were calculated in a sliding window starting at Position 51 of the tiled sequence.
Table 1. Percentage of 25-bp regions in the chimpanzee and gorilla mtDNA genomes that contain a given number of SNPs with respect to the tiled human mtDNA reference sequence on the MitoChip microarray
| # SNPs | Chimpanzee (%) | Gorilla (%) |
| 0 | 17.19 | 12.60 |
| 1 | 24.09 | 17.64 |
| 2 | 24.34 | 23.82 |
| 3 | 18.19 | 21.02 |
| 4 | 9.77 | 12.78 |
| 5 | 4.10 | 7.00 |
| 6 | 1.40 | 3.21 |
| 7 | 0.57 | 1.33 |
| 8 | 0.22 | 0.35 |
| 9 | 0.10 | 0.10 |
| 10 | 0.03 | 0.06 |
| 11 | 0.01 | 0.06 |
| 12 | 0.00 | 0.05 |
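A distribution of this kind can be recomputed from a list of SNP positions. The following is a minimal sketch, assuming that windows slide one base at a time along a linear sequence (the source does not state whether the 25-bp regions overlap). The function name and the 0-based coordinates are illustrative and are not taken from the paper.

```python
from collections import Counter


def snp_window_distribution(snp_positions, seq_length, window=25):
    """Return {k: percentage of windows that contain exactly k SNPs}.

    snp_positions: 0-based positions that differ from the reference.
    Assumption: windows overlap and advance by one base per step.
    """
    if window > seq_length:
        raise ValueError("window is longer than the sequence")
    snps = set(snp_positions)
    is_snp = [1 if i in snps else 0 for i in range(seq_length)]
    n_windows = seq_length - window + 1

    # Count SNPs in the first window, then update incrementally.
    count = sum(is_snp[:window])
    counts = Counter([count])
    for start in range(1, n_windows):
        count += is_snp[start + window - 1] - is_snp[start - 1]
        counts[count] += 1

    return {k: 100.0 * v / n_windows for k, v in sorted(counts.items())}


# Toy example: one SNP at position 2 in a 10-base sequence, 5-base windows.
print(snp_window_distribution([2], seq_length=10, window=5))
```

Setting `window=100` gives the per-100-bp density used for the sliding-window plot, under the same overlap assumption.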
Table 2. Efficiency, accuracy, and error rates of microarray resequencing
| Correct | Incorrect | |
| Microarray: | Missed SNP + | |
| SNP | at SNP site | |
| Microarray: | Miscalled SNP + | |
| no SNP | at non-SNP site |
Efficiency is the proportion of SNP and non-SNP sites identified correctly as compared with the canonical dideoxy sequence, at either high or low confidence. Accuracy is the proportion of SNPs identified correctly. Correct and erroneous calls may be made at either high or low confidence, differentiated by dS/N = 0.20 (see Methods and Figure 2). The table is equivalent to a conventional 2 × 2 contingency table, with the categories on the second line transposed. This arrangement emphasizes the column totals of correct versus incorrect calls, and the assignment of low-confidence incorrect calls as Ns, in Tables 3, 4, and 5.
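The two definitions can be written out as a short calculation. This is a sketch under stated assumptions: the four cell names are hypothetical labels for the 2 × 2 table, and accuracy is taken here with all true SNP sites (per the dideoxy sequence) as the denominator, which is one reading of "the proportion of SNPs identified correctly". The counts in the example are invented for illustration and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class CallCounts:
    """Cells of the 2 x 2 table, judged against the dideoxy sequence."""
    correct_snp: int       # SNP site, microarray calls the SNP correctly
    missed_snp: int        # SNP site, microarray call is incorrect
    correct_non_snp: int   # constant site, microarray agrees
    miscalled_snp: int     # constant site, microarray call is incorrect

    @property
    def total(self) -> int:
        return (self.correct_snp + self.missed_snp
                + self.correct_non_snp + self.miscalled_snp)


def efficiency(c: CallCounts) -> float:
    """Proportion of all sites (SNP and non-SNP) identified correctly."""
    return (c.correct_snp + c.correct_non_snp) / c.total


def accuracy(c: CallCounts) -> float:
    """Proportion of true SNP sites identified correctly (assumed denominator)."""
    return c.correct_snp / (c.correct_snp + c.missed_snp)


# Invented counts, for illustration only.
example = CallCounts(correct_snp=90, missed_snp=10,
                     correct_non_snp=880, miscalled_snp=20)
print(f"efficiency = {efficiency(example):.3f}, accuracy = {accuracy(example):.3f}")
```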
Figure 2. Number of errors at various dS/N cutoffs. The number of errors is the number of incorrect SNP identifications in chimpanzee (diamonds) and gorilla (squares).
Table 3. Efficiency, accuracy, and error of microarray resequencing of the human mtDNA genome (see Table 2 for definitions)
| Human | Correct | Incorrect |
| Microarray: | ||
| SNP | ||
| Microarray: | ||
| no SNP | 0 + | |
| = 15451 (> 99.99 %) | + 1 (< 0.01 %) |
Table 4. (a) Efficiency, accuracy, and error of microarray resequencing and (b) SNP density in the intervals ± 12 bp surrounding correct and incorrect calls of SNP and constant sites in chimpanzee mtDNA (see Table 2 for definitions)
| (a) | Correct | Incorrect |
| Microarray: | 498 + | |
| SNP | ||
| Microarray: | ||
| no SNP | ||
| = 13145 (85.07%) | + 2276 (14.73%) | |
| (b) | Correct | Incorrect |
| Microarray: | ||
| SNP | ||
| Microarray: | ||
| no SNP | ||
Table 5. (a) Efficiency, accuracy, and error of microarray resequencing and (b) SNP density in the intervals ± 12 bp surrounding correct and incorrect calls of SNP and constant sites in gorilla mtDNA (see Table 2 for definitions)
| (a) | Correct | Incorrect |
| Microarray: | ||
| SNP | ||
| Microarray: | ||
| no SNP | ||
| = 13668 (88.45%) | + 1476 (9.55%) | |
| (b) | Correct | Incorrect |
| Microarray: | ||
| SNP | 25.80% | 31.29% |
| Microarray: | ||
| no SNP | 21.20% | 32.34% |
Table 6. Accuracy and error rate of microarray resequencing for 6299 bases called with high confidence in both chimpanzee and gorilla
| Chimpanzee | Correct | Incorrect |
| Microarray: | ||
| SNP | ||
| Microarray: | ||
| no SNP | ||
| Gorilla | Correct | Incorrect |
| Microarray: | ||
| SNP | ||
| Microarray: | ||
| no SNP | ||
Figure 3. SNP density versus mismatch density per 25 bp in chimpanzee and gorilla mtDNA genomes. Bubbles are proportional to the number of events at each point.
Figure 4. Experimental DNA binding of human and Atlantic Cod (Gadus morhua) mtDNA hybridized to a human-mtDNA-specific resequencing microarray.
Figure 5. Phylogenetic relationships within and among Gorilla, Pan, and Homo, based on mitochondrial DNA genome sequences (without D-loops). The single minimum-length tree had a length of 2828. All nodes are supported in 100% of 10,000 bootstrap replications. Sequences marked (") are from the present paper. The unmarked sequences are from GenBank (Gorilla [NC_001645], Pan troglodytes [NC_001643], P. paniscus [NC_001644], and Homo [revised Cambridge Reference Sequence (rCRS): J01415.1]). The Homo sequence marked (') is from the individual (GenBank AF347008) identified in ref (12) as most divergent from the rCRS.
Figure 6. High-confidence error rate (E: squares) and SNP detection rate (circles) versus pairwise sequence divergence (D) for human, chimpanzee, and gorilla mtDNA genomes. The equation of the trend line is log(E) = (19.6)(D) – 3.9.
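The trend line can be evaluated directly. The sketch below assumes that the logarithm is base 10 and that D is expressed as a proportion (for example 0.05 rather than 5); the caption states neither, so both are exposed as explicit assumptions. The function names are illustrative.

```python
import math

SLOPE = 19.6      # coefficient of D in the caption's trend line
INTERCEPT = -3.9  # constant term in the caption's trend line


def predicted_error_rate(divergence, base=10.0):
    """Error rate E from log(E) = 19.6 * D - 3.9.

    Assumption: the logarithm is base 10 unless another base is passed.
    """
    return base ** (SLOPE * divergence + INTERCEPT)


def divergence_for_error_rate(error_rate, base=10.0):
    """Invert the trend line: the divergence D at which E is predicted."""
    return (math.log(error_rate, base) - INTERCEPT) / SLOPE


for d in (0.0, 0.05, 0.10):
    print(f"D = {d:.2f} -> predicted E = {predicted_error_rate(d):.2e}")
```

Because the relation is log-linear, each fixed increment in D multiplies the predicted error rate by a constant factor, which is why extrapolation far beyond the three genomes plotted should be treated with caution.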
Table 7. Sequence and positions of primers used to amplify and/or sequence the mtDNA genomes of chimpanzee and gorilla
| h01 | F: CTCCTCAAAGCAATACACTG | x | 1 |
| h02 | F: CGATCAACCTCACCACCTCT | | 635 |
| h03 | F: GACTAACCCCTATACCTTCTGC | | 1240 |
| h04 | F: AAATCTTACCCCGCCTGTTT | | 1889 |
| h05 | F: TACTTCACAAAGCGCCTTCC | | 2558 |
| h06 | F: TGGCTCCTTTAACCTCTCCA | x | 3185 |
| h07 | F: ACTAATTAATCCCCTGGCCC | | 3874 |
| h08 | F: CTAACCGGCTTTTTGCCC | | 4646 |
| h09 | F: GAGGCCTAACCCCTGTCTTT | x | 5243 |
| h10 | F: CTCTTCGTCTGATCCGTCCT | x | 5858 |
| h11 | F: ACGCCAAAATCCATTTCACT | | 6537 |
| h12 | F: ACGAGTACACCGACTACGGC | | 7316 |
| h13 | F: TTTCCCCCTCTATTGATCCC | x | 8010 |
| h14 | F: CCCACCAATCACATGCCTAT | x | 8619 |
| h15 | F: TCTCCATCTATTGATGAGGGTCT | | 9375 |
| h16 | F: GCCATACTAGTCTTTGCCGC | | 11061 |
| h17 | F: TCACTCTCACTGCCCAAGAA | x | 10703 |
| h18 | F: TATCACTCTCCTACTTACAG | x | 11337 |
| h19 | F: AAACAACCCAGCTCTCCCTAA | x | 11959 |
| h20 | F: ACATCTGTACCCACGCCTTC | x | 12727 |
| h21 | F: GCATAATTAAACTTTACTTC | | 13489 |
| h22 | F: TGAAACTTCGGCTCACTCCT | x | 14245 |
| h23 | F: TCATTGGACAAGTAGCATCC | x | 15200 |
| Gg11 | F: CCCACACAGTTTATGTAGCTTACCTC | x | 6612 |
| Gg12 | R: GAATATTAGCTTTGGGTGCTGATGGTGG | x | 8093 |
| Gg18 | F: CTATCCCTCAACCCCGATATTACT | x | 11520 |
| Gg20 | F: CCTTACTTCAACCTCCCTAGCCATTG | | 12859 |
Table 8. Properties of PCR amplicons used for microarray resequencing of chimpanzee and gorilla: size, required mass and observed concentration, and volume added to pool
| Chimpanzee PCR Amplicons | Size (bp) | Mass (ng) | Conc. (ng/µL) | Volume (µL) |
| h01F, h10R | 6736 | 244.5 | 19.8 | 12.35 |
| h09F, h13R | 3582 | 130.0 | 11.1 | 11.71 |
| h14F, h17R | 2885 | 104.7 | 5 | 20.95 |
| h18F, h18R | 866 | 31.4 | 9.4 | 3.34 |
| h19F, h19R | 976 | 35.4 | 6.9 | 5.13 |
| h20F, h22R | 2679 | 97.2 | 12.5 | 7.78 |
| h23F, h23R | 807 | 29.3 | 11.1 | 2.64 |
| Gorilla PCR Amplicons | Size (bp) | Mass (ng) | Conc. (ng/µL) | Volume (µL) |
| h01F, h06R | 4082 | 148.2 | 19.4 | 7.64 |
| h06F, h09R | 2888 | 104.8 | 15.1 | 6.94 |
| h10F, h10R | 886 | 32.2 | 14.1 | 2.28 |
| Gg11F, Gg12R | 1420 | 51.5 | 14.7 | 3.51 |
| h13F, h13R | 815 | 29.6 | 20.6 | 1.44 |
| h14F, h17R | 2885 | 104.7 | 22.8 | 4.59 |
| Gg18F, Gg18R | 590 | 21.4 | 25.4 | 0.84 |
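The pooling arithmetic in this table can be checked with a few lines of code. The sketch below treats the volume as required mass divided by observed concentration, with concentration read as nanograms per microlitre, which is the reading under which the tabulated volumes are reproduced. The constant of 0.0363 ng per bp is an assumption inferred from the mass-to-size ratio of the tabulated rows, not a value stated in the source, and the function names are illustrative.

```python
# Assumption: required mass scales with amplicon size at about 0.0363 ng/bp,
# a ratio inferred from the tabulated rows rather than stated in the source.
NG_PER_BP = 0.0363


def required_mass_ng(size_bp, ng_per_bp=NG_PER_BP):
    """Mass needed so that each amplicon contributes in proportion to its length."""
    return size_bp * ng_per_bp


def pool_volume_ul(mass_ng, conc_ng_per_ul):
    """Volume to pipette into the pool: mass (ng) / concentration (ng per uL)."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return mass_ng / conc_ng_per_ul


# Three chimpanzee rows from the table: (amplicon, size bp, mass ng, conc).
rows = [
    ("h01F, h10R", 6736, 244.5, 19.8),
    ("h09F, h13R", 3582, 130.0, 11.1),
    ("h23F, h23R", 807, 29.3, 11.1),
]
for name, size_bp, mass_ng, conc in rows:
    print(f"{name}: mass ~ {required_mass_ng(size_bp):.1f} ng, "
          f"volume = {pool_volume_ul(mass_ng, conc):.2f} uL")
```

Scaling mass with length keeps the amplicons at roughly equal molar amounts in the pool, so that short fragments are not over-represented on the array.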