Mark H Wright, Chih-Wei Tung, Keyan Zhao, Andy Reynolds, Susan R McCouch, Carlos D Bustamante.
Abstract
MOTIVATION: The development of new high-throughput genotyping products requires a significant investment in testing and training samples to evaluate and optimize the product before it can be used reliably on new samples. One reason is that current methods for automated genotype calling are based on clustering approaches, which require either a large number of samples to be analyzed simultaneously or an extensive training dataset to seed clusters. In systems where inbred samples are of primary interest, current clustering approaches perform poorly because a heterozygote cluster cannot be clearly identified.
Year: 2010 PMID: 20926420 PMCID: PMC2982150 DOI: 10.1093/bioinformatics/btq533
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Density plot of log intensities across all A allele probes for one sample (black solid line) and fit of Gaussian mixture distribution (gray dashed line).
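
The kind of fit shown in Fig. 1 can be illustrated with a small sketch. It fits a two-component Gaussian mixture to one-dimensional log intensities by expectation-maximization. This excerpt does not state how many components the authors used or how they fitted the mixture, so the two-component choice, the quartile-based initialization, the function names, and the simulated data are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, iterations=100):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative only).

    Returns (weights, means, sds), each a list of length 2.
    """
    n = len(values)
    ordered = sorted(values)
    # Assumed initialization: lower and upper quartiles as starting means.
    means = [ordered[n // 4], ordered[(3 * n) // 4]]
    overall_mean = sum(values) / n
    overall_sd = math.sqrt(sum((v - overall_mean) ** 2 for v in values) / n)
    sds = [overall_sd, overall_sd]
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: posterior probability that each value belongs to each component.
        resp = []
        for x in values:
            p0 = weights[0] * normal_pdf(x, means[0], sds[0])
            p1 = weights[1] * normal_pdf(x, means[1], sds[1])
            total = p0 + p1
            resp.append((0.5, 0.5) if total == 0.0 else (p0 / total, p1 / total))

        # M-step: re-estimate weight, mean, and sd of each component.
        for k in range(2):
            rk = sum(r[k] for r in resp)
            if rk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = rk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / rk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / rk
            sds[k] = max(math.sqrt(var), 1e-6)  # guard against a collapsed component

    return weights, means, sds


# Simulated log intensities (hypothetical): a low-signal and a high-signal group.
random.seed(0)
log_intensities = [random.gauss(6.0, 0.5) for _ in range(600)]
log_intensities += [random.gauss(10.0, 0.7) for _ in range(400)]

weights, means, sds = fit_two_gaussians(log_intensities)
print("weights:", weights)
print("means:  ", means)
print("sds:    ", sds)
```

On well-separated simulated data such as this, the fitted means should land close to the two simulated group centres; real probe intensities may need more components or a more careful initialization.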
Comparison of reference lines to published genome sequence

Illumina 1536 SNP GoldenGate OPA

| Line | # | ALCHEMY agreement (%) | ALCHEMY call rate (%) | BeadStudio agreement (%) | BeadStudio call rate (%) |
|---|---|---|---|---|---|
| Nipponbare | 7 | 99.6 | 99.0 | 96.4 | 99.6 |
| 9311 | 7 | 95.6 | 98.0 | 93.3 | 98.4 |
| NPx9311 F1 | 6 | 93.6 | 96.7 | 91.7 | 99.8 |
| Average | | 96.4 | 98.0 | 93.9 | 99.2 |

Affymetrix 44K GeneChip

| Line | # | ALCHEMY agreement (%) | ALCHEMY call rate (%) | BRLMM-P agreement (%) | BRLMM-P call rate (%) |
|---|---|---|---|---|---|
| Nipponbare | 7 | 99.1 | 97.6 | 93.2 | 87.1 |
| 9311 | 5 | 96.5 | 96.4 | 89.1 | 87.1 |
| NPx9311 F1 | 6 | 94.7 | 92.7 | 90.3 | 84.5 |
| Average | | 96.9 | 95.6 | 91.1 | 86.2 |

(a) Numbers reported are averages across replicate samples.
(b) Percentage of genotype calls which agree with published sequence presuming homozygosity.
(c) The 9311 line genotyped in this study was obtained from a different source than the sequenced line (see text).
(d) Genotypes predicted from parental genome sequence assuming normal Mendelian transmission and presuming homozygosity of the parents.
Pairwise concordance for replicate samples

Illumina 1536 SNP GoldenGate OPA

| Line | # pairs | ALCHEMY concordance (%) | ALCHEMY call rate (%) | BeadStudio concordance (%) | BeadStudio call rate (%) |
|---|---|---|---|---|---|
| Nipponbare | 21 | 100.0 | 98.4 | 98.5 | 99.3 |
| 9311 | 21 | 99.5 | 94.6 | 97.0 | 95.1 |
| NB+9311-GL | 15 | 99.5 | 93.7 | 98.4 | 99.4 |
| All others | 4 | 98.2 | 90.5 | 95.6 | 94.1 |
| Average | | 99.6 | 95.4 | 97.8 | 97.5 |

Affymetrix 44K GeneChip

| Line | # pairs | ALCHEMY concordance (%) | ALCHEMY call rate (%) | BRLMM-P concordance (%) | BRLMM-P call rate (%) |
|---|---|---|---|---|---|
| Nipponbare | 21 | 99.6 | 95.7 | 98.1 | 80.1 |
| 9311 | 10 | 99.6 | 92.6 | 98.0 | 79.1 |
| NB+9311-GL | 15 | 99.4 | 88.6 | 98.5 | 75.0 |
| All others | 19 | 99.4 | 92.2 | 97.6 | 77.5 |
| Average | | 99.5 | 92.6 | 98.0 | 78.0 |

(a) Call rate in this table refers to the percentage of SNPs called in both samples of a replicate pair. Individual sample call rates are higher.
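
The pairwise statistics in the table above can be sketched as follows. Call rate follows the footnote: the fraction of SNPs called in both samples of a pair. Concordance is assumed here to mean the fraction of those jointly called SNPs that receive identical genotypes, since this excerpt does not define it explicitly. The genotype encoding, the function name `pair_stats`, and the toy replicate data are hypothetical.

```python
from itertools import combinations

NO_CALL = None  # assumed marker for a SNP that was not called


def pair_stats(calls_a, calls_b):
    """Return (concordance, call_rate) for two replicate samples.

    call_rate   : fraction of SNPs called in both samples
    concordance : fraction of jointly called SNPs with identical genotypes
                  (assumed definition)
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("replicates must cover the same SNPs")
    both = [(a, b) for a, b in zip(calls_a, calls_b)
            if a is not NO_CALL and b is not NO_CALL]
    call_rate = len(both) / len(calls_a)
    if not both:
        return float("nan"), call_rate
    concordance = sum(a == b for a, b in both) / len(both)
    return concordance, call_rate


# Toy replicate genotypes for one line (hypothetical data).
replicates = {
    "rep1": ["AA", "BB", "AA", None, "AB"],
    "rep2": ["AA", "BB", "AB", "AA", "AB"],
    "rep3": ["AA", None, "AA", "AA", "AB"],
}

# Every unordered pair of replicates is compared once.
results = {
    (x, y): pair_stats(replicates[x], replicates[y])
    for x, y in combinations(sorted(replicates), 2)
}

for pair, (concordance, call_rate) in results.items():
    print(pair, f"concordance={concordance:.2f}", f"call rate={call_rate:.2f}")
```

Comparing every unordered pair means that n replicates give n(n-1)/2 pairs, so seven replicates give 21 pairs, which matches the pair count listed for Nipponbare.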
ALCHEMY versus BRLMM-P on 270 Human HapMap Phase II samples
| | ALCHEMY (%) | BRLMM-P (%) |
|---|---|---|
| Accuracy | 99.78 | 99.82 |
| Call rate | 98.82 | 99.19 |
Accuracy refers to the agreement between genotype calls for the respective algorithm and HapMap Phase II published genotypes.
ALCHEMY versus BRLMM-P on single samples and small sample subsets
| # of samples | ALCHEMY accuracy (%) | ALCHEMY call rate (%) | BRLMM-P accuracy (%) | BRLMM-P call rate (%) |
|---|---|---|---|---|
| Nipponbare alone | 99.2 | 94.9 | 83.2 | 98.2 |
| 9311 alone | 98.8 | 96.1 | 80.1 | 98.4 |
| NPx9311 (F1) alone | 69.0 | 99.2 | 89.8 | 98.6 |
| 3 (full trio) | 97.1 | 97.9 | 87.6 | 97.8 |
| 6 | 97.7 | 97.4 | 87.7 | 96.8 |
| 9 | 98.6 | 97.2 | 87.7 | 96.0 |
| 12 | 98.8 | 97.3 | 88.1 | 95.3 |
| 18 | 99.2 | 97.4 | 88.7 | 94.4 |
| 24 | 99.3 | 97.5 | 89.1 | 93.4 |
| 48 | 99.5 | 97.5 | 90.1 | 91.0 |
| 72 | 99.6 | 97.6 | 91.6 | 90.2 |
Fig. 2. Effect of increasing the number of samples that are analyzed simultaneously for ALCHEMY and BRLMM-P (Affymetrix 44K).
Fig. 3. Trade-off between accuracy and completeness of the dataset generated by varying the threshold at which genotypes with lower posterior probabilities are declared ‘no call’ and dropped from the final dataset. Note the limited range of the y-axis.
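
The trade-off described in the Fig. 3 caption can be illustrated with a minimal sketch: genotypes whose posterior probability falls below a threshold become no calls, and accuracy and call rate are recomputed at each threshold. The data layout, the function names, the thresholds, and the toy values are assumptions for illustration and are not taken from the paper.

```python
def apply_threshold(calls, threshold):
    """Keep a genotype only if its posterior probability meets the threshold.

    calls: list of (genotype, posterior_probability) tuples (assumed layout).
    Returns a list of genotypes, with None marking a no call.
    """
    return [genotype if posterior >= threshold else None
            for genotype, posterior in calls]


def accuracy_and_call_rate(called, truth):
    """Accuracy among called genotypes, and the fraction of genotypes called."""
    kept = [(c, t) for c, t in zip(called, truth) if c is not None]
    call_rate = len(kept) / len(called)
    if not kept:
        return float("nan"), call_rate
    accuracy = sum(c == t for c, t in kept) / len(kept)
    return accuracy, call_rate


# Toy calls with posterior probabilities and reference genotypes (hypothetical).
calls = [("AA", 0.999), ("BB", 0.97), ("AB", 0.60), ("AA", 0.92), ("BB", 0.55)]
truth = ["AA", "BB", "AA", "AA", "AA"]

# Sweep the threshold: a stricter threshold drops more low-confidence calls.
tradeoff = {}
for threshold in (0.5, 0.9, 0.95):
    called = apply_threshold(calls, threshold)
    tradeoff[threshold] = accuracy_and_call_rate(called, truth)

for threshold, (accuracy, call_rate) in tradeoff.items():
    print(f"threshold={threshold}: accuracy={accuracy:.2f}, call rate={call_rate:.2f}")
```

In this toy data, raising the threshold from 0.5 to 0.9 removes the two incorrect low-confidence calls, so accuracy rises while the call rate falls.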