Lovisa Lovmar, Annika Ahlford, Mats Jonsson, Ann-Christine Syvänen.
Abstract
BACKGROUND: High-throughput genotyping of single nucleotide polymorphisms (SNPs) generates large amounts of data. In many SNP genotyping assays, the genotype assignment is based on scatter plots of signals corresponding to the two SNP alleles. In a robust assay, the three clusters that define the genotypes are well separated and the distances between the data points within a cluster are short. "Silhouettes" is a graphical aid for the interpretation and validation of data clusters that provides a measure of how well a data point was classified when it was assigned to a cluster. Thus "Silhouettes" can potentially be used as a quality measure for SNP genotyping results and for objective comparison of the performance of SNP assays under different conditions.
Year: 2005 PMID: 15760469 PMCID: PMC555759 DOI: 10.1186/1471-2164-6-35
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Principle for Silhouette scores. Principle for quality assessment of genotyping clusters using Silhouette scores, illustrated for one data point (i). The SNP genotypes have been assigned based on cluster formation in scatter plots with the signal intensity fraction on the x-axis and the logarithm of the signals from both alleles on the y-axis. For each data point (i) in the scatter plot, the Silhouette s(i) is calculated by the formula in the figure, where a(i) is the average distance from i to all data points in the same genotype cluster (green lines), and b(i) is the average distance from i to all data points in the cluster closest to the data point, either b1(i) (blue lines) or b2(i) (red lines) [1]. Max and min in the formula denote the largest or smallest of the measures in the brackets. The "average silhouette width" of each genotype cluster is the mean of all s(i) in that cluster, and the "Silhouette score" for the whole scatter plot (SNP assay) is the mean of the average silhouette widths of all clusters.
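The calculation described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes the standard silhouette definition, s(i) = (b(i) - a(i)) / max{a(i), b(i)} with b(i) taken as the smallest of the average distances to the other clusters, which agrees with the caption's description of a(i), b(i), max and min. It also assumes Euclidean distance in the scatter plot, that a(i) excludes the point itself, and that a point alone in its cluster gets s(i) = 0. The function names `silhouettes` and `silhouette_score` are hypothetical.

```python
import math
from statistics import mean


def silhouettes(points, labels):
    """Return s(i) for every (x, y) point, given its assigned genotype label."""
    # Group point indices by genotype cluster.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("at least two genotype clusters are required")

    result = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            result.append(0.0)  # assumption: singleton clusters score 0
            continue
        # a(i): average distance to the other points in the same cluster.
        a = mean(math.dist(points[i], points[j]) for j in own)
        # b(i): smallest average distance to the points of any other cluster.
        b = min(
            mean(math.dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        result.append((b - a) / denom if denom > 0 else 0.0)
    return result


def silhouette_score(points, labels):
    """Mean of the per-cluster average silhouette widths for one SNP assay."""
    widths = {}
    for s, label in zip(silhouettes(points, labels), labels):
        widths.setdefault(label, []).append(s)
    return mean(mean(values) for values in widths.values())
```

Values of s(i) close to 1 indicate a point that sits well inside its own cluster, while negative values indicate a point that is on average closer to a neighbouring cluster than to its own. Note that the score averages per cluster first, so a small cluster weighs as much as a large one.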
Figure 2. Examples of Silhouette scores. Examples of genotype clusters from nine SNP assays, each with the results from 16 samples genotyped in duplicate using Tag-array minisequencing, with the calculated Silhouette scores shown in the upper right-hand corner of each panel. The blue circles represent homozygotes for allele 2, the red triangles are heterozygotes and the green squares are homozygotes for allele 1. The SNPs are denoted by their dbSNP identification number, and the DNA polarities analyzed are indicated by "cod" or "nc".
Figure 3. Distribution of Silhouette scores from minisequencing assays using four DNA polymerases. The Silhouette score is given on the y-axis. Each black diamond represents the Silhouette score for one SNP assay. For each enzyme, the light blue rectangular box indicates the 75% of the scatter plots that yielded the highest Silhouette scores. Quartiles are indicated by the black horizontal lines.
Table 1. Silhouette scores, signal to noise ratios and genotyping performance for four DNA polymerases in Tag-array minisequencing¹
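| Enzyme | Silhouette score², average | Silhouette score², median | Highest Silhouette score, n | Highest Silhouette score, % | S/N³, average | Highest S/N, n | Highest S/N, % | Correct genotype calls⁴, n | Correct genotype calls, % | Erroneous genotype calls, n | Erroneous genotype calls, % |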
| TERMIPol | 0.72 | 0.78 | 20 | 25.3 | 4.3 | 11 | 13.9 | 2337 | 98.9 | 18 | 0.8 |
| Therminator | 0.69 | 0.79 | 15 | 19.0 | 3.6 | 7 | 8.9 | 2323 | 98.3 | 32 | 1.4 |
| KlenThermase | 0.74 | 0.79 | 22 | 27.8 | 8.0 | 21 | 26.6 | 2346 | 99.3 | 10 | 0.4 |
| ThermoSequenase | 0.71 | 0.82 | 22 | 27.8 | 8.9 | 40 | 50.6 | 2324 | 98.3 | 34 | 1.4 |
1 Duplicate experiments, each with duplicate SNP assays in both DNA polarities, were performed and the results are composite values from both experiments.
2 The Silhouette scores were calculated as described in Figure 1. The average and the median score for all SNPs are given for each enzyme together with the number of SNP assays (n) and frequency (%) where an enzyme yielded the highest Silhouette score.
3 Signal to noise ratios (S/N) were calculated for each spot by dividing the fluorescence intensity values from the fluorescently labelled ddNTP or ddNTPs corresponding to the true genotype (signal) by the fluorescence intensity values from the other ddNTPs (noise). The average S/N ratios are given together with the number of SNP assays (n) and frequency (%) where an enzyme yielded the highest S/N.
4 Number of genotype calls (n) and call rate (%). The genotype obtained from the majority of the assays was considered to be the correct one. The samples not accounted for by the percentages in the table failed to give genotypes.
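The definitions in footnotes 3 and 4 can be illustrated with a short Python sketch. It is not the authors' code and rests on assumptions the footnotes leave open: signal and noise are each summarised as the mean intensity over the relevant ddNTPs, the "majority" genotype is taken as the most frequent call among the assays that gave a genotype, and a tie yields no reference genotype. The names `signal_to_noise` and `consensus_genotype`, and the dictionary layout of the intensities, are hypothetical.

```python
import math
from collections import Counter
from statistics import mean


def signal_to_noise(intensities, true_alleles):
    """S/N for one spot.

    intensities: dict mapping nucleotide ("A", "C", "G", "T") to fluorescence.
    true_alleles: set of nucleotides present in the true genotype.
    """
    signal = [v for nt, v in intensities.items() if nt in true_alleles]
    noise = [v for nt, v in intensities.items() if nt not in true_alleles]
    if not signal or not noise:
        raise ValueError("need at least one signal and one noise channel")
    noise_level = mean(noise)  # assumption: noise summarised as a mean
    if noise_level == 0:
        return math.inf
    return mean(signal) / noise_level


def consensus_genotype(calls):
    """Most frequent genotype among the calls; None marks a failed assay."""
    counts = Counter(call for call in calls if call is not None)
    if not counts:
        return None
    ranked = counts.most_common()
    # Assumption: a tie between the two most frequent genotypes is unresolved.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

For example, a heterozygous A/T spot with intensities A = 800, T = 700, C = 40 and G = 60 gives a mean signal of 750 and a mean noise of 50. Calls that disagree with the consensus genotype would then be counted as errors, and `None` entries as failed genotypes.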