Max Bylesjö, Daniel Eriksson, Andreas Sjödin, Michael Sjöström, Stefan Jansson, Henrik Antti, Johan Trygg.
Abstract
BACKGROUND: cDNA microarray technology has emerged as a major player in the parallel detection of biomolecules, but still suffers from fundamental technical problems. Identifying and removing unreliable data is crucial to avoid misleading analysis results. Visual assessment of spot quality is still a common procedure, even though manually inspecting hundreds of thousands of spots or more is time-consuming.
Year: 2005 PMID: 16223442 PMCID: PMC1276784 DOI: 10.1186/1471-2105-6-250
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Flowchart of the classification procedure. The classification process involves an 8-bit image, optimized for segmentation, as well as a 32-bit image, used for information extraction. Visual classification results are required during the training phase but not for external data.
The different sub-classes of bad spots.
| Sub-class | Description |
|---|---|
| not bad | No issue. Contains all spots with no apparent problems according to the classification by the three experienced users. |
| HIFI | High-Intensity Foreground Issue. Typically intensity distribution issues, such as dye debris in the foreground region or donut-shaped spots, with very distinct characteristics. |
| LIFI | Low-Intensity Foreground Issue. Weak intensity distribution issues in the foreground region or morphological issues. |
| HIBI | High-Intensity Background Issue. Typically intensity distribution issues, such as dye debris in the background region, with very distinct characteristics. |
| LIBI | Low-Intensity Background Issue. Weak intensity distribution issues or faint increases in noise level in the background region. |
| HIFI/HIBI | A combination of HIFI and HIBI. |
| HIFI/LIBI | A combination of HIFI and LIBI. |
| LIFI/HIBI | A combination of LIFI and HIBI. |
| LIFI/LIBI | A combination of LIFI and LIBI. |
| HIFI/LIFI | A combination of HIFI and LIFI. |
| HIFI/LIFI/HIBI | A combination of HIFI and LIFI and HIBI. |
Figure 2. Receiver Operating Characteristics (ROC) plot. The relation between true positives (bad spots classified as bad) and false positives (not bad spots classified as bad) for the training and test data. The solid line denotes training data whereas the dashed line denotes test data.
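The true-positive and false-positive definitions in the ROC caption can be illustrated with a minimal Python sketch. It assumes that each spot has a numeric score and is called bad when the score falls below a threshold; the function name `roc_points`, the variable names, and the toy values are hypothetical and are not taken from the paper.

```python
import numpy as np

def roc_points(score, is_bad, thresholds):
    """Return (false_positive_rate, true_positive_rate), one entry per threshold.

    Assumption: a spot is predicted 'bad' when its score is below the threshold.
    """
    score = np.asarray(score, dtype=float)
    is_bad = np.asarray(is_bad, dtype=bool)
    fpr, tpr = [], []
    for t in thresholds:
        predicted_bad = score < t
        # True positives: bad spots classified as bad.
        tp = np.sum(predicted_bad & is_bad)
        # False positives: not bad spots classified as bad.
        fp = np.sum(predicted_bad & ~is_bad)
        tpr.append(tp / is_bad.sum())
        fpr.append(fp / (~is_bad).sum())
    return np.array(fpr), np.array(tpr)

# Synthetic toy data (not from the paper): three bad and three not bad spots.
score = [0.05, 0.20, 0.45, 0.60, 0.80, 0.95]
is_bad = [True, True, False, True, False, False]
fpr, tpr = roc_points(score, is_bad, thresholds=[0.0, 0.5, 1.01])
```

Plotting `tpr` against `fpr` over a fine grid of thresholds would give a curve of the kind described in the caption.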
Figure 3. Density plot of the predicted class conformity of the not bad class. A class conformity value of 1 signifies perfect class conformity while a value of 0 signifies no class conformity. The dashed line illustrates the density for the prediction of the bad spots in the POP2 training set whereas the solid line illustrates the density of the prediction of the not bad spots in the POP2 training set.
Figure 4. Relationship between classification accuracy and threshold value for the POP2 data. The threshold value t defines the boundary between bad and not bad spots for the POP2 training set (38 627 spots) and the POP2 test set (39 421 spots). Spots with a predicted class conformity value for the not bad class (CCnb) below the threshold value t are classified as bad while the remaining spots are classified as not bad. a) Overall classification accuracy vs. threshold value calculated as the fraction of correctly classified spots in the data set for a given threshold value. The solid line represents the POP2 training set whereas the dashed line represents the POP2 test set. The dotted vertical line at threshold value t = 0.4 illustrates an approximate maximum. b) Classification accuracy of the bad and not bad spots vs. threshold value. For the POP2 training set, the solid line represents the classification accuracy of the not bad spots and the dashed line represents the classification accuracy of the bad spots. For the POP2 test set, the dot-dashed line represents the classification accuracy of the not bad spots and the long-dashed line represents the classification accuracy of the bad spots. The dotted vertical line at threshold value t = 0.5 denotes the intersection point.
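The threshold rule in the caption (CCnb below t means bad) and the three accuracies plotted in panels a) and b) can be sketched as follows. This is an illustrative reading of the caption, not the authors' code; the function names and the toy CCnb values are hypothetical.

```python
import numpy as np

def classify_spots(cc_nb, t):
    """Return True for spots classified as bad, i.e. CCnb < t."""
    return np.asarray(cc_nb, dtype=float) < t

def accuracies(cc_nb, is_bad, t):
    """Return (overall, bad-class, not-bad-class) accuracy at threshold t."""
    is_bad = np.asarray(is_bad, dtype=bool)
    predicted_bad = classify_spots(cc_nb, t)
    # Panel a): fraction of all spots whose predicted class matches the visual class.
    overall = np.mean(predicted_bad == is_bad)
    # Panel b): accuracy within each class separately.
    acc_bad = np.mean(predicted_bad[is_bad])
    acc_not_bad = np.mean(~predicted_bad[~is_bad])
    return overall, acc_bad, acc_not_bad

# Synthetic toy data (not from the paper): three bad and five not bad spots.
cc_nb = [0.05, 0.30, 0.45, 0.55, 0.70, 0.90, 0.95, 0.35]
is_bad = [True, True, True, False, False, False, False, False]

for t in (0.4, 0.5):
    overall, acc_bad, acc_not_bad = accuracies(cc_nb, is_bad, t)
    print(f"t={t}: overall={overall:.3f}, bad={acc_bad:.3f}, not bad={acc_not_bad:.3f}")
```

Raising t classifies more spots as bad, so the bad-class accuracy can only rise and the not-bad-class accuracy can only fall. That opposing movement is why the two per-class curves in panel b) cross.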
Classification accuracy of the POP2 training data. The classification accuracy for each sub-class as calculated using threshold value t = 0.5.
| Sub-class | Number of spots | Classification accuracy (%) |
|---|---|---|
| not bad | 35983 | 94.7 |
| HIFI | 942 | 98.7 |
| LIFI | 76 | 86.8 |
| HIBI | 987 | 96.5 |
| LIBI | 284 | 85.9 |
| HIFI/HIBI | 81 | 97.5 |
| HIFI/LIBI | 69 | 98.6 |
| LIFI/HIBI | 66 | 98.5 |
| LIFI/LIBI | 44 | 77.3 |
| HIFI/LIFI | 29 | 89.7 |
| HIFI/LIFI/HIBI | 62 | 100.0 |
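The per-sub-class figures in the table above can be pooled into a single accuracy by weighting each sub-class by its spot count. The sketch below copies the counts and percentages from the table; the helper name `pooled_accuracy` is hypothetical. Because the tabulated percentages are rounded to one decimal, the pooled values are approximations.

```python
# (sub-class, number of spots, classification accuracy in %), copied from the table.
rows = [
    ("not bad", 35983, 94.7),
    ("HIFI", 942, 98.7),
    ("LIFI", 76, 86.8),
    ("HIBI", 987, 96.5),
    ("LIBI", 284, 85.9),
    ("HIFI/HIBI", 81, 97.5),
    ("HIFI/LIBI", 69, 98.6),
    ("LIFI/HIBI", 66, 98.5),
    ("LIFI/LIBI", 44, 77.3),
    ("HIFI/LIFI", 29, 89.7),
    ("HIFI/LIFI/HIBI", 62, 100.0),
]

def pooled_accuracy(rows):
    """Spot-count-weighted mean accuracy, returned as a fraction in [0, 1]."""
    total = sum(n for _, n, _ in rows)
    correct = sum(n * acc / 100.0 for _, n, acc in rows)
    return correct / total

# Pool over all spots, and over the bad sub-classes only.
overall = pooled_accuracy(rows)
bad_only = pooled_accuracy([r for r in rows if r[0] != "not bad"])
print(f"overall: {overall:.3f}, bad sub-classes pooled: {bad_only:.3f}")
```

Weighting by spot count matters here because the not bad class is far larger than any bad sub-class, so it dominates the pooled figure.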
Comparison to other quality control methods. The presented quality control parameter CCnb was compared to the composite quality score qcom, the mean-median correlation factor mmcorr and the CVspot value. Threshold values for all quality control parameters were set to maximize overall classification accuracy. The classification accuracy was determined from classification of the POP2 test set.
| Quality control parameter | Threshold value | Classification accuracy |
|---|---|---|
| CCnb | 0.40 | 98.1% |
| qcom | 0.32 | 94.5% |
| mmcorr | 0.65 | 94.3% |
| CVspot | 1.05 | 95.0% |