Naoto Yukinawa, Shigeyuki Oba, Kikuya Kato, Kazuya Taniguchi, Kyoko Iwao-Koizumi, Yasuhiro Tamaki, Shinzaburo Noguchi, Shin Ishii.
Abstract
BACKGROUND: Although microscopic diagnosis has played the decisive role in cancer diagnostics, there are cases in which it does not satisfy the clinical need. Differential diagnosis of malignant and benign thyroid tissues is one such case, and a supplementary diagnosis, such as one based on gene expression profiles, is desirable.
Year: 2006 PMID: 16872506 PMCID: PMC1550728 DOI: 10.1186/1471-2164-7-190
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Correlation of global gene expression profiles with differences in thyroid tissue type. a) Correlation ratio between follicular adenoma (FA) and normal thyroid (N). The red line shows the correlation ratio of the original data; the blue lines show those of the permuted data, one for each of twelve permutation trials. b) Correlation ratios for various combinations of the four thyroid tissues.
Figure 2. Accuracy curves of binary classifiers differentiating two of the four thyroid tissues. Vertical axis: accuracy; horizontal axis: selection criterion for diagnostic genes, expressed as a p-value (t-statistic).
Figure 3. Combinations of unit binary classifiers used for construction of a multi-class predictor in this study. [l|m] represents a unit binary classifier separating class(es) l and class(es) m.
Figure 4. a) The probabilistic decision process of the multi-class predictor, for an example classification problem with four classes: 1, 2, 3 and 4. First, one of the binary classifiers in a set B is selected with uniform probability 1/#B, where #B is the number of binary classifiers in B. Second, taking the classifier [1|23] as an example, class subset {1} or {2,3} is selected with probability q[1|23](x) or 1 - q[1|23](x), respectively. Third, if subset {2,3} was selected, class 2 or 3 is selected with a probability of 1/2 each. Accordingly, one of the classes is selected with a certain probability. b) Calculation of class probability by SHS. For a binary classifier [l|m] in B and an input x that belongs to one of the classes in l or m, we define q[l|m](x) as the estimated probability that x belongs to the class(es) in l, and the complementary probability 1 - q[l|m](x) as the estimated probability that x belongs to the class(es) in m. For example, given that x belongs to class 1, 2 or 3, q[1|23](x) is the probability that x belongs to class 1, and 1 - q[1|23](x) is the probability that x belongs to class 2 or 3. In the SHS procedure, the probabilistic outputs of the multiple classifiers are shared among and integrated over the classes, yielding the estimated class membership probabilities p1, p2, p3 and p4. When l and/or m is a set of multiple classes, the corresponding probabilistic output is shared equally among its members. For example, q[1|23](x) is added to p1, and 1 - q[1|23](x) is shared equally and added to p2 and p3; q[1|234](x) is added to p1, and 1 - q[1|234](x) is shared equally and added to p2, p3 and p4; and so on for all members of B. Finally, the estimates p1, p2, p3 and p4 are normalized so that they sum to one.
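The sharing-and-normalizing step described in panel b) can be sketched in a few lines of Python. This is a minimal illustration based only on the caption, not the authors' implementation: the function name `shs_class_probabilities`, the `(l, m, q)` tuple layout, and the example values of q are all hypothetical, and the sketch assumes that a classifier [l|m] contributes nothing to classes outside l and m.

```python
def shs_class_probabilities(classes, outputs):
    """Combine binary-classifier outputs into class probabilities.

    classes: iterable of class labels, e.g. [1, 2, 3, 4].
    outputs: list of (l, m, q) tuples, one per binary classifier [l|m],
             where l and m are tuples of class labels and q is the
             estimated probability that the input belongs to l.
    (Hypothetical interface, chosen for illustration only.)
    """
    if not outputs:
        raise ValueError("at least one binary classifier output is required")

    scores = {c: 0.0 for c in classes}
    for left, right, q in outputs:
        # q is shared equally among the classes on the l side ...
        for c in left:
            scores[c] += q / len(left)
        # ... and 1 - q is shared equally among the classes on the m side.
        for c in right:
            scores[c] += (1.0 - q) / len(right)

    # Normalize so that the class probabilities sum to one.
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


if __name__ == "__main__":
    # Two classifiers from the caption's example, with made-up outputs:
    # q[1|23](x) = 0.8 and q[1|234](x) = 0.7.
    example = [((1,), (2, 3), 0.8), ((1,), (2, 3, 4), 0.7)]
    probs = shs_class_probabilities([1, 2, 3, 4], example)
    print({c: round(p, 3) for c, p in probs.items()})
    # approximately {1: 0.75, 2: 0.1, 3: 0.1, 4: 0.05}
```

With these made-up inputs, class 1 collects 0.8 + 0.7, classes 2 and 3 each collect 0.2/2 + 0.3/3, and class 4 collects 0.3/3, before division by the total of 2.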
Prediction accuracies (%) of the learning sample set, evaluated by leave-one-out cross-validation.
| 1R | 69.8 | 69.8 |
| 77.3 | 77.3 | |
| 79.8 | 79.8 | |
| N/A | 79.0 |
Figure 5. a) Confusion matrix of the results for the learning set (1A-SHS, evaluated by leave-one-out cross-validation). Each cell shows the number of samples with the corresponding true and predicted labels. True class labels are aligned vertically and predicted class labels horizontally. b) Confusion matrix of the results for the test set.
Comparison of prediction accuracies of various algorithms. Each entry is the best accuracy obtained, with the gene-selection condition that produced it given in parentheses. The highest accuracy for each data set is shown in bold. Values in parentheses are the number of diagnostic genes selected by recursive feature elimination for MC-SVM, the shrinkage parameter for SC, and the threshold p-value for the other methods.
| MC-SVM | SC | ||||||||
| thyroid | 74.8 (10^-3) | 74.8 (10^-3) | 79.0 (10^-4) | 74.8 (2000) | 74.8 (0.5) | ||||
| GCM | 86.3 (10^-3) | 86.3 (10^-3) | 89.0 (10^-5) | 89.0 (10^-5) | 89.0 (10^-6) | 89.0 (10^-6) | 80.8 (32) | 74.0 (0) | |
| SRBCT | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 (32~2308) | 100 (1~2) |
| esophageal | 72.3 (10^-1) | 72.3 (10^-1) | 71.6 (10^-1) | 71.6 (10^-1) | 73.1 (10^-1) | 71.7 (8) | 68.8 (≥ 1) |
Figure 6. Visualization of the learning samples by class probabilities.