Chun-Chi Liu, Jianjun Hu, Mrinal Kalakrishnan, Haiyan Huang, Xianghong Jasmine Zhou.
Abstract
BACKGROUND: Disease classification has been an important application of microarray technology. However, most microarray-based classifiers can only handle data generated within the same study, because microarray data generated by different laboratories or on different platforms cannot be compared directly owing to systematic variations. This issue has severely limited the practical use of microarray-based disease classification.
Year: 2009 PMID: 19208125 PMCID: PMC2648756 DOI: 10.1186/1471-2105-10-S1-S25
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Diagram of the integrative disease classification framework. The framework consists of three major steps: (1) Standardization of microarray data and dataset annotation: expression log-rank-ratio vectors were constructed from each microarray data set, and UMLS concepts were extracted from the dataset summary and corresponding MeSH headings. (2) Disease class construction: disease classes were initially constructed by FIM analysis and were further refined by calculating the phenotype distance score. (3) Classification by SVM and ManiSVM.
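Step (1) of the framework can be illustrated with a minimal sketch. The record does not define the log-rank-ratio, so the sketch rests on an assumed reading: genes are ranked within each sample, ranks are averaged over disease and over control samples, and the per-gene log2 ratio of the two mean ranks forms the vector. The function names `ranks` and `log_rank_ratio` and the toy values are hypothetical and are not taken from the paper.

```python
import math

def ranks(values):
    """Rank values within one sample (1 = lowest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result

def log_rank_ratio(disease_samples, control_samples):
    """ASSUMED definition: per-gene log2(mean disease rank / mean control rank)."""
    def mean_ranks(samples):
        ranked = [ranks(s) for s in samples]
        n_genes = len(ranked[0])
        return [sum(r[g] for r in ranked) / len(ranked) for g in range(n_genes)]

    disease = mean_ranks(disease_samples)
    control = mean_ranks(control_samples)
    return [math.log2(d / c) for d, c in zip(disease, control)]

# Toy data: rows are samples, columns are three hypothetical genes.
disease_samples = [[5.0, 1.0, 3.0], [6.0, 2.0, 4.0]]
control_samples = [[1.0, 5.0, 3.0]]
vector = log_rank_ratio(disease_samples, control_samples)
```

Because ranks depend only on the ordering of genes within a sample, a vector built this way is unaffected by platform-specific intensity scales, which is consistent with the cross-laboratory motivation stated in the abstract.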
Selected disease classes and their associated classification performance.
| UMLS concepts | Datasets | Phenotype distance score | ManiSVM accuracy | SVM accuracy |
| --- | --- | --- | --- | --- |
| C0027651 (Neoplasms), C0027660 (Neoplasms, Glandular and Epithelial), C0040300 (Body tissue), C0007097 (Carcinoma), C0027653 (Neoplasms by Site), C0027652 (Neoplasms by Histologic Type) | GDS1070 | 3.50E-05 | 0.8421 | 0.6018 |
| C0018981 (Hemic and Lymphatic Diseases), C0005773 (Blood Cells), C0018939 (Hematological Disease) | GDS1257 | 7.10E-05 | 0.8047 | 0.6253 |
| C0007682 (CNS disorder), C0006111 (Brain Diseases), C0027765 (nervous system disorder) | GDS1331 | 9.99E-03 | 0.7569 | 0.6483 |
| C0021311 (Infection), C0004615 (Bacterial Infections and Mycoses) | GDS1428 | 2.36E-04 | 0.7498 | 0.5253 |
Figure 2. Classification performance increases with size and phenotype homogeneity of disease classes. The disease classes were divided into bins (a) based on the number of datasets (from 3 to 7) in the classes, or (b) based on the p-value of the phenotype distance score (p-value intervals were chosen as: 1.0E-6 to 1.0E-5, 1.0E-5 to 1.0E-4, 1.0E-4 to 1.0E-3, 1.0E-3 to 1.0E-2, and 1.0E-2 to 5.0E-2). For each bin, the average accuracy was calculated by performing ManiSVM and SVM classification.
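The binning in panel (b) can be sketched as follows. The interval edges come from the caption; the half-open treatment of each interval, the helper names `bin_index` and `average_accuracy_by_bin`, and the use of the four table rows as input (reading the table's phenotype distance score column as the p-value) are assumptions made for illustration only.

```python
from bisect import bisect_right
from collections import defaultdict

# Interval edges listed in the Figure 2 caption.
EDGES = [1.0e-6, 1.0e-5, 1.0e-4, 1.0e-3, 1.0e-2, 5.0e-2]

def bin_index(p_value):
    """Index i of the half-open interval [EDGES[i], EDGES[i+1]) holding p_value, else None."""
    i = bisect_right(EDGES, p_value) - 1
    return i if 0 <= i < len(EDGES) - 1 else None

def average_accuracy_by_bin(classes):
    """classes: iterable of (p_value, manisvm_accuracy, svm_accuracy) tuples."""
    grouped = defaultdict(list)
    for p_value, manisvm_acc, svm_acc in classes:
        idx = bin_index(p_value)
        if idx is not None:  # values outside all intervals are skipped
            grouped[idx].append((manisvm_acc, svm_acc))
    averages = {}
    for idx, rows in sorted(grouped.items()):
        n = len(rows)
        averages[(EDGES[idx], EDGES[idx + 1])] = (
            sum(r[0] for r in rows) / n,  # mean ManiSVM accuracy in the bin
            sum(r[1] for r in rows) / n,  # mean SVM accuracy in the bin
        )
    return averages

# The four selected disease classes from the table (ASSUMED to be p-values).
selected = [
    (3.50e-05, 0.8421, 0.6018),
    (7.10e-05, 0.8047, 0.6253),
    (9.99e-03, 0.7569, 0.6483),
    (2.36e-04, 0.7498, 0.5253),
]
binned = average_accuracy_by_bin(selected)
```

With only four classes the averages are not meaningful in themselves; the sketch shows the grouping step, not the trend reported in the figure.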