Mansour Ebrahimi, Esmaeil Ebrahimie, Narges Shamabadi, Mahdi Ebrahimi.
Abstract
BACKGROUND: Breast cancer is the most common cancer among women and the second leading cause of cancer death in women. Many approaches have been used to analyze and detect benign and malignant forms of the disease, and understanding the features of the proteins expressed by the various types of breast cancer is crucial.
Keywords: Breast Neoplasms; Computational Biology; Decision Support Techniques
Year: 2010 PMID: 21526102 PMCID: PMC3082830
Source DB: PubMed Journal: J Res Med Sci ISSN: 1735-1995 Impact factor: 1.852
Results of feature selection: important (and marginal) features contributing to the proteins of the two types of breast cancer. Higher values indicate a more important protein feature.
| No | Protein feature | Value | Rank | No | Protein feature | Value | Rank |
|---|---|---|---|---|---|---|---|
| 1 | Count of Leu-Ile | 0.998 | Important | 31 | Count of Thr-Ile | 0.972 | Important |
| 2 | Count of Met-Ser | 0.997 | Important | 32 | Freq of Asp-Asp | 0.972 | Important |
| 3 | Freq of Asn-Ile | 0.996 | Important | 33 | Freq of Asn-Gln | 0.972 | Important |
| 4 | Count of Ile-Ile | 0.995 | Important | 34 | Count of Thr-Tyr | 0.972 | Important |
| 5 | Count of Ile-Cys | 0.995 | Important | 35 | Count of Leu-Cys | 0.971 | Important |
| 6 | Freq of Met-Ser | 0.995 | Important | 36 | Freq of Cys-Tyr | 0.971 | Important |
| 7 | Count of Ser-Phe | 0.993 | Important | 37 | Count of Asp-Tyr | 0.971 | Important |
| 8 | Count of Phe-Leu | 0.992 | Important | 38 | Count of Ile | 0.968 | Important |
| 9 | Freq of Gln-Val | 0.991 | Important | 39 | Count of Phe-Lys | 0.967 | Important |
| 10 | Count of Gly-Phe | 0.99 | Important | 40 | Count of Cys-Val | 0.966 | Important |
| 11 | Count of Gly-Val | 0.989 | Important | 41 | Count of Ser-Val | 0.964 | Important |
| 12 | Freq of Gly-Phe | 0.988 | Important | 42 | Count of Phe-Phe | 0.963 | Important |
| 13 | Count of Ala-Ile | 0.987 | Important | 43 | Freq of Glu-Trp | 0.963 | Important |
| 14 | Count of Tyr-Cys | 0.986 | Important | 44 | Count of Phe-Ile | 0.962 | Important |
| 15 | Count of Ile-Pro | 0.986 | Important | 45 | Freq of Phe-Met | 0.962 | Important |
| 16 | Freq of Asp-Leu | 0.985 | Important | 46 | Count of Gln-Val | 0.961 | Important |
| 17 | Count of Val-Ala | 0.985 | Important | 47 | Count of Asn-Arg | 0.96 | Important |
| 18 | Freq of Ile | 0.984 | Important | 48 | Count of Tyr-Trp | 0.96 | Important |
| 19 | Count of Phe (F) | 0.981 | Important | 49 | Count of Cys | 0.959 | Important |
| 20 | Count of Tyr-Pro | 0.981 | Important | 50 | Freq of Ala-Arg | 0.958 | Important |
| 21 | Freq of Asp-Ala (1) | 0.98 | Important | 51 | Count of Glu-Asn | 0.956 | Important |
| 22 | Count of Ala-Gly | 0.979 | Important | 52 | Freq of Ala-Ala | 0.953 | Important |
| 23 | Count of Val-Tyr | 0.979 | Important | 53 | Count of Ile-Asn | 0.952 | Important |
| 24 | Count of Lys-Phe | 0.978 | Important | 54 | Count of Asp-Arg | 0.952 | Important |
| 25 | Freq of His-Met | 0.977 | Important | 55 | Freq of Gln-Phe | 0.951 | Important |
| 26 | Count of Ile-Phe | 0.976 | Important | 56 | Count of Gly-Ile | 0.951 | Important |
| 27 | Freq of Ala-Leu | 0.974 | Important | 57 | Count of Asp-Lys | 0.951 | Important |
| 28 | Freq of Lys-His | 0.973 | Important | 58 | Count of Trp-Tyr | 0.948 | Important |
| 29 | Count of Phe-Asp | 0.973 | Important | 59 | Freq of Asn-Phe | 0.947 | Marginal |
| 30 | Freq of Asp(D) | 0.973 | Important | 60 | Count of Tyrosine (Y) | 0.947 | Marginal |
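The feature names in the table follow a composition pattern: single-residue counts and frequencies (for example, Count of Ile) and dipeptide counts and frequencies (for example, Count of Leu-Ile). The source does not define how these were computed, so the Python sketch below rests on an assumption: a dipeptide count is the number of overlapping adjacent residue pairs, and a frequency divides a count by the sequence length (single residues) or by the number of adjacent pairs (dipeptides). The function name `composition_features` is hypothetical.

```python
from collections import Counter

# One-letter to three-letter amino acid codes, matching the table's naming style.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def composition_features(sequence: str) -> dict[str, float]:
    """Compute residue and dipeptide counts/frequencies for one protein.

    Assumption (not stated in the source): frequencies are counts divided by
    the sequence length (residues) or by the number of adjacent pairs
    (dipeptides).
    """
    # Keep only the 20 standard one-letter codes.
    seq = "".join(ch for ch in sequence.upper() if ch in THREE_LETTER)
    features: dict[str, float] = {}

    # Single-residue composition, e.g. "Count of Ile" and "Freq of Ile".
    for code, n in Counter(seq).items():
        name = THREE_LETTER[code]
        features[f"Count of {name}"] = n
        features[f"Freq of {name}"] = n / len(seq)

    # Overlapping adjacent pairs: "MSI" yields "MS" and "SI".
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_pairs = max(len(seq) - 1, 1)
    for pair, n in pairs.items():
        name = f"{THREE_LETTER[pair[0]]}-{THREE_LETTER[pair[1]]}"
        features[f"Count of {name}"] = n
        features[f"Freq of {name}"] = n / n_pairs
    return features
```

For example, `composition_features("MSLIIL")` gives a Count of Met-Ser of 1 and a Freq of Met-Ser of 0.2, because the six-residue sequence has five adjacent pairs. The original study may have normalized frequencies differently.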
Figure 1. A decision tree generated by the CHAID modeling method without feature selection filtering, showing the protein features used to build the tree. M = Malignant; B = Benign; C = Common proteins.
Association rules found in the data by the generalized rule induction (GRI) method, showing the 100 most important rules created by the GRI algorithm for classifying benign (B), malignant (M), and common (C) proteins expressed in breast cancers.
| Antecedent | Support (%) |
|---|---|
| Count of Ile-Ile > 2.500 | 42.86 |
| Count of Ile > 27.000 and Freq of Ala-Ala < 0.004 | 42.86 |
| Count of Phe-Lys > 0.500 and Count of Ile > 28.500 | 38.1 |
| Count of Ala-Gly > 0.500 and Count of Ile-Cys > 0.500 | 38.1 |
| Count of Ile > 27.000 and Count of Tyr-Pro > 0.500 | 38.1 |
| Freq of Met-Ser < 0.000 and Freq of Gln-Val < 0.002 and Freq of Asp < 0.078 | 23.81 |
| Count of Asp-Arg > 1.500 | 33.33 |
| Count of Asp-Lys > 2.500 and Freq of Ala-Ala < 0.004 | 33.33 |
| Count of Cys-Val > 1.500 and Count of Ile > 30.000 | 33.33 |
| Count of Ala-Gly > 0.500 and Count of Asp-Arg > 1.500 | 33.33 |
| Count of Ala-Gly > 0.500 and Freq of Asp < 0.059 and Freq of Ala-Ala < 0.004 | 33.33 |
| Count of Ala-Gly > 0.500 and Count of Ile > 28.500 and Freq of Ala-Leu < 0.008 | 33.33 |
| Count of Ile > 27.000 and Count of Tyr-Cys < 1.500 and Freq of Ala-Ala < 0.004 | 33.33 |
| Count of Ile > 27.000 and Count of Val-Ala < 7.000 and Freq of Ala-Ala < 0.004 | 33.33 |
| Count of Ile > 27.000 and Count of Ser-Val < 5.500 and Freq of Ala-Ala < 0.004 | 33.33 |
| Count of Ile > 27.000 and Count of Gln-Val < 6.000 and Freq of Ala-Ala < 0.004 | 33.33 |
| Count of Ile > 27.000 and Count of Asn-Arg < 4.500 and Freq of Ala-Ala < 0.004 | 33.33 |
| Count of Ile > 27.000 and Count of Met-Ser < 2.500 and Freq of Ala-Ala < 0.004 | 33.33 |
| Count of Ile > 27.000 and Count of Lys-Phe < 4.500 and Freq of Ala-Ala < 0.004 | 33.33 |
| Count of Ile > 27.000 and Count of Ile-Pro < 2.500 and Freq of Ala-Ala < 0.004 | 33.33 |
| Count of Ile > 27.000 and Count of Ile-Ile < 5.500 and Freq of Ala-Ala < 0.004 | 33.33 |
| Count of Ile > 27.000 and Count of Gly-Ile < 5.500 and Freq of Ala-Ala < 0.004 | 33.33 |
| Count of Ile > 27.000 and Count of Gly-Phe < 3.500 and Freq of Asp-Leu < 0.006 | 33.33 |
| Count of Ile > 27.000 and Count of Phe-Lys < 4.000 and Freq of Ala-Ala < 0.004 | 33.33 |
| Count of Ile > 27.000 and Count of Phe-Phe > 0.500 and Freq of Ala-Leu < 0.008 | 33.33 |
| Count of Ile > 27.000 and Count of Glu-Asn < 3.500 and Freq of Ala-Ala < 0.004 | 33.33 |
| Count of Ile > 27.000 and Count of Asp-Arg < 4.500 and Freq of Ala-Ala < 0.004 | 33.33 |
| Count of Ile > 27.000 and Count of Asp-Lys < 6.500 and Freq of Ala-Ala < 0.004 | 33.33 |
| Count of Ile > 27.000 and Count of Ala-Ile < 7.000 and Freq of Ala-Ala < 0.004 | 33.33 |
| Count of Ile > 27.000 and Count of Ala-Gly < 3.500 and Freq of Ala-Ala < 0.004 | 33.33 |
| Count of Ile > 27.000 and Freq of Asp < 0.056 and Freq of Ala-Ala < 0.004 | 33.33 |
| Count of Ile > 27.000 and Count of Phe < 64.500 and Freq of Ala-Ala < 0.004 | 33.33 |
| Freq of Asn-Ile < 0.000 and Freq of Ala-Ala < 0.008 and Count of Asp-Lys > 1.500 | 14.29 |
| Freq of Asn-Ile < 0.000 and Count of Asp-Lys > 1.500 and Freq of Ala-Ala < 0.008 | 14.29 |
| Freq of Asn-Ile < 0.000 and Freq of Ile > 0.040 and Freq of Ala-Leu < 0.008 | 14.29 |
| Freq of Asn-Ile < 0.000 and Count of Ile > 11.500 and Freq of Ala-Leu < 0.008 | 14.29 |
| Freq of Gly-Phe < 0.000 and Freq of Ala-Ala < 0.008 and Count of Ile > 11.500 | 14.29 |
| Freq of Gly-Phe < 0.000 and Count of Asp-Lys > 1.500 and Freq of Ala-Ala < 0.008 | 14.29 |
| Count of Ala-Ile < 1.500 and Freq of Ala-Ala < 0.008 and Count of Ile > 11.500 | 14.29 |
| Count of Ala-Ile < 1.500 and Count of Ile-Ile > 0.500 and Freq of Ala-Ala < 0.008 | 14.29 |
| Count of Ala-Ile < 1.500 and Count of Asp-Lys > 1.500 and Freq of Ala-Ala < 0.008 | 14.29 |
| Count of Ile < 25.500 and Freq of Ala-Ala < 0.008 and Count of Ile > 11.500 | 14.29 |
| Count of Ile < 25.500 and Count of Ile-Ile > 0.500 and Freq of Ala-Ala < 0.008 | 14.29 |
| Count of Ile < 25.500 and Count of Asp-Lys > 1.500 and Freq of Ala-Ala < 0.008 | 14.29 |
| Count of Asp-Lys > 2.500 and Count of Asp-Tyr > 1.500 | 28.57 |
| Count of Asp-Lys > 2.500 and Freq of Ile < 0.066 and Freq of Ala-Ala < 0.004 | 28.57 |
| Count of Ala-Ile > 2.500 and Freq of Asp-Leu < 0.006 | 28.57 |
| Count of Ala-Gly > 0.500 and Freq of Ile > 0.059 | 28.57 |
| Count of Ala-Gly > 0.500 and Freq of Asp < 0.059 and Freq of Asp > 0.044 | 28.57 |
| Count of Ala-Gly > 0.500 and Count of Cys > 14.000 and Freq of Ala-Leu < 0.006 | 28.57 |
| Freq of Ile > 0.059 and Count of Ala-Gly > 0.500 | 28.57 |
| Count of Ile > 27.000 and Count of Gly-Ile < 5.500 and Count of Tyr-Pro > 0.500 | 28.57 |
| Count of Ile > 27.000 and Count of Gly-Phe < 3.500 and Count of Phe-Lys > 1.500 | 28.57 |
| Count of Ile > 27.000 and Count of Phe-Lys < 4.000 and Count of Cys-Val > 1.500 | 28.57 |
| Count of Ile > 27.000 and Count of Phe-Asp < 4.000 and Freq of Ala-Ala < 0.004 | 28.57 |
| Count of Ile > 27.000 and Count of Glu-Asn < 3.500 and Count of Cys-Val > 1.500 | 28.57 |
| Count of Ile > 27.000 and Count of Asp-Arg < 4.500 and Count of Glu-Asn > 1.500 | 28.57 |
| Count of Ile > 27.000 and Count of Asp-Lys < 6.500 and Count of Cys-Val > 1.500 | 28.57 |
| Count of Ile > 27.000 and Count of Cys-Val < 2.500 and Freq of Ala-Ala < 0.004 | 28.57 |
| Count of Ile > 27.000 and Count of Ala-Ile < 7.000 and Count of Cys-Val > 1.500 | 28.57 |
| Count of Ile > 27.000 and Freq of Asp < 0.056 and Count of Asn-Arg > 1.500 | 28.57 |
| Count of Ile > 27.000 and Count of Phe < 64.500 and Count of Cys-Val > 1.500 | 28.57 |
| Count of Ile > 27.000 and Count of Cys < 28.500 and Freq of Ala-Ala < 0.004 | 28.57 |
| Freq of Met-Ser < 0.000 and Freq of Lys-His < 0.002 and Count of Thr-Ile > 0.500 | 19.05 |
| Freq of Met-Ser < 0.000 and Freq of Phe-Met < 0.002 and Count of Thr-Ile > 0.500 | 19.05 |
| Freq of Met-Ser < 0.000 and Freq of Asp-Leu < 0.013 and Count of Asn-Arg < 0.500 | 19.05 |
| Freq of Met-Ser < 0.000 and Freq of Asp-Ala (1) < 0.014 and Count of Asn-Arg < 0.500 | 19.05 |
| Freq of Met-Ser < 0.000 and Freq of Asp-Asp < 0.004 and Count of Thr-Ile > 0.500 | 19.05 |
| Freq of Met-Ser < 0.000 and Count of Thr-Ile > 0.500 and Count of Asn-Arg < 0.500 | 19.05 |
| Freq of Met-Ser < 0.000 and Count of Asn-Arg < 0.500 and Count of Cys > 1.500 | 19.05 |
| Freq of Met-Ser < 0.000 and Count of Leu-Ile > 0.500 and Count of Asn-Arg < 0.500 | 19.05 |
| Freq of Met-Ser < 0.000 and Freq of Asp < 0.070 and Count of Asn-Arg < 0.500 | 19.05 |
| Freq of Cys-Tyr < 0.000 and Freq of Lys-His < 0.002 and Count of Thr-Ile > 0.500 | 19.05 |
| Freq of Cys-Tyr < 0.000 and Freq of Phe-Met < 0.002 and Count of Cys > 1.500 | 19.05 |
| Freq of Cys-Tyr < 0.000 and Freq of Asp-Leu < 0.013 and Count of Asn-Arg < 0.500 | 19.05 |
| Freq of Cys-Tyr < 0.000 and Freq of Asp-Ala (1) < 0.014 and Count of Asn-Arg < 0.500 | 19.05 |
| Freq of Cys-Tyr < 0.000 and Freq of Asp-Asp < 0.004 and Count of Thr-Ile > 0.500 | 19.05 |
| Freq of Cys-Tyr < 0.000 and Count of Thr-Ile > 0.500 and Count of Asn-Arg < 0.500 | 19.05 |
| Freq of Cys-Tyr < 0.000 and Count of Asn-Arg < 0.500 and Count of Cys > 1.500 | 19.05 |
| Freq of Cys-Tyr < 0.000 and Count of Leu-Ile > 0.500 and Count of Asp-Arg < 0.500 | 19.05 |
| Freq of Cys-Tyr < 0.000 and Freq of Asp < 0.070 and Count of Asn-Arg < 0.500 | 19.05 |
| Count of Met-Ser < 0.500 and Freq of Ala-Leu > 0.008 | 19.05 |
| Count of Met-Ser < 0.500 and Count of Asn-Arg < 0.500 and Freq of Ala-Leu > 0.008 | 19.05 |
| Count of Met-Ser < 0.500 and Freq of Asp < 0.070 and Freq of Ala-Leu > 0.008 | 19.05 |
| Count of Asp-Lys > 2.500 and Count of Asp-Arg < 4.500 and Freq of Ile > 0.048 | 23.81 |
| Count of Asp-Lys > 2.500 and Count of Ala-Ile < 7.000 and Count of Leu-Ile > 3.500 | 23.81 |
| Count of Asp-Lys > 2.500 and Count of Ala-Gly < 3.500 and Freq of Ala-Ala < 0.004 | 23.81 |
| Count of Asp-Lys > 2.500 and Freq of Ile < 0.066 and Count of Asp-Tyr > 1.500 | 23.81 |
| Count of Asp-Lys > 2.500 and Freq of Asp < 0.056 and Freq of Asp > 0.044 | 23.81 |
| Count of Asp-Lys > 2.500 and Count of Ile < 85.500 and Freq of Ile > 0.048 | 23.81 |
| Count of Asp-Lys > 2.500 and Count of Phe < 64.500 and Count of Leu-Ile > 3.500 | 23.81 |
| Count of Asp-Lys > 2.500 and Count of Cys > 14.000 and Freq of Ala-Leu < 0.006 | 23.81 |
| Count of Cys-Val > 1.500 and Count of Cys > 19.000 | 23.81 |
| Count of Ala-Ile > 2.500 and Count of Val-Tyr > 2.000 | 23.81 |
| Count of Ala-Ile > 2.500 and Count of Val-Ala < 9.500 and Freq of Asp-Leu < 0.006 | 23.81 |
| Count of Ala-Ile > 2.500 and Count of Thr-Tyr < 2.500 and Freq of Asp-Leu < 0.006 | 23.81 |
| Count of Ala-Ile > 2.500 and Count of Thr-Ile < 6.500 and Freq of Asp-Leu < 0.006 | 23.81 |
| Count of Ala-Ile > 2.500 and Count of Ser-Val < 8.000 and Freq of Asp-Leu < 0.006 | 23.81 |
| Count of Ala-Ile > 2.500 and Count of Ser-Phe < 5.500 and Freq of Asp-Leu < 0.006 | 23.81 |
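Each antecedent is a conjunction of threshold tests on the protein features from the feature-selection table. The Python sketch below assumes that Support (%) is the percentage of records whose features satisfy the antecedent, and that a feature missing from a record counts as 0; neither detail is stated in the source, and `parse_antecedent` and `antecedent_support` are hypothetical helpers, not part of any GRI implementation. The reported supports are all multiples of 100/21 (9/21 ≈ 42.86%, 3/21 ≈ 14.29%), which would fit a set of 21 records, although the source does not say so.

```python
import re

# Matches one condition such as "Count of Ile > 27.000" or
# "Freq of Asp-Ala (1) < 0.014".
CONDITION = re.compile(r"(.+?)\s+([<>])\s+(\d+(?:\.\d+)?)")


def parse_antecedent(text: str) -> list[tuple[str, str, float]]:
    """Split an antecedent on ' and ' into (feature, operator, threshold)."""
    conditions = []
    for part in text.split(" and "):
        match = CONDITION.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"cannot parse condition: {part!r}")
        feature, op, threshold = match.groups()
        conditions.append((feature, op, float(threshold)))
    return conditions


def antecedent_support(text: str, records: list[dict[str, float]]) -> float:
    """Percentage of records that satisfy every condition in the antecedent.

    Assumption: a feature absent from a record is treated as 0.
    """
    conditions = parse_antecedent(text)

    def holds(record: dict[str, float]) -> bool:
        for feature, op, threshold in conditions:
            value = record.get(feature, 0.0)
            if op == ">" and not value > threshold:
                return False
            if op == "<" and not value < threshold:
                return False
        return True

    if not records:
        return 0.0
    return 100.0 * sum(holds(r) for r in records) / len(records)
```

With four made-up records where two satisfy `Count of Ile > 27.000 and Freq of Ala-Ala < 0.004`, `antecedent_support` returns 50.0.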
Figure 2. Percentage of correct and incorrect classifications by various decision tree models on datasets without feature selection (a) and with feature selection (b); the C5.0 model performed best, followed by the CR&T, CHAID, and QUEST models.
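The percentages in Figure 2 compare each model's predicted class (B, M, or C) with the true class of each protein. A minimal Python sketch of that tally follows, assuming one predicted label per protein; `correct_wrong_percentages` is a hypothetical helper, and the labels in the usage line are made up rather than taken from the study.

```python
from collections.abc import Sequence


def correct_wrong_percentages(
    actual: Sequence[str], predicted: Sequence[str]
) -> tuple[float, float]:
    """Return (percent correct, percent wrong) for paired class labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("no predictions to score")
    # Count positions where the predicted class matches the true class.
    correct = sum(a == p for a, p in zip(actual, predicted))
    pct_correct = 100.0 * correct / len(actual)
    return pct_correct, 100.0 - pct_correct
```

For instance, `correct_wrong_percentages(["M", "B", "C", "M"], ["M", "B", "M", "M"])` returns `(75.0, 25.0)`.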