Dimitri Kagaris, Alireza Khamesipour, Constantin T Yiannoutsos.
Abstract
BACKGROUND: The Top Scoring Pair (TSP) classifier, based on the concept of relative ranking reversals in the expressions of pairs of genes, has been proposed as a simple, accurate, and easily interpretable decision rule for classification and class prediction of gene expression profiles. The idea that differences in gene expression ranking are associated with the presence or absence of disease is compelling and has strong biological plausibility. Nevertheless, the TSP formulation ignores significant available information that can improve classification accuracy, and it is vulnerable to selecting genes that are not differentially expressed between the two conditions ("pivot" genes).
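A minimal sketch of the TSP decision rule summarized above, assuming the standard formulation: for genes i and j the score is |P(X_i < X_j | class A) − P(X_i < X_j | class B)|, each probability estimated by the fraction of samples in that class in which gene i is expressed below gene j, and the top-scoring pair votes for the class in which its observed ordering is more frequent. The function names and the synthetic data are hypothetical, and ties are broken here by gene index rather than by any published tie-breaking rule.

```python
import numpy as np


def tsp_scores(x_a: np.ndarray, x_b: np.ndarray) -> np.ndarray:
    """TSP score for every gene pair.

    x_a, x_b: expression matrices (samples x genes) for classes A and B.
    Entry [i, j] is |P(X_i < X_j | A) - P(X_i < X_j | B)|, with each
    probability estimated by the fraction of samples in that class.
    """
    # Broadcasting gives a (samples, genes, genes) array of "gene i below gene j"
    p_a = (x_a[:, :, None] < x_a[:, None, :]).mean(axis=0)
    p_b = (x_b[:, :, None] < x_b[:, None, :]).mean(axis=0)
    return np.abs(p_a - p_b)


def fit_tsp(x_a, x_b):
    """Return the top pair (i, j) with i < j, and whether 'X_i < X_j' indicates class A."""
    scores = np.triu(tsp_scores(x_a, x_b), k=1)  # keep each unordered pair once
    # np.argmax returns the first maximum, so ties are broken by gene index here
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    a_if_below = (x_a[:, i] < x_a[:, j]).mean() > (x_b[:, i] < x_b[:, j]).mean()
    return int(i), int(j), bool(a_if_below)


def predict_tsp(x: np.ndarray, i: int, j: int, a_if_below: bool) -> np.ndarray:
    """Label each sample (row of x) 'A' or 'B' from the relative order of genes i and j."""
    below = x[:, i] < x[:, j]
    return np.where(below == a_if_below, "A", "B")


# Hypothetical synthetic data: 20 genes, 30 samples per class, one planted reversal.
rng = np.random.default_rng(0)
x_a = rng.normal(size=(30, 20))
x_b = rng.normal(size=(30, 20))
x_a[:, 3] -= 3.0  # class A: gene 3 usually lies below gene 7
x_b[:, 7] -= 3.0  # class B: the order is reversed

i, j, a_if_below = fit_tsp(x_a, x_b)
labels = predict_tsp(np.vstack([x_a, x_b]), i, j, a_if_below)
print(i, j, a_if_below)
```

Because the rule uses only the order of two values within a sample, it is unchanged by any strictly increasing transformation applied to a whole sample, such as a per-array normalization.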
Keywords: AUC; Breast cancer; Colon cancer; Diffuse large B-Cell lymphoma; Gene expression; Gene selection; Leukemia; Microarray data analysis; Ovarian cancer; Prostate cancer; Receiver operating characteristic (ROC) curve
Year: 2018 PMID: 29940833 PMCID: PMC6020231 DOI: 10.1186/s12859-018-2231-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Gene expression levels of two genes in healthy and diseased samples
| Healthy: gene 1 | Healthy: gene 2 | Diseased: gene 1 | Diseased: gene 2 | Healthy: gene 1 | Healthy: gene 2 | Diseased: gene 1 | Diseased: gene 2 |
|---|---|---|---|---|---|---|---|
| 11 | 12 | 32 | 31 | 10 | 20 | 42 | 31 |
| 21 | 22 | 34 | 33 | 12 | 23 | 43 | 33 |
| 23 | 24 | 36 | 35 | 15 | 25 | 45 | 35 |
| 25 | 26 | 38 | 37 | 17 | 27 | 47 | 37 |
| 27 | 28 | 40 | 39 | 19 | 18 | 39 | 41 |
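The TSP score of the gene pair in each half of this table can be worked out directly. The sketch below treats columns 1-4 and columns 5-8 as two separate examples (a reading of the layout, not something stated in this excerpt), computes the TSP score |P(gene 1 < gene 2 | healthy) − P(gene 1 < gene 2 | diseased)| for each, and also reports the average gap gene 2 − gene 1 per class. Under this reading the first pair scores 1.0 and the second 0.6, even though the second pair's expression gaps are roughly eight times larger, because the score counts rank reversals and ignores their magnitude.

```python
# Values transcribed from the table: each tuple is (gene 1, gene 2) for one sample.
EXAMPLES = {
    "columns 1-4": {
        "healthy": [(11, 12), (21, 22), (23, 24), (25, 26), (27, 28)],
        "diseased": [(32, 31), (34, 33), (36, 35), (38, 37), (40, 39)],
    },
    "columns 5-8": {
        "healthy": [(10, 20), (12, 23), (15, 25), (17, 27), (19, 18)],
        "diseased": [(42, 31), (43, 33), (45, 35), (47, 37), (39, 41)],
    },
}


def frac_below(samples):
    """Fraction of samples in which gene 1 is expressed below gene 2."""
    return sum(g1 < g2 for g1, g2 in samples) / len(samples)


def mean_gap(samples):
    """Average expression difference (gene 2 - gene 1) across samples."""
    return sum(g2 - g1 for g1, g2 in samples) / len(samples)


def tsp_score(healthy, diseased):
    """|P(gene 1 < gene 2 | healthy) - P(gene 1 < gene 2 | diseased)|."""
    return abs(frac_below(healthy) - frac_below(diseased))


for name, ex in EXAMPLES.items():
    print(name, round(tsp_score(ex["healthy"], ex["diseased"]), 3),
          mean_gap(ex["healthy"]), mean_gap(ex["diseased"]))
# Output:
# columns 1-4 1.0 1.0 -1.0
# columns 5-8 0.6 8.0 -7.8
```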
Simulation results on estimation of P(X>Y) by TSP and AUCTSP
| Gene X | Gene Y | n | TSP | AUCTSP |
|---|---|---|---|---|
| N(1,1) | N(0,1) | 10 | 0.763 | 0.760 |
| | | 20 | 0.762 | 0.760 |
| | | 30 | 0.759 | 0.760 |
| | | 40 | 0.759 | 0.760 |
| N(1,3) | N(0,3) | 10 | 0.595 | 0.592 |
| | | 20 | 0.594 | 0.592 |
| | | 30 | 0.594 | 0.592 |
| | | 40 | 0.593 | 0.592 |
| N(5,1) | N(0,1) | 10 | | 0.999 |
| | | 20 | | 0.999 |
| | | 30 | | 0.999 |
| | | 40 | | 0.999 |
| N(5,3) | N(0,3) | 10 | 0.883 | 0.878 |
| | | 20 | 0.881 | 0.878 |
| | | 30 | 0.880 | 0.878 |
| | | 40 | 0.880 | 0.878 |
| N(1,1) | N(1,3) | 10 | 0.619 | 0.500 |
| | | 20 | 0.587 | 0.500 |
| | | 30 | 0.572 | 0.500 |
| | | 40 | 0.563 | 0.500 |
| N(5,1) | N(5,3) | 10 | 0.616 | 0.500 |
| | | 20 | 0.585 | 0.500 |
| | | 30 | 0.570 | 0.500 |
| | | 40 | 0.559 | 0.500 |
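For independent normal variables the quantity estimated in this table has a closed form: X − Y is normal with mean μ_X − μ_Y and variance σ_X² + σ_Y², so P(X > Y) = Φ((μ_X − μ_Y)/√(σ_X² + σ_Y²)). The sketch below evaluates it for the six settings, assuming the second argument of N(·,·) is a standard deviation; with that reading the exact values for the first two settings are about 0.760 and 0.593, and for the last two settings, where the means are equal, exactly 0.5. It also computes the Mann-Whitney estimate (empirical AUC) from simulated samples for comparison. That is the standard nonparametric estimator of P(X > Y), not necessarily the TSP or AUCTSP estimator used in the simulation, and the sample size of 300 is an arbitrary choice.

```python
import math
import random


def p_x_greater_y(mu_x, sd_x, mu_y, sd_y):
    """Exact P(X > Y) for independent X ~ N(mu_x, sd_x^2) and Y ~ N(mu_y, sd_y^2)."""
    # X - Y is normal with mean mu_x - mu_y and variance sd_x^2 + sd_y^2
    z = (mu_x - mu_y) / math.sqrt(sd_x ** 2 + sd_y ** 2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z


def mann_whitney_auc(xs, ys):
    """Empirical P(X > Y): share of (x, y) pairs with x > y, ties counted as 1/2."""
    wins = sum((x > y) + 0.5 * (x == y) for x in xs for y in ys)
    return wins / (len(xs) * len(ys))


# (mean, standard deviation) of X and Y for each row group of the table
SETTINGS = [((1, 1), (0, 1)), ((1, 3), (0, 3)), ((5, 1), (0, 1)),
            ((5, 3), (0, 3)), ((1, 1), (1, 3)), ((5, 1), (5, 3))]

rng = random.Random(42)
for (mx, sx), (my, sy) in SETTINGS:
    exact = p_x_greater_y(mx, sx, my, sy)
    xs = [rng.gauss(mx, sx) for _ in range(300)]
    ys = [rng.gauss(my, sy) for _ in range(300)]
    print(f"X~N({mx},{sx}) Y~N({my},{sy}): exact {exact:.3f}, "
          f"Mann-Whitney {mann_whitney_auc(xs, ys):.3f}")
```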
The estimates of P(X>Y) closer to A are marked in bold
Simulation results for the ability of AUCTSP and TSP to identify the most informative gene pair
| Gene 1 | Gene 2 | N=100: TSP | N=100: AUCTSP | N=100: TSP | N=100: AUCTSP | N=200: TSP | N=200: AUCTSP | N=200: TSP | N=200: AUCTSP |
|---|---|---|---|---|---|---|---|---|---|
| NH(0,1), ND(1,1) | NH(1,1), ND(0,1) | 23.4 | 51.2 | 58.8 | 93.2 | 15.4 | 39.8 | 45.4 | 89.7 |
| NH(-1,1), ND(1,1) | NH(1,1), ND(-1,1) | 69.1 | 98.9 | 97.7 | 99.9 | 57.8 | 97.2 | 94.0 | 99.9 |
| NH(-2,1), ND(2,1) | NH(2,1), ND(-2,1) | 91.6 | 99.9 | 97.6 | 99.9 | 92.7 | 99.8 | 95.7 | 99.9 |
| NH(-2,2), ND(2,2) | NH(2,2), ND(-2,2) | 48.2 | 93.2 | 80.2 | 99.9 | 38.3 | 91.4 | 71.4 | 99.9 |
Top scoring pairs of genes under TSP and AUCTSP
| Dataset | Method | Gene pair | TSP score | AUCTSP score |
|---|---|---|---|---|
| OVARIAN | TSP | [PKM2, OVGP1] | 0.900 | 0.675 |
| | AUCTSP | [IRS1, OVGP1] | 0.833 | 0.826 |
| LEUKEMIA | TSP | [SPTAN1, CD33]ᵃ | 0.979 | 0.938 |
| | TSP | [ARHGAP45, ZYX] | 0.979 | 0.770 |
| | TSP | [PCDHGC3, ZYX] | 0.979 | 0.855 |
| | AUCTSP | [SPTAN1, CD33] | 0.979 | 0.938 |
| BREAST-ER | TSP | [MUC2, ESR1]ᵃ | 0.918 | 0.812 |
| | TSP | [JAK3, ESR1] | 0.918 | 0.791 |
| | TSP | [GNB3, ESR1] | 0.918 | 0.804 |
| | TSP | [HARS2, ESR1] | 0.918 | 0.834 |
| | TSP | [ERF, ESR1] | 0.918 | 0.822 |
| | AUCTSP | [CTSC, ESR1] | 0.878 | 0.891 |
| BREAST-LN | TSP | [BP1CR, GYPB] | 0.838 | 0.675 |
| | AUCTSP | [BP1CR, KRT31] | 0.717 | 0.765 |
| | TSP | [FABP3, ACVR1B]ᵇ | 0.716 | 0.531 |
| | AUCTSP | [GYPB, ACVR1B]ᵇ | 0.633 | 0.615 |
| DLBCL | TSP | [PDE4B, GPR12] | 0.596 | 0.414 |
| | AUCTSP | [POLR2J, PTGER4] | 0.341 | 0.46 |
| DLBCL-FL | TSP | [YWHAZ, SNRPB] | 0.983 | 0.727 |
| | AUCTSP | [FCGR1A, NEO1] | 0.759 | 0.83 |
| COLON | TSP | [VIP, DARS] | 0.879 | 0.637 |
| | AUCTSP | [MYH9, HNRNPA1] | 0.759 | 0.724 |
| PROSTATE | TSP | [CFD, ENO1] | 0.901 | 0.693 |
| | AUCTSP | [CFD, NUMB] | 0.882 | 0.883 |
ᵃ Indicates the TSP gene pair selected by [7] to break ties among pairs with equal TSP scores
ᵇ Indicates the gene pair selected by TSP and AUCTSP after removing the genetically modified gene BP1CR (see [32, 33]) from the dataset
Gene legend
| Data set | Gene ID | Gene acronym | Gene description |
|---|---|---|---|
| OVARIAN | g47 | IRS1 | Insulin Receptor Substrate 1 |
| | g93 | OVGP1 | Oviductal Glycoprotein 1 |
| | g1202 | PKM2 | Pyruvate Kinase, Muscle |
| LEUKEMIA | D86976 | ARHGAP45 | Rho GTPase Activating Protein 45 |
| | J05243 | SPTAN1 | Spectrin Alpha, Non-Erythrocytic 1 |
| | L11373 | PCDHGC3 | Protocadherin Gamma Subfamily C, 3 |
| | M23197 | CD33 | CD33 Molecule |
| | X95735 | ZYX | Zyxin |
| BREAST-ER | L21998 | MUC2 | Mucin 2 |
| | U09607 | JAK3 | Janus Kinase 3 |
| | U15655 | ERF | ETS2 Repressor Factor |
| | U18937 | HARS2 | Histidyl-tRNA Synthetase 2, Mitochondrial |
| | U47931 | GNB3 | G Protein Subunit Beta 3 |
| | X03635 | ESR1 | Estrogen Receptor 1 |
| | X87212 | CTSC | Cathepsin C |
| BREAST-LN | AFFX-CreX-3 | BP1CR | Bacteriophage P1 Cre Recombinase |
| | X82634 | KRT31 | Keratin 31 |
| | J02982 | GYPB | Glycophorin B |
| | M18079 | FABP3 | Fatty Acid Binding Protein 3 |
| | X15357 | ACVR1B | Activin A Receptor Type 1B |
| DLBCL | K03008 | POLR2J | RNA Polymerase II Subunit J |
| | L20971 | PDE4B | Phosphodiesterase 4B |
| | L28175 | PTGER4 | Prostaglandin E Receptor 4 |
| | U18548 | GPR12 | G Protein-Coupled Receptor 12 |
| DLBCL-FL | D78134 | YWHAZ | Tyrosine 3-Monooxygenase/Tryptophan 5-Monooxygenase Activation Protein Zeta |
| | M63835 | FCGR1A | Fc Fragment of IgG Receptor Ia |
| | U61262 | NEO1 | Neogenin 1 |
| | X17567 | SNRPB | Small Nuclear Ribonucleoprotein Polypeptides B and B1 |
| COLON | Hsa.37937 | MYH9 | Myosin Heavy Chain 9 |
| | Hsa.8010 | HNRNPA1 | Heterogeneous Nuclear Ribonucleoprotein A1 |
| | Hsa.2097 | VIP | Vasoactive Intestinal Peptide |
| | Hsa.601 | DARS | Aspartyl-tRNA Synthetase |
| PROSTATE | 40282_s_at | CFD | Complement Factor D |
| | 2035_s_at | ENO1 | Enolase 1 |
| | 37693_at | NUMB | NUMB, Endocytic Adaptor Protein |
Deviation of the genes selected by TSP and AUCTSP from the non-informative “pivot” gene
| Dataset | Method | Gene pair | ( | ( |
|---|---|---|---|---|
| OVARIAN | TSP | (PKM2, OVGP1) | (0.16, 0.03) | (0.84, 0.97) |
| | AUCTSP | (IRS1, OVGP1) | (0.84, 0.03) | (0.84, 0.97) |
| LEUKEMIA | TSP | (SPTAN1, CD33)ᵃ | (0.05, 0.99) | (0.95, 0.99) |
| | TSP | (ARHGAP45, ZYX) | (0.61, 0.02) | (0.61, 0.98) |
| | TSP | (PCDHGC3, ZYX) | (0.63, 0.02) | (0.63, 0.98) |
| | AUCTSP | (SPTAN1, CD33) | (0.95, 0.01) | (0.95, 0.99) |
| BREAST-ER | TSP | (MUC2, ESR1)ᵃ | (0.72, 0.04) | (0.72, 0.96) |
| | TSP | (JAK3, ESR1) | (0.66, 0.04) | (0.66, 0.96) |
| | TSP | (GNB3, ESR1) | (0.56, 0.04) | (0.56, 0.96) |
| | TSP | (HARS2, ESR1) | (0.57, 0.04) | (0.57, 0.96) |
| | TSP | (ERF, ESR1) | (0.58, 0.04) | (0.58, 0.96) |
| | AUCTSP | (CTSC, ESR1) | (0.91, 0.04) | (0.91, 0.96) |
| BREAST-LN | TSP | (FABP3, ACVR1B) | (0.60, 0.69) | (0.60, 0.69) |
| | AUCTSP | (GYPB, ACVR1B) | (0.14, 0.69) | (0.86, 0.69) |
| DLBCL | TSP | (PDE4B, GPR12) | (0.73, 0.32) | (0.73, 0.68) |
| | AUCTSP | (POLR2J, PTGER4) | (0.30, 0.72) | (0.70, 0.72) |
| DLBCL-FL | TSP | (YWHAZ, SNRPB) | (0.80, 0.10) | (0.80, 0.90) |
| | AUCTSP | (FCGR1A, NEO1) | (0.06, 0.84) | (0.94, 0.84) |
| COLON | TSP | (VIP, DARS) | (0.82, 0.16) | (0.82, 0.84) |
| | AUCTSP | (MYH9, HNRNPA1) | (0.89, 0.24) | (0.89, 0.76) |
| PROSTATE | TSP | (CFD, ENO1) | (0.91, 0.27) | (0.91, 0.73) |
| | AUCTSP | (CFD, NUMB) | (0.91, 0.04) | (0.91, 0.96) |
ᵃ Indicates the TSP gene pair selected by [7] to break ties among pairs with equal TSP scores
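In this table every value in the last column equals max(v, 1 − v) of the corresponding value in the preceding column, so the last column measures distance from 0.5 regardless of direction. The column headers are lost in this excerpt; assuming, as the caption suggests, that 0.5 corresponds to the non-informative "pivot" gene, one per-gene quantity with that property is the empirical AUC of the gene on its own. The sketch below checks the folding relation on a few transcribed rows and computes a folded empirical AUC for two hypothetical synthetic genes; treating the tabulated values as AUCs is an assumption, and the helper names are illustrative.

```python
import math
import random


def empirical_auc(diseased, healthy):
    """Mann-Whitney estimate of P(D > H) for a single gene; ties count 1/2."""
    wins = sum((d > h) + 0.5 * (d == h) for d in diseased for h in healthy)
    return wins / (len(diseased) * len(healthy))


def fold(v):
    """Reflect about 0.5 so that v and 1 - v give the same deviation from a pivot."""
    return max(v, 1.0 - v)


# (fourth-column pair, fifth-column pair) for a few rows of the table above
ROWS = {
    "OVARIAN TSP (PKM2, OVGP1)": ((0.16, 0.03), (0.84, 0.97)),
    "LEUKEMIA TSP (SPTAN1, CD33)": ((0.05, 0.99), (0.95, 0.99)),
    "BREAST-LN AUCTSP (GYPB, ACVR1B)": ((0.14, 0.69), (0.86, 0.69)),
    "DLBCL TSP (PDE4B, GPR12)": ((0.73, 0.32), (0.73, 0.68)),
    "DLBCL-FL AUCTSP (FCGR1A, NEO1)": ((0.06, 0.84), (0.94, 0.84)),
    "PROSTATE TSP (CFD, ENO1)": ((0.91, 0.27), (0.91, 0.73)),
}
for name, (raw, folded) in ROWS.items():
    ok = all(math.isclose(fold(v), f, abs_tol=1e-9) for v, f in zip(raw, folded))
    print(f"{name}: folding matches tabulated values: {ok}")

# Hypothetical synthetic genes: a pivot gene (same distribution in both classes)
# and a differentially expressed (DE) gene shifted by 2 in the diseased class.
rng = random.Random(7)
healthy_pivot = [rng.gauss(0, 1) for _ in range(300)]
diseased_pivot = [rng.gauss(0, 1) for _ in range(300)]
healthy_de = [rng.gauss(0, 1) for _ in range(300)]
diseased_de = [rng.gauss(2, 1) for _ in range(300)]
print("pivot gene:", round(fold(empirical_auc(diseased_pivot, healthy_pivot)), 3))
print("DE gene:", round(fold(empirical_auc(diseased_de, healthy_de)), 3))
```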
Comparison of classifier accuracy (%) by TSP and AUCTSP for decreasing training-set sizes
| Test set fraction | OVARIAN TSP | OVARIAN AUCTSP | LEUKEMIA TSP | LEUKEMIA AUCTSP | COLON TSP | COLON AUCTSP | BREAST-LN TSP | BREAST-LN AUCTSP | BREAST-ER TSP | BREAST-ER AUCTSP | DLBCL TSP | DLBCL AUCTSP | DLBCL-FL TSP | DLBCL-FL AUCTSP | PROSTATE TSP | PROSTATE AUCTSP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1% | 87.18 | 93.39 | 97.89 | 97.89 | 88.98 | 96.59 | 89.76 | 94.66 | 84.26 | 91.07 | 78.50 | 78.88 | 95.80 | 99.30 | 91.90 | 91.90 |
| 5% | 87.48 | 89.43 | 96.02 | 96.12 | 84.45 | 92.45 | 86.03 | 89.35 | 75.40 | 84.11 | 78.20 | 78.50 | 91.46 | 96.23 | 90.70 | 90.50 |
| 10% | 77.43 | 82.78 | 91.64 | 92.27 | 76.76 | 95.01 | 89.76 | 94.66 | 84.26 | 91.06 | 77.20 | 78.02 | 83.18 | 92.49 | 81.34 | 80.37 |
| 15% | 76.96 | 79.7 | 88.2 | 90.9 | 72.71 | 73.02 | 77.85 | 78.6 | 65.84 | 75.07 | 72.84 | 76.73 | 83.02 | 87.57 | 79.10 | 79.50 |
| 20% | 70.71 | 73.95 | 84.32 | 89.1 | 61.39 | 79.15 | 86.03 | 89.35 | 75.39 | 84.10 | 69.23 | 75.35 | 71.30 | 75.45 | 68.70 | 76.06 |
| 25% | 72.2 | 76.6 | 81.27 | 87 | 53.75 | 67.65 | 82.05 | 85.48 | 71.20 | 80.80 | 66.79 | 72.11 | 66.87 | 67.14 | 63.30 | 74.35 |
| 30% | 61.15 | 80.38 | 77.53 | 81.1 | 41.38 | 42.39 | 77.85 | 78.6 | 65.84 | 75.06 | 63.41 | 72.13 | 67.35 | 66.74 | 53.30 | 60.7 |
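The accuracy table reports results at several test-set fractions. One way to produce figures of this kind for TSP is to average test accuracy over repeated random train/test splits, as sketched below. The number of repetitions, the unstratified splitting, and the synthetic data are assumptions rather than the protocol behind the table, and AUCTSP is left out because its scoring rule is not given in this excerpt.

```python
import numpy as np


def fit_tsp(x, y):
    """Fit TSP on x (samples x genes) with 0/1 labels y; returns (i, j, label_if_below)."""
    x0, x1 = x[y == 0], x[y == 1]
    p0 = (x0[:, :, None] < x0[:, None, :]).mean(axis=0)  # P(X_i < X_j | class 0)
    p1 = (x1[:, :, None] < x1[:, None, :]).mean(axis=0)  # P(X_i < X_j | class 1)
    score = np.triu(np.abs(p0 - p1), k=1)  # each unordered pair once
    i, j = np.unravel_index(np.argmax(score), score.shape)
    label_if_below = 0 if p0[i, j] > p1[i, j] else 1
    return int(i), int(j), label_if_below


def predict_tsp(x, i, j, label_if_below):
    """Predict label_if_below when gene i is below gene j, the other label otherwise."""
    return np.where(x[:, i] < x[:, j], label_if_below, 1 - label_if_below)


def split_accuracy(x, y, test_fraction, reps=100, seed=0):
    """Mean TSP test accuracy (%) over random, unstratified train/test splits."""
    rng = np.random.default_rng(seed)
    n_test = max(1, round(test_fraction * len(y)))
    accs = []
    for _ in range(reps):
        perm = rng.permutation(len(y))
        test, train = perm[:n_test], perm[n_test:]
        i, j, lab = fit_tsp(x[train], y[train])
        accs.append(np.mean(predict_tsp(x[test], i, j, lab) == y[test]))
    return 100.0 * float(np.mean(accs))


# Hypothetical data: 40 samples per class, 30 genes, one planted rank reversal.
# With at most 24 test samples, every training set keeps both classes.
rng = np.random.default_rng(1)
x = rng.normal(size=(80, 30))
y = np.repeat([0, 1], 40)
x[y == 0, 2] -= 2.5  # class 0: gene 2 tends to lie below gene 5
x[y == 1, 5] -= 2.5  # class 1: the order flips
for frac in (0.01, 0.05, 0.10, 0.20, 0.30):
    print(f"test fraction {frac:.0%}: accuracy {split_accuracy(x, y, frac):.1f}%")
```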
Fig. 1 Comparison of TSP vs. AUCTSP classification accuracy for different sizes of training sets: OVARIAN dataset
Fig. 2 Comparison of TSP vs. AUCTSP classification accuracy for different sizes of training sets: COLON dataset
Fig. 3 Comparison of TSP vs. AUCTSP classification accuracy for different sizes of training sets: LEUKEMIA dataset
Fig. 4 Comparison of TSP vs. AUCTSP classification accuracy for different sizes of training sets: BREAST-LN dataset
Fig. 5 Comparison of TSP vs. AUCTSP classification accuracy for different sizes of training sets: BREAST-ER dataset
Fig. 6 Comparison of TSP vs. AUCTSP classification accuracy for different sizes of training sets: DLBCL-FL dataset
Fig. 7 Comparison of TSP vs. AUCTSP classification accuracy for different sizes of training sets: DLBCL dataset
Fig. 8 Comparison of TSP vs. AUCTSP classification accuracy for different sizes of training sets: PROSTATE dataset