Yang Yang1,2,3, Ning Huang4, Luning Hao4, Wei Kong5.
Abstract
BACKGROUND: MicroRNAs (miRNAs) have great potential as tumor biomarkers and therapeutic targets. With the rapid development of high-throughput experimental technology, gene expression experiments have become increasingly specialized and diversified. The resulting complex data structures pose a great challenge for biomarker identification. Meanwhile, current statistical and machine learning methods for detecting biomarkers suffer from low reliability and biased criteria.
Keywords: Biomarker; Clustering; MicroRNA
Year: 2017 PMID: 28361698 PMCID: PMC5374636 DOI: 10.1186/s12864-017-3498-8
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Flowchart of the feature extraction method
Sample statistics
| Characteristics | GSE22220 | GSE40525 |
|---|---|---|
| Grading | ||
| G1 | 42 | 3 |
| G2 | 87 | 31 |
| G3 | 65 | 27 |
| Nodal status | ||
| N0 | 127 | 29 |
| N+ | 92 | 32 |
| Estrogen receptor | ||
| Positive | 135 | 47 |
| Negative | 84 | 27 |
Comparison of four selection criteria on GSE22220
| Feature combination | Performance measure | FDA | T-test | SNR | Center^a |
|---|---|---|---|---|---|
| Pair | AvgRank 10 | 7.6 | 8.2 | 96.7 | 112.2 |
| Pair | AvgRank 100 | 82.9 | 84.4 | 412.3 | 557.5 |
| Pair | AvgRank 1000 | 1586.5 | 1624.4 | 3248.7 | 3355.7 |
| Pair | HitRatio 10 (%) | 80.0 | 70.0 | 0 | 0 |
| Pair | HitRatio 100 (%) | 59.0 | 58.0 | 6.0 | 4.0 |
| Pair | HitRatio 1000 (%) | 33.0 | 32.0 | 16.0 | 11.0 |
| Triple | AvgRank 10 | 9.9 | 8.6 | 333.1 | 333.3 |
| Triple | AvgRank 100 | 93.4 | 94.4 | 2607.6 | 2270.7 |
| Triple | AvgRank 1000 | 1612.2 | 1684.3 | 13833.3 | 14626.0 |
| Triple | HitRatio 10 (%) | 60.0 | 50.0 | 10.0 | 0 |
| Triple | HitRatio 100 (%) | 58.0 | 58.0 | 2.0 | 1.0 |
| Triple | HitRatio 1000 (%) | 28.0 | 27.1 | 1.3 | 1.5 |
| Quadruple | AvgRank 10 | 12.8 | 12.8 | 41.7 | 744.9 |
| Quadruple | AvgRank 100 | 115.4 | 108.5 | 408.2 | 4938.1 |
| Quadruple | AvgRank 1000 | 1482.3 | 1562.6 | 10491.6 | 39605.2 |
| Quadruple | HitRatio 10 (%) | 40.0 | 40.0 | 50.0 | 0 |
| Quadruple | HitRatio 100 (%) | 50.0 | 45.0 | 24.0 | 0 |
| Quadruple | HitRatio 1000 (%) | 26.2 | 24.4 | 8.4 | 0.5 |

^a Center denotes the method using the center gene as the representative member
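
For readers unfamiliar with the FDA, T-test, and SNR criteria named in the table above, the following is a minimal sketch of their common single-feature, two-class textbook forms. It is an assumption that these forms resemble what the paper uses; the paper's exact definitions are not given in this record, and the function names and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def fisher_score(a, b):
    """Fisher criterion (assumed form): squared mean gap over summed sample variances."""
    return (mean(a) - mean(b)) ** 2 / (variance(a) + variance(b))


def welch_t(a, b):
    """Welch two-sample t statistic (assumed form of the T-test criterion)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def snr(a, b):
    """Signal-to-noise ratio (assumed form): mean gap over summed standard deviations."""
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


# Hypothetical expression values of one miRNA in two sample groups.
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]

print(fisher_score(group_a, group_b))  # 4.5
print(snr(group_a, group_b))           # -1.5
print(welch_t(group_a, group_b))
```

Under these assumed forms, a larger absolute score indicates better separation of the two groups by that feature.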
Comparison of four selection criteria on GSE40525
| Feature combination | Performance measure | FDA | T-test | SNR | Center |
|---|---|---|---|---|---|
| Pair | AvgRank 10 | 9.1 | 9.1 | 39.1 | 57 |
| Pair | AvgRank 100 | 103.6 | 103.0 | 335.6 | 402.6 |
| Pair | AvgRank 1000 | 1469.8 | 1470.5 | 2085.6 | 2683.6 |
| Pair | HitRatio 10 (%) | 40.0 | 40.0 | 10.0 | 0 |
| Pair | HitRatio 100 (%) | 64.0 | 64.0 | 18.0 | 12.0 |
| Pair | HitRatio 1000 (%) | 39.0 | 39.2 | 24.4 | 15.8 |
| Triple | AvgRank 10 | 26.3 | 26.3 | 262.8 | 360.2 |
| Triple | AvgRank 100 | 229.4 | 229.4 | 1427.0 | 1737.2 |
| Triple | AvgRank 1000 | 1573.9 | 1577.2 | 9085.5 | 11949.7 |
| Triple | HitRatio 10 (%) | 20.0 | 20.0 | 0 | 0 |
| Triple | HitRatio 100 (%) | 66.0 | 66.0 | 6.0 | 4.0 |
| Triple | HitRatio 1000 (%) | 37.0 | 36.8 | 4.6 | 2.8 |
| Quadruple | AvgRank 10 | 174 | 174 | 229 | 191 |
| Quadruple | AvgRank 100 | 273 | 273 | 2610.9 | 1906.5 |
| Quadruple | AvgRank 1000 | 2836.0 | 2826.5 | 2926.7 | 2965.3 |
| Quadruple | HitRatio 10 (%) | 40.0 | 40.0 | 20.0 | 60.0 |
| Quadruple | HitRatio 100 (%) | 4.0 | 4.0 | 2.6 | 6.0 |
| Quadruple | HitRatio 1000 (%) | 19.3 | 19.0 | 2.6 | 4.8 |
Fig. 2 AvgRank of top 100 lists obtained by the four methods for GSE22220
Fig. 3 HitRatio of top 100 lists obtained by the four methods for GSE22220
Fig. 4 MSL curve of the initial clusters of GSE40525
Comparison of clustering methods on two data sets
| Feature combination | Performance measure | GSE22220 HC^b | GSE22220 RC^a | GSE40525 HC^b | GSE40525 RC^a |
|---|---|---|---|---|---|
| Pair | AvgRank 10 | 7 | 8.2 | 8.5 | 9.1 |
| Pair | AvgRank 100 | 92.2 | 84.4 | 180.9 | 103.0 |
| Pair | AvgRank 1000 | 2003.1 | 1624.4 | 10696.8 | 1470.5 |
| Pair | HitRatio 10 (%) | 70.0 | 70.0 | 60.0 | 40.0 |
| Pair | HitRatio 100 (%) | 54.0 | 58.0 | 32.0 | 64.0 |
| Pair | HitRatio 1000 (%) | 30.0 | 32.0 | 15.0 | 39.2 |
| Triple | AvgRank 10 | 8.2 | 8.6 | 9.2 | 26.3 |
| Triple | AvgRank 100 | 95.4 | 94.3 | 68.7 | 229.4 |
| Triple | AvgRank 1000 | 1776.2 | 1684.3 | 3675.2 | 1577.2 |
| Triple | HitRatio 10 (%) | 60.0 | 60.0 | 20.0 | 20.0 |
| Triple | HitRatio 100 (%) | 58.0 | 58.0 | 71.0 | 66.0 |
| Triple | HitRatio 1000 (%) | 27.1 | 27.1 | 30.3 | 36.8 |
| Quadruple | AvgRank 10 | 14.2 | 12.8 | 9.0 | 174.0 |
| Quadruple | AvgRank 100 | 112.6 | 108.5 | 257.2 | 273.0 |
| Quadruple | AvgRank 1000 | 1639.1 | 1482.3 | 3171.3 | 2826.6 |
| Quadruple | HitRatio 10 (%) | 30.0 | 40.0 | 78.0 | 40.0 |
| Quadruple | HitRatio 100 (%) | 48.0 | 50.0 | 12.0 | 4.0 |
| Quadruple | HitRatio 1000 (%) | 23.2 | 26.4 | 16.0 | 19.3 |

^a RC: refined clustering, in which the inconsistency coefficient for raw clusters and the thresholds of MSL and MLR are fixed
^b HC: hierarchical clustering; the best result obtained by trying different inconsistency coefficients is reported
Comparison of feature selection methods on two data sets^a
| Methods | Feature # | GSE22220 Sensitivity | GSE22220 Specificity | GSE22220 TA | GSE40525 Sensitivity | GSE40525 Specificity | GSE40525 TA |
|---|---|---|---|---|---|---|---|
| CFS | 29/6 | 0.984 | 0.744 | 0.783 | 0.942 | 0.925 | 0.933 |
| BFS | 4/3 | 0.953 | 0.733 | 0.758 | 0.904 | 0.904 | 0.904 |
| | 2 | 0.976 | 0.701 | 0.729 | 0.923 | 0.906 | 0.913 |
| | 3 | 0.913 | 0.753 | 0.763 | 0.923 | 0.923 | 0.923 |
| | 4 | 0.945 | 0.764 | 0.787 | 0.923 | 0.923 | 0.923 |
| Consistency | 13/5 | 0.953 | 0.771 | 0.797 | 0.942 | 0.925 | 0.933 |
| IG | 2 | 0.976 | 0.701 | 0.729 | 0.827 | 0.896 | 0.865 |
| IG | 3 | 0.913 | 0.753 | 0.763 | 0.942 | 0.925 | 0.933 |
| IG | 4 | 0.945 | 0.764 | 0.787 | 0.923 | 0.906 | 0.913 |
| RF | 2 | 0.976 | 0.701 | 0.729 | 0.923 | 0.906 | 0.913 |
| RF | 3 | 0.913 | 0.734 | 0.744 | 0.942 | 0.891 | 0.913 |
| RF | 4 | 0.953 | 0.747 | 0.773 | 0.942 | 0.907 | 0.923 |
| t-test | 2 | 0.913 | 0.753 | 0.763 | 0.923 | 0.906 | 0.913 |
| t-test | 3 | 0.890 | 0.807 | 0.802 | 0.923 | 0.889 | 0.904 |
| t-test | 4 | 0.890 | 0.837 | 0.826 | 0.942 | 0.891 | 0.913 |
| Wilcoxon test | 2 | 0.913 | 0.753 | 0.763 | 0.923 | 0.906 | 0.913 |
| Wilcoxon test | 3 | 0.890 | 0.807 | 0.802 | 0.942 | 0.891 | 0.913 |
| Wilcoxon test | 4 | 0.937 | 0.793 | 0.812 | 0.942 | 0.925 | 0.933 |
| CluFDA^b | 2 | 0.969 | 0.750 | 0.783 | 0.923 | 0.923 | 0.923 |
| CluFDA^b | 3 | 0.976 | 0.775 | 0.812 | 0.942 | 0.925 | 0.933 |
| CluFDA^b | 4 | 0.906 | 0.833 | | 0.962 | 0.943 | |

^a The numbers before and after '/' denote the feature numbers for GSE22220 and GSE40525, respectively. Sensitivity = TP/(TP+FN); Specificity = TN/(FP+TN); TA: total accuracy.
^b CluFDA denotes the clustering-based feature selection that uses the FDA method to select representative miRNAs
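
As an illustration of the measures defined in the footnote above, the sketch below computes sensitivity, specificity, and total accuracy from confusion-matrix counts. The sensitivity and specificity formulas are taken from the footnote; the TA formula, (TP+TN) over all samples, is the usual definition of total accuracy and is assumed here because the record does not spell it out. The function name and the counts are hypothetical.

```python
def classification_rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, total accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)   # TP / (TP + FN), as in the footnote
    specificity = tn / (fp + tn)   # TN / (FP + TN), as in the footnote
    # Assumed definition of TA: correctly classified samples over all samples.
    total_accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, total_accuracy


# Hypothetical counts, not taken from GSE22220 or GSE40525.
sens, spec, ta = classification_rates(tp=45, fn=5, tn=40, fp=10)
print(round(sens, 3), round(spec, 3), round(ta, 3))  # 0.9 0.8 0.85
```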
Most frequent miRNAs in pairs and triples^a
| MiRNA (GSE22220) | | MiRNA (GSE40525) | |
|---|---|---|---|
| | 2.09E-10 | hsa-miR-139-5p | 2.37E-24 |
| | 2.79E-10 | hsa-miR-378 | 7.59E-20 |
| | 7.01E-09 | hsa-miR-145 | 5.07E-18 |
| | 1.43E-08 | hsa-miR-125b-2* | 1.53E-14 |
| hsa-miR-577 | 1.02E-07 | | 1.30E-10 |
| | 1.51E-07 | | 1.34E-10 |
| hsa-miR-18a | 1.89E-07 | | 1.02E-08 |
| | 2.28E-07 | | |

^a MiRNAs that have evidence of association with breast cancer (from HMDD and miR2Disease) are in bold
Most frequent miRNAs in pairs and triples
| MiRNA name | PMID | Description |
|---|---|---|
| hsa-mir-18a | 16754881 | Copy number loss |
| hsa-mir-18a | 19684618 | Higher levels of expression in ERalpha-negative tumors |
| hsa-mir-18a | 19624877 | Differentially expressed between breast cancer cells and mammary epithelial cells, highly expressed in MCF-7 cells |
| hsa-mir-18a | 21755340 | Expression was much higher in ERa-negative than in ERa-positive tumors. |
| hsa-mir-146b | 16461460 | Overexpressed |
| hsa-mir-146b | 19190326 | miR-146: Breast cancer metastasis suppressor 1 up-regulates miR-146, which suppresses breast cancer metastasis |
| hsa-mir-146b | 18634034 | miR-146: rs2910164 was associated with increased risk of breast cancer in Chinese women |
| hsa-mir-146b | 21409395 | miR-146b-5p preferentially expressed in normal basal cells |
| hsa-mir-146b | 21472990 | Down-regulation of BRCA1 expression by miR-146a and miR-146b-5p in triple negative sporadic breast cancers. |
| hsa-mir-149 | 18634034 | miR-149: rs2292832 was associated with increased risk of breast cancer in Chinese women |
| hsa-mir-224 | 21953071 | Down-regulated during lobular neoplasia progression compared to normal epithelium. |
| hsa-mir-224 | 22809510 | MicroRNA-224 targets RKIP to control cell invasion and expression of metastasis genes in human breast cancer cells. |
| hsa-mir-452 | 22353773 | Differentially expressed between serum samples from patients with cancer and serum samples from healthy controls |
| hsa-miR-365 | 18812439 | Up-regulated greater than 2-fold in BC compared with NAT, potential target genes include members of RAS oncogenes. |
| hsa-mir-340 | 21225860 | Inhibition of breast cancer cell migration and invasion through targeting of oncoprotein c-Met |
| hsa-mir-340 | 21692045 | Inhibits breast cancer cell migration and invasion through targeting of oncoprotein c-Met. |
| hsa-mir-100 | 21634028 | Regulates beta-tubulin isotypes in MCF7 breast cancer cells. |
| hsa-mir-100 | 22926517 | Suppresses IGF2 and inhibits breast tumorigenesis by interfering with proliferation and survival signaling. |
| hsa-mir-141 | 18376396 | Downregulated |
| hsa-mir-141 | 22952344 | CTC (circulating tumour cells)-positive had significantly higher levels of miR-141 than CTC-negative MBC and controls. |
Accuracies of different feature subsets^a
| Feature subset | Accuracy measure | GSE22220 T-test | GSE22220 FDA | GSE22220 SNR | GSE22220 Center | GSE40525 T-test | GSE40525 FDA | GSE40525 SNR | GSE40525 Center |
|---|---|---|---|---|---|---|---|---|---|
| All^a | Sensitivity | 0.874 | 0.882 | 0.921 | 0.906 | 0.962 | 0.962 | 0.962 | 0.942 |
| All^a | Specificity | 0.804 | 0.794 | 0.770 | 0.762 | 0.806 | 0.781 | 0.806 | 0.817 |
| All^a | TA | 0.792 | 0.787 | 0.783 | 0.768 | 0.865 | 0.846 | 0.865 | 0.865 |
| Pair | Sensitivity | 0.969 | 0.969 | 0.929 | 0.984 | 0.923 | 0.923 | 0.885 | 0.846 |
| Pair | Specificity | 0.750 | 0.750 | 0.756 | 0.714 | 0.923 | 0.923 | 0.920 | 0.917 |
| Pair | TA | 0.783 | 0.783 | 0.773 | 0.749 | 0.923 | 0.923 | 0.904 | 0.885 |
| Triple | Sensitivity | 0.976 | 0.976 | 0.850 | 0.984 | 0.942 | 0.942 | 0.942 | 0.904 |
| Triple | Specificity | 0.775 | 0.775 | 0.812 | 0.714 | 0.925 | 0.925 | 0.925 | 0.922 |
| Triple | TA | 0.812 | 0.812 | 0.787 | 0.749 | 0.933 | 0.933 | 0.933 | 0.913 |
| Quadruple | Sensitivity | 0.906 | 0.906 | 0.906 | 0.890 | 0.942 | 0.942 | 0.887 | 0.923 |
| Quadruple | Specificity | 0.833 | 0.833 | 0.821 | 0.819 | 0.961 | 0.961 | 0.940 | 0.960 |
| Quadruple | TA | | | 0.821 | 0.812 | | | 0.915 | 0.942 |

^a All: the full set of representative miRNAs selected from clusters. Sensitivity = TP/(TP+FN); Specificity = TN/(FP+TN); TA: total accuracy.