Ying Liang, Ze-Qun Zhang, Nian-Nian Liu, Ya-Nan Wu, Chang-Long Gu, Ying-Long Wang.
Abstract
BACKGROUND: Accumulating evidence shows that many long non-coding RNAs (lncRNAs) play key roles in human biological processes and are closely linked to numerous human diseases. Predicting potential lncRNA-disease associations can help identify disease biomarkers and support disease analysis and prevention. Establishing effective computational methods for lncRNA-disease association prediction is therefore critical.
Keywords: Attention mechanism; Convolutional neural network; Graph convolutional network; LncRNA-disease associations; Multi-view; Stacking ensemble model
Year: 2022 PMID: 35590258 PMCID: PMC9118755 DOI: 10.1186/s12859-022-04715-w
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 Performance of MAGCNSE using different parameters. (a) Comparison of the AUC values under different GCN embedding sizes. (b) Comparison of the AUC values under different numbers of filters in the CNN. (c) Comparison of the AUC values under different numbers of GCN layers. (d) Comparison of the AUC values under different numbers of base classifiers
Fig. 2 ROC curves (a) and PR curves (b) of MAGCNSE and its variants
Comparison of the evaluation metrics between MAGCNSE and its four variants
| Method | Accuracy | Sensitivity | Specificity | Precision | F1-score | MCC |
|---|---|---|---|---|---|---|
| MAGCNSE-fgl | 0.9029 | 0.9013 | 0.9043 | 0.8984 | 0.8998 | 0.8056 |
| MAGCNSE-natt | 0.9013 | 0.9068 | 0.8959 | 0.8952 | 0.901 | 0.8026 |
| MAGCNSE-nattcnn | 0.8885 | 0.9003 | 0.8783 | 0.8647 | 0.8822 | 0.7771 |
| MAGCNSE-ncnn | 0.9013 | 0.896 | 0.907 | 0.9128 | 0.9043 | 0.8025 |
| MAGCNSE | | | | | | |
The bold number is the highest value in each column, which demonstrates the superiority of our model
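The metrics compared in the table above can all be derived from the four confusion-matrix counts. The following is a minimal sketch using the standard textbook definitions; the function name `binary_metrics` and the example counts are illustrative assumptions, not values or code from the paper.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall on positive (associated) pairs
    specificity = tn / (tn + fp)   # recall on negative (unassociated) pairs
    precision = tp / (tp + fp)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient uses all four counts.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }

# Made-up counts, for illustration only.
metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because MCC combines all four counts, it remains informative even when accuracy alone looks similar across methods.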
Fig. 3 ROC curves (a) and PR curves (b) of MAGCNSE and traditional ML classifiers
Comparison of the evaluation metrics between MAGCNSE and six traditional machine learning classifiers
| Method | Accuracy | Sensitivity | Specificity | Precision | F1-score | MCC |
|---|---|---|---|---|---|---|
| RandomForest | 0.8945 | 0.877 | 0.9120 | 0.9089 | 0.8926 | 0.7896 |
| ExtraTrees | 0.8958 | 0.8859 | 0.9057 | 0.9042 | 0.8948 | 0.7921 |
| XGBoost | 0.9076 | 0.9101 | 0.9050 | 0.9056 | 0.9078 | 0.8153 |
| LightGBM | 0.9037 | 0.9031 | 0.9044 | 0.9052 | 0.9036 | 0.8085 |
| CatBoost | 0.9108 | 0.9146 | 0.9070 | 0.9079 | 0.9111 | 0.8218 |
| LogisticRegression | 0.8652 | 0.8470 | 0.8834 | 0.8792 | 0.8627 | 0.7312 |
| MAGCNSE | | | | | | |
The bold number is the highest value in each column, which demonstrates the superiority of our model
Comparison of the AUC values and AUPR values of MAGCNSE using GCN and other graph models
| Method | GAT | GraphSAGE | GCN |
|---|---|---|---|
| AUC | 0.9668 | 0.9713 | |
| AUPR | 0.9713 | 0.9723 | |
| Accuracy | 0.9045 | 0.9188 | |
| Sensitivity | 0.8929 | 0.9192 | |
| Specificity | 0.9156 | 0.9142 | |
| Precision | 0.9106 | 0.9202 | |
| F1-score | 0.9016 | 0.9217 | |
| MCC | 0.8089 | 0.8374 | |
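For readers unfamiliar with the two ranking metrics in this table, here is a small sketch of how AUC and an AUPR-style score can be computed from labels and predicted scores. It assumes the pairwise (Mann-Whitney) form of AUC and average precision as the summary of the PR curve; how the paper itself integrates the curves is not stated here, and the function names and toy data are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    # A tie between a positive and a negative counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values measured at each positive in ranked order.

    Tied scores are ordered arbitrarily by the sort.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Toy labels and scores, for illustration only.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(toy_labels, toy_scores), average_precision(toy_labels, toy_scores))
```

The pairwise form is quadratic in the number of samples, which is fine for a sketch but not for large candidate sets.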
Fig. 4 ROC curves (a) and PR curves (b) of MAGCNSE and other state-of-the-art methods
Fig. 5 AUC values and AUPR values of MAGCNSE using different views
The top 10 predicted colon cancer-associated lncRNAs
| Rank | lncRNA name | Evidence |
|---|---|---|
| 1 | CDKN2B-AS1 | MNDR v3.1 |
| 2 | NPTN-IT1 | Unconfirmed |
| 3 | HOXA11-AS | Unconfirmed |
| 4 | AFAP1-AS1 | Lnc2Cancer 3.0, MNDR v3.1 |
| 5 | PCAT1 | PMID: 33277833 |
| 6 | GAS5 | Lnc2Cancer 3.0, MNDR v3.1 |
| 7 | CRNDE | MNDR v3.1 |
| 8 | CASC2 | PMID: 32655801 |
| 9 | SNHG16 | Lnc2Cancer 3.0, MNDR v3.1 |
| 10 | SPRY4-IT1 | PMID: 28651500 |
The top 10 predicted lung cancer-associated lncRNAs
| Rank | lncRNA name | Evidence |
|---|---|---|
| 1 | ZFAS1 | PMID: 31692094 |
| 2 | LINC-ROR | Lnc2Cancer 3.0, MNDR v3.1 |
| 3 | CRNDE | PMID: 30554121 |
| 4 | HOXA11-AS | Lnc2Cancer 3.0, MNDR v3.1 |
| 5 | CYTOR | MNDR v3.1 |
| 6 | PTENP1 | Unconfirmed |
| 7 | XIST | MNDR v3.1 |
| 8 | DRAIC | PMID: 30544991 |
| 9 | NEAT1 | Lnc2Cancer 3.0, MNDR v3.1 |
| 10 | NPTN-IT1 | Unconfirmed |
The top 10 predicted cervical cancer-associated lncRNAs
| Rank | lncRNA name | Evidence |
|---|---|---|
| 1 | CCAT2 | LncRNADisease v2.0 |
| 2 | MALAT1 | LncRNADisease v2.0 |
| 3 | H19 | LncRNADisease v2.0 |
| 4 | TUG1 | LncRNADisease v2.0 |
| 5 | CDKN2B-AS1 | LncRNADisease v2.0 |
| 6 | UCA1 | LncRNADisease v2.0 |
| 7 | HOTAIR | LncRNADisease v2.0 |
| 8 | MEG3 | LncRNADisease v2.0 |
| 9 | CCAT1 | LncRNADisease v2.0 |
| 10 | GAS5 | LncRNADisease v2.0 |
Fig. 6 The flowchart of MAGCNSE. Step 1: extract features from the 3 views of similarity graphs of lncRNAs and the 2 views of similarity graphs of diseases using GCN. Step 2: leverage an attention mechanism to adaptively assign weights to the different feature matrices of lncRNAs and diseases. Step 3: acquire the final representations of lncRNAs and diseases by further extracting features from the multi-channel feature matrices of lncRNAs and diseases using the CNN. Step 4: employ a stacking ensemble classifier to make LDA predictions
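Step 1 of the flowchart can be illustrated with a single graph-convolution layer applied to each similarity view. This is a minimal sketch that assumes the widely used symmetric-normalization form, ReLU(D^-1/2 (A + I) D^-1/2 X W); the exact layer used in MAGCNSE may differ, and the helper names, toy similarity matrices, and weights below are all hypothetical. Steps 2 to 4 are not shown.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj, feats, weight):
    """One graph-convolution step on a (weighted) similarity graph."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    # Symmetric normalization: D^-1/2 (A + I) D^-1/2.
    a_norm = [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j]
               for j in range(n)] for i in range(n)]
    out = matmul(a_norm, matmul(feats, weight))
    return [[max(0.0, v) for v in row] for row in out]  # ReLU

def embed_views(views, feats, weights):
    """Run one GCN layer per similarity view, giving one feature channel per view."""
    return [gcn_layer(adj, feats, w) for adj, w in zip(views, weights)]

# Two toy similarity views over three nodes (made-up numbers).
view_a = [[0.0, 0.8, 0.1], [0.8, 0.0, 0.3], [0.1, 0.3, 0.0]]
view_b = [[0.0, 0.2, 0.9], [0.2, 0.0, 0.4], [0.9, 0.4, 0.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]  # 3 input dims -> 2 embedding dims
channels = embed_views([view_a, view_b], identity, [w, w])
print(len(channels), len(channels[0]), len(channels[0][0]))
```

Stacking the per-view outputs as channels is what makes the later attention weighting and CNN steps applicable to them.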
Fig. 7 The flowchart of the stacking ensemble classifier
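The general idea behind a stacking ensemble can be sketched as follows: base learners produce out-of-fold predictions, and a meta-learner is trained on those predictions. This is a generic illustration only. The toy `Stump` base learner, the interleaved fold split, and the logistic meta-learner are placeholders chosen to keep the example dependency-free, not the components used in MAGCNSE.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class Stump:
    """Toy base learner (placeholder): logistic score on a single feature."""
    def __init__(self, feature):
        self.feature = feature

    def fit(self, X, y):
        pos = [x[self.feature] for x, t in zip(X, y) if t == 1]
        neg = [x[self.feature] for x, t in zip(X, y) if t == 0]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        self.threshold = (mean_pos + mean_neg) / 2
        self.sign = 1.0 if mean_pos >= mean_neg else -1.0
        return self

    def predict_proba(self, X):
        return [sigmoid(self.sign * (x[self.feature] - self.threshold)) for x in X]

def fit_logistic(Z, y, lr=0.5, epochs=500):
    """Meta-learner: logistic regression trained by full-batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(y)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(b + sum(wi * zi for wi, zi in zip(w, z))) - t
            grad_w = [g + err * zi for g, zi in zip(grad_w, z)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def stack_fit(X, y, factories, k=5):
    """Build out-of-fold base predictions, then fit the meta-learner on them."""
    n = len(X)
    oof = [[0.0] * len(factories) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held = [i for i in range(n) if i % k == fold]
        for j, make in enumerate(factories):
            model = make().fit([X[i] for i in train], [y[i] for i in train])
            for i, p in zip(held, model.predict_proba([X[i] for i in held])):
                oof[i][j] = p
    meta = fit_logistic(oof, y)
    bases = [make().fit(X, y) for make in factories]  # refit on all data
    return bases, meta

def stack_predict(bases, meta, X):
    w, b = meta
    cols = [m.predict_proba(X) for m in bases]
    return [sigmoid(b + sum(wj * col[i] for wj, col in zip(w, cols)))
            for i in range(len(X))]

# Synthetic, well-separated data (for illustration only).
rng = random.Random(0)
y = [1 if i % 4 < 2 else 0 for i in range(40)]
X = [[2 * t + rng.gauss(0, 0.2), -2 * t + rng.gauss(0, 0.2)] for t in y]
bases, meta = stack_fit(X, y, [lambda: Stump(0), lambda: Stump(1)], k=5)
probs = stack_predict(bases, meta, X)
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for samples the base learners have already seen, is what keeps the second stage from simply memorizing base-learner overfitting.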
Key hyperparameters of the six traditional classifiers and their optimal values after grid search
| Method | Optimal hyperparameters |
|---|---|
| RandomForest | max_feature=10; min_sample_split=2; n_estimators=2000 |
| ExtraTrees | max_feature=10; min_sample_split=2; n_estimators=2000 |
| XGBoost | learning_rate=0.05; max_depth=4; gamma=0; n_estimators=1000 |
| LightGBM | learning_rate=0.15; max_depth=10; num_leaves=31; n_estimators=200 |
| CatBoost | depth=3; iteration=800; learning_rate=0.1; border_count=32; l2_leaf_reg=5 |
| LogisticRegression | C=20.0; max_iter=40; penalty='l2' |
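The table above can be captured as a plain configuration mapping, which is convenient for reproducing the baselines. This sketch assumes that the table's `max_feature`, `min_sample_split`, and `iteration` correspond to the `max_features`, `min_samples_split`, and `iterations` keyword spellings used by scikit-learn and CatBoost; that mapping, the `OPTIMAL_PARAMS` name, and the `format_call` helper are assumptions for illustration. No classifier library is imported here.

```python
# Optimal values copied from the table; keyword spellings are assumed
# to match the usual constructor arguments of each library.
OPTIMAL_PARAMS = {
    "RandomForest": {"max_features": 10, "min_samples_split": 2, "n_estimators": 2000},
    "ExtraTrees": {"max_features": 10, "min_samples_split": 2, "n_estimators": 2000},
    "XGBoost": {"learning_rate": 0.05, "max_depth": 4, "gamma": 0, "n_estimators": 1000},
    "LightGBM": {"learning_rate": 0.15, "max_depth": 10, "num_leaves": 31, "n_estimators": 200},
    "CatBoost": {"depth": 3, "iterations": 800, "learning_rate": 0.1,
                 "border_count": 32, "l2_leaf_reg": 5},
    "LogisticRegression": {"C": 20.0, "max_iter": 40, "penalty": "l2"},
}

def format_call(name):
    """Render one entry as a constructor-style string, e.g. for a log file."""
    args = ", ".join(f"{key}={value!r}" for key, value in OPTIMAL_PARAMS[name].items())
    return f"{name}({args})"

for method in OPTIMAL_PARAMS:
    print(format_call(method))
```

Each inner dictionary could then be unpacked with `**` into the matching classifier constructor, provided the installed library version accepts those keywords.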