Boxue He1,2,3, Cong Wei3, Qidong Cai1,2, Pengfei Zhang1,2, Shuai Shi1,2, Xiong Peng1,2, Zhenyu Zhao1,2, Wei Yin1,2, Guangxu Tu1,2, Weilin Peng1,2, Yongguang Tao1,2,4,5, Xiang Wang6,7.
Abstract
BACKGROUND: Alternative splicing (AS) plays important roles in transcriptome and proteome diversity, and its dysregulation is closely associated with oncogenic processes. This study aimed to evaluate AS-based biomarkers for lung squamous cell carcinoma (LUSC) patients using machine learning algorithms.
Keywords: Alternative splicing; Biomarkers; Lung squamous cell carcinoma; Machine learning algorithms; Splicing switch
Year: 2022 PMID: 34986865 PMCID: PMC8734344 DOI: 10.1186/s12935-021-02429-2
Source DB: PubMed Journal: Cancer Cell Int ISSN: 1475-2867 Impact factor: 5.722
Fig. 1 Schematic diagram of the overall study
Fig. 2 UpSet plots of AS events included in this study (A) and AS events after preliminary exclusion (B), presented by splicing pattern. Exclusion criteria: AS events available in less than 30%, average PSI value ≤ 0.05, or standard deviation < 0.01. AS, alternative splicing; PSI, percent spliced in
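The exclusion criteria in the Fig. 2 caption can be read as a per-event filter. The following is a minimal sketch, not the authors' code: it assumes that "available in less than 30%" refers to the fraction of samples with a non-missing PSI value, that the standard deviation is the sample standard deviation, and that missing values are encoded as `None`. The function name `keep_event` and its parameters are hypothetical.

```python
from statistics import mean, stdev


def keep_event(psi_values, min_available=0.30, min_mean=0.05, min_sd=0.01):
    """Return True if an AS event passes the preliminary exclusion step.

    psi_values: one PSI value per sample, with None for a missing value.
    Thresholds follow the caption; their exact interpretation is an assumption.
    """
    observed = [v for v in psi_values if v is not None]
    # A standard deviation needs at least two observed values.
    if len(observed) < 2:
        return False
    # Exclude events available in less than 30% of samples (assumed meaning).
    if len(observed) / len(psi_values) < min_available:
        return False
    # Exclude events with average PSI <= 0.05.
    if mean(observed) <= min_mean:
        return False
    # Exclude events with standard deviation < 0.01.
    if stdev(observed) < min_sd:
        return False
    return True
```

Applied to a table of events, only those for which `keep_event` returns `True` would be carried into panel B.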
Fig. 3 Heat maps identifying AS events involved in the splicing switch of LUSC samples. A PSI levels of differentially expressed AS events between normal and LUSC samples after Boruta selection. B PSI levels of 16 pairs of completely negatively correlated AS events that show opposite expression patterns in normal and LUSC tissues
Clinicopathological characteristics of the TCGA LUSC cohort and of the patients from our department (test data)
| | TCGA data^a | Test data | P value |
|---|---|---|---|
| n | 501 | 20 | |
| Gender (%) | | | 0.795 |
| Female | 130 (25.9) | 6 (30.0) | |
| Male | 371 (74.1) | 14 (70.0) | |
| Age (mean (SD)) | 67.20 (8.58) | 53.90 (8.72) | < 0.001 |
| Smoking history (%) | | | 0.183 |
| Nonsmoker | 18 (3.7) | 2 (10.0) | |
| Smoker | 471 (96.3) | 18 (90.0) | |
| Stage (%) | | | 0.077 |
| Stage I | 244 (49.1) | 16 (80.0) | |
| Stage II | 162 (32.6) | 3 (15.0) | |
| Stage III | 84 (16.9) | 1 (5.0) | |
| Stage IV | 7 (1.4) | 0 (0.0) | |
^a Some information was missing for several patients in TCGA
Fig. 4 Verification of selected genes and important AS events by qRT-PCR in 20 pairs of normal and LUSC tissues. Each dot represents an individual patient. Results are expressed as mean ± SEM. N, normal; C, cancer
Fig. 5 Exploration of the LNM-related AS classifier. A Z-scores of the top 30 important AS events (the top 19 were confirmed as important features; the other 11 are shown as red boxes) and 20 other randomly selected events (in yellow) from the Boruta algorithm. B Mean cross-validation error as a function of the number of AS events in five rounds of five-fold cross-validation. C Heat map of PSI levels of AS events in the LNM classifier. The data were normalized with the R function scale. D ROC curves for identifying the LNM status of LUSC patients with the classifier under five-fold cross-validation. LNM, lymph node metastasis
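For readers unfamiliar with the normalization mentioned in panel C: with its default arguments, the R function `scale` centers each column of a matrix on its mean and divides by its sample standard deviation. The sketch below reproduces that behavior in plain Python for illustration only. Whether AS events were stored as columns or rows in the authors' matrix is not stated, so the column-wise layout here is an assumption, and `scale_columns` is a hypothetical name.

```python
from statistics import mean, stdev


def scale_columns(matrix):
    """Column-wise z-score, mirroring R's scale() with default arguments.

    matrix: list of rows, each row a list of numbers of equal length.
    Assumes every column has at least two values and non-zero spread
    (R would return NaN for a constant column; this sketch raises instead).
    """
    scaled_columns = []
    for column in zip(*matrix):
        center = mean(column)
        spread = stdev(column)  # sample SD (n - 1 denominator), as in R
        if spread == 0:
            raise ValueError("constant column cannot be scaled")
        scaled_columns.append([(value - center) / spread for value in column])
    # Transpose back to the original row-by-column layout.
    return [list(row) for row in zip(*scaled_columns)]
```

After scaling, every column has mean 0 and sample standard deviation 1, which puts events with very different PSI ranges on a common color scale in a heat map.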
Fig. 6 Prognostic model construction and efficiency assessment. A Survival-related AS events identified by both Cox regression and random survival forests. B, C Visualization of the risk score and survival for each patient. D The PSI levels of the 21-ASE signature in the high-risk and the low-risk groups. E The Kaplan–Meier survival curves for patients in the high-risk and the low-risk groups. F Time-dependent ROC curves for LUSC patients at 1, 3, and 5 years
Fig. 7 Forest plots and the nomogram for the prognosis of LUSC patients. The prognostic value of several clinical features and the risk model for LUSC patients, assessed by univariate Cox regression analysis (A) and multivariate Cox regression analysis (B). C The nomogram predicts the overall survival probability of LUSC patients
Fig. 8 Functional enrichment analyses. A The UpSet plot shows the 102 overlapping AS events selected by Cox regression and random survival forests. B Pathway analyses of the genes associated with OS-related splicing events. OS, overall survival
Fig. 9 Correlation analysis between splicing factors and AS events in the LUSC cohort. A The splicing network for splicing factors and AS events. Yellow nodes indicate splicing factors, red nodes indicate AS events associated with poor survival, and blue nodes indicate AS events associated with good survival; red lines represent positive correlations, and blue lines represent negative correlations. B The correlation between PSI values of FLT4-75015-AT or FLT4-75016-AT and the expression of TRA2B. C The correlation between PSI values of YPEL3-36066-AP and the expression of TRA2B. D The correlation between PSI values of LAMP2-89999-AT and the expression of CPSF6. E The correlation between PSI values of SSFA2-56439-AP or SSFA2-56438-AP and the expression of BUD31. Negative correlations are shown in dodger blue and positive correlations in medium violet
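Panels B to E each relate the PSI values of one AS event to the expression of one splicing factor across samples. The caption does not name the correlation statistic, so the sketch below uses the Pearson coefficient purely as an assumed example; a rank-based statistic would be an equally plausible reading. The function name `pearson_r` and the input layout (two equal-length lists, one value per sample) are hypothetical.

```python
from math import sqrt


def pearson_r(psi_values, expression_values):
    """Pearson correlation between per-sample PSI and expression values.

    Both inputs must have the same length (one entry per sample).
    Pearson is an assumption here; the source does not state the method.
    """
    n = len(psi_values)
    if n != len(expression_values) or n < 2:
        raise ValueError("need two equal-length inputs with at least 2 samples")
    mean_psi = sum(psi_values) / n
    mean_expr = sum(expression_values) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((p - mean_psi) * (e - mean_expr)
              for p, e in zip(psi_values, expression_values))
    sxx = sum((p - mean_psi) ** 2 for p in psi_values)
    syy = sum((e - mean_expr) ** 2 for e in expression_values)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / sqrt(sxx * syy)
```

A positive coefficient would correspond to a red edge in the network of panel A and a negative coefficient to a blue edge.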