Tianci Song, Sha Cao, Sheng Tao, Sen Liang, Wei Du, Yanchun Liang.
Abstract
Aberrant alterations of biological functions are well known to occur in tumorigenesis and cancer development. Hence, with advances in high-throughput sequencing technologies, capturing and quantifying the functional alterations in cancers from expression profiles, in order to explore the malignant processes of cancer, has become an important topic in cancer research. In this article, we propose an algorithm for quantifying biological processes from gene expression profiles over a sample population. It constructs principal curves that condense the information in each biological process into a novel score computed for each individual sample. After we applied our method to several large-scale breast cancer datasets in survival analysis, a subset of the biological processes extracted from the corresponding survival model was found to have significant associations with clinical outcomes. Further analyses of these biological processes enable the study of the interplay between biological processes and cancer phenotypes of interest, provide valuable insights into cancer biology at the biological-process level, and guide precision treatment for cancer patients. Notably, prognosis predictions based on our method are consistently superior to those of existing state-of-the-art methods with the same aim.
Year: 2017 PMID: 28680165 PMCID: PMC5498659 DOI: 10.1038/s41598-017-04961-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. An example of quantifying biological processes with the Local Principal Curves (LPCs) algorithm. The inputs to the proposed method are a simulated gene expression matrix with n samples and p genes, and the k biological processes of interest. 1) For a selected biological process, we preprocess the expression matrix of the three genes involved in it by Principal Component Analysis (PCA), and then choose suitable Principal Components (PCs) to construct the mapping space (see Method). For simplicity, each gene is identified here with a PC. The plot illustrates the distribution of samples in a mapping space consisting of three PCs. 2) We extend an LPC from a starting point and compute the projection indices of the samples as Biological Process Scores (BPSs) (see Method). The plot illustrates a well-defined curve passing through the data cloud, with each sample projected onto its corresponding position on the curve. The output of the proposed method is a matrix of BPSs.
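The two steps in Figure 1 can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the PC-selection rule (`var_threshold`), the starting point (the sample nearest the centroid), the kernel bandwidth `h`, the step length `t`, and the stopping rules are all hypothetical choices. The curve tracer is a simplified version of the local principal curve idea: it alternates a kernel-weighted local mean with a step along the local first principal direction. A sample's score is its arc-length position on the curve, rescaled to [0, 1].

```python
import numpy as np


def pca_mapping_space(expr, var_threshold=0.9):
    """Map samples onto the leading PCs of one biological process's genes.

    expr: (n_samples, n_genes) sub-matrix for the genes of a single process.
    var_threshold: hypothetical rule for how many PCs to keep.
    """
    centered = expr - expr.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    cum_var = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(cum_var, var_threshold)) + 1, len(s))
    return centered @ vt[:k].T  # PC scores, shape (n_samples, k)


def _trace_branch(x, start, sign, h, t, max_steps):
    """Follow local means and local first PCs away from `start` in one direction."""
    points, point, prev_dir = [], start, None
    for _ in range(max_steps):
        w = np.exp(-np.sum((x - point) ** 2, axis=1) / (2 * h**2))
        if w.sum() < 1e-8:  # stepped off the data cloud
            break
        w /= w.sum()
        mu = w @ x  # kernel-weighted local mean
        diff = x - mu
        cov = (diff * w[:, None]).T @ diff  # local weighted covariance
        gamma = np.linalg.eigh(cov)[1][:, -1]  # local first principal direction
        if prev_dir is None:
            gamma = sign * gamma
        elif gamma @ prev_dir < 0:
            gamma = -gamma  # keep a consistent orientation along the branch
        if points and np.linalg.norm(mu - points[-1]) < 1e-3 * t:
            break  # hypothetical stopping rule: local mean stopped moving
        points.append(mu)
        prev_dir = gamma
        point = mu + t * gamma  # step of length t along the local direction
    return points


def local_principal_curve(x, h=0.25, t=0.25, max_steps=100):
    """Trace a simplified LPC in both directions; h, t are hypothetical defaults."""
    # Hypothetical starting point: the sample closest to the centroid.
    start = x[np.argmin(np.sum((x - x.mean(axis=0)) ** 2, axis=1))]
    forward = _trace_branch(x, start, +1.0, h, t, max_steps)
    backward = _trace_branch(x, start, -1.0, h, t, max_steps)
    return np.array(backward[::-1] + forward[1:])  # both branches share point 0


def projection_indices(x, curve):
    """Arc-length position of each sample's closest point on the curve, in [0, 1]."""
    seg_start = curve[:-1]
    seg_vec = curve[1:] - curve[:-1]
    seg_len = np.linalg.norm(seg_vec, axis=1)
    cum_len = np.concatenate([[0.0], np.cumsum(seg_len)])
    scores = np.empty(len(x))
    for i, xi in enumerate(x):
        # Orthogonal projection onto every segment, clipped to the segment.
        lam = np.sum((xi - seg_start) * seg_vec, axis=1) / np.maximum(seg_len**2, 1e-12)
        lam = np.clip(lam, 0.0, 1.0)
        proj = seg_start + lam[:, None] * seg_vec
        j = np.argmin(np.sum((xi - proj) ** 2, axis=1))
        scores[i] = cum_len[j] + lam[j] * seg_len[j]
    return scores / cum_len[-1]


rng = np.random.default_rng(0)
latent = rng.uniform(0, 1, 200)  # simulated hidden activity of one process
expr = np.column_stack([2 * latent, 2 * latent**2, -latent])  # three genes
expr += rng.normal(0, 0.05, expr.shape)
mapped = pca_mapping_space(expr, var_threshold=0.99)
curve = local_principal_curve(mapped)
bps = projection_indices(mapped, curve)
print(abs(np.corrcoef(bps, latent)[0, 1]))  # near 1 when the curve follows the data
```

Because a principal curve has no inherent orientation, the sign of the scores is arbitrary, so the demo compares the absolute correlation with the simulated latent variable. Running this per process and stacking the score vectors as columns gives a samples-by-processes BPS matrix.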
Figure 2. Comparison of the association between selected biological processes and clinical outcomes for the gene-based model and the biological process-based models. The samples are divided into two groups by hierarchical clustering of, respectively, the PAM50 gene expression matrix, the PDS matrix produced by Pathifier, and the BPS matrix produced by LPC for the selected biological processes. Green and red denote alive and dead survival status, respectively. White and black denote positive and negative status, respectively, for ER, PR, p53 mutation, and lymph node. Yellow, blue and dark green denote the different grades of the samples. The p-values for the association of each clinical variable, and of the two dichotomized groups, with survival status are calculated using chi-square tests.
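The grouping and testing described for Figure 2 can be sketched with SciPy, assuming samples are the rows of the score matrix. The linkage method (Ward on Euclidean distances) is an assumption, since the caption does not state which was used, and the simulated data are purely illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import chi2_contingency


def cluster_two_groups(score_matrix, method="ward"):
    """Split samples (rows) into two groups by agglomerative clustering.

    method="ward" is an assumption; the source does not name the linkage.
    """
    tree = linkage(score_matrix, method=method)  # Euclidean distance
    return fcluster(tree, t=2, criterion="maxclust")  # cluster labels 1 and 2


def chi_square_p(groups, status):
    """Chi-square test of independence between group labels and a 0/1 status."""
    table = np.array([[np.sum((groups == g) & (status == s)) for s in (0, 1)]
                      for g in np.unique(groups)])
    _, p, _, _ = chi2_contingency(table)
    return p


rng = np.random.default_rng(1)
# Simulated score matrix: 100 samples x 10 features, two shifted populations.
scores = np.vstack([rng.normal(0, 1, (50, 10)), rng.normal(3, 1, (50, 10))])
# Simulated survival status (1 = dead), with more deaths in the second population.
status = np.concatenate([rng.binomial(1, 0.2, 50), rng.binomial(1, 0.7, 50)])
groups = cluster_two_groups(scores)
print(chi_square_p(groups, status))
```

The same `chi_square_p` helper applies to any binary clinical variable (for example ER status) in place of the cluster labels.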
Figure 3. Comparison of prognostic performance for the gene-based model and the biological process-based models on the GSE3494 dataset. Prognosis indices (PIs) for all samples in the dataset are calculated with each model and used to dichotomize the samples into high- and low-risk groups relative to a PI cutoff. The p-values for the survival difference between the two groups are calculated using Wilcoxon log-rank tests, and (+) denotes censored observations. ROC curves are generated by treating the PI values as predictions of the samples' survival status. Leave-one-out cross-validation (LOOCV) is performed to calculate Wilcoxon log-rank p-values and AUCs for each model. The model based on the BPS matrix produced by LPC achieves better Wilcoxon log-rank p-values and AUCs than the models based on the PAM50 gene expression matrix and the Pathifier PDS matrix, in both the training and cross-validation results.
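A sketch of the Figure 3 evaluation with NumPy and SciPy. It assumes the survival-model coefficients have already been fitted (the `beta` values below are hypothetical) and uses the median PI as the cutoff (also an assumption). The test function implements the weighted two-group log-rank statistic: the default uses unit weights (the standard log-rank test), and `weighting="wilcoxon"` uses the number at risk (the Gehan-Breslow generalized Wilcoxon test), since the caption's "Wilcoxon log-rank" wording could refer to either. AUC is computed from the Mann-Whitney statistic, with survival status as the label.

```python
import numpy as np
from scipy.stats import chi2


def prognosis_index(features, coef):
    """Linear predictor PI = X @ beta; coef is assumed to be fitted elsewhere."""
    return features @ coef


def logrank_p(time, event, group, weighting="logrank"):
    """Two-group weighted log-rank test; `group` is a boolean mask (e.g. high risk)."""
    u, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & group).sum()
        dead = (time == t) & (event == 1)
        d, d1 = dead.sum(), (dead & group).sum()
        w = n if weighting == "wilcoxon" else 1.0  # Gehan-Breslow vs unit weights
        u += w * (d1 - d * n1 / n)  # observed minus expected deaths in the group
        if n > 1:
            var += w**2 * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return chi2.sf(u**2 / var, df=1)


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic (ties count one half)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


rng = np.random.default_rng(2)
X = rng.normal(size=(150, 3))  # simulated feature matrix (e.g. BPS columns)
beta = np.array([0.8, -0.5, 0.0])  # hypothetical fitted coefficients
pi = prognosis_index(X, beta)
true_time = rng.exponential(1 / np.exp(pi))  # higher PI, shorter survival
censor = rng.exponential(2.0, 150)
event = (true_time <= censor).astype(int)  # 1 = death observed
time = np.minimum(true_time, censor)
high_risk = pi > np.median(pi)  # hypothetical cutoff: median PI
print(logrank_p(time, event, high_risk), auc(pi, event))
```

For LOOCV, the coefficients would be refitted on the other n-1 samples and the PI computed for the held-out sample, repeated for every sample before computing the p-value and AUC; that loop and the model-fitting step are omitted here.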