Multiple Omics Data Integration to Identify Long Noncoding RNA Responsible for Breast Cancer-Related Mortality.

Tapasree Roy Sarkar, Arnab Kumar Maity, Yabo Niu, Bani K Mallick.

Abstract

Long noncoding RNAs (lncRNAs) are a large and diverse class of transcribed RNAs that have been shown to play a significant role in cancer development. In this study, we apply an integrative modeling framework that combines DNA copy number variation (CNV), lncRNA expression, and downstream target protein expression to predict patient survival in breast cancer. We develop a 3-stage model combining a mechanistic model (lncRNA regressed on CNV, and target proteins regressed on lncRNA) and a clinical model (survival regressed on the estimated effects from the mechanistic models). Using lncRNAs (such as HOTAIR and MALAT1) along with their CNV, target protein expressions, and survival outcomes from The Cancer Genome Atlas (TCGA) database, we show that the mean squared prediction error and the integrated Brier score (IBS) are both lower for the proposed 3-stage integrated model than for the 2-stage model. The integrative model therefore has better predictive ability than the 2-stage model, which does not consider target protein information.

Keywords:  Long noncoding RNA; TCGA; breast cancer; integrative modeling; survival model

Year:  2019        PMID: 31488946      PMCID: PMC6710679          DOI: 10.1177/1176935119871933

Source DB:  PubMed          Journal:  Cancer Inform        ISSN: 1176-9351


Introduction

Several lines of evidence highlight the emerging impact of long noncoding RNAs (lncRNAs) on cancer progression.[1-4] The aim of this study is to assess the predictive capability of some oncogenic lncRNAs for tumor progression and prognosis in breast cancer. Breast cancer is the most common malignancy and the leading cause of cancer death in women. By focusing on a single type of genetic alteration such as copy number variation (CNV), scientists have identified significant genes that may contribute to cancer progression.[5-8] Because cancer is complex, its study should incorporate data from multiple platforms, ranging from the genes, transcripts, and proteins found in cancer cells[9] to whole biological systems, represented by molecular pathways and cell populations.[10] Integration in which multiple levels of omics data (ie, CNV, methylation, and gene expression) are gathered from the same subjects and analyzed together is known as vertical integration.[10-12] In this study, we introduce a simple way to integrate multiple omics data and show that lncRNA-based survival prediction in breast cancer improves markedly. We consider genomic platforms (CNV and mRNA expression), a proteomic platform (protein expression), and a phenotype (patient survival). This study focuses only on the lncRNA expressions from The Cancer Genome Atlas (TCGA) breast cancer data. We use the target protein expressions as the proteomics data.

An Integrative Model

We consider a 3-stage model here. Suppose that $n$ is the number of patients, $p$ is the number of lncRNAs, and $L$ is the number of CNV expressions. The mechanistic model for each lncRNA can be expressed as

$$G_j = \sum_{l=1}^{L} C_l\,\omega_{jl} + O_j, \qquad j = 1, \dots, p, \tag{1}$$

where $G_j$ is the level of gene expression for gene $j$ and is of dimension $n \times 1$; $C_l\,\omega_{jl}$ is the part of the expression that is attributed to the $l$th CNV; $O_j$ is the other (remaining) part of the gene expression, which is not regulated by CNV and is of dimension $n \times 1$; and $\omega_j = (\omega_{j1}, \dots, \omega_{jL})^{\top}$ is the regression coefficient vector.

Next, the downstream target proteins of each specific lncRNA were identified from PubMed articles, the TCGA RNA-Seq database, and other extensive analyses such as differential expression analysis. The mechanistic model for each protein (for every lncRNA) can be expressed as

$$P_{jk} = \sum_{l=1}^{L} C_l\,\gamma_{jkl} + O_j\,\alpha_{jk} + O^{P}_{jk}, \tag{2}$$

where $k$ indexes the target proteins of lncRNA $j$ and $O^{P}_{jk}$ represents the "other" part of the protein expression that is not regulated by lncRNA. $\gamma_{jkl}$ and $\alpha_{jk}$ are the regression coefficients corresponding to the CNV expressions and the error part from equation (1), respectively.

The clinical component models the effect of the mechanistic parts of the genes on a clinical outcome of interest and can be written as

$$Y = \sum_{j=1}^{p} G_j\,\beta_j + \hat{O}^{P}\eta + \varepsilon, \tag{3}$$

where $Y$ is the survival outcome, $\varepsilon$ is the error term, and $\beta_j$ and $\eta$ are the usual regression coefficients corresponding to lncRNA and the estimated error part from equation (2), respectively. The variable $\hat{O}^{P}$ represents the vectorized downstream gene effects attributed to protein expressions and is estimated from the second-stage mechanistic model. Therefore, the clinical component additively models the effects of all the gene expressions and their components, derived from different sources (gene expression, CNV), in a unified manner.

Assumptions such as $Y \in \mathbb{R}$ and $\varepsilon \sim N(0, \sigma^2 I_n)$ give rise to the usual linear model, whereas we obtain the log-normal accelerated failure time (AFT) model when we assume $Y = \log T$, with $T$ the survival time. In the presence of right censoring, we observe the tuple $(t_i, \delta_i)$, where $\delta_i = 1$ if the event is observed (death in this case), and 0 otherwise; $t_i = \min(T_i, U_i)$, with $U_i$ being the censoring time.
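The 3-stage scheme described above can be illustrated with a minimal sketch on simulated data. This is not the authors' code: the data, the coefficient values, and the helper names `solve` and `ols` are assumptions made for illustration, there is a single lncRNA with two CNV covariates and one target protein, and the clinical stage is fitted by ordinary least squares on an uncensored log survival time rather than by a censored log-normal AFT fit. The 2-stage comparator is assumed to be the clinical model without the protein-derived term.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols(columns, y):
    """Least squares of y on the given columns plus an intercept.
    Returns (coefficients, residuals)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(k)]
           for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(XtX, Xty)
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(k)) for i in range(n)]
    return beta, resid

def mse(resid):
    return sum(r * r for r in resid) / len(resid)

# Simulated data (all values are illustrative assumptions).
rng = random.Random(0)
n = 200
C1 = [rng.gauss(0, 1) for _ in range(n)]          # CNV covariate 1
C2 = [rng.gauss(0, 1) for _ in range(n)]          # CNV covariate 2
O = [rng.gauss(0, 1) for _ in range(n)]           # non-CNV part of lncRNA
G = [0.8 * C1[i] - 0.5 * C2[i] + O[i] for i in range(n)]   # lncRNA expression
OP = [rng.gauss(0, 1) for _ in range(n)]          # protein part not explained upstream
P = [0.3 * C1[i] + 0.7 * O[i] + OP[i] for i in range(n)]   # target protein
Y = [1.0 + 0.5 * G[i] + 0.9 * OP[i] + rng.gauss(0, 0.5) for i in range(n)]  # log survival

# Stage 1: lncRNA on CNV; keep the residual ("other") part.
_, O_hat = ols([C1, C2], G)
# Stage 2: protein on CNV and the stage-1 residual; keep the residual part.
_, OP_hat = ols([C1, C2, O_hat], P)
# Stage 3 (clinical): outcome on lncRNA and the stage-2 residual.
_, resid_3 = ols([G, OP_hat], Y)
# 2-stage comparator: clinical model without the protein-derived term.
_, resid_2 = ols([G], Y)

mse_3stage = mse(resid_3)
mse_2stage = mse(resid_2)
print(f"in-sample MSE, 2-stage: {mse_2stage:.3f}  3-stage: {mse_3stage:.3f}")
```

Because the 2-stage fit is nested in the 3-stage fit, the in-sample error of the 3-stage model cannot be larger here; an honest comparison, as in the paper, needs a prediction error measure rather than the in-sample fit.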
Standard statistical software can be used to fit the log-normal AFT model and the other linear regression models. To quantify the prediction accuracy, we consider a standard comparative predictive measure, the Brier score (BS),[13] which is computed from the estimated survival function. With $t_i$ the observed time and $\delta_i$ the event indicator of patient $i$, it is defined at time $t$ as

$$\mathrm{BS}(t) = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{\hat{S}(t \mid x_i)^2\, I(t_i \le t,\ \delta_i = 1)}{\hat{G}(t_i)} + \frac{\{1-\hat{S}(t \mid x_i)\}^2\, I(t_i > t)}{\hat{G}(t)}\right],$$

where $\hat{G}$ denotes the Kaplan-Meier estimate of the censoring distribution, which is based on the observations $(t_i, 1-\delta_i)$, and $\hat{S}(\cdot \mid x_i)$ stands for the estimated survival function of patient $i$ with covariates $x_i$. As the mathematical form suggests, BS provides a numerical comparison between the observed and estimated survival functions. The Brier score is defined for each time point and hence can be integrated over the entire time range to obtain the integrated Brier score (IBS),

$$\mathrm{IBS} = \frac{1}{\max_i t_i}\int_{0}^{\max_i t_i} \mathrm{BS}(t)\,dt.$$

Models with smaller scores are preferred. We compute IBS using the ipred package.[14] In addition, we compute the prediction square error by comparing the observed data and their posterior predicted values.

From the TCGA database, we consider the information of 222 breast tumor samples with their survival data. We observe that at least 82% of the data are right censored. Along with the clinical observations, we also collected measurements of 12 lncRNA expressions (Table 1). Among those, we found the CNV information available for 9 genes (or lncRNAs). We also consider 64 target protein expressions for these genes.
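The censoring-weighted Brier score and its integrated version can be sketched as follows. This is a plain-Python illustration of the definition from the Brier score reference,[13] not the ipred implementation used in the paper. The function names, the toy data, the log-normal parameters, and the trapezoidal integration grid are all assumptions, and ties between event and censoring times are handled naively.

```python
import math

def km_censoring(times, events):
    """Kaplan-Meier estimate G(t) of the censoring distribution,
    treating censored observations (event == 0) as the 'events'."""
    data = sorted(zip(times, events))
    n = len(data)
    steps = []          # (time, value of G from that time onward)
    g = 1.0
    i = 0
    while i < n:
        t = data[i][0]
        at_risk = n - i
        n_censored = 0
        j = i
        while j < n and data[j][0] == t:
            n_censored += 1 - data[j][1]
            j += 1
        if n_censored:
            g *= 1.0 - n_censored / at_risk
            steps.append((t, g))
        i = j

    def G(t):
        value = 1.0
        for s, gs in steps:
            if s > t:
                break
            value = gs
        return value
    return G

def brier_score(t, times, events, surv_at_t, G):
    """Brier score at time t; surv_at_t[i] is the predicted S(t | x_i)."""
    total = 0.0
    for ti, di, s in zip(times, events, surv_at_t):
        if ti <= t and di == 1:
            total += s ** 2 / G(ti)          # event observed by time t
        elif ti > t:
            total += (1.0 - s) ** 2 / G(t)   # still under observation at t
        # subjects censored before t contribute nothing directly
    return total / len(times)

def integrated_brier_score(times, events, surv_fn, n_grid=200):
    """Trapezoidal integral of BS(t) over [0, max time], divided by max time.
    surv_fn(i, t) returns the predicted survival probability of subject i."""
    G = km_censoring(times, events)
    t_max = max(times)
    grid = [t_max * k / n_grid for k in range(n_grid + 1)]
    idx = range(len(times))
    scores = [brier_score(t, times, events, [surv_fn(i, t) for i in idx], G)
              for t in grid]
    area = sum((grid[k + 1] - grid[k]) * (scores[k] + scores[k + 1]) / 2.0
               for k in range(n_grid))
    return area / t_max

def lognormal_survival(t, mu, sigma):
    """S(t) = 1 - Phi((log t - mu) / sigma) for a log-normal AFT model."""
    if t <= 0:
        return 1.0
    z = (math.log(t) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Toy example: six subjects, two of them right censored (event == 0).
times = [2.0, 3.5, 1.2, 6.0, 4.1, 0.8]
events = [1, 0, 1, 0, 1, 1]
mu = [0.9, 1.1, 0.3, 1.6, 1.2, 0.0]   # hypothetical fitted linear predictors
sigma = 0.5                            # hypothetical fitted scale
ibs_demo = integrated_brier_score(
    times, events, lambda i, t: lognormal_survival(t, mu[i], sigma))
print(f"IBS on toy data: {ibs_demo:.3f}")
```

Without censoring the weights are all 1, so a constant prediction of 0.5 gives a Brier score of 0.25 at every time point, which is a convenient sanity check for any implementation.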
Table 1. The lncRNAs considered for our experiment.

Gene         Function
BCAR4 [a]    Oncogenic, promotes invasion and metastasis[15]
BCYRN1       Oncogenic, promotes tumor progression[16]
GAS5 [a]     Tumor suppressor[17]
H19 [a]      Oncogenic, promotes proliferation and metastasis[18]
HOTAIR [a]   Oncogenic, promotes EMT, proliferation, and metastasis[19]
MALAT1 [a]   Oncogenic, promotes proliferation, invasion, and migration[20]
MEG3 [a]     Tumor suppressor, induces accumulation of p53[21]
PVT1 [a]     Oncogenic, promotes tumor progression[22]
SOX2OT       Oncogenic, promotes tumor growth and metastasis[23]
SRA1 [a]     Oncogenic[24]
UCA1 [a]     Oncogenic, promotes cell growth, suppresses the tumor suppressor p27[25]
XIST         Tumor suppressor[26]

Abbreviation: EMT, epithelial-mesenchymal transition.

[a] Copy number variation available. Among these lncRNAs, SRA1 transcribes both long noncoding and protein-coding RNAs, which are produced by alternative splicing.

We apply the integrative model to these data and obtain the results shown in Table 2. The mean squared prediction error (MSPE) and IBS are both lower for the proposed 3-stage model than for the 2-stage model, which omits the protein expressions from the analysis.
Table 2. MSPE and IBS for fitted models in TCGA breast cancer data.

Model      MSPE     IBS
2-stage    1.903    0.488
3-stage    1.196    0.395

Abbreviations: IBS, integrated Brier score; MSPE, mean squared prediction error; TCGA, The Cancer Genome Atlas.
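The size of the improvement reported in Table 2 can be read off as a relative reduction. The arithmetic below uses only the four values in the table; the rounding to one decimal place is ours.

```latex
\[
\frac{\mathrm{MSPE}_{\text{2-stage}} - \mathrm{MSPE}_{\text{3-stage}}}{\mathrm{MSPE}_{\text{2-stage}}}
  = \frac{1.903 - 1.196}{1.903} = \frac{0.707}{1.903} \approx 0.372
  \quad (\text{about } 37.2\%),
\]
\[
\frac{\mathrm{IBS}_{\text{2-stage}} - \mathrm{IBS}_{\text{3-stage}}}{\mathrm{IBS}_{\text{2-stage}}}
  = \frac{0.488 - 0.395}{0.488} = \frac{0.093}{0.488} \approx 0.191
  \quad (\text{about } 19.1\%).
\]
```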

In this article, we have shown that survival prediction improves substantially when the expression measurements of the lncRNAs' target proteins are included rather than ignored. To this end, we have developed a simple integrative modeling strategy that borrows strength from all 3 platforms (DNA CNV, mRNA expression of the long noncoding genes, and their target protein expression) to predict the survival of the subjects. We have shown that this integrated model outperforms its closest competitor, the 2-stage model.
References (10 of 25 shown)

1.  Assessment and comparison of prognostic classification schemes for survival data.

Authors:  E Graf; C Schmoor; W Sauerbrei; M Schumacher
Journal:  Stat Med       Date:  1999 Sep 15-30       Impact factor: 2.373

2.  Powerful SNP-set analysis for case-control genome-wide association studies.

Authors:  Michael C Wu; Peter Kraft; Michael P Epstein; Deanne M Taylor; Stephen J Chanock; David J Hunter; Xihong Lin
Journal:  Am J Hum Genet       Date:  2010-06-11       Impact factor: 11.025

3.  Amplification of MDS1/EVI1 and EVI1, located in the 3q26.2 amplicon, is associated with favorable patient prognosis in ovarian cancer.

Authors:  Meera Nanjundan; Yasuhisa Nakayama; Kwai Wa Cheng; John Lahad; Jinsong Liu; Karen Lu; Wen-Lin Kuo; Karen Smith-McCune; David Fishman; Joe W Gray; Gordon B Mills
Journal:  Cancer Res       Date:  2007-04-01       Impact factor: 12.701

4.  The c-Myc oncogene directly induces the H19 noncoding RNA by allele-specific binding to potentiate tumorigenesis.

Authors:  Dalia Barsyte-Lovejoy; Suzanne K Lau; Paul C Boutros; Fereshteh Khosravi; Igor Jurisica; Irene L Andrulis; Ming S Tsao; Linda Z Penn
Journal:  Cancer Res       Date:  2006-05-15       Impact factor: 12.701

5.  Expression of the steroid receptor RNA activator in human breast tumors.

Authors:  E Leygue; H Dotzlaw; P H Watson; L C Murphy
Journal:  Cancer Res       Date:  1999-09-01       Impact factor: 12.701

6.  The emergence of lncRNAs in cancer biology.

Authors:  John R Prensner; Arul M Chinnaiyan
Journal:  Cancer Discov       Date:  2011-10       Impact factor: 39.397

7.  Integration of DNA copy number alterations and transcriptional expression analysis in human gastric cancer.

Authors:  Biao Fan; Somkid Dachrut; Ho Coral; Siu Tsan Yuen; Kent Man Chu; Simon Law; Lianhai Zhang; Jiafu Ji; Suet Yi Leung; Xin Chen
Journal:  PLoS One       Date:  2012-04-23       Impact factor: 3.240

8.  SOX2 is an amplified lineage-survival oncogene in lung and esophageal squamous cell carcinomas.

Authors:  Adam J Bass; Hideo Watanabe; Craig H Mermel; Soyoung Yu; Sven Perner; Roel G Verhaak; So Young Kim; Leslie Wardwell; Pablo Tamayo; Irit Gat-Viks; Alex H Ramos; Michele S Woo; Barbara A Weir; Gad Getz; Rameen Beroukhim; Michael O'Kelly; Amit Dutt; Orit Rozenblatt-Rosen; Piotr Dziunycz; Justin Komisarof; Lucian R Chirieac; Christopher J Lafargue; Veit Scheble; Theresia Wilbertz; Changqing Ma; Shilpa Rao; Hiroshi Nakagawa; Douglas B Stairs; Lin Lin; Thomas J Giordano; Patrick Wagner; John D Minna; Adi F Gazdar; Chang Qi Zhu; Marcia S Brose; Ivan Cecconello; Ulysses Ribeiro; Suely K Marie; Olav Dahl; Ramesh A Shivdasani; Ming-Sound Tsao; Mark A Rubin; Kwok K Wong; Aviv Regev; William C Hahn; David G Beer; Anil K Rustgi; Matthew Meyerson
Journal:  Nat Genet       Date:  2009-10-04       Impact factor: 38.330

9.  GOLPH3 modulates mTOR signalling and rapamycin sensitivity in cancer.

Authors:  Kenneth L Scott; Omar Kabbarah; Mei-Chih Liang; Elena Ivanova; Valsamo Anagnostou; Joyce Wu; Sabin Dhakal; Min Wu; Shujuan Chen; Tamar Feinberg; Joseph Huang; Abdel Saci; Hans R Widlund; David E Fisher; Yonghong Xiao; David L Rimm; Alexei Protopopov; Kwok-Kin Wong; Lynda Chin
Journal:  Nature       Date:  2009-06-25       Impact factor: 49.962

10.  Comprehensive molecular portraits of human breast tumours.

Authors:  Cancer Genome Atlas Network
Journal:  Nature       Date:  2012-09-23       Impact factor: 49.962
