A merged lung cancer transcriptome dataset for clinical predictive modeling.

Su Bin Lim^1,2, Swee Jin Tan^3, Wan-Teck Lim^4,5,6, Chwee Teck Lim^1,2,7,8.

Abstract

The Gene Expression Omnibus (GEO) database is an excellent public source of whole transcriptomic profiles of multiple cancers. The main challenge is the limited accessibility of such large-scale genomic data to people without a background in bioinformatics or computer science. This presents difficulties in data analysis, sharing and visualization. Here, we present an integrated bioinformatics pipeline and a normalized dataset that has been preprocessed using a robust statistical methodology, allowing others to perform large-scale meta-analysis without having to conduct time-consuming data mining and statistical correction. Comprising 1,118 patient-derived samples, the normalized dataset includes primary non-small cell lung cancer (NSCLC) tumors and paired normal lung tissues from ten independent GEO datasets, facilitating differential expression analysis. The data have been merged, normalized, batch effect-corrected and filtered for genes with low variance via multiple open source R packages integrated into our workflow. Overall, this dataset (with associated clinical metadata) better represents the diseased population and serves as a powerful tool for early predictive biomarker discovery.

Year: 2018 PMID: 30040079 PMCID: PMC6057440 DOI: 10.1038/sdata.2018.136

Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444

Background & Summary

The big data boom heralds a new era of precision medicine – access to large pools of ‘omics’ data has driven breakthroughs in this emerging field. In particular, microarray technology is one of the most extensively explored high-throughput methodologies for the quantitative assessment of gene expression[1,2]. The Gene Expression Omnibus (GEO) database at the National Center for Biotechnology Information (NCBI) was launched in 2000 to support public use of such genomic resources provided by the scientific communities[3,4]. Since then, 94,577 series probed with 18,138 platforms, for over 2 million samples have been submitted to the GEO database. The challenge with these vast datasets, however, is that exploring a huge breadth of data is not straightforward – from effectively querying the correct dataset to utilizing the right pipelines for realizing true significance from such high-dimensional data. Successful differential expression analyses, for example, are reliant on careful interrogation to minimize non-biological variations. Preprocessing of microarray data is thus an essential step prior to downstream analysis. Several preprocessing pipelines exist for background correction and normalization of array-dependent gene expression. The most commonly used techniques are Robust Multiarray Average (RMA)[5], frozen Robust Multiarray Analysis (fRMA)[6], Single Channel Array Normalization (SCAN)[7], and Universal exPression Code (UPC)[8]. The fRMA method was chosen in this study for its use in the InSilico DB package[9] implemented in our developed framework. The merging of multiple genomic datasets into a single matrix for large-scale meta-analysis poses another source of variation termed the batch effect. Such bias arises as a consequence of systematic technical or non-biological differences between independent laboratories[10]. It is nonetheless possible to adjust this inter-dataset variation with previously established models for such batch effect removal. 
These include the Empirical Bayes method, also known as ComBat[11], Batch mean-centering (BMC)[12], Gene standardization (GENENORM)[13], and distance-weighted discrimination (DWD)[14]. The ComBat method was applied to ten fRMA-preprocessed microarray datasets in this work to integrate them into a single dataset.

Here, we present an integrated R pipeline and a transcriptome dataset for non-small cell lung cancer (NSCLC), together with its associated clinical metadata (Fig. 1). Using this strategy, we recently identified an expression pattern of specific genes that could serve as an accurate clinical tool for predicting prognosis and adjuvant therapy response in NSCLC[15]. Our unique selection and integration of multiple open source R packages greatly reduce the computational complexity and processing time needed to identify putative cancer-associated gene signatures. To facilitate differential expression (DE) analyses, we processed a total of 1,118 patient-derived samples including primary tumors as well as tumor-free control tissues. Additionally, we embedded two robust quality control metrics utilizing RNA-Seq data from The Cancer Genome Atlas (TCGA) in the present pipeline for multi-platform assessment and validation of differentially expressed genes. This normalized dataset serves as an excellent large-scale ‘discovery cohort’ for identification of clinically relevant NSCLC biomarkers.

Figure 1

Study design.

Raw data from ten independent datasets were preprocessed for normalization, background correction and probe-to-gene mapping. The fRMA-normalized data were corrected for batch effects using the ComBat method and filtered for genes with low variance across samples. Our dataset was validated with PCA and with similarity measurement using RNA-Seq-profiled samples. The statistical R packages used to develop this dataset are stated.

Methods

Detailed methods, including the study design and statistical analyses, for constructing the NSCLC gene panel and developing clinically applicable risk scoring metrics for patient stratification and prognostication can be found in our recent publication[15].

Data collection and preprocessing

The raw gene expression profiles from ten independent GEO datasets, comprising a total of 1,118 NSCLC samples including both primary tumors and normal lung tissues, were downloaded from the NCBI via the inSilicoDb package[9]. Only samples processed using the same chip platform (Affymetrix Human Genome U133 Plus 2.0 Array) were analyzed (Table 1). This minimizes batch effects that arise from different microarray platforms and allows the analysis of the same set of genes with the same probesets. The fRMA method was first applied to the raw data via the getDataset function for background correction, normalization and probe-to-gene mapping. This embedded function allows fast data accession and simultaneous preprocessing of expression profiles, regardless of the screening platform. All clinical information annotated in the ten initial datasets was further collected and curated for clinical model development (Data Citation 1).

Table 1

GSE accession number and number of samples for each phenotype.

	Dataset	Normal lung tissue	Primary NSCLC tumor	Platform
1	GSE10799	3	16	Affymetrix Human Genome U133 Plus 2.0 Array
2	GSE12667	0	75	Affymetrix Human Genome U133 Plus 2.0 Array
3	GSE50081	0	181	Affymetrix Human Genome U133 Plus 2.0 Array
4	GSE31210	20	226	Affymetrix Human Genome U133 Plus 2.0 Array
5	GSE18842	45	46	Affymetrix Human Genome U133 Plus 2.0 Array
6	GSE10445	0	72	Affymetrix Human Genome U133 Plus 2.0 Array
7	GSE33356	60	60	Affymetrix Human Genome U133 Plus 2.0 Array
8	GSE19188	65	91	Affymetrix Human Genome U133 Plus 2.0 Array
9	GSE28571	0	100	Affymetrix Human Genome U133 Plus 2.0 Array
10	GSE10245	0	58	Affymetrix Human Genome U133 Plus 2.0 Array
	TOTAL	193	925	1118

Batch effect removal

Using the inSilicoMerging package[16], we next merged the ten fRMA-preprocessed datasets and corrected for batch effects that arise from technical variation between independent studies. The merge function included in this package is straightforward to use for batch effect correction, regardless of the number of independent datasets being queried. Of the existing batch effect removal techniques, the ComBat method[11] was applied to these preprocessed microarray datasets. Technical validation of any chosen method can be done using embedded functions such as plotMDS, plotRLE, and plotGeneWiseBoxPlot. These features allow visual demonstration of reduced variance via the Principal Component Analysis (PCA) approach. Only the first two PCs are plotted, as these variables capture the most significant patterns of variation that arise from non-biological differences across independent batches[10]. In our recent study[15], we used the prcomp function in the stats package and the ggbiplot function in the ggbiplot package[17] for generating PCA graphs and subsequent visualization, respectively. In this work, we demonstrate the batch effect removal using the embedded plotMDS function (Fig. 2).
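To make the idea of batch adjustment concrete, the following is a minimal Python sketch of per-gene batch mean-centering (BMC), one of the alternative methods named in the Background. It is not ComBat and not the inSilicoMerging API: there is no empirical Bayes shrinkage and no variance adjustment, and the function name and toy values are hypothetical.

```python
from statistics import mean

def center_by_batch(values, batches):
    """Subtract each batch's mean from one gene's expression values.

    values  -- expression of a single gene, one entry per sample
    batches -- batch (study) label for each sample, in the same order
    """
    batch_means = {
        b: mean(v for v, lab in zip(values, batches) if lab == b)
        for b in set(batches)
    }
    return [v - batch_means[lab] for v, lab in zip(values, batches)]

# Toy example: study "B" reads about 2 units higher for technical reasons.
expr = [5.0, 5.2, 4.8, 7.1, 6.9, 7.0]
study = ["A", "A", "A", "B", "B", "B"]
adjusted = center_by_batch(expr, study)
print(adjusted)
```

After centering, both studies are distributed around zero for this gene. Note that plain mean-centering can also remove real biological differences when the tumor-to-normal ratio differs between studies, which is one reason model-based methods such as ComBat are often preferred.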

Figure 2

Validity of our generated dataset.

(a) The result of batch effect removal is clearly demonstrated using the plotMDS function. (b) The MDS plot of our merged microarray dataset shows a clear separation between different disease phenotypes (925 primary NSCLC tumors: red; 193 non-tumors: green). (c) The merging effect of the ComBat technique on the fRMA-normalized data is illustrated using the plotRLE function. (d) The local effect of the ComBat method at the gene level is demonstrated using the plotGeneWiseBoxPlot function. The A1BG gene was selected for demonstration purposes.

Gene filtering

Genes with low variance across samples can be filtered prior to performing DE analysis. This step prevents flat genes from affecting the downstream analysis and reduces computational processing time by restricting the meta-analysis to genes that vary across samples. Our integrated dataset stores a large amount of transcriptomic data, including expression values of 20,155 genes for 1,118 patient-derived NSCLC samples. Gene filtering was performed using the nsFilter function in the genefilter package[18], removing 10,078 genes prior to the identification of DE genes.
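The filtering step can be illustrated with a generic variance-ranking filter. This Python sketch is not the nsFilter implementation; the choice of variance as the spread measure, the 0.5 retention fraction and all names are assumptions made for illustration.

```python
from statistics import pvariance

def filter_low_variance(expr, keep_fraction=0.5):
    """Keep the most variable genes.

    expr          -- dict mapping gene name to a list of expression values
    keep_fraction -- fraction of genes to retain (illustrative default)
    """
    ranked = sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return {g: expr[g] for g in ranked[:n_keep]}

# Hypothetical toy matrix: two flat genes and two variable genes.
toy = {
    "GENE_A": [5.0, 5.0, 5.1, 5.0],   # nearly flat
    "GENE_B": [2.0, 8.0, 3.0, 9.0],   # highly variable
    "GENE_C": [6.0, 6.1, 6.0, 6.1],   # nearly flat
    "GENE_D": [1.0, 4.0, 7.0, 2.0],   # variable
}
kept = filter_low_variance(toy)
print(sorted(kept))
```

Only the two variable genes survive, which mirrors how roughly half of the 20,155 genes were removed before DE analysis.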

Code Availability

The R code used to generate our normalized dataset and all the plots described in this paper (and in our recent work[15]) can be found in figshare (Data Citation 1).

Data Records

Our normalized microarray dataset with associated clinical metadata is available at ArrayExpress (Data Citation 2). DE gene lists with full descriptions are deposited as individual text files in figshare (Data Citation 1). These include the log2 fold-change, average expression, t-statistic, P-value and adjusted P-value derived from both microarray and RNA-Seq platforms. All the GEO datasets processed through our pipeline are available from the National Center for Biotechnology Information Gene Expression Omnibus (GEO) database (Data Citation 3, Data Citation 4, Data Citation 5, Data Citation 6, Data Citation 7, Data Citation 8, Data Citation 9, Data Citation 10, Data Citation 11, Data Citation 12).

Technical Validation

Visual validation of batch effect removal

The following functions available in the inSilicoMerging package[16] are used to check the validity of our approach in correcting for batch effects. In this study, the ComBat adjustment is visualized at both systemic and gene-specific levels.

A. The plotMDS function

The effect of the ComBat technique is clearly demonstrated on the ten preprocessed datasets (Fig. 2a). The resulting MDS plot in Fig. 2b shows a clear separation of the samples according to disease phenotype (biological variation) rather than source dataset (non-biological variation), highlighting successful removal of the batch effect in this merged dataset.

B. The plotRLE function

Similarly, other functions implemented in the present pipeline can be used to visualize the statistical correction. Here, we randomly selected 50 samples and drew RLE plots for demonstration purposes (Fig. 2c). Samples are colored according to the study from which they were extracted. Although the effect is less visible than in the MDS plot because of the large number of variables, the plotRLE function still shows the merging effect of the ComBat transformation.
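For readers unfamiliar with RLE plots, the quantity being plotted can be sketched as follows. This is a minimal Python illustration of relative log expression as it is commonly defined (each gene's value minus that gene's median across samples); it is not the plotRLE implementation, and the function name and toy values are hypothetical.

```python
from statistics import median

def relative_log_expression(expr):
    """Return one list of RLE values per sample.

    expr -- dict mapping gene name to a list of log-scale expression
            values, one per sample (same sample order for every gene)
    """
    n_samples = len(next(iter(expr.values())))
    rle = [[] for _ in range(n_samples)]
    for values in expr.values():
        gene_median = median(values)
        for i, v in enumerate(values):
            rle[i].append(v - gene_median)
    return rle

# Toy data: the third sample is shifted upward, as a batch effect might do.
toy = {
    "G1": [5.0, 5.1, 6.0],
    "G2": [7.0, 7.2, 8.1],
    "G3": [3.0, 2.9, 4.0],
}
sample_medians = [median(s) for s in relative_log_expression(toy)]
print(sample_medians)
```

In a well-corrected dataset the per-sample RLE distributions are centered near zero with similar spread; a sample whose median sits far from zero, like the third one here, points to residual technical variation.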

C. The plotGeneWiseBoxPlot function

Unlike the two above-mentioned functions, the last visualization technique included in our R framework shows the local effect of batch effect adjustment at the individual gene level. For demonstration purposes, we selected the A1BG gene to be illustrated in the gene-wise boxplot (Fig. 2d). A notable change in expression of this gene resulting from the adjustment again demonstrates the validity of the merging technique used in our integrative pipeline for the identification of DE genes.

Multi-platform assessment of DE genes

The following steps implemented in our workflow aim to address continuing concerns raised in previous works regarding the reproducibility of DE genes identified on the microarray platform[19,20]. Briefly, we first performed random sampling using our generated dataset and derived a ranked list of DE genes with each iteration. A significant overlap between ranked lists was indicated by a high overlap coefficient, showing high intra-platform reproducibility in differential gene expression. We further compared DE gene signatures generated from our normalized dataset with those from the RNA-Seq platform using the TCGA database and observed high inter-platform concordance. Altogether, these additional steps in our pipeline support the reproducibility of potential cancer biomarkers derived from our dataset.

A. An iterative approach - random sampling

We first determined DE genes using our NSCLC dataset via the limma package[21] by applying the following statistical criteria: (1) log2 fold change >1.5; (2) adjusted P-value <1.0E-10. Such stringent cutoff thresholds produce only a handful of significant genes that distinguish tumors from tumor-free lung tissues. To rule out bias in our feature selection, we performed random sampling using our dataset and computed the overlap coefficient using all DE gene lists derived from 10,000 iterations. A mean overlap coefficient of 0.899 was obtained in our previous work[15], validating the robustness of our approach in identifying DE genes. Overall, we show a simple yet reliable meta-analysis pipeline for discovering reproducible DE genes and facilitating development of clinically applicable models.
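The overlap coefficient used here can be sketched in a few lines of Python. The code assumes the standard Szymkiewicz-Simpson definition (size of the intersection divided by the size of the smaller set) and averages it over all pairs of lists; the exact resampling and averaging scheme of the original analysis is described in ref. 15 and is not reproduced. Gene labels are placeholders.

```python
from itertools import combinations

def overlap_coefficient(a, b):
    """Szymkiewicz-Simpson overlap: |A & B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def mean_pairwise_overlap(gene_lists):
    """Average the overlap coefficient over every pair of DE gene lists."""
    pairs = list(combinations(gene_lists, 2))
    return sum(overlap_coefficient(a, b) for a, b in pairs) / len(pairs)

# Hypothetical DE gene lists from three resampling iterations.
runs = [
    ["GENE1", "GENE2", "GENE3", "GENE4"],
    ["GENE1", "GENE2", "GENE3"],
    ["GENE1", "GENE2", "GENE4", "GENE5"],
]
score = mean_pairwise_overlap(runs)
print(round(score, 3))
```

A value close to 1 means that the smaller list is almost entirely contained in the larger one, which is the sense in which a mean of 0.899 indicates stable DE gene selection across resamples.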

B. Different profiling platform – TCGA RNA-Seq data

As our dataset comprises exclusively datasets probed with the same platform (microarray), we further investigated the generalizability of our merged data using RNA-Seq-assayed samples. Level-3 RNAseqV2 gene expression profiles of lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) from TCGA were preprocessed via the TCGA-Assembler package[22] for subsequent DE analyses. The raw sequencing data were first normalized with the RNA-Seq by Expectation-Maximization (RSEM) method using the DownloadRNASeqData function. Prior to DE analysis, these RSEM-normalized data were preprocessed using the DGEList function, and only genes with a counts-per-million (CPM) value above zero in at least 20% of the samples were retained using the cpm function of the edgeR package[23]. The resulting data were then normalized by the Trimmed Mean of M-values (TMM) method using the calcNormFactors function of the same package. The voom-transformed data were then used to derive the final DE gene list via the limma package[21]. As previously described, a PCA plot was generated for these preprocessed TCGA data to visualize a clear separation according to disease status.

To further demonstrate the utility of our generated dataset in identifying unique sets of genes defining distinct subtypes of NSCLC, we performed separate meta-analyses of adenocarcinoma (ADC) and squamous cell carcinoma (SCC). DE gene lists obtained from the two subtypes were then compared with those from the TCGA LUAD and LUSC cohorts, respectively (Fig. 3a). To avoid bias introduced by the different numbers of genes assayed on each platform, only common genes included in the final DE gene lists were ranked and compared. Regardless of cancer subtype, a high degree of overlap between DE genes derived from the two platforms was observed (Spearman’s correlation coefficient r=0.917 and 0.933 for ADC and SCC, respectively). We further identified uniquely and commonly up-regulated DE genes in tumors compared to control tissues (Fig. 3b) by applying our defined cutoff thresholds (logFC >1.5 and logFC >3 for the microarray-based dataset and the RNA-Seq-based TCGA dataset, respectively).
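The CPM-based expression filter described above can be sketched as follows. This is a plain Python illustration of the rule "CPM above zero in at least 20% of samples", not the edgeR cpm function (which can also account for normalization factors); the function names, defaults and toy counts are hypothetical.

```python
def cpm(counts):
    """Convert a gene-by-sample count table to counts per million.

    counts -- dict mapping gene name to a list of counts, one per sample
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[i] for row in counts.values()) for i in range(n_samples)]
    return {
        gene: [c / lib * 1e6 for c, lib in zip(row, lib_sizes)]
        for gene, row in counts.items()
    }

def keep_expressed(counts, min_cpm=0.0, min_fraction=0.2):
    """Retain genes whose CPM exceeds min_cpm in at least min_fraction of samples."""
    kept = {}
    for gene, row in cpm(counts).items():
        n_expressed = sum(1 for v in row if v > min_cpm)
        if n_expressed >= min_fraction * len(row):
            kept[gene] = counts[gene]
    return kept

# Hypothetical toy counts for five samples.
toy_counts = {
    "GENE_A": [10, 2, 25, 3, 8],   # expressed everywhere
    "GENE_B": [0, 0, 0, 0, 0],     # never detected
    "GENE_C": [0, 0, 0, 0, 4],     # detected in 1 of 5 samples (20%)
}
expressed = keep_expressed(toy_counts)
print(sorted(expressed))
```

With a threshold of zero, the filter only discards genes that are undetected in more than 80% of samples, so it is a mild filter compared with the variance-based one applied to the microarray data.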

Figure 3

The interplatform concordance between microarray (normalized dataset) and RNA-Seq (TCGA) platforms in discovering DE genes for distinct subtypes of NSCLC.

(a) Linear regression lines (black) and marginal histograms (blue) are drawn; rs=Spearman’s correlation coefficient. (b) DEG lists generated for adenocarcinoma and squamous cell carcinoma (SCC). logFC >1.5 and logFC >3 were used as the criteria to define DE genes for our normalized dataset and the TCGA cohorts, respectively.
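The inter-platform comparison summarized in Fig. 3a rests on a rank correlation over genes shared by both platforms. The following Python sketch shows one way such a comparison could be computed: restrict to common genes, rank the log fold-changes on each platform, and take the Pearson correlation of the ranks. The helper names and values are hypothetical, and the exact ranking variable used in the original analysis is an assumption.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_on_common_genes(logfc_a, logfc_b):
    """Spearman correlation of log fold-changes over genes present in both dicts."""
    common = sorted(set(logfc_a) & set(logfc_b))
    ra = average_ranks([logfc_a[g] for g in common])
    rb = average_ranks([logfc_b[g] for g in common])
    return pearson(ra, rb)

# Hypothetical log fold-changes; G5 and G6 are platform-specific and ignored.
microarray = {"G1": 2.1, "G2": -1.4, "G3": 0.3, "G4": 1.7, "G5": -0.2}
rnaseq = {"G1": 4.0, "G2": -2.5, "G3": 3.2, "G4": 2.9, "G6": 0.8}
rho = spearman_on_common_genes(microarray, rnaseq)
print(round(rho, 3))
```

Because only ranks enter the calculation, the different dynamic ranges of microarray and RNA-Seq fold-changes (reflected in the different logFC cutoffs) do not affect the coefficient.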

The present normalized dataset of lung cancer together with its associated clinical metadata will allow exploration of distinct patterns of DE genes in relation to clinical features, including histology, gender, age, pathological and TNM stage, and survival outcomes, facilitating clinical predictive modeling for accurate diagnosis and prognosis in oncology.

Additional information

How to cite this article: Lim, S. B. et al. A merged lung cancer transcriptome dataset for clinical predictive modeling. Sci. Data 5:180136 doi: 10.1038/sdata.2018.136 (2018). Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

20 in total

1. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements.

Authors: Leming Shi; Laura H Reid; Wendell D Jones; Richard Shippy; Janet A Warrington; Shawn C Baker; Patrick J Collins; Francoise de Longueville; Ernest S Kawasaki; Kathleen Y Lee; Yuling Luo; Yongming Andrew Sun; James C Willey; Robert A Setterquist; Gavin M Fischer; Weida Tong; Yvonne P Dragan; David J Dix; Felix W Frueh; Federico M Goodsaid; Damir Herman; Roderick V Jensen; Charles D Johnson; Edward K Lobenhofer; Raj K Puri; Uwe Scherf; Jean Thierry-Mieg; Charles Wang; Mike Wilson; Paul K Wolber; Lu Zhang; Shashi Amur; Wenjun Bao; Catalin C Barbacioru; Anne Bergstrom Lucas; Vincent Bertholet; Cecilie Boysen; Bud Bromley; Donna Brown; Alan Brunner; Roger Canales; Xiaoxi Megan Cao; Thomas A Cebula; James J Chen; Jing Cheng; Tzu-Ming Chu; Eugene Chudin; John Corson; J Christopher Corton; Lisa J Croner; Christopher Davies; Timothy S Davison; Glenda Delenstarr; Xutao Deng; David Dorris; Aron C Eklund; Xiao-hui Fan; Hong Fang; Stephanie Fulmer-Smentek; James C Fuscoe; Kathryn Gallagher; Weigong Ge; Lei Guo; Xu Guo; Janet Hager; Paul K Haje; Jing Han; Tao Han; Heather C Harbottle; Stephen C Harris; Eli Hatchwell; Craig A Hauser; Susan Hester; Huixiao Hong; Patrick Hurban; Scott A Jackson; Hanlee Ji; Charles R Knight; Winston P Kuo; J Eugene LeClerc; Shawn Levy; Quan-Zhen Li; Chunmei Liu; Ying Liu; Michael J Lombardi; Yunqing Ma; Scott R Magnuson; Botoul Maqsodi; Tim McDaniel; Nan Mei; Ola Myklebost; Baitang Ning; Natalia Novoradovskaya; Michael S Orr; Terry W Osborn; Adam Papallo; Tucker A Patterson; Roger G Perkins; Elizabeth H Peters; Ron Peterson; Kenneth L Philips; P Scott Pine; Lajos Pusztai; Feng Qian; Hongzu Ren; Mitch Rosen; Barry A Rosenzweig; Raymond R Samaha; Mark Schena; Gary P Schroth; Svetlana Shchegrova; Dave D Smith; Frank Staedtler; Zhenqiang Su; Hongmei Sun; Zoltan Szallasi; Zivana Tezak; Danielle Thierry-Mieg; Karol L Thompson; Irina Tikhonova; Yaron Turpaz; Beena Vallanat; Christophe Van; Stephen J Walker; Sue Jane Wang; Yonghong Wang; Russ Wolfinger; Alex Wong; Jie Wu; Chunlin Xiao; Qian Xie; Jun Xu; Wen Yang; Liang Zhang; Sheng Zhong; Yaping Zong; William Slikker
Journal: Nat Biotechnol Date: 2006-09 Impact factor: 54.908

2. Multiplatform single-sample estimates of transcriptional activation.

Authors: Stephen R Piccolo; Michelle R Withers; Owen E Francis; Andrea H Bild; W Evan Johnson
Journal: Proc Natl Acad Sci U S A Date: 2013-10-15 Impact factor: 11.205

3. Frozen robust multiarray analysis (fRMA).

Authors: Matthew N McCall; Benjamin M Bolstad; Rafael A Irizarry
Journal: Biostatistics Date: 2010-01-22 Impact factor: 5.899

4. The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models.

Authors: Leming Shi; Gregory Campbell; Wendell D Jones; Fabien Campagne; Zhining Wen; Stephen J Walker; Zhenqiang Su; Tzu-Ming Chu; Federico M Goodsaid; Lajos Pusztai; John D Shaughnessy; André Oberthuer; Russell S Thomas; Richard S Paules; Mark Fielden; Bart Barlogie; Weijie Chen; Pan Du; Matthias Fischer; Cesare Furlanello; Brandon D Gallas; Xijin Ge; Dalila B Megherbi; W Fraser Symmans; May D Wang; John Zhang; Hans Bitter; Benedikt Brors; Pierre R Bushel; Max Bylesjo; Minjun Chen; Jie Cheng; Jing Cheng; Jeff Chou; Timothy S Davison; Mauro Delorenzi; Youping Deng; Viswanath Devanarayan; David J Dix; Joaquin Dopazo; Kevin C Dorff; Fathi Elloumi; Jianqing Fan; Shicai Fan; Xiaohui Fan; Hong Fang; Nina Gonzaludo; Kenneth R Hess; Huixiao Hong; Jun Huan; Rafael A Irizarry; Richard Judson; Dilafruz Juraeva; Samir Lababidi; Christophe G Lambert; Li Li; Yanen Li; Zhen Li; Simon M Lin; Guozhen Liu; Edward K Lobenhofer; Jun Luo; Wen Luo; Matthew N McCall; Yuri Nikolsky; Gene A Pennello; Roger G Perkins; Reena Philip; Vlad Popovici; Nathan D Price; Feng Qian; Andreas Scherer; Tieliu Shi; Weiwei Shi; Jaeyun Sung; Danielle Thierry-Mieg; Jean Thierry-Mieg; Venkata Thodima; Johan Trygg; Lakshmi Vishnuvajjala; Sue Jane Wang; Jianping Wu; Yichao Wu; Qian Xie; Waleed A Yousef; Liang Zhang; Xuegong Zhang; Sheng Zhong; Yiming Zhou; Sheng Zhu; Dhivya Arasappan; Wenjun Bao; Anne Bergstrom Lucas; Frank Berthold; Richard J Brennan; Andreas Buness; Jennifer G Catalano; Chang Chang; Rong Chen; Yiyu Cheng; Jian Cui; Wendy Czika; Francesca Demichelis; Xutao Deng; Damir Dosymbekov; Roland Eils; Yang Feng; Jennifer Fostel; Stephanie Fulmer-Smentek; James C Fuscoe; Laurent Gatto; Weigong Ge; Darlene R Goldstein; Li Guo; Donald N Halbert; Jing Han; Stephen C Harris; Christos Hatzis; Damir Herman; Jianping Huang; Roderick V Jensen; Rui Jiang; Charles D Johnson; Giuseppe Jurman; Yvonne Kahlert; Sadik A Khuder; Matthias Kohl; Jianying Li; Li Li; Menglong Li; Quan-Zhen Li; Shao Li; Zhiguang Li; Jie Liu; Ying Liu; Zhichao Liu; Lu Meng; Manuel Madera; Francisco Martinez-Murillo; Ignacio Medina; Joseph Meehan; Kelci Miclaus; Richard A Moffitt; David Montaner; Piali Mukherjee; George J Mulligan; Padraic Neville; Tatiana Nikolskaya; Baitang Ning; Grier P Page; Joel Parker; R Mitchell Parry; Xuejun Peng; Ron L Peterson; John H Phan; Brian Quanz; Yi Ren; Samantha Riccadonna; Alan H Roter; Frank W Samuelson; Martin M Schumacher; Joseph D Shambaugh; Qiang Shi; Richard Shippy; Shengzhu Si; Aaron Smalter; Christos Sotiriou; Mat Soukup; Frank Staedtler; Guido Steiner; Todd H Stokes; Qinglan Sun; Pei-Yi Tan; Rong Tang; Zivana Tezak; Brett Thorn; Marina Tsyganova; Yaron Turpaz; Silvia C Vega; Roberto Visintainer; Juergen von Frese; Charles Wang; Eric Wang; Junwei Wang; Wei Wang; Frank Westermann; James C Willey; Matthew Woods; Shujian Wu; Nianqing Xiao; Joshua Xu; Lei Xu; Lun Yang; Xiao Zeng; Jialu Zhang; Li Zhang; Min Zhang; Chen Zhao; Raj K Puri; Uwe Scherf; Weida Tong; Russell D Wolfinger
Journal: Nat Biotechnol Date: 2010-07-30 Impact factor: 54.908

5. A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-II microarray gene expression data.

Authors: J Luo; M Schumacher; A Scherer; D Sanoudou; D Megherbi; T Davison; T Shi; W Tong; L Shi; H Hong; C Zhao; F Elloumi; W Shi; R Thomas; S Lin; G Tillinghast; G Liu; Y Zhou; D Herman; Y Li; Y Deng; H Fang; P Bushel; M Woods; J Zhang
Journal: Pharmacogenomics J Date: 2010-08 Impact factor: 3.550

6. Unlocking the potential of publicly available microarray data using inSilicoDb and inSilicoMerging R/Bioconductor packages.

Authors: Jonatan Taminau; Stijn Meganck; Cosmin Lazar; David Steenhoff; Alain Coletta; Colin Molter; Robin Duque; Virginie de Schaetzen; David Y Weiss Solís; Hugues Bersini; Ann Nowé
Journal: BMC Bioinformatics Date: 2012-12-24 Impact factor: 3.169

7. NCBI GEO: archive for functional genomics data sets--10 years on.

Authors: Tanya Barrett; Dennis B Troup; Stephen E Wilhite; Pierre Ledoux; Carlos Evangelista; Irene F Kim; Maxim Tomashevsky; Kimberly A Marshall; Katherine H Phillippy; Patti M Sherman; Rolf N Muertter; Michelle Holko; Oluwabukunmi Ayanbule; Andrey Yefanov; Alexandra Soboleva
Journal: Nucleic Acids Res Date: 2010-11-21 Impact factor: 16.971

8. The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets - improving meta-analysis and prediction of prognosis.

Authors: Andrew H Sims; Graeme J Smethurst; Yvonne Hey; Michal J Okoniewski; Stuart D Pepper; Anthony Howell; Crispin J Miller; Robert B Clarke
Journal: BMC Med Genomics Date: 2008-09-21 Impact factor: 3.063

9. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.

Authors: Mark D Robinson; Davis J McCarthy; Gordon K Smyth
Journal: Bioinformatics Date: 2009-11-11 Impact factor: 6.937

10. InSilico DB genomic datasets hub: an efficient starting point for analyzing genome-wide studies in GenePattern, Integrative Genomics Viewer, and R/Bioconductor.

Authors: Alain Coletta; Colin Molter; Robin Duqué; David Steenhoff; Jonatan Taminau; Virginie de Schaetzen; Stijn Meganck; Cosmin Lazar; David Venet; Vincent Detours; Ann Nowé; Hugues Bersini; David Y Weiss Solís
Journal: Genome Biol Date: 2012-11-18 Impact factor: 13.583
