
Multi-Omics Analysis Reveals a HIF Network and Hub Gene EPAS1 Associated with Lung Adenocarcinoma.

Zhaoxi Wang¹, Yongyue Wei², Ruyang Zhang³, Li Su¹, Stephanie M Gogarten⁴, Geoffrey Liu⁵, Paul Brennan⁶, John K Field⁷, James D McKay⁶, Jolanta Lissowska⁸, Beata Swiatkowska⁹, Vladimir Janout¹⁰, Ciprian Bolca¹¹, Milica Kontic¹², Ghislaine Scelo⁶, David Zaridze¹³, Cathy C Laurie⁴, Kimberly F Doheny¹⁴, Elizabeth K Pugh¹⁴, Beth A Marosy¹⁴, Kurt N Hetrick¹⁴, Xiangjun Xiao¹⁵, Claudio Pikielny¹⁵, Rayjean J Hung¹⁶, Christopher I Amos¹⁵, Xihong Lin¹⁷, David C Christiani¹⁸.

Abstract

Recent technological advancements have permitted high-throughput measurement of the human genome, epigenome, metabolome, transcriptome, and proteome at the population level. We hypothesized that subsets of genes identified from omic studies might have closely related biological functions and thus might interact directly at the network level. Therefore, we conducted an integrative analysis of multi-omic datasets of non-small cell lung cancer (NSCLC) to search for association patterns beyond the genome and transcriptome. A large, complex, and robust gene network containing well-known lung cancer-related genes, including EGFR and TERT, was identified from combined gene lists for lung adenocarcinoma. Members of the hypoxia-inducible factor (HIF) gene family were at the center of this network. Subsequent sequencing of network hub genes within a subset of samples from the Transdisciplinary Research in Cancer of the Lung-International Lung Cancer Consortium (TRICL-ILCCO) revealed a SNP (rs12614710) in EPAS1 associated with NSCLC that reached genome-wide significance (OR = 1.50; 95% CI: 1.31–1.72; p = 7.75 × 10−9). Using imputed data, we found that this SNP remained significant in the entire TRICL-ILCCO consortium (p = .03). Additional functional studies are warranted to better understand interrelationships among genetic polymorphisms, DNA methylation status, and EPAS1 expression.

Keywords: Hypoxia-inducible factor; Integrated analysis; Lung adenocarcinoma; Network analysis; Non-small cell lung cancer

Year: 2018; PMID: 29859855; PMCID: PMC6021270; DOI: 10.1016/j.ebiom.2018.05.024

Source DB: PubMed; Journal: EBioMedicine; ISSN: 2352-3964; Impact factor: 8.143

Introduction

Lung cancer is the leading cause of cancer-related mortality worldwide for both men and women [1]. Although advances in cancer prevention, early detection, and treatment have been made in recent decades, the general prognosis for lung cancer remains poor. The high case–fatality ratio of lung cancer has been attributed to advanced stage of disease at diagnosis, poor response to current therapies, and the aggressive biological nature of lung cancer. Non-small cell lung cancer (NSCLC) is the most common type of lung cancer, accounting for about 85% of all lung cancers [2,3]. Histologic subtypes of NSCLC include adenocarcinoma, squamous cell carcinoma, and large cell carcinoma [4]. Further, previous studies have demonstrated that heritable factors play an important role in lung cancer, independent of smoking history or exposure to environmental tobacco smoke [5,6]. More recently, genome-wide association studies (GWAS) have been used to identify multiple independent loci for most diseases, because GWAS can identify common disease susceptibility loci without prior knowledge of locus function or position [7,8]. Several GWAS have identified at least five loci associated with lung cancer risk (on chromosomes 3q28, 5p15, 6p21, 13q13.1 and 15q25) in populations with European ancestry [9–14]. Additional loci at 22q12 and 15q15 have been associated with lung cancer risk [15–17]. GWAS can define lung cancer-associated genomic loci with low to moderate effects, but cannot identify causal mutations given the complicated relationships among disease-associated loci. Recent technological advances have permitted high-throughput measurement of the human genome, epigenome, metabolome, transcriptome, and proteome at the population level. Each study can offer complementary analyses of a certain biological function, and integrative multi-omics analyses are needed to uncover synergistic interactions [18].
However, because each omic study analyzes a different molecular layer, integrative analyses comparing top-ranked genes from different omic studies might not reveal many overlapping genes. We hypothesized that there are subsets of genes identified from different omic studies that might have closely related biological functions and thus might directly interact at the network level. Therefore, it should be possible to build networks with direct interactions among multiple molecular layers, characterized by higher network complexity and larger gene ratios, where network complexity is defined as the ratio of the total number of connections between genes to the number of genes within a network, and gene ratio is defined as the proportion of genes within a network relative to the total number of genes used to build the network. In addition, incorporating biological functionality from different molecular layers, such as RNA, proteome, and metabolome results, can boost the power of genetic mapping. In this study, we conducted an integrative analysis of GWAS and transcriptomic profiling for NSCLC using network building based on an algorithm that searches for direct interactions from a high-quality, manually curated database of genetic and physical interactions. To evaluate the identified networks, we repeated network building from a large set of randomly generated gene lists to obtain distributions of network complexity and gene ratio. We also used hub genes identified from significant networks for targeted sequencing and further validation in the Transdisciplinary Research in Cancer of the Lung-International Lung Cancer Consortium (TRICL-ILCCO) GWAS meta-analysis.

Materials and Methods

Study Population

This study was based on data derived from 1000 NSCLC cases and 1000 cancer-free controls, frequency-matched by age (±5 years), gender, and smoking status (by pack-years) as previously described [19]. All cases were recruited at Massachusetts General Hospital (MGH) from 1992 to 2004, were > 18 years old, and had newly diagnosed, histologically confirmed primary NSCLC. Controls were healthy, non-blood-related family members and friends of patients with cancer or with cardiothoracic conditions undergoing surgery. Histological classification was done by two staff pulmonary pathologists at MGH according to the International Classification of Diseases for Oncology (ICD-O3). For histology analysis, the following codes were used: adenocarcinoma, 8140/3, 8250/3, 8260/3, 8310/3, 8480/3, and 8560/3; large cell carcinoma, 8012/3 and 8031/3; squamous cell carcinoma, 8070/3, 8071/3, 8072/3, and 8074/3; and other non-small cell carcinomas, 8010/3, 8020/3, 8021/3, 8032/3, and 8230/3. The Institutional Review Board of MGH and the Human Subjects Committee of the Harvard School of Public Health approved the study, and all participants signed consent forms.

GWAS Dataset

DNA was extracted from peripheral white blood cells using standard protocols and was genotyped using the Human610-Quad BeadChip (Illumina, San Diego, CA). Before association tests, we conducted a systematic quality evaluation of raw genotyping data according to a general quality control (QC) procedure described by Anderson et al. [20]. Briefly, unqualified samples were excluded if they fit the following QC criteria: (i) overall genotype completion rates <95%; (ii) gender discrepancies; (iii) unexpected duplicates or probable relatives (based on pairwise identity by state value, PI_HAT in PLINK >0.185); (iv) heterozygosity rates >6 standard deviations from the mean; or (v) individuals were non-Caucasians (using the HapMap release 23, including JPT, CEPH, CEU, and YRI populations as reference). Unqualified SNPs were excluded if they fit the following QC criteria: (i) not mapped on autosomes; (ii) call rate < 95% in all GWAS samples; (iii) MAF < 0.01; or (iv) genotype distributions deviated from those expected by Hardy-Weinberg equilibrium (p < 1.0 × 10−6). After quality evaluation, we had a dataset of 984 cases and 970 controls with 543,697 autosomal SNPs for epistasis analysis.
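
The SNP-level filters listed above can be sketched for a single SNP as follows. This is a minimal illustration rather than the pipeline used in the study: the function name `snp_qc` and its arguments are hypothetical, and a chi-square goodness-of-fit test stands in for whichever Hardy-Weinberg test was actually applied (the text does not say).

```python
import math

AUTOSOMES = {str(c) for c in range(1, 23)}

def snp_qc(n_aa, n_ab, n_bb, n_missing, chromosome,
           min_call_rate=0.95, min_maf=0.01, hwe_alpha=1.0e-6):
    """Return (passes, reasons) for one SNP given its genotype counts.

    Thresholds follow the criteria quoted in the text; everything else
    (names, the chi-square HWE test) is an assumption of this sketch.
    """
    reasons = []
    if chromosome not in AUTOSOMES:
        reasons.append("not autosomal")

    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    call_rate = n_called / n_total if n_total else 0.0
    if call_rate < min_call_rate:
        reasons.append("low call rate")
    if n_called == 0:
        return False, reasons

    # Allele frequency of A; the minor allele is whichever is rarer.
    p = (2 * n_aa + n_ab) / (2 * n_called)
    if min(p, 1 - p) < min_maf:
        reasons.append("low MAF")

    if 0 < p < 1:
        q = 1 - p
        expected = (n_called * p * p, 2 * n_called * p * q, n_called * q * q)
        observed = (n_aa, n_ab, n_bb)
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
        hwe_p = math.erfc(math.sqrt(chi2 / 2))
        if hwe_p < hwe_alpha:
            reasons.append("HWE deviation")

    return (not reasons), reasons
```

A SNP with no heterozygotes at an allele frequency of 0.5 (for example counts 500/0/500) fails only the Hardy-Weinberg check, while a SNP on chromosome X with a 91% call rate fails the first two checks.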

Transcriptomic Profiling

FFPE tissues were obtained by surgical biopsy from patients with NSCLC and archived. Histopathologic sections were prepared from tumor and non-affected lung parenchyma tissue by manual microdissection of FFPE blocks. A pathologist who had no knowledge of the study outcome reviewed all tissue sections. Each specimen was evaluated for amount and quality of tumor cells and histologically classified using WHO criteria. Specimens with lower than 70% cancer cellularity were not included for transcriptomic profiling. Sectioned FFPE tissues were sent to Q2 Solutions (formerly Expression Analysis Inc., Morrisville, NC) for RNA extraction, quality assessment, and transcriptomic profiling using whole genome-DASL assay [21]. The Whole-Genome DASL HT assay covered >47,000 annotated transcripts (Illumina, San Diego, CA) [22]. A total of 59 FFPE transcriptomic profiles were obtained, including 39 tumor/non-involved tissues from adenocarcinomas, 16 tumor/non-involved tissues from squamous cell carcinomas, and 4 tumor/non-involved tissues from other types of lung cancer. Among them, there were 18 pairs of tumor and matched non-involved tissues of adenocarcinomas and 8 pairs of squamous cell carcinomas, which were used in the transcriptomic analysis.

External Transcriptomic Data

Two transcriptomic datasets of NSCLC were selected and raw data were downloaded from Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo): GSE10072 and GSE18842. GSE10072 focused on lung adenocarcinoma and included 180 paired adenocarcinoma and non-affected tissue samples [23]. GSE18842 included 91 samples of mainly squamous cell carcinoma [24]. GSE10072 and GSE18842 were generated from fresh snap-frozen samples from surgical resection and profiled on Affymetrix Human Genome U133 array (Affymetrix, Santa Clara, CA).

Transcriptomic Data Analysis

Although FFPE profiles and external data were generated from different platforms, we used DNA-Chip Analyzer 2006 (dChip, http://www.dchip.org) software, which applied an invariant set of genes for normalization and calculation of expression values across all microarrays, to normalize raw microarray signals. This analysis assumed that a subset of genes had constant expression among all cell subtypes [25]. Only paired tumor and non-affected tissue samples were used in the analysis, including 18 FFPE adenocarcinomas, 8 FFPE squamous cell carcinomas, 33 snap-frozen adenocarcinomas (GSE10072), and 32 snap-frozen squamous cell carcinomas (GSE18842).
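
Because only matched tumor and non-affected samples were analyzed, and Table 2 reports p-values from paired t-tests together with a signed tumor/non-affected fold change, the per-gene comparison can be sketched as below. The function name `paired_summary`, the log2 transformation before the t-test, and the sign convention for ratios below 1 are assumptions of this sketch, not details stated in the text. Converting the t statistic to a p-value needs a Student t distribution and is left out.

```python
import math
from statistics import mean, stdev

def paired_summary(tumor, normal):
    """Signed fold change and paired t statistic for one gene.

    tumor, normal: positive expression values on the linear scale,
    ordered so that position i in both lists is the same patient.
    Returns (fold_change, t_statistic, degrees_of_freedom).
    """
    if len(tumor) != len(normal) or len(tumor) < 2:
        raise ValueError("need at least two matched pairs")

    ratio = mean(tumor) / mean(normal)
    # Assumed convention: a ratio below 1 is reported as -1/ratio,
    # so a four-fold decrease in tumor reads as -4.0.
    fold_change = ratio if ratio >= 1 else -1.0 / ratio

    # Paired t-test on within-patient log2 differences (assumed scale).
    diffs = [math.log2(t) - math.log2(n) for t, n in zip(tumor, normal)]
    sd = stdev(diffs)
    if sd == 0:
        t_stat = float("nan")  # all pairs identical: statistic undefined
    else:
        t_stat = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return fold_change, t_stat, len(diffs) - 1
```

With three pairs in which tumor expression is roughly a quarter of the matched non-affected value, the function returns a fold change of -4.0 and a negative t statistic on 2 degrees of freedom.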

Integrated Analysis by Network Building

Gene lists from GWAS and/or transcriptomic profiling were uploaded into the MetaCore GeneGo database (https://portal.genego.com, Thomson Reuters, New York, NY), which has >1.7 million molecular interactions, 1600 pathway maps, and 230,000 gene-disease associations, for network building [26]. MetaCore is an integrated online software suite for functional analysis of omics data that is based on a high-quality, manually-curated database of molecular interactions, molecular pathways, gene–disease associations, chemical metabolism, and toxicity information. We used direct interaction algorithms on the MetaCore platform to build gene networks consisting only of uploaded genes and their direct interactions, without adding other genes/objects from the GeneGo database. Considering that GWAS and transcriptome were different molecular layers and would not reveal many overlapping genes, we used less stringent criteria to select top-ranked genes from GWAS and transcriptomic profiling, without correcting for multiple comparisons. Significant networks were identified and evaluated by two parameters: network complexity and gene ratio. Network complexity was defined as the ratio of the total number of connections among genes to the number of genes within a network. Gene ratio was defined as the proportion of genes within a network relative to the total number of genes used to build the network. These parameters allowed us to distinguish complex networks from simple networks driven by a few supergenes. In such simple networks, the majority of networked genes only had a single connection to one or several genes, called supergenes. Connections of supergenes usually accounted for the majority of connections within a network, and removing supergenes often dramatically reduced numbers of networked genes or demolished the networks. We also explored different p-value cut-offs for the selection of genes from GWAS and transcriptomic profiling. With more stringent cut-offs, we could not build a significant network, and with more relaxed cut-offs, the network complexities were reduced (data not shown).
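
The two network parameters defined above can be computed from any list of pairwise interactions. The sketch below is not MetaCore's algorithm: it assumes interactions are supplied as undirected gene pairs (a hypothetical input format), keeps only pairs whose two ends are both uploaded genes, and reports metrics for the largest connected network. The `hub_min` default of 5 follows the hub definition used later in the paper (≥5 connections within the network).

```python
from collections import defaultdict

def largest_network_metrics(uploaded_genes, interactions, hub_min=5):
    """Complexity, gene ratio and hubs of the largest connected network.

    uploaded_genes: iterable of gene symbols used for network building.
    interactions: iterable of (gene_a, gene_b) pairs, treated as undirected.
    """
    genes = set(uploaded_genes)
    neighbours = defaultdict(set)
    for a, b in interactions:
        # Direct interactions only: both ends must be uploaded genes.
        if a in genes and b in genes and a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)

    best, seen = set(), set()
    for start in list(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first walk of one connected component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        if len(component) > len(best):
            best = component

    if not best:
        return {"genes": 0, "connections": 0, "complexity": 0.0,
                "gene_ratio": 0.0, "hubs": []}

    # Each connection is stored twice (once per end), hence the halving.
    n_connections = sum(len(neighbours[g]) for g in best) // 2
    return {
        "genes": len(best),
        "connections": n_connections,
        "complexity": n_connections / len(best),
        "gene_ratio": len(best) / len(genes),
        "hubs": sorted(g for g in best if len(neighbours[g]) >= hub_min),
    }
```

A star-shaped network, in which one supergene carries every connection, has a complexity just below 1 however many genes it contains, which is why complexity separates such networks from densely interconnected ones.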

Network Evaluation by Randomly Generated Gene Lists

SNPs of GWAS data and probes of transcriptomic profiling mapped to a total of 24,847 genes. From these genes, we randomly generated 6 sets of gene lists, with each set containing 100 gene lists, for a total of 600 random gene lists. Lists from each set had the same number of genes, but lists from different sets had different numbers of genes—either 50, 100, 200, 300, 400, or 500 genes. Each random gene list was individually uploaded into MetaCore GeneGo database for network building, and network parameters, including network complexity and gene ratio, from the largest networks were recorded. Quantile regression at 95th percentile was performed to estimate the 95% confidence interval.
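
The random-list evaluation can be sketched as follows. The study built networks in MetaCore and fitted a quantile regression at the 95th percentile; this sketch instead takes a user-supplied edge list (hypothetical input) and reports a simple empirical 95th percentile per list size, which is a plainer stand-in for the regression. Function names are illustrative, and the edge list is assumed to contain each interaction once.

```python
import random
from statistics import quantiles

def largest_component(genes, edges):
    """Return (n_genes, n_edges) of the largest connected set among `genes`."""
    genes = set(genes)
    kept = [(a, b) for a, b in edges if a in genes and b in genes and a != b]
    parent = {g: g for g in genes}

    def find(g):  # union-find lookup with path halving
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in kept:
        parent[find(a)] = find(b)

    sizes, edge_counts = {}, {}
    for g in genes:
        root = find(g)
        sizes[root] = sizes.get(root, 0) + 1
    for a, _ in kept:
        root = find(a)
        edge_counts[root] = edge_counts.get(root, 0) + 1
    top = max(sizes, key=sizes.get)
    return sizes[top], edge_counts.get(top, 0)

def null_distribution(universe, edges, list_size, n_lists=100, seed=0):
    """(complexity, gene_ratio) of the largest network for random gene lists."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_lists):
        sample = rng.sample(universe, list_size)
        n_genes, n_edges = largest_component(sample, edges)
        results.append((n_edges / n_genes, n_genes / list_size))
    return results

def percentile_95(values):
    """Empirical 95th percentile (last of the 19 cut points for n=20)."""
    return quantiles(values, n=20, method="inclusive")[-1]
```

Running `null_distribution` once for each list size (50, 100, 200, 300, 400, 500) and taking `percentile_95` of the complexities and gene ratios gives one reference point per size, against which an observed network can be compared.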

Targeted and Whole Exome Sequencing

Targeted and whole exome sequencing was performed at the Center for Inherited Disease Research. Ninety-nine custom regions, totaling 17.26 Mb of custom content, were captured and sequenced. BAM files were created by aligning FASTQ files to GRCh37, and joint sample variant calling and variant site filtering were performed. Genotypes for biallelic SNPs were further refined using CalculateGenotypePosteriors and allele frequency information from 1000 Genomes phase 3 data as well as the Exome Aggregation Consortium data. Further details are described in Supplementary Methods.

Statistical Analysis

Clinical characteristics were described as mean ± standard deviation (SD) for continuous variables or n (%) for categorical variables. Student's t-test or Fisher's exact test was used for comparison between groups for continuous or categorical variables, respectively. We carried out gene-based analysis on GWAS data and targeted sequencing data using SKAT-O [27]. SKAT-O aggregates weighted variance-component score statistics for each SNP/SNV within a set using a kernel function and tests for associations between groups of SNPs/SNVs and a phenotype while adjusting for relevant covariates [28]. For GWAS data, initially all SNPs that passed QC were mapped to human genes within ±20 kb regions based on information curated in the RefSeq database (NCBI build GRCh37.p13). Separate analyses were conducted individually for all SNPs and rare SNVs with MAF <0.01. Models were adjusted for age (years), gender, smoking status, and top significant eigenvectors. SKAT-O analyses were carried out using the SKAT package (R v. 2.13.0). In meta-analysis of GWAS datasets within TRICL-ILCCO, we combined imputed genotypes from 13,479 lung cancer cases and 43,218 controls from the previous TRICL-ILCCO GWAS [22,29,30]. We excluded poorly imputed SNPs defined by imputation quality Rsq <0.3 or Info <0.4 for each study and conducted fixed effects meta-analysis with inverse variance weighting and random effects meta-analysis using the DerSimonian-Laird method [31]. We also generated an index of heterogeneity (I²) and p-value of Cochran's Q statistic to assess heterogeneity in meta-analyses. We only considered SNPs with MAF >0.005 that showed little evidence for effect heterogeneity between studies (Cochran's Q statistic p > .05). All meta-analyses and calculations were performed using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA).
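
The meta-analysis quantities named above follow standard formulas, sketched here for one SNP. This is an illustration in Python, not the SAS code used in the study; the function name and the input format (per-study log odds ratios and standard errors) are assumptions. The p-value for Cochran's Q needs a chi-square tail with k − 1 degrees of freedom and is omitted to keep the sketch dependency-free.

```python
import math

def meta_analysis(betas, ses):
    """Fixed-effect and DerSimonian-Laird random-effects pooling.

    betas: per-study log odds ratios for one SNP.
    ses:   matching standard errors.
    """
    k = len(betas)
    w = [1.0 / se ** 2 for se in ses]          # inverse-variance weights
    sw = sum(w)
    fixed_beta = sum(wi * b for wi, b in zip(w, betas)) / sw
    fixed_se = math.sqrt(1.0 / sw)

    # Cochran's Q and the I^2 heterogeneity index (percent).
    q = sum(wi * (b - fixed_beta) ** 2 for wi, b in zip(w, betas))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird moment estimate of between-study variance tau^2.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    random_beta = sum(wi * b for wi, b in zip(w_re, betas)) / sum(w_re)
    random_se = math.sqrt(1.0 / sum(w_re))

    def two_sided_p(beta, se):
        # Normal approximation: p = erfc(|z| / sqrt(2)).
        return math.erfc(abs(beta / se) / math.sqrt(2))

    return {
        "fixed_beta": fixed_beta, "fixed_se": fixed_se,
        "fixed_p": two_sided_p(fixed_beta, fixed_se),
        "random_beta": random_beta, "random_se": random_se,
        "random_p": two_sided_p(random_beta, random_se),
        "q": q, "i2": i2, "tau2": tau2,
    }
```

When studies agree, tau² is estimated as zero and the two models coincide; when they disagree, the random-effects standard error widens, which is why SNPs with evidence of heterogeneity are usually handled separately.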

Results

Initial Screening of GWAS and Transcriptomic Profiling Data

We produced a genomic dataset of germline polymorphisms (GWAS genotyping data on the Illumina 610 Quad platform) and a transcriptomic dataset of tumor and non-affected tissue (genome-wide expression profiling on Illumina DASL HT platform) from NSCLC samples [29]. The GWAS dataset included 543,697 single nucleotide polymorphisms (SNPs) from 984 NSCLC cases and 970 healthy controls after quality assessment. Transcriptomic profiling was carried out on formalin-fixed paraffin-embedded (FFPE) paired samples of tumor and non-affected tissues from 28 NSCLC cases, including 18 adenocarcinoma, 8 squamous cell carcinoma, and 2 unclassified cases. Eighteen NSCLC cases had both GWAS and transcriptomic data. Patient characteristics are described in Table 1.

Table 1

Demographic distribution of study populations.

	GWAS samples			FFPE samples
	Case (n = 984)	Control (n = 970)	p	Case (n = 28)
Age (years)	65.5 ± 10.6	59.4 ± 11.6	<0.001	67.1 ± 9.6
≥65	540 (54.9%)	351 (36.2%)	<0.001	20 (71.4%)
Sex (female)	477 (48.5%)	528 (54.4%)	0.010	12 (42.9%)
Smoking pack-years	49.7 ± 35.7	25.0 ± 26.7	<0.001	53.3 ± 34.8
≥30	664 (55.1%)	326 (33.6%)	<0.001	22 (78.6%)
Smoking status			<0.001
Never	92 (9.4%)	161 (16.6%)		2 (7.1%)
Former	502 (51.0%)	555 (57.2%)		18 (64.3%)
Current	390 (39.6%)	254 (26.2%)		8 (28.6%)
Pathology
Adenocarcinomaᵃ	597 (60.7%)			18 (64.3%)
Squamous cell	216 (22.0%)			7 (25.0%)
Small cell	0 (0.0%)			0 (0.0%)
Other	171 (17.3%)			3 (10.7%)

ᵃ Including adenocarcinoma in situ.

Instead of analyzing individual SNPs, we conducted gene-based analysis of GWAS data using the optimal unified sequence kernel association test (SKAT-O) method and applied gene lists in the subsequent network analysis [32]. Among 21,981 mapped genes of GWAS data, there were 103 genes with p < .005 (Supplementary Table S1), 232 genes with p < .01, and 1007 genes with p < .05. The top genes associated with risk of developing NSCLC were HYKK (also known as AGPHD1, 15q25, p = 2.30 × 10−6), CLPTM1L (5p15, p = 3.54 × 10−5), CHRNA3 (15q25, p = 6.77 × 10−5), and DNAJC16 (1p36.1, p = 7.12 × 10−5), with 3 genes located within the two previously identified risk loci at 5p15 and 15q25 [29]. We also screened transcriptomic data for differentially expressed genes with >2 fold changes (FC) between tumor and non-affected lung tissues of 18 pairs of FFPE samples (lung adenocarcinoma, including bronchioloalveolar carcinoma) obtained by surgical biopsy. Among 20,818 genes (29,378 probesets total), there were 75 genes with p < .001 (Supplementary Table S2), 252 genes with p < .005, 402 genes with p < .01, and 805 genes with p < .05. Top differentially expressed genes were PTPRB (p = 8.65 × 10−7), SEMA6A (p = 2.03 × 10−6), and PION (p = 4.83 × 10−6). We compared gene lists from GWAS analysis and transcriptomic profiling and identified 46 common genes with p < .05 in both analyses, which we called core genes (Table 2). Except for SEMA6A [GWAS: p = .004; transcriptome: FC = −4.4 (tumor/non-affected tissue), p = 2.03 × 10−6] and MYLK (GWAS: p = .009; transcriptome: FC = −3.5, p = 3.00 × 10−4), most genes were low-ranked in either GWAS or transcriptomic profiling but high-ranked in the other analysis.

Table 2

Common genes between GWAS and transcriptomic profiling (p < .05).

Gene	Chr	Gene ID	Description	Transcriptome fold changeᵃ	Transcriptome p	GWAS SNPs (n)	GWAS p
ALDH1A1	9	216	Aldehyde dehydrogenase 1 family, member A1	−2.1	0.01294	29	0.02264
ANGPTL2	9	23,452	Angiopoietin-like 2	−2.1	0.01364	5	0.00878
BBS4	15	585	Bardet-Biedl syndrome 4	−2.1	0.01240	5	0.04792
C1orf54	1	79,630	Chromosome 1 open reading frame 54	−2.2	0.01047	3	0.03904
CANX	5	821	Calnexin	−2.5	0.00366	1	0.04936
CCDC144A	17	9720	Coiled-coil domain containing 144A	−2.0	0.02563	1	0.04716
CLDN18	3	51,208	Claudin 18	−2.0	0.00403	7	0.03534
CRYZL1	21	9946	Crystallin, zeta (quinone reductase)-like 1	−2.3	0.03630	3	0.03622
CTSS	1	1520	Cathepsin S	−2.2	0.01058	2	0.00318
DDI2	1	84,301	DDI1, DNA-damage inducible 1, homolog 2 (S. cerevisiae)	−2.4	0.00707	2	0.00074
DGKH	13	160,851	Diacylglycerol kinase, eta	−2.5	0.01353	71	0.04530
DSG2	18	1829	Desmoglein 2	−2.3	0.00147	12	0.01370
EGFR	7	1956	Epidermal growth factor receptor [erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian]	−2.4	0.01896	1	0.01781
EPAS1	2	2034	Endothelial PAS domain protein 1	−2.6	0.04683	35	0.00262
FOS	14	2353	V-fos FBJ murine osteosarcoma viral oncogene homolog	−2.9	0.02894	2	0.03475
GPR4	19	2828	G protein-coupled receptor 4	−2.2	0.02893	5	0.01226
GYPC	2	2995	Glycophorin C (Gerbich blood group)	−2.9	0.00198	23	0.03402
HIF1A	14	3091	Hypoxia-inducible factor 1, alpha subunit (basic helix-loop-helix transcription factor)	−2.0	0.01108	8	0.00307
HIST1H1A	6	3024	Histone cluster 1, H1a	2.1	0.00948	1	0.01148
HSDL1	16	83,693	Hydroxysteroid dehydrogenase like 1	2.1	0.02955	1	0.01549
IFT81	12	28,981	Intraflagellar transport 81 homolog (Chlamydomonas)	−2.2	0.04329	6	0.02961
KIAA1407	3	57,577	KIAA1407	−3.1	0.01488	6	0.00387
LEPR	1	3953	Leptin receptor	−2.0	0.02291	1	0.03057
LYVE1	11	10,894	Lymphatic vessel endothelial hyaluronan receptor 1	−2.5	0.02175	2	0.02959
MED31	17	51,003	Mediator complex subunit 31	−2.6	0.01772	1	0.02361
MS4A6A	11	64,231	Membrane-spanning 4-domains, subfamily A, member 6A	−2.0	0.04791	4	0.00202
MYLK	3	4638	Myosin, light chain kinase	−3.5	0.00032	1	0.00935
NAMPT	7	10,135	Nicotinamide phosphoribosyltransferase	−2.1	0.01653	5	0.00979
NUP50	22	10,762	Nucleoporin 50 kDa	−2.6	0.00218	4	0.03813
PAAF1	11	80,227	Proteasomal ATPase-associated factor 1	−2.0	0.00515	8	0.00595
PACRG	6	135,138	PARK2 co-regulated	−2.0	0.00203	1	0.04082
PARP1	1	142	Poly (ADP-ribose) polymerase family, member 1	2.2	0.01837	8	0.02273
PDCD2	6	5134	Programmed cell death 2	−2.3	0.03616	3	0.04564
PLEKHB1	11	58,473	Pleckstrin homology domain containing, family B (evectins) member 1	−2.5	0.00524	5	0.04509
PRKAG2	7	51,422	Protein kinase, AMP-activated, gamma 2 non-catalytic subunit	−2.3	0.02595	1	0.03415
PRKCQ	10	5588	Protein kinase C, theta	−2.3	0.00363	83	0.02882
RBMS1	2	5937	RNA binding motif, single stranded interacting protein 1	−2.3	0.00746	26	0.01284
RRM2B	8	50,484	Ribonucleotide reductase M2 B (TP53 inducible)	−2.0	0.01162	10	0.02509
S1PR5	19	53,637	Sphingosine-1-phosphate receptor 5	−2.0	0.00713	2	0.01977
SEMA6A	5	57,556	Sema domain, transmembrane domain, and cytoplasmic domain, (semaphorin) 6A	−4.4	2.03E-06	60	0.00432
SMG1	16	23,049	PI-3-kinase-related kinase SMG-1	−2.1	0.01381	11	0.00747
TACC3	4	10,460	Transforming, acidic coiled-coil containing protein 3	2.2	0.03279	5	0.02719
TCIRG1	11	10,312	T-cell, immune regulator 1, ATPase, H+ transporting, lysosomal V0 subunit A3	−2.7	0.02007	1	0.02523
ZDHHC19	3	131,540	Zinc finger, DHHC-type containing 19	−2.6	0.01204	8	0.04905
ZNF274	19	10,782	Zinc finger protein 274	−3.1	0.00116	7	0.04410
ZRANB2	1	9406	Zinc finger, RAN-binding domain containing 2	−2.3	0.00502	4	0.00498

ᵃ Fold change indicates difference between tumor tissue and non-affected adjacent tissue. P-values are from paired t-tests.


Networks Built between GWAS and Transcriptomic Profiling Data

No significant network could be built from individual or combined lists of top-ranked genes from GWAS or transcriptomic profiling. Although no significant network could be built from the list of core genes (n = 46), small and simple networks could be built when core genes were combined with either GWAS or transcriptomic profiling data. Further, by combining top-ranked GWAS genes (103 genes with p < .005), top-ranked transcriptomic profiling genes (75 genes with p < .001), and core genes (37 non-overlapped genes out of 46 core genes), we could build a single large complex network (Fig. 1a). The process of integrated analysis is summarized in Supplementary Fig. S1.
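
The list combination described above amounts to a few set operations. The sketch below assumes per-gene p-values are available as dictionaries (a hypothetical format) and that the expression dictionary already contains only genes passing the fold-change filter; the cut-offs are the ones quoted in the text. The example p-values are taken from the Results and Table 2.

```python
def build_combined_list(gwas_p, expr_p,
                        gwas_top=0.005, expr_top=0.001, core_cut=0.05):
    """Combine top GWAS genes, top expression genes, and extra core genes.

    gwas_p, expr_p: dicts mapping gene symbol -> p-value in each analysis.
    """
    gwas_genes = {g for g, p in gwas_p.items() if p < gwas_top}
    expr_genes = {g for g, p in expr_p.items() if p < expr_top}
    # Core genes: nominally significant in both analyses.
    core = {g for g, p in gwas_p.items()
            if p < core_cut and expr_p.get(g, 1.0) < core_cut}
    # Core genes not already present in either top-ranked list.
    extra_core = core - gwas_genes - expr_genes
    return {
        "gwas": gwas_genes,
        "expression": expr_genes,
        "core": core,
        "extra_core": extra_core,
        "combined": gwas_genes | expr_genes | extra_core,
    }

# Toy input using p-values reported in the paper.
gwas_p = {"SEMA6A": 0.00432, "EPAS1": 0.00262, "EGFR": 0.01781, "HYKK": 2.30e-6}
expr_p = {"SEMA6A": 2.03e-6, "EPAS1": 0.04683, "EGFR": 0.01896, "PTPRB": 8.65e-7}
lists = build_combined_list(gwas_p, expr_p)
```

In this toy input EGFR enters the combined list only through the core list, which illustrates how genes that are modestly ranked in both layers can still contribute to the network.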

Fig. 1

Networks built from GWAS and transcriptomic profiling data. a, Significant network built from combining top-ranked GWAS genes (103 genes with p < .005), top-ranked transcriptomic profiling genes (75 genes with p < .001), and a core list of genes (37 non-overlapping genes out of 46 core genes). Blue indicates genes only identified from transcriptomic profiling; green indicates genes only identified from GWAS; and red indicates genes from the core list. b, Network of 13 hub genes (≥5 connections within the network).

Thirteen hub genes (≥5 connections within the network) that significantly contributed to complexity of the network were identified and were related to lung cancer, including FOS, EGFR, HDAC4, and TERT (Fig. 1b). Moreover, the network was centered on important genes belonging to the hypoxia-inducible factor (HIF) family, including hub genes HIF1A, ARNT (also known as HIF1B), and EPAS1 (also known as HIF2A), which are transcription factors that respond to changes in available oxygen in the cellular environment [33]. We therefore named this the HIFs-EGFR-HDAC4-TERT network. Similarly, using top-ranked genes from a GWAS dataset containing only lung adenocarcinomas (597 cases and 970 controls), we built a significant network from a combined list of GWAS genes (90 genes with p < .005), transcriptomic profiling genes (75 genes with p < .001), and core genes.

Network Evaluation by Random Gene Lists

To examine the possibility that significant networks were formed by random chance because of the increased number of genes from combined gene lists, we conducted a series of network analyses using randomly generated lists with different numbers of genes (total list: n = 600) and evaluated the largest network built from each random list. As the number of genes for network building increased, we more frequently observed an increased number of genes and connections within networks, resulting in increased gene ratios. However, the complexity of these networks remained relatively unchanged (data not shown). Networks were better evaluated in a two-dimensional space of network complexity and gene ratio than by any individual parameter we tested. As shown in Fig. 2, the large complex network built from combined gene lists was located far above the 95% upper-tail confidence interval (95% CI) curve, whereas most networks from individual gene lists or combinations of any two lists were located either under or around the 95% CI curve. A similar result was observed for analysis of top-ranked genes of GWAS including only lung adenocarcinomas (data not shown).

Fig. 2

Distribution of networks built by randomly selected genes. From a total of 24,847 genes, we randomly generated 6 sets of gene lists, with each set containing 100 gene lists. Each set contained 50, 100, 200, 300, 400, or 500 genes, denoted by R50, R100, R200, R300, R400, and R500, respectively. Networks were built from each gene list using MetaCore GeneGo database and were used to calculate a 95% CI curve. GWAS denotes network from GWAS data; FFPE denotes network from transcriptomic profiling data; and GWAS+FFPE+core denotes networks built from combining GWAS data, transcriptomic profiling data, and a core gene list.

We further examined larger gene lists of GWAS and transcriptomic profiling data selected with less stringent p-value cut-offs. Networks for individual gene lists were all under the 95% CI curve for both GWAS (gene list: 232 genes, p < .01; network: complexity = 1.36, gene ratio = 0.10) and transcriptomic profiling (gene list: 252 genes, p < .005; network: complexity = 1.30, gene ratio = 0.08) data, and the network for a combination of two lists was just above the 95% CI curve (network: complexity = 1.38, gene ratio = 0.24). Among points outside the 95% CI, the results for the hypoxia network we identified had the highest gene ratio.

Network Validation by External Transcriptomic Data

An external transcriptomic dataset (GSE10072) was generated on a different platform (Affymetrix Human Genome U133A array) using 33 fresh-frozen pairs of tumor and non-affected tissues from NSCLC adenocarcinomas collected in the Lombardy region of Italy [23]. We identified 85 top-ranked, differentially expressed genes (FC ≥ 2; p < 10−16). Similar to the FFPE dataset, a significant network could only be built from combined lists of top-ranked GWAS genes, top-ranked transcriptomic profiling genes, and a core list of 29 non-overlapping genes (Supplementary Fig. S2). This network had a complexity and gene ratio comparable with that from the FFPE dataset and also had common hub genes (≥5 connections within the network) shared with the FFPE dataset (Table 3). Moreover, we identified 88 common, differentially expressed genes (FC ≥ 2; p < .05) between FFPE and GSE10072 datasets, with 85 genes (97%) having expression changes in the same direction between tumor and non-affected tissues (Supplementary Table S3). Although only a simple and small network could be built from this common transcriptomic list, a significant network with the same hub genes, including HIF1A, FOS, HDAC4, and EDN1, could be built by direct combination with the GWAS list (Table 3).

Table 3

Network evaluations.

	Network 1	Network 2	Network 3	Network 4
Total genes	215	198	217	191
Genomic list	GWAS	GWAS-AC	GWAS	GWAS
Transcriptomic list	FFPE	FFPE	GSE10072	Common of FFPE & GSE10072
Core list	37	33	29	−
Largest network
Genes	46	43	46	40
Connections	70	62	69	58
Complexity	1.556	1.476	1.533	1.487
Gene ratio	0.214	0.217	0.212	0.209
Hub gene of largest network
ARNT	+	−	+	−
BCL6	−	+	−	−
CDH5	−	−	+	−
EDN1	+	+	−	+
EGFR	+	+	−	−
ENO1	−	−	+	−
EPAS1	+	+	+	−
ERG	−	−	+	−
FOS	+	−	+	+
HDAC4	+	+	+	+
HIF1A	+	+	+	+
MMP12	−	−	−	+
PARP1	+	+	−	−
TERT	+	+	−	−
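
As a quick arithmetic check on Table 3, the reported parameters can be recomputed from the gene and connection counts. The gene ratios match genes/total genes. The reported complexity values, however, appear to correspond to connections/(genes − 1) rather than connections/genes as the Methods definition reads; the source does not explain this, so the snippet only documents the observation.

```python
# Counts and reported values transcribed from Table 3:
# name -> (genes, connections, reported complexity, total genes, reported gene ratio)
table3 = {
    "Network 1": (46, 70, 1.556, 215, 0.214),
    "Network 2": (43, 62, 1.476, 198, 0.217),
    "Network 3": (46, 69, 1.533, 217, 0.212),
    "Network 4": (40, 58, 1.487, 191, 0.209),
}

checks = {}
for name, (genes, conns, complexity, total, ratio) in table3.items():
    checks[name] = {
        # Definition as written in Methods.
        "connections_per_gene": round(conns / genes, 3),
        # Alternative that reproduces the tabulated values.
        "connections_per_gene_minus_one": round(conns / (genes - 1), 3),
        "gene_ratio": round(genes / total, 3),
        "reported_complexity": complexity,
        "reported_gene_ratio": ratio,
    }

for name, row in checks.items():
    print(name, row)
```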


Hub Gene Validation by Sequencing and Meta-Analysis of Genotyping Data

As a member of the TRICL-ILCCO consortium, we submitted 13 hub genes as candidates for next-generation targeted and whole exome sequencing [34], which included 1059 NSCLC cases and 900 unrelated controls genetically enriched with young-onset or positive lung cancer family history from four sites (Supplementary Table S4): Harvard School of Public Health, International Agency for Research on Cancer, University of Liverpool, and Mount Sinai Hospital-Princess Margaret Hospital study in Toronto, Canada. Using the SKAT-O method to test combined effects of all common and rare single nucleotide variants (SNVs) within one gene [32], we found that EPAS1 (p = .0009) was significantly associated with NSCLC after adjusting for multiple comparisons by Bonferroni method (p = .05/13 genes = 0.0038). Further, there was no significant aggregation of variants with moderate to high functional impacts (http://useast.ensembl.org/info/genome/variation/predicted_data.html) in either NSCLC cases or controls in the other 12 hub genes (Supplementary Table S5). In EPAS1, sequencing identified 2061 SNVs, including 1617 rare/low-frequency SNVs [minor allele frequency (MAF) ≤ 0.01] and 36 SNVs with moderate–high functional impact. One common SNV (SNP: rs12614710) located within the first intron and identified by sequencing reached genome-wide significance (MAF = 0.45; OR = 1.50; 95% CI: 1.31–1.72; p = 7.75 × 10−9) (Fig. 3).
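
The two thresholds used in this paragraph can be checked with a few lines of arithmetic. The sketch below computes the Bonferroni cut-off for 13 genes and recovers an approximate Wald z statistic and p-value from the reported odds ratio and 95% CI. Because the published OR and interval are rounded, the recovered p-value should only be expected to agree with the reported one in order of magnitude. The function names are illustrative, and 5 × 10−8 is the conventional genome-wide threshold rather than a value stated in the text.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests

def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover log-OR, SE, z and two-sided p from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    # A 95% CI on the log scale spans 2 * z_crit standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return beta, se, z, p

threshold = bonferroni_threshold(0.05, 13)      # 13 hub genes tested
epas1_gene_significant = 0.0009 < threshold     # SKAT-O p for EPAS1

beta, se, z, p = wald_from_or_ci(1.50, 1.31, 1.72)  # rs12614710
genome_wide = p < 5e-8                          # conventional threshold
```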

Fig. 3

Manhattan plot of EPAS1 in targeted sequencing project.

We further validated SNP rs12614710 in a much larger GWAS dataset using meta-analysis. A fixed-effect model was applied to estimate pooled effects of each SNP using the TRICL-ILCCO GWAS dataset, which included 13,479 lung cancer cases and 43,218 controls (Supplementary Table S6) [34]. Meta-analysis of SNP rs12614710 had a p-value of 0.03 (imputation accuracy: R² = 0.86).

Discussion

In this study, we conducted an integrative analysis of multi-omic datasets of NSCLC to assess associations beyond the genome and transcriptome. A large, complex, and robust gene network containing well-known lung cancer-related genes, including EGFR and TERT, was identified for lung adenocarcinoma from the combined gene lists. However, the framework of this network was built by key members of the HIF gene family. Subsequent sequencing of network hub genes within a subset of consortium samples revealed a SNP (rs12614710) in EPAS1 associated with NSCLC that reached genome-wide significance based on whole exome sequencing data. Although this SNP was not covered in any GWAS dataset, imputed data showed that it is borderline significant in the entire TRICL-ILCCO GWAS dataset. This discrepancy could be due to differential associations among genetically enriched individuals, such as those in the whole exome sequencing project.

HIFs are a family of proteins that sense and respond to oxygen deficiency by acting as heterodimeric transcription factors that regulate expression of multiple genes involved in the adaptation and progression of cancer. Hypoxia is a typical cancer microenvironment, particularly in rapidly growing tumors, and activation of HIFs is the first step of tumor cells' adaptive responses to hypoxic surroundings [33]. HIFs are involved in every aspect of cancer development and progression, including cell proliferation, apoptosis, metabolism, immune responses, genomic instability, vascularization, invasion, and metastasis. HIFs consist of two subunits: an oxygen-sensitive α subunit, including HIF-1α (HIF1A), HIF-2α (EPAS1 or HIF2A), and HIF-3α (HIF3A) isoforms; and a ubiquitously expressed β subunit (HIF1B or ARNT).
Hypoxic conditions result in HIF-α stabilization, nuclear translocation, and dimerization with HIF-1β to form the HIF transcription factor, which can bind to hypoxia-response elements (A/GCGTG consensus motif) in numerous target gene promoter regions [35]. HIFs are attractive therapeutic targets in cancer [33].

HIF-1α and HIF-2α are the predominant regulators of hypoxic responses at both cellular and organismal levels. Although they share highly conserved structural features, each isoform mediates a unique set of target genes and even oppositely influences some critical factors, such as c-Myc, p53, and nitric oxide [36]. Expression of HIF-2α has been identified in human lung cells, including type II pneumocytes and pulmonary endothelial cells, in response to hypoxia, as well as in epithelium and mesenchymal structures that give rise to the vascular endothelium [37]. Additional studies report that HIF-2α plays a vital role in malignant behavior. In murine models of lung cancer, increased tumor size, invasion, and angiogenesis correlate with high levels of HIF-2α expression cooperating with RAS [38]. Further, high levels of HIF-2α in NSCLC tumor tissue are significantly associated with poor patient prognosis [39,40]. However, in our transcriptomic data from FFPE samples, tumor tissue had low EPAS1 expression compared with non-affected tissue. A similar low EPAS1 expression profile was also observed in fresh-frozen samples.

The most significant SNP (rs12614710, p = 7.75 × 10⁻⁹) of EPAS1 identified from sequencing was located in the first intron, and several adjacent SNPs within this intron had p-values of 10⁻⁵ to 10⁻⁷. Previous studies have reported associations of EPAS1 polymorphisms with development of renal cell carcinoma (rs11894252, p = 1.8 × 10⁻⁸; rs9679290, p = 5.75 × 10⁻⁸; rs4953346, p = 4.09 × 10⁻¹⁴) and prostate cancer [40–42].
In a small study of 346 NSCLC patients and 247 controls from a Japanese population, SNP rs4953354 was associated with increased risk of lung adenocarcinoma (OR = 1.80; 95% CI, 1.16–2.79; p = .008) [42]. In our TRICL-ILCCO GWAS dataset of 13,479 cases and 43,218 controls [34], all of these SNPs were significantly associated with NSCLC (rs11894252, p = .043; rs9679290, p = .0011; rs4953346, p = .0015; rs4953354, p = .025). All previously reported SNPs are located in the first intron, except for rs4953354, which is located in the third intron. In addition, some SNPs are not correlated [40], including rs12614710 from our sequencing project. Moreover, bioinformatic analyses using Genome Browser (http://genome.ucsc.edu) suggest that most of these first intron SNPs are located in histone mark H3K27Ac, which is defined by a ChIP-seq assay related to enhanced gene transcription [43]. Further, analysis of ChIP-seq datasets from ENCODE identified binding sites and binding activities for C/EBP-β, AP-1, and MYC families of transcription factors in many cancer cell types within the first intron of EPAS1. Further, the A allele of rs13419896 is associated with enhanced EPAS1 expression and poor prognosis of 76 NSCLC patients [44]. It is likely that genetic polymorphism of EPAS1 may lead to varied gene expression through either changes in binding sites and binding activities for certain transcription factors or modification of histone epigenetic regulation. In a study of chronic obstructive pulmonary disease, hypermethylation of EPAS1 is correlated with decreased EPAS1 expression and is significantly associated with disease severity [45]. Although GWAS has provided useful insights into the genetic architecture of complex diseases, there is weak evidence for how GWAS findings improve understanding of molecular pathways involved in disease, thus bringing post-GWAS challenges to the characterization of molecular data. 
Therefore, it is important to assess how diverse omic datasets at different biological levels can be integrated to exploit the full potential of information to identify causal genes and networks, regulatory genes and networks, and predictive markers for complex traits. Using direct interaction algorithms for network building, we successfully conducted an integrated study of multi-omic data for exploration beyond GWAS. This approach implemented a stringent criterion of only searching for direct gene–gene interactions within a manually curated database (MetaCore, https://portal.genego.com), while using less strict p-value cut-offs to select gene lists from different omic datasets. Thus, we could explore less significant genes, which often do not reach genome-wide significance, in omic datasets. The underlying hypothesis is that, because genes are highly networked and coordinated and do not act alone, polymorphisms of several genes in one biological process might not reach genome-wide significance. Initially, we could not find any meaningful networks (with relatively higher complexities and gene ratios) from gene lists selected from single omic datasets. This result is reasonable in that a list of top genes with small p-values is more likely to contain discrete genes, especially for transcriptomic data, as differentially large expressions were often downstream target genes in a transcription pathway and might not have direct interactions. Only combined gene lists from both GWAS and transcriptomic datasets plus a core list of common genes could build a large, complex HIFs-EGFR-HDAC4-TERT network. However, with even less stringent p-value cut-offs, we built a large network from genes selected from a single omic dataset. All such networks were simple, with the majority of networked genes only having a single connection to one or several hub genes, called supergenes. 
Connections of supergenes usually accounted for the majority of connections within a network, and removing supergenes often dramatically reduced numbers of networked genes or demolished the networks. To distinguish from supergene networks, we evaluated the HIFs-EGFR-HDAC4-TERT network by investigating networks built from 600 randomly selected gene lists of different sizes. We found that a gene network had to be evaluated by two factors: size and complexity. Network size was measured by gene ratio of number of networked genes to number of total genes used to build a network. Network complexity was measured by the ratio of total number of network connections to total number of networked genes. A supergene network always was large in size but low in complexity. The HIFs-EGFR-HDAC4-TERT network had a moderate size but high complexity without supergenes. From the distribution of network properties of 600 randomly selected gene lists plotted for network size against network complexity, the HIFs-EGFR-HDAC4-TERT network was above the 95% CI curve, suggesting that this network was not randomly built. We further validated the HIFs-EGFR-HDAC4-TERT network generated from multi-omic datasets by using different transcriptomic datasets. The GSE10072 dataset was from a study of gene expression signatures of cigarette smoking and its role in lung adenocarcinoma development and survival, and it contained 33 fresh-frozen pairs of tumor and non-affected tissues from NSCLC adenocarcinomas [23]. Combined with our GWAS dataset, which contained a majority of NSCLC adenocarcinomas, and a core gene list, we also built a network with network size and complexity comparable to the HIFs-EGFR-HDAC4-TERT network and several shared hub genes. Different combinations of the GWAS dataset, transcriptomic dataset, and core gene list provided similar results. 
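The two network measures defined above (size as the ratio of networked genes to input genes, complexity as the ratio of connections to networked genes) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the interaction table is a toy stand-in for a curated database such as MetaCore, gene names are placeholders, and the 95% CI curve used in the study is not reproduced here.

```python
import random


def network_metrics(gene_list, interactions):
    """Return (size, complexity) for the direct-interaction network of gene_list.

    size       = networked genes / genes used to build the network
    complexity = connections / networked genes
    interactions: iterable of (gene_a, gene_b) direct-interaction pairs.
    """
    genes = set(gene_list)
    edges = set()
    for a, b in interactions:
        if a != b and a in genes and b in genes:
            edges.add(frozenset((a, b)))  # undirected, de-duplicated
    networked = set()
    for edge in edges:
        networked |= edge
    size = len(networked) / len(genes) if genes else 0.0
    complexity = len(edges) / len(networked) if networked else 0.0
    return size, complexity


def random_baseline(universe, interactions, list_size, n_lists=600, seed=0):
    """Metrics for n_lists randomly drawn gene lists of a given size."""
    rng = random.Random(seed)
    pool = sorted(universe)
    return [network_metrics(rng.sample(pool, list_size), interactions)
            for _ in range(n_lists)]


# Toy "supergene" network: every gene connects only to one hub
star_edges = [("HUB", g) for g in ("A", "B", "C", "D")]
star = network_metrics(["HUB", "A", "B", "C", "D"], star_edges)

# Toy dense network: four fully connected genes plus one unconnected gene
dense_edges = [("A", "B"), ("A", "C"), ("A", "D"),
               ("B", "C"), ("B", "D"), ("C", "D")]
dense = network_metrics(["A", "B", "C", "D", "E"], dense_edges)

print("supergene network (size, complexity):", star)   # (1.0, 0.8)
print("dense network (size, complexity):", dense)      # (0.8, 1.5)

# Null distribution from 600 random lists drawn from a placeholder universe
universe = {"HUB", "A", "B", "C", "D", "E"} | {f"G{i}" for i in range(44)}
baseline = random_baseline(universe, star_edges + dense_edges, list_size=5)
```

The toy cases mirror the distinction drawn in the text: the star-shaped supergene network covers every input gene but has fewer connections than genes, whereas the densely connected network is smaller but more complex.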
We also analyzed a transcriptomic dataset of squamous cell carcinoma, including our 8 pairs of tumor and non-affected tissues from FFPE samples and 32 pairs of fresh-frozen samples from GSE18842 [24], and found that no complex networks except several supergene networks could be built (data not shown). This might be because a majority of our GWAS samples were lung adenocarcinomas (60.7%), and squamous cell carcinomas only accounted for ~22% of samples. Meanwhile, the absence of an identifiable network from the squamous cell GWAS and transcriptomic datasets suggests that our integrated approach of multi-omic data was sensitive to tumor histology.

Sequencing of hub genes identified a new locus in EPAS1 that reached genome-wide significance and was validated in the largest lung cancer consortium, providing additional evidence that the HIFs-EGFR-HDAC4-TERT network is associated with NSCLC adenocarcinoma. Further, a recent study reported that EPAS1 could specifically bind to tyrosine kinase inhibitor (TKI)-resistant T790M EGFR in NSCLC cell lines and enhance amplification of MET [46]. These findings suggest that EPAS1 is a key factor in EGFR-MET crosstalk in conferring TKI resistance in NSCLC cases and provide in vitro support of the HIFs-EGFR-HDAC4-TERT network.

At present, FFPE samples, which represent the greatest stock of archived disease entities, are limited mainly to investigations of a small number of genes using quantitative real-time PCR or to global profiling of micro-RNA, which is much more stable than mRNA [47,48]. The main reason for this restriction is that RNA is often altered and degraded within FFPE samples from the impact of collection and storage [49,50]. In our study, from the top 808 differentially expressed genes from FFPE samples, we identified 88 common, differentially expressed genes between FFPE and the GSE10072 dataset, with 85 genes (97%) having expression changes in the same direction between tumor and non-affected tissues.
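The direction-of-change comparison between two expression datasets can be illustrated with a short Python sketch. It assumes each dataset is summarized as a mapping from gene to a signed tumor-versus-normal change (for example a log fold change); the gene names and values below are hypothetical, and only the 85 of 88 figure comes from the text.

```python
def direction_concordance(changes_a, changes_b):
    """Compare the sign of expression changes for genes present in both datasets.

    Returns (n_shared, n_same_direction, fraction_same_direction).
    """
    shared = changes_a.keys() & changes_b.keys()
    same = [g for g in shared if (changes_a[g] > 0) == (changes_b[g] > 0)]
    fraction = len(same) / len(shared) if shared else 0.0
    return len(shared), len(same), fraction


# Hypothetical signed changes (tumor vs non-affected tissue)
ffpe_changes = {"GENE_A": -1.2, "GENE_B": 0.8, "GENE_C": -0.5, "GENE_D": 1.1}
frozen_changes = {"GENE_A": -0.9, "GENE_B": 1.4, "GENE_C": 0.3, "GENE_E": 2.0}

n_shared, n_same, fraction = direction_concordance(ffpe_changes, frozen_changes)
print(f"{n_same} of {n_shared} shared genes change in the same direction")

# Concordance reported in the text: 85 of 88 common genes
reported_percent = round(100 * 85 / 88)
print(f"reported concordance: {reported_percent}%")
```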
This common gene list could be used as a transcriptomic list to build a network containing HIFs without incorporating a core gene list. These results demonstrate that FFPE samples could generate a transcriptomic profile for integrated analysis, as we found similar networks with shared hub genes as compared to fresh-frozen samples.

During the analyses, we evaluated different network results by exploring different p-value cut-offs for the selection of genes from different omic datasets. We did not find any significant networks, as all networks were under the 95% CI curve in the random gene list evaluation. A limitation is that we did not control for this level of multiple comparisons, and we currently lack a proper analytic method to adjust for them. Therefore, we used external data to evaluate the network results.

Through integrated analysis and subsequent sequencing of the identified network, we identified a new locus associated with lung cancer risk. This locus is in hub gene EPAS1, which is a key member of the HIF family involved in every aspect of cancer development and progression. Because this locus has potential functions related to epigenetic regulation, the observation of low EPAS1 expression in tumor compared to non-affected tissues warrants additional functional studies to further illustrate interrelationships among genetic polymorphisms, DNA methylation status, and EPAS1 expression.

46 in total

1. Fine mapping of chromosome 5p15.33 based on a targeted deep sequencing and high density genotyping identifies novel lung cancer susceptibility loci.

Authors: Linda Kachuri; Christopher I Amos; James D McKay; Mattias Johansson; Paolo Vineis; H Bas Bueno-de-Mesquita; Marie-Christine Boutron-Ruault; Mikael Johansson; J Ramón Quirós; Sabina Sieri; Ruth C Travis; Elisabete Weiderpass; Loic Le Marchand; Brian E Henderson; Lynne Wilkens; Gary E Goodman; Chu Chen; Jennifer A Doherty; David C Christiani; Yongyue Wei; Li Su; Shelley Tworoger; Xuehong Zhang; Peter Kraft; David Zaridze; John K Field; Michael W Marcus; Michael P A Davies; Russell Hyde; Neil E Caporaso; Maria Teresa Landi; Gianluca Severi; Graham G Giles; Geoffrey Liu; John R McLaughlin; Yafang Li; Xiangjun Xiao; Gord Fehringer; Xuchen Zong; Robert E Denroche; Philip C Zuzarte; John D McPherson; Paul Brennan; Rayjean J Hung
Journal: Carcinogenesis Date: 2015-11-20 Impact factor: 4.944

2. Sequence kernel association tests for the combined effect of rare and common variants.

Authors: Iuliana Ionita-Laza; Seunggeun Lee; Vlad Makarov; Joseph D Buxbaum; Xihong Lin
Journal: Am J Hum Genet Date: 2013-05-16 Impact factor: 11.025

3. A susceptibility locus for lung cancer maps to nicotinic acetylcholine receptor subunit genes on 15q25.

Authors: Rayjean J Hung; James D McKay; Valerie Gaborieau; Paolo Boffetta; Mia Hashibe; David Zaridze; Anush Mukeria; Neonilia Szeszenia-Dabrowska; Jolanta Lissowska; Peter Rudnai; Eleonora Fabianova; Dana Mates; Vladimir Bencko; Lenka Foretova; Vladimir Janout; Chu Chen; Gary Goodman; John K Field; Triantafillos Liloglou; George Xinarianos; Adrian Cassidy; John McLaughlin; Geoffrey Liu; Steven Narod; Hans E Krokan; Frank Skorpen; Maiken Bratt Elvestad; Kristian Hveem; Lars Vatten; Jakob Linseisen; Françoise Clavel-Chapelon; Paolo Vineis; H Bas Bueno-de-Mesquita; Eiliv Lund; Carmen Martinez; Sheila Bingham; Torgny Rasmuson; Pierre Hainaut; Elio Riboli; Wolfgang Ahrens; Simone Benhamou; Pagona Lagiou; Dimitrios Trichopoulos; Ivana Holcátová; Franco Merletti; Kristina Kjaerheim; Antonio Agudo; Gary Macfarlane; Renato Talamini; Lorenzo Simonato; Ray Lowry; David I Conway; Ariana Znaor; Claire Healy; Diana Zelenika; Anne Boland; Marc Delepine; Mario Foglio; Doris Lechner; Fumihiko Matsuda; Helene Blanche; Ivo Gut; Simon Heath; Mark Lathrop; Paul Brennan
Journal: Nature Date: 2008-04-03 Impact factor: 49.962

4. A genome-wide association study of lung cancer identifies a region of chromosome 5p15 associated with risk for adenocarcinoma.

Authors: Maria Teresa Landi; Nilanjan Chatterjee; Kai Yu; Lynn R Goldin; Alisa M Goldstein; Melissa Rotunno; Lisa Mirabello; Kevin Jacobs; William Wheeler; Meredith Yeager; Andrew W Bergen; Qizhai Li; Dario Consonni; Angela C Pesatori; Sholom Wacholder; Michael Thun; Ryan Diver; Martin Oken; Jarmo Virtamo; Demetrius Albanes; Zhaoming Wang; Laurie Burdette; Kimberly F Doheny; Elizabeth W Pugh; Cathy Laurie; Paul Brennan; Rayjean Hung; Valerie Gaborieau; James D McKay; Mark Lathrop; John McLaughlin; Ying Wang; Ming-Sound Tsao; Margaret R Spitz; Yufei Wang; Hans Krokan; Lars Vatten; Frank Skorpen; Egil Arnesen; Simone Benhamou; Christine Bouchard; Andres Metspalu; Andres Metsapalu; Tonu Vooder; Mari Nelis; Kristian Välk; John K Field; Chu Chen; Gary Goodman; Patrick Sulem; Gudmar Thorleifsson; Thorunn Rafnar; Timothy Eisen; Wiebke Sauter; Albert Rosenberger; Heike Bickeböller; Angela Risch; Jenny Chang-Claude; H Erich Wichmann; Kari Stefansson; Richard Houlston; Christopher I Amos; Joseph F Fraumeni; Sharon A Savage; Pier Alberto Bertazzi; Margaret A Tucker; Stephen Chanock; Neil E Caporaso
Journal: Am J Hum Genet Date: 2009-10-15 Impact factor: 11.025

5. Constitutional CHEK2 mutations are associated with a decreased risk of lung and laryngeal cancers.

Authors: Cezary Cybulski; Bartlomiej Masojc; Dorota Oszutowska; Ewa Jaworowska; Tomasz Grodzki; Piotr Waloszczyk; Piotr Serwatowski; Juliusz Pankowski; Tomasz Huzarski; Tomasz Byrski; Bohdan Górski; Anna Jakubowska; Tadeusz Debniak; Dominika Wokolorczyk; Jacek Gronwald; Czeslawa Tarnowska; Pablo Serrano-Fernández; Jan Lubinski; Steven A Narod
Journal: Carcinogenesis Date: 2008-02-14 Impact factor: 4.944

6. Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls.

Authors: Nick Craddock; Matthew E Hurles; Niall Cardin; Richard D Pearson; Vincent Plagnol; Samuel Robson; Damjan Vukcevic; Chris Barnes; Donald F Conrad; Eleni Giannoulatou; Chris Holmes; Jonathan L Marchini; Kathy Stirrups; Martin D Tobin; Louise V Wain; Chris Yau; Jan Aerts; Tariq Ahmad; T Daniel Andrews; Hazel Arbury; Anthony Attwood; Adam Auton; Stephen G Ball; Anthony J Balmforth; Jeffrey C Barrett; Inês Barroso; Anne Barton; Amanda J Bennett; Sanjeev Bhaskar; Katarzyna Blaszczyk; John Bowes; Oliver J Brand; Peter S Braund; Francesca Bredin; Gerome Breen; Morris J Brown; Ian N Bruce; Jaswinder Bull; Oliver S Burren; John Burton; Jake Byrnes; Sian Caesar; Chris M Clee; Alison J Coffey; John M C Connell; Jason D Cooper; Anna F Dominiczak; Kate Downes; Hazel E Drummond; Darshna Dudakia; Andrew Dunham; Bernadette Ebbs; Diana Eccles; Sarah Edkins; Cathryn Edwards; Anna Elliot; Paul Emery; David M Evans; Gareth Evans; Steve Eyre; Anne Farmer; I Nicol Ferrier; Lars Feuk; Tomas Fitzgerald; Edward Flynn; Alistair Forbes; Liz Forty; Jayne A Franklyn; Rachel M Freathy; Polly Gibbs; Paul Gilbert; Omer Gokumen; Katherine Gordon-Smith; Emma Gray; Elaine Green; Chris J Groves; Detelina Grozeva; Rhian Gwilliam; Anita Hall; Naomi Hammond; Matt Hardy; Pile Harrison; Neelam Hassanali; Husam Hebaishi; Sarah Hines; Anne Hinks; Graham A Hitman; Lynne Hocking; Eleanor Howard; Philip Howard; Joanna M M Howson; Debbie Hughes; Sarah Hunt; John D Isaacs; Mahim Jain; Derek P Jewell; Toby Johnson; Jennifer D Jolley; Ian R Jones; Lisa A Jones; George Kirov; Cordelia F Langford; Hana Lango-Allen; G Mark Lathrop; James Lee; Kate L Lee; Charlie Lees; Kevin Lewis; Cecilia M Lindgren; Meeta Maisuria-Armer; Julian Maller; John Mansfield; Paul Martin; Dunecan C O Massey; Wendy L McArdle; Peter McGuffin; Kirsten E McLay; Alex Mentzer; Michael L Mimmack; Ann E Morgan; Andrew P Morris; Craig Mowat; Simon Myers; William Newman; Elaine R Nimmo; Michael C O'Donovan; Abiodun Onipinla; Ifejinelo Onyiah; 
Nigel R Ovington; Michael J Owen; Kimmo Palin; Kirstie Parnell; David Pernet; John R B Perry; Anne Phillips; Dalila Pinto; Natalie J Prescott; Inga Prokopenko; Michael A Quail; Suzanne Rafelt; Nigel W Rayner; Richard Redon; David M Reid; Susan M Ring; Neil Robertson; Ellie Russell; David St Clair; Jennifer G Sambrook; Jeremy D Sanderson; Helen Schuilenburg; Carol E Scott; Richard Scott; Sheila Seal; Sue Shaw-Hawkins; Beverley M Shields; Matthew J Simmonds; Debbie J Smyth; Elilan Somaskantharajah; Katarina Spanova; Sophia Steer; Jonathan Stephens; Helen E Stevens; Millicent A Stone; Zhan Su; Deborah P M Symmons; John R Thompson; Wendy Thomson; Mary E Travers; Clare Turnbull; Armand Valsesia; Mark Walker; Neil M Walker; Chris Wallace; Margaret Warren-Perry; Nicholas A Watkins; John Webster; Michael N Weedon; Anthony G Wilson; Matthew Woodburn; B Paul Wordsworth; Allan H Young; Eleftheria Zeggini; Nigel P Carter; Timothy M Frayling; Charles Lee; Gil McVean; Patricia B Munroe; Aarno Palotie; Stephen J Sawcer; Stephen W Scherer; David P Strachan; Chris Tyler-Smith; Matthew A Brown; Paul R Burton; Mark J Caulfield; Alastair Compston; Martin Farrall; Stephen C L Gough; Alistair S Hall; Andrew T Hattersley; Adrian V S Hill; Christopher G Mathew; Marcus Pembrey; Jack Satsangi; Michael R Stratton; Jane Worthington; Panos Deloukas; Audrey Duncanson; Dominic P Kwiatkowski; Mark I McCarthy; Willem Ouwehand; Miles Parkes; Nazneen Rahman; John A Todd; Nilesh J Samani; Peter Donnelly
Journal: Nature Date: 2010-04-01 Impact factor: 49.962

7. Genome-wide association study of renal cell carcinoma identifies two susceptibility loci on 2p21 and 11q13.3.

Authors: Mark P Purdue; Mattias Johansson; Diana Zelenika; Jorge R Toro; Ghislaine Scelo; Lee E Moore; Egor Prokhortchouk; Xifeng Wu; Lambertus A Kiemeney; Valerie Gaborieau; Kevin B Jacobs; Wong-Ho Chow; David Zaridze; Vsevolod Matveev; Jan Lubinski; Joanna Trubicka; Neonila Szeszenia-Dabrowska; Jolanta Lissowska; Péter Rudnai; Eleonora Fabianova; Alexandru Bucur; Vladimir Bencko; Lenka Foretova; Vladimir Janout; Paolo Boffetta; Joanne S Colt; Faith G Davis; Kendra L Schwartz; Rosamonde E Banks; Peter J Selby; Patricia Harnden; Christine D Berg; Ann W Hsing; Robert L Grubb; Heiner Boeing; Paolo Vineis; Françoise Clavel-Chapelon; Domenico Palli; Rosario Tumino; Vittorio Krogh; Salvatore Panico; Eric J Duell; José Ramón Quirós; Maria-José Sanchez; Carmen Navarro; Eva Ardanaz; Miren Dorronsoro; Kay-Tee Khaw; Naomi E Allen; H Bas Bueno-de-Mesquita; Petra H M Peeters; Dimitrios Trichopoulos; Jakob Linseisen; Börje Ljungberg; Kim Overvad; Anne Tjønneland; Isabelle Romieu; Elio Riboli; Anush Mukeria; Oxana Shangina; Victoria L Stevens; Michael J Thun; W Ryan Diver; Susan M Gapstur; Paul D Pharoah; Douglas F Easton; Demetrius Albanes; Stephanie J Weinstein; Jarmo Virtamo; Lars Vatten; Kristian Hveem; Inger Njølstad; Grethe S Tell; Camilla Stoltenberg; Rajiv Kumar; Kvetoslava Koppova; Olivier Cussenot; Simone Benhamou; Egbert Oosterwijk; Sita H Vermeulen; Katja K H Aben; Saskia L van der Marel; Yuanqing Ye; Christopher G Wood; Xia Pu; Alexander M Mazur; Eugenia S Boulygina; Nikolai N Chekanov; Mario Foglio; Doris Lechner; Ivo Gut; Simon Heath; Hélène Blanche; Amy Hutchinson; Gilles Thomas; Zhaoming Wang; Meredith Yeager; Joseph F Fraumeni; Konstantin G Skryabin; James D McKay; Nathaniel Rothman; Stephen J Chanock; Mark Lathrop; Paul Brennan
Journal: Nat Genet Date: 2010-12-05 Impact factor: 38.330

8. Mapping and analysis of chromatin state dynamics in nine human cell types.

Authors: Jason Ernst; Pouya Kheradpour; Tarjei S Mikkelsen; Noam Shoresh; Lucas D Ward; Charles B Epstein; Xiaolan Zhang; Li Wang; Robbyn Issner; Michael Coyne; Manching Ku; Timothy Durham; Manolis Kellis; Bradley E Bernstein
Journal: Nature Date: 2011-03-23 Impact factor: 49.962

Review 9. Systematic review of the relationship between family history and lung cancer risk.

Authors: A Matakidou; T Eisen; R S Houlston
Journal: Br J Cancer Date: 2005-10-03 Impact factor: 7.640

10. Integrative analysis of DNA methylation and gene expression data identifies EPAS1 as a key regulator of COPD.

Authors: Seungyeul Yoo; Sachiko Takikawa; Patrick Geraghty; Carmen Argmann; Joshua Campbell; Luan Lin; Tao Huang; Zhidong Tu; Robert F Foronjy; Robert Feronjy; Avrum Spira; Eric E Schadt; Charles A Powell; Jun Zhu
Journal: PLoS Genet Date: 2015-01-08 Impact factor: 5.917
