Conserved transcription factor binding sites of cancer markers derived from primary lung adenocarcinoma microarrays.

Yee Leng Yap, David C L Lam, Girard Luc, Xue Wu Zhang, David Hernandez, Robin Gras, Elaine Wang, S W Chiu, Lap Ping Chung, W K Lam, David K Smith, John D Minna, Antoine Danchin, Maria P Wong.

Abstract

Gene transcription in a set of 49 human primary lung adenocarcinomas and 9 normal lung tissue samples was examined using Affymetrix GeneChip technology. A total of 3442 genes, called the set MAD, were found to be either up- or down-regulated by at least 2-fold between the two phenotypes. Genes assigned to a particular gene ontology term were found, in many cases, to be significantly unevenly distributed between the genes in and outside MAD. Terms that were overrepresented in MAD included functions directly implicated in the cancer cell metabolism. Based on their functional roles and expression profiles, genes in MAD were grouped into likely co-regulated gene sets. Highly conserved sequences in the 5 kb region upstream of the genes in these sets were identified with the motif discovery tool, MoDEL. Potential oncogenic transcription factors and their corresponding binding sites were identified in these conserved regions using the TRANSFAC 8.3 database. Several of the transcription factors identified in this study have been shown elsewhere to be involved in oncogenic processes. This study searched beyond phenotypic gene expression profiles in cancer cells, in order to identify the more important regulatory transcription factors that caused these aberrations in gene expression.

Year:  2005        PMID: 15653641      PMCID: PMC546166          DOI: 10.1093/nar/gki188

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The transformation of normal lung tissue into lung adenocarcinomas involves, among other characteristic features, a hallmark process by which the cell loses control of its replication process (an accelerated cell cycle) (1). Adenocarcinomas have a high incidence of fatality in patients in the US, and a similar trend is developing in other countries (2). At present, lung cancer studies generally incorporate two main objectives: providing an early and sensitive diagnosis, and trying to understand the molecular basis underlying the disease formation. Recently, the availability of the human genome sequence (3) and gene expression profiling techniques (4) have provided new insights, narrowing the gap to achieve these objectives. The challenges that lie ahead include systematically identifying the functions of all cancer associated genes, and continuing the efforts to decipher their regulatory networks. This information will provide a much deeper understanding of the mechanism of cancer cell formation and development, and assist in the identification of potent therapeutic targets for disease control and eradication. Computational methods that are employed to identify cancer associated genes from megabytes of noisy microarray data still require further development. Data normalization procedures may have an important effect on the succeeding downstream data analysis (5–8). Using human housekeeping genes as the least variable set of gene expression profiles is one accepted method (9). Many computational methods have been introduced to determine marker genes for cancer from gene expression datasets (10,11). These methodologies aim to stratify samples into tissue classes or phenotypes based on the ability of sets of differentially regulated genes to discriminate among the samples.
Methods such as recursive partitioning (12), expression ratio analysis (13), principal component analysis (14), partial least squares (15), and independent component analysis (16) have been used to identify the minimum set of genes that can achieve this classification. However, the usually small number (tens) of tissue samples per class and the large number (tens of thousands) of features (genes) in these datasets cast doubt on the statistical significance of genes identified as discriminating between normal and cancer tissues or between cancer subtypes. The effects of these constraints on the detection of cancer marker genes, which can lead to genes being classified as markers by chance, have been investigated (17). Recently, the use of computational methods to identify regulatory elements has become increasingly important (18). This is partly because the alternative of experimental determination of cis-regulatory elements can be inaccurate, and is often slow and laborious (19). A common way to analyze regulatory relationships among genes using microarray data is to cluster the genes, based on their expression profiles, into sets of putatively co-regulated genes. This assumes that co-regulated genes are likely to have cis-regulatory elements in common (20). However, searching for common sequence signals in genomic regions near these genes can lead to the detection of spurious cis-regulatory elements, as many genes may show similar expression profiles for reasons other than co-regulation (20). Many studies have shown that biologically relevant cis-regulatory elements often occur in groups (21,22). Following this rationale, conserved regulatory motifs correlated to gene expression were discovered by fitting a linear regression model to the expression arrays from Saccharomyces cerevisiae (23) and an extension of this technique was used to identify binding motifs of the transcription factors ROX1p and YAP1p (24).
In this work, we performed a microarray based study of a set of normal lung tissues and a set of primary lung adenocarcinomas. Our aims were, first, to distinguish the broadest set of genes (MAD) that showed differential expression levels across the two tissue types and investigate the correlation of their gene expression profiles with the tissue type. Second, we wished to examine the division of genes with the same functional annotation between the MAD set and the remaining genes on the microarray to find functional groups disproportionately represented in MAD. Finally, we attempted to identify the transcription factors, as well as their corresponding binding sites, which regulate the observed expression differences of the genes in the MAD set. The rationale for the first two aims was that we could make use of the knowledge accumulated by scientists on genes in the MAD set, by using functional annotations assigned through Gene Ontology terms, to investigate the nature of the biological processes that were actually perturbed in cancer cells. It was expected that some functional classes would preferentially be found in the MAD gene set. Instead of clustering genes based solely on their expression profiles, genes were first selected by sharing a gene ontology term and then clustered by expression profile. The reasoning behind this was that genes with the same function and similar expression profiles were more likely to be under the same regulatory control than genes with differing functions but similar expression profiles. ‘In biblio’ analysis of genes' neighborhoods has long been advocated as an efficient means to permit inductive reasoning by using the knowledge accumulated by the worldwide community of researchers (25).
A motif finding algorithm developed by us, MoDEL (26), was used to discover highly conserved DNA regions associated with the genes in a cluster, before these sequences were scanned against the TRANSFAC 8.3 database to detect plausible oncogenic transcription factor binding sites.

MATERIALS AND METHODS

Primary lung adenocarcinoma dataset

Tissue samples for the complete cohort of this study were collected, with informed consent, by the Department of Pathology, The University of Hong Kong, Queen Mary Hospital, Pokfulam, Hong Kong. A total of 58 patients gave samples with normal lung tissue (n = 9) and primary lung adenocarcinomas (n = 49). Identifier code numbers were assigned to each tissue sample and its correlated clinical data. The link between the code numbers and all patient identifiers was destroyed, rendering the samples and clinical data completely anonymous. Clinical data from hospital records included the age and sex of the patient, smoking history, type of resection, post-operative pathological staging, post-operative histopathological diagnosis, patient survival information, time of last follow-up interval or time of death (when known), and site of disease recurrence (when known). Information for the entire dataset is provided as Supplementary Material at . It is noted that the numbers do not always add to 58, as complete information could not be found for all samples. The gender composition of the cohort was 25 males and 33 females. The reported smoking history of the patients was 24 non-smokers, 10 smoking at least 40 packs per year, seven ex-smokers and nine passive smokers. Post-operative pathological staging of these samples revealed 26 stage I, 8 stage II, 14 stage III and 1 stage IV tumors. Tissue samples were snap-frozen in liquid nitrogen within 30 min after dissection and kept at −70°C until use. Tumor samples were examined before use to ensure at least 70% of tumor by area. RNA was extracted following standard protocols and hybridized to Affymetrix HG-U133A GeneChips. Expression values from a total of 22 283 transcript probe sets were collected using Affymetrix scanners and analysis software (Microarray Suite 5.0.1). The raw dataset is publicly available at ArrayExpress (public repository for microarray data accession number: E-MEXP-231) (27,28); or can be downloaded at .

Data re-scaling and feature selection

The raw expression data from each sample were rescaled (normalized) to account for systematic differences in signal intensities among the microarrays, using standard procedures in Affymetrix Microarray Suite 5.0.1. Expression values from each microarray were multiplied by a scaling factor to make the average intensity of a set of housekeeping genes on each microarray equal to an arbitrarily defined target intensity of 500. To identify genes that are related to tissue phenotype, the mean expression levels of each gene in normal tissues and in adenocarcinoma tissues were calculated. If the ratio of the average expression levels of a gene between the two tissue classes exceeded 2-fold in either direction, the gene was included in the set MAD.
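The rescaling and fold-change filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Microarray Suite procedure itself: the function names, the dictionary layout (one `{probe: value}` mapping per array) and the toy values are hypothetical, and the threshold is treated as "at least 2-fold", as worded in the abstract.

```python
from statistics import mean

TARGET_INTENSITY = 500.0  # target intensity stated in the text


def rescale(sample, housekeeping_ids, target=TARGET_INTENSITY):
    """Multiply one array by a single factor so that the mean of its
    housekeeping probe sets equals the target intensity."""
    factor = target / mean(sample[g] for g in housekeeping_ids)
    return {gene: value * factor for gene, value in sample.items()}


def select_mad(normal_samples, tumor_samples, fold=2.0):
    """Return {gene: ratio} for genes whose class means differ by at least
    `fold` in either direction (ratio = mean tumor / mean normal).
    Assumes strictly positive expression values."""
    selected = {}
    for gene in normal_samples[0]:
        mean_n = mean(s[gene] for s in normal_samples)
        mean_t = mean(s[gene] for s in tumor_samples)
        ratio = mean_t / mean_n
        if ratio >= fold or ratio <= 1.0 / fold:
            selected[gene] = ratio
    return selected


# Hypothetical toy data: "HK" stands in for the housekeeping set.
normal = [rescale(s, ["HK"]) for s in (
    {"HK": 250.0, "A": 100.0, "B": 50.0},
    {"HK": 1000.0, "A": 400.0, "B": 200.0},
)]
tumor = [rescale({"HK": 500.0, "A": 50.0, "B": 120.0}, ["HK"])]
mad = select_mad(normal, tumor)  # only "A" passes the 2-fold filter
```

Because the filter uses class means only, a gene with a very small denominator can pass with a large ratio, which is why fold changes are best read alongside the absolute expression levels.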

Gene to tissue correlation

The tissue type distinction is represented by an idealized expression pattern (a vector of size 1 × 58), in which the expression is labeled uniformly high (value = 1) in the adenocarcinoma tissue class and uniformly low (value = 0) in the normal tissue class. Correlation coefficients were calculated between this vector and the expression profile of each gene in MAD. The distribution of correlation coefficients was counted in bins of 0.2. The result was compared to the corresponding distribution obtained for ten random permutations of the idealized tissue labels, which gives the average distribution of random correlation coefficients (Figure 1).
Figure 1

Histogram of the correlation between the cancer associated genes (MAD) and the tissue labels (normal or lung adenocarcinoma). The average histogram generated from 10 separate random permutations of the tissue labels in the original lung adenocarcinoma dataset is also displayed.
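The gene-to-tissue correlation and its permutation baseline can be sketched as follows. This is a minimal illustration, assuming a Pearson correlation coefficient (the text does not name the coefficient) and hypothetical function names; the seed and the ordering of samples in the label vector are arbitrary choices made for the example.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.
    Undefined (division by zero) if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def bin_counts(values, width=0.2):
    """Count correlation coefficients in bins of `width` spanning [-1, 1]."""
    n_bins = round(2 / width)
    counts = [0] * n_bins
    for r in values:
        idx = min(int((r + 1.0) / width), n_bins - 1)  # r = 1 goes in the last bin
        counts[idx] += 1
    return counts


def permuted_histogram(profiles, labels, n_perm=10, seed=0):
    """Average histogram over `n_perm` random shufflings of the tissue labels."""
    rng = random.Random(seed)
    totals = None
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        counts = bin_counts([pearson(p, shuffled) for p in profiles])
        totals = counts if totals is None else [t + c for t, c in zip(totals, counts)]
    return [t / n_perm for t in totals]


# Idealized pattern from the text: 9 normal samples (0) and 49 adenocarcinomas (1).
labels = [0] * 9 + [1] * 49
```

Calling `bin_counts([pearson(p, labels) for p in profiles])` gives the observed histogram, and `permuted_histogram(profiles, labels)` gives the averaged random baseline to plot against it.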

Determination of overrepresentation of gene ontology terms in the set MAD

Gene Ontology terms, which classify a gene according to its molecular function, biological process, cellular component and chromosomal localization, were collected for each gene on the Affymetrix HG-U133A microarray from the Affymetrix library files. Using the hypergeometric distribution (Equation 1), the genes with each of these functional annotations could be assessed for overrepresentation in the set MAD. Given G annotated genes on a microarray, of which A have a certain function (gene ontology term), and a set of k genes selected independently of the functional annotations (MAD), the probability that n or more of the set of k genes have this function can be calculated by Equation 1 (23):

$$P = 1 - \sum_{i=0}^{n-1} \frac{\binom{A}{i}\binom{G-A}{k-i}}{\binom{G}{k}} \qquad (1)$$

If the P-value of observing the number of genes with a particular gene ontology term in the set MAD was <0.001, the term was considered to be significantly overrepresented in the set MAD. DNA-Chip Analyzer (dChip) (29) was used to perform this task.
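The hypergeometric tail probability defined in the text (G annotated genes, A with the term, k selected genes, n or more of them carrying the term) can be computed exactly with integer binomial coefficients. The sketch below is an illustration only; the function name is hypothetical and it is not the dChip implementation.

```python
from math import comb


def hypergeom_tail(G, A, k, n):
    """P(X >= n) when k genes are drawn without replacement from G genes,
    of which A carry the annotation. Equivalent to 1 - P(X <= n - 1)."""
    upper = min(A, k)
    favorable = sum(comb(A, i) * comb(G - A, k - i) for i in range(n, upper + 1))
    return favorable / comb(G, k)


def is_overrepresented(G, A, k, n, alpha=0.001):
    """Apply the P < 0.001 cutoff used in the text."""
    return hypergeom_tail(G, A, k, n) < alpha
```

Summing the upper tail directly, rather than subtracting a lower tail from 1, avoids loss of precision when the P-value is very small, which is the regime of interest here.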

Constructing gene relationship trees for overrepresented gene ontology terms

For all possible pairs of genes that belong to each gene ontology term overrepresented in MAD, the correlation coefficient, r, of their expression profiles was calculated. A pairwise gene distance matrix, Mdistance, was formed for the genes using the distance 1 − r. The neighbor-joining (NJ) algorithm (30) was used to construct a gene relationship tree from the pairwise gene distance matrix. This was performed to identify gene neighbors whose expression values followed a common trend. The NJ algorithm is a special case of the star decomposition method. Starting from a star tree, the final relationship tree is constructed systematically by linking the least distant pair of nodes (genes in this case). The main advantage of the algorithm is that it permits lineages with largely different branch lengths. The script for computing r was implemented in the MatLab technical programming language and the tree was calculated using MEGA2 (31).
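The distance matrix that feeds the tree construction can be sketched as below. This is a Python illustration rather than the authors' MatLab script, it assumes a Pearson correlation for r, and the function names and toy profiles are hypothetical. The neighbor-joining step itself is left to a dedicated tool such as MEGA2.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def distance_matrix(profiles):
    """Return a symmetric nested dict with dist[a][b] = 1 - r(a, b).
    Distances range from 0 (perfectly correlated) to 2 (anti-correlated)."""
    genes = sorted(profiles)
    dist = {g: {g: 0.0} for g in genes}
    for a, b in combinations(genes, 2):
        d = 1.0 - pearson(profiles[a], profiles[b])
        dist[a][b] = dist[b][a] = d
    return dist


# Hypothetical profiles: g2 tracks g1, g3 runs opposite to both.
toy = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1]}
m_distance = distance_matrix(toy)
```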

Extraction of the upstream regions for putatively co-regulated gene sets

Putatively co-regulated genes from each gene ontology term that was overrepresented in MAD were selected in accordance with two criteria: (i) a distance metric cutoff value (d < 0.20) for all pairwise gene distances within the selected N members of the gene set; and (ii) the minimum mean aggregated pairwise distance [the sum of all pairwise distances divided by the number of pairs, NC2] for the selected N members of the gene set. The rationale for choosing these criteria was to find the single most correlated gene cluster, i.e. the one that minimizes the total branch length d. For instance, suppose two gene clusters in the tree topology (with four and five member genes, respectively) both satisfy criterion one, i.e. they are gene sets in which all pairwise gene distances (4C2 = 6 and 5C2 = 10 distances, respectively) fall below the cutoff value of 0.2. The final gene set selected should then be the one with the minimum mean aggregated pairwise distance (criterion two). As a result, a different number of genes will be selected from each gene ontology term based on these criteria. For each of the selected genes, the corresponding 5 kb region located directly upstream of the transcription start site was extracted as described previously (32). Several sequence features, including sequence gaps, continuity, and consistency between the two distinct drafts of the human genome (3,33,34), were taken into consideration. Detailed information can be found in (32).
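The two selection criteria can be sketched as a filter followed by a minimum. This is an illustration under assumptions: the candidate clusters are taken as given (for example, subtrees read off the relationship tree), the distance matrix is a symmetric nested dictionary, and all names and toy values are hypothetical.

```python
from itertools import combinations


def mean_pairwise_distance(genes, dist):
    """Sum of all pairwise distances divided by the number of pairs, C(N, 2)."""
    pairs = list(combinations(genes, 2))
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def pick_coregulated_set(candidates, dist, cutoff=0.20):
    """Criterion (i): keep clusters whose pairwise distances are all below the cutoff.
    Criterion (ii): among those, return the one with the smallest mean pairwise
    distance. Returns None when no cluster passes criterion (i)."""
    eligible = [
        c for c in candidates
        if len(c) >= 2 and all(dist[a][b] < cutoff for a, b in combinations(c, 2))
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda c: mean_pairwise_distance(c, dist))


# Hypothetical distances (1 - r) between five genes of one ontology term.
dist = {
    "a": {"b": 0.05, "c": 0.10, "d": 0.60},
    "b": {"a": 0.05, "c": 0.15},
    "c": {"a": 0.10, "b": 0.15, "d": 0.50},
    "d": {"a": 0.60, "c": 0.50, "e": 0.12},
    "e": {"d": 0.12},
}
candidates = [["a", "b", "c"], ["d", "e"], ["c", "d"]]
best = pick_coregulated_set(candidates, dist)
```

In this toy case both `["a", "b", "c"]` and `["d", "e"]` pass the cutoff, and the first is preferred because its mean pairwise distance (0.10) is lower than 0.12.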

Identification of conserved regions and detection of associated transcription factors

All 5 kb unaligned DNA sequences associated with each gene ontology term group overrepresented in MAD were searched using MoDEL (26) to reveal possible highly conserved DNA regions. MoDEL employs an evolutionary algorithm and hill-climbing optimization for global and local exploration, respectively, of two targeted search spaces (all possible words and all possible ungapped local multiple alignments). This heuristic algorithm has been shown to have more efficient optimization capabilities than other motif discovery tools (26). The word size was set to 50 bp in the present study because we found that the conserved regions identified by MoDEL remained rather consistent across different word sizes or segment lengths. A 50 bp segment length (the longest implemented in MoDEL) also allows a larger window, whereby the most conserved motifs can be captured together with their less similar surrounding residues. The information content for all conserved regions identified was calculated based on the Kullback–Leibler divergence (relative entropy). All conserved regions identified by MoDEL were scanned against all vertebrate transcription factor position weight matrix profiles contained in the TRANSFAC database version 8.3 (35) to identify all previously known transcription factor binding sites. To retain only the stronger matches of transcription factor binding sites, stringent settings for the Match program (36) were employed. Both the core matrix similarity and the overall matrix similarity were required to be at least 0.9 for a hit to be considered a match.
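The information content based on relative entropy can be illustrated with a short sketch. The exact formula used by MoDEL is not given in the text, so the version below is an assumption: it sums the Kullback–Leibler divergence of each alignment column against a uniform background, uses base-2 logarithms (bits), and applies no pseudocounts. The function name is hypothetical.

```python
from collections import Counter
from math import log2


def information_content(sites, background=None):
    """Sum over alignment columns of sum_b p_b * log2(p_b / q_b), in bits.
    `sites` is a list of equal-length, ungapped aligned DNA segments;
    `background` maps each base to its background frequency q_b."""
    if background is None:
        background = {base: 0.25 for base in "ACGT"}  # assumed uniform background
    total = 0.0
    for col in range(len(sites[0])):
        counts = Counter(site[col] for site in sites)
        for base, count in counts.items():
            p = count / len(sites)
            total += p * log2(p / background[base])
    return total
```

Under these assumptions a perfectly conserved column contributes 2 bits and a column with all four bases equally frequent contributes 0, so a fully conserved 50 bp segment would score 100 bits.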

RESULTS

Selection of the cancer associated gene set MAD

A total of 3442 genes were found to be either up- or down-regulated by more than 2-fold between the normal and adenocarcinoma tissue sets (Table 1). These genes formed the cancer associated gene set MAD. Of these genes, 1294 showed down-regulation and 2148 showed up-regulation of gene expression levels in adenocarcinomas. At the extreme ends of the fold change range, the receptor for advanced glycation end product (RAGE) was found to be repressed by >32-fold in adenocarcinomas while the D G antigen (GAGED2) was found to be up-regulated by >128-fold. Real-time quantitative RT–PCR analysis (Supplementary Materials) was performed in 14 independent tissue samples (seven samples from each tissue phenotype) to verify the mRNA transcript levels for carbonic anhydrase IV (CA4) and RAGE. The abundance of mRNA transcripts for both genes was extremely low in the adenocarcinoma samples. If a gene is not expressed or is expressed at very low levels in a sample, then fold change values may become large due to the low denominator. Fold change values must therefore be considered in conjunction with expression levels.
Table 1

Genes that were identified to be down- or up-regulated in adenocarcinomas

Gene description (genes down-regulated in lung AD) | Probe set | Fold log(AD/N) | Mean expression for normal lung | Mean expression for AD lung
Consensus sequence for Homo sapiens mRNA for receptor for Advanced Glycation End Product (RAGE) | 217046_s_at | −5.523 | 942.82 | 20.51
Homo sapiens fatty acid binding protein 4, adipocyte (FABP4) | 203980_at | −4.768 | 3365.42 | 123.48
Human alpha-globin gene with flanks | 217414_x_at | −4.419 | 9787.41 | 457.42
Homo sapiens mRNA; cDNA DKFZp564N0582 (from clone DKFZp564N0582) | 209074_s_at | −4.294 | 678.28 | 34.58
Homo sapiens carbonic anhydrase IV (CA4) | 206208_at | −4.276 | 275.78 | 14.24
Homo sapiens RAGE mRNA for advanced glycation endproducts receptor, whole CDS | 210081_at | −4.261 | 1593.08 | 83.06
Homo sapiens ficolin (collagen fibrinogen domain-containing) 3 (Hakata antigen) (FCN3) | 205866_at | −4.166 | 1790.33 | 99.70
Human sickle cell beta-globin mRNA | 209116_x_at | −4.155 | 14733.26 | 827.29
Consensus includes gb:BF939489 | 209469_at | −4.028 | 330.68 | 20.27
Homo sapiens hemoglobin, gamma A (HBG1) | 204848_x_at | −3.922 | 264.79 | 17.47
Homo sapiens adipose specific 2 (APM2) | 203571_s_at | −3.898 | 3042.43 | 204.08
Homo sapiens hypothetical protein FLJ10970 (FLJ10970) | 219230_at | −3.884 | 1521.30 | 103.06
Consensus includes gb:T50399/UG=Hs.251577 hemoglobin, alpha 1 | 214414_x_at | −3.874 | 13447.88 | 917.38
Homo sapiens colony stimulating factor 3 (granulocyte) (CSF3) | 207442_at | −3.873 | 145.31 | 9.92
Homo sapiens mutant beta-globin (HBB) gene | 217232_x_at | −3.864 | 15087.91 | 1036.22

Gene description (genes up-regulated in lung AD) | Probe set | Fold log(AD/N) | Mean expression for normal lung | Mean expression for AD lung
Homo sapiens XAGE-1 protein (XAGE-1) | 220057_at | 7.311 | 4.79 | 760.58
Human alpha-1 type XI collagen (COL11A1) | 37892_at | 6.208 | 6.10 | 451.07
Consensus includes gb:AI697108;/UG=Hs.102482 mucin 5, subtype B, tracheobronchial | 213432_at | 6.192 | 4.84 | 354.11
Homo sapiens dipeptidyl peptidase IV (DPP4) | 203716_s_at | 5.932 | 6.81 | 415.78
Consensus includes gb:AU159942;/UG=Hs.156346 topoisomerase (DNA) II alpha (170 kDa) | 201291_s_at | 5.620 | 3.24 | 159.61
Homo sapiens serine protease inhibitor, Kazal type 1 (SPINK1);/UG=Hs.181286 serine protease inhibitor, Kazal type 1 | 206239_s_at | 5.274 | 51.74 | 2002.30
Consensus includes gb:X98568;/UG=Hs.179729 collagen, type X, alpha 1 (Schmid metaphyseal chondrodysplasia) | 217428_s_at | 4.991 | 21.98 | 698.84
Consensus includes gb:AW192795;/UG=Hs.103707 apomucin | 214303_x_at | 4.969 | 5.26 | 164.65
Human nephropontin mRNA;/UG=Hs.313 secreted phosphoprotein 1 (osteopontin, bone sialoprotein I, early T-lymphocyte activation 1) | 209875_s_at | 4.851 | 121.29 | 3499.34
Homo sapiens matrix metalloproteinase 1 (interstitial collagenase) (MMP1) | 204475_at | 4.806 | 25.07 | 701.09
Homo sapiens neuromedin U (NMU) | 206023_at | 4.776 | 5.80 | 158.88
Homo sapiens cytokine receptor-like factor 1 (CRLF1) | 206315_at | 4.737 | 14.22 | 379.16
Homo sapiens, serine (or cysteine) proteinase inhibitor, clade B (ovalbumin), member 3 | 209720_s_at | 4.597 | 2.13 | 51.63
Homo sapiens multidrug resistance-associated protein homolog MRP3 (MRP3);/UG=Hs.90786 ATP-binding cassette, sub-family C (CFTRMRP), member 3 | 209641_s_at | 4.570 | 14.73 | 350.01
Consensus includes gb:BE791251;/UG=Hs.25640 claudin 3 | 203953_s_at | 4.462 | 6.82 | 150.39

The description of each gene, its probe set in HG-U133A GeneChip and log fold change are given in the table. The complete table can be downloaded at .
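The log fold change column in Table 1 can be reproduced from the two mean expression columns if the logarithm is taken to base 2. The base is not stated in the table header, so this is an inference from the tabulated values, and the function name is hypothetical. A small check on the first row of each half of the table:

```python
from math import log2


def log2_fold(mean_ad, mean_normal):
    """Base-2 logarithm of the ratio of mean AD expression to mean normal expression."""
    return log2(mean_ad / mean_normal)


# Mean expression values taken from Table 1 (normal lung, AD lung).
rage = log2_fold(20.51, 942.82)    # tabulated log fold change: -5.523
xage1 = log2_fold(760.58, 4.79)    # tabulated log fold change: 7.311
```

Small discrepancies in the third decimal place are expected, because the tabulated means are themselves rounded to two decimals.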

Functional annotation groups significantly overrepresented in MAD

Down- and up-regulated genes in MAD were treated separately to detect functional annotation groups that may be overrepresented in adenocarcinoma associated genes. Tables 2 and 3, respectively, give the gene ontology terms significantly overrepresented (P < 0.001) in down- and up-regulated genes of MAD. The tables give the number of genes with that gene ontology term on the HG-U133A microarray, the number found, and the P-value of finding at least that number of genes (by random chance) in MAD.
Table 2

The gene ontology terms overrepresented in the set of genes down-regulated by at least 2-fold in adenocarcinomas

Annotation term | Total | Found | Expected | P-value
Gene Ontology terms
    Globin | 17 | 12 | 0.0E+00 | 0.0E+00
    Rhodopsin-like receptor activity | 384 | 49 | 3.8E−04 | 1.0E−06
    G-protein chemoattractant receptor activity | 34 | 8 | 2.8E−02 | 8.4E−04
    Peptide receptor activity | 139 | 23 | 1.7E−03 | 1.2E−05
    G-protein-coupled receptor binding | 52 | 21 | 0.0E+00 | 0.0E+00
    Defense/immunity protein activity | 230 | 39 | 0.0E+00 | 0.0E+00
    Antimicrobial peptide activity | 32 | 8 | 1.7E−02 | 5.4E−04
    Complement activity | 32 | 8 | 1.7E−02 | 5.4E−04
    Signal transducer activity | 2558 | 253 | 0.0E+00 | 0.0E+00
    Receptor activity | 1542 | 162 | 0.0E+00 | 0.0E+00
    Transmembrane receptor activity | 1083 | 121 | 0.0E+00 | 0.0E+00
    G-protein coupled receptor activity | 467 | 61 | 0.0E+00 | 0.0E+00
    Chemokine receptor activity | 34 | 8 | 2.8E−02 | 8.4E−04
    Receptor binding | 592 | 72 | 0.0E+00 | 0.0E+00
    Cytokine activity | 253 | 39 | 0.0E+00 | 0.0E+00
    Heavy metal binding | 23 | 8 | 9.4E−04 | 4.1E−05
    Sugar binding | 132 | 28 | 0.0E+00 | 0.0E+00
    Extracellular | 1085 | 138 | 0.0E+00 | 0.0E+00
    Extracellular space | 457 | 72 | 0.0E+00 | 0.0E+00
    Hemoglobin complex | 18 | 12 | 0.0E+00 | 0.0E+00
    Plasma membrane | 2297 | 219 | 0.0E+00 | 0.0E+00
    Integral to plasma membrane | 1702 | 176 | 0.0E+00 | 0.0E+00
    Oxygen and reactive oxygen species metabolism | 65 | 15 | 4.6E−04 | 7.0E−06
    Calcium ion homeostasis | 26 | 8 | 2.9E−03 | 1.1E−04
    Cell motility | 414 | 50 | 1.2E−03 | 3.0E−06
    Chemotaxis | 133 | 39 | 0.0E+00 | 0.0E+00
    Muscle contraction | 202 | 25 | 1.3E−01 | 6.2E−04
    Response to stress | 1025 | 143 | 0.0E+00 | 0.0E+00
    Defense response | 1031 | 169 | 0.0E+00 | 0.0E+00
    Inflammatory response | 218 | 50 | 0.0E+00 | 0.0E+00
    Immune response | 950 | 153 | 0.0E+00 | 0.0E+00
    Humoral immune response | 235 | 38 | 0.0E+00 | 0.0E+00
    Antimicrobial humoral response (sensu Invertebrata) | 145 | 24 | 1.2E−03 | 8.0E−06
    Cellular defense response | 139 | 45 | 0.0E+00 | 0.0E+00
    Cell communication | 3667 | 326 | 0.0E+00 | 0.0E+00
    Cell adhesion | 658 | 84 | 0.0E+00 | 0.0E+00
    Heterophilic cell adhesion | 97 | 20 | 9.7E−05 | 1.0E−06
    Signal transduction | 2947 | 254 | 0.0E+00 | 0.0E+00
    Cell surface receptor linked signal transduction | 1124 | 117 | 0.0E+00 | 0.0E+00
    G-protein coupled receptor protein signaling pathway | 657 | 77 | 0.0E+00 | 0.0E+00
    Cytosolic calcium ion concentration elevation | 49 | 10 | 3.2E−02 | 6.5E−04
    Cell-cell signaling | 689 | 64 | 3.7E−01 | 5.4E−04
    Development | 1920 | 150 | 1.5E+00 | 8.1E−04
    Histogenesis and organogenesis | 125 | 18 | 7.5E−02 | 6.0E−04
    Muscle development | 167 | 27 | 5.0E−04 | 3.0E−06
    Respiratory gaseous exchange | 36 | 11 | 2.2E−04 | 6.0E−06
    Chemokine activity | 52 | 21 | 0.0E+00 | 0.0E+00
    Circulation | 142 | 22 | 7.2E−03 | 5.1E−05
    Peptide receptor activity/G-protein coupled | 139 | 23 | 1.7E−03 | 1.2E−05
    Response to external stimulus | 1591 | 210 | 0.0E+00 | 0.0E+00
    Response to biotic stimulus | 1126 | 179 | 0.0E+00 | 0.0E+00
    Response to wounding | 356 | 91 | 0.0E+00 | 0.0E+00
    Response to pest/pathogen/parasite | 596 | 123 | 0.0E+00 | 0.0E+00
    Response to bacteria | 19 | 6 | 1.3E−02 | 7.1E−04
    Response to abiotic stimulus | 577 | 71 | 0.0E+00 | 0.0E+00
    Morphogenesis | 1119 | 101 | 4.9E−02 | 4.4E−05
    Organogenesis | 1029 | 91 | 2.2E−01 | 2.2E−04
    Cellular process | 7140 | 534 | 0.0E+00 | 0.0E+00
    Membrane | 4225 | 356 | 0.0E+00 | 0.0E+00
    Integral to membrane | 3220 | 281 | 0.0E+00 | 0.0E+00
    Cell growth | 97 | 17 | 7.3E−03 | 7.5E−05
    Humoral defense mechanism (sensu Invertebrata) | 145 | 24 | 1.2E−03 | 8.0E−06
    Cell–cell adhesion | 220 | 30 | 6.8E−03 | 3.1E−05
    Antimicrobial humoral response | 145 | 24 | 1.2E−03 | 8.0E−06
    Cytolysis | 20 | 8 | 2.4E−04 | 1.2E−05
    Cytokine binding | 80 | 14 | 2.6E−02 | 3.2E−04
    Chemokine binding | 34 | 8 | 2.8E−02 | 8.4E−04
    Carbohydrate binding | 133 | 28 | 0.0E+00 | 0.0E+00
    Chemoattractant activity | 52 | 21 | 0.0E+00 | 0.0E+00
    Response to chemical substance | 206 | 48 | 0.0E+00 | 0.0E+00
    Peptide binding | 213 | 26 | 1.3E−01 | 6.1E−04
    Taxis | 133 | 39 | 0.0E+00 | 0.0E+00
    Chemokine receptor binding | 52 | 21 | 0.0E+00 | 0.0E+00
    Innate immune response | 220 | 50 | 0.0E+00 | 0.0E+00
    Eicosanoid biosynthesis | 25 | 7 | 1.4E−02 | 5.7E−04
Protein domain
    Vertebrate metallothionein | 12 | 7 | 2.4E−05 | 2.0E−06
    Aspartic acid and asparagine hydroxylation site | 143 | 21 | 3.5E−02 | 2.5E−04
    Rhodopsin-like GPCR superfamily | 289 | 42 | 0.0E+00 | 0.0E+00
    Endothelin receptor | 6 | 4 | 1.3E−03 | 2.1E−04
    Small chemokine, C-C subfamily | 26 | 11 | 0.0E+00 | 0.0E+00
    Fos transforming protein | 13 | 6 | 9.5E−04 | 7.3E−05
    Thrombospondin, type I | 52 | 13 | 7.8E−04 | 1.5E−05
    Globin | 16 | 12 | 0.0E+00 | 0.0E+00
    Small chemokine, C-X-C subfamily | 18 | 6 | 1.1E−02 | 6.0E−04
    C-type lectin | 95 | 19 | 5.7E−04 | 6.0E−06
    Alpha crystallin | 8 | 4 | 7.2E−03 | 9.0E−04
    Myelin proteolipid protein (PLP) | 7 | 6 | 0.0E+00 | 0.0E+00
    Zn-binding protein, LIM | 95 | 18 | 2.2E−03 | 2.3E−05
    Small chemokine, interleukin-8 like | 48 | 20 | 0.0E+00 | 0.0E+00
    EGF-like calcium-binding | 147 | 21 | 5.3E−02 | 3.6E−04
    Heat shock protein Hsp20 | 8 | 4 | 7.2E−03 | 9.0E−04
    Fibrinogen, beta/gamma chain, C-terminal globular | 38 | 10 | 3.4E−03 | 8.9E−05
    P2 purinoceptor | 21 | 7 | 4.3E−03 | 2.1E−04
    Myoglobin | 9 | 6 | 3.6E−05 | 4.0E−06
    Beta haemoglobin | 8 | 7 | 0.0E+00 | 0.0E+00
    Alpha haemoglobin | 6 | 5 | 3.6E−05 | 6.0E−06
    Pi haemoglobin | 6 | 5 | 3.6E−05 | 6.0E−06
    Small chemokine, C-X-C/Interleukin 8 | 18 | 8 | 1.1E−04 | 6.0E−06
    Metallothionein superfamily | 12 | 7 | 2.4E−05 | 2.0E−06
    Orphan nuclear receptor | 9 | 5 | 9.1E−04 | 1.0E−04
    Immunoglobulin C-2 type | 223 | 31 | 6.2E−03 | 2.8E−05
    Immunoglobulin subtype | 368 | 48 | 3.7E−04 | 1.0E−06
    PMP-22/EMP/MP20 family | 8 | 4 | 7.2E−03 | 9.0E−04
    L1 transposable element | 8 | 4 | 7.2E−03 | 9.0E−04
    EGF-like domain | 431 | 46 | 1.4E−01 | 3.3E−04
    Type I EGF | 169 | 23 | 6.7E−02 | 4.0E−04
    AIG1 family | 6 | 4 | 1.3E−03 | 2.1E−04
    BRICHOS domain | 13 | 8 | 0.0E+00 | 0.0E+00
    Immunoglobulin-like | 678 | 75 | 6.8E−04 | 1.0E−06
    LST-1 | 6 | 6 | 0.0E+00 | 0.0E+00
    Thrombospondin, subtype 1 | 27 | 8 | 5.0E−03 | 1.8E−04
    Saposin-like type B, 2 | 7 | 5 | 1.3E−04 | 1.9E−05
    Saposin B | 12 | 5 | 6.5E−03 | 5.4E−04
Pathway
    GPCRs_Class_A_Rhodopsin-like | 212 | 34 | 0.0E+00 | 0.0E+00
    Peptide_GPCRs | 88 | 20 | 0.0E+00 | 0.0E+00
    MAP00590//Prostaglandin and leukotriene metabolism | 41 | 9 | 3.2E−02 | 7.7E−04
    GPCRs_Class_B_Secretin-like | 34 | 10 | 9.2E−04 | 2.7E−05
Chromosomal location
    12p | 301 | 32 | 2.4E−01 | 8.1E−04
    8p21 | 117 | 18 | 1.8E−02 | 1.5E−04
    17q23 | 68 | 14 | 2.2E−03 | 3.2E−05
    16q13 | 37 | 12 | 3.7E−05 | 1.0E−06

For each gene ontology term, the total number of genes with this term in the HG-U133A GeneChip, the total number of genes carrying that term in MAD, the P-value of this and the expected number of genes are tabulated. The member genes for each gene ontology term can be downloaded at .

Table 3

The gene ontology terms overrepresented in the set of genes up-regulated by at least 2-fold in adenocarcinomas

Annotation term | Total | Found | Expected | P-value
Gene Ontology terms
    DNA replication and chromosome cycle | 233 | 54 | 0.0E+00 | 0.0E+00
    Cell cycle checkpoint | 50 | 17 | 5.5E−04 | 1.1E−05
    S phase of mitotic cell cycle | 183 | 38 | 1.1E−02 | 6.1E−05
    M phase of mitotic cell cycle | 149 | 46 | 0.0E+00 | 0.0E+00
    Nucleotide binding | 1737 | 235 | 2.1E−01 | 1.2E−04
    Mitotic cell cycle | 421 | 97 | 0.0E+00 | 0.0E+00
    M phase | 201 | 52 | 0.0E+00 | 0.0E+00
    Nuclear division | 195 | 50 | 0.0E+00 | 0.0E+00
    Chromatin | 117 | 29 | 1.8E−03 | 1.5E−05
    Nucleosome | 60 | 16 | 3.0E−02 | 5.0E−04
    Cytokinesis | 85 | 24 | 6.8E−04 | 8.0E−06
    Catalytic activity | 4887 | 638 | 0.0E+00 | 0.0E+00
    Carboxypeptidase A activity | 18 | 8 | 5.5E−03 | 3.1E−04
    Extracellular matrix structural constituent | 89 | 21 | 4.0E−02 | 4.5E−04
    Collagen | 54 | 23 | 0.0E+00 | 0.0E+00
    ATP binding | 1280 | 177 | 4.0E−01 | 3.1E−04
    Extracellular matrix | 345 | 65 | 2.1E−03 | 6.0E−06
    Collagen | 59 | 24 | 0.0E+00 | 0.0E+00
    Fibrillar collagen | 23 | 14 | 0.0E+00 | 0.0E+00
    Chromosome | 147 | 32 | 1.3E−02 | 8.8E−05
    Spindle | 64 | 21 | 1.3E−04 | 2.0E−06
    Intermediate filament | 76 | 19 | 2.9E−02 | 3.8E−04
    DNA metabolism | 606 | 97 | 3.1E−02 | 5.1E−05
    DNA replication | 178 | 36 | 2.9E−02 | 1.6E−04
    DNA dependent DNA replication | 94 | 23 | 1.3E−02 | 1.4E−04
    DNA replication initiation | 25 | 11 | 6.3E−04 | 2.5E−05
    Amino acid and derivative metabolism | 240 | 54 | 0.0E+00 | 0.0E+00
    Amino acid metabolism | 197 | 43 | 1.2E−03 | 6.0E−06
    Oncogenesis | 521 | 84 | 6.6E−02 | 1.3E−04
    Cell cycle | 871 | 145 | 0.0E+00 | 0.0E+00
    Chromosome segregation | 35 | 11 | 2.9E−02 | 8.4E−04
    Mitosis | 145 | 45 | 0.0E+00 | 0.0E+00
    Regulation of mitosis | 35 | 12 | 6.9E−03 | 2.0E−04
    Mitotic checkpoint | 16 | 7 | 1.3E−02 | 8.3E−04
    Ectoderm development | 98 | 26 | 1.1E−03 | 1.1E−05
    Cell proliferation | 1356 | 190 | 1.2E−01 | 8.9E−05
    Epidermal differentiation | 80 | 22 | 2.2E−03 | 2.8E−05
    Glutamine family amino acid metabolism | 46 | 15 | 2.9E−03 | 6.3E−05
    Amine metabolism | 283 | 60 | 0.0E+00 | 0.0E+00
    Histogenesis | 131 | 28 | 4.3E−02 | 3.3E−04
    Glucuronosyltransferase activity | 18 | 8 | 5.5E−03 | 3.1E−04
    Transferase activity | 1634 | 224 | 1.3E−01 | 8.1E−05
    Transferase activity, transferring glycosyl groups | 225 | 42 | 7.0E−02 | 3.1E−04
    Transferase activity, transferring hexosyl groups | 148 | 34 | 2.5E−03 | 1.7E−05
    Other carbon–nitrogen ligase activity | 25 | 9 | 2.1E−02 | 8.3E−04
    Purine nucleotide binding | 1723 | 233 | 2.3E−01 | 1.4E−04
    Adenyl nucleotide binding | 1292 | 179 | 3.4E−01 | 2.6E−04
    Intermediate filament cytoskeleton | 76 | 19 | 2.9E−02 | 3.8E−04
Protein domain
    Fibrillar collagen, C-terminal | 23 | 14 | 0.0E+00 | 0.0E+00
    Endoplasmic reticulum targeting sequence | 76 | 19 | 2.0E−02 | 2.6E−04
    Epsin N-terminal homology | 16 | 7 | 1.1E−02 | 6.9E−04
    MCM family | 11 | 8 | 2.2E−05 | 2.0E−06
    Prolyl oligopeptidase | 12 | 6 | 8.6E−03 | 7.1E−04
    Intermediate filament protein | 67 | 17 | 3.0E−02 | 4.4E−04
    von Willebrand factor, type D | 9 | 6 | 7.7E−04 | 8.6E−05
    UDP-glucoronosyl/UDP-glucosyl transferase | 15 | 7 | 6.4E−03 | 4.3E−04
    Prolyl endopeptidase, serine active site | 8 | 5 | 4.4E−03 | 5.5E−04
    Immunoglobulin V-type | 146 | 32 | 6.3E−03 | 4.3E−05
    Cyclin, C-terminal | 19 | 9 | 1.0E−03 | 5.4E−05
    Disulphide isomerase | 12 | 7 | 8.4E−04 | 7.0E−05
    Cyclin | 44 | 14 | 4.7E−03 | 1.1E−04
    Cyclin, N-terminal domain | 34 | 12 | 3.7E−03 | 1.1E−04
    Histone core | 25 | 9 | 1.7E−02 | 6.7E−04
    Collagen triple helix repeat | 106 | 28 | 3.2E−04 | 3.0E−06
    Collagen helix repeat | 69 | 22 | 6.9E−05 | 1.0E−06
Pathway
    Cell_cycle | 133 | 43 | 5.7E+00 | 4.3E−02
    Glutamate_metabolism | 27 | 12 | 3.2E−01 | 1.2E−02
    MAP00251//glutamate metabolism | 43 | 15 | 6.5E−01 | 1.5E−02
    Androgen_and_estrogen_metabolism | 15 | 7 | 1.1E−01 | 7.0E−03
    MAP00150//androgen and estrogen metabolism | 32 | 11 | 3.5E−01 | 1.1E−02
Chromosomal location
    7 | 961 | 129 | 8.7E−01 | 9.1E−04
    1q | 920 | 126 | 4.4E−01 | 4.8E−04
    8q | 388 | 67 | 5.8E−03 | 1.5E−05

Details are as described in Table 2.

For genes down-regulated in adenocarcinomas, several gene ontology terms related to immune responses were overrepresented, indicating that there appeared to be a depression in defense mechanisms in general, for the adenocarcinoma tissue samples (Table 2). In addition, genes associated with ‘signal transducer activity’ (e.g. TEK tyrosine kinase, G protein-coupled receptor kinase) were also identified to be significantly overrepresented in down-regulated genes in MAD, suggesting the blockage of signal transduction genes in adenocarcinoma cells. Many gene ontology terms that were overrepresented in the up-regulated genes of MAD were associated with the cell cycle and cell replication machinery (Table 3) as might be expected from accelerated cancer cell proliferation.

Construction of relationship trees and determination of putatively co-regulated genes

After obtaining the constituent member genes for each gene ontology term overrepresented in MAD, we investigated their pairwise gene expression relationships. Supplementary Material figure 2 shows an example of such a study for the gene ontology term ‘DNA replication and chromosomal cycle’ with the GenBank accession numbers for each tree branch corresponding to the genes in MAD that are assigned this ontology term. The branch distances displayed were used to derive the putatively co-regulated gene set (marked by an asterisk) according to the two criteria stated in the Materials and Methods section. In this example, the putatively co-regulated genes were: (i) MCM2—mini-chromosome maintenance deficient 2; (ii) replication factor C (activator 1) 4; and (iii) CDC45—cell division cycle 45-like.
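As a rough illustration of how pairwise expression relationships can be turned into a candidate co-regulated set, the sketch below computes a correlation-based distance between expression profiles and keeps pairs below a cutoff. The distance definition d = 1 − r (Pearson), the gene names and the expression values are all assumptions made for the example. The two selection criteria actually applied are those stated in the Materials and Methods section.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coregulated_pairs(profiles, cutoff=0.20):
    """Return gene pairs whose distance d = 1 - r is below the cutoff."""
    pairs = []
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        distance = 1.0 - pearson(profiles[gene_a], profiles[gene_b])
        if distance < cutoff:
            pairs.append((gene_a, gene_b, distance))
    return pairs


# Hypothetical expression profiles across five samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [1.1, 2.1, 2.9, 4.2, 5.0],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
pairs = coregulated_pairs(profiles)
for gene_a, gene_b, distance in pairs:
    print(gene_a, gene_b, round(distance, 4))
```

Here only geneA and geneB have closely matching profiles, so they form the single candidate pair.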

Identification of conserved DNA motifs and transcription factors associated with a GO term

Conserved regions, within 5 kb of the transcription start site, of the putatively co-regulated genes associated with each gene ontology term overrepresented in MAD were identified using MoDEL (30). Example results for four gene ontology terms, (i) DNA replication and chromosomal cycle; (ii) nuclear division; (iii) cellular defense response and (iv) signal transduction, are shown in Table 4. The first two terms are associated with genes that were up-regulated in adenocarcinoma tissues, whereas the latter two terms are associated with down-regulated genes. Conserved regions are presented using IUPAC uncertainty codes, with highly conserved residues shown in bold, along with their start position relative to the transcription start site. The number of occurrences of each of these 50mers in the regions 5 kb upstream of all human genes (32) is shown, along with the proportion of those genes that have the same GO term and regulation pattern as the gene in the table. The final column reports the transcription factors (from TRANSFAC 8.3) that may bind to the conserved region based on matches to their binding site motifs. The complete data for Table 4 can be found at .
Table 4

Highly conserved DNA regions, detected with MoDEL, in regions 5 kb directly upstream of the transcription start site in putatively co-regulated gene sets

Gene Ontology | Location (b) | Frequency (c) | Similar regulation (d) | Putative transcription factors (e)
Genes up-regulated in lung adenocarcinoma cells
DNA replication and chromosome cycle (a)
NM_004526 gggcgt GGTGGCTCACGCCTGTAATCCTAGCACTATGGGAGGCCAAGGCAGGCGGA tccact | −2467 | 1 | 100% | Whn,AhR,GATA-1,PITX2,HIF-1
NM 002916aggcac GGTGGCTCACACCTGTAATCCCAGCACTTTGGGAGGCCAAGGTGGGTGGAt gccga−2920775%MyoD,E47,AREB6,E12,USF,GATA-1,PITX2
NM_003504 gggtgc GGTGGCTCACGCCTATAATCCCAGCACTTTGGGAGGCAGAGGCAGGTGGA ttacct | −3179 | 1 | 100% | Whn,AhR,GATA-1,PITX2,Major,MyoD,E47,TFII-I,AREB6,TAL1,E12,USF,HIF-1
Profile: RGGYRY GGTGGCTCACRCCTRTAATCCYAGCACTWTGGGAGGCMRAGGYRGGYGGA TBMMSW
Nuclear division (a)
NM_004701 agtagt CCCAGCTACTCGGGTGGCTGAGGTAGGAGGATCACTTGAGCCCGGGGGAT tgaggc | −3735 | 1 | 100% | HES1,SREBP-1,TFII-I,DEC,USF,Nkx2-5,GATA-1
NM_001211 tatagt CCCAGCTACATGGGAGGATGAGGCAGGAAGATCGCTTGAACCTGGGAGGT ggaggt | −3705 | 1 | 100% | TFII-I,Elk-1,NERF1a,c-Ets-1(p54),68,AREB6
NM_020242 tgtagt CCCAGCTGCTTGAGAGGCTGAGGCAGGAGGATCACTTGAGCCCAGGAGGT caaggc | −2016 | 1 | 100% | TAL1,HEB,TFII-I,Zta,c-Ets-1(p54),DEC,SREBP-1,USF,Nkx2-5,RORalpha1
NM_022346 ctgtaa CCCAGCTACTTGGGAGACTGAGGCGGGAGAATCGCTTCAACCCGGGAGGC agaggt | −2071 | 1 | 100% |
NM_003981 tgtaat CTGAGCTACTTGGGAGGCTGAAGCAGGAGAATCACTTGAACCTGGGAGGC ggaggt | −866 | 1 | 100% | TFII-I,DEC,Nkx2-2,SREBP-1,USF,Nkx2-5,AREB6,HIF-1
NM_001237 ttgaac TGCAAGAACAGCCGCCGCTCCGGCGGGCTGCTCGCTGCATCTCTGGGCGT ctttgg | −167 | 1 | 100% |
NM_001813 tgtaat CCCAGCTACTGGGGAGGCTGAGGCAGGAGAATCACTTGAATGCAGGAGGT ggaggc | −715 | 1 | 100% | TFII-I,Zta,DEC,Nkx2-2,SREBP-1,USF,TTF1,Nkx2-5,c-Ets-1,c-Ets-1(p54),AREB6
NM_002358 tgtagt CTCAGCTACTTGGGAGTCCGAGGCAGGAGAATTGCTTGAACCTGGGAGGC agaggt | −4881 | 1 | 100% | TFII-I,C/EBPdelta,AREB6
NM_022346 ctgtaa CCCAGCTACTTGGGAGACTGAGGCGGGAGAATCGCTTCAACCCGGGAGGC agaggt | −2071 | 1 | 100% |
Profile: HDKWRH YBSARSWRCWBSVGHSDMYSMRGYRGGMDRMTYRCTKSADYBYDGGRSRY NDWKGB
Genes down-regulated in lung adenocarcinoma cells
Cellular defense response (a)
NM_005874 cttgat GGTCCCGGGACCCTGTGGCATCTCACCTCTGGCCTCTGTTCTTTCTTGTG agtccg | −715 | 1 | 100% | USF,AREB6,GR
NM_000265 tggcag GATCTCGGCTCACTGCAACCTCCACCTCCCTGGTTCAAGTGATTCTCCTG tcttac | −3997 | 2 | 100% | RFX1,TFII-I,AREB6,DEC,SREBP-1,Nkx2-2,Nkx2-5,USF,HIF-1
NM_016382 tggcgt GATCTCGGCTCACTGCAACCTCCACCTCCTGGATTCAAGTGATTCTCCTG cctcag | −3907 | 1 | 100% | RFX1,TFII-I,AREB6,c-Ets-1(p54),GATA-1,DEC,TTF1,SREBP-1,Nkx2-2,Nkx2-5,USF,Zta
Profile: YKKSRK GRTCYCGGSWCMCTGYRRCMTCYMMCYYCYKGVYTCWRKTSWTTCTYSTG HSTYMS
Signal transduction (a)
NM_000459 tcagga GGCTGAGGCAGAAGAATCGCTTGAACCCAGGAGGCGGACGTTGCAGTGAG ccgaga | −2597 | 1 | 100% | Zta,c-Ets-1(p54),RFX1
NM_005308 ttggga GGCTGAAGTACAAGAATCATTTGAACCTGGGAGGCCGAGGTTGCAGTGAG ccgaga | −2401 | 1 | 100% | GATA-1,AREB6,RFX1
NM_000115 ttggga GGCTGAGGCAGGAGAATCACTTGAACCTGGGAGCCGGAGGTTGCAGTGAG ctgaga | −1967 | 2 | 50% | TFII-I,Zta,DEC,SREBP-1,Nkx2-2,USF,Nkx2-5,AREB6,c-Ets-1(p54)
NM_005424 gccagt GGTGGCAAGAGGTGGAGCAACGGGTGCCAGGGCAGGGAGAGGTGAGTCTG ggaggg | −1039 | 1 | 100% | AREB6,v-Myb,MAZ,Hand1:E47
NM_003991 tgggcg CGCTGCGGGAGCTGTAGCTCAGCCAGCCAGGGAGTAGCGGCTTTCATCCG ccggga | −340 | 1 | 100% | SMAD-3
NM_005795 ttggga GGTTGAGGCAGGAGAATTGCTTGAACCCGGGAGGTGGAGGTTGCAGTGAG ctgaga | −1930 | 1 | 100% | Zta,TFII-I,C/EBPdelta,c-Ets-1(p54),AREB6
NM_003357 ttggga GGCTGAGGCAGGAGAATCGTCTGAACTCGGGAGGTGGAGGTTGCAGTGAG ccgctg | −4624 | 1 | 100% | Zta,TFII-I,c-Ets-1(p54),AREB6,RFX1
NM_005856 ttgtga GGCTGAGGCAGGAGAATCGCTTGAATCCAGGAGGTGGAGGTTGCAGTGAG ccggga | −3601 | 3 | 100% | Zta,TFII-I,GATA-1,c-Ets-1(p54),AREB6,RFX1
NM_004844 acagga GGCTGAGGCAGGAGAATAGCTTGAACCCAGGAGGCGGAGGTTGCAGTGAG ctaaca | −4824 | 3 | 100% | Zta,TFII-I,c-Ets-1(p54)
Profile: DBVDSD SGYKGMRRBASVWGDAKHDHHKSVWBYYRGGRVVBVGMSRBKKBMRTSHG SBRVBR

The complete table can be downloaded at .

(a) Overrepresented gene ontology terms.

(b) Location of conserved sequence upstream of the transcription start site.

(c) Frequency of occurrence of conserved sequence in 5 kb upstream regions of genes given in (32).

(d) Percentage of the genes with a matched upstream sequence and their expression trends.

(e) Transcription factor name from TRANSFAC (8.3) FACTOR table (35,73).
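The Profile rows of Table 4 summarize each set of aligned conserved regions with IUPAC uncertainty codes. A minimal sketch of that column-wise encoding is given below, using the three 50mers listed under ‘DNA replication and chromosome cycle’. The function name is an assumption for illustration, and this is not MoDEL itself, which also has to locate the conserved occurrences before they can be summarized.

```python
# Map each set of observed bases to its IUPAC nucleotide code.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_profile(sequences):
    """Encode equal-length, pre-aligned sequences column by column."""
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    columns = zip(*(seq.upper() for seq in sequences))
    return "".join(IUPAC[frozenset(column)] for column in columns)


# Conserved 50mers from Table 4 (NM 004526, NM 002916, NM 003504).
regions = [
    "GGTGGCTCACGCCTGTAATCCTAGCACTATGGGAGGCCAAGGCAGGCGGA",
    "GGTGGCTCACACCTGTAATCCCAGCACTTTGGGAGGCCAAGGTGGGTGGA",
    "GGTGGCTCACGCCTATAATCCCAGCACTTTGGGAGGCAGAGGCAGGTGGA",
]
profile = iupac_profile(regions)
print(profile)
```

Applied to these three regions, the encoding reproduces the 50-residue profile given in the table for this gene ontology term.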

DISCUSSION

This study first identified a large set of genes (MAD) showing at least a 2-fold difference in expression between adenocarcinoma cells and normal lung tissue. Of these, 2528 genes (73.45%) also passed the t-test criterion (P < 0.005, complete t-test gene list available at ). Transcription factors with binding site motifs that matched conserved DNA regions upstream of genes in MAD were then identified, as these may be the factors that regulate the oncogenic process. This was achieved by combining experimentally determined gene expression data with bioinformatic tools. Below, we discuss the functional annotation groups (gene ontology terms) that were overrepresented in the cancer associated genes and their putative regulatory transcription factors. Only some salient findings can be presented due to the size of the dataset, and full details are provided as Supplementary Material. In a separate study, we identified 88 lung cancer associated genes (data not shown) from our microarrays, using a feature partitioning method we developed earlier (37). Here, however, we aimed to identify the broadest set of cancer associated genes (MAD) by using fold-ratio analysis, and to examine their functional annotations in order to understand the biological processes that are altered in cancer when compared with normal tissue. A broad gene set was important to ensure statistical validity when determining the functional groups (gene ontology terms) that were overrepresented in the gene population in MAD. More than three thousand genes were found to be up- or down-regulated by >2-fold, and all 88 cancer associated genes identified using the earlier method (37) were found in this set. Previous studies (38–40) reported differential gene expression in cancer but elaborated relatively little on the genes' functions, or on the regulatory cascades and biological processes underlying the observations.
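The fold-ratio selection described above can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure used: the ratio is taken between mean intensities on a linear scale, and the gene names and intensity values are hypothetical. The final line checks the reported percentage of MAD genes that also passed the t-test.

```python
from statistics import mean


def fold_change_calls(tumor, normal, threshold=2.0):
    """Label genes whose mean tumor/normal ratio is at least threshold-fold
    up ("up") or down ("down"); genes below the threshold are omitted."""
    calls = {}
    for gene in tumor:
        ratio = mean(tumor[gene]) / mean(normal[gene])
        if ratio >= threshold:
            calls[gene] = "up"
        elif ratio <= 1.0 / threshold:
            calls[gene] = "down"
    return calls


# Hypothetical linear-scale intensities for three genes.
tumor = {"geneX": [400, 420, 380], "geneY": [100, 110, 90], "geneZ": [50, 60, 40]}
normal = {"geneX": [100, 110, 90], "geneY": [105, 100, 110], "geneZ": [200, 210, 190]}
calls = fold_change_calls(tumor, normal)
print(calls)

# Share of the 3442 genes in MAD that also passed the t-test (2528 genes).
t_test_share = round(100 * 2528 / 3442, 2)
print(t_test_share)
```

A fold-ratio filter of this kind ignores within-group variance, which is why the overlap with a t-test based list is informative.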
Here, we found that many gene ontology terms occurred disproportionately (P < 0.001) among the sets of genes that were either substantially up- or down-regulated in adenocarcinomas. This provided evidence of the systematic up- or down-regulation of several biological processes directly linked to oncogenesis. Such processes included increased cell multiplication, angiogenesis, vascularization, and glucose and amino acid metabolism. Glucose metabolism is crucial because cancer cell growth depends on glucose availability, rather than respiration, for biomass construction (41). Increased expression of glycolytic and related metabolic enzymes, including pyruvate carboxylase, citrate synthase, aconitate hydratase, oxalosuccinate decarboxylase, glucose-6-phosphate isomerase, fructose-bisphosphate aldolase, glucose transporter (GLUT) and l-lactate dehydrogenase, was observed in the microarray data. This is consistent with fermentative metabolism (needed for ATP synthesis in the absence of efficient respiration), and with entry into the tricarboxylic acid pathway for glutamate and aspartate synthesis (i.e. biomass construction) rather than respiration. Unlike mostly resting normal cells, where oxygen is used in oxidative phosphorylation for ATP synthesis and cell maintenance, cancer cells metabolize glucose at a much higher rate to generate ATP, and reduce pyruvate to lactate to replenish the NAD pool (the Warburg effect), while stopping the cycling of the tricarboxylic acid pathway (42,43). By preventing the tricarboxylic acid pathway from cycling, this metabolic shift mainly produces biomass rather than energy. This effect, overlooked for some time, was discovered >70 years ago (41).
Much effort has been devoted to identifying the transcription factor(s) that drive this change of course in cancer cells (from an aerobic slow-growth or resting state to anaerobic use of glucose while growing) by up-regulating the expression and activity of the enzymes directly related to this essential metabolic pathway. In recent publications, several transcription factors and other regulatory proteins [hypoxia inducible factor 1 (HIF-1) (44); Myc (45); Ras (46); v-SRC (47); p53 (48) and pVHL (49)] were reported to play a role in regulating the expression of these glycolytic enzymes. From the genes in MAD associated with each overrepresented gene ontology term, a subset of genes with more consistent expression profiles was identified and the upstream regions of these genes were searched for conserved elements. Such conserved DNA regions, if they exist, are likely to be evolutionarily significant (50–54). Wasserman et al. (55) showed that a large proportion (>98%) of experimentally defined transcription factor binding sites are restricted to the most conserved residues within their own promoter regions. Earlier studies have used databases such as TRANSFAC to search for transcription factor binding sites in the upstream regions of genes; however, this can lead to many false positives (56,57). Clustering of genes based on expression profiles has been used to select sets of genes more likely to be co-regulated (20); however, as the number of genes in a cluster increases, so does the number of false positive identifications. One reason for this is that clusters include genes that are not actually co-regulated, which hampers the correct detection of conserved DNA regions by most motif discovery tools (21,22). Methods to evaluate putative regulatory sites and newly detected motifs have also been proposed (58).
To address this issue, we combined the gene expression correlation coefficients and gene functional classes of all the cancer-associated genes (MAD) to select a more consistent set of likely co-regulated genes. These genes not only had a consistent expression pattern with the highest possible pairwise gene correlation, but also shared the same functional role. No limit was placed on the number of genes that would be selected from each functional group, and all genes with expression profiles within a cutoff value (d < 0.20) were selected. These criteria were motivated by the many examples of transcription factors that have multiple target genes, a significant portion of which are involved in a common metabolic pathway. For instance, the CAP transcription factor in Escherichia coli has been shown to mediate the regulation of dozens of genes involved in glucose metabolism (59,60). In humans, the GATA binding protein 1 (globin transcription factor 1, GATA-1) plays an important role in erythroid development by regulating hemoglobin production (61). The majority of genes that are regulated by this transcription factor are annotated with the gene ontology term ‘hemoglobin’. Moreover, growth factor independent 1 (Gfi-1) acts on a subset of genes involved in the differentiation of the hematopoietic lineage (62). MoDEL, the motif discovery program used here, has been tested extensively and compared with other existing motif finding algorithms by analyzing sets of complex natural amino acid sequences (e.g. HTH protein motifs) and artificial datasets (planted motifs) (26). It was shown to have a more efficient optimization method than other local multiple alignment methods. Unlike algorithms that search for motifs by exhaustive enumeration of overrepresented words (63), MoDEL looks for a set of conserved occurrences based on information content (26).
The objective of MoDEL is to identify exactly one occurrence per sequence in such a way that all chosen occurrences are maximally similar across the sequence set. A validation of MoDEL on the CAP-mediated gene set (59) in bacteria successfully extracted the conserved regions that incorporate the CAP binding sites (Supplementary Material). Having identified conserved DNA regions associated with genes with the same functional annotation and similar expression profiles, in silico pattern-based scanning against the TRANSFAC 8.3 database for transcription factors with binding site motifs in these conserved DNA regions was performed. Among the transcription factors identified as putative regulatory factors for these genes (Table 4), some had been reported in previous publications to promote or suppress cancer formation, whereas the remaining transcription factors have generally not been sufficiently characterized in vivo. Four of these appear to be particularly significant, namely: HIF-1, Gfi-1, nuclear factor TG-interacting factor (TGIF) and erythroid transcription factor (GATA-1). HIF-1 is a regulatory heterodimer consisting of two subunits; HIF-1β is constitutively expressed in all conditions, whereas HIF-1α is rapidly degraded under normal conditions but is stabilized under hypoxia (64). Despite an average up-regulation of this protein (HIF-1α) by ∼30% in our dataset, our initial screening for cancer gene markers did not reveal this protein because the expression change was too small to be selected. From our microarray findings, the up-regulation of this protein did not result in a systematic activation of gene clusters with a specific function. However, the fact that HIF-1 binding sites were found to be enriched in some down-regulated genes that belonged to the cellular defense response gene ontology term (Table 4), suggested that this protein might be one of the cellular components responsible for the suppression of the defense response of hypoxic cancer cells. 
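Pattern-based scanning of a conserved region for a binding site motif can be sketched as follows. This is a simplified illustration, not the TRANSFAC 8.3 procedure: it matches an IUPAC consensus string on both strands, and the HIF-1 core consensus RCGTG is an assumed, simplified motif used here only as an example. The two sequences are conserved 50mers from Table 4.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Reverse complement of an IUPAC-coded motif."""
    return motif.translate(COMPLEMENT)[::-1]


def scan(sequence, motif):
    """Return sorted (start, strand, matched_text) hits on both strands.

    start is the 0-based position on the given (forward) sequence.
    """
    sequence = sequence.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        body = "".join(IUPAC_REGEX[base] for base in pattern)
        regex = re.compile("(?=(" + body + "))")  # lookahead keeps overlaps
        hits.extend((m.start(), strand, m.group(1)) for m in regex.finditer(sequence))
    return sorted(hits)


HIF1_CORE = "RCGTG"  # assumed simplified consensus, for illustration only
region_nm_004526 = "GGTGGCTCACGCCTGTAATCCTAGCACTATGGGAGGCCAAGGCAGGCGGA"
region_nm_002916 = "GGTGGCTCACACCTGTAATCCCAGCACTTTGGGAGGCCAAGGTGGGTGGA"
print(scan(region_nm_004526, HIF1_CORE))
print(scan(region_nm_002916, HIF1_CORE))
```

With this simplified motif, the first region yields one reverse-strand match and the second yields none, which agrees with HIF-1 being listed for NM 004526 but not for NM 002916 in Table 4. Matrix-based scoring would be needed to rank weaker matches.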
Other genes related to growth factor, protease and apoptosis pathways, e.g. epidermal growth factor receptor, carbonic anhydrase IX, p53-, matrix metalloproteinase 9, that are known to depend on HIF-1α for their activation (65) had fold changes of 2.41, 2.8, 6.5 and 2.51, respectively, in our dataset. Gfi-1 is a zinc finger protein that binds DNA and functions as a transcriptional repressor through its unique repressor domain, SNAG (66). In our arrays, this gene was down-regulated in adenocarcinoma cells by an average of 69%, and genes that contain binding sites for Gfi-1 were mostly up-regulated in adenocarcinoma cells. One example is the pro-apoptotic regulator gene Bax, which was up-regulated by 2.3-fold in adenocarcinoma cells but was shown to be down-regulated by Gfi-1 in immortalized T-cell lines and primary transgenic thymocytes (67). TGIF is a transcriptional corepressor that directly associates with Smad (Sma- and Mad-related protein) proteins and inhibits Smad-mediated transcriptional activation (68). The gene responses activated by Smad underlie both proliferative and anti-proliferative events that contribute to cancer (69,70). Originally, TGIF was isolated as a ubiquitously expressed homeodomain protein that can bind to the retinoid X receptor (RXR) response element (71). Based on our analysis, this gene was up-regulated in lung cancer cells by an average of 2.6-fold, while the RXR gene was repressed by an average of 25%. GATA-1 is a factor that has been shown to be important in the regulation of globin and non-globin genes in erythroid, megakaryocytic and mast cell lineages (72). In our arrays, this gene was down-regulated by an average of ∼40% in cancer cells. This is consistent with our finding that members of the globin gene family (α, β and γ) were all repressed in adenocarcinomas, despite their weak association with primary lung cancers (Table 2).
In conclusion, by investigating the statistical distribution of the functional annotations attached to cancer associated genes (MAD) derived from lung tissue microarrays, we have identified functions, corresponding to several key biological systems, which are overrepresented in cancer associated genes (Tables 2 and 3). The congruence of these functions with known oncogenic processes in cancer cells suggests that the up- or down-regulation of genes in MAD is linked to cancer-related metabolic processes. Subsequently, we clustered the genes in MAD into putatively co-regulated gene sets by assuming that co-regulated genes share common functional roles and exhibit very similar expression profiles. Conserved DNA segments in the upstream regions of these putatively co-regulated gene sets were found, and transcription factors that recognize these DNA regions were identified (Table 4). A literature search on these transcription factors, which are putative regulatory factors in adenocarcinoma development, indicated that the majority had previously been documented experimentally to be oncogenic transcription factors. These transcription factors, together with their conserved binding sites, suggest new candidates for therapeutic intervention in the treatment of lung adenocarcinomas.

SUPPLEMENTARY MATERIAL

Supplementary Material is available at NAR Online.
  69 in total

1.  A Smad transcriptional corepressor.

Authors:  D Wotton; R S Lo; S Lee; J Massagué
Journal:  Cell       Date:  1999-04-02       Impact factor: 41.582

Review 2.  Indigo: a World-Wide-Web review of genomes and gene functions.

Authors:  P Nitschké; P Guerdoux-Jamet; H Chiapello; G Faroux; C Hénaut; A Hénaut; A Danchin
Journal:  FEMS Microbiol Rev       Date:  1998-10       Impact factor: 16.408

3.  The statistical significance of nucleotide position-weight matrix matches.

Authors:  J M Claverie; S Audic
Journal:  Comput Appl Biosci       Date:  1996-10

4.  The gcm-motif: a novel DNA-binding motif conserved in Drosophila and mammals.

Authors:  Y Akiyama; T Hosoya; A M Poole; Y Hotta
Journal:  Proc Natl Acad Sci U S A       Date:  1996-12-10       Impact factor: 11.205

5.  The Gfi-1 protooncoprotein represses Bax expression and inhibits T-cell death.

Authors:  H L Grimes; C B Gilks; T O Chan; S Porter; P N Tsichlis
Journal:  Proc Natl Acad Sci U S A       Date:  1996-12-10       Impact factor: 11.205

6.  Partnership between DPC4 and SMAD proteins in TGF-beta signalling pathways.

Authors:  G Lagna; A Hata; A Hemmati-Brivanlou; J Massagué
Journal:  Nature       Date:  1996-10-31       Impact factor: 49.962

7.  MADR2 maps to 18q21 and encodes a TGFbeta-regulated MAD-related protein that is functionally mutated in colorectal carcinoma.

Authors:  K Eppert; S W Scherer; H Ozcelik; R Pirone; P Hoodless; H Kim; L C Tsui; B Bapat; S Gallinger; I L Andrulis; G H Thomsen; J L Wrana; L Attisano
Journal:  Cell       Date:  1996-08-23       Impact factor: 41.582

8.  Post-transcriptional regulation of vascular endothelial growth factor mRNA by the product of the VHL tumor suppressor gene.

Authors:  J R Gnarra; S Zhou; M J Merrill; J R Wagner; A Krumm; E Papavassiliou; E H Oldfield; R D Klausner; W M Linehan
Journal:  Proc Natl Acad Sci U S A       Date:  1996-10-01       Impact factor: 11.205

9.  Cancer statistics, 2003.

Authors:  Ahmedin Jemal; Taylor Murray; Alicia Samuels; Asma Ghafoor; Elizabeth Ward; Michael J Thun
Journal:  CA Cancer J Clin       Date:  2003 Jan-Feb       Impact factor: 508.702

10.  TRANSFAC: transcriptional regulation, from patterns to profiles.

Authors:  V Matys; E Fricke; R Geffers; E Gössling; M Haubrock; R Hehl; K Hornischer; D Karas; A E Kel; O V Kel-Margoulis; D-U Kloos; S Land; B Lewicki-Potapov; H Michael; R Münch; I Reuter; S Rotert; H Saxel; M Scheer; S Thiele; E Wingender
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

