dbCerEx: a web-based database for the analysis of cervical cancer transcriptomes.

Limin Zhou1, Wei Zheng2, Majing Luo3, Jing Feng4, Zhichun Jin1, Yan Wang1, Dunlan Zhang1, Qiongxiu Tang1, Yan He5.   

Abstract

BACKGROUND: Cervical cancer is ranked as the second most dangerous disease among women worldwide. Over the past two decades, microarray technologies have been applied to study genes involved in malignant progression. However, most published microarray studies reported only a few genes, leaving a large amount of data unused. In addition, RNA-Seq has become a standard method for transcriptome analysis and is widely applied in cancer studies. There is a growing demand for a tool to help experimental researchers who are keen to explore cervical cancer gene therapy but lack the computational expertise to access and analyze high-throughput gene expression data.
DESCRIPTION: The dbCerEx database is designed to retrieve and process gene expression data from cervical cancer samples. It includes genome-wide expression profiles of cervical cancer samples, as well as a web utility to cluster genes with similar expression patterns. This feature will help researchers conduct further research to uncover novel gene functions.
CONCLUSION: The dbCerEx database is freely available for non-commercial use at http://128.135.207.10/dbCerEx/, and will be updated and integrated with more features as needed.

Year:  2014        PMID: 24918550      PMCID: PMC4053392          DOI: 10.1371/journal.pone.0099834

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Cervical cancer is the second leading cause of gynecological cancer deaths worldwide, and the situation is worse in developing countries owing to the lack of adequately organized screening programs. Human papillomavirus (HPV) infection is believed to be the major cause of invasive cervical cancer [1]. Whole-genome expression profiling has revolutionized the way we study disease and basic biology. Since 1997, the number of published results based on analysis of gene expression microarray data has grown from 30 to over 5,000 publications per year [2]. DNA microarray technologies simultaneously measure the expression of thousands of genes in a single experiment. Over the past few years, this technology has improved our understanding of the complex and heterogeneous molecular characteristics of cancers and has helped to improve cancer treatment. For example, HOXC10 was first identified by DNA microarray as one of 171 genes significantly up-regulated in cervical squamous cell carcinomas (SCC) relative to normal cervix samples, and was later shown to be a key mediator of invasion in cervical cancer [3]. In another study, archival RNA samples from 25 patients were hybridized to Stanford microarray chips to build a seven-gene scoring system [4]; this gene expression pattern could help to identify patients with cervical cancer who can be treated with radiotherapy alone. Expression profiles of selected candidate genes have also been used to identify histological subtypes of cervical cancer [5]. Furthermore, numerous candidate biomarkers and therapeutic targets have been identified in other cancers. However, most published microarray studies report only the subsets of genes that support the authors’ hypotheses. The complete microarray datasets are stored in an unsystematic manner and are useful only to those with computational expertise.
In addition, RNA-Seq has become a standard method for transcriptome analysis and is widely applied in cancer studies. For most experimental researchers, however, it remains difficult to use these cancer microarray databases and RNA-Seq data to answer biological questions. For example, if a novel gene of interest has an expression pattern that is correlated (positively or negatively) with that of an apoptosis-related gene, the two genes may share a regulatory mechanism, which could suggest a research direction for the novel gene. Here we present dbCerEx, a database of gene expression profiles generated from DNA microarray experiments and RNA-Seq data. The database comes with an integrated web-based utility that makes the data easily accessible to the cervical cancer research community. Using this utility, experimental researchers can identify novel cervical cancer related genes and explore the relationships among them.

Construction and Content

Microarray and RNA-Seq Data

The microarray expression data (GSE matrix files) and platform annotations (GPL files) were retrieved from the Gene Expression Omnibus (GEO) database [6] via the R [7]/Bioconductor [8] package ‘GEOquery’ [9]. The RNA-Seq data were retrieved from The Cancer Genome Atlas (TCGA) Data Portal [10], which contains clinical information, genomic characterization data and high-level sequence analysis of the tumor genomes. The data were then log (base 2) transformed and median centred. To avoid computational errors, rows containing an ‘NA’ value were omitted. The experiments were performed on various platforms (Table 1). To make the expression data searchable regardless of platform, the probes were remapped to official gene symbols. However, some GPL files provided no gene symbol assignments and mapped probes only to NCBI GenBank [11] or NCBI RefSeq [12] accession numbers. To solve this problem, the ‘gene2refseq’ and ‘gene2accession’ files were retrieved from the NCBI FTP server at ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/. A Perl script was used to map gene symbols to these GenBank or RefSeq accession numbers, and thus to the microarray probes. The processed gene expression data were stored as flat files for later access.
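The preprocessing and remapping steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' R/Perl pipeline: the function names `preprocess` and `map_probes_to_symbols` are hypothetical, median centring is assumed to be applied per sample (column), raw intensities are assumed to be positive, and the accession-to-symbol lookup is assumed to have been loaded already from the NCBI files.

```python
import math
from statistics import median


def preprocess(matrix):
    """Hypothetical helper: matrix maps probe ID -> list of raw intensities.

    Missing values are represented by None. Assumes all intensities are > 0.
    """
    # Omit every row (probe) that contains a missing value.
    complete = {p: v for p, v in matrix.items() if all(x is not None for x in v)}
    if not complete:
        return {}
    # Log (base 2) transform.
    logged = {p: [math.log2(x) for x in v] for p, v in complete.items()}
    # Median centring; assumed here to be per sample (column).
    n_samples = len(next(iter(logged.values())))
    col_medians = [median(v[j] for v in logged.values()) for j in range(n_samples)]
    return {p: [x - m for x, m in zip(v, col_medians)] for p, v in logged.items()}


def map_probes_to_symbols(probe_to_accession, accession_to_symbol):
    """Hypothetical helper: map probes to gene symbols through accessions.

    Both arguments are plain dicts; version suffixes such as ".6" are dropped
    before the lookup, and probes without a known symbol are skipped.
    """
    mapped = {}
    for probe, accession in probe_to_accession.items():
        symbol = accession_to_symbol.get(accession.split(".")[0])
        if symbol is not None:
            mapped[probe] = symbol
    return mapped
```

For example, `preprocess({"p1": [2, 8], "p2": [4, None], "p3": [8, 32]})` drops `p2` and returns centred log2 values for `p1` and `p3`. Centring per gene (row) instead of per sample would be an equally plausible reading of the text.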
Table 1

List of GEO accession number, published year and expression platforms of microarray experiments and RNA-Seq data used in this study.

| No. | GEO Acc.* | Year | Expression Platform | Sample Information | Reference |
|---|---|---|---|---|---|
| 1 | GSE5787 | 2006 | Affymetrix Human Genome U133 Plus 2.0 Array | Sixty-six flash-frozen punch biopsies were obtained from 16 patients with cervical cancer. | [13] |
| 2 | GSE3578 | 2007 | GE Healthcare/Amersham Biosciences CodeLink Human Whole Genome Bioarray | Twenty-eight squamous cell carcinoma of cervix from 24 patients were taken as biopsy sample before treatment and during treatment | [14] |
| 3 | GSE6791 | 2007 | Affymetrix Human Genome U133 Plus 2.0 Array | 84 cervical cancers, head and neck cancers and site-matched normal epithelial samples from 20 patients | [15] |
| 4 | GSE10372 | 2008 | Sentrix Human-6 Expression BeadChip | 32 snap-frozen tissues from 68 cervical carcinoma patients who underwent radical hysterectomy with bilateral lymphadenectomy between 1991 and 2005. | [16] |
| 5 | GSE9750 | 2008 | Affymetrix Human Genome U133A Array | A total of 66 samples were included, which include 33 primary tumors, 9 cell lines, and 24 normal cervical epithelium. | [17] |
| 6 | GSE20167 | 2010 | GE Healthcare/Amersham Bioscience CodeLink Human Whole Genome Bioarray | A total of 80 cervical cancer samples of following histology were included in this study: 54 squamous cell carcinoma, 18 adenosquamous carcinomas, 6 adenocarcinoma, and 2 others | [5] |
| 7 | GSE29570 | 2012 | Affymetrix Human Gene 1.0 ST Array [transcript (gene) version] | The polymorphism of mtDNA D-Loop was investigated in 187 cervical cancer patients and 270 healthy controls. | |
| 8 | GSE39001 | 2013 | Affymetrix Human HG-Focus Target Array | 43 HPV16-positive cervical cancer and 12 healthy cervical epitheliums using the HG-Focus microarray | |
| 9 | GSE27469 | 2013 | Illumina HumanWG-6 v3.0 Expression beadchip | 82 patients with cervical cancer, stage 1b bulky through 4a, were included | [18] |
| 10 | TCGA-CESC | 2014 | RNASeq TCGA | The total number of Cervical squamous cell carcinoma and endocervical adenocarcinoma samples is 190. | [10] |

*NCBI Gene Expression Omnibus accession number, which can be used to retrieve the microarray experiment data via http://www.ncbi.nlm.nih.gov/geo/.


Predefined Gene Set

One important feature of this database is that it enables users to search for genes whose expression patterns are similar to those of the genes they are studying. In this way, researchers may find mechanisms shared among these genes, which is a promising approach to discovering novel gene functions. The gene sets predefined in the database were retrieved from various sources and divided into two main categories: Gene Ontology (GO) [19] and Pathway. As shown in Table 2, the GO category consists of Biological process, Molecular function and Cellular component, while the Pathway category consists of KEGG [20], BIOCARTA (www.biocarta.com) and REACTOME [21]. Only the human gene sets were used in this work.
Table 2

Predefined Gene Sets.

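| Category | Gene set title | Number of gene sets |
|---|---|---|
| Pathway | BIOCARTA | 217 |
| Pathway | KEGG | 186 |
| Pathway | REACTOME | 674 |
| Gene Ontology | Biological process | 825 |
| Gene Ontology | Cellular component | 233 |
| Gene Ontology | Molecular function | 396 |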

Gene Expression Cluster Analysis

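An unsupervised hierarchical clustering algorithm was used to find genes with similar expression patterns. Clustering can be performed with a combination of distance metrics and linkages. In this study, the distance from gene x to gene y is defined as $1 - r_{xy}$, where $r_{xy}$ is the Pearson correlation between genes x and y:

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

Here $x_i$ and $y_i$ are the expression values of genes x and y in sample $i$, $\bar{x}$ and $\bar{y}$ are their means, and $n$ is the number of samples.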
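As a rough illustration of clustering with a correlation-based distance, the sketch below implements agglomerative clustering in plain Python. The text does not say which linkage dbCerEx uses, so average linkage is an assumption here, and the function names (`pearson`, `correlation_distance`, `average_linkage`) are hypothetical. Profiles with zero variance are not handled, because their correlation is undefined.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(x, y):
    """Distance between two genes, defined as 1 minus their correlation."""
    return 1.0 - pearson(x, y)


def average_linkage(profiles):
    """Cluster genes bottom-up; profiles maps gene -> list of values.

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of gene names.
    """
    genes = list(profiles)
    dist = {
        (a, b): correlation_distance(profiles[a], profiles[b])
        for a in genes for b in genes if a != b
    }

    def between(c1, c2):
        # Average linkage (assumed): mean pairwise distance across clusters.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [(g,) for g in genes]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters and merge it.
        best, i, j = min(
            (between(c1, c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters) if i < j
        )
        merges.append((clusters[i], clusters[j], best))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

With three toy profiles in which A and B rise together and C falls, A and B are merged first at a distance close to 0, and C joins last at a distance close to 2. The quadratic search for the closest pair is fine for a gene set of a few hundred genes but would be slow for a whole genome.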

Database Implementation

The dbCerEx database is a web-based utility built on a MySQL (http://www.mysql.com/) database management system [MySQL 5.5.32 (Community Server) with the InnoDB engine]. The front-end web interface is built with the Bootstrap 2.3.1 framework (http://getbootstrap.com/). PHP [version 5.3.10] (http://www.php.net/) applications receive the query from the user, connect to the database to gather data, call external Perl and R scripts to perform the statistical analysis, and generate HTML pages displaying the results.
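The request flow described above (receive a query, look it up in the database, return an HTML page) can be sketched as follows. This is not the dbCerEx code: Python's built-in `sqlite3` module stands in for PHP and MySQL, and the table name `genes`, its columns, the sample rows and the function names are all hypothetical.

```python
import html
import sqlite3


def search_genes(conn, keyword):
    """Return (symbol, full_name) rows whose fields contain the keyword."""
    pattern = f"%{keyword}%"
    cursor = conn.execute(
        "SELECT symbol, full_name FROM genes "
        "WHERE symbol LIKE ? OR full_name LIKE ? OR aliases LIKE ? "
        "ORDER BY symbol",
        (pattern, pattern, pattern),  # parameters are bound, never interpolated
    )
    return cursor.fetchall()


def render_gene_list(rows):
    """Render the matching genes as a small HTML list, escaping all text."""
    items = "".join(
        f"<li>{html.escape(symbol)}: {html.escape(name)}</li>"
        for symbol, name in rows
    )
    return f"<ul>{items}</ul>"


# Hypothetical in-memory table with two illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (symbol TEXT, full_name TEXT, aliases TEXT)")
conn.executemany(
    "INSERT INTO genes VALUES (?, ?, ?)",
    [("HOXC10", "homeobox C10", "HOX3I"), ("TP53", "tumor protein p53", "P53")],
)
page = render_gene_list(search_genes(conn, "hox"))
```

Binding the keyword as a query parameter and escaping the output are the two details most worth carrying over to any real implementation, whatever the language.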

Utility and Discussion

The dbCerEx database is accessed through a web-based interface. Users start a search by entering a gene of interest in the top input box and clicking the ‘Search’ button. A new page lists all genes related to the input keyword. Users can select a gene from the list according to its description to perform expression analysis. Clicking a gene shows a general summary for that gene, including its full name, aliases and external links to HGNC, Entrez Gene, Ensembl, MIM and GeneCards. On the same page, users can set the parameters for expression analysis in cervical cancer. Users can enter a gene set of interest by hand or choose one from the gene set lists (KEGG, BIOCARTA, REACTOME and Gene Ontology). Users can select a dataset from the precompiled cervical cancer expression datasets from microarray and RNA-Seq, or simply provide a GEO accession number. After clicking the ‘Submit Query’ button, the samples in the selected dataset are listed. Users can select all or some of the samples for expression analysis. A heatmap displaying the hierarchical clustering of genes and samples is then shown (Figure 1). In addition, a heatmap of the genes that are significantly positively or negatively correlated with the gene of interest is provided (Figure 2). The Pearson correlation and p value are shown in a table to the right of the heatmap.
Figure 1

A heatmap showing the hierarchical clustering of the gene of interest and the gene set.

Figure 2

A heatmap showing the genes that are positively or negatively correlated with the gene of interest.

The genes that have a significant Pearson correlation with the gene of interest were selected to plot the heatmap. The samples are in columns, ordered by the expression of the gene of interest.
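The selection behind this heatmap can be approximated as below. The sketch is an assumption-laden stand-in rather than the dbCerEx implementation: it selects genes by a fixed absolute-correlation cutoff (`min_abs_r`, a hypothetical parameter) instead of the p value shown on the site, and the function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_genes(profiles, query, min_abs_r=0.8):
    """Select genes correlated with the query gene and order the samples.

    profiles maps gene -> list of expression values (one per sample).
    Returns (sample_order, hits); each hit is (gene, r, reordered_values),
    sorted from the most positive to the most negative correlation.
    """
    reference = profiles[query]
    # Columns are ordered by increasing expression of the gene of interest.
    sample_order = sorted(range(len(reference)), key=lambda j: reference[j])
    hits = []
    for gene, values in profiles.items():
        if gene == query:
            continue
        r = pearson(reference, values)
        # Cutoff on |r| is an assumption standing in for a significance test.
        if abs(r) >= min_abs_r:
            hits.append((gene, r, [values[j] for j in sample_order]))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return sample_order, hits
```

Each row in `hits` is already reordered to match `sample_order`, so it can be passed directly to a plotting routine as one heatmap row.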


Conclusion

We present dbCerEx, a database containing cervical cancer gene expression profiles. In addition, it provides a novel utility for expression similarity searches within gene sets of interest. We believe that dbCerEx is a powerful platform for bioinformatics discovery that brings cervical cancer microarray and RNA-Seq data, and their analysis, within easy reach of the cervical cancer research community.

Availability and Requirements

The dbCerEx database website is available free of charge as a web application at: http://128.135.207.10/dbCerEx/.
  19 in total

1.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

2.  GenePattern 2.0.

Authors:  Michael Reich; Ted Liefeld; Joshua Gould; Jim Lerner; Pablo Tamayo; Jill P Mesirov
Journal:  Nat Genet       Date:  2006-05       Impact factor: 38.330

3.  GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor.

Authors:  Sean Davis; Paul S Meltzer
Journal:  Bioinformatics       Date:  2007-05-12       Impact factor: 6.937

4.  Human papillomavirus is a necessary cause of invasive cervical cancer worldwide.

Authors:  J M Walboomers; M V Jacobs; M M Manos; F X Bosch; J A Kummer; K V Shah; P J Snijders; J Peto; C J Meijer; N Muñoz
Journal:  J Pathol       Date:  1999-09       Impact factor: 7.996

5.  Gene expression profiling in cervical cancer: an exploration of intratumor heterogeneity.

Authors:  Barbara Bachtiary; Paul C Boutros; Melania Pintilie; Willa Shi; Carlo Bastianutto; Jian-Hua Li; Joerg Schwock; Wendy Zhang; Linda Z Penn; Igor Jurisica; Anthony Fyles; Fei-Fei Liu
Journal:  Clin Cancer Res       Date:  2006-10-01       Impact factor: 12.531

6.  Bioconductor: open software development for computational biology and bioinformatics.

Authors:  Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal:  Genome Biol       Date:  2004-09-15       Impact factor: 13.583

7.  Elevated expression of SerpinA1 and SerpinA3 in HLA-positive cervical carcinoma.

Authors:  J N Kloth; A Gorter; G J Fleuren; J Oosting; S Uljee; N ter Haar; E J Dreef; G G Kenter; E S Jordanova
Journal:  J Pathol       Date:  2008-07       Impact factor: 7.996

8.  Gene expression analysis of preinvasive and invasive cervical squamous cell carcinomas identifies HOXC10 as a key mediator of invasion.

Authors:  Yali Zhai; Rork Kuick; Bin Nan; Ichiro Ota; Stephen J Weiss; Cornelia L Trimble; Eric R Fearon; Kathleen R Cho
Journal:  Cancer Res       Date:  2007-11-01       Impact factor: 12.701

9.  Fundamental differences in cell cycle deregulation in human papillomavirus-positive and human papillomavirus-negative head/neck and cervical cancers.

Authors:  Dohun Pyeon; Michael A Newton; Paul F Lambert; Johan A den Boon; Srikumar Sengupta; Carmen J Marsit; Craig D Woodworth; Joseph P Connor; Thomas H Haugen; Elaine M Smith; Karl T Kelsey; Lubomir P Turek; Paul Ahlquist
Journal:  Cancer Res       Date:  2007-05-15       Impact factor: 12.701

10.  The radiation-induced cell-death signaling pathway is activated by concurrent use of cisplatin in sequential biopsy specimens from patients with cervical cancer.

Authors:  Mayumi Iwakawa; Tatsuya Ohno; Kaori Imadome; Miyako Nakawatari; Ken-ichi Ishikawa; Minako Sakai; Shingo Katoh; Hitoshi Ishikawa; Hirohiko Tsujii; Takashi Imai
Journal:  Cancer Biol Ther       Date:  2007-03-05       Impact factor: 4.742
