Literature DB >> 29088455

HCMDB: the human cancer metastasis database.

Guantao Zheng1,2, Yijie Ma1, Yang Zou1, An Yin2, Wushuang Li1, Dong Dong1.   

Abstract

Metastasis is the main event leading to death in cancer patients. Over the past decade, high-throughput technologies have provided genome-wide view of transcriptomic changes associated with cancer metastases. Many microarray and RNA sequencing studies have addressed metastases-related expression patterns in various types of cancer, and the number of relevant works continues to increase rapidly. These works have characterized genes that orchestrate the metastatic phenotype of cancer cells. However, these expression data have been deposited in various repositories, and efficiently analyzing these data is still difficult because of the lack of an integrated data mining platform. To facilitate the in-depth analyses of transcriptome data on metastasis, it is quite important to make a comprehensive integration of these metastases-related expression data. Here, we presented a database, HCMDB (the human cancer metastasis database, http://hcmdb.i-sanger.com/index), which is freely accessible to the research community query cross-platform transcriptome data on metastases. HCMDB is developed and maintained as a useful resource for building the systems-biology understanding of metastasis.
© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

Entities:  

Mesh:

Substances:

Year:  2018        PMID: 29088455      PMCID: PMC5753185          DOI: 10.1093/nar/gkx1008

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Metastasis is the spread of a cancer from one organ to another without being directly connected with it, and it is the principal cause of cancer-related death (1–4). It is of great importance to evaluate the presence of metastasis in cancer patients, when choosing appropriate treatment strategies. A wide variety of approaches have been employed to identify and characterize genes involved in cancer metastasis at the transcriptional level (5–9). It is necessary to characterize the complicated molecular mechanism by integrating different data sources. Recently, transcriptome data generated by microarray and RNA sequencing technologies have been widely used to explore the molecular nature of metastasis (9–14). Researches on the expression profile of metastases of multiple tumor types have produced a large amount of data at the transcription level. Members of new classes of non-coding RNAs, such as long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) have important roles in tumorigenesis and metastases (5,15–18). High-throughput technologies enable the exploration of transcriptomic changes associated with tumor progression and metastasis at the whole-genome level, which greatly assists our efforts to uncover the underlying molecular mechanism of cancer metastasis (6,19). Furthermore, metastasis contributes to most of the cancer-related death, which emphasizes the importance of metastasis risk prediction. Some works identified gene expression signatures that distinguished primary site from metastatic tumors based on cancer metastasis expression profiles (9,10,20), which would be developed as useful biomarkers for early detection, diagnosis and treatment of cancers. Most of these high-throughput data have been deposited into NCBI Gene Expression Omnibus (GEO) (21) and the Sequence Read Archive (SRA) (22). The Cancer Genome Atlas (TCGA) dataset also stored expression data of metastasis of various tumor types and matched/unmatched primary tumors. These databases mainly serve as raw data archives, and cannot provide the full utility of transcriptome data for users. It still requires highly developed bioinformatics skills to manipulate metastasis-related data pipelines. These valuable resources provided an opportunity to explore metastasis-associated genes from large amount of samples. Recently, many cancer-related databases have been developed to store cancer-related gene expression and functional information. For example, cBioPortal (23) provided multidimensional cancer genomics data sets, however, only few metastasis-related data were involved. Cancer RNA-seq Nexus (24) only integrated recently published cancer RNA-seq data, and the metastasis-related classes of differential expression analyses were not well defined. dbMEMC (25) is a database of differentially expressed miRNAs in human cancers, and contains few metastasis-related miRNAs. CMGene (26) is a literature-based database for cancer metastasis genes, and is not specially for differentially expressed genes of cancer metastasis. These databases provided some information on cancer metastasis, however, our knowledge of metastasis-related gene at the transcriptome level remains limited. To this end, we presented the human cancer metastasis database (HCMDB), the first public database providing published cancer metastasis expression profiles. It also permits the analysis and visualization of mRNA/lncRNA co-expression networks and miRNA regulatory networks. HCMDB is freely accessible to cancer metastasis research community to query and analyze metastasis-related expression data. We expected this database could facilitate the identification of cancer metastasis associated genes and benefit the examination of their roles in physiological and pathological processes of cancer metastasis. All the data in HCMDB is freely available to the public domain through http://hcmdb.i-sanger.com/download.

DATA COLLECTION AND DATABASE CONTENT

Figure 1 depicts the architecture of the HCMDB database. Systematic data searching was conducted for metastasis-related expression profiles using the following keywords: ‘cancer’, ‘tumor’, ‘metastasis’ and ‘epithelial-mesenchymal transition’, in combination with ‘long non-coding RNA’, ‘lncRNA’ and ‘microRNA’ from GEO (https://www.ncbi.nlm.nih.gov/geo/) and SRA database (https://www.ncbi.nlm.nih.gov/sra/). The eligible data are limited to human studies published before June 2017. A total of 620 data sets were initially collected. Then, we manually curated these datasets to ensure that the data contains both primary tumors and metastases. Only the datasets have enough samples (at least three) for both primary tumors and metastases were left. Moreover, the TCGA clinical information was retrieved, and corresponding metastasis-related expression data were also enrolled (82 datasets). At last, the gene expression profiles of 29 primary tumor types from 455 experiments remained, including a total of 11 425 samples (Table 1).
Figure 1.

The framework for constructing HCMDB. The transcriptome data were derived from GEO, SRA, TCGA database. A series of filters were used to ensure the data quality of HCMDB. Metastasis-related literatures were manually to annotate the expression dysregulated genes.

Table 1.

Experiment and sample size in the current version of HCMDB

Primary cancersNo. of experimentsNo. of samplesData sources
Bladder cancer17423TCGA
Brain cancer122GEO
Breast cancer933054GEO, TCGA
Cervical cancer11968GEO, TCGA
Colorectal cancer752440GEO, TCGA, SRA
Esophagus cancer12171TCGA
Ewing's sarcoma137GEO
Eye cancer3121GEO, TCGA
Gastric cancer3404TCGA
Head and neck cancer230GEO
Kindey cancer25353GEO, TCGA
Laryngeal cancer215GEO
Liver cancer30273GEO, SRA
Lung cancer446GEO
Midgut carcinoid tumor439GEO
Nasopharynx cancer222GEO
Oral cancer227GEO
Osteosarcoma330GEO
Ovarian cancer118GEO
Pancreatic cancer27293GEO, TCGA
Pancreatic neuroendocrine tumor1894GEO
Penis cancer333GEO
Prostate cancer49863GEO, TCGA, SRA
Skin cancer38598GEO
Small intestine cancer1187GEO
Synovial sarcoma134GEO
Testicular cancer2142GEO, TCGA
Thymoma2121TCGA
Thyroid cancer15667GEO, TCGA
Total45511 425
The framework for constructing HCMDB. The transcriptome data were derived from GEO, SRA, TCGA database. A series of filters were used to ensure the data quality of HCMDB. Metastasis-related literatures were manually to annotate the expression dysregulated genes. The final dataset contains 351 mRNA expression profiles and 160 miRNA expression profiles. The TCGA data were downloaded from https://portal.gdc.cancer.gov/. For SRA data, an in-house bioinformatics pipeline was employed. Briefly, we evaluated RNA-seq quality using FastQC (version 10.10.1), and sequencing reads were aligned to the human genome (hg19) using Bowtie (27) software (version 1.1.1) with default parameters. Then, we re-assembled a transcriptome using Cufflinks (28) (version 2.2.1). The FPKM (fragments per kilobase of transcript per million mapped reads) values were calculated. To date, lncRNA expression profiles of cancer metastases remain largely unknown. The lncRNA expression of metastases could be detected by mining gene expression microarray data because many lncRNA probes can be identified on the commonly used microarray platforms. So, we used the data mining method to detect lncRNA profiling on the microarray platform. A total of 335 lncRNA expression profiles were involved. For each experiment, the users can compare the expression profiles between different groups. The limma package (29) nested in R (http://www.r-project.org/) was employed to detect differentially expressed genes between different types of samples. Those genes with F.D.R.<0.05 were selected as candidates having significantly different expressions. To explore the biological implications of lncRNA, mRNA-lncRNA co-expressions were calculated using Pearson correlation coefficient. To identify miRNA target genes, PITA (30), miRanda (31) and RNAhybrid (32) software were employed. Those genes that identified by at least two software were regarded as miRNA target genes. The average number of target genes per miRNA is 244. To further annotate metastasis-related genes, we searched Pubmed database (https://www.ncbi.nlm.nih.gov/pubmed/) using the keywords of ‘metastasis’ and corresponding gene symbols. All selected literatures were manually curated by at least two researchers. More than 7000 published papers were systematically reviewed, and 2183 metastasis-related genes were manually curated.

DATABASE CONSTRUCTION

We developed a user-friendly web interface to present HCMDB. A MySQL relational database was set up to store the data. The web interface for browsing and searching was implemented by PHP and JavaScript. The data processing was implemented by our in-house R scripts. The web service is based on an Apache Tomcat web server.

DATABASE FEATURES AND UTILITY

Experimental description

For each experiment presented, we carefully reviewed the experiment design of original paper and all the samples profiled. We retrieved the involved samples and classified them as one of the following categories: primary cancer versus metastasis (33,34), primary cancer with metastasis vs. primary cancer without metastasis (35,36), primary cancer versus respective normal tissues (37,38), metastasis vs. respective normal tissues (39,40), different metastatic tumors of same origin (41), metastatic tumors of diverse origins (42). Most of the groups were defined according to the original experiment design. Primary cancer versus metastasis accounts for the most of the total experiments (Figure 2, 34%). After the samples were assigned to different groups, differential gene expression was assessed with limma package. For each experiment, the detailed information was provided to delineate the experimental design, cancer type and the total number of genes identified.
Figure 2.

The percentage of categories of differential expression analyses.

The percentage of categories of differential expression analyses.

Database query and search platform

A user-friendly web interface was developed to present the HCMDB (Figure 3). We herein provided several ways to allow database query. First, a search engine was provided in HCMDB using gene names from the ‘Search’ page. Users can input their interested gene names in the textbox (including mRNAs, lncRNAs and miRNAs), and all the items that contain the query genes in the database can be derived. The searching result page lists the related experiment ID, primary cancer type, metastasis, experiment design, number of samples. A box plot was provided for the comparison of gene expression between different groups. Moreover, manually curated literatures related to the focal genes were also shown in the result page. Second, users can select particular primary cancer type and browse related experiments from the ‘Browse’ page. We also provided other optional filters to help users focus on the most interested results, such as sample size (<10, 10–50, 50–100, >100) and gene type (mRNA, lncRNA, miRNA).
Figure 3.

The schematic workflow of HCMDB.

The schematic workflow of HCMDB.

Detailed experiment page

By clicking the hyperlink of a particular experiment ID, the detailed result of a specific experiment can be provided. This page mainly consists of four different sections: experiment description, expression profile, functional categories and regulatory network. In the ‘experiment description’ section, a detailed experiment description was listed. In addition, the number of mRNA/lncRNA/miRNA was also provided. In the ‘expression profile’ section, a list of gene name, differential expression result derived from limma software, such as P-value, F.D.R., log Fold Change were displayed. HCMDB creates a heatmap to facilitate users to examine differential expression profiles between different groups. In the ‘functional categories’ section, HCMDB provides the over-represented GO categories and KEGG pathways of differentially expressed genes. Fisher's exact test was used to identity over-represented GO and KEGG categories. In the ‘regulatory network’ section, HCMDB presents and visualizes the mRNA-mRNA and mRNA-lncRNA co-expression network, and force-directed and circus layout were employed. If the experiment contains both mRNA and miRNA expression profiles, the miRNA-targets regulatory network was also provided.

DISCUSSION AND CONCLUSION

Integrated analysis of multi-dimensional transcriptomic data is important to our understanding of cancer metastasis (43–49). The decreasing cost of large-scale technologies has led to tremendous amount of transcriptome data from metastasis studies. These data provided us great opportunities to perform gene expression quantification analysis in metastasis. Moreover, these data would help us identify gene expression signature associated with metastasis. Here, we provided the HCMDB to integrate these resources and facilitate the study of gene expression dysregulation in metastasis. HCMDB is freely accessible to the research community to query and analyze transcriptome data on metastasis. Although some other related works provided some information on human cancer metastasis, such as cBioPortal (23), Cancer RNA-seq Nexus (24), dbMEMC (25) and CMGene (26), there are several advanced features that distinguish HCMDB from other data sources. First, it is a database specific to metastasis-related transcriptome data, and contains a greater number of gene-metastasis associations. Second, the lncRNA expression profiles of metastasis and primary tumors were comprehensively involved. Third, several classes of differential expression analyses were defined when examining the dysregulation pattern. These included primary cancer versus respective normal tissues, metastasis versus respective normal tissues, primary cancer versus metastasis, primary cancer with metastasis versus primary cancer without metastasis, different metastatic tumors of same origin, metastatic tumor of diverse origins. It provided a better way to comprehensively understand the gene expression pattern in metastasis, and the data have implications for our understanding how metastasis arise, and suggest ways in which expression signatures can be improved. The cancer metastasis might be accomplished by the requirement of multiple genetic or genomics alterations. Recent papers have documented that somatic mutations and copy number alterations may play important roles in driving the development of cancer metastasis (50,51). For example, it has been documented that several recurrent copy number alterations might be the driver alterations in metastasis among superficial esophageal squamous cell carcinoma (52). A broader understanding of the genomic changes arising during metastasis will allow us to identify new metastasis-related mechanism and to discover new biomarkers for the diagnostic signatures of metastasis. HCMDB will be continuously updated and provide a unique resource in the following directions: (i) integrate more upcoming metastasis-related transcriptome data. (ii) Collect more comprehensive metastasis genetic and genomic data, including somatic mutation data, copy number alteration data. (iii) Add high-quality drug pharmacological data. Through our efforts, we expect that HCMDB will contribute to research into further understanding of the metastasis transcriptomic regulation mechanisms, and even toward the diagnosis and treatment of cancer metastasis.
  52 in total

Review 1.  The hallmarks of cancer.

Authors:  D Hanahan; R A Weinberg
Journal:  Cell       Date:  2000-01-07       Impact factor: 41.582

2.  A molecular signature of metastasis in primary solid tumors.

Authors:  Sridhar Ramaswamy; Ken N Ross; Eric S Lander; Todd R Golub
Journal:  Nat Genet       Date:  2002-12-09       Impact factor: 38.330

3.  Transcriptional Profiling of Breast Cancer Metastases Identifies Liver Metastasis-Selective Genes Associated with Adverse Outcome in Luminal A Primary Breast Cancer.

Authors:  Siker Kimbung; Ida Johansson; Anna Danielsson; Srinivas Veerla; Suzanne Egyhazi Brage; Marianne Frostvik Stolt; Lambert Skoog; Lena Carlsson; Zakaria Einbeigi; Elisabet Lidbrink; Barbro Linderholm; Niklas Loman; Per-Olof Malmström; Martin Söderberg; Thomas M Walz; Mårten Fernö; Thomas Hatschek; Ingrid Hedenfalk
Journal:  Clin Cancer Res       Date:  2015-08-14       Impact factor: 12.531

Review 4.  How epigenetics can explain human metastasis: a new role for microRNAs.

Authors:  Amaia Lujambio; Manel Esteller
Journal:  Cell Cycle       Date:  2009-02-01       Impact factor: 4.534

5.  A genomic and transcriptomic approach for a differential diagnosis between primary and secondary ovarian carcinomas in patients with a previous history of breast cancer.

Authors:  Jean-Philippe Meyniel; Paul H Cottu; Charles Decraene; Marc-Henri Stern; Jérôme Couturier; Ingrid Lebigot; André Nicolas; Nina Weber; Virginie Fourchotte; Séverine Alran; Audrey Rapinat; David Gentien; Sergio Roman-Roman; Laurent Mignot; Xavier Sastre-Garau
Journal:  BMC Cancer       Date:  2010-05-21       Impact factor: 4.430

Review 6.  The pathogenesis of cancer metastasis.

Authors:  G Poste; I J Fidler
Journal:  Nature       Date:  1980-01-10       Impact factor: 49.962

7.  Molecular signature associated with bone marrow micrometastasis in human breast cancer.

Authors:  Ute Woelfle; Jacqueline Cloos; Guido Sauter; Lutz Riethdorf; Fritz Jänicke; Paul van Diest; Ruud Brakenhoff; Klaus Pantel
Journal:  Cancer Res       Date:  2003-09-15       Impact factor: 12.701

8.  Basal-like Breast cancer DNA copy number losses identify genes involved in genomic instability, response to therapy, and patient survival.

Authors:  Victor J Weigman; Hann-Hsiang Chao; Andrey A Shabalin; Xiaping He; Joel S Parker; Silje H Nordgard; Tatyana Grushko; Dezheng Huo; Chika Nwachukwu; Andrew Nobel; Vessela N Kristensen; Anne-Lise Børresen-Dale; Olufunmilayo I Olopade; Charles M Perou
Journal:  Breast Cancer Res Treat       Date:  2011-11-03       Impact factor: 4.872

Review 9.  The non-coding transcriptome as a dynamic regulator of cancer metastasis.

Authors:  Francesco Crea; Pier Luc Clermont; Abhijit Parolia; Yuzhuo Wang; Cheryl D Helgason
Journal:  Cancer Metastasis Rev       Date:  2014-03       Impact factor: 9.264

10.  Cancer RNA-Seq Nexus: a database of phenotype-specific transcriptome profiling in cancer cells.

Authors:  Jian-Rong Li; Chuan-Hu Sun; Wenyuan Li; Rou-Fang Chao; Chieh-Chen Huang; Xianghong Jasmine Zhou; Chun-Chi Liu
Journal:  Nucleic Acids Res       Date:  2015-11-23       Impact factor: 19.160

View more
  47 in total

1.  Ex vivo screen identifies CDK12 as a metastatic vulnerability in osteosarcoma.

Authors:  Ian Bayles; Malgorzata Krajewska; W Dean Pontius; Alina Saiakhova; James J Morrow; Cynthia Bartels; Jim Lu; Zachary J Faber; Yuriy Fedorov; Ellen S Hong; Jaret M Karnuta; Brian Rubin; Drew J Adams; Rani E George; Peter C Scacheri
Journal:  J Clin Invest       Date:  2019-10-01       Impact factor: 14.808

2.  The novel lncRNA CALIC upregulates AXL to promote colon cancer metastasis.

Authors:  Yoshihiro Kawasaki; Masaya Miyamoto; Takeaki Oda; Kosuke Matsumura; Lumi Negishi; Ryuichiro Nakato; Sakiko Suda; Naoko Yokota; Katsuhiko Shirahige; Tetsu Akiyama
Journal:  EMBO Rep       Date:  2019-07-29       Impact factor: 8.807

3.  CircRNA circ_0006156 inhibits the metastasis of prostate cancer by blocking the ubiquitination of S100A9.

Authors:  Yuwei Zhang; Fengping Liu; Yangkun Feng; Xinyu Xu; Yang Wang; Sha Zhu; Jian Dong; Shanchao Zhao; Bin Xu; Ninghan Feng
Journal:  Cancer Gene Ther       Date:  2022-06-27       Impact factor: 5.987

4.  ctcRbase: the gene expression database of circulating tumor cells and microemboli.

Authors:  Lei Zhao; Xiaohong Wu; Tong Li; Jian Luo; Dong Dong
Journal:  Database (Oxford)       Date:  2020-01-01       Impact factor: 3.451

5.  Nrf2 Activation Promotes Lung Cancer Metastasis by Inhibiting the Degradation of Bach1.

Authors:  Luca Lignitto; Sarah E LeBoeuf; Harrison Homer; Shaowen Jiang; Manor Askenazi; Triantafyllia R Karakousi; Harvey I Pass; Arjun J Bhutkar; Aristotelis Tsirigos; Beatrix Ueberheide; Volkan I Sayin; Thales Papagiannakopoulos; Michele Pagano
Journal:  Cell       Date:  2019-06-27       Impact factor: 41.582

6.  Identification and preliminary validation of a four-gene signature to predict metastasis and survival in osteosarcoma.

Authors:  Yiming Zhang; Xuan Lei; Rong He; Lianghao Mao; Pan Jiang; Chenlie Ni; Xinyu Zhong; Zhengyu Yin; Xuan Wu; Dapeng Li; Qiping Zheng
Journal:  Am J Transl Res       Date:  2021-11-15       Impact factor: 4.060

7.  Preliminary Radiogenomic Evidence for the Prediction of Metastasis and Chemotherapy Response in Pediatric Patients with Osteosarcoma Using 18F-FDF PET/CT, EZRIN and KI67.

Authors:  Byung-Chul Kim; Jingyu Kim; Kangsan Kim; Byung Hyun Byun; Ilhan Lim; Chang-Bae Kong; Won Seok Song; Jae-Soo Koh; Sang-Keun Woo
Journal:  Cancers (Basel)       Date:  2021-05-28       Impact factor: 6.639

8.  Pathogen Moonlighting Proteins: From Ancestral Key Metabolic Enzymes to Virulence Factors.

Authors:  Luis Franco-Serrano; David Sánchez-Redondo; Araceli Nájar-García; Sergio Hernández; Isaac Amela; Josep Antoni Perez-Pons; Jaume Piñol; Angel Mozo-Villarias; Juan Cedano; Enrique Querol
Journal:  Microorganisms       Date:  2021-06-15

9.  Breast tumor stiffness instructs bone metastasis via maintenance of mechanical conditioning.

Authors:  Adam W Watson; Adam D Grant; Sara S Parker; Samantha Hill; Michael B Whalen; Jayati Chakrabarti; Michael W Harman; Mackenzie R Roman; Brittany L Forte; Cody C Gowan; Raúl Castro-Portuguez; Lindsey K Stolze; Christian Franck; Darren A Cusanovich; Yana Zavros; Megha Padi; Casey E Romanoski; Ghassan Mouneimne
Journal:  Cell Rep       Date:  2021-06-29       Impact factor: 9.423

10.  Downregulation of m6A Reader YTHDC2 Promotes the Proliferation and Migration of Malignant Lung Cells via CYLD/NF-κB Pathway.

Authors:  Jin Wang; Lirong Tan; Beibei Jia; Xiaofan Yu; Ruixin Yao; Nan OUYang; Xueting Yu; Xiyuan Cao; Jian Tong; Tao Chen; Rui Chen; Jianxiang Li
Journal:  Int J Biol Sci       Date:  2021-06-22       Impact factor: 6.580

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.