
Colorectal cancer atlas: An integrative resource for genomic and proteomic annotations from colorectal cancer cell lines and tissues.

David Chisanga1, Shivakumar Keerthikumar2, Mohashin Pathan2, Dinuka Ariyaratne2, Hina Kalra2, Stephanie Boukouris2, Nidhi Abraham Mathew2, Haidar Al Saffar2, Lahiru Gangoda2, Ching-Seng Ang3, Oliver M Sieber4, John M Mariadason5, Ramanuj Dasgupta6, Naveen Chilamkurti1, Suresh Mathivanan7.   

Abstract

In order to advance our understanding of colorectal cancer (CRC) development and progression, biomedical researchers have generated large amounts of OMICS data from CRC patient samples and representative cell lines. However, these data are deposited in various repositories or in supplementary tables. A database that integrates data from heterogeneous resources and enables analysis of multidimensional data sets specifically pertaining to CRC is currently lacking. Here, we have developed Colorectal Cancer Atlas (http://www.colonatlas.org), an integrated web-based resource that catalogues the genomic and proteomic annotations identified in CRC tissues and cell lines. The data catalogued to date include sequence variations as well as quantitative and non-quantitative protein expression data. The database enables the analysis of these data in the context of signaling pathways, protein-protein interactions, Gene Ontology terms, protein domains and post-translational modifications. Currently, Colorectal Cancer Atlas contains data for >13 711 CRC tissues, >165 CRC cell lines, 62 251 protein identifications, >8.3 million MS/MS spectra, >18 410 genes with sequence variations (404 278 entries) and 351 pathways with sequence variants. Overall, Colorectal Cancer Atlas has been designed to serve as a central resource to facilitate research in CRC.
© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2015        PMID: 26496946      PMCID: PMC4702801          DOI: 10.1093/nar/gkv1097

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Colorectal cancer (CRC) is the third most common form of cancer and has the fourth highest mortality rate in the world (1). In order to advance our understanding of the initiation and progression of this disease, biomedical researchers have performed global analyses of the genome, epigenome, transcriptome, proteome and metabolome of CRC patient samples and representative cell lines (2–5). According to The Cancer Genome Atlas Network (3), APC, TP53, KRAS, PIK3CA, FBXW7, SMAD4, TCF7L2 and NRAS are the most frequently mutated genes in CRC. Identification of these mutations and associated pathways has advanced our understanding of CRC, is enabling the sub-classification of this disease and is unveiling potential new avenues for treatment. Due to the significant advancements in high-throughput technologies, vast amounts of multidimensional data relevant to the biology of CRC have been generated. To extract meaningful biological insights from these data, researchers previously needed to collate data from a large number of studies. To facilitate this process, a series of databases have been created. For example, cancer gene mutations are currently catalogued in databases including TCGA (3), COSMIC (6), TumorPortal (7), IntOGen (8), Network of Cancer Genes (9) and TSGene (10). These databases provide valuable information on gene variations for a number of tumor types including CRC; however, they are not specifically designed to integrate sequence variations with proteomic data. NetGestalt (11) is a web-based framework that allows for integration of OMIC data from multiple species in the context of biological networks (12) and contains data pertaining to human CRC from TCGA. However, there is currently no user-friendly online resource specifically pertaining to CRC that catalogues genomic and proteomic data from literature, databases and TCGA, and integrates the sequence variations with protein domains, post-translational modifications and protein–protein interactions.
Here, we describe Colorectal Cancer Atlas (http://www.colonatlas.org), an integrated web-based resource which catalogues genomic and proteomic data from CRC tissues and cell lines. Data catalogued include quantitative and non-quantitative protein expression, sequence variations, cellular signaling pathways, protein–protein interactions, Gene Ontology terms, protein domains and post-translational modifications (PTMs). Data pertaining to genomic sequence variations and protein expression have been manually curated from the scientific literature and collated from other publicly available databases. Colorectal Cancer Atlas is designed to enable a user to search for a specific mutation in any particular cell line, and to search for cell lines with and without specific mutations. Currently, Colorectal Cancer Atlas contains data for >13 711 primary CRC tissues, >165 CRC cell lines, 62 251 protein identifications, >8.3 million MS/MS spectra, >18 410 genes with sequence variations, 404 278 sequence variation entries, 351 pathways with sequence variants, 88 819 PTMs and 253 700 protein–protein interactions (Table 1).
Table 1. Colorectal Cancer Atlas statistics

Statistic                                          Count
Protein entries                                    62 251
MS/MS spectra                                      8 378 422
Primary tissues                                    13 711
Cell lines                                         165
Genes with sequence variants                       18 410
Gene sequence variants                             404 278
Pathways with genes having sequence variants       351
Pathways with genes having no sequence variants    1657
Cell lines with drug sensitivity                   27
PTMs                                               88 819
PTMs affected by sequence variants                 1631
Protein–protein interactions                       253 700

DATABASE ARCHITECTURE AND WEB INTERFACE

Colorectal Cancer Atlas is a web-based application developed using Zope2 (version 2.8.7–1), a Python-based web framework. The back-end database is MySQL (version 5.0.95), a well-established open-source database. The web pages were developed using HyperText Markup Language (HTML) in combination with JavaScript for front-end functionality, while Python (version 2.4.3), a scripting language, was used for database connectivity. JavaScript modules include DataTables (version 1.10.4) for interactive data tables, Data-Driven Documents (D3.js) for interactive protein–protein interaction networks, and Highcharts (version 4.1.6) for interactive heat maps and column charts.

GENOMIC DATA SETS

Colorectal Cancer Atlas catalogues gene sequence variations present in primary CRC tissues and cell lines, collated by manual curation of the scientific literature. In addition, the database contains genomic variations identified in CRC cell lines sequenced in-house. For cell lines, where available, the gender and age of the patient are provided, along with the specific cell type, doubling time, culture properties and stage of cancer. This information was obtained from the Cancer Cell Line Encyclopedia (13), ATCC (http://www.atcc.org), the COSMIC database and the literature. Sequence variation details, including the type of sequence variant, putative mutational effect, nucleotide change and amino acid change, are displayed.

PROTEOMIC DATA SETS

Colorectal Cancer Atlas also catalogues proteomic data collated from multiple resources including the scientific literature (e.g. Zhang et al. (5)), Human Protein Atlas (14), Human Proteinpedia (15) and Human Protein Reference Database (16). Experimental techniques used in generating these data included mass spectrometry, Western blotting, immunohistochemistry, confocal microscopy, immunoelectron microscopy and fluorescence-activated cell sorting (FACS). In addition, publicly available label-free quantitative mass spectrometry data for CRC cell lines and tissues were re-analyzed using an in-house proteomics pipeline in order to provide standardized data. The proteomics pipeline involved conversion of raw mass spectrometry data files into the Mascot Generic File Format (MGF) using MsConvert with peak picking (17). The MGF files were then searched using X! Tandem (Sledgehammer edition version 2013.09.01.1) (18) against a target and decoy Human RefSeq protein database. Peptides were further filtered using <5% false discovery rate (FDR) as a cut-off, and quantified using the Normalized Spectral Abundance Factor (NSAF) method (19).
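The NSAF quantification step mentioned above can be illustrated with a minimal sketch. It uses the standard definition (the spectral count of each protein divided by its length, normalized by the sum of these ratios over all proteins in the sample) and is not the in-house pipeline itself. The function name, protein identifiers and numbers are hypothetical and chosen only for illustration.

```python
from typing import Dict


def nsaf(spectral_counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Normalized Spectral Abundance Factor for each protein in one sample.

    SAF_i  = SpC_i / L_i          (spectral count over protein length)
    NSAF_i = SAF_i / sum_j SAF_j  (normalized across the sample)
    """
    saf = {}
    for protein, count in spectral_counts.items():
        length = lengths[protein]
        if length <= 0:
            raise ValueError(f"non-positive length for {protein}")
        saf[protein] = count / length
    total = sum(saf.values())
    if total == 0:
        # No spectra at all: report zero abundance rather than dividing by zero.
        return {protein: 0.0 for protein in saf}
    return {protein: value / total for protein, value in saf.items()}


# Hypothetical spectral counts and protein lengths (in amino acids) for one sample.
example_counts = {"PROT_A": 10, "PROT_B": 20}
example_lengths = {"PROT_A": 100, "PROT_B": 400}
example_nsaf = nsaf(example_counts, example_lengths)
```

Because each count is divided by protein length before normalization, a long protein with many spectra does not automatically rank above a short protein with fewer spectra, and the values within a sample sum to 1.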

COLORECTAL CANCER ATLAS PROVIDES AN INTEGRATED VIEW OF MULTIPLE DATA TYPES

Colorectal Cancer Atlas provides an integrated view of the sequence variations and the proteomic data. Mass spectrometry-based quantitative proteomic data are depicted as heat maps and column charts in the respective molecular pages (Figure 1), and users are able to filter the data sets based on the FDR. The database also contains protein expression data generated using immunohistochemistry, Western blotting, FACS, confocal and immunoelectron microscopy. The database also includes protein data derived from various cellular fractions including the nucleus, cytoplasm, membrane, the secretome (20) and exosomes (21) (from ExoCarta (22)).
Figure 1. Snapshot of Colorectal Cancer Atlas features. An overview of proteomic and genomic data features for the APC gene is displayed. A user can query the database using a gene symbol or a protein name. A gene information page provides details pertaining to protein domains, post-translational modifications (PTM), reported mutations in cell lines/tissues, quantitative protein expression, pathways, protein–protein interactions (PPI) and cell line drug sensitivity.

The integration of sequence variants with proteomic data is designed to facilitate the prediction of the functional effects of variants on the protein. For each gene, Colorectal Cancer Atlas enables parallel visualization of CRC-associated sequence variants with quantitative protein expression across CRC cell lines and tissues. In addition, PTMs and protein domains affected by the sequence variation can be visualized (Figure 1), enabling the potential effect of sequence variants on protein function to be easily ascertained. For example, β-catenin mutations at positions S33, S37, T41 and S45 occur in CRC, all of which are critical for phosphorylation (23). Mutations in these serine/threonine residues allow for the stabilization of β-catenin and constitutive activation of the Wnt signaling pathway. Similarly, Colorectal Cancer Atlas displays sequence variations in known protein domains, which can provide valuable insight into the putative effect on protein function. For example, mutations in the armadillo domain of β-catenin (R582) have been reported to alter the binding of β-catenin to TCF4 (24) (Figure 2).
Figure 2. PTMs and domains in β-catenin are affected by mutation. A snapshot of the β-catenin molecular page is displayed. The PTMs affected by mutations can be viewed in the PTMs tab. Mutations in β-catenin at positions important for phosphorylation (S33, S37, T41 and S45) allow for the stabilization of β-catenin and constitutive activation of the Wnt signaling pathway. The upstream kinases responsible for the phosphorylation are also provided along with the literature reference. Likewise, mutations in the armadillo domain can be viewed by correlating the sequence variants with the domain span regions. For example, mutations in the armadillo domain of β-catenin (p.R582) have been reported to alter the binding of β-catenin to TCF4 (24).

Colorectal Cancer Atlas also provides a graphical representation of known protein interactions (obtained from BioGRID (25) and Human Protein Reference Database (16)), where each protein is depicted as a node with a specific colour and intensity corresponding to the number of sequence variants in the encoding gene (Figure 1). Furthermore, Colorectal Cancer Atlas integrates biological pathways with gene sequence variants. Biological pathways were obtained from Reactome (26), KEGG (27), Cell map and HumanCyc. For example, as shown in Figure 1, sequence variants in APC are implicated in dysregulation of the Wnt signaling pathway and actin cytoskeletal remodeling. Finally, Colorectal Cancer Atlas contains data on 5-fluorouracil (5-FU) drug sensitivity for CRC cell lines curated from the literature (studies using at least three CRC cell lines (28)). Users can view the sensitivity profile of a cell line of interest relative to other CRC cell lines.
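The correlation of sequence variants with PTM sites and domain spans described in this section can be sketched as a simple position lookup. This is an illustrative sketch only, not the database's implementation: the function and class names are hypothetical, the phosphorylation positions are those named in the text for β-catenin, and the armadillo domain coordinates are placeholder values assumed for the example rather than taken from Colorectal Cancer Atlas.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Domain:
    name: str
    start: int  # first residue of the domain (inclusive)
    end: int    # last residue of the domain (inclusive)


def annotate_variant(
    position: int, ptm_sites: Set[int], domains: List[Domain]
) -> Tuple[bool, List[str]]:
    """Report whether a variant position hits a PTM site and which domains contain it."""
    hits_ptm = position in ptm_sites
    hit_domains = [d.name for d in domains if d.start <= position <= d.end]
    return hits_ptm, hit_domains


# Phosphorylation sites in beta-catenin named in the text: S33, S37, T41, S45.
BETA_CATENIN_PTM_SITES = {33, 37, 41, 45}

# ASSUMPTION: placeholder span for the armadillo domain, for illustration only.
BETA_CATENIN_DOMAINS = [Domain("armadillo", 141, 664)]

s45_annotation = annotate_variant(45, BETA_CATENIN_PTM_SITES, BETA_CATENIN_DOMAINS)
r582_annotation = annotate_variant(582, BETA_CATENIN_PTM_SITES, BETA_CATENIN_DOMAINS)
```

Under these assumed coordinates, a variant at position 45 is flagged as affecting a phosphorylation site, while a variant at position 582 is flagged as falling inside the armadillo domain.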

ACCESSING COLORECTAL CANCER ATLAS

Users can search Colorectal Cancer Atlas through the home, query or browse pages (Supplementary Figure S1). In addition, the website features a navigation menu and a search box at the top of the page. The database can be queried by gene symbol, Entrez Gene ID, protein name, cell line name or pathway. The browse page provides users with the option to access the database by categorized lists of genes, sequence variations, cell lines and techniques. The browse page also allows users to search for sequence variations in genes of interest and displays them in an interactive color-coded table. The gene information page includes gene details, associated GO terms, sequence variations (displayed in an interactive table), domain details, PTMs, and a protein data page leading to experimental techniques and quantitative data with an interactive heat map, a column chart for spectral abundance and a list of detected peptides. Other information includes a list of cell lines and tissues that contain sequence variants in a given gene, a list of pathways in which the gene is involved, and an interactive protein–protein interaction network for the protein encoded by the gene. The cell line page provides details of the cell line, an interactive table of gene sequence variants identified in the cell line, an interactive table of dysregulated pathways and the 5-FU drug sensitivity profile. Data curated in Colorectal Cancer Atlas are available as tab-delimited files and are free for download to all users. Using the custom database option, the tab-delimited data can also be uploaded into FunRich (29), a functional enrichment analysis tool to identify classes of genes/proteins that are overrepresented in a specific category.
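As a rough illustration of working with the tab-delimited downloads, the sketch below separates cell lines with and without a sequence variant in a given gene. The column names (`cell_line`, `gene_symbol`, `amino_acid_change`) and the sample rows are assumptions made for this example; the actual download files may use a different layout, so the field names would need to be adapted.

```python
import csv
import io
from typing import Set, Tuple


def split_cell_lines(tsv_text: str, gene: str) -> Tuple[Set[str], Set[str]]:
    """Return (cell lines with a variant in `gene`, cell lines without one).

    Only cell lines that appear in the file are considered.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    seen: Set[str] = set()
    mutated: Set[str] = set()
    for row in reader:
        seen.add(row["cell_line"])
        if row["gene_symbol"] == gene:
            mutated.add(row["cell_line"])
    return mutated, seen - mutated


# Hypothetical rows; cell line names and variants are placeholders, not real data.
SAMPLE_TSV = (
    "cell_line\tgene_symbol\tamino_acid_change\n"
    "LINE_A\tAPC\tp.placeholder1\n"
    "LINE_A\tKRAS\tp.placeholder2\n"
    "LINE_B\tTP53\tp.placeholder3\n"
    "LINE_C\tAPC\tp.placeholder4\n"
)

apc_mutated, apc_wild_type = split_cell_lines(SAMPLE_TSV, "APC")
```

With a real download, the same function could be applied to the file contents read with `open(path, newline="").read()`.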

FUTURE DIRECTIONS

Colorectal Cancer Atlas will be continuously updated with additional features and with more studies as they become available. Studies currently being curated include Wnt signaling activity determined by the TOPFLASH assay, and genomic and proteomic data generated from patient-derived xenografts.
  28 in total

1.  Proteomics. Tissue-based map of the human proteome.

Authors:  Mathias Uhlén; Linn Fagerberg; Björn M Hallström; Cecilia Lindskog; Per Oksvold; Adil Mardinoglu; Åsa Sivertsson; Caroline Kampf; Evelina Sjöstedt; Anna Asplund; IngMarie Olsson; Karolina Edlund; Emma Lundberg; Sanjay Navani; Cristina Al-Khalili Szigyarto; Jacob Odeberg; Dijana Djureinovic; Jenny Ottosson Takanen; Sophia Hober; Tove Alm; Per-Henrik Edqvist; Holger Berling; Hanna Tegel; Jan Mulder; Johan Rockberg; Peter Nilsson; Jochen M Schwenk; Marica Hamsten; Kalle von Feilitzen; Mattias Forsberg; Lukas Persson; Fredric Johansson; Martin Zwahlen; Gunnar von Heijne; Jens Nielsen; Fredrik Pontén
Journal:  Science       Date:  2015-01-23       Impact factor: 47.728

2.  Proteogenomic characterization of human colon and rectal cancer.

Authors:  Bing Zhang; Jing Wang; Xiaojing Wang; Jing Zhu; Qi Liu; Zhiao Shi; Matthew C Chambers; Lisa J Zimmerman; Kent F Shaddox; Sangtae Kim; Sherri R Davies; Sean Wang; Pei Wang; Christopher R Kinsinger; Robert C Rivers; Henry Rodriguez; R Reid Townsend; Matthew J C Ellis; Steven A Carr; David L Tabb; Robert J Coffey; Robbert J C Slebos; Daniel C Liebler
Journal:  Nature       Date:  2014-07-20       Impact factor: 49.962

3.  A colorectal cancer classification system that associates cellular phenotype and responses to therapy.

Authors:  Anguraj Sadanandam; Costas A Lyssiotis; Krisztian Homicsko; Eric A Collisson; William J Gibb; Stephan Wullschleger; Liliane C Gonzalez Ostos; William A Lannon; Carsten Grotzinger; Maguy Del Rio; Benoit Lhermitte; Adam B Olshen; Bertram Wiedenmann; Lewis C Cantley; Joe W Gray; Douglas Hanahan
Journal:  Nat Med       Date:  2013-04-14       Impact factor: 53.440

4.  Gene expression profiling-based prediction of response of colon carcinoma cells to 5-fluorouracil and camptothecin.

Authors:  John M Mariadason; Diego Arango; Qiuhu Shi; Andrew J Wilson; Georgia A Corner; Courtney Nicholas; Maria J Aranes; Martin Lesser; Edward L Schwartz; Leonard H Augenlicht
Journal:  Cancer Res       Date:  2003-12-15       Impact factor: 12.701

5.  Identifying mutated proteins secreted by colon cancer cell lines using mass spectrometry.

Authors:  Suresh Mathivanan; Hong Ji; Bow J Tauro; Yuan-Shou Chen; Richard J Simpson
Journal:  J Proteomics       Date:  2012-07-13       Impact factor: 4.044

6.  A cross-platform toolkit for mass spectrometry and proteomics.

Authors:  Matthew C Chambers; Brendan Maclean; Robert Burke; Dario Amodei; Daniel L Ruderman; Steffen Neumann; Laurent Gatto; Bernd Fischer; Brian Pratt; Jarrett Egertson; Katherine Hoff; Darren Kessner; Natalie Tasman; Nicholas Shulman; Barbara Frewen; Tahmina A Baker; Mi-Youn Brusniak; Christopher Paulse; David Creasy; Lisa Flashner; Kian Kani; Chris Moulding; Sean L Seymour; Lydia M Nuwaysir; Brent Lefebvre; Frank Kuhlmann; Joe Roark; Paape Rainer; Suckau Detlev; Tina Hemenway; Andreas Huhmer; James Langridge; Brian Connolly; Trey Chadick; Krisztina Holly; Josh Eckels; Eric W Deutsch; Robert L Moritz; Jonathan E Katz; David B Agus; Michael MacCoss; David L Tabb; Parag Mallick
Journal:  Nat Biotechnol       Date:  2012-10       Impact factor: 54.908

7.  The BioGRID interaction database: 2015 update.

Authors:  Andrew Chatr-Aryamontri; Bobby-Joe Breitkreutz; Rose Oughtred; Lorrie Boucher; Sven Heinicke; Daici Chen; Chris Stark; Ashton Breitkreutz; Nadine Kolas; Lara O'Donnell; Teresa Reguly; Julie Nixon; Lindsay Ramage; Andrew Winter; Adnane Sellam; Christie Chang; Jodi Hirschman; Chandra Theesfeld; Jennifer Rust; Michael S Livstone; Kara Dolinski; Mike Tyers
Journal:  Nucleic Acids Res       Date:  2014-11-26       Impact factor: 19.160

8.  Empowering biologists with multi-omics data: colorectal cancer as a paradigm.

Authors:  Jing Zhu; Zhiao Shi; Jing Wang; Bing Zhang
Journal:  Bioinformatics       Date:  2014-12-18       Impact factor: 6.937

9.  NCG 4.0: the network of cancer genes in the era of massive mutational screenings of cancer genomes.

Authors:  Omer An; Vera Pendino; Matteo D'Antonio; Emanuele Ratti; Marco Gentilini; Francesca D Ciccarelli
Journal:  Database (Oxford)       Date:  2014-03-07       Impact factor: 3.451

10.  COSMIC: exploring the world's knowledge of somatic mutations in human cancer.

Authors:  Simon A Forbes; David Beare; Prasad Gunasekaran; Kenric Leung; Nidhi Bindal; Harry Boutselakis; Minjie Ding; Sally Bamford; Charlotte Cole; Sari Ward; Chai Yin Kok; Mingming Jia; Tisham De; Jon W Teague; Michael R Stratton; Ultan McDermott; Peter J Campbell
Journal:  Nucleic Acids Res       Date:  2014-10-29       Impact factor: 16.971
