
The Pancreatic Expression Database: 2018 update.

Jacek Marzec¹, Abu Z Dayem Ullah¹, Stefano Pirrò¹, Emanuela Gadaleta¹, Tatjana Crnogorac-Jurcevic², Nicholas R Lemoine², Hemant M Kocher³, Claude Chelala¹,⁴.

Abstract

The Pancreatic Expression Database (PED, http://www.pancreasexpression.org) continues to be a major resource for mining pancreatic -omics data a decade after its initial release. Here, we present recent updates to PED and describe its evolution into a comprehensive resource for extracting, analysing and integrating publicly available multi-omics datasets. A new analytical module has been implemented to run in parallel with the existing literature mining functions. This analytical module has been created using rich data content derived from pancreas-related specimens available through the major data repositories (GEO, ArrayExpress) and international initiatives (TCGA, GENIE, CCLE). Researchers have access to a host of functions to tailor analyses to meet their needs. Results are presented using interactive graphics that allow the molecular data to be visualized in a user-friendly manner. Furthermore, researchers are provided with the means to superimpose layers of molecular information to gain greater insight into alterations and the relationships between them. The literature-mining module has been improved with a redesigned web appearance, restructured query platforms and updated annotations. These updates to PED are in preparation for its integration with the Pancreatic Cancer Research Fund Tissue Bank (PCRFTB), a vital resource of pancreas cancer tissue for researchers to support and promote cutting-edge research.

Year: 2018 PMID: 29059374 PMCID: PMC5753364 DOI: 10.1093/nar/gkx955

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Pancreatic cancer (PC) is fatal for most patients and is projected to become the second leading cause of cancer-related death by 2030, behind only lung cancer (1,2). Across the EU and US, an estimated 77% of PC patients will die within the first year of diagnosis and an estimated 94% will die within 5 years (3,4). A multitude of studies have investigated the pathogenesis of pancreatic malignancies and generated a huge volume of -omics data, but these findings have not yet translated into clinical improvements. The Pancreatic Expression Database (PED) (5–7) was developed as a data repository to provide researchers with a single entry point from which to manipulate, mine and integrate these heterogeneous and isolated findings into their own research. Although the emphasis is on pancreatic malignancies, PED also incorporates published findings on pancreatic precursor lesions, including pancreatic intraepithelial neoplasias (PanINs), intraductal papillary mucinous neoplasms (IPMNs) and mucinous cystic neoplasms (MCNs), as well as benign conditions such as chronic pancreatitis.

Since its inception in 2007, the literature mining module of PED has been enriched through manual selection of pancreas-related papers, followed by data curation and review of the reported findings. In this release, we also offer researchers the opportunity to explore experimental data via our web-based bioinformatics infrastructure. Manual curation of the vast volume of experimental data available from public repositories is a time-consuming process. As such, PED uses an automated system for data selection and retrieval. This first-level data is made available immediately to the pancreatic cancer community for exploration.

Here, we present the recent release of PED, which offers novel analytical modalities capable of analyzing and integrating data generated using a range of technologies, together with enhanced data content and an improved query-building process. With these additions and improvements, PED now offers an unprecedented opportunity to explore, analyze and integrate molecular data derived from a broad range of specimens, including tissues and body fluids of healthy people or patients, cell lines and mouse models, thereby expanding its utility for pancreatic cancer research. In future, the established framework will facilitate the integration of PED with the Pancreatic Cancer Research Fund Tissue Bank (PCRFTB; https://www.thepancreastissuebank.org).

NEW DEVELOPMENTS

Analytical module

PED now contains a new module for the interactive analysis of a rich collection of publicly available multi-omics datasets in ArrayExpress, GEO, TCGA, GENIE and CCLE (8–12). It allows simple and efficient exploration of those datasets to perform user-specific queries that are, for example, not addressed or not directly discernible from the original publications.

Automated data selection and retrieval system

We have adopted the Smart Automatic Classification system (SMAC, Pirrò ) to automate the selection and prioritization (13) of relevant articles accessed from PubMed. This is a marked departure from the previous releases of PED, where we relied on manual selection of articles and curation of reported high-level findings. The SMAC architecture has been expanded to identify any molecular data generated by the studies of interest. Where available, these associated experimental data files are downloaded from ArrayExpress or GEO and fed into the relevant analytical pipelines automatically. The automated system opens up the opportunity for periodic enrichment of our resource with minimal manual intervention.
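The retrieval step described above can be illustrated with a minimal sketch. It is not the SMAC implementation, whose internals are not described here. It only shows, under stated assumptions, how a PubMed search request can be composed against the public NCBI E-utilities `esearch` endpoint and how GEO or ArrayExpress accessions might be recognized in free text. The function names, the accession pattern and the example abstract are hypothetical, and no network request is sent.

```python
import re
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Assumed accession patterns: GEO series (GSE + digits) and
# ArrayExpress experiments (E-XXXX-digits, e.g. E-MTAB-..., E-GEOD-...).
ACCESSION_RE = re.compile(r"\b(GSE\d+|E-[A-Z]{4}-\d+)\b")


def build_pubmed_query(term: str, retmax: int = 100) -> str:
    """Return an esearch URL for a PubMed term; no request is sent here."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def find_accessions(text: str) -> list[str]:
    """Extract unique accessions, keeping the order of first appearance."""
    seen: dict[str, None] = {}
    for match in ACCESSION_RE.finditer(text):
        seen.setdefault(match.group(1), None)
    return list(seen)


url = build_pubmed_query("pancreatic cancer AND gene expression")

# Illustrative text only; the accession strings serve as format examples.
abstract = "Data are available under GSE15471 and E-MTAB-1791; see also GSE15471."
accessions = find_accessions(abstract)
```

In a real pipeline the accessions found this way would be passed to a download step and then to the relevant analytical workflow; both are outside the scope of this sketch.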

Data content

The module is divided into four components, named after the core data sources: PubMed, TCGA, GENIE and CCLE. These hold expression, genomic and mutation profiles of pancreas-related samples or cell lines. Patient survival data is also available for a large number of patients in the PubMed and TCGA components (Figure 1).

Figure 1.

Schematic overview of the data in the analytical module of PED.

Analytical features

A wide range of exploratory and investigative functions has been incorporated into PED (Table 1; full details are given in the online user guide). The datasets extracted from different sources undergo a pre-processing workflow to ensure comparability and interoperability before being fed into the relevant analytical pipelines. For instance, the normalized mRNA expression data extracted from TCGA, CCLE and GEO/ArrayExpress undergo z-score transformation so that data from different studies are brought onto the same scale; the copy number data extracted from TCGA, GENIE and CCLE are discretized using hard thresholds (2: amplification; 1: gain; -1: loss; -2: deletion).
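The two pre-processing steps mentioned above can be sketched as follows. This is an illustration under assumptions rather than the PED pipeline itself: the cut-off values on log2 copy-number ratios are hypothetical (the thresholds used by PED are not stated here), the code 0 for "no change" is assumed, and the z-score is computed over a single vector of values, without committing to whether PED scales per gene or per sample.

```python
from statistics import mean, pstdev

# Hypothetical cut-offs on log2 copy-number ratios; the actual thresholds
# used by PED are not given in the text.
AMP, GAIN, LOSS, DEL = 1.0, 0.3, -0.3, -1.0


def z_scores(values: list[float]) -> list[float]:
    """Centre to mean 0 and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant vector carries no variation; return zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def code_copy_number(log2_ratio: float) -> int:
    """Map a continuous value to 2, 1, 0, -1 or -2 using hard thresholds."""
    if log2_ratio >= AMP:
        return 2   # amplification
    if log2_ratio >= GAIN:
        return 1   # gain
    if log2_ratio <= DEL:
        return -2  # deletion
    if log2_ratio <= LOSS:
        return -1  # loss
    return 0       # assumed code for no change


expression = [2.0, 4.0, 6.0, 8.0]          # hypothetical normalized values
scaled = z_scores(expression)
calls = [code_copy_number(x) for x in [1.4, 0.5, 0.0, -0.6, -1.2]]
```

After the transformation, `scaled` has mean 0 and unit standard deviation regardless of the original measurement scale, which is what allows values from different studies to be displayed side by side.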

Table 1.

An overview of the features in the analytical module of PED

Data type	Analytical feature	Unit of analysis	PubMed	TCGA	GENIE	CCLE	Description	Display
Gene expression	Principal component analysis	Dataset	✓	✓		✓	Identification of key components of variability in the expression data	Scatter plot (2D, 3D); scree plot
Gene expression	Tumour purity	Dataset	✓	✓		✓	Estimate of tumour purity and the presence of infiltrating stromal/immune cells for each sample	Scatter plot (3D); table
Gene expression	Expression profiles, gene-specific	Gene	✓	✓		✓	Expression profile summarized across biological groups or samples	Box plot (biological groups); bar plot (samples)
Gene expression	Expression profiles, whole-genome	Biological group	✓				Expression profile using median values of Z-transformed expression score for each gene across the genome	CIRCOS plot
Gene expression	Correlation analysis	Multiple genes	✓	✓		✓	Pairwise correlations of genes presented by Pearson Product Moment Correlation Coefficients	Heatmap
Gene expression	Gene networks	Gene(s)	✓	✓		✓	Interactions between genes of interest and their primary neighbours using human interactome dataset from MENTHA (14), overlaid with the expression data summarized across the groups	Network plot; table
Survival	Survival analysis	Gene	✓	✓			Univariate Cox proportional hazards regression analysis to assess the relationship between survival and gene expression	Kaplan-Meier plot
DNA copy number	Copy number analysis, gene-specific	Multiple genes		✓	✓	✓	Genomic changes using DNA copy number of genes summarized across biological groups and samples	Heatmap
DNA copy number	Copy number analysis, whole-genome	Biological group		✓	✓	✓	Genomic changes using the most frequent copy number alteration events for each gene across the genome	Frequency plot
Mutation	Mutations	Gene			✓		Frequency of different mutation types summarized across biological groups	Bar plot
Gene expression/DNA copy number/Mutation	Multi-omics data integration, gene-specific	Gene		✓		✓	Integration of discrete genetic events, such as CNA events and mutations, or relative linear copy-number values with continuous mRNA abundance data	Box plot; scatter plot
Gene expression/DNA copy number/Mutation	Multi-omics data integration, whole-genome	Biological group		✓		✓	Integration of copy number alteration events with gene expression profile across the genome	CIRCOS plot
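As an illustration of the survival display listed in Table 1, the sketch below computes a Kaplan-Meier estimate for patients grouped by expression of a gene. It covers only the Kaplan-Meier curve, not the Cox proportional hazards model, and the median split used to define high and low expression groups is an assumption made for this example, not a documented PED setting. All data values are hypothetical.

```python
from statistics import median


def split_by_median(expression: list[float]) -> tuple[list[int], list[int]]:
    """Return sample indices with expression above / at-or-below the median."""
    cut = median(expression)
    high = [i for i, v in enumerate(expression) if v > cut]
    low = [i for i, v in enumerate(expression) if v <= cut]
    return high, low


def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if death was observed at times[i] and 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored cases both leave the risk set
    return curve


# Hypothetical follow-up times (months), event flags and expression values.
times = [5, 8, 12, 12, 20, 24]
events = [1, 0, 1, 1, 0, 1]
expression = [9.1, 7.4, 8.8, 8.0, 6.2, 5.9]

high, low = split_by_median(expression)
curve_all = kaplan_meier(times, events)
curve_high = kaplan_meier([times[i] for i in high], [events[i] for i in high])
curve_low = kaplan_meier([times[i] for i in low], [events[i] for i in low])
```

Plotting `curve_high` against `curve_low` as step functions gives the familiar two-group Kaplan-Meier figure; a formal comparison would require a log-rank test or a Cox model, which this sketch does not implement.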

The exploratory features provide pre-computed results for principal component analysis (PCA), estimation of tumour purity and whole-genome views of expression or copy number alteration events. The investigative features include expression profiling, correlation analysis, survival analysis, copy number analysis, mutational profiling and gene network analysis, which are performed on the fly in response to user-specific queries. For TCGA and CCLE datasets that contain data generated from different technologies, an integrative modality is available to investigate relationships between mutations, expression and copy number changes at the single-gene or whole-genome level.
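The correlation analysis named above reports pairwise Pearson product-moment correlation coefficients for a user-supplied set of genes. A minimal sketch of that calculation is given below; the gene labels and expression values are hypothetical, and the function names are illustrative rather than part of PED.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(profiles: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Return the coefficient for every ordered pair of genes."""
    genes = list(profiles)
    return {(g, h): pearson(profiles[g], profiles[h]) for g in genes for h in genes}


# Hypothetical expression profiles across four samples.
profiles = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(profiles)
```

The resulting dictionary maps each gene pair to a value in [-1, 1] and is the kind of symmetric matrix that a heatmap display would render.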

Advanced visualizations

Most results are presented in an interactive and informative graphical format using the open source visualization library Plotly (15). The various statistical and scientific charts (see Table 1) allow users to view the annotation of individual data points, zoom in on areas of interest, exclude or include subgroups in the data, and download the charts as static image files of publication quality. The whole-genome view of expression and/or copy number data is generated using CIRCOS software (16); clicking on a particular chromosomal band redirects users to a detailed view of the region of interest in the UCSC Genome Browser (17). Where applicable, results are also presented in an interactive tabular format with filtering, pagination and sorting options, and are available for download in multiple formats.

Improved literature mining module

The literature mining module has also been significantly updated to ensure that researchers are presented with a refined portal from which they can conduct intuitive biological queries. To this end, we have (i) upgraded the BioMart data management system (18) to version 0.9 for improved query optimization and adopted the MartExplorer GUI for an intuitive query interface; (ii) built separate query mechanisms for beginners and advanced users: with a query interface containing fewer filters and attributes, beginners can now conduct quick and simple queries, whereas advanced users can access the original, more elaborate list of filters and attributes to conduct complex queries; and (iii) updated the annotations and mappings from Ensembl human gene annotations (19) release 63 to release 90, ensuring up-to-date annotations on the GRCh38/hg38 genome assembly.

Updated documentation

We have incorporated a detailed user guide describing all the features and functionalities in PED. The user guide for the literature mining module has been updated to reflect all components of the new BioMart 0.9 query interface. An ‘Examples of use’ section is available, with practical demonstrations of how the literature mining module can be used for building biologically relevant queries. The user guide for the analytics module provides an overview of the types and sources of data, followed by an exhaustive description of the available analytical features.

DISCUSSION AND FUTURE DIRECTIONS

Encouraged by the exponential growth of high-throughput multi-omics pancreas-related data, this release of PED has made the logical transition from a stand-alone data repository toward a cohesive research platform enriched with publicly available molecular data and equipped with the analytical workflows required to conduct in-depth explorations and tailored investigations. Such a combination of functionalities makes it possible to address challenging problems quickly, such as the discovery of non-invasive biomarkers or the identification of deleted passenger genes as targets for cancer therapeutics. For instance, the concept of collateral lethality has recently been suggested as a potential therapeutic strategy for pancreatic cancer (20), focusing on the identification of co-deleted neighbouring passenger genes. Using the literature mining module of PED, researchers can quickly search for homozygously co-deleted genes proximal to tumour suppressor genes (TSGs), and for their paralogous isoforms, a task that otherwise involves laborious data retrieval from a number of relevant studies and considerable additional data processing (see supplementary file 1). Furthermore, the analytics module of PED allows researchers to explore those genes in pancreas-related tissues and cancer cell lines, which aids in pinpointing potential candidates for further validation and pharmacological testing.

The updated infrastructure of PED is a major step toward its adoption as the bioinformatics platform of PCRFTB (21). PCRFTB aims to create a unique resource of biological materials and supportive clinical data from patients with different pancreatic malignancies or from healthy donors. PED will provide the bioinformatics platform to support cutting-edge translational research on these samples for the benefit of patients. In the future, both experimental data generated using samples obtained from the tissue bank and the corresponding published findings will be incorporated into PED. The seamless interoperability between the PCRFTB clinical data and PED modules will allow researchers to examine those findings and analyze the molecular data prior to applying for tissues, facilitating data sharing and reducing duplication of effort.

REFERENCES

1. Circos: an information aesthetic for comparative genomics.

Authors: Martin Krzywinski; Jacqueline Schein; Inanç Birol; Joseph Connors; Randy Gascoyne; Doug Horsman; Steven J Jones; Marco A Marra
Journal: Genome Res Date: 2009-06-18 Impact factor: 9.043

2. Projecting cancer incidence and deaths to 2030: the unexpected burden of thyroid, liver, and pancreas cancers in the United States.

Authors: Lola Rahib; Benjamin D Smith; Rhonda Aizenberg; Allison B Rosenzweig; Julie M Fleshman; Lynn M Matrisian
Journal: Cancer Res Date: 2014-06-01 Impact factor: 12.701

3. The Gene Expression Omnibus Database.

Authors: Emily Clough; Tanya Barrett
Journal: Methods Mol Biol Date: 2016

4. Pancreatic cancer tissue banks: where are we heading?

Authors: Vickna Balarajah; Archana Ambily; Abu Z Dayem Ullah; Ahmet Imrali; Thomas Dowe; Bilal Al-Sarireh; Mohammed Abu Hilal; Brian R Davidson; Zahir Soonawalla; Matthew Metcalfe; Jo-Anne Chin Aleong; Claude Chelala; Hemant M Kocher
Journal: Future Oncol Date: 2016-08-19 Impact factor: 3.404

5. The GENIE Is Out of the Bottle: Landmark Cancer Genomics Dataset Released.

Authors: Kevin Litchfield; Samra Turajlic; Charles Swanton
Journal: Cancer Discov Date: 2017-08 Impact factor: 39.397

6. Survival after resection of pancreatic adenocarcinoma: results from a single institution over three decades.

Authors: Jordan M Winter; Murray F Brennan; Laura H Tang; Michael I D'Angelica; Ronald P Dematteo; Yuman Fong; David S Klimstra; William R Jarnagin; Peter J Allen
Journal: Ann Surg Oncol Date: 2011-07-15 Impact factor: 5.344

7. Robust rank aggregation for gene list integration and meta-analysis.

Authors: Raivo Kolde; Sven Laur; Priit Adler; Jaak Vilo
Journal: Bioinformatics Date: 2012-01-12 Impact factor: 6.937

8. ArrayExpress update--simplifying data submissions.

Authors: Nikolay Kolesnikov; Emma Hastings; Maria Keays; Olga Melnichuk; Y Amy Tang; Eleanor Williams; Miroslaw Dylag; Natalja Kurbatova; Marco Brandizi; Tony Burdett; Karyn Megy; Ekaterina Pilicheva; Gabriella Rustici; Andrew Tikhonov; Helen Parkinson; Robert Petryszak; Ugis Sarkans; Alvis Brazma
Journal: Nucleic Acids Res Date: 2014-10-31 Impact factor: 16.971

9. The BioMart community portal: an innovative alternative to large, centralized data repositories.

Authors: Damian Smedley; Syed Haider; Steffen Durinck; Luca Pandini; Paolo Provero; James Allen; Olivier Arnaiz; Mohammad Hamza Awedh; Richard Baldock; Giulia Barbiera; Philippe Bardou; Tim Beck; Andrew Blake; Merideth Bonierbale; Anthony J Brookes; Gabriele Bucci; Iwan Buetti; Sarah Burge; Cédric Cabau; Joseph W Carlson; Claude Chelala; Charalambos Chrysostomou; Davide Cittaro; Olivier Collin; Raul Cordova; Rosalind J Cutts; Erik Dassi; Alex Di Genova; Anis Djari; Anthony Esposito; Heather Estrella; Eduardo Eyras; Julio Fernandez-Banet; Simon Forbes; Robert C Free; Takatomo Fujisawa; Emanuela Gadaleta; Jose M Garcia-Manteiga; David Goodstein; Kristian Gray; José Afonso Guerra-Assunção; Bernard Haggarty; Dong-Jin Han; Byung Woo Han; Todd Harris; Jayson Harshbarger; Robert K Hastings; Richard D Hayes; Claire Hoede; Shen Hu; Zhi-Liang Hu; Lucie Hutchins; Zhengyan Kan; Hideya Kawaji; Aminah Keliet; Arnaud Kerhornou; Sunghoon Kim; Rhoda Kinsella; Christophe Klopp; Lei Kong; Daniel Lawson; Dejan Lazarevic; Ji-Hyun Lee; Thomas Letellier; Chuan-Yun Li; Pietro Lio; Chu-Jun Liu; Jie Luo; Alejandro Maass; Jerome Mariette; Thomas Maurel; Stefania Merella; Azza Mostafa Mohamed; Francois Moreews; Ibounyamine Nabihoudine; Nelson Ndegwa; Céline Noirot; Cristian Perez-Llamas; Michael Primig; Alessandro Quattrone; Hadi Quesneville; Davide Rambaldi; James Reecy; Michela Riba; Steven Rosanoff; Amna Ali Saddiq; Elisa Salas; Olivier Sallou; Rebecca Shepherd; Reinhard Simon; Linda Sperling; William Spooner; Daniel M Staines; Delphine Steinbach; Kevin Stone; Elia Stupka; Jon W Teague; Abu Z Dayem Ullah; Jun Wang; Doreen Ware; Marie Wong-Erasmus; Ken Youens-Clark; Amonida Zadissa; Shi-Jian Zhang; Arek Kasprzyk
Journal: Nucleic Acids Res Date: 2015-04-20 Impact factor: 16.971

10. Ensembl 2017.

Authors: Bronwen L Aken; Premanand Achuthan; Wasiu Akanni; M Ridwan Amode; Friederike Bernsdorff; Jyothish Bhai; Konstantinos Billis; Denise Carvalho-Silva; Carla Cummins; Peter Clapham; Laurent Gil; Carlos García Girón; Leo Gordon; Thibaut Hourlier; Sarah E Hunt; Sophie H Janacek; Thomas Juettemann; Stephen Keenan; Matthew R Laird; Ilias Lavidas; Thomas Maurel; William McLaren; Benjamin Moore; Daniel N Murphy; Rishi Nag; Victoria Newman; Michael Nuhn; Chuang Kee Ong; Anne Parker; Mateus Patricio; Harpreet Singh Riat; Daniel Sheppard; Helen Sparrow; Kieron Taylor; Anja Thormann; Alessandro Vullo; Brandon Walts; Steven P Wilder; Amonida Zadissa; Myrto Kostadima; Fergal J Martin; Matthieu Muffato; Emily Perry; Magali Ruffier; Daniel M Staines; Stephen J Trevanion; Fiona Cunningham; Andrew Yates; Daniel R Zerbino; Paul Flicek
Journal: Nucleic Acids Res Date: 2016-11-28 Impact factor: 16.971
