Theofilos Papadopoulos, Magdalena Krochmal, Katryna Cisek, Marco Fernandes, Holger Husi, Robert Stevens, Jean-Loup Bascands, Joost P Schanstra, Julie Klein.
Abstract
In recent decades, the evolution of omics technologies has led to advances in all biological fields, creating a demand for effective storage, management and exchange of rapidly generated data and research discoveries. To address this need, developing databases of experimental outputs has become a common part of scientific practice; these databases serve as knowledge sources and data-sharing platforms, providing information about genes, transcripts, proteins or metabolites. In this review, we present currently available omics databases, with a special focus on their application in kidney research and, potentially, in clinical practice. Databases are divided into two categories: general databases with a broad information scope and kidney-specific databases that concentrate on kidney pathologies. In research, databases can be used as a rich source of information about pathophysiological mechanisms and molecular targets. In the future, databases will support clinicians in their decisions, providing better and faster diagnoses and setting the direction towards more preventive, personalized medicine. We also provide a test case demonstrating the potential of biological databases in comparing multi-omics datasets and generating new hypotheses to answer a critical and common diagnostic problem in nephrology practice. The use of databases combined with data integration and data mining should provide powerful insights into kidney disease, with a potential impact on pharmacological intervention and therapeutic disease management.
Keywords: bioinformatics; data integration; kidney disease; omics databases; system biology
Year: 2016 PMID: 27274817 PMCID: PMC4886900 DOI: 10.1093/ckj/sfv155
Source DB: PubMed Journal: Clin Kidney J ISSN: 2048-8505
General omics databases
| Name | Description | Main features |
|---|---|---|
| GeneCards | Detailed information on all annotated and predicted human genes | Contains >152 000 GeneCards genes; gene-centric data from >100 Web sources from all kinds of omics; very detailed description of genes (aliases, compounds, proteins, domains, expression, related publications, transcripts, pathways) |
| Online Mendelian Inheritance in Man | Comprehensive, authoritative compendium of human genes and genetic phenotypes | >15 000 genes; information on all known Mendelian disorders; relationship between phenotype and genotype |
| Gene Expression Omnibus (GEO) | Repository for gene expression datasets supplied by researchers | 3848 array and sequence-based datasets; common data submission procedures ensure good data quality; tools for data analysis and visualization are provided |
| ArrayExpress | Functional genomics archive | 60 054 high-throughput experiments; stores both processed and raw data; standardized data submission process, frequent updates, connected with GEO |
| Expression Atlas | Similar to GEO but with fewer datasets that are more focused on baseline experiments | 1572 datasets; two components: Baseline Atlas for expression in 'normal' conditions and Differential Atlas for experimental expression data |
| miRBase | Detailed information on all published and annotated miRNAs | 28 645 entries of miRNAs from 223 species |
| DIANA tools | Web tool dedicated to miRNA studies | miRNA target identification and pathway analysis; published validated miRNA–gene interactions; automated pipelines to analyse user data; miRNA-related publication search |
| PRoteomics IDEntifications (PRIDE) | Proteomics data repository | Stores 3342 projects on protein/peptide identifications and post-translational modifications, with supporting spectral evidence; additional annotation of datasets for better organization |
| Human Protein Atlas | Protein expression and localization in different tissues and organs (immunochemistry) | Additional information regarding genes, annotations and organs |
| Human Metabolome Database | Detailed information on metabolites (chemical, clinical and molecular biology/biochemistry levels) | Contains thousands of metabolites; search in 17 different biofluids and 617 diseases; connections with pathways, proteins and reactions |
| Multi-Omics Profiling Expression Database | Processed multi-omics data | Interactive visualization tools |
Kidney-specific databases
| Name | Description | Main features |
|---|---|---|
| Nephroseq | Gene expression in renal disease; integration with clinical data | 26 datasets (1989 samples); analysis and visualization tools (differential expression, co-expression, outlier, etc.); upload and export tools |
| Renal Gene Expression Database | Gene expression in renal disease | 88 research papers analysed; easy-to-use interface |
| Human Kidney and Urine Proteome Project (HKUPP) | Protein expression in normal urine and normal or diseased kidney | Search for proteins in kidney structures (glomerulus, human medulla) and urine; enables viewing two-dimensional gels and query fractions |
| Urinary Protein Biomarker Database | Candidate protein biomarkers in urine | >400 reports on humans and animals; 819 human biomarkers, 33 animal biomarkers |
| Urinary Peptidomics and Peak-maps | Urinary peptides modified in disease | Search by detection methodology and disease |
| Kidney and Urinary Pathway Knowledge Base (KUPKB) | Collection of publicly available omics data related to renal disease | >220 experiments; easy and fast interface; pathway visualization with KUPKB Network Visualizer |
| Chronic Kidney Disease database (CKDdb) | Collection of publicly available omics data related to chronic kidney disease | 366 datasets; search by study, sample, tissue, disease and molecule type |
Fig. 1. Proposed workflow for a data-driven approach to multi-omics data integration. After data acquisition on different high-throughput omics platforms, raw data can be stored locally in the owner's database and pre-processed (data cleaning, filtering, normalization, reduction, etc.). After pre-processing, the data are matched with current reference repositories (data curation). The curated data can then be deposited in a separate database that displays only statistically relevant features, which makes it more amenable to collaborative use by researchers on a common project. Single or multiple combinations of datasets can be used to integrate data from upstream and downstream levels and to develop models that together try to mimic the cell environment and represent its interactome. A state-of-the-art model would simultaneously consider network topology, molecular interactions and statistical relevance in order to provide the most robust representation of cell dynamics in disease. Every new model requires confirmation by validating selected molecular features in in vivo or in vitro experiments (immunohistochemistry, qRT-PCR, ELISA, etc.). This step is iterative: obtaining a final model may take several cycles of adding new data and testing its validity until the model is considered suitable for scientific scrutiny.
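The filtering and normalization steps named in the Fig. 1 workflow could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `preprocess`, the `min_present` threshold, the choice of log2 transformation with per-sample median centring, and the toy intensities are all hypothetical and are not taken from the review.

```python
import math
from statistics import median


def preprocess(samples, min_present=0.5):
    """Filter and normalize raw intensities (hypothetical helper).

    samples maps sample name -> {feature: raw intensity, or None if missing}.
    Returns log2 intensities centred on each sample's median.
    """
    features = set().union(*(s.keys() for s in samples.values()))
    n_samples = len(samples)

    # Filtering: keep features with a positive value in enough samples.
    kept = sorted(
        f for f in features
        if sum(1 for s in samples.values() if (s.get(f) or 0) > 0) / n_samples
        >= min_present
    )

    normalized = {}
    for name, values in samples.items():
        # Cleaning: skip missing or non-positive values before the log transform.
        logs = {f: math.log2(values[f]) for f in kept if (values.get(f) or 0) > 0}
        if not logs:
            normalized[name] = {}
            continue
        # Normalization: subtract the sample median so samples are comparable.
        centre = median(logs.values())
        normalized[name] = {f: v - centre for f, v in logs.items()}
    return normalized


# Toy data (assumed): feature "C" is missing in one sample and is filtered out.
raw = {
    "s1": {"A": 8.0, "B": 2.0, "C": None},
    "s2": {"A": 16.0, "B": 4.0, "C": 1.0},
}
normalized = preprocess(raw, min_present=1.0)
```

In this toy case the two samples differ only by a constant factor, so after median centring they have identical profiles. Real pipelines would typically use platform-specific normalization instead of this simplified scheme.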
Fig. 2. The test case pipeline. Comparison of the transcriptomics and proteomics datasets for diabetic nephropathy (DN) and IgA nephropathy (IgAN) from the CKDdb, the KUPKB and Nephroseq with the 990 proteins identified in urine in the Jia et al. [30] study yielded 37 common proteins: 25 DN-specific, 9 IgAN-specific and 3 found in both diseases (Venn diagram). The proteins from Jia et al. found to be possible biomarker candidates through analysis in the Urinary Protein Biomarker Database were excluded, resulting in 21 proteins specific for DN and 5 proteins specific for IgAN.
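The comparison behind the Fig. 2 test case is essentially a sequence of set operations, and the reported counts are consistent with that reading (25 + 9 + 3 = 37). A minimal sketch, assuming each dataset has already been reduced to a set of protein identifiers, is given below. The function name `classify_candidates` and the toy identifiers are hypothetical; they do not reproduce the authors' actual data or tooling.

```python
def classify_candidates(urine_proteins, dn_omics, igan_omics, known_biomarkers):
    """Split urine proteins by the disease datasets in which they also appear."""
    urine = set(urine_proteins)
    known = set(known_biomarkers)

    # Urine proteins that also appear in each disease's omics datasets.
    dn_hits = urine & set(dn_omics)
    igan_hits = urine & set(igan_omics)

    dn_only = dn_hits - igan_hits
    igan_only = igan_hits - dn_hits
    return {
        "common": dn_hits | igan_hits,    # all overlapping proteins
        "dn_only": dn_only,               # DN-specific
        "igan_only": igan_only,           # IgAN-specific
        "both": dn_hits & igan_hits,      # shared by both diseases
        "dn_novel": dn_only - known,      # DN-specific, not yet a known candidate
        "igan_novel": igan_only - known,  # IgAN-specific, not yet a known candidate
    }


# Toy identifiers (assumed), standing in for real protein accessions.
urine = {"P1", "P2", "P3", "P4", "P5", "P6"}
dn = {"P1", "P2", "P3", "X9"}
igan = {"P3", "P4", "Y7"}
known = {"P2"}

result = classify_candidates(urine, dn, igan, known)
```

With the toy sets above, `P3` falls in both diseases, `P1` and `P2` are DN-specific, and removing the known candidate `P2` leaves `P1` as the only novel DN-specific protein.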