BioProfiling.de: analytical web portal for high-throughput cell biology.

Alexey V Antonov

Abstract

BioProfiling.de provides a comprehensive analytical toolkit for the interpretation of gene/protein lists. As input, BioProfiling.de accepts a gene/protein list. As output, in one submission, the gene list is analyzed by a collection of tools that employ advanced enrichment or network-based statistical frameworks. The gene list is profiled with respect to the most information available regarding gene function, protein interactions, pathway relationships and in silico predicted microRNA to gene associations, as well as information collected by text mining. BioProfiling.de provides a user-friendly dialog-driven web interface for several model organisms and supports most available gene identifiers. The web portal is freely available at http://www.BioProfiling.de/gene_list.

Year:  2011        PMID: 21609949      PMCID: PMC3125774          DOI: 10.1093/nar/gkr372

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The development of high-throughput technologies has had a dramatic impact on modern biology. Although the ‘omics’ technologies differ technically, their experimental output is in the majority of cases reduced to a list of genes/proteins. Lists of genes or proteins that are differentially expressed or co-expressed across varying cellular conditions, or that differ in epigenetic or mutational status, are commonly reported in biological and clinical studies. Functional profiling has become the de facto standard approach for the analysis of high-throughput data (1). Functional profiling can be generally defined as a statistical procedure for understanding the functional context of a gene/protein list using prior knowledge of gene properties and interactions (1–5). The most widespread example of functional profiling is enrichment analysis of Gene Ontology (GO) terms (6–10). Recently, we have introduced several web tools that employ either an advanced enrichment profiling schema [ProfCom (11), GeneSet2MiRNA (12), PLIPS (13), CCancer (14)] or a network-based statistical framework [KEGG spider (15), PPI spider (16), R spider (17)] for the interpretation of gene/protein lists based on available prior knowledge stored in public databases. BioProfiling.de provides experimentalists with an efficient interface to these tools: in one submission, the gene list is profiled with respect to the most information available regarding gene function [GO (18)], pathway relations [KEGG database (19), Reactome knowledgebase (20)], protein interactions [IntAct (21)], in silico predicted gene to MiRNA associations [GeneSet2MiRNA (12)] and information collected by text mining [PLIPS and CCancer (13,14)]. BioProfiling.de is not only a common interface for the collection of recently developed tools but also a pipeline for the fast implementation of new tools capable of exploring novel biological principles to group genes into functional classes or to associate genes into a global gene network.
For example, ProfCom_PROT_MOTIFS is a new tool implemented within the BioProfiling.de pipeline. In this case, genes are grouped into functional classes based on the amino acid triplet composition of their protein products. ProfCom_PROT_MOTIFS employs the ‘ProfCom’ statistical framework to identify ‘amino acid triplets’ and logical combinations of ‘amino acid triplets’ overrepresented in the submitted gene/protein list. BioProfiling.de provides a user-friendly dialog-driven web interface and supports most available gene/protein identifiers. BioProfiling.de provides analyses for six organisms: Homo sapiens, Mus musculus, Rattus norvegicus, Caenorhabditis elegans, Drosophila melanogaster and Arabidopsis thaliana.

MATERIALS AND METHODS

Statistical frameworks

The prior knowledge about gene/protein function and interactions is commonly reduced to two data models: either genes are grouped into classes based on a shared feature (Type 1) or pairs of genes are connected by edges (Type 2). The GO (18) database is an example of Type 1 data, while the IntAct (21) database of protein–protein interactions is an example of Type 2 data. BioProfiling.de implements two different statistical frameworks to deal with both types of prior knowledge. The first statistical framework, referred to as ProfCom, applies to Type 1 data and represents an advanced enrichment schema. The second statistical framework, referred to as Global Network, was recently introduced to deal with Type 2 data.

ProfCom

In this case, the prior knowledge represents grouping genes into functional classes (GO terms) or grouping genes based on whether or not they are regulated by the same microRNA. Let us denote each class (i.e. GO term, microRNA, ‘amino acid triplet’) as f and the set of all available classes as F. In a standard enrichment schema, a query list of genes (referred to as list A) and a reference list (referred to as list B, usually all genes from the genome) are compared. For each class f from the set F, the number a of genes in list A and the number b of genes in list B that have been annotated with f are counted. In the next step, the null hypothesis H0 (membership of a gene in list A is independent of the gene having attribute f) is tested. Hypergeometric, binomial or χ2-tests are usually employed to find over/under represented attributes. ProfCom extends the standard enrichment schema by constructing ‘complex classes’, which are Boolean combinations of the available classes of F. ProfCom uses two Boolean operations: intersection and difference. For example, the intersection (AND operator) of two classes f1 and f2 is formally defined as the set of genes that belong to both f1 and f2. The difference (NOT operator) between two classes f1 and f2 is formally defined as the set of genes from f1 that are not in f2. Unlike the standard enrichment schema, which is limited to the set F, ProfCom tests all possible pairwise combinations of classes from the set F joined by the logical operators AND and NOT. Next, ProfCom employs an algorithm based on greedy heuristics to search for the most enriched triplet and quadruplet combinations. In the case of triplet and quadruplet combinations, the use of greedy heuristics does not guarantee finding the optimal solution in every case but does significantly reduce the computational complexity. To adjust P-values for multiple testing, ProfCom uses both Bonferroni correction and a Monte–Carlo simulation approach.
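The pairwise stage of this schema can be illustrated with a minimal sketch. It is not the ProfCom implementation: the function names, the toy gene identifiers and the choice of the number of candidate classes as the Bonferroni factor are assumptions made for illustration, and the greedy search for triplet and quadruplet combinations as well as the Monte–Carlo adjustment are omitted. List A is assumed to be a subset of list B.

```python
from itertools import permutations
from math import comb


def hypergeom_tail(a, n_query, b, n_ref):
    """P(X >= a) when n_query genes are drawn without replacement
    from n_ref reference genes, b of which belong to the class."""
    upper = min(n_query, b)
    hits = sum(comb(b, k) * comb(n_ref - b, n_query - k)
               for k in range(a, upper + 1))
    return hits / comb(n_ref, n_query)


def pairwise_complex_classes(classes):
    """Yield (label, gene set) for each single class and for each
    pairwise AND (intersection) and NOT (difference) combination."""
    for name, genes in classes.items():
        yield name, genes
    for (n1, g1), (n2, g2) in permutations(classes.items(), 2):
        if n1 < n2:  # AND is symmetric, so emit it once per pair
            yield f"{n1} AND {n2}", g1 & g2
        yield f"{n1} NOT {n2}", g1 - g2


def enrich(query, reference, classes):
    """Rank simple and complex classes by Bonferroni-adjusted P-value."""
    candidates = list(pairwise_complex_classes(classes))
    results = []
    for label, genes in candidates:
        a = len(genes & query)      # annotated genes in list A
        b = len(genes & reference)  # annotated genes in list B
        if a == 0:
            continue
        p = hypergeom_tail(a, len(query), b, len(reference))
        # Assumption: one test per candidate class.
        results.append((label, a, b, min(1.0, p * len(candidates))))
    return sorted(results, key=lambda r: r[3])


# Toy data (hypothetical gene identifiers).
reference = {f"g{i}" for i in range(1, 21)}
query = {"g1", "g2", "g3", "g4", "g5"}
classes = {
    "f1": {f"g{i}" for i in range(1, 9)},
    "f2": {f"g{i}" for i in range(6, 11)},
}
results = enrich(query, reference, classes)
for label, a, b, p_adj in results:
    print(f"{label}: a={a}, b={b}, adjusted P={p_adj:.2e}")
```

In this toy case the complex class "f1 NOT f2" covers exactly the query genes and therefore ranks above the simple class f1, which is the kind of gain in specificity the complex classes are meant to provide.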

Global network (spider tools)

In this case, pairwise gene associations of any biological nature are used as prior knowledge in the form of a global gene network (reference gene network). The sub-network inference procedure is based on natural assumptions: most genes from the input list are related and most genes that are not from the input list are unrelated. These assumptions can be reformulated as a standard optimization principle: to find a gene sub-network with a maximal number of input genes connected by a minimal number of missing genes (genes that are not from the input list). To realize this optimization principle, a network inference algorithm was recently proposed (15–17). A parameter m is introduced, which fixes the maximal number of missing genes allowed between any two input genes that are connected by an edge in the output network model. The model is inferred in three steps by fixing m to 0, 1 and 2. At each step, any two input genes are connected by an edge if at most m genes lie between them in the reference gene network. At each step (m = 0, 1, 2), the connected sub-network component with the maximal number of input genes is inferred and referred to as model D1, D2 and D3, respectively. Clearly, given a reference network and any input gene list (even a randomly generated one), some genes from the input list might be connected into a sub-network just by chance, in particular when m equals 2. All spider tools implement a robust statistical framework to estimate the P-value of the inferred models. More details can be found in the original publications (15–17,22).
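The three inference steps can be sketched as follows. This is an illustrative reading of the description above, not the published spider algorithm: the function names and the toy network are hypothetical, and the estimation of the P-value for each model is left out.

```python
from collections import deque


def build_network(edges):
    """Undirected reference network as an adjacency dictionary."""
    network = {}
    for u, v in edges:
        network.setdefault(u, set()).add(v)
        network.setdefault(v, set()).add(u)
    return network


def within_distance(network, source, max_edges):
    """Genes reachable from source in at most max_edges steps (BFS)."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        gene = queue.popleft()
        if depth[gene] == max_edges:
            continue
        for neighbour in network.get(gene, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[gene] + 1
                queue.append(neighbour)
    return set(depth)


def largest_component(network, input_genes, m):
    """Largest set of input genes that become connected when at most
    m genes may lie between two linked input genes."""
    genes = {g for g in input_genes if g in network}
    # Two input genes are linked if they are at most m + 1 edges apart.
    links = {g: within_distance(network, g, m + 1) & genes for g in genes}
    best, visited = set(), set()
    for start in sorted(genes):
        if start in visited:
            continue
        component, stack = set(), [start]
        while stack:
            gene = stack.pop()
            if gene not in component:
                component.add(gene)
                stack.extend(links[gene] - component)
        visited |= component
        if len(component) > len(best):
            best = component
    return best


# Toy reference network: x, y and z are genes outside the input list.
network = build_network([("A", "B"), ("B", "x"), ("x", "C"),
                         ("C", "y"), ("y", "z"), ("z", "D")])
inputs = ["A", "B", "C", "D"]
models = {f"D{m + 1}": largest_component(network, inputs, m) for m in (0, 1, 2)}
for name, component in models.items():
    print(name, sorted(component))
```

Allowing more missing genes enlarges the model at each step, which is also why a chance connection becomes more likely at m = 2 and a significance estimate is needed.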

BioProfiling.de tools

BioProfiling.de provides a common interface for the collection of recently developed tools. A summary of the currently available tools is presented in Table 1. Descriptions and details of the tools can be found in the original publications. Here, we provide a short description of the novel (previously unpublished) tools implemented within the BioProfiling.de analytical pipeline.

Table 1. A summary of currently available BioProfiling.de tools for the interpretation of gene/protein lists

| Tool name | Statistical framework | Database (prior knowledge) |
|---|---|---|
| ProfCom_GO | ProfCom | GO |
| ProfCom_InerPro | ProfCom | InterPro database |
| ProfCom_GO_not_IEA | ProfCom | GO |
| KEGG spider | Global Network | KEGG |
| PPI spider | Global Network | IntAct |
| GeneSet2MiRNA | ProfCom | In silico predicted gene to MiRNA regulatory relations |
| R spider | Global Network | Reactome and KEGG |
| CCancer&PLIPS (a) | Standard Enrichment | CCancer and PLIPS databases |
| ProfCom_PROT_MOTIFS | ProfCom | Protein sequences (amino acid triplets) |
| CCancer spider (a) | Global Network | CCancer and PLIPS databases |

(a) Available only for human genome.

Figure caption: According to the global PPI network, all 47 Bosutinib targets (rectangles) that can be mapped to the global PPI network can be connected into a sub-network with a maximum of two missing genes (triangles) in between. The P-value estimated by Monte–Carlo simulation is < 0.005.

ProfCom_PROT_MOTIFS

ProfCom_PROT_MOTIFS implements the ‘ProfCom’ statistical framework to identify amino acid triplets or logical combinations of ‘amino acid triplets’ overrepresented in the submitted list (genes are mapped to the corresponding proteins). In this case, every ‘amino acid triplet’ represents a functional class (equivalent to a GO category) and genes are grouped into the same class if the corresponding protein(s) contain the same ‘amino acid triplet’. Single, pair, triplet or quadruplet combinations of amino acid triplets are considered (joined by the ‘AND’ and ‘NOT’ logical operators), and those that best discriminate the input list from all other genes are identified.
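How amino acid triplets define classes can be shown with a short sketch. The protein names and sequences below are invented for illustration, the function names are hypothetical, and reading a pattern as "all required triplets present, none of the excluded triplets present" is an assumption about how the logical combinations are evaluated.

```python
def triplets(sequence):
    """Set of all overlapping amino acid triplets in a protein sequence."""
    return {sequence[i:i + 3] for i in range(len(sequence) - 2)}


def triplet_classes(proteins):
    """Map each triplet to the set of proteins that contain it,
    so that every triplet acts as one functional class."""
    classes = {}
    for name, sequence in proteins.items():
        for triplet in triplets(sequence):
            classes.setdefault(triplet, set()).add(name)
    return classes


def matches(sequence, required, excluded=()):
    """True if the sequence has every required triplet (AND)
    and none of the excluded triplets (NOT)."""
    present = triplets(sequence)
    return (all(t in present for t in required)
            and not any(t in present for t in excluded))


# Invented toy sequences, not real proteins.
proteins = {
    "kinase_like": "MKHRDLAARNVLDFGLAR",
    "decoy": "MKHRDLPYVLDFGHEE",
    "unrelated": "MAAAGGGSSS",
}
classes = triplet_classes(proteins)
hits = {name for name, seq in proteins.items()
        if matches(seq, required=("DFG", "HRD"), excluded=("LPY", "HEE"))}
print(sorted(classes["DFG"]), sorted(hits))
```

The resulting classes can then be passed to any enrichment test in the same way as GO terms.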

CCancer spider

CCancer spider implements the ‘Global Network’ statistical framework to analyze a gene list using, as reference knowledge, the global gene association network derived from the CCancer&PLIPS database. In total, the CCancer&PLIPS database has 5238 gene/protein lists reported in various functional contexts by independent studies. For each gene pair, the number of times k12 the two genes are reported together (in the same gene/protein list) is counted, as well as the number of times each gene is reported individually (k1, k2). The standard urn schema is used to derive significantly associated gene pairs. Let us denote the total number of gene/protein lists in the CCancer&PLIPS database as N (5238 at the moment). The value k12 follows a hypergeometric distribution with parameters N, k1 and k2 (k1 balls are drawn without replacement from an urn containing N balls in total, k2 of which are white). The P-value needs to be adjusted for multiple testing (each gene is tested versus all other genes); Bonferroni correction is used. Two genes are connected by an edge in the resulting global gene network used by CCancer spider if the significance of their association is <0.01.
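The urn schema can be sketched as below. This is an illustration under stated assumptions rather than the CCancer spider code: k1 and k2 are taken to be the number of lists containing each gene, the Bonferroni factor is taken to be the number of possible gene pairs, and the gene lists are toy data.

```python
from itertools import combinations
from math import comb


def cooccurrence_pvalue(k12, k1, k2, n_lists):
    """P(X >= k12) for a hypergeometric X: k1 balls drawn without
    replacement from n_lists balls, k2 of which are white."""
    upper = min(k1, k2)
    tail = sum(comb(k2, k) * comb(n_lists - k2, k1 - k)
               for k in range(k12, upper + 1))
    return tail / comb(n_lists, k1)


def association_edges(gene_lists, alpha=0.01):
    """Return {(gene1, gene2): adjusted P} for significantly co-reported pairs."""
    n_lists = len(gene_lists)
    gene_counts, pair_counts = {}, {}
    for genes in gene_lists:
        unique = sorted(set(genes))
        for gene in unique:
            gene_counts[gene] = gene_counts.get(gene, 0) + 1
        for pair in combinations(unique, 2):
            pair_counts[pair] = pair_counts.get(pair, 0) + 1
    # Assumption: Bonferroni factor = number of possible gene pairs.
    n_tests = comb(len(gene_counts), 2)
    edges = {}
    for (g1, g2), k12 in pair_counts.items():
        p = cooccurrence_pvalue(k12, gene_counts[g1], gene_counts[g2], n_lists)
        adjusted = min(1.0, p * n_tests)
        if adjusted < alpha:
            edges[(g1, g2)] = adjusted
    return edges


# Toy database: 8 lists report TP53 with MDM2; 32 lists pair GAPDH with a
# different placeholder gene each time.
gene_lists = [["TP53", "MDM2"]] * 8 + [["GAPDH", f"G{i}"] for i in range(32)]
edges = association_edges(gene_lists)
print(edges)
```

Only the pair that is repeatedly reported together survives the correction, while a gene that appears in many lists with ever-changing partners contributes no edges.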

RESULTS

BioProfiling.de (http://www.BioProfiling.de/gene_list) is a freely available analytical web portal, which provides a comprehensive analytical toolkit for the interpretation of gene/protein lists. In one submission, the gene list is analyzed by a collection of tools. BioProfiling.de has a simple user-friendly interface. As input, it accepts several types of gene or protein identifiers, such as ‘Entrez Gene’, ‘Gene Symbols’, ‘UniProt/Swiss-Prot’ (23), ‘IPI - International Protein Index’, ‘UniGene’, ‘Ensembl’ and ‘RefSeq’.

Data submission

To start the analyses, the user needs to upload a text file with gene/protein identifiers and select an organism. After data submission, a link is provided to the ‘Main Result page’. As soon as the computations are finished, the results will be available there. The user can either bookmark this page and return to it in 2–3 h or periodically refresh it. The submitted gene/protein ids are automatically mapped to ‘Entrez Gene’ ids. Gene id mapping is an inherently difficult problem. To avoid errors in the results related to mapping issues, we recommend submitting ‘Entrez Gene’ identifiers. We also suggest several resources (6,24,25) that primarily address the gene id mapping problem. The mapping report is provided first. If the number of recognized gene/protein ids is less than 10, the user will get an error message. Next, a table with a short description of the tools available for the submission is provided. Each line of the table corresponds to one tool. The first column of the table specifies the tool name and the second provides the status of the computations (or a link to the results of the tool, if the computations are finished). The third column provides a short summary of the tool: the statistical framework, the database of prior knowledge and the total number of genes covered/annotated in the database for the selected genome. After the computations are finished, the status ‘in progress’ is replaced with a link to the tool results (second column of the summary table). The structure of the output is the same for all ‘spider’ tools as well as for all ‘ProfCom’ tools. In the case of the ‘spider’ tools, the main output is summarized in the table ‘Enriched sub-networks’, where the details of the best sub-network models (D1, D2, D3) inferred from the submitted gene list are provided. In the case of the ‘ProfCom’ tools, the user initially gets a short summary table, which reports the top enriched complex classes of degree 0, 1, 2, 3. The last column in the table (‘full report’) provides links to the detailed reports of the ‘complex class’ of a given degree.

Example: Bosutinib protein targets

BioProfiling.de provides comprehensive functional profiling of a gene/protein list from various biological perspectives. The following example aims to demonstrate the wide spectrum of biological insights that one can get by using BioProfiling.de. Bosutinib is a novel drug (a promiscuous kinase inhibitor). The whole-proteome binding spectrum of Bosutinib was identified by chemical proteomics (26); in total, 55 proteins were reported to be direct Bosutinib interactors. Here, we used BioProfiling.de to understand the properties of Bosutinib protein targets. As one might expect, the list of Bosutinib protein targets was significantly enriched from many functional perspectives. Particularly interesting are the results produced by ProfCom_PROT_MOTIFS, a new tool in the BioProfiling.de collection. In this case, the logical combinations of amino acid triplets that are highly discriminative between the list of Bosutinib protein targets and the whole human proteome are reported. For example, the logical pattern ‘((DFG and HRD) not (LPY, HEE))’ was present in 50 (out of 55) Bosutinib protein targets, while only 305 (out of approximately 25 000) proteins in the whole genome comply with the pattern. The P-value of the enrichment adjusted by Bonferroni correction for multiple testing is 1.6e-77. In addition, results from the spider tools (PPI spider, R spider) suggest that Bosutinib protein targets form a dense interaction pattern. The result supports the novel ‘network pharmacology’ paradigm (27) in drug discovery: to be effective, a drug should hit multiple functionally dependent targets.
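To put the reported counts in perspective, the fold enrichment of the pattern and the number of matching targets expected by chance can be worked out directly. This is a back-of-the-envelope calculation from the counts quoted above, taking the approximate proteome size of 25 000 at face value; it is not part of the original analysis.

```latex
\[
\text{fold enrichment} = \frac{50/55}{305/25\,000} \approx \frac{0.909}{0.0122} \approx 74.5,
\qquad
\text{expected matches} = 55 \times \frac{305}{25\,000} \approx 0.67 .
\]
```

Fewer than one of the 55 targets would be expected to match the pattern by chance, whereas 50 do, which is consistent with the very small adjusted P-value.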

CONCLUSIONS

BioProfiling.de provides experimentalists with a comprehensive toolkit for gene/protein list interpretation. In one submission, the gene list is profiled with respect to the most information available regarding gene function (GO), pathway relations (KEGG database, Reactome knowledgebase), protein interactions (IntAct), in silico predicted gene to MiRNA associations (GeneSet2MiRNA), information collected by text mining (PLIPS and CCancer) and protein ‘amino acid triplet’ composition. BioProfiling.de implements two statistical frameworks (‘ProfCom’ and ‘Global Network’), which allow fast implementation of new tools capable of exploring novel biological principles (as prior knowledge) to group genes into functional classes or to associate genes by edges into a global gene network. In the future, the collection of tools will be expanded to cover novel biological principles for profiling gene/protein lists using either the ‘ProfCom’ or the ‘Global Network’ statistical framework. We would also like to point out that both statistical frameworks (‘ProfCom’, ‘Global Network’) are implemented only in BioProfiling.de tools. Although there are many tools for the functional profiling of gene/protein lists, several features of both frameworks distinguish BioProfiling.de from the other resources available.

FUNDING

This work was supported by the Helmholtz Association “Impuls und Vernetzungsfonds” (Systems Biology Alliance). Funding for open access charge: Helmholtz Zentrum München. Conflict of interest statement. None declared.

REFERENCES (10 of 27 listed)

1.  PLIPS, an automatically collected database of protein lists reported by proteomics studies.

Authors:  Alexey V Antonov; Sabine Dietmann; Philip Wong; Rodchenkov Igor; Hans W Mewes
Journal:  J Proteome Res       Date:  2009-03       Impact factor: 4.466

2.  PPI spider: a tool for the interpretation of proteomics data in the context of protein-protein interaction networks.

Authors:  Alexey V Antonov; Sabine Dietmann; Igor Rodchenkov; Hans W Mewes
Journal:  Proteomics       Date:  2009-05       Impact factor: 3.984

3.  Network pharmacology: the next paradigm in drug discovery.

Authors:  Andrew L Hopkins
Journal:  Nat Chem Biol       Date:  2008-11       Impact factor: 15.040

4.  Acid elution and one-dimensional shotgun analysis on an Orbitrap mass spectrometer: an application to drug affinity chromatography.

Authors:  Nora V Fernbach; Melanie Planyavsky; André Müller; Florian P Breitwieser; Jacques Colinge; Uwe Rix; Keiryn L Bennett
Journal:  J Proteome Res       Date:  2009-10       Impact factor: 4.466

5.  R spider: a network-based analysis of gene lists by combining signaling and metabolic pathways from Reactome and KEGG databases.

Authors:  Alexey V Antonov; Esther E Schmidt; Sabine Dietmann; Maria Krestyaninova; Henning Hermjakob
Journal:  Nucleic Acids Res       Date:  2010-06-02       Impact factor: 16.971

6.  CCancer: a bird's eye view on gene lists reported in cancer-related studies.

Authors:  Sabine Dietmann; Wanseon Lee; Philip Wong; Igor Rodchenkov; Alexey V Antonov
Journal:  Nucleic Acids Res       Date:  2010-06-06       Impact factor: 16.971

7.  MADGene: retrieval and processing of gene identifier lists for the analysis of heterogeneous microarray datasets.

Authors:  Daniel Baron; Audrey Bihouée; Raluca Teusan; Emeric Dubois; Frédérique Savagner; Marja Steenman; Rémi Houlgatte; Gérard Ramstein
Journal:  Bioinformatics       Date:  2011-01-06       Impact factor: 6.937

8.  GeneSet2miRNA: finding the signature of cooperative miRNA activities in the gene lists.

Authors:  Alexey V Antonov; Sabine Dietmann; Philip Wong; Dominik Lutter; Hans W Mewes
Journal:  Nucleic Acids Res       Date:  2009-05-06       Impact factor: 16.971

9.  KEGG spider: interpretation of genomics data in the context of the global gene metabolic network.

Authors:  Alexey V Antonov; Sabine Dietmann; Hans W Mewes
Journal:  Genome Biol       Date:  2008-12-18       Impact factor: 13.583

10.  The IntAct molecular interaction database in 2010.

Authors:  B Aranda; P Achuthan; Y Alam-Faruque; I Armean; A Bridge; C Derow; M Feuermann; A T Ghanbarian; S Kerrien; J Khadake; J Kerssemakers; C Leroy; M Menden; M Michaut; L Montecchi-Palazzi; S N Neuhauser; S Orchard; V Perreau; B Roechert; K van Eijk; H Hermjakob
Journal:  Nucleic Acids Res       Date:  2009-10-22       Impact factor: 16.971
