
ENDEAVOUR update: a web resource for gene prioritization in multiple species.

Léon-Charles Tranchevent, Roland Barriot, Shi Yu, Steven Van Vooren, Peter Van Loo, Bert Coessens, Bart De Moor, Stein Aerts, Yves Moreau.

Abstract

Endeavour (http://www.esat.kuleuven.be/endeavourweb; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes. Using a training set of genes known to be involved in a biological process of interest, our approach consists of (i) inferring several models (based on various genomic data sources), (ii) applying each model to the candidate genes to rank those candidates against the profile of the known genes and (iii) merging the several rankings into a global ranking of the candidate genes. In the present article, we describe the latest developments of Endeavour. First, we provide a web-based user interface, in addition to our Java client, to make Endeavour more universally accessible. Second, we support multiple species: in addition to Homo sapiens, we now provide gene prioritization for three major model organisms: Mus musculus, Rattus norvegicus and Caenorhabditis elegans. Third, Endeavour makes use of additional data sources and now includes numerous databases: ontologies and annotations, protein-protein interactions, cis-regulatory information, gene expression data sets, sequence information and text-mining data. We tested the new version of Endeavour on 32 recent disease gene associations from the literature. Additionally, we describe a number of recent independent studies that made use of Endeavour to prioritize candidate genes for obesity and Type II diabetes, cleft lip and cleft palate, and pulmonary fibrosis.

Year:  2008        PMID: 18508807      PMCID: PMC2447805          DOI: 10.1093/nar/gkn325

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


BACKGROUND

With the recent improvements in high-throughput technologies, many organisms have seen their genomes sequenced and, more importantly, annotated. This process leads to the generation of a large amount of genomic data and the creation and maintenance of corresponding databases. However, converting genomic data into biological knowledge to identify genes involved in a particular process or disease remains a major challenge. Nevertheless, there is much evidence to suggest that functionally related genes often cause similar phenotypes (1–3). To identify which genes are responsible for which phenotype, association studies and linkage analyses are often used, resulting in large lists of candidate genes. In many cases, the list of candidates can be narrowed down to a few dozen. However, it is generally too expensive and time-consuming to perform experimental validation for all these candidates. Therefore, these candidates may be prioritized to first validate the best ones. Given the amount of genomic data publicly available, it is often prohibitive to perform the prioritization manually and consequently, there is a need for computational approaches. During the past 5 years, the bioinformatics community has developed several strategies to address this question, and several tools are available online (4,5). To our knowledge, all the tools use the concept of similarity. It is based on the assumption that similar phenotypes are caused by genes with similar or related functions (1–3). However, the tools differ by the strategy they adopt in calculating the similarity (either between the candidate genes and the phenotypes or between the candidate genes and the training genes) and by the data sources they use. The most commonly used data sources are text-mining data, gene expression data and sequence information. Additionally, phenotypic data, protein–protein interactions, ontologies and cis-regulatory information are sometimes included. 
However, most of the existing approaches focus on the combination of only a few data sources. For instance, the CGI method proposed by Ma et al. (6) combines gene expression and protein interaction data. Several methods rely only on literature and ontologies: BITOLA (7), POCUS (8), Gentrepid (9), G2D (10) and the method defined by Tiffin et al. (11). In contrast, systems that use more data sources have recently been designed, such as CAESAR (12), GeneSeeker (13), SUSPECTS (14), TOM (15) and Endeavour (16). For a more detailed description of the available tools, see the reviews by Oti and Brunner (5) or by Zhu and Zhao (4). We previously presented the concept of gene prioritization through genomic data fusion and its implementation called Endeavour (16). This tool requires two inputs: the training genes, already known to be involved in the process under study, and the candidate genes to prioritize. Endeavour produces one output: the prioritized list of candidate genes, along with the rankings per data source. The algorithm is made up of three stages, called the training, scoring and fusion stages. In the training stage, Endeavour uses the training genes provided by the user to infer several models, one per data source. For example, with ontology-based data sources, genes are annotated with several terms and, reciprocally, one term can be associated with several genes. The algorithm selects only the significant terms, that is, the ones that are over-represented in the training set compared to the complete genome. Hence, the model consists of these significant terms together with their corresponding P-values, which reflect the significance of the enrichment. In the scoring stage, the model is used to score the candidate genes and rank them according to their score. For ontologies, the algorithm scores each candidate independently by combining the P-values of those of its associated terms that are also present in the model.
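The training and scoring stages for an ontology-based data source can be sketched as follows. This is only an illustration under stated assumptions, not the Endeavour implementation: the text does not say which enrichment test or which P-value combination is used, so the sketch assumes a one-sided hypergeometric test, a hypothetical significance cutoff `alpha`, and a Fisher-style combination (minus twice the sum of log P-values). The function names are hypothetical as well.

```python
from math import comb, log


def enrichment_p_value(k, n, K, N):
    """P(X >= k) for a hypergeometric draw: k of the n training genes carry
    a term that annotates K of the N genes in the genome."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def train_model(training, annotations, alpha=0.05):
    """Training stage (assumed form): keep the terms over-represented in the
    training set, together with their enrichment P-values.
    `annotations` maps every gene of the genome to its set of terms."""
    genome_size = len(annotations)
    term_counts = {}
    for terms in annotations.values():
        for term in terms:
            term_counts[term] = term_counts.get(term, 0) + 1
    model = {}
    for term, K in term_counts.items():
        k = sum(1 for g in training if term in annotations.get(g, ()))
        if k == 0:
            continue
        p = enrichment_p_value(k, len(training), K, genome_size)
        if p <= alpha:  # alpha is an assumed cutoff, not taken from the article
            model[term] = p
    return model


def score_candidate(gene, annotations, model):
    """Scoring stage (assumed form): combine the P-values of the candidate's
    terms that are present in the model; a higher score is better."""
    p_values = [model[t] for t in annotations.get(gene, ()) if t in model]
    if not p_values:
        return 0.0
    return -2.0 * sum(log(p) for p in p_values)
```

Candidates would then be sorted by decreasing score to obtain the ranking for this one data source.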
The scores are then used to rank the candidates based on this one data source. In the final stage, the rankings per data source are fused into one global ranking using order statistics. Among the existing methods, the order statistics has the advantage of avoiding penalizing genes that are absent from a given data source. Indeed, the genomic data sources are almost always incomplete. For instance, some genes do not have any ontology annotations, while other genes do not have their corresponding probes spotted on the microarray platform for which data is available. The order statistics allows us to combine the rankings per data source, taking missing values into account. Thus, the use of ‘unbiased’ data sources (e.g. gene expression data, cis-regulatory motifs and protein sequences), together with the use of the order statistics, allows us to obtain results that are not overly biased towards the most studied genes (16). The use of several data sources is indeed an important strength of our approach: combining two data sources, although possibly incomplete, can be more powerful than either individual data source, as shown by our validation experiments (16). The fact that our approach does not rely only on a single data source also reinforces its robustness to noisy data sources like microarray data. More details about the training and scoring methods, the data sources and the order statistics can be found in Supplementary Tables 1 and 2 and in Supplementary Note 1. In the present article, we describe a novel intuitive web interface in addition to the original Java client. Furthermore, three major model organisms have been added to the application: M. musculus, R. norvegicus and C. elegans (Danio rerio and Drosophila melanogaster versions will be made available in 2008). 
Finally, novel data sources have been integrated including numerous protein–protein interaction databases and large species-specific expression data sets, bringing the number of available data sources to 26. Apart from our extensive validation (16), other recent independent publications confirm that Endeavour is efficient in identifying novel disease genes. Indeed, Endeavour was recently applied to analyze the adipocyte proteome (17) and to propose novel genes involved in Type II diabetes (18), cleft lip and cleft palate phenotypes (19), and pulmonary fibrosis (20).
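The fusion stage described above can be illustrated with a small sketch. It assumes one common formulation of the joint cumulative order statistic, computed with a recursive formula from the order-statistics literature; whether this matches the exact computation used by Endeavour should be checked against Supplementary Note 1. Rank ratios are ranks divided by the number of ranked genes, and a missing value (`None`) means the gene is absent from that data source and is simply left out rather than penalized. The function names are hypothetical.

```python
from math import factorial


def q_statistic(rank_ratios):
    """Joint cumulative order statistic Q of the available rank ratios.

    Uses the recursion V_k = sum_{i=1..k} (-1)^(i-1) * V_{k-i} / i! * r_{N-k+1}^i
    with V_0 = 1 and Q = N! * V_N, where r_1 <= ... <= r_N are the sorted ratios.
    """
    r = sorted(x for x in rank_ratios if x is not None)  # drop missing sources
    n = len(r)
    if n == 0:
        return None  # gene absent from every data source
    v = [1.0]
    for k in range(1, n + 1):
        r_k = r[n - k]  # r_{N-k+1} in 1-based notation
        v_k = sum(
            (-1) ** (i - 1) * v[k - i] / factorial(i) * r_k ** i
            for i in range(1, k + 1)
        )
        v.append(v_k)
    return factorial(n) * v[n]


def fuse_rankings(rank_ratios_by_gene):
    """Order genes by increasing Q; genes without any data go last."""
    q = {g: q_statistic(r) for g, r in rank_ratios_by_gene.items()}
    return sorted(q, key=lambda g: (q[g] is None, q[g] if q[g] is not None else 0.0))
```

For two data sources the recursion reduces to Q = 2·r1·r2 − r1², so rank ratios of 0.1 and 0.5 give Q = 0.09. Note that Q values computed from different numbers of available rankings are not directly comparable; the calibration needed for that comparison is not shown in this sketch.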

OUTLINE OF THE Endeavour WEB SERVER

Endeavour was first implemented as a Java client application interacting with a SOAP server and a MySQL database. To make it more universally accessible, we have developed a PHP-based web interface that runs in the most common web browsers, without the need for Java to be installed. It is freely accessible and there is no login requirement. A four-step wizard guides the user through the preparation of the prioritization (Figure 1). The first step is to choose the organism: human, rat, mouse or worm. The second step is to specify the training set. The user can input a mixture of chromosomal bands, chromosomal intervals, gene symbols, EnsEMBL (21) gene identifiers, KEGG (22) identifiers, Gene Ontology (23) identifiers or OMIM (24) disease names. Each input has to be prefixed according to its type. The rules are explained in the Supplementary material and in the online manual. The genes corresponding to the input are retrieved and loaded into the application. The third step is to select the data sources to be used. The data sources available depend on the organism chosen in the first step. Some of these are species specific (e.g. gene expression data sets) while others are more generic (e.g. Gene Ontology annotations). The last step lets the user specify the candidate genes, applying the same rules as in the second step. The user launches the prioritization by using a dedicated button. The computation time depends on the number of data sources used, the number of candidates and the load on our servers. The application can handle the prioritization of hundreds of genes (e.g. the average computation time for 400 candidates using 10 data sources is 19.14 s over 100 repeats). Warnings and errors, such as unrecognized gene identifiers, are displayed in the console located in the middle of the main window. The results are displayed at the bottom of the main page in three panels.
The first panel contains the sprint plot, a graphical representation of the rankings with one column per data source plus an additional one for the global ranking. The genes are represented as boxes and the top ranking boxes are coloured for better interpretation of the results. The second panel contains the raw scores and ranks for each gene in each data source. The user can sort the columns according to the global ranking or to any ranking per data source. The third panel allows one to export the results as a TSV spreadsheet or as an XML file. The user can also save the sprint plot using several picture formats (i.e. PNG, JPG and GIF).
Figure 1.

Endeavour: the algorithm behind the wizard. Once the organism of interest is chosen (Step 1), the user can specify the training genes (Step 2). Step 3 lets the user select the data sources that will be used to build the models. The models summarize the training gene information. The candidate genes specified by the user in Step 4 are then scored against the model. This produces one ranking per data source plus one global ranking obtained by fusion of the rankings per data source. The global ranking together with the rankings per data source are returned to the application and can be viewed in the ‘Results’ panel.


NEW MODEL ORGANISMS AND MORE DATA SOURCES

Endeavour is designed as a generic prioritization tool and is equally useful for the prioritization of candidate disease genes as for candidate members of biological pathways and processes. This is illustrated in our previous publication (16) where we used Endeavour to identify downstream genes of myeloid differentiation. Since the fundamental study of biological processes is predominantly performed in model organisms, we decided to extend our framework to several model organisms. Currently, gene prioritization can be performed for M. musculus, R. norvegicus and C. elegans, and we are also developing the versions for D. rerio and D. melanogaster. We have designed the web server so that the organism-specific versions use the same method for each generic data source (e.g. Gene Ontology annotations). The key strength of Endeavour resides in the fact that a lot of data sources are available and the user can select the ones that best correspond to the biological question under study. There are 8, 11, 12 and 20 data sources available, respectively, for R. norvegicus, C. elegans, M. musculus and H. sapiens, which, in total, result in 26 distinct data sources. They can be classified into six categories: ontologies, interactions, expression, regulatory information, sequence data and text-mining data. Ontologies are structured vocabularies that are used to describe the function of the gene products. Ontologies give more insight on the molecular functions performed [Gene Ontology (23) and SwissProt (25)], on the biological processes involved in [Gene Ontology and KEGG (22)], on the cellular components in which the gene products are active (Gene Ontology) and on the active domains of the proteins [InterPro (26)]. Interaction data come from databases that collect pairs of proteins that interact either physically or genetically. 
BIND (27) and DIP (28) curate the experimentally determined interactions collected from large-scale interaction and mapping experiments done using yeast two hybrid, mass spectrometry, genetic interactions and phage display. MINT (29) and MIPS (30) mine the literature, either manually or automatically, to find experimentally verified protein interactions. HPRD (31) does the same with an emphasis on domain architecture, post-translational modifications, interaction networks and disease association. IntAct (32) and BioGrid (33) collect physical and genetic interactions by combining analysis of high-throughput experiments and literature curation. STRING (34) and IntNetDb (35) are large databases that contain all kinds of interactions. They rely on a statistical framework to integrate data coming from numerous experiments and databases (including several databases described above), and, additionally, the interactions are transferred across the different organisms, when applicable.
Regarding the expression data, the preferred studies are the ones that include a large number of tissues and a large number of genes. Two sets are available for H. sapiens [Su et al. (36) and Son et al. (37)], three for M. musculus [Su et al. (36), Hovatta et al. (38) and Lindsley et al. (39)] and one each for R. norvegicus and C. elegans, from the Walker et al. paper (40) and the Baugh et al. study (41), respectively. Additionally, anatomical expressed sequence tag (EST) expression data from EnsEMBL (21) are available for human.
Regarding the cis-regulatory data, we currently only have information for H. sapiens. Using the Toucan toolbox (42) and the upstream sequence of the genes, the algorithm looks for putative motifs and modules (combination of five motifs).
There are two data sources that are based on sequences: the protein sequence similarities and the disease probabilities. For the latter, Lopez-Bigas et al. (43) and Adie et al. (44) (ProspectR) used sequence features (e.g. length of the sequence, length of the UTRs, number of introns, length of the introns) and a statistical framework to discriminate the human disease-causing genes from the rest of the genome. Next, they associated with every gene an a priori probability of being a disease-causing gene. As for sequence similarity, an all-against-all similarity search is performed for all organisms using NCBI BLAST (45).
The data source based on literature mining relies on the TxtGate framework (46). The strategy is to screen the abstracts from PubMed (47) with a manually curated vocabulary based on Gene Ontology. Similarly to the ontologies described above, it provides more information on the molecular functions and biological processes of the genes. It is important to note that, except for the regulatory information category, each organism is provided with at least one data source per category.
As an alternative to the novel web-based application, one can use the original Java Web Start client, which has also been extended to include the other model organisms. This application includes a few additional features, such as a full description of the models created, a full genome screening service in which the whole genome of the given organism can be prioritized and the possibility for users to make use of their own microarray data sets. A SOAP service is also available to allow integration in workflows [e.g. when using Taverna (48) or Kepler (49)].

SOFTWARE DOCUMENTATION

Endeavour comes with an online manual. A subsection describes the concept of gene prioritization through genomic data fusion. Another subsection contains the answers to frequently asked questions and gives more details on how to perform a prioritization and how to interpret the results. Finally, a step-by-step example is given together with the corresponding screenshots. The application is provided with three use cases taken from the literature. The user can run the examples by clicking on the corresponding buttons situated above the wizard that cause the training genes, the data sources and the candidate genes to be loaded automatically into the application. Then, the user can quickly go through the four steps and launch the prioritization process. The three use cases can be used as a first step to understand the mechanisms of Endeavour. The first example is derived from our previous publication in which we studied the DiGeorge syndrome (16). This example shows why YPEL1 was first selected for wet lab experiments that eventually confirmed the phenotypic association in zebrafish. The second example is taken from the Elbers et al. (18) review on obesity and Type II diabetes. They have prioritized five susceptibility loci to reveal a molecular link between the two disorders. Endeavour uncovered the susceptibility loci located on chromosome 11 for this example. It contains KCNJ5, a homolog of KCNJ11 that is known to contribute to the risk of Type II diabetes. We have built the last example after Ebermann et al. (50) published their discovery of a novel Usher gene, DFNB31, that encodes the whirlin protein. By using data six months prior to the publication, we made sure that the association was not yet present in the databases. Among the 32 candidates of the chromosomal band 9q32, DFNB31 ranked first, showing that, retrospectively, it was indeed a good candidate.

VALIDATION

As in our previous work (16), we statistically validated the approach with a standard leave-one-out cross-validation using known gene sets. We produced the corresponding receiver operating characteristic (ROC) curves and measured the performance by calculating the area under the curve (AUC) (Figure 2). Here, we focused on pathway gene prioritization for the newly added species by applying this scheme to three signalling pathways taken from the Gene Ontology database (23). These pathways are common to the four organisms and involve, respectively, 193, 170, 126 and 44 genes for H. sapiens, M. musculus, R. norvegicus and C. elegans. We performed both a fair validation and a complete validation. For the fair validation, we excluded the data sources that might explicitly contain the gene-pathway association (i.e. Gene Ontology, KEGG, STRING and Text), while all data sources were used for the complete validation. The first observation is that the performance of the four control validations stays close to the theoretical expectation of 50% (respectively, 48, 39, 45 and 51%). This means that when using randomly generated gene sets for training, we obtain random results. In contrast, the performance of biologically meaningful sets is much higher (respectively, 88, 92, 90 and 86% for the fair validation and 99, 99, 99 and 98% for the complete validation). An analysis per data source of the fair validation reveals that the global performance (e.g. 88% for human) is always higher than the performance of the best performing data source (e.g. 78% for human InterPro). This shows that our data fusion approach is sound and that it is crucial to make use of complementary data sources. Altogether, this indicates that our approach, based on the assumption that functionally related genes often cause similar phenotypes, can be applied successfully.
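A leave-one-out cross-validation of this kind can be sketched as follows. The details are assumptions made for illustration: each gene of a known set is held out in turn and hidden among randomly drawn genes (99 by default here), `prioritize` is a hypothetical callable standing in for the prioritization engine and returning the candidates best first, and the AUC is summarized as the mean, over runs, of the fraction of negatives ranked below the held-out gene.

```python
import random


def leave_one_out_ranks(gene_set, genome, prioritize, n_random=99, seed=0):
    """Rank of each held-out gene among itself plus n_random random genes."""
    rng = random.Random(seed)
    known = set(gene_set)
    background = [g for g in genome if g not in known]
    ranks = []
    for held_out in gene_set:
        training = [g for g in gene_set if g != held_out]
        candidates = rng.sample(background, n_random) + [held_out]
        ranking = prioritize(training, candidates)  # hypothetical engine, best first
        ranks.append(ranking.index(held_out) + 1)
    return ranks


def mean_auc(ranks, n_candidates):
    """Mean per-run AUC: with a single positive at rank r among n candidates,
    the area under the ROC curve of that run is (n - r) / (n - 1)."""
    return sum((n_candidates - r) / (n_candidates - 1) for r in ranks) / len(ranks)
```

With this summary, a held-out gene that always ranks first gives an AUC of 1.0, and ranks spread uniformly over the candidate list give about 0.5, which is the random expectation mentioned in the text.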
Figure 2.

Results of the leave-one-out cross-validation. For each organism, the leave-one-out cross-validation was performed on three pathway sets from Gene Ontology (23), and, as a control, on five sets of 20 randomly selected genes. The ROC curves of the random (dotted green) and pathway validation (solid red and dashed blue) are plotted for (a) H. sapiens, (b) M. musculus, (c) R. norvegicus and (d) C. elegans. Note that for the fair validation (dashed blue), Gene Ontology, KEGG, Text and String were excluded, while all data sources were used for the complete validation (solid red). The AUCs of the control validations are, respectively, 48, 39, 45 and 51%, indicating a random performance. In contrast, the AUCs of the pathway validations are, respectively, 88, 92, 90 and 86% for the fair validation and 99, 99, 99 and 98% for the complete validation, showing the validity of our approach.

A difficulty of validating gene prioritization methods is the fact that known data are used for the ranking. In other words, for every disease or pathway gene, the link between the disease and the gene is described in the literature and sometimes evidence is also present in the ontologies or in the interaction information. Therefore, we excluded in the above analysis the data sources that contain explicit information about the similarity of the true positive to the training set. To assess the full performance of Endeavour to solve real biological cases, using all data sources, we therefore focused on genetic disorders for which associations were reported very recently in the literature, so that the explicit information is not yet present in our data. Particularly, we used gene–disease associations that were reported in Nature Genetics after 1 January 2008 (Table 1), 32 in total. For each disorder, we built a training set containing all the genes already known to play a role in that disorder according to the OMIM and Gene Ontology databases (both downloaded in August 2007). 
As candidate genes to be ranked we used the true positive gene together with 99 genes that flank the true positive in the genome. These regions were then prioritized with Endeavour using all data sources and their specific training sets. The results are presented in Table 1. Interestingly, BANK1, CTRC and SORT1 rank first out of their region and GDF5, RGS1 and SH2B3 rank second. All genes but four are within the top 20% and half of them are within the top 9%.
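The construction of such a candidate region can be sketched as a window over the genes of a chromosome listed in genomic order. This is an assumption-laden illustration: the text does not specify how the 99 flanking genes are split between the two sides or how chromosome ends are handled, so the sketch centres the window on the true positive where possible and shifts it inward near the ends. The function name is hypothetical.

```python
def flanking_window(ordered_genes, target, size=100):
    """Return `size` consecutive genes (in genomic order) containing `target`."""
    i = ordered_genes.index(target)
    half = (size - 1) // 2          # genes taken upstream of the target
    start = max(0, i - half)
    end = start + size
    if end > len(ordered_genes):    # shift the window back at the chromosome end
        end = len(ordered_genes)
        start = max(0, end - size)
    return ordered_genes[start:end]
```

The returned list, which always includes the true positive, would then be submitted as the candidate set.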
Table 1.

Results of the thirty two genetic disorder prioritizations

Gene | Disorder | Reference | Endeavour rank
BANK1 | Systemic lupus erythematosus | Kozyrev et al. (51) | 1
ITGAM | Systemic lupus erythematosus | Nath et al. (52) | 3
TNFSF4 | Systemic lupus erythematosus | Graham et al. (53) | 16
DPP6 | Amyotrophic lateral sclerosis | van Es et al. (54) | 15
CTRC | Chronic pancreatitis | Rosendahl et al. (55) | 1
ATP6V0A2 | Impaired glycosylation | Kornak et al. (56) | 5
ATP6V0A2 | Cutis laxa | Kornak et al. (56) | 5
GALNT2 (a) | LDL/HDL cholesterol | Willer et al. (57), Kathiresan et al. (58) | 13
SORT1 (a) | LDL/HDL cholesterol | Willer et al. (57), Kathiresan et al. (58) | 1
MLXIPL (a) | LDL/HDL cholesterol | Willer et al. (57), Kathiresan et al. (58), Kooner et al. (59) | 12
GDF5 (a) | Human height | Sanna et al. (60) | 2
C20orf44 (a) | Human height | Sanna et al. (60) | 41
MSMB (a) | Prostate cancer | Eeles et al. (61), Thomas et al. (62) | 18
JAZF1 (a) | Prostate cancer | Thomas et al. (62) | 14
CTBP2 (a) | Prostate cancer | Thomas et al. (62) | 4
LMTK2 (a) | Prostate cancer | Eeles et al. (61) | 4
KLK3 (a) | Prostate cancer | Eeles et al. (61) | 9
CPNE3 (a) | Prostate cancer | Thomas et al. (62) | 42
IL16 (a) | Prostate cancer | Thomas et al. (62) | 9
CDH23 (a) | Prostate cancer | Thomas et al. (62) | 40
EHBP1 (a) | Prostate cancer | Gudmundsson et al. (63) | 19
CCR3 (a) | Celiac disease | Hunt et al. (64) | 12
RGS1 (a) | Celiac disease | Hunt et al. (64) | 2
LPP (a) | Celiac disease | Hunt et al. (64) | 30
TAGAP (a) | Celiac disease | Hunt et al. (64) | 3
SH2B3 (a) | Celiac disease | Hunt et al. (64) | 2
IL12A (a) | Celiac disease | Hunt et al. (64) | 18
SCHIP1 (a) | Celiac disease | Hunt et al. (64) | 20
IL18R1 (a) | Celiac disease | Hunt et al. (64) | 3
IL18RAP (a) | Celiac disease | Hunt et al. (64) | 4
IL2 (a) | Celiac disease | Hunt et al. (64) | 10
IL21 (a) | Celiac disease | Hunt et al. (64) | 14
Mean (all genes) | | | 12.25
Mean (GWAS excluded) | | | 6.57

(a) Associations reported with GWAS (Genome Wide SNPs Associations Studies).

The gene-disease associations were reported in Nature Genetics after 1 January 2008 to exclude the presence of explicit evidence in our data sources. The training sets were built with OMIM and Gene Ontology; and the candidate regions contain the novel gene and its 99 nearest neighbours. The 20 human data sources were used to perform the prioritizations. The results show that Endeavour ranked all the novel genes but four within the top 20%, and half of them within the top 9%.
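The summary figures quoted for Table 1 can be recomputed directly from the ranks. The short sketch below transcribes the ranks and the GWAS footnote marker from the table (each region holds 100 candidates, so a rank is also a percentile); the variable names are only illustrative.

```python
from statistics import mean

# (gene, Endeavour rank, association reported with GWAS), transcribed from Table 1
TABLE1 = [
    ("BANK1", 1, False), ("ITGAM", 3, False), ("TNFSF4", 16, False),
    ("DPP6", 15, False), ("CTRC", 1, False), ("ATP6V0A2", 5, False),
    ("ATP6V0A2", 5, False), ("GALNT2", 13, True), ("SORT1", 1, True),
    ("MLXIPL", 12, True), ("GDF5", 2, True), ("C20orf44", 41, True),
    ("MSMB", 18, True), ("JAZF1", 14, True), ("CTBP2", 4, True),
    ("LMTK2", 4, True), ("KLK3", 9, True), ("CPNE3", 42, True),
    ("IL16", 9, True), ("CDH23", 40, True), ("EHBP1", 19, True),
    ("CCR3", 12, True), ("RGS1", 2, True), ("LPP", 30, True),
    ("TAGAP", 3, True), ("SH2B3", 2, True), ("IL12A", 18, True),
    ("SCHIP1", 20, True), ("IL18R1", 3, True), ("IL18RAP", 4, True),
    ("IL2", 10, True), ("IL21", 14, True),
]

mean_all = mean(rank for _, rank, _ in TABLE1)
mean_non_gwas = mean(rank for _, rank, gwas in TABLE1 if not gwas)
outside_top20 = [gene for gene, rank, _ in TABLE1 if rank > 20]
within_top9 = [gene for gene, rank, _ in TABLE1 if rank <= 9]

print(f"mean rank, all genes: {mean_all:.2f}")
print(f"mean rank, GWAS excluded: {mean_non_gwas:.2f}")
print(f"outside the top 20%: {len(outside_top20)} of {len(TABLE1)}")
print(f"within the top 9%: {len(within_top9)} of {len(TABLE1)}")
```

The counts agree with the caption: four genes (C20orf44, CPNE3, CDH23 and LPP) fall outside the top 20%, and 16 of the 32 entries rank ninth or better.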

Others have used our gene prioritization tool as well. Elbers et al. (18) have used Endeavour in combination with other prioritization tools to define the best strategy to search for common obesity and Type II diabetes genes. They suggest a list of genes indicated as potential candidates by at least two of the six tools. Tzouvelekis et al. (20) have used Endeavour to prioritize a list of genes differentially expressed in idiopathic pulmonary fibrosis. They consistently find that among the top candidates, five and seven genes are targets of, respectively, tumor necrosis factor (TNF) and transforming growth factor (TGF). Osoegawa et al. (19) applied Endeavour to propose novel genes associated with cleft lip and cleft palate phenotypes. They analysed 83 syndromic cases and 104 non-syndromic cases and concluded that estrogen receptor 1 (ESR1) and fibroblast growth factor receptor 2 (FGFR2) were the most likely candidates, respectively, from region 6q25.1-25.2 and region 10q26.11-26.13. Using mass spectrometry and bioinformatics, Adachi et al. (17) explored the proteome of the adipocyte, a central player in energy metabolism. Using Endeavour, they were able to associate a number of factors with vesicle transport in response to insulin stimulation, which is a key function of adipocytes.

CONCLUSION

Endeavour is a web server that allows users to prioritize candidate genes with respect to their biological processes or diseases of interest. It is provided with an intuitive four-step wizard and an online manual. It is available for four organisms (H. sapiens, M. musculus, R. norvegicus and C. elegans). Endeavour relies on the similarity between the candidates and the models built with the training genes. The approach has been validated experimentally (16), by extensive leave-one-out cross-validations, and by analysis of recently reported cases from the literature. Additionally, several independent laboratories have used Endeavour to propose novel disease genes [Elbers et al. (18) and Osoegawa et al. (19)] or to optimize the analysis of medium-throughput experiments [Tzouvelekis et al. (20) and Adachi et al. (17)]. Importantly, the cross-validation revealed the added value of combining several complementary data sources. With 26 distinct data sources (51 in total) covering most aspects of the knowledge available on genes and gene products (functional annotations, protein interactions, expression profiles, regulatory information, sequence-based data and literature mining), Endeavour exploits the most comprehensive collection of publicly available knowledge.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.
  63 in total

1.  A computational system to select candidate genes for complex human traits.

Authors:  Kyle J Gaulton; Karen L Mohlke; Todd J Vision
Journal:  Bioinformatics       Date:  2007-01-19       Impact factor: 6.937

2.  CGI: a new approach for prioritizing genes by combining gene expression and protein-protein interaction data.

Authors:  Xiaotu Ma; Hyunju Lee; Li Wang; Fengzhu Sun
Journal:  Bioinformatics       Date:  2006-11-10       Impact factor: 6.937

3.  In-depth analysis of the adipocyte proteome by mass spectrometry and bioinformatics.

Authors:  Jun Adachi; Chanchal Kumar; Yanling Zhang; Matthias Mann
Journal:  Mol Cell Proteomics       Date:  2007-04-04       Impact factor: 5.911

4.  Genetic variation in DPP6 is associated with susceptibility to amyotrophic lateral sclerosis.

Authors:  Michael A van Es; Paul W J van Vught; Hylke M Blauw; Lude Franke; Christiaan G J Saris; Ludo Van den Bosch; Sonja W de Jong; Vianney de Jong; Frank Baas; Ruben van't Slot; Robin Lemmens; Helenius J Schelhaas; Anna Birve; Kristel Sleegers; Christine Van Broeckhoven; Jennifer C Schymick; Bryan J Traynor; John H J Wokke; Cisca Wijmenga; Wim Robberecht; Peter M Andersen; Jan H Veldink; Roel A Ophoff; Leonard H van den Berg
Journal:  Nat Genet       Date:  2007-12-16       Impact factor: 38.330

5.  The human disease network.

Authors:  Kwang-Il Goh; Michael E Cusick; David Valle; Barton Childs; Marc Vidal; Albert-László Barabási
Journal:  Proc Natl Acad Sci U S A       Date:  2007-05-14       Impact factor: 11.205

6.  Comparative expression profiling in pulmonary fibrosis suggests a role of hypoxia-inducible factor-1alpha in disease pathogenesis.

Authors:  Argyris Tzouvelekis; Vaggelis Harokopos; Triantafillos Paparountas; Nikos Oikonomou; Aristotelis Chatziioannou; George Vilaras; Evangelos Tsiambas; Andreas Karameris; Demosthenes Bouros; Vassilis Aidinis
Journal:  Am J Respir Crit Care Med       Date:  2007-08-29       Impact factor: 21.405

7.  Polymorphism at the TNF superfamily gene TNFSF4 confers susceptibility to systemic lupus erythematosus.

Authors:  Deborah S Cunninghame Graham; Robert R Graham; Harinder Manku; Andrew K Wong; John C Whittaker; Patrick M Gaffney; Kathy L Moser; John D Rioux; David Altshuler; Timothy W Behrens; Timothy J Vyse
Journal:  Nat Genet       Date:  2007-12-02       Impact factor: 38.330

8.  Chymotrypsin C (CTRC) variants that diminish activity or secretion are associated with chronic pancreatitis.

Authors:  Jonas Rosendahl; Heiko Witt; Richárd Szmola; Eesh Bhatia; Béla Ozsvári; Olfert Landt; Hans-Ulrich Schulz; Thomas M Gress; Roland Pfützer; Matthias Löhr; Peter Kovacs; Matthias Blüher; Michael Stumvoll; Gourdas Choudhuri; Péter Hegyi; René H M te Morsche; Joost P H Drenth; Kaspar Truninger; Milan Macek; Gero Puhl; Ulrike Witt; Hartmut Schmidt; Carsten Büning; Johann Ockenga; Andreas Kage; David Alexander Groneberg; Renate Nickel; Thomas Berg; Bertram Wiedenmann; Hans Bödeker; Volker Keim; Joachim Mössner; Niels Teich; Miklós Sahin-Tóth
Journal:  Nat Genet       Date:  2007-12-02       Impact factor: 38.330

9.  IntAct--open source resource for molecular interaction data.

Authors:  S Kerrien; Y Alam-Faruque; B Aranda; I Bancarz; A Bridge; C Derow; E Dimmer; M Feuermann; A Friedrichsen; R Huntley; C Kohler; J Khadake; C Leroy; A Liban; C Lieftink; L Montecchi-Palazzi; S Orchard; J Risse; K Robbe; B Roechert; D Thorneycroft; Y Zhang; R Apweiler; H Hermjakob
Journal:  Nucleic Acids Res       Date:  2006-12-01       Impact factor: 16.971

Review 10.  Candidate gene identification approach: progress and challenges.

Authors:  Mengjin Zhu; Shuhong Zhao
Journal:  Int J Biol Sci       Date:  2007-10-25       Impact factor: 6.580

