
Knowledge Discovery in Biological Databases for Revealing Candidate Genes Linked to Complex Phenotypes.

Keywan Hassani-Pak, Christopher Rawlings.

Abstract

Genetics and "omics" studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.

Keywords: Data integration; candidate gene prioritization; genotype-to-phenotype; knowledge discovery; knowledge graph

Year: 2017 PMID: 28609292 PMCID: PMC6042805 DOI: 10.1515/jib-2016-0002

Source DB: PubMed Journal: J Integr Bioinform ISSN: 1613-4516

Introduction

The discovery of causal genes and alleles that determine a particular biological phenotype in crops, animals or humans is referred to as the genotype to phenotype prediction-challenge [1], [2]. The use of genetics (e.g. genome wide association studies and quantitative trait mapping), and “omics” (e.g. RNA-sequencing) approaches can often identify large numbers of potential candidate genes, among which the causal genes are hidden. Experimental validation of candidate genes, e.g. from lab to greenhouse to field, is a slow process that can last several years. Following a wrong lead can waste significant effort, time and money. Scientists therefore need to prioritise genes that, when perturbed through potential interventions such as knock-down or gene editing approaches, might have a positive impact on the biological outcome for the whole organism without producing negative side effects. Because it is hard to undertake objective evaluation of large candidate gene sets, this choice is likely to be made subjectively, based on hunches or (potentially selective) prior experience and generally with limited scientific justification. The productivity and likelihood of success of genotype-phenotype mapping would be greatly improved if all candidate genes were to be thoroughly evaluated and only those with the highest level of confidence were considered for experimental validation. A systematic prioritization of candidate genes needs to be based on the generation of hypotheses that explain how genotype might be linked to phenotype. 
This requires the consideration of multiple types of information that are very heterogeneous in nature, such as: known records of gene-phenotype links, gene-disease associations, gene expression and co-expression, allelic information and effects of genetic variation, links to scientific literature, homology information from model species, protein-protein interactions, gene regulation, protein pathway memberships, gene-ontology annotations, protein-domain information and other domain specific information. The integration of such information into a knowledge network/graph combined with knowledge mining has considerable potential to improve the interpretation of complex genetic and omics experiments and help with the discovery of biological networks controlling phenotypes and diseases (Figure 1). However, it is not trivial to integrate and interrogate this information and obtain clear, objective answers that can be applied in practice. One of the key challenges is that the biological information is spread across many different databases and data formats [3].

Figure 1:

Using biological knowledge discovery to interpret genotype and omics experiments and establish links to phenotypes and diseases.

It has been recognised that computational tools are needed that can systematically mine the wealth of biomedical information to boost candidate gene discovery [4]. The identification of patterns in such large structured, semi-structured and unstructured data is often referred to as “data mining”, or more broadly “knowledge discovery in databases”, or KDD [5]. In the last 25 years, many novel KDD approaches have been developed including methods to pre-process, integrate, analyse and interpret complex biomedical data with the aim of identifying testable hypotheses [6]. It has also been recognised that it is important to include the end user in the “interactive” knowledge discovery process with the goal of supporting human intelligence with machine intelligence [7]. Combined KDD-HCI (Human-Computer Interaction) approaches can significantly increase the capacity and efficiency of candidate gene discovery while reducing costs and time. Here, we first provide a comprehensive review of biological information types and databases that play an important role in candidate gene prioritization. The second part of this article provides a short overview of bioinformatics tools for integrating information from selected databases, and an overview of interactive knowledge discovery approaches that can help to bridge the genotype to phenotype gap.

Information Types and Databases for Gene Discovery

Some key information types and databases for in silico genotype to phenotype discovery are described below, together with their value and importance for the discovery and prioritization of candidate genes.

Genotype and Genetics Data

Quantitative genetics uses natural populations or families (mapping populations) and applies statistical techniques to identify those regions in the genome that can explain the phenotypic variability in the population. These regions are referred to as quantitative trait loci (QTL) [8]. Genetic linkage studies show that typical QTLs in both plants and animals encompass quite sizeable parts of the genome, often several hundred genes. Genome-wide association studies (GWAS) associate phenotype with genotype at a genome-wide level using “unrelated” individuals [9]. The limitation of family-based mapping populations can be overcome by the use of unrelated genotypes that have accumulated a much higher number of recombination events since their last common progenitor [10]. Although genetic intervals identified from GWAS encompass much smaller regions of the genome compared to QTLs from mapping populations, they are still likely to identify many significant candidate genes. Genetic variants that are linked to phenotypes via QTL mapping, GWAS or other genetic studies provide a key data resource for gene-phenotype discovery. Access to public databases that contain such information is invaluable. However, this information is often hidden in the literature in an unstructured manner, which makes it very hard to retrieve and integrate. An ideal resource for standardised QTL and GWAS data of livestock species is the AnimalQTLdb [11]. AnimalQTLdb contains 121,265 QTL for 1804 traits based on 1768 publications in seven species (Release 31, December 2016). In crop species, however, such structured genetics resources are only slowly beginning to emerge. For example, GnpIS [12] and the Triticeae Toolbox [13] provide access to genetic information (e.g. markers, phenotype and pedigree data) for species of agronomic interest. Genetic variants that do not have reported links to phenotypes might initially be considered less important to gene discovery.
However, knowledge about published genetic variants and their effect on protein level can inform candidate gene prioritization, since variants of genes with major effects can be given higher weight than genes with no reported variants or minor variant effects. The European Variation Archive (EVA) provides access to all types of genetic variants, ranging from single nucleotide polymorphisms to large structural variants from any eukaryotic organism. EVA uses the Variant Effect Predictor [14] of Ensembl to annotate variant consequences. The variant consequences are described using Sequence Ontology terms [15]. Reverse genetics approaches are based on disrupting genes of known sequence and studying the effect of the disruption on the phenome [16]. Reverse genetics resources consist of plant material (i.e. seeds) with a certain knockout gene that can be grown and used for functional characterisation of the disrupted gene. For several plant species, e.g. Arabidopsis, rice and wheat, reverse genetics resources have been generated that allow scientists to study the function of many genes more effectively [17], [18], [19]. The phenotypic consequences of such genetic disruptions are recorded in several databases. The public database UniProt contains a subsection “disruption phenotype” that describes the in vivo effects caused by knockout or knockdown of a gene [20]. TAIR provides phenotypic information for unique genotypes with mutations in individual genes [21]. NCBI has the GeneRIF database [22], which contains concise phrases describing gene function and is sometimes used to add phenotypic descriptions. The data from such resources can be used to give a higher rank to candidate genes for which gene knockouts with associated phenotype data exist.
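The two ideas in this section, collecting the genes that fall inside a QTL interval and giving more weight to genes with severe reported variants, can be illustrated with a minimal Python sketch. The consequence names are Sequence Ontology terms, but the numeric weights, the gene identifiers and the coordinates are assumptions made up for illustration; this is not the scoring used by EVA, Ensembl or any tool reviewed here.

```python
from dataclasses import dataclass

# Sequence Ontology consequence terms mapped to illustrative weights.
# The weights are assumptions for this sketch, not published values.
CONSEQUENCE_WEIGHT = {
    "stop_gained": 1.0,
    "frameshift_variant": 1.0,
    "missense_variant": 0.6,
    "synonymous_variant": 0.1,
}


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int
    end: int


def genes_in_interval(genes, chrom, start, end):
    """Return genes that overlap the closed interval [start, end] on chrom."""
    return [g for g in genes
            if g.chrom == chrom and g.start <= end and g.end >= start]


def variant_score(consequences):
    """Score a gene by its most severe reported consequence (0.0 if none)."""
    return max((CONSEQUENCE_WEIGHT.get(c, 0.0) for c in consequences),
               default=0.0)


# Hypothetical genes and variant annotations.
genes = [
    Gene("geneA", "1A", 100, 500),
    Gene("geneB", "1A", 900, 1500),
    Gene("geneC", "1A", 5000, 6000),  # outside the QTL interval
    Gene("geneD", "2B", 200, 400),    # different chromosome
]
variants = {
    "geneA": ["synonymous_variant"],
    "geneB": ["missense_variant", "stop_gained"],
}

# Candidate genes under a hypothetical QTL on chromosome 1A, 400..1000.
candidates = genes_in_interval(genes, "1A", 400, 1000)
ranked = sorted(candidates,
                key=lambda g: variant_score(variants.get(g.gene_id, [])),
                reverse=True)
ranked_ids = [g.gene_id for g in ranked]
```

Taking the maximum rather than the sum keeps a gene with many minor variants from outranking a gene with a single disruptive one; other aggregation choices are equally defensible.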

Phenotype and Environment Data

Genotypic data is stable for a given plant or animal. In contrast, phenotypic characterisation requires environmental data because of the important role that environment has on the expression of a trait/phenotype. The development of standards for capturing phenotypic data has been challenging since “phenotype” is a broad concept that covers all observable traits stored as descriptive data, numeric observations including time series, molecular data and image data. Phenotypic information can be obtained from dedicated phenotyping platforms, from farmers’ fields, or from ecological diagnostics in natural environments. Phenotyping platforms measure a wide range of structural and functional plant traits at the same time as collecting accurate metadata on the environment and experimental setup [23]. Traits are measured at different spatial scales, from the field level (e.g. crop yield) to the cell (e.g. cell wall polysaccharide composition) and over widely varying temporal scales, from seconds (e.g. photosynthetic response) to months (e.g. whole season biomass). An important recent development is the publication of a minimal metadata standard for plant phenotyping experiments (MIAPPE). Phenotype data itself (without being associated with genotype) is important in upstream processes involved in trait discovery and QTL mapping but has limited use for gene discovery per se. Once phenotype data can be related to genotypes, genes or mutants, it becomes a relationship of high importance. The majority of phenotypic information is available in an unstructured form in the scientific literature and is therefore difficult to integrate with other knowledge resources such as ontologies. Text-mining techniques are required to identify and extract such information. Due to the complexity of phenotype descriptions and the essential role of environmental information, a variety of ontologies have been developed to formalise their representation. Many of these are species-specific.
For example, available ontologies for plants and crops include the Plant Ontology (www.plantontology.org), the Crop Ontology (www.cropontology.org), the Plant Trait Ontology (www.planteome.org) and the Environment Ontology (www.environmentontology.org). Although several new phenotype ontologies are emerging, not many plant genomes and experiments are yet annotated with these ontology terms. Even in model species such as Arabidopsis, most phenotypic descriptions are still in free text. The Drosophila phenotype ontology [24] is a good example of a phenotype ontology that is systematically used to annotate genes and alleles enabling more powerful search queries.

Gene Expression Data

Gene expression data can be used as evidence to confirm the expression of candidate genes in tissues, organs, during different developmental stages, under treatments of interest or in particular genotypes. For example, for human studies, the Genotype-Tissue-Expression resource can reveal correlations between genotype and tissue-specific gene expression levels and can help identify regions of the genome that influence whether and by how much a gene is expressed [25]. A similar baseline expression resource does not exist for most plant and animal species. For example, identifying causal genes for a grain specific QTL would require any potential candidate gene to be expressed at some stage during grain development and potentially only expressed in certain individuals of a mapping population and not in others. Several other general gene expression databases exist such as the Gene Expression Atlas [26], the Gene Expression Omnibus [27] or the eFP Browser [28]. Reference-species resources such as The Arabidopsis Information Resource (TAIR) have annotated Arabidopsis genes with Plant Ontology [29] terms that describe in which tissues and during which developmental stages a gene is expressed. Other databases such as ATTED-II [30] analyse large amounts of expression datasets to compute clusters of coexpressed genes. Such co-expression data provides weak, speculative evidence that these genes are co-regulated and therefore could share a similar biological function or act together to control a phenotype.
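As an illustration of how co-expression evidence of this kind can be derived, the following Python sketch computes Pearson correlation between expression profiles and reports the genes whose profile correlates with a target gene above a threshold. The gene names, the expression values and the 0.9 threshold are assumptions for illustration; resources such as ATTED-II use their own, more elaborate measures.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles.

    Assumes both profiles have non-zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpressed_partners(target, expression, threshold=0.9):
    """Return genes whose profile correlates with the target at or above threshold."""
    reference = expression[target]
    return sorted(gene for gene, profile in expression.items()
                  if gene != target and pearson(reference, profile) >= threshold)


# Hypothetical expression values across four samples.
expression = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [2.0, 4.1, 5.9, 8.0],  # rises with gene1
    "gene3": [4.0, 3.0, 2.0, 1.0],  # falls as gene1 rises
}

partners = coexpressed_partners("gene1", expression)
```

Because correlation alone does not establish co-regulation, a prioritization scheme would normally give such links a low weight relative to curated or experimental evidence.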

Interaction Data

Protein-protein interaction (PPI) data provides very useful knowledge for candidate gene discovery. In contrast to co-expression data, PPI data provides evidence about the physical interaction of proteins in the cell. A large number of methods have been developed over the years to study protein-protein interactions, e.g. affinity-tagged proteins, the two-hybrid system and some quantitative proteomic techniques [31]. Measurable physical interaction implies that the proteins are involved in the same biological process and could contribute to higher-level traits although they might have different functions. Public PPI databases can be searched to identify previously reported interactions for a given bait protein. The BioGRID [32] and IntAct [33] databases are populated by data either curated from the literature or from direct data depositions. Data access and download are provided for many species and in different data formats such as PSI-MI XML, PSI-MI TAB, BioPAX or RDF. Other types of interaction data such as protein-drug interactions [34] or pathogen-host interactions [35] can be considered for the discovery of genes relevant to human or plant disease.
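Searching downloaded interaction data for the partners of a bait protein can be sketched in a few lines of Python. The sketch assumes tab-separated PSI-MI TAB style input in which the first two columns hold the identifiers of interactors A and B; the identifiers and the helper `mitab_row` are hypothetical, and all other columns are left as placeholders.

```python
def parse_mitab(lines):
    """Yield (interactor_a, interactor_b) pairs from MITAB-style lines.

    Only the first two tab-separated columns are read; header lines
    starting with '#' and blank lines are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        yield cols[0], cols[1]


def interactors_of(bait, pairs):
    """Return the set of partners reported for the bait, in either column."""
    partners = set()
    for a, b in pairs:
        if a == bait:
            partners.add(b)
        elif b == bait:
            partners.add(a)
    return partners


def mitab_row(id_a, id_b):
    """Build a 15-column row for the example, unused columns set to '-'."""
    return "\t".join([id_a, id_b] + ["-"] * 13)


# Hypothetical records; the accessions are made up.
sample = [
    "#ID(s) interactor A\tID(s) interactor B",
    mitab_row("uniprotkb:BAIT1", "uniprotkb:PREY1"),
    mitab_row("uniprotkb:PREY2", "uniprotkb:BAIT1"),
    mitab_row("uniprotkb:PREY1", "uniprotkb:PREY3"),
]

partners = interactors_of("uniprotkb:BAIT1", parse_mitab(sample))
```

Checking both columns matters because the order of interactors A and B in a record carries no bait/prey meaning in this sketch.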

Functional Annotation Data

Functional annotation of genes and gene products provides a key resource to elucidate the biological processes and pathways controlling complex traits. Gene Ontology (GO) annotations capture the knowledge that we have about the molecular function of genes in a systematic and cross-species comparable manner. GO provides a controlled vocabulary to describe biological processes, molecular functions and cellular components. GO annotations require the provision of evidence codes that describe the experimental or computational methods used to establish the gene function. The Evidence and Conclusion Ontology (ECO) is used to describe the evidence in a formalised manner and helps to distinguish high quality annotations (e.g. inferred through mutant phenotypes) from low quality annotations (e.g. inferred through electronic annotations). As the best studied plant species, Arabidopsis thaliana has about 50,000 GO annotations of experimental evidence (25 % of total annotations). The majority of annotations in non-model species are electronically inferred through sequence based comparisons with model species. The common data format for functional gene annotations is the GO Annotation File (GAF) format. Many functional or structural bioinformatics databases provide mappings to GO terms, e.g. EC2GO, Pfam2GO and InterPro2GO. Biological pathways provide a more fine-grained knowledge about the enzymes, chemical reactions and small molecules that form the elements of biosynthetic pathways. Popular pathways databases such as KEGG [36], Reactome [37] and BioCyc [38] provide curated pathway information for model species and computationally inferred pathways for non-model species.
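The distinction between experimentally supported and electronically inferred annotations can be made operational when reading an annotation file. The Python sketch below assumes GAF-style tab-separated lines in which column 2 is the gene product identifier, column 5 the GO identifier and column 7 the evidence code, with comment lines starting with "!". The gene identifiers, the helper `gaf_row` and the numeric weights are illustrative assumptions.

```python
# GO evidence codes that denote direct experimental support.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def parse_gaf(lines):
    """Yield gene, GO term and evidence code from GAF-style lines."""
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        yield {"gene": cols[1], "go_id": cols[4], "evidence": cols[6]}


def evidence_weight(code):
    """Illustrative weights: experimental > other curated > electronic (IEA)."""
    if code in EXPERIMENTAL_CODES:
        return 1.0
    if code == "IEA":
        return 0.2
    return 0.5


def best_evidence_per_gene(annotations):
    """Keep, for each gene, the weight of its best-supported annotation."""
    best = {}
    for ann in annotations:
        weight = evidence_weight(ann["evidence"])
        best[ann["gene"]] = max(best.get(ann["gene"], 0.0), weight)
    return best


def gaf_row(gene_id, go_id, evidence):
    """Build a 17-column row for the example, unused columns set to '-'."""
    cols = ["-"] * 17
    cols[1], cols[4], cols[6] = gene_id, go_id, evidence
    return "\t".join(cols)


# Hypothetical annotations using the three GO root terms.
sample = [
    "!gaf-version: 2.2",
    gaf_row("geneA", "GO:0008150", "IMP"),  # inferred from mutant phenotype
    gaf_row("geneA", "GO:0003674", "IEA"),  # electronic annotation
    gaf_row("geneB", "GO:0005575", "IEA"),
]

best = best_evidence_per_gene(parse_gaf(sample))
```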

Homology Data

The function of the vast majority of genes in non-model species remains uncharacterised. Any effort to prioritize candidate genes without any evidence about their function is difficult or even impossible. Genes that have been well characterised in other species provide a reliable source of putative evidence, assuming this knowledge can be transferred from one species to another. The principal idea supporting cross-species annotation transfer is that the function of proteins is, to some extent, conserved through evolution. Thus, two orthologs in two closely related species are likely to share the same function. But the level of conservation of protein function across species largely depends on the evolution of these species, including the evolution of their proteins, of their biochemical pathways and of their higher level biological traits. Orthologous relationships can be established when comparing the genomes of two or more species. Identification of orthologous gene sets typically involves phylogenetic tree analysis, heuristic algorithms based on sequence conservation, synteny analysis, or some combination of these approaches [39], [40]. Some of the prominent databases of orthologous genes include Ensembl [41], OrthoDB [42], OMA [43] and Phytozome [44]. The common data standard for orthology data provision is OrthoXML [45]. In addition to using orthology data for cross-species annotation transfer, a more direct approach exploiting sequence database search with the BLAST [46] or Smith-Waterman [47] algorithms can be used to infer putative gene function. This is a common shortcut taken by many scientists and bioinformatics tools such as Blast2GO [48]. Such data can be used for exploratory analysis but is prone to a high false positive rate. In the context of prioritizing genes, it should be given a much lower importance than more accurate orthology inference methods.

Biological Knowledge Discovery for Gene Prioritization

Having identified various datasets and information types relevant to candidate gene discovery, the next step in the knowledge discovery process is the transformation of data into a suitable data structure. Biological data is typically highly connected, e.g. through common references to named biological entities, and semi-structured, e.g. because some data can be found in databases and other in free text. Furthermore, these data types are not static because new types of data are constantly emerging from advances in high-throughput experimental platforms. These characteristics of Life Science data make networks, consisting of nodes and links between them, a flexible data model that can capture much of the complexity and interconnectedness in the data [49]. In addition, networks are often considered as the layer that connects genotype to phenotype [50]. In contrast to homogeneous networks, where all nodes have the same type (e.g. protein-protein interaction networks), heterogeneous information networks, also referred to as knowledge graphs, are networks where nodes and links can have various types [51]. Biological knowledge networks are composed of nodes which represent biological entities such as genes, transcripts, proteins and compounds, as well as other entities such as protein domains, ontology terms, pathways, literature and phenotypes. The links in the network correspond to relations between entities and are described using terms which reflect the semantics of the biological or functional relationship such as encodes, interacts, involved_in, expressed_in, published_in etc. A number of biological data warehousing (DW) systems have been constructed to facilitate data integration and information retrieval from diverse biological data sources [52]. 
Common requirements of such biological DW systems include: (i) to provide solutions for reproducible data acquisition and integration, (ii) to be flexibly extended to new species and new data types and (iii) to support complex queries using a powerful (semantic) search engine. InterMine [53], BioMart [54] and Ondex [55] are examples of such DW systems that provide tools (parsers) for integrating data from many common biological data sources and formats, and frameworks for adding custom user data in tabular format. Most biological DW systems use a relational database to store information and only a few systems such as Ondex use networks (graphs) as their internal data structure. Our group has developed genome-scale knowledge networks (GSKNs) for key plant and crop species using the Ondex platform [56]. For example, the wheat GSKN contains approximately 700,000 nodes of 20 different types and 3 million links of 30 different types between them. In order to expand knowledge networks with phenotypic information from unstructured free text such as scientific publications, automated approaches are needed that link trait descriptions to the cited genes and their corresponding nodes in the network. Such approaches will create novel, structured relationships between biological concepts and therefore improve the ability to reason over the data and make novel connections between previously unrelated biological concepts [57]. In recent years, several stand-alone text mining systems have been developed [58], mostly to support database curators finding evidence text for particular information of interest, such as protein-protein interactions or functional gene annotations [59], [60]. In addition to such user-centred systems, Java based libraries and frameworks have recently emerged providing APIs that enable language processing functionality to be embedded in diverse applications [61], [62].
Such frameworks allow text mining workflows to be created that consist of elementary components, for example text segmentation, sentence boundary detection, entity detection and relation extraction. For example, the Ondex data integration platform has been extended with easy to use text mining workflows that operate on the knowledge graph and include steps to filter associations with low scores [63]. Once the data has been transformed and integrated, the next step in the knowledge discovery process requires tools for knowledge mining, exploration and visualisation that help scientists to prioritize candidate genes and biological processes. A number of web-based resources for prioritizing candidate genes by exploiting multiple information types have been developed [4], [64]. For example, Endeavour [65] integrates 75 datasets from 6 model species including human and mouse into a local database, and uses basic machine learning techniques with a-priori known candidate genes to model the biological process under study and then to prioritize the candidate genes. Another tool named BioGraph is based on a graph data warehouse approach and uses unsupervised data mining for the exploration and discovery of biomedical information [66]. In total, BioGraph contains 532,889 distinct relations among 71,042 biomedical concepts, supported by 61,570 literature references. The biological knowledge, which includes many indirect relationships, is used for gene prioritization and hypothesis generation. The main limitations of many gene prioritization tools, including Endeavour and BioGraph, are that they are restricted to the analysis of key model species and the data integration process is not easily reproducible and adaptable to other species. PosMed-Plus [68] was one of the first tools to prioritize candidate genes for two plant species (Arabidopsis thaliana and rice) using a knowledge-based approach and including literature co-occurrence and cross-species information. 
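The elementary text mining steps named above (sentence boundary detection, entity detection, and filtering of low-scoring associations) can be sketched with the Python standard library. This is a deliberately naive illustration using dictionary matching and sentence-level co-occurrence counts; the gene names, trait terms, example text and the minimum count of 2 are assumptions, and it does not reproduce the Ondex workflows or any of the cited frameworks.

```python
import re
from collections import Counter
from itertools import product


def split_sentences(text):
    """Naive sentence boundary detection: split after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_terms(sentence, terms):
    """Dictionary-based entity detection with case-insensitive whole-word match."""
    lowered = sentence.lower()
    return {t for t in terms
            if re.search(r"\b" + re.escape(t.lower()) + r"\b", lowered)}


def cooccurrence_counts(text, gene_names, trait_terms):
    """Count sentences in which a gene name and a trait term appear together."""
    counts = Counter()
    for sentence in split_sentences(text):
        genes = find_terms(sentence, gene_names)
        traits = find_terms(sentence, trait_terms)
        counts.update(product(sorted(genes), sorted(traits)))
    return counts


# Hypothetical text, gene names and trait vocabulary.
abstract = (
    "GeneX regulates grain size in the field. "
    "Loss of GeneX reduces grain size. "
    "GeneY is expressed in roots. "
    "Drought tolerance was also measured."
)
counts = cooccurrence_counts(abstract,
                             {"GeneX", "GeneY"},
                             {"grain size", "drought tolerance"})

# Filter step: keep only associations seen in at least two sentences.
links = {pair: n for pair, n in counts.items() if n >= 2}
```

Real workflows would replace each step with a more robust component, for example a trained sentence splitter and synonym-aware entity recognition, but the overall shape of the pipeline stays the same.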
KnetMiner (http://knetminer.rothamsted.ac.uk/) is one of the first tools to provide a generic and easily configurable approach that works for model and non-model species. KnetMiner searches and evaluates millions of relations and concepts within biological knowledge networks (created using the Ondex data integration platform) in real-time to determine if direct or indirect links between genes and phenotypes, pathways, annotations etc. can be established using biologically plausible graph queries. As user input, KnetMiner accepts search terms in combination with a gene list and/or genomic regions. It produces tables of ranked candidate genes or evidence summaries, and allows users to explore the knowledge networks using interactive web-based tools. KnetMiner is currently available for several plant, crop and animal species such as Arabidopsis, wheat, maize, barley, camelina, potato, tomato, poplar, pig, cow and chicken. A benefit of KnetMiner compared to other existing gene discovery tools is its generic and interactive approach.
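To make the idea of a heterogeneous knowledge network and of gene-to-phenotype graph queries concrete, the following Python sketch stores typed relations as triples and uses breadth-first search to find the shortest chain of evidence between each candidate gene and a trait. All node identifiers are made up, the relation names are taken from the examples given earlier in this section, and the ranking by path length is an assumption for illustration; it is not the query or ranking method implemented in KnetMiner or Ondex.

```python
from collections import deque

# (source, relation, target) triples of a tiny hypothetical knowledge network.
TRIPLES = [
    ("gene:G1", "encodes", "protein:P1"),
    ("protein:P1", "interacts", "protein:P2"),
    ("gene:G2", "encodes", "protein:P2"),
    ("protein:P2", "involved_in", "trait:grain_size"),
    ("gene:G3", "encodes", "protein:P3"),
]


def build_adjacency(triples):
    """Index triples by node; links are traversed in both directions."""
    adjacency = {}
    for source, relation, target in triples:
        adjacency.setdefault(source, []).append((relation, target))
        adjacency.setdefault(target, []).append((relation, source))
    return adjacency


def shortest_evidence_path(adjacency, start, goal, max_hops=4):
    """Breadth-first search for the shortest path of at most max_hops links."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) - 1 >= max_hops:
            continue
        for _relation, neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [neighbour]))
    return None


def rank_genes(adjacency, genes, trait):
    """Rank genes by the number of links separating them from the trait."""
    hits = []
    for gene in genes:
        path = shortest_evidence_path(adjacency, gene, trait)
        if path is not None:
            hits.append((len(path) - 1, gene, path))
    return [(gene, path) for _hops, gene, path in sorted(hits)]


adjacency = build_adjacency(TRIPLES)
ranking = rank_genes(adjacency, ["gene:G1", "gene:G2", "gene:G3"],
                     "trait:grain_size")
```

In this toy network gene:G2 is linked to the trait through its own protein, gene:G1 only through an interaction partner, and gene:G3 not at all, so G3 is dropped from the ranking. Returning the path along with the gene is what allows a user to inspect why a gene was ranked.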

Conclusion

Mining information across different biological databases has the potential to discover new knowledge that was hidden before. For example, linking a GWAS dataset that contains statistical associations between SNPs and phenotypes, with genomic information about genes and proteins, and protein-protein interaction data, can reveal new insights into the regulation of complex traits. In this article we have reviewed several biological databases and information types that can be used to provide evidence for the discovery of genotype to phenotype relationships. Creating a complete knowledge base of gene functions, interaction networks and trait biology is technically challenging because the relevant data are dispersed in myriad databases in a variety of data formats with variable quality and coverage. Innovative approaches are often needed to infer implicit relationships between concepts in a knowledge network. For example, linking SNP to gene can be based on genomic coordinate information, while linking gene to phenotype can be based on sentence-level co-occurrence of names. Building knowledge networks in non-model species is even more challenging as the majority of genes are not well studied and have unknown names or function. In this article, we also reviewed a small set of tools for biological knowledge discovery and candidate gene mining. Although many candidate gene mining tools already exist, there is still an urgent need for tools that improve the efficiency and interactivity of gene discovery using new approaches from the KDD-HCI field. In the following we identify some of the challenges we consider to be key to improving biological knowledge discovery.

Key Challenges for Data Integration

Ontologies play an important role in data integration and allow us to unify different terminologies. It is important that data providers increase their use of ontologies and metadata standards as much as possible to facilitate data integration. In recent years, linked data principles and Semantic Web standards (RDF) have further contributed to the integration of heterogeneous data sources. Making more data available in such linked form will significantly simplify data integration processes and improve the capture of most aspects of data provenance. The Monarch Initiative [67] is an outstanding example of how RDF and Semantic Web technologies can be harnessed to build analytical tools that connect genotype to phenotype across species. Furthermore, recent developments in this field have been using innovative approaches to address the problem of interoperability between different ontologies and data models [69]. Finally, more synergistic approaches will be needed that can effectively integrate information from structured databases with facts extracted from semi-structured and unstructured data [70], [71].

Key Challenges for Inference over Integrated Knowledge Networks

Once the heterogeneous information has been transformed into a standard data structure such as a knowledge graph or network, tools are needed for interrogating the network and for analysis and inference steps, for example to prioritise genes based on an evaluation of the quality of the supporting evidence. One of the challenges that needs to be addressed by biological graph mining approaches is to distinguish between high and low confidence links. For example, links that are based on poor alignments, weak associations or insufficient evidence need to be treated differently from high-quality curated links. In the future, we hope to see applications similar to the Google Knowledge Graph Search developed for the Life Sciences that utilise the strength of biological knowledge graphs.
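One simple way to treat high and low confidence links differently is to attach a confidence value to each link type and to score a chain of evidence by the product of the confidences along it, so that a single weak link lowers the score of the whole chain. The Python sketch below illustrates this; the evidence categories, their numeric values and the example paths are assumptions, not values taken from any of the tools discussed.

```python
from math import prod

# Illustrative confidence per evidence category (assumed values).
EVIDENCE_CONFIDENCE = {
    "curated": 1.0,
    "experimental": 0.9,
    "text_mined": 0.5,
    "sequence_similarity": 0.3,
}


def path_confidence(edge_evidence):
    """Multiply link confidences along a path from gene to phenotype."""
    return prod(EVIDENCE_CONFIDENCE[kind] for kind in edge_evidence)


# Hypothetical evidence chains linking two genes to the same trait.
paths = {
    "geneA": ["curated", "experimental"],
    "geneB": ["text_mined", "sequence_similarity"],
}

scores = {gene: path_confidence(evidence) for gene, evidence in paths.items()}
best_gene = max(scores, key=scores.get)
```

A multiplicative score also penalises long chains, which may or may not be desirable; alternatives include taking the minimum confidence on the path or combining several independent paths per gene.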

Key Challenges for Interactive Knowledge Discovery

For users, the visual representation of complex biological information, and the ability to navigate it to find new knowledge or testable hypotheses, are both important. Networks provide the means for interactive knowledge discovery. While networks are intuitive for biologists, there remain challenges in terms of usability of the current generation of network visualisation tools. Key challenges are the representation of many different information types, uncertainty of relationships and linked quantitative data such as time series or dose response. It is important that a new generation of interactive knowledge discovery tools is developed that allows human intelligence to play a major role in candidate gene discovery and decision making.

References (partial list; 61 in total)

Ron Edgar, Michael Domrachev, Alex E Lash. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002-01-01.

Gynheung An, Dong-Hoon Jeong, Ki-Hong Jung, Sichul Lee. Reverse genetic approaches for functional genomics of rice. Plant Mol Biol. 2005-09.

Joel N Hirschhorn, Mark J Daly. Genome-wide association studies for common diseases and complex traits. Nat Rev Genet. 2005-02.

Wolfgang Huber, Vincent J Carey, Li Long, Seth Falcon, Robert Gentleman. Graphs in molecular biology. BMC Bioinformatics. 2007-09-27.

Tord Berggård, Sara Linse, Peter James. Methods for the detection and analysis of protein-protein interactions. Proteomics. 2007-08.

Jacob Köhler, Jan Baumbach, Jan Taubert, Michael Specht, Andre Skusa, Alexander Rüegg, Chris Rawlings, Paul Verrier, Stephan Philippi. Graph-based analysis and visualization of experimental results with ONDEX. Bioinformatics. 2006-03-13.

Karen Eilbeck, Suzanna E Lewis, Christopher J Mungall, Mark Yandell, Lincoln Stein, Richard Durbin, Michael Ashburner. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol. 2005-04-29.

David S Wishart, Craig Knox, An Chi Guo, Savita Shrivastava, Murtaza Hassanali, Paul Stothard, Zhan Chang, Jennifer Woolsey. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006-01-01.

Debbie Winter, Ben Vinegar, Hardeep Nahal, Ron Ammar, Greg V Wilson, Nicholas J Provart. An "Electronic Fluorescent Pictograph" browser for exploring and analyzing large-scale biological data sets. PLoS One. 2007-08-08.

Stefan Götz, Juan Miguel García-Gómez, Javier Terol, Tim D Williams, Shivashankar H Nagaraj, María José Nueda, Montserrat Robles, Manuel Talón, Joaquín Dopazo, Ana Conesa. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res. 2008-04-29.
