Simon Kasif (1, 2), Richard J. Roberts (1, 3).
Abstract
How do we scale biological science to meet the demands of next-generation biology and medicine while keeping track of the facts, predictions, and hypotheses? Enormous amounts of DNA sequence and other omics data are now being generated. Because these data contain the blueprint for life, it is imperative that we interpret them accurately. The abundance of DNA sequence data is only one part of the challenge. Artificial intelligence (AI) and network methods routinely build on large-scale screens, single-cell technologies, proteomics, and other modalities to infer or predict the biological functions and phenotypes associated with proteins, pathways, and organisms. As a first step, how do we systematically trace the provenance of knowledge from experimental ground truth to gene function predictions and annotations? Here, we review the main challenges in tracking the evolution of biological knowledge and propose several specific solutions for provenance and the computational tracing of evidence in functional linkage networks.
Year: 2020 PMID: 33253151 PMCID: PMC7728211 DOI: 10.1371/journal.pbio.3000999
Source DB: PubMed Journal: PLoS Biol ISSN: 1544-9173 Impact factor: 8.029
Fig 1. Functional linkage networks in bacterial and human genomes.
The networks were generated with the STRING database. Functional linkage relationships can be based on homology, coevolution, genomic context, protein–protein interaction screens, co-expression across a set of experiments, and other correlative or experimental evidence; such networks have become standard representations in computational biology. Each node carries a functional label derived either from experiment or from prediction. STRING, Search Tool for the Retrieval of Interacting Genes/Proteins.
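As an illustration only, a functional linkage network like those in Fig 1 can be held as an undirected graph whose edges carry an evidence type and whose nodes carry a functional label together with its source (experiment or prediction). The minimal Python sketch below assumes that layout. The gene names, evidence strings, and the simple neighbor-majority ("guilt by association") rule used to predict labels are hypothetical choices for illustration, not the method used by STRING or by the authors. What it shows is that each predicted label can record the experimentally labeled neighbors it was derived from.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical evidence types, mirroring the kinds listed in the Fig 1 caption.
EVIDENCE_TYPES = {"homology", "coevolution", "genomic_context",
                  "ppi_screen", "coexpression"}


@dataclass
class Label:
    function: str
    source: str  # "experiment" or "prediction"
    supported_by: list = field(default_factory=list)  # provenance: gene IDs


class LinkageNetwork:
    def __init__(self):
        self.edges = defaultdict(dict)  # gene -> {neighbor: evidence type}
        self.labels = {}                # gene -> Label

    def link(self, a, b, evidence):
        """Add an undirected functional linkage with a typed evidence source."""
        if evidence not in EVIDENCE_TYPES:
            raise ValueError(f"unknown evidence type: {evidence}")
        self.edges[a][b] = evidence
        self.edges[b][a] = evidence

    def set_experimental(self, gene, function):
        self.labels[gene] = Label(function, "experiment")

    def predict(self, gene):
        """Assumed rule: majority function among experimentally labeled neighbors.

        The prediction stores the supporting neighbor IDs as its provenance.
        """
        existing = self.labels.get(gene)
        if existing is not None and existing.source == "experiment":
            return existing  # never overwrite experimental ground truth
        votes = defaultdict(list)
        for nbr in self.edges.get(gene, {}):
            lab = self.labels.get(nbr)
            if lab is not None and lab.source == "experiment":
                votes[lab.function].append(nbr)
        if not votes:
            return None  # no experimental neighbors: leave the gene unlabeled
        function, support = max(votes.items(), key=lambda kv: len(kv[1]))
        self.labels[gene] = Label(function, "prediction", sorted(support))
        return self.labels[gene]


if __name__ == "__main__":
    net = LinkageNetwork()
    net.link("geneA", "geneX", "coexpression")
    net.link("geneB", "geneX", "ppi_screen")
    net.link("geneC", "geneX", "homology")
    net.set_experimental("geneA", "DNA methyltransferase")
    net.set_experimental("geneB", "DNA methyltransferase")
    net.set_experimental("geneC", "ATP synthase")
    lab = net.predict("geneX")
    print(lab.function, lab.supported_by)  # DNA methyltransferase ['geneA', 'geneB']
```

Refusing to overwrite an experimental label, and storing the supporting gene IDs with every prediction, are the two design choices that keep the trail from a prediction back to its experimental ground truth intact.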
Fig 2. Tracing provenance in functional linkage graphs (2A) and causal networks (2B). The nodes correspond to genes, and the colors are adopted from COMBREX; this coloring scheme was inspired by ski-slope signs (green, blue, and black slopes, plus an additional color, gold). Black labels correspond to genes that have no current predictions and are therefore the most challenging to test experimentally. Green and gold genes have experimental evidence, and blue genes have predictions. Green labels indicate that some experimental details may be missing, such as the organism used or the conditions under which the experiment was performed. Ideally, every blue node should have a provenance link to a gene whose phenotype in a genetic experiment, or whose biochemical function, has been demonstrated.
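The provenance requirement in Fig 2 lends itself to a mechanical check. The minimal Python sketch below assumes each gene record carries an optional experimental annotation (with a flag for whether experimental details such as organism and conditions are recorded), an optional prediction, and a list of provenance links to the genes the prediction was derived from. It assigns the four color categories from the caption, reading "gold" as experimental evidence with complete details and "green" as experimental evidence with missing details (an interpretation of the caption), and it follows provenance links breadth-first from a blue gene until it reaches a gene with experimental evidence. The record layout, field names, and gene IDs are hypothetical; this is not COMBREX's actual data model.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Gene:
    gene_id: str
    experimental_function: Optional[str] = None
    details_complete: bool = False  # hypothetical flag: organism, conditions recorded?
    predicted_function: Optional[str] = None
    provenance: list = field(default_factory=list)  # gene IDs a prediction derives from


def color(gene: Gene) -> str:
    """Map a gene to the Fig 2 color categories (assumed interpretation)."""
    if gene.experimental_function is not None:
        return "gold" if gene.details_complete else "green"
    if gene.predicted_function is not None:
        return "blue"
    return "black"


def trace_to_evidence(start: str, genes: dict) -> Optional[list]:
    """Breadth-first search along provenance links from `start` to the nearest
    gene with experimental evidence. Returns the path of gene IDs, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        gene = genes[path[-1]]
        if color(gene) in ("gold", "green"):
            return path
        for parent in gene.provenance:
            if parent in genes and parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None


def unsupported_predictions(genes: dict) -> list:
    """Blue genes whose provenance never reaches experimental evidence."""
    return [gid for gid, g in genes.items()
            if color(g) == "blue" and trace_to_evidence(gid, genes) is None]


if __name__ == "__main__":
    genes = {g.gene_id: g for g in [
        Gene("g1", experimental_function="restriction endonuclease",
             details_complete=True),
        Gene("g2", predicted_function="restriction endonuclease", provenance=["g1"]),
        Gene("g3", predicted_function="restriction endonuclease", provenance=["g2"]),
        Gene("g4", predicted_function="methyltransferase", provenance=["g5"]),
        Gene("g5", predicted_function="methyltransferase", provenance=["g4"]),
        Gene("g6"),
    ]}
    print(color(genes["g6"]))                # black
    print(trace_to_evidence("g3", genes))    # ['g3', 'g2', 'g1']
    print(unsupported_predictions(genes))    # ['g4', 'g5']
```

The `seen` set matters here: predictions that cite only each other (g4 and g5 in the demo) form a cycle, and the search reports them as lacking experimental support instead of looping forever.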