Adam J Richards, Anthony Herrel, Camille Bonneaud.
Abstract
BACKGROUND: Sequencing technologies provide a wealth of detail about genes, expression, splice variants, polymorphisms, and other features. A standard step in sequencing analysis pipelines is to place genomic or transcriptomic features in the context of known functional information, but the relationships between ontology terms are often ignored. For RNA-Seq, considering genes and their genetic variants at the group level provides a convenient way both to integrate annotation data and to detect small coordinated changes between experimental conditions, which gene-level analyses are known to have difficulty detecting.
Year: 2015 PMID: 26399714 PMCID: PMC4581156 DOI: 10.1186/s12859-015-0729-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Database entity diagram. Data collected from NCBI, the Gene Ontology, and UniProt are organized for efficient taxa-related queries. The database tables or entities are shown along with their attributes. The relationships among tables are designated with edges that connect specific attributes
Fig. 2 Calculating distances. a A specific aspect of the GO (e.g. biological process) is represented as a directed acyclic graph, with solid edges corresponding to the is_a and part_of relationships and nodes representing specific ontology terms. b Genes for one or more taxa are added to the network via annotations (dashed edges) and are used to calculate term-term distances. c The term relationships can be redrawn as a fully connected graph in which each weighted edge corresponds to a pairwise shortest path from (b). The graph is then represented in gene space as a distance matrix for subsequent clustering
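The three steps in Fig. 2 can be illustrated with a minimal sketch. It is not the htsint implementation: the toy terms, genes, and unit edge weights are hypothetical, the graph is treated as undirected for path finding, and the choice to average term-term distances when moving to gene space is an assumption made only for illustration.

```python
import heapq
from itertools import product

# Hypothetical toy ontology: (child, parent, weight) for is_a / part_of edges.
# Unit weights are an assumption; the source does not state the weighting.
term_edges = [
    ("T2", "T1", 1.0),
    ("T3", "T1", 1.0),
    ("T4", "T2", 1.0),
    ("T5", "T3", 1.0),
]

# Hypothetical gene-to-term annotations (the dashed edges in panel b).
annotations = {"geneA": ["T4"], "geneB": ["T5"], "geneC": ["T4", "T2"]}


def build_adjacency(edges):
    """Store the term graph as an undirected weighted adjacency map."""
    adj = {}
    for child, parent, weight in edges:
        adj.setdefault(child, {})[parent] = weight
        adj.setdefault(parent, {})[child] = weight
    return adj


def shortest_paths_from(adj, source):
    """Dijkstra's algorithm: shortest distance from source to each reachable term."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in adj[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def term_distances(adj):
    """All-pairs shortest paths, i.e. the fully connected graph in panel c."""
    return {term: shortest_paths_from(adj, term) for term in adj}


def gene_distance_matrix(annotations, tdist):
    """Gene-space matrix: mean term-term distance over annotated term pairs."""
    genes = sorted(annotations)
    matrix = []
    for g1 in genes:
        row = []
        for g2 in genes:
            if g1 == g2:
                row.append(0.0)
                continue
            pairs = list(product(annotations[g1], annotations[g2]))
            row.append(sum(tdist[a][b] for a, b in pairs) / len(pairs))
        matrix.append(row)
    return genes, matrix


adj = build_adjacency(term_edges)
tdist = term_distances(adj)
genes, matrix = gene_distance_matrix(annotations, tdist)
```

The resulting `matrix` is symmetric with a zero diagonal, so it could be passed to a clustering routine that accepts a precomputed distance matrix.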
Fig. 3 Gene set visualization. htsint was used to visualize a gene set that is produced in the tutorial section of the documentation. Gene Ontology terms are shown as square nodes, with the label indicating each term's rank by number of connections. The full name of each term is provided in the legend. Terms are connected by edges representing their semantic distance, which is scaled and shown only for a percentile cutoff (default is 25th) for visualization purposes. Genes are represented as circular nodes with NCBI gene symbols overlaid as labels. The gene nodes are connected through annotations, and the species to which each gene belongs is indicated by the node color and specified in the legend