Christopher W Bartlett, Soo Yeon Cheong, Liping Hou, Jesse Paquette, Pek Yee Lum, Günter Jäger, Florian Battke, Corinna Vehlow, Julian Heinrich, Kay Nieselt, Ryo Sakai, Jan Aerts, William C Ray.
Abstract
In 2011, the IEEE VisWeek conferences inaugurated a symposium on Biological Data Visualization. Like other domain-oriented Vis symposia, this symposium's purpose was to explore the unique characteristics and requirements of visualization within the domain, and to enhance both the Visualization and Bio/Life-Sciences communities by pushing Biological data sets and domain understanding into the Visualization community, and well-informed Visualization solutions back to the Biological community. Amongst several other activities, the BioVis symposium created a data analysis and visualization contest. Unlike many contests in other venues, where the purpose is primarily to allow entrants to demonstrate tour-de-force programming skills on sample problems with known solutions, the BioVis contest was intended to whet the participants' appetites for a tremendously challenging biological domain, and simultaneously produce viable tools for a biological grand challenge domain with no extant solutions. For this purpose, expression Quantitative Trait Locus (eQTL) data analysis was selected. In the BioVis 2011 contest, we provided contestants with a synthetic eQTL data set containing real biological variation, as well as a spiked-in gene expression interaction network influenced by single nucleotide polymorphism (SNP) DNA variation and a hypothetical disease model. Contestants were asked to elucidate the pattern of SNPs and interactions that predicted an individual's disease state. Nine teams competed in the contest using a mixture of methods, some analytical and others based on visual exploration. Independent panels of visualization and biological experts judged the entries. Awards were given for each panel's favorite entry and for an overall best entry agreed upon by both panels. Three special mention awards were given for particularly innovative and useful aspects of those entries. Further recognition was given to entries that correctly answered a bonus question about how a proposed "gene therapy" change to a SNP might change an individual's disease status, which served as a calibration for each approach's applicability to a typical domain question. In the future, BioVis will continue the data analysis and visualization contest, maintaining the philosophy of providing new challenging questions in open-ended and dramatically underserved Bio/Life Sciences domains.
Year: 2012 PMID: 22607587 PMCID: PMC3355334 DOI: 10.1186/1471-2105-13-S8-S8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. A heat map representation of the spiked-in correlation network in the simulated data. The heat map is a two-dimensional projection of a four-dimensional matrix, 15 × 15 genes × 3 × 3 genotypes. Here the 3 × 3 cross-genotype blocks are nested within each gene block. As a self-correlation matrix, the column IDs are identical to the row IDs. The left panel shows the two sub-networks that were used to drive the simulation, one involving CDH1 and CDH10, the second involving CDH19, PCDH1, PCDH10, and PCDH17. PCDH19 interacted with several genes, but only under certain genotype configurations. This matrix also implies other high-order dependencies that are not well shown in this form, but can be observed by tracing from a significant value in a cell to any other significant value for another gene that occurs in either the same row or column. The number of steps along which such a chain may be followed defines the number of interacting factors. The correlation matrix re-derived from the output of the simulation (right panel) includes both the spiked-in network and stochastic variation from the simulation, as well as the real biological correlations across genes.
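The nesting described in this caption can be illustrated with a minimal sketch. It assumes the four-dimensional matrix is indexed as [gene i][gene j][genotype a][genotype b], with the genotype of gene i placed on the row axis and the genotype of gene j on the column axis; the function names `flatten_nested` and `cell_to_indices` are hypothetical and do not come from the contest materials.

```python
def flatten_nested(corr4d):
    """Project a 4D array [gene_i][gene_j][geno_a][geno_b] onto a 2D grid.

    Assumed layout: row = gene_i * n_geno + geno_a,
                    col = gene_j * n_geno + geno_b,
    so each gene-by-gene block holds one n_geno x n_geno genotype block.
    """
    n_genes = len(corr4d)
    n_geno = len(corr4d[0][0])
    size = n_genes * n_geno
    grid = [[None] * size for _ in range(size)]
    for i in range(n_genes):
        for j in range(n_genes):
            for a in range(n_geno):
                for b in range(n_geno):
                    grid[i * n_geno + a][j * n_geno + b] = corr4d[i][j][a][b]
    return grid


def cell_to_indices(row, col, n_geno=3):
    """Map a heat map cell back to (gene_i, gene_j, geno_a, geno_b)."""
    return row // n_geno, col // n_geno, row % n_geno, col % n_geno


# Toy 15 x 15 x 3 x 3 input whose entries simply record their own indices.
toy = [[[[(i, j, a, b) for b in range(3)] for a in range(3)]
        for j in range(15)] for i in range(15)]
heatmap = flatten_nested(toy)  # 45 x 45 grid
```

With 15 genes and 3 genotypes the projected grid is 45 × 45, and `cell_to_indices` recovers which gene pair and genotype pair a given cell belongs to.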
Figure 2. The Visualization Experts' pick. (a) Association gene network created from all pairs of 3843 SNPs with a significant association (p < 0.05, PLINK two-locus results) with the gene expression of the 15 genes and filtered such that only SNP pairs containing at least one highly significant SNP (R² > 0.1 and p < 0.05, PLINK single-locus results) remain. All edges with weight w ≥ 40 are shown. Nodes represent genes, edges represent significant SNP pairs. Genes significantly associated with SNP pairs are colored using a distinct color; genes with no significant association are drawn with gray fill. Each edge conveys four pieces of information: an edge e of weight w starting in node s, ending in node t, and drawn with color c represents w SNP pairs, each of which has one SNP in gene s and one in gene t. These SNP pairs are significantly associated with the expression of the gene whose node is filled with color c. (b) Aggregated iHAT visualization of 29 visually selected SNPs where the 'affected' and 'not affected' groups display different colors.
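The edge definition in panel (a) amounts to counting SNP pairs per (source gene, target gene, associated gene) triple and keeping the triples whose count reaches the threshold. The sketch below is only an illustration of that counting step, not the entrants' implementation: the input format, the function name `build_edges`, and the choice to treat the two endpoint genes as an unordered pair are all assumptions.

```python
from collections import Counter


def build_edges(snp_pairs, min_weight=40):
    """Aggregate significant SNP pairs into weighted gene-gene edges.

    snp_pairs: iterable of (gene_of_snp_1, gene_of_snp_2, associated_gene)
               tuples, one per significant SNP pair (hypothetical format).
    Returns {(s, t, c): w} for edges with w >= min_weight, where s and t are
    the endpoint genes and c is the gene whose expression the pairs predict.
    """
    counts = Counter()
    for gene_a, gene_b, associated_gene in snp_pairs:
        # Assumption: endpoint order is arbitrary, so normalize it.
        s, t = sorted((gene_a, gene_b))
        counts[(s, t, associated_gene)] += 1
    return {edge: w for edge, w in counts.items() if w >= min_weight}


# Made-up example data: 42 pairs on one edge, 39 on another.
pairs = ([("CDH1", "CDH10", "CDH19")] * 40
         + [("CDH10", "CDH1", "CDH19")] * 2
         + [("CDH1", "PCDH1", "CDH10")] * 39)
edges = build_edges(pairs)
```

In this made-up example only the CDH1/CDH10 edge survives the w ≥ 40 cut; the second edge, with 39 pairs, is dropped.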
Figure 3. The Biology Experts' pick. Parallel coordinate display of gene expressions per individual. Vertical axes represent expression level for a given gene; horizontal polylines across the display represent each individual. Individuals are stratified in case (pink) versus control (grey). At the top of each vertical axis a histogram displays the distribution of expression levels of that gene over all individuals, stratified by group. The data for genes 1, 3, 5 and 6 are filtered for high and/or low values in this figure.
Figure 4. The Overall Best entry. A topological network map of SNPs produced by Iris. Each node represents a cluster of SNPs and nodes are connected with an edge if they have any SNPs in common. The starburst shape indicates subgroups of SNPs with distinct linkage disequilibrium patterns in the data set. A) Each flare of the starburst contains SNPs from a single locus and is labeled accordingly, except for the "Mixed" flare. The nodes are colored by SNP mutual information with disease. Higher mutual information values are colored red and indicate a stronger relationship. B) The nodes are colored by SNP ANOVA F-statistic with expression of CDH19. Higher F-statistics are colored red and indicate a stronger relationship. The flare with the red tip contains SNPs from the CDH19 locus; see label in A. C) The nodes are colored by F-statistic to expression of PCDH17. D) The nodes are colored by F-statistic to PCDH10. E) The nodes are colored by F-statistic to CDH11.
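The two node-coloring statistics named in this caption can be sketched with their textbook definitions: discrete mutual information between a SNP genotype and disease status, and a one-way ANOVA F-statistic for expression grouped by genotype. How Iris computes or aggregates these values per node is not stated here, so the functions below (with hypothetical names) are only an illustration, and the example data are made up.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # sum over observed (x, y) of p(x,y) * log2(p(x,y) / (p(x) * p(y)))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def anova_f(groups):
    """One-way ANOVA F-statistic for a list of numeric groups.

    Assumes at least two groups and non-zero within-group variance.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up genotypes (0/2) and disease labels (0/1) for four individuals.
mi_linked = mutual_information([0, 0, 2, 2], [0, 0, 1, 1])
mi_unlinked = mutual_information([0, 0, 2, 2], [0, 1, 0, 1])

# Made-up expression values grouped by genotype (0, 1, 2 minor alleles).
f_stat = anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]])
```

A genotype that perfectly separates cases from controls gives the maximum mutual information for a balanced binary outcome, while an unrelated genotype gives zero; larger F-statistics likewise indicate that mean expression differs more across genotype groups relative to within-group scatter.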