Chao Han, Leanna House, Scotland C. Leman.
Abstract
Introduced by Bishop et al. in 1996, Generative Topographic Mapping (GTM) is a powerful nonlinear latent variable modeling approach for visualizing high-dimensional data. It has proven useful where typical linear methods fail. However, GTM still has drawbacks: its complex parameterization of the data makes it hard to fit and sensitive to slight changes in the model. For this reason, we extend GTM to a visual analytics framework so that users may guide the parameterization and assess the data from multiple GTM perspectives. Specifically, we develop the theory and methods for Visual to Parametric Interaction (V2PI) with data using GTM visualizations. The result is a dynamic version of GTM, which we call V2PI-GTM, that fosters data exploration. In this paper, we develop V2PI-GTM in stages and demonstrate its benefits in a text mining case study.
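To make "complex parameterization" concrete, here is a minimal NumPy sketch of fitting a standard GTM by expectation-maximization, following the original formulation of Bishop, Svensén, and Williams rather than anything specific to V2PI-GTM. The function names, grid sizes, basis width, regularization weight `lam`, iteration count, and PCA-based initialization are all illustrative assumptions, not values from the paper. Each of them is a knob that can change the resulting display.

```python
import numpy as np


def square_grid(side):
    """Regular side x side grid on [-1, 1]^2, returned as an array of shape (side**2, 2)."""
    g = np.linspace(-1.0, 1.0, side)
    a, b = np.meshgrid(g, g)
    return np.column_stack([a.ravel(), b.ravel()])


def rbf_design(X, centers, width):
    """K x (M + 1) matrix of Gaussian RBF activations at each latent point, plus a bias column."""
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.hstack([np.exp(-sq / (2.0 * width**2)), np.ones((len(X), 1))])


def sq_dists(Y, T):
    """K x N matrix of squared distances between manifold points Y and data points T."""
    return ((Y[:, None, :] - T[None, :, :]) ** 2).sum(axis=-1)


def responsibilities(Y, T, beta):
    """E-step: R[k, n] = posterior probability that latent point k generated t_n."""
    log_r = -0.5 * beta * sq_dists(Y, T)
    log_r -= log_r.max(axis=0, keepdims=True)  # subtract column max for numerical stability
    R = np.exp(log_r)
    return R / R.sum(axis=0, keepdims=True)


def fit_gtm(T, latent_side=10, basis_side=4, width=0.5, lam=1e-3, n_iter=30):
    """Fit a GTM to data T (N x D) by EM. Defaults are illustrative assumptions.

    Returns each point's posterior-mean position in the 2-D latent space (N x 2),
    the weight matrix W ((M + 1) x D), and the noise precision beta.
    """
    N, D = T.shape
    X = square_grid(latent_side)                         # K latent points
    Phi = rbf_design(X, square_grid(basis_side), width)  # K x (M + 1)

    # Initialise the manifold on the plane spanned by the first two principal components.
    mu = T.mean(axis=0)
    _, S, Vt = np.linalg.svd(T - mu, full_matrices=False)
    Y0 = mu + (X * (S[:2] / np.sqrt(N))) @ Vt[:2]
    W = np.linalg.lstsq(Phi, Y0, rcond=None)[0]
    beta = 1.0 / T.var(axis=0).mean()

    for _ in range(n_iter):
        R = responsibilities(Phi @ W, T, beta)
        # M-step for W: (Phi^T G Phi + (lam / beta) I) W = Phi^T R T, with G = diag(row sums of R).
        G_Phi = R.sum(axis=1)[:, None] * Phi
        A = Phi.T @ G_Phi + (lam / beta) * np.eye(Phi.shape[1])
        W = np.linalg.solve(A, Phi.T @ R @ T)
        # M-step for beta: 1 / beta is the responsibility-weighted mean squared distance.
        beta = N * D / (R * sq_dists(Phi @ W, T)).sum()

    R = responsibilities(Phi @ W, T, beta)
    return R.T @ X, W, beta
```

For example, `Z, W, beta = fit_gtm(T)` on an `N x 3` array `T` returns 2-D coordinates `Z` suitable for a scatter plot. Refitting with a different `width` or `latent_side` can move points in that plot, which illustrates why letting users guide the parameterization is attractive.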
Year: 2016 PMID: 26905728 PMCID: PMC4764361 DOI: 10.1371/journal.pone.0129122
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. A visual description of GTM.
This illustrates how the latent space, constructed by  (denoted by ⋆ on the left), relates to the manifold, constructed by  (denoted by ⋆ on the right), in a three-dimensional data space. Raw data points are denoted by •.
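The relationship in Fig 1 can be sketched in a few lines of NumPy, assuming the standard GTM construction: a regular grid of latent points, Gaussian radial basis functions, and a weight matrix `W` that maps each latent point onto a curved two-dimensional manifold in three-dimensional data space. Data points are then generated by picking a latent point uniformly at random and adding isotropic Gaussian noise with precision `beta`. The function name `gtm_generate`, the grid size, basis centres, width, `beta`, and the random `W` are hypothetical choices for illustration only.

```python
import numpy as np


def gtm_generate(side=5, n_basis_side=2, width=0.5, beta=50.0, n_points=200, seed=1):
    """Map a 2-D latent grid to a 3-D manifold and sample noisy data around it."""
    rng = np.random.default_rng(seed)
    g = np.linspace(-1.0, 1.0, side)
    X = np.array([[a, b] for b in g for a in g])            # K x 2 latent grid (left stars)
    c = np.linspace(-0.5, 0.5, n_basis_side)
    centers = np.array([[a, b] for b in c for a in c])       # M x 2 basis-function centres
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    Phi = np.exp(-sq / (2.0 * width**2))                     # K x M basis activations
    W = rng.normal(size=(len(centers), 3))                   # M x 3 weights (random here)
    Y = Phi @ W                                              # K x 3 manifold points (right stars)
    # Each data point: choose a latent point uniformly, then add Gaussian noise of variance 1 / beta.
    k = rng.integers(len(X), size=n_points)
    T = Y[k] + rng.normal(scale=1.0 / np.sqrt(beta), size=(n_points, 3))
    return X, Y, T
```

Because `Phi` is a nonlinear function of the latent coordinates, neighbouring latent points stay neighbours on the manifold even though the manifold itself is curved.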
Fig 2. A simulated three-dimensional dataset drawn from five multivariate normal distributions, and its GTM visualization.
(a) The data form two groups of clusters in three dimensions: the first group includes clusters 1, 2, and 3, and the second includes clusters 4 and 5. (b) A two-dimensional visualization of the data using GTM.
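A dataset with the structure described for Fig 2 can be simulated as follows. Only the layout comes from the caption (five multivariate normal clusters in three dimensions, with clusters 1 to 3 and clusters 4 and 5 forming two groups); the cluster means, the shared covariance, the sample size, and the function name `simulate_two_groups` are hypothetical, since the values used in the paper are not given here.

```python
import numpy as np


def simulate_two_groups(n_per_cluster=100, seed=0):
    """Draw 3-D points from five multivariate normals forming two groups of clusters."""
    rng = np.random.default_rng(seed)
    # Hypothetical means: clusters 1-3 sit close together, clusters 4-5 sit far away.
    means = np.array([
        [0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 2.0, 0.0],   # group 1: clusters 1, 2, 3
        [10.0, 10.0, 10.0], [12.0, 10.0, 10.0],              # group 2: clusters 4, 5
    ])
    cov = 0.25 * np.eye(3)  # hypothetical shared covariance
    data = np.vstack([rng.multivariate_normal(m, cov, n_per_cluster) for m in means])
    labels = np.repeat(np.arange(1, 6), n_per_cluster)  # cluster numbers 1..5
    return data, labels
```

The returned `labels` play the role of the cluster numbers used to label points in the GTM displays.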
Fig 3. The progression of V2PI-GTM.
(a) A GTM display (in latent dimensions q1 and q2) of the simulated dataset with K = 16 and J = 400. The data points are labeled by cluster number. The arrows show how a user may interact: moving one point from location A to location B and another from location C to location D. (b), (c), and (d) show how the observations respond (or do not respond) to these moves when stages 1, 2, and 3 of V2PI-GTM, respectively, are in place.
Fig 4. V2PI-GTM with NIH data.
We provide GTM displays of the NIH abstracts (labeled by their identification numbers) before and after user interaction in (a) and (b), respectively. The interaction is indicated by the pink arrow in (a): Abstract 7 was moved to a location near cluster D. In addition to labeling and learning about four clusters in the data (marked A, B, C, and D), we also tagged the latent GTM space. After the interaction, the clusters group differently, the meaning of the latent space changes, and the manifold changes dramatically.
This table lists the top 10 keywords that either differentiate clusters A, B, C, and D or are shared among all of the clusters in Fig 4.
| Cluster | Keywords |
|---|---|
| A | tumors, brains, stem, treatments, patients, generations, drugs, ordering, controlling, therapeutics |
| B | stem, neuronal, brains, proteins, deliveries, regulations, neural, patients, differentiation, expression, treatments |
| C | stem, genetically, regulations, drugs, structurally, proteins, genomics, epigenetics, RNAs, complexities |
| D | infections, treatments, tuberculosis, expression, patients, drugs, strains, resistance, vaccination, immunity |
| Shared | cells, functionalization, diseases, developments, genes, cancerous, studying, researchers, proposing, mechanisms, specification |
Descriptions of Abstracts 20, 22, 32, and 39 in Fig 4.
| Abstract | Description |
|---|---|
| 20 | discusses diagnosis of HIV infection in patients who live with limited access to therapeutic treatments |
| 22 | discusses expression characteristics of a drug-resistant gene |
| 32 | discusses varying yeast strains |
| 39 | discusses lymphocyte homing |