Qian Zhu, Aaron K Wong, Arjun Krishnan, Miriam R Aure, Alicja Tadych, Ran Zhang, David C Corney, Casey S Greene, Lars A Bongo, Vessela N Kristensen, Moses Charikar, Kai Li, Olga G Troyanskaya.
Abstract
We present SEEK (search-based exploration of expression compendia; http://seek.princeton.edu/), a query-based search engine for very large transcriptomic data collections, including thousands of human data sets from many different microarray and high-throughput sequencing platforms. SEEK uses a query-level cross-validation-based algorithm to automatically prioritize data sets relevant to the query and a robust search approach to identify genes, pathways and processes co-regulated with the query. SEEK provides multigene query searching with iterative metadata-based search refinement and extensive visualization-based analysis options.
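The idea of weighting data sets by query-level cross-validation can be illustrated with a small sketch. This is not the published SEEK algorithm: the function names `dataset_weight` and `search`, the use of Pearson correlation, the rank-decay parameter `p`, and the requirement that every data set has the same genes in the same row order are all assumptions made for illustration. Each query gene in turn is used to rank the other genes in a data set, and the data set earns a higher weight when the remaining query genes rank near the top.

```python
import numpy as np

def dataset_weight(expr, query, p=0.9):
    """Illustrative cross-validated relevance of one data set to a query.

    expr  : genes x conditions array (assumed layout)
    query : list of row indices of the query genes (at least two)
    p     : rank-decay factor (an assumed, illustrative choice)
    """
    if len(query) < 2:
        raise ValueError("cross-validation needs at least two query genes")
    corr = np.corrcoef(expr)  # gene-by-gene Pearson correlation
    scores = []
    for q in query:
        # Rank all other genes by correlation to the single gene q.
        order = np.argsort(-corr[q])
        order = order[order != q]
        rank_of = {int(g): r for r, g in enumerate(order)}
        # Reward data sets where the remaining query genes rank highly.
        scores.append(sum((1 - p) * p ** rank_of[g] for g in query if g != q))
    return float(np.mean(scores))

def search(datasets, query):
    """Weight each data set, then combine per-gene correlations to the query."""
    weights = np.array([dataset_weight(d, query) for d in datasets])
    weights = weights / weights.sum()
    gene_scores = sum(
        w * np.corrcoef(d)[query].mean(axis=0)
        for w, d in zip(weights, datasets)
    )
    return weights, gene_scores
```

Calling `search(datasets, [0, 1, 2])` returns one normalized weight per data set and one combined score per gene; sorting genes by descending score gives a ranked list in which data sets where the query genes are coherently co-expressed contribute most.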
Year: 2015 PMID: 25581801 PMCID: PMC4768301 DOI: 10.1038/nmeth.3249
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Figure 1. The SEEK system overview and systematic functional evaluation
(a) The system overview. Users begin by defining a query gene set of interest. SEEK can accommodate gene sets as small as 1–2 genes and as large as 100 genes (step 1). The SEEK search engine searches the entire compendium and returns the genes that are co-expressed with the query, together with the most relevant data sets (steps 2, 3). The web user interface provides visualizations of gene co-expression across data sets (step 4) and enables users to iteratively refine their search (Fig. 2) and further analyze the results through the condition-specific view (step 5). The latter allows users to check possible associations with the measured outcomes in order to interpret the co-expressed genes (Supplementary Note 3). (b) Gene retrieval evaluations across 995 diverse GO biological process terms for each of the SEEK, MEM, Gene recommender, and meta-data set correlation algorithms (Supplementary Note 1). Queries of diverse sizes (2–20 genes) were selected randomly from each term’s genes to evaluate the precision of retrieving the remaining genes in that term. Individual term performances (Supplementary Data 2) and additional detailed comparative evaluations (Supplementary Figs. 1, 2) are provided.
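The hold-out evaluation described in panel (b) can be sketched as follows. This is a simplified illustration rather than the authors' evaluation code: `holdout_precision`, `search_fn`, and the fixed `top_n` cutoff are hypothetical names and choices, and the paper's exact precision measure may differ. A query is sampled at random from a term's genes, and precision is the fraction of the top retrieved non-query genes that belong to the held-out remainder of the term.

```python
import random

def holdout_precision(term_genes, search_fn, query_size, top_n=10, seed=0):
    """Precision of recovering a term's held-out genes from a random query.

    term_genes : set of gene identifiers annotated to one term
    search_fn  : callable taking a query list and returning a ranked gene list
                 (a stand-in for any search method being evaluated)
    """
    rng = random.Random(seed)
    genes = sorted(term_genes)              # sort for reproducible sampling
    query = rng.sample(genes, query_size)   # random query drawn from the term
    held_out = set(genes) - set(query)      # genes the search should recover
    # Query genes are excluded from the ranked list before scoring.
    ranked = [g for g in search_fn(query) if g not in query][:top_n]
    if not ranked:
        return 0.0
    hits = sum(g in held_out for g in ranked)
    return hits / len(ranked)
```

Averaging this value over many terms and several query sizes would give one summary number per method, which is the kind of comparison the panel reports.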
Figure 2. Search results for the Hedgehog (Hh) query (GLI1, GLI2, PTCH1) and search refinement
(a) Data sets prioritized and genes retrieved for the query in the main result page, expression view. The result is retrieved from the Hh query after a global compendium search. The top-ranked data sets (1) and the co-expressed gene list (2) are indicated. Conditions in each data set are hierarchically clustered in real time according to the expression values of the top genes retrieved from the search (3). The expression heat map of the genes in one of the data sets is shown in (4). (b) Illustration of the search refinement function. Refine Search enables users to narrow the scope of their search using a broad set of selection criteria, including tissue, cell type, or disease categories, platforms, or rank of data sets from the initial search (Supplementary Note 3). (c) The final results after limiting the search scope to brain data sets. In this case, brain-specific co-expression is evident, with higher co-expression scores to the query and better groupings of conditions than in the initial search. SEEK also has alternative view modes, such as the co-expression view and the condition-specific view (Supplementary Note 3).
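The refinement step in panel (b) amounts to filtering the data set list by metadata before searching again. The sketch below shows one way this could look; the `DatasetRecord` fields, the `refine` function, and the identifiers in the example are hypothetical and do not reflect SEEK's actual data model or interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetRecord:
    accession: str  # made-up identifier, for illustration only
    tissue: str
    platform: str
    rank: int       # rank of the data set in the initial search

def refine(records, tissue=None, platform=None, max_rank=None):
    """Keep only data sets that match every criterion that was supplied."""
    kept = []
    for r in records:
        if tissue is not None and r.tissue != tissue:
            continue
        if platform is not None and r.platform != platform:
            continue
        if max_rank is not None and r.rank > max_rank:
            continue
        kept.append(r)
    # Preserve the ordering of the initial search.
    return sorted(kept, key=lambda r: r.rank)

records = [
    DatasetRecord("DS1", "brain", "microarray", 3),
    DatasetRecord("DS2", "liver", "microarray", 1),
    DatasetRecord("DS3", "brain", "rnaseq", 2),
]
brain_only = refine(records, tissue="brain")
```

Here `brain_only` holds the two brain data sets in their original rank order; a second search restricted to them would correspond to the brain-specific result shown in panel (c).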