Langho Lee, Kai Wang, Gang Li, Zhi Xie, Yuli Wang, Jiangchun Xu, Shaoxian Sun, David Pocalyko, Jong Bhak, Chulhong Kim, Kee-Ho Lee, Ye Jin Jang, Young Il Yeom, Hyang-Sook Yoo, Seungwoo Hwang.
Abstract
BACKGROUND: Hepatocellular carcinoma (HCC) is the fifth most common cancer worldwide. A number of molecular profiling studies have investigated the changes in gene and protein expression that are associated with various clinicopathological characteristics of HCC and have generated a wealth of scattered information, usually in the form of gene signature tables. A database of the published HCC gene signatures would be useful to liver cancer researchers who want to retrieve existing differential expression information on a candidate gene and to compare signatures in order to prioritize common genes. A challenge in constructing such a database is that directly importing the signatures as they appear in articles would lose or obscure the context information that is essential for a correct biological interpretation of a gene's expression change. This challenge arises because the designation of the compared sample groups is most often abbreviated, ad hoc, or even missing from published signature tables. Without manual curation, the context information is lost, leading to uninformative database contents. Although several databases of gene signatures are available, none of them presents signatures in an informative form or covers liver cancer comprehensively. Thus we constructed Liverome, a curated database of liver cancer-related gene signatures with self-contained context information.

DESCRIPTION: Liverome's data coverage is more than three times larger than that of any other signature database. It consists of 143 signatures taken from 98 HCC studies, mostly microarray and proteomic studies, and involves 6,927 genes. The signatures were post-processed into an informative and uniform representation and annotated with an itemized summary so that all context information is unambiguously self-contained within the database. The signatures were further given informative names and organized into ten functional categories for guided browsing. The web interface enables straightforward retrieval of known differential expression information on a query gene and comparison of signatures to prioritize common genes. The utility of the data collected in Liverome is shown by case studies in which useful biological insights on HCC are produced.
Year: 2011 PMID: 22369201 PMCID: PMC3333186 DOI: 10.1186/1471-2164-12-S3-S3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Examples of how gene signatures appear in articles, in other databases, and in Liverome. The left, middle, and right columns show gene signatures as they appear in articles, in other gene signature databases, and in Liverome, respectively. Column titles for numerical ranking information were abbreviated or ad hoc in the original gene signature tables (left column). Gene signatures extracted by other databases are uninformative (middle column): (A) direct import of the original table leaves it ambiguous what P1 and P2 mean and in what context the expression changes (Down and Up) were observed, (B) importing only the gene identifiers causes a complete loss of differential expression information, and (C) importing only the change direction loses the numerical information and leaves the context of the observed expression changes ambiguous. Blue highlights show how the information from the original signature tables was transformed in the databases. Liverome derives the most informative form of gene signatures through manual curation to construct self-contained database content (right column). In addition, for easier recognition, fold change values were uniformly formatted (shown in red) and signatures were informatively named (shown in green) in Liverome.
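The uniform formatting of fold change values mentioned in the caption can be illustrated with a short sketch. The source does not state which convention Liverome uses, so the signed fold change below (a ratio below 1 is reported as a negative reciprocal) is an assumption, and the function names `signed_fold_change` and `format_fold_change` are hypothetical.

```python
def signed_fold_change(value: float, kind: str = "ratio") -> float:
    """Convert a reported expression change to a signed fold change.

    Assumed convention (not taken from the source): up-regulation is a
    positive fold change >= 1, down-regulation is a negative fold change <= -1.
    `kind` says how the article reported the value: "ratio" or "log2".
    """
    if kind == "log2":
        ratio = 2.0 ** value
    elif kind == "ratio":
        ratio = value
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    if ratio <= 0:
        raise ValueError("an expression ratio must be positive")
    # Ratios below 1 become negative reciprocals, e.g. 0.25 -> -4.0.
    return ratio if ratio >= 1 else -1.0 / ratio


def format_fold_change(value: float, kind: str = "ratio") -> str:
    """Render the signed fold change with an explicit sign and two decimals."""
    return f"{signed_fold_change(value, kind):+.2f}"
```

With a single representation of this kind, values taken from tables that report ratios and tables that report log ratios can be sorted and compared in the same column.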
Figure 2. Comparison of HCC data coverage between Liverome and four other signature databases. For each database, the number of collected HCC-related articles from which gene signatures were extracted is indicated. The four other databases show minimal overlap with each other. Liverome’s collection is more than three times larger than that of any other database. One third of its data (thirty articles) is unique to Liverome. Only the articles that met our collection criteria were counted, as described in the “Construction and content” section.
Figure 3. The web interface of Liverome. (A) Itemized summary of a gene signature, which appears in a pop-up window when the name of the signature is clicked, as indicated by red rounded rectangles in the figure. (B) Result from the gene search interface, which reports the informatively named gene signatures in which the queried gene is found (left column), along with numerical ranking information and the designation of the compared groups (right column). (C) The browse and comparison interface, in which three signatures are marked. (D) Result from the browse and comparison interface. Several display controls are provided at the top of the screen. The table below them provides a sorted view of all the genes that are found in the selected signatures. The sequence of sorting applied to the table is shown to the right of the “Sorted by” control. This table was made compact by using the “Set columns to display” control. See the user guide on the web site for more information.
Comparison of Liverome with other related tools.
| | Liverome | EHCO | dbDEPC | CCancer | GeneSigDB |
|---|---|---|---|---|---|
| Coverage of phenotype | HCC only | HCC only | 15 cancers | Half the data are on cancer | Mostly cancer and stem cell |
| Coverage of HCC-specific data (signatures // articles) | 143 // 98 | 12 // 32 | 6 // 5 | 25 // 21 | 34 // 18 |
| Overall data coverage | Same as above | Same as above | 65 // 48 | 3369 // 2644 | 2142 // 973 |
| Covers both transcriptomics and proteomics studies | | | No (proteomics only) | | |
| Explicit designation of compared sample groups | No | No | No | ||
| Contains numerical ranking information | No (change direction only) | No | |||
| Uniform representation of numerical ranking values | | No | No | No | No |
| Informative naming of signatures | | No | No | No | No |
| Summary of experiment | | No | No | No | No |
| Signature comparison tool | Yes | No | |||
| Gene search tool | No | ||||
| Functional categorization of signatures for guided browsing | | No | No | No | No |
| Spreadsheet-like sorting utility for prioritization | | No | No | No | No |
Liverome is compared to four other gene signature databases with respect to its main utility as a gene search and signature comparison resource for the liver cancer research community. Liverome achieves both the largest coverage of HCC signatures and the most informative data content. Its web interface is designed to facilitate retrieval of the informative data content, guided browsing of signatures, and comparison of signatures for occurrence-based prioritization.
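The gene search and occurrence-based prioritization described here can be sketched as follows. This is a minimal illustration under stated assumptions, not Liverome's implementation: the `Signature` record, the functions `search_gene` and `prioritize`, the signature names, the gene symbols `GENE1` to `GENE3`, and all fold change values are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Signature:
    """A gene signature that carries its own context (hypothetical layout)."""
    name: str                 # informative signature name
    group_a: str              # first compared sample group
    group_b: str              # second compared sample group
    genes: Dict[str, float]   # gene symbol -> signed fold change (A vs. B)


def search_gene(signatures: List[Signature], gene: str) -> List[Tuple[str, str, float]]:
    """Return (signature name, compared groups, fold change) for every hit."""
    return [
        (s.name, f"{s.group_a} vs. {s.group_b}", s.genes[gene])
        for s in signatures
        if gene in s.genes
    ]


def prioritize(signatures: List[Signature]) -> List[Tuple[str, int]]:
    """Rank genes by the number of signatures that contain them."""
    counts = Counter(gene for s in signatures for gene in s.genes)
    # Most frequent first; ties are broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy data only; these are not records from Liverome.
SIGNATURES = [
    Signature("HCC vs. non-tumor liver (toy)", "HCC", "non-tumor liver",
              {"GENE1": 2.5, "GENE2": -3.0}),
    Signature("Early recurrence vs. no recurrence (toy)", "early recurrence",
              "no recurrence", {"GENE1": 1.8, "GENE3": 4.2}),
    Signature("Metastatic vs. non-metastatic HCC (toy)", "metastatic HCC",
              "non-metastatic HCC", {"GENE1": -2.1, "GENE2": -1.6}),
]
```

Because each hit is returned together with the compared groups, a search result can be read without going back to the source article, which is the point the comparison with other databases makes.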
Figure 4. A gene search result is informative only in the presence of informative data content. (A) A search for the IGFBP3 gene in the EHCO database produces an uninformative result. To decipher the retrieved result, users need to read the source article named “mRNA” to find out whether the gene was up-regulated or down-regulated in that dataset and which sample groups were compared. (B) A search for the IGFBP3 gene in Liverome produces an informative result; all information is readily readable and self-contained in one screen.
Figure 5. Co-occurrence network of genes in Liverome. Shown here is a matrix representation of the co-occurrence network, where genes are represented on rows and columns in the same order. Elements in the matrix are color coded according to the similarity between the corresponding pairs of genes. Genes that are grouped into modules are indicated by the color bars and by the associated clustering trees. Blocks along the diagonal indicate that genes in the same module are more interconnected than genes in different modules. Details on the plot can be found in the documentation of the WGCNA package. The enriched biological pathways for selected modules are also shown. The modules are numbered as indicated in Additional File 4.
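The idea of a pairwise similarity between genes based on co-occurrence in signatures can be sketched as below. The caption attributes the plot to the WGCNA package; this sketch does not reproduce that analysis. It substitutes a plain Jaccard index over the sets of signatures that contain each gene, which is an assumption made for illustration, and the function name `cooccurrence_similarity` and the toy identifiers are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def cooccurrence_similarity(
    signatures: Dict[str, Set[str]]
) -> Dict[Tuple[str, str], float]:
    """Jaccard similarity between genes, based on shared signature membership.

    `signatures` maps a signature name to the set of genes it contains.
    The result maps each alphabetically ordered gene pair to a value in [0, 1].
    """
    # Invert the mapping: gene -> set of signatures in which it occurs.
    membership: Dict[str, Set[str]] = {}
    for sig_name, genes in signatures.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(sig_name)

    similarity: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(membership), 2):
        shared = len(membership[a] & membership[b])
        total = len(membership[a] | membership[b])
        similarity[(a, b)] = shared / total
    return similarity


# Toy input only; signature and gene names are placeholders.
TOY = {
    "S1": {"GENE1", "GENE2"},
    "S2": {"GENE1", "GENE2", "GENE3"},
    "S3": {"GENE3"},
}
SIM = cooccurrence_similarity(TOY)
```

Arranging these pairwise values in a square matrix with genes in the same order on both axes gives the kind of representation the caption describes; grouping genes into modules would require a clustering step that is not shown here.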