Ying Wang1,2, Yuantao Tong1, Zeyu Zhang1, Rongbin Zheng1, Danqi Huang1, Jinxuan Yang1, Hui Zong1, Fanglin Tan1, Yujia Xie1, Honglian Huang1, Xiaoyan Zhang1.
Abstract
Molecular mechanisms of virus-related diseases involve multiple factors, including viral mutation accumulation and integration of a viral genome into the host DNA. With increasing attention being paid to virus-mediated pathogenesis and the development of many useful technologies to identify virus mutations (VMs) and viral integration sites (VISs), much research on these topics is available in PubMed. However, knowledge of VMs and VISs is scattered across numerous published papers that lack standardization, integration and curation. To address these challenges, we built a pilot database of human disease-related Virus Mutations, Integration sites and Cis-effects (ViMIC), which specializes in three features: virus mutation sites, viral integration sites and target genes. In total, ViMIC provides information on 31 712 VM entries, 105 624 VISs, 16 310 viral target genes and 1 110 015 virus sequences of eight viruses in 77 human diseases obtained from the public domain. Furthermore, ViMIC lets users explore the cis-effects of virus-host interactions by surveying 78 histone modifications, binding of 1358 transcription regulators and chromatin accessibility on these VISs. We believe ViMIC will become a valuable resource for the virus research community. The database is available at http://bmtongji.cn/ViMIC/index.php.
Year: 2022 PMID: 34500462 PMCID: PMC8728280 DOI: 10.1093/nar/gkab779
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. A schematic overview of the ViMIC database. ViMIC collects publicly available virus mutations (VMs), viral integration sites (VISs), cistrome factors and target gene information from multiple public resources, including PubMed, Cistrome Data Browser, VISDB and GEO. ViMIC uses text mining to extract virus-related bioconcepts from published citations, followed by manual curation to ensure annotation accuracy. The data processing layer is responsible for specific tasks, including establishing annotation standards stored in a MySQL relational database, mapping VISs to the reference genome, calculating overlaps between VISs and cistrome factors, analysing the correlation between gene expression and infiltrating immune cell fractions, and generating statistical plots. The ViMIC interface was designed with three main features: virus mutation sites, viral integration sites and target genes. ViMIC provides three ways to query and explore data: by keyword search, quick search engine or an advanced search menu. In addition, ViMIC includes other modules, labelled Virus, Disease, Dataset, Download and Help, that allow users to better explore the collected data.
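The overlap step named in the caption can be illustrated with a small sketch. This is not ViMIC's actual pipeline, which the record does not describe; it only assumes that each VIS and each cistrome peak is a half-open genomic interval `(chrom, start, end)`, and that an overlap means the two intervals on the same chromosome share at least one base. The function name `count_overlaps`, the VIS identifiers and all coordinates are hypothetical.

```python
from collections import defaultdict


def count_overlaps(vis_sites, peaks):
    """Count, for each VIS, how many peaks overlap it.

    vis_sites: iterable of (vis_id, chrom, start, end), half-open intervals.
    peaks:     iterable of (chrom, start, end), half-open intervals.
    Naive O(n * m) scan per chromosome; an illustrative assumption,
    not the method used by ViMIC.
    """
    # Group peaks by chromosome so each VIS is only compared
    # against peaks on its own chromosome.
    peaks_by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        peaks_by_chrom[chrom].append((start, end))

    counts = {}
    for vis_id, chrom, start, end in vis_sites:
        # Two half-open intervals overlap when each starts before the other ends.
        counts[vis_id] = sum(
            1
            for p_start, p_end in peaks_by_chrom[chrom]
            if p_start < end and start < p_end
        )
    return counts


# Hypothetical example data (coordinates are made up for illustration).
vis_sites = [
    ("vis_a", "chr5", 1_295_000, 1_295_200),
    ("vis_b", "chr1", 500, 600),
]
peaks = [
    ("chr5", 1_294_900, 1_295_050),
    ("chr5", 1_295_150, 1_295_400),
    ("chr5", 2_000_000, 2_000_500),
    ("chr1", 600, 700),  # touches vis_b at its end but shares no base
]
overlap_counts = count_overlaps(vis_sites, peaks)
```

A VIS with a count above zero would fall into the "Overlaps > 0" rows of a summary table. For data at the scale reported here, an interval tree or a sorted sweep would be a more realistic choice than this naive scan.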
Statistics of the ViMIC data
| Data type | Count | Description |
|---|---|---|
| Virus | 8 | Including 5 DNA viruses and 3 RNA viruses. |
| Virus Mutation (VM) | 31 712 | Curated mutation information entries. |
| Virus Sequence | 1 110 015 | Curated virus sequence information entries. |
| Histone Mark | 78 | Histone mark types that overlap VISs in ViMIC. |
| Transcription Regulator | 1358 | Transcription regulator types that overlap VISs in ViMIC. |
| Viral Integration Site (VIS) (Sum Overlaps > 0) | 95 280 | VISs that overlap cistrome data. |
| VIS-Histone Modification (HM) Overlaps > 0 | 88 393 | VISs that overlap cistrome HM data. |
| VIS-Transcription Factor Binding Site (TFBS) Overlaps > 0 | 71 668 | VISs that overlap cistrome TFBS data. |
| VIS-Chromatin Accessibility (CA) Overlaps > 0 | 39 127 | VISs that overlap cistrome CA data. |
| Virus Related Disease | 77 | Disease types associated with the 8 ViMIC viruses. |
| Target Gene | 16 310 | Curated target genes affected by viral genome insertion or virus gene/protein/region regulation. |
| Literature | 2539 | Count of the literature related to virus mutation and integration. |
| Clinical Annotation and Data | 127 | Clinical characteristics and virus sequencing data available in the literature. |
Figure 2. Screenshot depicting an example exploration of the ViMIC database. (A) HBV (Hepatitis B virus) is used as an example. The user can open a page containing brief mutation information on HBV by clicking HBV VM Count. From there, the user can enter a keyword in the quick search engine and open the detailed mutation information page to view the mutation site, or click the ‘Sequence’ button to browse statistics on virus genome sequences derived from different regions worldwide and download the curated data table. (B) If the user is interested in a detailed factor list for VIS entry ‘1000606’ on chromosome 5 in HBV, ViMIC shows the overlap results for the three cistrome factors after the user clicks ‘chr5’ and searches for entry ‘1000606’. If the user is interested in the overlap number ‘444’ for the ‘1000606’ and ‘TFBS’ entry, ViMIC returns the ranking of all transcriptional regulators that overlap this VIS. When the user clicks the ‘View’ button for MYC, ViMIC generates a MYC-related table including the GSMID/ENCODE ID, cell line, cell type, tissue type and factor name. On the HBV VIS-cistrome factor homepage, the user can use the chromosome menu to view the overlap distribution of the three factors and virus integration fragments on a specific chromosome (e.g. chr5). Using the factor menu, the user can explore further overlap statistics and distribution information for a specific transcription factor (e.g. MYC). The user can click the ‘View VIS-VM Association’ button to browse the enrichment of mutations in VISs. After the user selects a reference genome (e.g. AY800389.1) and searches for a DVID, ViMIC returns a sequence alignment result (table and map) showing the distribution of mutations harbored in VISs (e.g. 1000606). (C) ViMIC provides the ‘Target Gene’ feature for the curated target genes affected by viral genome insertion or by virus gene/protein/region regulation. If the user is interested in TERT, the target gene of VIS 1000606 in HBV, the detailed gene information page shows TERT gene information, inserted VISs reported in the literature, and the correlation between gene expression levels and the fraction of infiltrating immune cells in HBV-related human disease.
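The correlation shown on the gene page can be illustrated with a short sketch. The record does not say which correlation measure ViMIC uses, so Spearman rank correlation is an assumption here, chosen because it is a common option for expression versus cell-fraction comparisons. The function names and all sample values are hypothetical and do not come from ViMIC.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sample values: expression of one target gene and the
# estimated fraction of one infiltrating immune cell type.
expression = [2.1, 3.5, 1.0, 4.8, 3.9]
immune_fraction = [0.10, 0.18, 0.05, 0.30, 0.22]
rho = spearman(expression, immune_fraction)
```

In this toy data the two series rise and fall together, so `rho` is at its maximum; real expression and immune-fraction data would typically give weaker values and would also need a significance test.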