Hang Hu, Zhenxiao Lu, Haisong Feng, Guojun Chen, Yongmei Wang, Congshan Yang, Zhenyu Yue.
Abstract
Apicomplexan parasites cause severe diseases in humans and livestock. Dense granule proteins (GRAs), which are specific to the Apicomplexa, help maintain intracellular parasitism of host cells. GRAs have good immunogenicity and have emerged as important players in vaccine development. Although studies on GRAs have increased steadily in recent years, the data are incomplete and complex to collect, which makes it difficult for biologists to use the available information comprehensively. There is therefore a pressing need for a user-friendly resource that integrates existing GRA data. In this paper, we present the Dense Granule Protein Database (DGPD), the first knowledge database dedicated to the integration and analysis of typical GRA properties. The current version of DGPD includes annotated metadata for 245 GRA samples derived from multiple web repositories and literature mining, covering five species that cause common diseases (Plasmodium falciparum, Toxoplasma gondii, Hammondia hammondi, Neospora caninum and Cystoisospora suis). We explored the baseline characteristics of GRAs and found that the numbers of introns and transmembrane domains in GRAs differ markedly from those in non-GRAs. Furthermore, we used the data in DGPD to explore prediction algorithms for GRAs. We hope DGPD will be a useful resource for researchers studying GRAs. Database URL: http://dgpd.tlds.cc/DGPD/index/
Year: 2022 PMID: 36164976 PMCID: PMC9513560 DOI: 10.1093/database/baac085
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 4.462
Figure 1. Workflow for data curation in the DGPD database. Experimentally validated GRAs are classified as ‘confirmed GRAs’ (blue arrows). Highly suspected GRAs reported in the main text or supplementary material of the literature are classified as ‘likely GRAs’ (orange arrows). Homologs of known dense granule proteins in the PlasmoDB and ToxoDB databases are classified as ‘predicted GRAs’ (green arrows).
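The three evidence groups named in the Figure 1 caption can be read as a simple decision rule. The following is a minimal sketch of that rule, not the DGPD curation pipeline itself: the `Candidate` record, its field names, and the precedence order (the strongest evidence wins when a protein meets more than one criterion) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EvidenceGroup(Enum):
    # Group labels taken from the Figure 1 caption.
    CONFIRMED = "confirmed GRAs"
    LIKELY = "likely GRAs"
    PREDICTED = "predicted GRAs"


@dataclass
class Candidate:
    # Hypothetical record; these field names are not from DGPD.
    gene_id: str
    experimentally_validated: bool = False
    suspected_in_literature: bool = False
    homolog_of_known_gra: bool = False


def assign_group(candidate: Candidate) -> Optional[EvidenceGroup]:
    """Return the evidence group for a candidate, or None if no criterion applies.

    Assumption: the checks run from strongest to weakest evidence.
    """
    if candidate.experimentally_validated:
        return EvidenceGroup.CONFIRMED
    if candidate.suspected_in_literature:
        return EvidenceGroup.LIKELY
    if candidate.homolog_of_known_gra:
        return EvidenceGroup.PREDICTED
    return None


# Placeholder gene identifiers, for illustration only.
examples = [
    Candidate("gene_A", experimentally_validated=True, homolog_of_known_gra=True),
    Candidate("gene_B", suspected_in_literature=True),
    Candidate("gene_C", homolog_of_known_gra=True),
]
for c in examples:
    group = assign_group(c)
    print(c.gene_id, group.value if group else "unclassified")
```

Keeping the groups in an `Enum` makes the labels explicit and prevents free-text variants of the same category from creeping into the records.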
Figure 2.The workflow of prediction models for identifying GRAs.
Statistics in DGPD
| Species | Level 1 | Level 2 | Level 3 | Total |
|---|---|---|---|---|
| | 66 | 26 | 80 | 172 |
| | 11 | 0 | 18 | 29 |
| | 8 | 11 | 0 | 19 |
| | 16 | 0 | 0 | 16 |
| | 9 | 0 | 0 | 9 |
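As a consistency check, the counts in the table can be summed and compared with the 245 samples reported in the abstract. The snippet below is a small sketch that copies the numbers from the table; rows are identified only by position because the species names are not reproduced here, and `check_totals` is a helper name introduced for this example.

```python
# Counts copied from the table: (level 1, level 2, level 3, reported total).
# Rows are listed in table order; species names are intentionally omitted.
rows = [
    (66, 26, 80, 172),
    (11, 0, 18, 29),
    (8, 11, 0, 19),
    (16, 0, 0, 16),
    (9, 0, 0, 9),
]


def check_totals(rows, expected_grand_total):
    """Verify each row total and the grand total, then return the column sums."""
    for index, (level1, level2, level3, total) in enumerate(rows, start=1):
        if level1 + level2 + level3 != total:
            raise ValueError(f"row {index}: levels do not sum to {total}")
    column_sums = tuple(sum(column) for column in zip(*rows))
    if column_sums[3] != expected_grand_total:
        raise ValueError(f"grand total {column_sums[3]} != {expected_grand_total}")
    return column_sums


column_sums = check_totals(rows, expected_grand_total=245)
print(column_sums)  # (110, 37, 98, 245)
```

Every row is internally consistent, and the per-level sums (110, 37 and 98) add up to the 245 samples stated in the abstract.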
Figure 3. Web interface of the DGPD database. (A) GRA repository panel. A statistics visualization is displayed on the right. Gene information can be viewed by entering keywords in the search bar. (B) Gene information panel. Detailed information on the gene that the user searched for is shown on this panel. (C) Database introduction and help panel. Users can find help and a brief introduction to the database functions. The catalog is displayed at the top left. (D) Download panel. All data are available through this panel. (E) Data submission panel. Information on novel GRAs can be submitted through this panel. (F) Contact panel. Several contact methods are provided so that users can communicate with us.
Figure 4. Feature analysis between positive and negative samples across species. Orange and blue represent GRAs and non-GRAs, respectively.
Figure 5. Performance of different machine learning-based models.