Ashish V Tendulkar, Martin Krallinger, Victor de la Torre, Gonzalo López, Pramod P Wangikar, Alfonso Valencia.
Abstract
BACKGROUND: FragKB (Fragment Knowledgebase) is a repository of clusters of structurally similar fragments from proteins. Fragments are annotated with information at the level of sequence, structure and function, integrating biological descriptions derived from multiple existing resources and text mining.
Year: 2010 PMID: 20305778 PMCID: PMC2841175 DOI: 10.1371/journal.pone.0009679
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. FragKB annotation flowchart.
This figure gives an overview of the steps in the FragKB annotation workflow, from the initial generation of structural octapeptide fragments and clusters to functional annotation at the level of clusters, individual fragments and proteins. A. Fragment Preprocessing: generation of structural fragments and clusters; B. Cluster Annotation: structure, sequence and functional descriptions generated for fragment clusters; C. Protein Annotation: description of fragments at the level of the corresponding protein, the fragment itself and the individual residues; D. Cluster classification method based on SCOP superfamily distribution; E. Cluster Molecular Function Annotation method based on Gene Ontology term frequency and over-representation analysis; F. Text Mining and Literature processing for the extraction of protein descriptions and mutations.
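Steps A and E of the flowchart can be illustrated with a minimal sketch. It assumes that octapeptides are taken as overlapping windows along the sequence and that over-representation is scored with a one-sided hypergeometric test. Neither detail is stated in this record, and the function names `octapeptides` and `enrichment_p_value` are hypothetical.

```python
from math import comb


def octapeptides(sequence, length=8):
    """Return (start, fragment) pairs for every overlapping window.

    Assumption: fragments are overlapping windows of 8 residues.
    """
    return [(i, sequence[i:i + length])
            for i in range(len(sequence) - length + 1)]


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: proteins in the background set
    K: background proteins annotated with the GO term
    n: proteins in the cluster
    k: cluster proteins annotated with the GO term
    Assumption: this is a generic over-representation test, not
    necessarily the statistic used by FragKB.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Toy sequence and toy counts, for illustration only.
    for start, fragment in octapeptides("MTEYKLVVVG"):
        print(start, fragment)
    print(enrichment_p_value(k=5, n=10, K=20, N=1000))
```

A small p-value suggests that the GO term occurs in the cluster more often than expected by chance. In practice a multiple-testing correction would also be applied across terms.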
Summary statistics for homogeneous and heterogeneous clusters.
| | Homogeneous Clusters | Heterogeneous Clusters |
| Number of clusters | 2,207 | 10,696 |
| Number of fragments | 28,575 | 437,455 |
| Average Information Content (Bits) | 3.24 (+/− 0.51) | 1.89 (+/− 0.60) |
| Average distance between fragment endpoints (Å) | 13.50 (+/− 3.86) | 13.59 (+/− 3.93) |
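The information content row can be made concrete with a short sketch. This record does not say how the average was computed, so the code below assumes the standard sequence-logo definition (log2 of the alphabet size minus the Shannon entropy of each column, with no small-sample correction), averaged over the eight positions. The names `column_information` and `average_information` are hypothetical.

```python
from collections import Counter
from math import log2


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column.

    Assumption: log2(alphabet_size) minus the Shannon entropy of the
    observed residue frequencies, without small-sample correction.
    """
    counts = Counter(column)
    total = len(column)
    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    return log2(alphabet_size) - entropy


def average_information(fragments):
    """Mean per-position information content of equal-length fragments."""
    columns = list(zip(*fragments))
    return sum(column_information(col) for col in columns) / len(columns)


if __name__ == "__main__":
    # Toy cluster of three octapeptides, for illustration only.
    cluster = ["GAGGVGKS", "GDGAVGKS", "GPGGVGKT"]
    print(round(average_information(cluster), 2))
```

Under this definition a fully conserved position scores log2(20), about 4.32 bits, so both averages in the table lie below that ceiling. The higher average for homogeneous clusters corresponds to stronger sequence conservation.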
Figure 2. FragKB annotation example: HRAS GTPase.
A. Fragment overview of the protein sequence, with homogeneous fragments shown in green and heterogeneous fragments in red. B. Structural visualization of fragment types (green and red) and the user-selected fragment (pink). C. GO molecular function term frequency analysis of the cluster. D. Fragment summary information and octapeptide sequence. E. Fragment cluster summary showing general cluster descriptions, including the cluster regular expression and signature SCOP superfamily association. F. Sample subset of the protein cluster list. G. Sequence logo representation showing sequence position versus information content. H. Prosite pattern annotation for the fragment. I. Sample of literature-extracted sentences describing mutations at G12.
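The cluster regular expression in panel E can be illustrated with a simple sketch. This record does not describe how FragKB derives the expression, so the code below assumes the simplest scheme: each position becomes the set of residues observed there. The function name `cluster_regex` and the example fragments are hypothetical.

```python
import re


def cluster_regex(fragments):
    """Build a per-position regular expression from aligned fragments.

    Assumption: every position is written as the set of residues seen
    at that position; a single residue is written without brackets.
    """
    parts = []
    for column in zip(*fragments):
        residues = sorted(set(column))
        if len(residues) == 1:
            parts.append(residues[0])
        else:
            parts.append("[" + "".join(residues) + "]")
    return "".join(parts)


if __name__ == "__main__":
    # Toy cluster of two octapeptides, for illustration only.
    pattern = cluster_regex(["GAGGVGKS", "GDGAVGKS"])
    print(pattern)
    print(bool(re.fullmatch(pattern, "GDGGVGKS")))
```

A pattern built this way matches every fragment in the cluster, and also any unseen combination of the residues observed at each position.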