Darren A Natale, Cecilia N Arighi, Winona C Barker, Judith Blake, Ti-Cheng Chang, Zhangzhi Hu, Hongfang Liu, Barry Smith, Cathy H Wu.
Abstract
Biomedical ontologies are emerging as critical tools in genomic and proteomic research, where complex data in disparate resources need to be integrated. A number of ontologies describe properties that can be attributed to proteins. For example, protein functions are described by the Gene Ontology (GO) and human diseases by SNOMED CT or ICD10. There is, however, a gap in the current set of ontologies: one that describes the protein entities themselves and their relationships is missing. We have designed the PRotein Ontology (PRO) to facilitate protein annotation and to guide new experiments. The components of PRO extend from the classification of proteins on the basis of evolutionary relationships to the representation of the multiple protein forms of a gene (products generated by genetic variation, alternative splicing, proteolytic cleavage, and other post-translational modifications). PRO will allow the specification of relationships between PRO, GO and other ontologies in the OBO Foundry. Here we describe the initial development of PRO, illustrated using human and mouse proteins involved in the transforming growth factor-beta and bone morphogenetic protein signaling pathways.
Year: 2007 PMID: 18047702 PMCID: PMC2217659 DOI: 10.1186/1471-2105-8-S9-S1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. PRO protein ontology overview. The figure shows the current (partial) working model and a subset of the possible connections to other ontologies. Blue text boxes: ProEvo component; lavender text boxes: ProForm component.
Figure 2. Schematic representation of the evolutionary relationship between human and mouse versions of EPB42 and TGM3. Top panel: Protein-glutamine gamma-glutamyltransferase (TGM3) and erythrocyte membrane protein band 4.2 (EPB42) are descended from a common ancestor with glutamyltransferase activity. Middle panel: All four proteins share a common domain arrangement. Bottom panel: PRO and GO connections. TGM3 and EPB42 are descended from an ancestral transglutaminase (TGase) represented by the large circle. This protein evolved into the ancestral forms of TGM3 and EPB42, represented by the ovals. All descendants of the ancestral TGase comprise the parts indicated by blue boxes. Except where indicated, all descendants of the ancestral TGase have transglutaminase activity and are involved in protein modification. However, the ancestral EPB42 lacks these attributes, and instead acquired new attributes not shared by the ancestral TGM3.
Figure 3. Smad2 component of the TGF-beta signaling pathway. Not all protein forms and pathway branches are indicated. The steps shown are preceded by phosphorylation of Smad4, TGF-beta binding to the receptor, and receptor phosphorylation. Step 1: Phosphorylation of Smad2 by TGF-beta receptor I. Step 2: Complex formation of Smad2 and Smad4. Step 3: Nuclear import of Smad2:Smad4. Step 4: Binding of Smad2:Smad4 complex coactivator to responsive element.
Figure 4. Multiple possible Smad2 forms. Not all possibilities are indicated. The third column indicates the known properties for each form. Italicized text indicates those properties that are accurately reflected by the GO terms currently used to annotate human Smad2 in UniProtKB (SMAD2_HUMAN; accession Q15796).
Figure 5. A PRO example using nodes and relationships illustrated by Smad2 protein. Not all possibilities are indicated. Cross-references to source of information (in curly braces) and description of sequence forms (in parentheses) are given for clarity. The symbols preceding each PRO accession are as follows: $: root; >: has_part (for domains) or derives_from (for proteins); <: variant_of.
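The symbol legend in the Figure 5 caption can be read as a small lookup from prefix symbol to relationship. The following is a minimal sketch of that reading, not the authors' tooling. The line layout (symbol, accession, name separated by whitespace), the `Node` class, the `parse_line` helper, and the `PRO:EX…` accessions are all hypothetical and chosen only for illustration; the figure itself is not reproduced in this record.

```python
from dataclasses import dataclass

# Symbol meanings come from the Figure 5 caption. The ">" symbol is
# ambiguous on its own: has_part for domains, derives_from for proteins.
SYMBOLS = {
    "$": "root",
    ">": "has_part or derives_from",
    "<": "variant_of",
}


@dataclass
class Node:
    relation: str   # relationship implied by the leading symbol
    accession: str  # placeholder identifier (hypothetical here)
    name: str       # free-text label


def parse_line(line: str) -> Node:
    """Parse one '<symbol> <accession> <name>' line (assumed layout)."""
    symbol, accession, name = line.strip().split(maxsplit=2)
    if symbol not in SYMBOLS:
        raise ValueError(f"unknown symbol: {symbol!r}")
    return Node(SYMBOLS[symbol], accession, name)


# Hypothetical example lines; these are NOT real PRO accessions.
example = [
    "$ PRO:EX0 protein",
    "> PRO:EX1 Smad2",
    "< PRO:EX2 Smad2 sequence variant",
]
nodes = [parse_line(line) for line in example]
```

Keeping the ambiguous `>` as a single label mirrors the caption; resolving it to has_part or derives_from would require knowing whether the node is a domain or a protein, which this sketch does not model.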