Daniel Lopez, Florencio Pazos.
Abstract
Predicting the function of newly sequenced proteins is crucial given the pace at which raw sequences are being obtained. Almost all resources for predicting protein function assign functional terms to whole chains and do not distinguish which particular domain is responsible for the assigned function. This is not a limitation of the methodologies themselves; rather, the databases of functional annotations that these methods use to transfer functional terms to new proteins are annotated on a whole-chain basis. Nevertheless, domains are the basic evolutionary, and often functional, units of proteins. In many cases, the domains of a protein chain have distinct molecular functions, independent of each other. For that reason, resources with functional annotations at the domain level are required, as well as methodologies for predicting the function of individual domains adapted to these resources. We present a methodology for predicting the molecular function of individual domains, based on a previously developed database of functional annotations at the domain level. The approach, which we show outperforms a standard method based on sequence searches in assigning function, concomitantly predicts the structural fold of the domains and can give hints about the functionally important residues associated with the predicted function.
Year: 2013 PMID: 23514233 PMCID: PMC3584904 DOI: 10.1186/1471-2105-14-S3-S12
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Schema of the method. The starting point is the SCOP2GO functional annotation of chains at the domain level (top left). The shapes represent the folds of the domains, while the colors represent the assigned GO:MF functions. Structural alignments are generated for the domains with the same fold and the same function, and PSSM profiles are derived from them. For benchmarking (left), an equivalent database is constructed by merging all the domain sequences involved in these profiles. To assess the performance of both resources for a given test sequence, new versions of both databases are built by excluding this sequence and all its homologs (transparent cylinders in the figure). Querying the test sequence against both resources produces a list of hits, which can be interpreted as predictions of the folds and functions (colored shapes) associated with its domains. The predictions from both resources for the domains of the test sequence can then be compared with its original SCOP2GO annotations (multi-colored triangle and circle). For predicting (right), a sequence whose domain folds and functions are unknown is queried against the database of PSSMs. The hits can be interpreted as predictions of fold and function at the domain level. Additionally, the conservation pattern of the structural alignments associated with the matched PSSMs can give clues about functionally important residues.
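The step in which a profile is derived from an alignment of domains sharing fold and function can be illustrated with a minimal sketch. The caption does not say how the PSSMs are computed, so everything below is an assumption made for illustration: a uniform background distribution, a flat pseudocount, gaps ignored, and the hypothetical function names `build_pssm` and `score_sequence`.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores per alignment column.

    Assumptions (not from the source): uniform background frequencies,
    a flat pseudocount per residue type, and gap characters ignored.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Count only standard residues; '-' and other symbols are skipped.
        counts = Counter(row[col] for row in alignment if row[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score_sequence(pssm, sequence):
    """Ungapped score of a sequence with the same length as the profile."""
    if len(sequence) != len(pssm):
        raise ValueError("sequence and profile lengths differ")
    # Unknown symbols contribute a neutral score of 0.
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, sequence))
```

In this toy form, a query that matches the conserved columns scores higher than an unrelated one; a real search against a profile database would also handle gaps, local alignment, and score statistics, none of which are modeled here.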
Figure 2. Large-scale evaluation results. ROC plots illustrating the capacity of the highest-scoring hits returned by both methods to identify the right function (a and b) and fold (c) in the correct domain. a) Only the GO terms at distance 2 or higher from the root of the GO:MF graph are evaluated; b) the same for distance 4 or higher (more specific terms).
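As a reminder of how a ROC plot of this kind is obtained from a ranked list of hits, the following sketch sweeps a score threshold over hits labeled as correct or incorrect and integrates the resulting curve. The input format (pairs of score and correctness flag) and the function names `roc_points` and `auc` are assumptions for illustration, not the evaluation code behind the figure.

```python
def roc_points(scored_hits):
    """Return (false positive rate, true positive rate) points.

    scored_hits: iterable of (score, is_correct) pairs, where a higher
    score means a more confident prediction (assumed convention).
    """
    hits = sorted(scored_hits, key=lambda h: h[0], reverse=True)
    positives = sum(1 for _, ok in hits if ok)
    negatives = len(hits) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one correct and one incorrect hit")

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(hits):
        threshold = hits[i][0]
        # Consume all hits tied at this score before emitting a point.
        while i < len(hits) and hits[i][0] == threshold:
            if hits[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Restricting the evaluation to GO terms at a minimum distance from the root, as in panels a and b, would amount to filtering the hits before calling `roc_points`; that filter is not shown here.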
Figure 3. Example of predicted functional residues. Residues predicted by the method as associated with the GO:0004672 ("protein kinase activity") function for casein kinase 1 ([PDB:2csn]A), mapped onto the structure of this protein. Red and purple: predicted residues. Blue and purple: catalytic residues annotated in CSA. The prosthetic groups are shown in grey spacefill. Figure generated with PyMOL (http://www.pymol.org).
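The color scheme in this caption is a set comparison: purple marks residues that are both predicted and annotated as catalytic, red marks predicted-only residues, and blue marks annotated-only residues. A small sketch of that logic follows; the function name `color_classes` is hypothetical, and the residue numbers in the usage example are made-up placeholders, not the actual positions in 2csn.

```python
def color_classes(predicted, catalytic):
    """Split residue positions into the three color classes of the caption.

    predicted: residue numbers predicted by the method.
    catalytic: residue numbers annotated as catalytic (e.g. in CSA).
    """
    predicted = set(predicted)
    catalytic = set(catalytic)
    return {
        "purple": sorted(predicted & catalytic),  # predicted and annotated
        "red": sorted(predicted - catalytic),     # predicted only
        "blue": sorted(catalytic - predicted),    # annotated only
    }


# Placeholder residue numbers for illustration only (not real 2csn data).
example = color_classes(predicted=[10, 20, 30], catalytic=[20, 30, 40])
```

With these placeholders, positions 20 and 30 fall in the purple class, 10 in red, and 40 in blue.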