Zhi-Ping Liu, Ling-Yun Wu, Yong Wang, Xiang-Sun Zhang, Luonan Chen.
Abstract
One of the major goals of molecular and evolutionary biology is to understand the functions of proteins by extracting functional information from protein sequences, structures and interactions. In this review, we summarize the repertoire of methods currently applied and report recent progress in in silico annotation of protein function, driven by the accumulation of vast amounts of sequence and structure data. In particular, we emphasize newly developed structure-based methods, which identify local structural motifs and reveal their relationship with protein functions. These include computational tools that identify structural motifs and establish the strong relationship between such pre-computed local structures and protein functions. We also discuss remaining problems and possible directions for this exciting and challenging area.
Year: 2008 PMID: 18421562 PMCID: PMC7088341 DOI: 10.1007/s00726-008-0088-8
Source DB: PubMed Journal: Amino Acids ISSN: 0939-4451 Impact factor: 3.520
Classification schemes for defining protein function
| Scheme | URL | Description |
|---|---|---|
| EC |  | Functional catalogue for enzymes, organized in four hierarchical levels. For example, EC 1.1.1.163 denotes cyclopentanol dehydrogenase |
| MIPS |  | Functional categories originally defined for yeast and extensible to other organisms. For example, 01.01.06.06.01.01 denotes the diaminopimelic acid pathway |
| GO |  | Systematic, species-independent classification of gene products, comprising three largely independent ontologies (molecular function, biological process and cellular component). For example, GO:0051635 denotes bacterial cell surface binding (F, molecular function) |
| KEGG |  | Links genomes to biological systems and environments by mapping molecular interactions and reactions |
MIPS, Munich Information Center for Protein Sequences; EC, Enzyme Commission; KEGG, Kyoto Encyclopedia of Genes and Genomes; GO, Gene Ontology
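Because the EC scheme is a four-level hierarchy, the agreement between two enzyme annotations can be measured as the number of leading levels they share. The following is a minimal sketch, assuming EC numbers arrive as plain strings such as `EC 1.1.1.163`; `parse_ec` and `shared_ec_depth` are hypothetical helpers, not part of any EC or ENZYME database API.

```python
from __future__ import annotations


def parse_ec(ec: str) -> tuple[str, ...]:
    """Split an EC number such as 'EC 1.1.1.163' into its four levels."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    levels = tuple(text.split("."))
    if len(levels) != 4:
        raise ValueError(f"expected four dot-separated levels, got {ec!r}")
    return levels


def shared_ec_depth(a: str, b: str) -> int:
    """Number of leading EC levels (0-4) two annotations have in common.

    An unspecified level written as '-' never counts as a match.
    """
    depth = 0
    for x, y in zip(parse_ec(a), parse_ec(b)):
        if x != y or x == "-":
            break
        depth += 1
    return depth


print(shared_ec_depth("EC 1.1.1.163", "EC 1.1.1.1"))
```

Two oxidoreductases acting on CH-OH groups with NAD(P)+ as acceptor share three levels; the same idea extends to the dotted MIPS codes, which are also hierarchical.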
Fig. 1 Framework of existing function annotation methods. The dotted line links the individual methods with the interaction methods. The safe zone corresponds to pairwise sequence identity above 40%, the twilight zone to about 20–30% and the midnight zone to below 20%
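The identity zones in Fig. 1 can be made concrete with a small sketch that computes percent identity from a pairwise alignment and maps it to a zone. It assumes identity is counted over aligned positions where neither sequence has a gap (other definitions exist), and it uses the caption's thresholds literally, so 30–40% falls outside every named zone. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def identity_zone(identity_pct: float) -> str:
    """Map percent identity to the zones named in the Fig. 1 caption."""
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity must be a percentage in [0, 100]")
    if identity_pct > 40.0:
        return "safe"
    if 20.0 <= identity_pct <= 30.0:
        return "twilight"
    if identity_pct < 20.0:
        return "midnight"
    return "unassigned"  # 30-40% lies between the ranges given in the caption


print(identity_zone(percent_identity("ACDE-G", "ACDFHG")))
```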
Categories of protein function closely related to local structures
| Function | Descriptor |
|---|---|
| Protein binding | The protein–protein interfaces where the physical interactions take place |
| Ligand binding | Includes nucleotide binding (e.g. DNA and RNA binding), lipid binding (e.g. cholesterol, glycerol, gangliosides) and carbohydrate binding (e.g. glucose, fructose, lactose, maltose, disaccharides, trisaccharides) |
| Metal binding | Functions of binding metals, such as zinc, magnesium and calcium |
| Catalytic site | Functional regions performing the catalytic functions |
| Miscellaneous sites | Active sites involving particular functions |
Databases of local structures identified from sequence information
| Database | URL | Description |
|---|---|---|
| PROSITE |  | A database of protein families and domains |
| PRINTS |  | A compendium of protein fingerprints |
| Pfam |  | A database of common protein domains and families built with hidden Markov models (HMMs) |
| ProDom |  | A database of protein domain families |
| SMART |  | Simple Modular Architecture Research Tool |
| SUPERFAMILY |  | A database of structural and functional protein annotations |
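Sequence motif databases such as PROSITE describe many sites as short patterns. Below is a minimal sketch that translates the common subset of PROSITE pattern syntax into a Python regular expression and reports every (possibly overlapping) match. It assumes the standard syntax (`x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` for repeats, `<` and `>` for termini) and deliberately rejects rarer forms such as `[G>]`. It is not the ScanProsite tool, and the helper names are hypothetical.

```python
import re

# One pattern element: x, a residue, [allowed], or {excluded}, plus an optional repeat.
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}.' into a regex string."""
    p = pattern.strip().rstrip(".")
    prefix = "^" if p.startswith("<") else ""  # '<' anchors at the N-terminus
    suffix = "$" if p.endswith(">") else ""    # '>' anchors at the C-terminus
    p = p.lstrip("<").rstrip(">")
    pieces = []
    for element in p.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            rx = "."
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"
        else:
            rx = core  # a single residue or a [..] class is already valid regex
        if lo is not None:
            rx += "{" + lo + (f",{hi}" if hi is not None else "") + "}"
        pieces.append(rx)
    return prefix + "".join(pieces) + suffix


def find_sites(pattern: str, sequence: str) -> list:
    """Return (0-based start, matched text) for every overlapping match."""
    rx = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in rx.finditer(sequence)]


print(find_sites("N-{P}-[ST]-{P}", "MNGTAANPSK"))
```

The lookahead wrapper is used because PROSITE matches may overlap, whereas a plain `re.finditer` resumes scanning after the end of each match.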
Methods and databases for identifying local structures from structure information
| Method | URL | Description |
|---|---|---|
| CASTp |  | A database for identifying pockets and voids in proteins |
| pvSOAR |  | A web server for detecting similar surface pockets among those in CASTp |
| SURFNET |  | An algorithm for generating protein surfaces |
| SURFACE |  | A database of protein surface patches |
| eF-Site |  | A database of molecular surfaces of protein functional sites |
| LigSite | Unavailable | A fast algorithm for identifying ligand-binding sites |
| CSA |  | A database documenting enzyme catalytic residues |
| PINTS |  | Finds local similarities between protein structures |
| SiteBase |  | A database of known ligand-binding sites |
| PDBSiteScan |  | Finds the best superposition of sites from PDBSite onto a query structure |
| SPASM |  | Compares user-defined motifs against a structure database |
| RIGOR |  | Searches a structure against a motif database for matches (the reverse of SPASM, hence the name) |
| SuMo |  | A graph-based algorithm for finding similarities between substructures |
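Several tools in this table (SPASM, RIGOR, PDBSiteScan) rank candidate sites by how well a small set of atoms superposes onto a motif. A common measure of fit is the RMSD after optimal rigid superposition, computed here with the Kabsch (SVD) method. This sketch assumes the atom correspondence between motif and site is already known, which real search tools must establish themselves. It is not the scoring code of any of the listed programs.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # -1 would mean a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # proper rotation taking p onto q
    diff = p_c @ r.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


rng = np.random.default_rng(0)
motif = rng.normal(size=(6, 3))
theta = np.pi / 6
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
site = motif @ rz.T + np.array([5.0, -2.0, 1.0])  # rotated and translated copy
print(kabsch_rmsd(motif, site))
```

The determinant correction keeps the transformation a proper rotation, so a mirror-image arrangement of residues is not mistaken for a match.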
Fig. 2 Bridges between local structures and functions. a Schematic categories of the bridges; b detailed hierarchical classification of these bridges. In the lowest classes, the boundary color indicates the schematic category to which each class belongs
Element-based methods for identifying functional motifs
| Local structure | Method | Software | Reference |
|---|---|---|---|
| Sequence motif | | | |
| Binding sites | Multiple sequence alignment | – | Ma et al. |
| Catalytic sites | Multiple sequence alignment | Conservation | Capra and Singh |
| Structural motif | | | |
| Functional active sites | Surface comparison | – | Rosen et al. |
| Recurring 3D motifs | Geometric hashing for structure alignment | – | Fischer et al. |
| Protein–protein interfaces | Comparison and querying | BID | Fischer et al. |
| Functional sites | All-vs-all comparison (from FSSP) | Phunctioner | Pazos and Sternberg |
| Constructed surface cavities | Pairwise alignment and querying | pvSOAR | Binkowski et al. |
| Geometric and electrostatic surfaces | Pairwise alignment and querying | eF-Site | Kinoshita and Nakamura |
| Surface chemical groups | Querying for similarity | SuMo | Jambon et al. |
| Binding pockets | All-vs-all alignment and clustering | CavBase | Schmitt et al. |
| Binding sites and interfaces | Comparison for similarity | I2I-SiteEngine | Shulman-Peleg et al. |
| Documented motif | | | |
| Annotated sites | All-vs-all alignment and querying | PINTS | Stark and Russell |
| Ligand-binding sites | All-vs-all alignment and querying | SiteBase | Gold and Jackson |
| Known sites, especially interfaces | Querying for similarity | PDBSiteScan | Ivanisenko et al. |
| Sequence mapped to spatial motif | | | |
| Functional residues and sites | Evolutionary trace (ET): multiple sequence alignment and phylogenetic analysis | ET | Yao et al. |
| Functional residue clusters | Based on ET | – | Landgraf et al. |
| Patches of conserved residues | Based on ET | ConSurf | Armon et al. |
| Functional sites | Based on ET | – | Aloy et al. |
| Function template | | | |
| Functional 3D templates | Matching by geometric hashing | TESS | Wallace et al. |
| Metal-binding sites | Comparison with templates | PAR-3D | Goyal and Mande |
| Annotated functional sites | Comparison with templates | FIC | Chakrabarti and Lanczycki |
| Tertiary side-chain patterns | Subgraph-isomorphism matching | ASSAM | Artymiuk et al. |
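Many element-based methods above, including the ET-based ones and the catalytic-site work of Capra and Singh, rely on scoring residue conservation in a multiple sequence alignment. As a hedged illustration only, this sketch uses one minus the normalized Shannon entropy of each alignment column and down-weights gappy columns by their non-gap fraction (an assumption). It is not the scoring used by ConSurf, by ET or by Capra and Singh.

```python
from __future__ import annotations

import math
from collections import Counter


def column_conservation(column: str, gap: str = "-") -> float:
    """1 - normalized Shannon entropy of one column, scaled by its non-gap fraction."""
    residues = [c for c in column.upper() if c != gap]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log(k / n) for k in Counter(residues).values())
    score = 1.0 - entropy / math.log(20)  # log(20): maximum entropy over 20 amino acids
    return score * n / len(column)  # gap weighting is an illustrative assumption


def conservation_profile(alignment: list[str]) -> list[float]:
    """Conservation score for every column of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_conservation("".join(seq[i] for seq in alignment))
            for i in range(length)]


print(conservation_profile(["ACDK", "ACEK", "ACFR", "AC-K"]))
```

Ranking positions by this profile gives candidate functional residues; structure-aware methods such as ConSurf then examine where the top-scoring residues cluster on the 3D surface.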
Feature-based methods for identifying functional motifs
| Local structure | Feature | Software | Reference |
|---|---|---|---|
| Scoring of individual features: physical features such as shape, size, depth and geometry | | | |
| DNA-binding sites | Interfacial geometry | IAlign | Siggers et al. |
| Binding pockets | Size and depth | PHECOM | Kawabata and Go |
| Binding pockets | Shape | – | Morris et al. |
| Binding pockets | Geometric complementarity | – | Kahraman et al. |
| Chemical features such as energy, potential and conservation | | | |
| Functionally important residues | Electrostatic energy and conservation | – | Elcock |
| Protein–ligand binding sites | Physicochemical energy | Q-SiteFinder | Laurie and Jackson |
| Protein–DNA binding sites | Five characteristics of patches | Web server | Jones et al. |
| Protein–RNA binding sites | Features used for DNA-binding sites, plus van der Waals interactions | Web server | Jones et al. |
| Protein–DNA binding sites | Hydrogen bonds and van der Waals interactions | Web server | Luscombe et al. |
| Protein interfaces | Energy score, propensity, conservation | PINUP | Liang et al. |
| Functional sites | Sequence, Rosetta free energy | Web server | Cheng et al. |
| Functional residues | Conservation score | – | Panchenko et al. |
| Functional sites | Functional groups | CFG | Innis et al. |
| Combined features (combinations of the above) | | | |
| Ligand-binding sites | Geometry and conservation score | LIGSITEcsc | Huang and Schroeder |
| Protein–DNA binding sites | Shape and electrostatic potential | – | Tsuchiya et al. |
| Carbohydrate-binding sites | Six parameters | – | Taroni et al. |
| Protein–protein interfaces | Structural and physicochemical properties | ProMate | Neuvirth et al. |
| Docking pockets | Geometry and energy | – | Li et al. |
| Protein–protein interfaces | Five parameters | – | Hoskins et al. |
| Ligand-binding pockets | Cleft volume and residue conservation | SURFNET-ConSurf | Glaser et al. |
| Learned features: SVM | | | |
| Protein–protein interfaces | Sequence profile, amino acid composition | – | Koike and Takagi |
| Protein–protein interfaces | Evolutionary conservation signal | – | Bordner and Abagyan |
| Protein–DNA binding sites | Composition, charge, positive potential patches | Web server | Bhardwaj et al. |
| Binding sites | Sequence and structural complementarity | – | Chung et al. |
| Learned features: neural network | | | |
| Protein–protein interfaces | Composition | – | Ofran and Rost |
| Protein–protein interfaces | Conservation and residue structural properties | PPISP | Zhou and Shan |
| Catalytic residues | Conservation, accessible surface area (ASA), structure, depth | – | Gutteridge et al. |
| Protein–protein interaction sites | Conservation and disposition | ISPRED | Fariselli et al. |
| Nucleic-acid-binding sites | Ensemble of sequence and structure features | – | Stawiski et al. |
| DNA-binding sites | Sequence profiles and solvent accessibility | DISPLAR | Tjong and Zhou |
| DNA-binding sites | Structure, ASA and electrostatic potential | DbHTH | Ferrer-Costa et al. |
| Metal-binding site residues | Sequence and structure data | MetSite | Sodhi et al. |
| Binding sites | Physical and chemical property lists | – | Keil et al. |
| DNA-binding sites | Evolutionary conservation | DP-BIND | Kuznetsov et al. |
| Metal-binding sites | Evolutionary profiles | – | Passerini et al. |
| Statistical description of features | | | |
| Functional sites | Calculated feature vectors | FEATURE | Liang et al. |
| Protein–protein binding sites | Six parameters | PPI-Pred | Bradford et al. |
| Protein–protein interfaces | Amino acid clusters | – | Yan et al. |
| Protein–DNA binding sites | Residues and sequence entropy | – | Yan et al. |
| Protein–protein interaction sites | Motifs and coexpression | InSite | Wang et al. |
| DNA-binding sites | Geometric measures | – | McLaughlin and Berman |
| Drug-binding sites | 408 attributes in 8 broad categories | SCREEN | Nayal and Honig |
| Metal-binding sites | Geometric features | CHED | Babor et al. |
| Zinc-binding sites | A physicochemical feature set | Web server | Ebert and Altman |
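A recurring feature in this table is residue propensity, i.e. how over-represented an amino acid type is at known sites relative to the protein surface (it is listed for PINUP, for example). The following is a minimal sketch of a log-ratio propensity with pseudocounts, assuming interface and surface residues have already been extracted from a training set as one-letter strings. The helper names, the pseudocount of 1 and the mean-over-patch score are illustrative assumptions, not PINUP's scoring function.

```python
from __future__ import annotations

import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_propensities(interface: str, surface: str,
                         pseudocount: float = 1.0) -> dict[str, float]:
    """ln(frequency at interface / frequency on surface) for each residue type."""
    ic, sc = Counter(interface), Counter(surface)
    n_i = len(interface) + pseudocount * len(AMINO_ACIDS)
    n_s = len(surface) + pseudocount * len(AMINO_ACIDS)
    return {
        aa: math.log(((ic[aa] + pseudocount) / n_i) / ((sc[aa] + pseudocount) / n_s))
        for aa in AMINO_ACIDS
    }


def patch_score(patch: str, propensity: dict[str, float]) -> float:
    """Mean propensity of the residues in a candidate surface patch."""
    scores = [propensity[aa] for aa in patch if aa in propensity]
    return sum(scores) / len(scores) if scores else 0.0


# Toy, made-up training strings: aromatic residues enriched at the interface.
prop = residue_propensities("WWYYLL", "KKEEDDWYL")
print(patch_score("WY", prop), patch_score("KE", prop))
```

Positive values mark residue types enriched at interfaces; in practice such a score would be combined with the other features in the table (conservation, energy, geometry) rather than used alone.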
Network-based methods for identifying functional motifs
| Local structure | Method | Software | Reference |
|---|---|---|---|
| Micro level: mining special residues or subgraphs in structure graphs | | | |
| Active-site residues | High closeness values in residue interaction graphs | RIG | Amitai et al. |
| Functional residues | Residues with special topology in small-world networks | – | del Sol et al. |
| Recurring side-chain patterns | Searching for similar subgraphs | DRESPAT | Wangikar et al. |
| Structure motifs | Mining cliques of the structure graph | CliqueHashing | Huan et al. |
| Macro level: groups of similar local structures | | | |
| Functional pockets | Similar pocket groups | PSN | Liu et al. |
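The micro-level idea in the first row, that active-site residues tend to have high closeness in a residue interaction graph, can be sketched as follows. All of these choices are hypothetical: one representative coordinate per residue (for example Cα), an edge whenever two residues lie within a 7 Å cutoff, and closeness computed within each connected component by breadth-first search. This is not the RIG implementation of Amitai et al., whose contact definition may differ.

```python
from __future__ import annotations

import math
from collections import deque


def contact_graph(coords: list[tuple[float, float, float]],
                  cutoff: float = 7.0) -> list[list[int]]:
    """Adjacency lists: residues i and j are linked if within `cutoff` angstroms."""
    n = len(coords)
    adj: list[list[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def closeness(adj: list[list[int]]) -> list[float]:
    """Closeness of each node: reachable nodes / sum of shortest-path lengths."""
    scores = []
    for src in range(len(adj)):
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search gives hop distances
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        reach = len(dist) - 1
        scores.append(reach / total if total > 0 else 0.0)
    return scores


# Toy chain of five residues spaced 5 angstroms apart: the middle one is most central.
chain = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(closeness(contact_graph(chain)))
```

In a real protein, the residues with the highest closeness would be the candidates for active-site positions; the cutoff and coordinate choice strongly affect the result and would need calibration.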