Hamed Bostan, Naomie Salim, Zeti Azura Hussein, Peter Klappa, Mohd Shahir Shamsir.
Abstract
Computational approaches to predicting the disulphide bonding state of cysteines and their connectivity pattern rely on various descriptors. One such descriptor is the amino acid sequence motif flanking the cysteine residue. Although disulphide bonding information exists in many databases and applications, no complete reference or motif query resource is currently available. The Cysteine Motif Database (CMD) is the first online resource that stores all cysteine residues and their flanking motifs, together with their secondary structure and propensity values derived from laboratory data. We extracted more than 3 million cysteine motifs from PDB and UniProt data and annotated them with secondary structure assignment, propensity values, frequency of occurrence, and the coefficient of their bonding status. Removing redundancies yielded 15875 unique flanking motifs that are always bonded and 41577 unique patterns that are always nonbonded. Queries can be made by protein ID, FASTA sequence, sequence motif, or secondary structure, either individually or in batch format; the provided APIs allow remote users to query the database from third-party software and for high-throughput screening. The CMD offers extensive information about bonded and free cysteine residues and their motifs, allowing in-depth characterization of sequence motif composition.
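The always-bonded and always-nonbonded counts come from grouping identical flanking motifs and checking whether every occurrence shares the same bonding state. The sketch below is a minimal illustration, not CMD's actual pipeline. It assumes each observation is a hypothetical `(motif, is_bonded)` pair, and it takes the bonding coefficient to be the bonded fraction (bonded occurrences divided by total occurrences). The function names and toy motifs are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def summarize_motifs(
    observations: Iterable[Tuple[str, bool]],
) -> Dict[str, Dict[str, float]]:
    """Group hypothetical (motif, is_bonded) observations by motif.

    Per motif, returns the total count, the bonded count, and the bonded
    fraction, which is ASSUMED here to be the 'bonding coefficient'.
    """
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # [total, bonded]
    for motif, is_bonded in observations:
        counts[motif][0] += 1
        if is_bonded:
            counts[motif][1] += 1
    return {
        motif: {"total": total, "bond": bonded, "coefficient": bonded / total}
        for motif, (total, bonded) in counts.items()
    }


def split_by_bonding_state(
    summary: Dict[str, Dict[str, float]],
) -> Tuple[List[str], List[str], List[str]]:
    """Partition motifs into always bonded, always nonbonded, and mixed."""
    always_bonded = sorted(m for m, s in summary.items() if s["bond"] == s["total"])
    always_nonbonded = sorted(m for m, s in summary.items() if s["bond"] == 0)
    mixed = sorted(m for m, s in summary.items() if 0 < s["bond"] < s["total"])
    return always_bonded, always_nonbonded, mixed


if __name__ == "__main__":
    # Toy, hypothetical observations (not CMD data).
    obs = [("AGCKL", True), ("AGCKL", True), ("PECFV", False),
           ("TWCHG", True), ("TWCHG", False)]
    print(split_by_bonding_state(summarize_motifs(obs)))
    # prints (['AGCKL'], ['PECFV'], ['TWCHG'])
```

Motifs that land in the mixed group carry the most ambiguity. This is why a per-motif coefficient, rather than a single label, is useful for prediction.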
Year: 2012 PMID: 23091487 PMCID: PMC3474208 DOI: 10.1155/2012/849830
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
|  | PDB | PDB (NH) | UniProt | UniProt (NH) |
|---|---|---|---|---|
| Proteins | 73656 | 33874 | 531462 | 140723 |
| Patterns | 535544 | 230213 | 2509611 | 966374 |
| Bonded motifs | 148505 | 64246 | 189238 | 113365 |
| Nonbonded motifs | 387039 | 165967 | 2320373 | 853009 |
| Intrachain | 84591 | 36473 | — | — |
| Interchain | 4013 | 1900 | — | — |
NH: nonhomologous unique sequences remaining after removal of sequences with 100% similarity.
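Removing sequences at 100% similarity amounts to keeping one representative of each set of identical sequences. The sketch below is minimal and rests on several assumptions:

- Records arrive as hypothetical `(protein_id, sequence)` pairs.
- "Identical" means an exact, case-insensitive string match.
- The first record seen is kept as the representative.

CMD's actual procedure may differ, for example in how it chooses the representative.

```python
from typing import Iterable, List, Set, Tuple


def remove_identical(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first record for each distinct sequence (100% identity).

    Comparison is case-insensitive; the original record is returned unchanged.
    """
    seen: Set[str] = set()
    unique: List[Tuple[str, str]] = []
    for protein_id, sequence in records:
        key = sequence.upper()  # normalise case before comparing
        if key not in seen:
            seen.add(key)
            unique.append((protein_id, sequence))
    return unique


if __name__ == "__main__":
    # Hypothetical IDs and sequences, for illustration only.
    toy = [("P1", "ACDC"), ("P2", "acdc"), ("P3", "ACDE")]
    print(remove_identical(toy))  # prints [('P1', 'ACDC'), ('P3', 'ACDE')]
```

Exact-match deduplication is the strictest form of redundancy removal. Lower identity thresholds would require pairwise alignment or clustering, which this sketch does not attempt.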
Figure 1. Annotated diagram of the search options in the “Search By ID” section. (A) Users can choose either PDB or SwissProt. (B) Users can enter one or more protein IDs, separated by commas, as keywords. (C) Users can choose which results appear in the output.
Figure 2. Annotated diagram of the “Search By FASTA Sequence” section, showing all search options and filtering criteria. (A) Users can choose either PDB or SwissProt. (B) Users can enter one or more FASTA sequences; every motif within each sequence is investigated. (C) Users can also upload a FASTA-format file for investigation. (D) Users can choose the number of amino acid residues on each side of the cysteine used to extract motifs from the FASTA sequence. (E) Users can filter the proteins in which the motif will be investigated: they can specify whether the protein was engineered or mutated and whether it contains any DNA or RNA link. They can also filter out similar proteins, keeping only one copy of identical proteins, for advanced investigations.
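Option (D) sets how many residues on each side of a cysteine make up its flanking motif. The sketch below parses FASTA text and extracts one motif per cysteine. It is a minimal illustration, not CMD's API, and it rests on these assumptions:

- The motif is the window of up to `k` residues before and after the cysteine.
- Windows near either terminus are truncated rather than padded.
- Positions are reported 1-based.

The function names `parse_fasta` and `extract_cysteine_motifs` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}; header is the text after '>'."""
    sequences: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                sequences[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if header is not None:
        sequences[header] = "".join(chunks)
    return sequences


def extract_cysteine_motifs(sequence: str, k: int) -> List[Tuple[int, str]]:
    """Return (1-based position, motif) for every cysteine in the sequence.

    The motif spans up to k residues on each side of the cysteine; windows
    near either terminus are truncated, not padded (an assumption).
    """
    motifs: List[Tuple[int, str]] = []
    for i, residue in enumerate(sequence):
        if residue == "C":
            start = max(0, i - k)
            end = min(len(sequence), i + k + 1)
            motifs.append((i + 1, sequence[start:end]))
    return motifs


if __name__ == "__main__":
    # Toy FASTA record split over two lines (hypothetical data).
    seqs = parse_fasta(">toy_1\nMKACDE\nFCG\n")
    print(extract_cysteine_motifs(seqs["toy_1"], k=2))
    # prints [(4, 'KACDE'), (8, 'EFCG')]
```

Padding short windows with a placeholder character (for example `X`) instead of truncating would give every motif the same length `2k + 1`, which some downstream encoders expect.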
Figure 3. Annotated diagram of the results returned by the “Search By Molecule Name” section. (A) The motifs, secondary structure, cysteine position in the sequence, and chain name. (B) The propensity values of the motif sequence. (C) The navigation pane, which gives access to protein IDs sharing common or similar features. (D) A detailed listing of the pair patterns in the protein. (E) The bonding summary for the selected protein.
Figure 4. Query for full-length human protein disulphide isomerase (PDII, P07237). (A) Screenshot of the query parameters for CFMD.
(B) Edited output from (A). Bold rows indicate the second active-site cysteine residue of the respective thioredoxin motif. Column 1 (Thioredoxin motif) was added for clarity. The cysteine residue in italics is the queried cysteine, whose position is given in the second column.
| Thioredoxin motif | Position | Motif | Total | Bond | Coefficient |
|---|---|---|---|---|---|
| 1 | 52 | APW | 12 | 5 | 0.417 |
| 2 | 396 | APW | 12 | 5 | 0.417 |
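The Coefficient column is consistent with the fraction of a motif's occurrences that are bonded, that is, Bond divided by Total. This reading is inferred from the values shown, not stated explicitly in the text. For both rows:

$$
\text{Coefficient} = \frac{\text{Bond}}{\text{Total}} = \frac{5}{12} \approx 0.417
$$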