Abstract
BACKGROUND: Computational sequence analysis, that is, prediction of local sequence properties, homologs, spatial structure and function from the sequence of a protein, offers an efficient way to obtain needed information about proteins under study. Since reliable prediction is usually based on the consensus of many computer programs, meta-servers have been developed to fit such needs. Most meta-servers focus on one aspect of sequence analysis, while others incorporate more information, such as PredictProtein for local sequence feature predictions, SMART for domain architecture and sequence motif annotation, and GeneSilico for secondary and spatial structure prediction. However, as predictions of local sequence properties, three-dimensional structure and function are usually intertwined, it is beneficial to address them together.
Year: 2012 PMID: 23031578 PMCID: PMC3519821 DOI: 10.1186/1741-7007-10-82
Source DB: PubMed Journal: BMC Biol ISSN: 1741-7007 Impact factor: 7.431
Programs used in MESSA for prediction of local sequence features and their interpretation
| Feature | Meaning | Programs used | Output |
|---|---|---|---|
| | Assist three-dimensional structure and domain boundary prediction. | PSIPRED (v2.0) [ | PSIPRED and SSPRO predict three-state secondary structure (H: α-helix, E: β-strand, C: coil); DISEMBL predicts coils (lower-case letters highlighted in pink) |
| | Assist three-dimensional structure prediction. | DISEMBL (v1.5) [ | Loops that are likely to have high B factors in X-ray crystallography (lower-case letters highlighted in pink) |
| | | DISEMBL (v1.5) [ | Residues without a defined structure (represented by star marks and highlighted in red) |
| | Predict subcellular localization and transmembrane helices, reveal the topology of transmembrane proteins and provide hints to protein function. | TMHMM (v2.0) [ | H: transmembrane helix (colored in blue); h: not confidently predicted transmembrane helix; o: periplasmic loop; i: cytoplasmic loop; x: loop region (not specified as periplasmic or cytoplasmic) |
| | | MEMSATSVM [ | H: transmembrane helix (colored in blue); S: signal peptide (colored in green); h: not confidently predicted transmembrane helix; o: periplasmic loop; i: cytoplasmic loop |
| | | SignalP (v3.0) [ | S: signal peptide (highlighted in green); o: periplasmic region; x: no signal peptide |
| | Reveal false-positive homology search hits caused by matches to low-complexity regions. | SEG [ | Regions with low diversity in amino acid composition (highlighted in pink), likely to be disordered or to fold as α-helices, such as coiled coils |
| | Assist three-dimensional structure prediction. | COILS [ | x: coiled coils, highlighted in yellow |
| | Reveal essential residues for the folding and function of a protein. | BLAST (hits with > 40% coverage and < 90% identity are included in the profile), AL2CO (calculates conservation indices based on the profile) [ | Sequence colored by conservation (from white, through yellow, to dark red as conservation increases) |
a. TOPPRED and HMMTOP are designed mainly to predict the topology of a given membrane protein rather than to distinguish transmembrane proteins from cytoplasmic ones. They may therefore mistake buried hydrophobic helices in cytoplasmic proteins for transmembrane helices, leading to a high false-positive rate.
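The hit filter described for the conservation profile (BLAST hits with > 40% coverage and < 90% identity) can be sketched as follows. This is a minimal illustration, not MESSA's actual code: the `Hit` record, its field names, and the function `select_profile_hits` are hypothetical, and coverage and identity are assumed to be fractions between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one BLAST hit (field names are assumptions)."""
    subject_id: str
    coverage: float  # fraction of the query covered by the alignment, 0..1
    identity: float  # fraction of identical aligned positions, 0..1


def select_profile_hits(hits, min_coverage=0.40, max_identity=0.90):
    """Keep hits with coverage > min_coverage and identity < max_identity.

    The thresholds follow the table entry for the conservation profile;
    strict inequalities are used because the table states '>' and '<'.
    """
    return [
        h for h in hits
        if h.coverage > min_coverage and h.identity < max_identity
    ]


hits = [
    Hit("a", coverage=0.95, identity=0.98),  # near-identical: excluded
    Hit("b", coverage=0.80, identity=0.55),  # passes both thresholds
    Hit("c", coverage=0.30, identity=0.45),  # coverage too low: excluded
]
selected = select_profile_hits(hits)
print([h.subject_id for h in selected])
```

Excluding near-identical hits presumably keeps redundant sequences from dominating the profile, while the coverage threshold removes short local matches.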
Figure 1. (A) Local sequence predictions. (B) Function prediction. (C) Structure prediction.
Figure 2. Fractions of proteins in .
Confidence score of homologs from Swiss-Prot database.
| Evaluation method | Criteria | Points |
|---|---|---|
| | < 0.001 | 1 |
| | identity 30% to 50%, coverage > 40% | 1 |
| | identity 50% to 70%, coverage > 40% | 2 |
| | identity 70% to 90%, coverage > 40% | 3 |
| | identity 90% to 99%, coverage > 40% | 4 |
| | identity > 99%, coverage > 40% | 5 |
| | 60% to 80% | 1 |
| | 80% to 100% | 2 |
| | Best hit | 2 |
| | N/A | 1 |
| | Best hit | 2 |
| | N/A | 1 |
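The identity and coverage rows of the table above can be read as a simple banding rule. The sketch below is only an illustration under stated assumptions: the function name `identity_points` is hypothetical, each band is assumed to include its lower bound, and hits below 30% identity or at or below 40% coverage are assumed to score 0 (the table does not say). How these points combine with the other rows is not shown here.

```python
def identity_points(identity_pct: float, coverage_pct: float) -> int:
    """Map sequence identity (percent) to points, given coverage (percent).

    Bands follow the identity rows of the table; boundary handling and the
    zero score outside the listed bands are assumptions.
    """
    if coverage_pct <= 40:
        return 0  # every identity row requires coverage > 40%
    if identity_pct > 99:
        return 5
    if identity_pct >= 90:
        return 4
    if identity_pct >= 70:
        return 3
    if identity_pct >= 50:
        return 2
    if identity_pct >= 30:
        return 1
    return 0  # below the lowest listed band


for identity, coverage in [(45, 80), (60, 80), (95, 80), (99.5, 80), (95, 30)]:
    print(identity, coverage, identity_points(identity, coverage))
```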
Confidence score of predicted gene ontology terms
| Evaluation method | Criteria | Points |
|---|---|---|
| | 0.001 | 1 |
| | identity 30% to 50%, coverage > 40% | 1 |
| | identity 50% to 70%, coverage > 40% | 2 |
| | identity 70% to 90%, coverage > 40% | 3 |
| | identity 90% to 100%, coverage > 40% | 4 |
| | 60% to 80% | 1 |
| | 80% to 100% | 2 |
| | EXP, IDA | 3 |
| | IPI, IMP, IGI, IEP, ISO, TAS | 2 |
| | ISS, ISA, ISM, IGC, IBA, IBD, IKR, IRD, RCA, NAS, IC, IEA | 1 |
| | Associated with no less than three hits | 2 |
EXP: inferred from experiment; GO: Gene Ontology; IBA: inferred from biological aspect of ancestor; IBD: inferred from biological aspect of descendant; IC: inferred by curator; IDA: inferred from direct assay; IEA: inferred from electronic annotation; IEP: inferred from expression pattern; IGC: inferred from genomic context; IGI: inferred from genetic interaction; IKR: inferred from key residues; IMP: inferred from mutant phenotype; IPI: inferred from physical interaction; IRD: inferred from rapid divergence; ISA: inferred from sequence alignment; ISM: inferred from sequence model; ISO: inferred from sequence orthology; ISS: inferred from sequence or structural similarity; NAS: non-traceable author statement; RCA: inferred from reviewed computational analysis; TAS: traceable author statement.
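The evidence-code rows of the Gene Ontology table amount to a lookup from evidence code to points. A minimal sketch, assuming a hypothetical helper `evidence_points` and assuming that a code not listed in the table scores 0 (the source does not say how such codes are treated):

```python
# Points per GO evidence code, transcribed from the table above.
_TIERS = {
    3: ("EXP", "IDA"),
    2: ("IPI", "IMP", "IGI", "IEP", "ISO", "TAS"),
    1: ("ISS", "ISA", "ISM", "IGC", "IBA", "IBD",
        "IKR", "IRD", "RCA", "NAS", "IC", "IEA"),
}

# Flatten the tiers into a code -> points dictionary.
EVIDENCE_POINTS = {
    code: points for points, codes in _TIERS.items() for code in codes
}


def evidence_points(code: str) -> int:
    """Return the points for one evidence code; unlisted codes score 0 (assumption)."""
    return EVIDENCE_POINTS.get(code.strip().upper(), 0)


for code in ("IDA", "TAS", "IEA"):
    print(code, evidence_points(code))
```

The ordering places direct experimental evidence (EXP, IDA) highest and electronic or similarity-based annotation lowest.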
Confidence score of predicted Enzyme Commission numbers
| Evaluation method | Criteria | Points |
|---|---|---|
| | ≥ 6 and < 8 | 1 |
| | ≥ 8 and < 10 | 2 |
| | ≥ 10 | 3 |
| | If the EC number is assigned for at least three different Swiss-Prot hits | 1 |
| | If the EC number agrees with the prediction of EzyPred | 2 |
| | Low confidence prediction | 2 |
| | 0.6 to 0.7 | 2.5 |
| | 0.7 to 0.8 | 3 |
| | 0.8 to 0.9 | 3.5 |
| | 0.9 to 1 | 4 |
Evaluation of homology modeling templates
| Evaluation method | Criteria | Points |
|---|---|---|
| | 20% to 40% | 1 |
| | 40% to 60% | 2 |
| | 60% to 80% | 3 |
| | 80% to 90% | 4 |
| | 90% to 100% | 5 |
| | 80% to 85% | 1 |
| | 85% to 90% | 2 |
| | 90% to 99% | 3 |
| | 99% to 99.99% | 4 |
| | 99.99% to 100% | 5 |
| | 1e-2 to 1e-6 | 1 |
| | 1e-6 to 1e-18 | 2 |
| | 1e-18 to 1e-54 | 3 |
| | < 1e-54 | 4 |
| | Predicted by two methods | 1 |
| | Predicted by three methods | 2 |