Michael Bernhofer, Christian Dallago, Tim Karl, Venkata Satagopam, Michael Heinzinger, Maria Littmann, Tobias Olenyi, Jiajun Qiu, Konstantin Schütze, Guy Yachdav, Haim Ashkenazy, Nir Ben-Tal, Yana Bromberg, Tatyana Goldberg, Laszlo Kajan, Sean O'Donoghue, Chris Sander, Andrea Schafferhans, Avner Schlessinger, Gerrit Vriend, Milot Mirdita, Piotr Gawron, Wei Gu, Yohan Jarosz, Christophe Trefois, Martin Steinegger, Reinhard Schneider, Burkhard Rost.
Abstract
Since 1992, PredictProtein (https://predictprotein.org) has been a one-stop online resource for protein sequence analysis. Its main site is hosted at the Luxembourg Centre for Systems Biomedicine (LCSB) and was queried monthly by over 3,000 users in 2020. PredictProtein was the first Internet server for protein predictions, and it pioneered combining evolutionary information and machine learning. Given a protein sequence as input, the server outputs multiple sequence alignments, predictions of protein structure in 1D and 2D (secondary structure, solvent accessibility, transmembrane segments, disordered regions, protein flexibility, and disulfide bridges), and predictions of protein function (functional effects of sequence variation or point mutations, Gene Ontology (GO) terms, subcellular localization, and protein-, RNA-, and DNA-binding). PredictProtein's infrastructure has moved to the LCSB, increasing throughput; the use of MMseqs2 sequence search reduced runtime five-fold (apparently without lowering the performance of the prediction methods); user interface elements improved usability; and new prediction methods were added. PredictProtein recently added predictions from deep learning embeddings (GO and secondary structure) and a method for predicting proteins and residues that bind DNA, RNA, or other proteins. PredictProtein.org aspires to provide reliable predictions to computational and experimental biologists alike. All scripts and methods are freely available for offline execution in high-throughput settings.
Year: 2021 PMID: 33999203 PMCID: PMC8265159 DOI: 10.1093/nar/gkab354
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Predictions for SARS-CoV-2 Nucleoprotein (NCAP_SARS2). Underneath the interactive slider at the top: RePROF and ProtBertSec secondary structure (blue: helix; purple: strand; orange: other); Meta-Disorder intrinsically disordered regions (purple); ProNA2020 RNA-binding residues (low confidence: blue; medium confidence: purple). goPredSim transfers Gene Ontology (GO) terms based on embedding similarity (lower left: CCO; lower right: BPO & MFO). SNAP2 predicts the effect of point mutations on function for the RNA-binding region from I84 to D98 (bottom center; black: native residue). Link: predictprotein.org/visual_results?req_id=$1$nAmulUQY$FRPFaP8NTqLW9DzdlTG3B/.
Figure 2. Experimental and predicted RNA-binding residues for NCAP_SARS2. Predicted (via ProNA2020, in cyan, panels A and C) and observed (within 5Å, in magenta, panels B and D) RNA-binding residues for the SARS-CoV-2 nucleoprotein (gray) complexed with a 10-mer ssRNA (orange), PDB structure 7ACT (61). About 73% of the predicted residues are correct, but only a fifth of the observed binding residues are found (precision = 0.73, recall = 0.20), which is around the expected average performance reported for ProNA2020. The important central strand and loop, which are consecutive in sequence, are predicted well. Several short segments that are far apart in sequence but close in structure are missed, which is expected because ProNA2020 has no notion of 3D structure, i.e., it cannot identify 'binding sites'. Panels A and B show a different orientation than panels C and D.
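The precision and recall quoted in the Figure 2 caption are per-residue measures: precision is the fraction of predicted binding residues that are also observed, and recall is the fraction of observed binding residues that are predicted. The following is a minimal sketch of that arithmetic only. The function name `precision_recall` and the residue numbers are hypothetical, chosen for illustration; they are not the ProNA2020 predictions or the 7ACT contacts, and this is not the authors' evaluation code.

```python
def precision_recall(predicted, observed):
    """Per-residue precision and recall for two collections of residue positions.

    predicted: residue positions predicted as binding (hypothetical input)
    observed:  residue positions observed as binding, e.g. within a distance
               cutoff of the ligand in an experimental structure
    """
    predicted, observed = set(predicted), set(observed)
    true_positives = len(predicted & observed)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(observed) if observed else 0.0
    return precision, recall


# Toy, made-up residue positions (NOT taken from the paper).
toy_predicted = {10, 11, 12, 13}
toy_observed = {11, 12, 13, 40, 41, 42, 43, 44, 45, 46}

p, r = precision_recall(toy_predicted, toy_observed)
print(f"precision={p:.2f} recall={r:.2f}")  # prints "precision=0.75 recall=0.30"
```

As in the caption, a predictor can reach high precision with low recall when it marks a small, mostly correct set of residues and misses contacts that are distant in sequence.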