BACKGROUND: Protein structure data in the Protein Data Bank (PDB) are widely used in studies of protein function and evolution and in protein structure prediction. However, two main barriers hinder large-scale use of PDB data: 1) PDB data are highly redundant in terms of sequence and structure similarity; and 2) many PDB files contain inconsistent data and standards as well as missing residues, which often makes automated retrieval and analysis difficult. DESCRIPTION: To address these issues, we have created MUFOLD-DB (http://mufold.org/mufolddb.php), a web-based database that collects and processes the weekly PDB files, thereby providing users with non-redundant, cleaned, and partially predicted structure data. For each non-redundant sequence, we annotate the SCOP domain classification and predict the structures of missing regions by loop modelling. In addition, evolutionary information, secondary structure, disordered regions, and the processed three-dimensional structure are computed and visualized to help users better understand the protein. CONCLUSIONS: MUFOLD-DB integrates processed PDB sequence and structure data with multiple computational results, provides a friendly interface for users to retrieve, browse, and download these data, and offers several functions that facilitate working with the data.
Authors: Mary Qu Yang; Kenji Yoshigoe; William Yang; Weida Tong; Xiang Qin; A Dunker; Zhongxue Chen; Hamid R Arbania; Jun S Liu; Andrzej Niemierko; Jack Y Yang Journal: BMC Genomics Date: 2014-12-16 Impact factor: 3.969
Authors: Gareth S A Wright; Tatiana F Watanabe; Kangsa Amporndanai; Steven S Plotkin; Neil R Cashman; Svetlana V Antonyuk; S Samar Hasnain Journal: iScience Date: 2020-05-15