Bahareh Behkamal (1), Mahmoud Naghibzadeh (1), Andrea Pagnani (2, 3, 4), Mohammad Reza Saberi (5, 6), Kamal Al Nasr (7)
1. Department of Computer Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad 9177948974, Iran.
2. Department of Applied Science and Technology (DISAT), Politecnico di Torino, Torino I-10129, Italy.
3. Italian Institute for Genomic Medicine (IIGM), IRCC-Candiolo, Candiolo (TO) I-10060, Italy.
4. INFN Sezione di Torino, Torino I-10125, Italy.
5. Medicinal Chemistry Department, School of Pharmacy, Mashhad University of Medical Sciences, Mashhad 9177899191, Iran.
6. Bioinformatics Research Group, Mashhad University of Medical Sciences, Mashhad 9177899191, Iran.
7. Department of Computer Science, Tennessee State University, Nashville, TN 37209, USA.
Abstract
SUMMARY: Topology determination is one of the most important intermediate steps toward building the atomic structure of a protein from its medium-resolution cryo-electron microscopy (cryo-EM) map. Its main goal is to identify the correct matches (i.e. assignment and direction) between the secondary structure elements (SSEs), namely α-helices and β-sheets, detected in the protein sequence and those detected in the cryo-EM density map. Despite many recent advances in molecular biology technologies, the problem remains challenging. To address it, this article proposes a linear programming-based topology determination (LPTD) method that solves the secondary structure topology problem in three-dimensional geometrical space. By modeling the protein sequence with highly reliable extracted features and a distance-based scoring function, the method transforms the secondary structure matching problem into a complete weighted bipartite graph matching problem. An algorithm based on linear programming then serves as the decision-making strategy for extracting the true (native) topology from among all possible topologies. The proposed automatic framework is validated on 12 experimental and 15 simulated α-β proteins. The results show that LPTD is highly efficient and fast: for 77% of the cases in the dataset, the native topology was ranked first in under 2 s. In addition, the method successfully handles large, complex proteins with as many as 65 SSEs, a number of SSEs that current tools and methods have not previously solved.

AVAILABILITY AND IMPLEMENTATION: The LPTD package (source code and data) is publicly available at https://github.com/B-Behkamal/LPTD. Two test samples, together with instructions for using the graphical user interface, are provided in the shared README file.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
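The matching step described in the summary can be illustrated with a small sketch. For the plain assignment part of the problem (each detected density-map SSE matched to exactly one sequence SSE, each sequence SSE used at most once), the linear program minimizes the sum of c_ij * x_ij subject to those constraints and x_ij >= 0. Its constraint matrix is totally unimodular, so an optimal vertex is integral. For that reason the sketch swaps in the Hungarian algorithm, which reaches the same optimum combinatorially, in place of an LP solver. Everything else here is an assumption for illustration: the length-only cost, the 1.5 Å rise per helix residue, and the names length_cost and hungarian are hypothetical and are not LPTD's distance-based scoring function or code. Direction and the ranking of multiple candidate topologies are not modeled.

```python
# Hypothetical sketch (not LPTD's code): score detected density-map SSEs (rows)
# against sequence SSEs (columns), then find the minimum-cost one-to-one assignment.
from typing import List, Sequence, Tuple

HELIX_RISE_A = 1.5  # approximate rise per residue of an alpha-helix, in angstroms


def length_cost(stick_lengths: Sequence[float],
                seq_residues: Sequence[int]) -> List[List[float]]:
    """Assumed length-only cost: |detected stick length - residues * 1.5 A|."""
    return [[abs(length - r * HELIX_RISE_A) for r in seq_residues]
            for length in stick_lengths]


def hungarian(cost: List[List[float]]) -> Tuple[float, List[int]]:
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns (total_cost, assign), where assign[i] is the column matched to row i.
    Potentials-based Hungarian algorithm, O(n^2 m).
    """
    if not cost:
        return 0.0, []
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("need at least as many columns as rows")
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-indexed; index 0 unused)
    v = [0.0] * (m + 1)  # column potentials (column 0 is a virtual start)
    p = [0] * (m + 1)    # p[j] = row currently matched to column j (0 = free)
    way = [0] * (m + 1)  # predecessor column on the alternating path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            # Grow the alternating tree by the cheapest reduced-cost column.
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so the chosen column gets zero reduced cost.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: stop searching
                break
        while j0:  # augment along the alternating path back to column 0
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assign[p[j] - 1] = j - 1
    total = sum(cost[i][assign[i]] for i in range(n))
    return total, assign
```

With stick lengths of 29.0, 16.0 and 22.5 Å and sequence helices of 10, 20, 14 and 5 residues, hungarian(length_cost([29.0, 16.0, 22.5], [10, 20, 14, 5])) returns (3.5, [1, 0, 2]); the 5-residue helix stays unmatched, which mirrors the common case of a short helix that is not detected in the map.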