Letu Qingge, Xiaowen Liu, Farong Zhong, Binhai Zhu.
Abstract
In mass spectrometry-based de novo protein sequencing, it is hard to complete the sequence of the whole protein. Motivated by this, we study the (one-sided) problem of filling a protein scaffold S with some missing amino acids, given a sequence of contigs none of which is allowed to be altered, with respect to a complete reference protein P of length n, such that the BLOSUM62 score between P and the filled sequence S' is maximized. We show that this problem is polynomial-time solvable in O(n^26) time. We also consider the case when the contigs are not of high quality and they are concatenated into an (incomplete) sequence I, where the missing amino acids can be inserted anywhere in I to obtain I', such that the BLOSUM62 score between P and I' is maximized. We show that this problem is polynomial-time solvable in O(n^22) time. Because of their high time complexity, both of these algorithms are impractical; we hence present several algorithms based on greedy and local search that aim to solve the problems in practice. The empirical results, based on some antibody and mammalian proteins, show that the algorithms can fill protein scaffolds with high quality, provided that a good pair of scaffold and reference is given.
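To make the second problem concrete (inserting missing amino acids anywhere in I to maximize the score against P), the following is a minimal greedy sketch. It is not the authors' algorithm, whose details the abstract does not give. The names `score_pair`, `align_score`, and `greedy_fill`, the toy match/mismatch scores standing in for BLOSUM62, and the linear gap penalty are all assumptions made for illustration.

```python
from collections import Counter

GAP = -2  # assumed linear gap penalty (not specified in the abstract)


def score_pair(a, b):
    # Toy stand-in for a BLOSUM62 lookup (assumption): +2 match, -1 mismatch.
    return 2 if a == b else -1


def align_score(p, s):
    """Global (Needleman-Wunsch) alignment score between p and s."""
    prev = [j * GAP for j in range(len(s) + 1)]
    for i in range(1, len(p) + 1):
        cur = [i * GAP]
        for j in range(1, len(s) + 1):
            cur.append(max(
                prev[j - 1] + score_pair(p[i - 1], s[j - 1]),  # align two residues
                prev[j] + GAP,                                  # gap in s
                cur[j - 1] + GAP,                               # gap in p
            ))
        prev = cur
    return prev[-1]


def greedy_fill(reference, incomplete, missing):
    """Insert every residue of `missing` into `incomplete`, one at a time,
    always taking the single insertion that yields the best score."""
    seq = incomplete
    pool = Counter(missing)
    while sum(pool.values()) > 0:
        best = None
        for aa in [a for a, count in pool.items() if count > 0]:
            for pos in range(len(seq) + 1):
                cand = seq[:pos] + aa + seq[pos:]
                sc = align_score(reference, cand)
                if best is None or sc > best[0]:
                    best = (sc, cand, aa)
        _, seq, used = best
        pool[used] -= 1
    return seq


filled = greedy_fill("ACDEFG", "ACEG", "DF")
print(filled, align_score("ACDEFG", filled))
```

Replacing `score_pair` with a real BLOSUM62 lookup would bring the sketch closer to the scoring used in the abstract. For the one-sided scaffold problem, where contigs may not be altered, the candidate positions would have to be restricted to the boundaries between contigs.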
Year: 2017 PMID: 28207401 PMCID: PMC5439369 DOI: 10.1109/TNB.2017.2666780
Source DB: PubMed Journal: IEEE Trans Nanobioscience ISSN: 1536-1241 Impact factor: 2.935