Abstract
Based on the concept of the molecular vector machine of proteins [1], a database of protein pentafragments has been created, and algorithms have been proposed both for predicting the secondary structure of a protein from its primary structure and for designing a primary structure that adopts a given secondary structure. A software suite (Predicto @ Designer) has been developed using the pentafragment database and these algorithms. For the proteins used to create the pentafragment database, the accuracy of secondary structure prediction was high (close to 100%), and the approach shows good prospects for designing protein secondary structures.
Keywords: Database of protein pentafragments; Molecular vector machine; Software for predicting and designing the secondary protein structure
Year: 2019 PMID: 31871975 PMCID: PMC6911939 DOI: 10.1016/j.dib.2019.104815
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Prediction of the secondary structure of myoglobin without correction (A) and with correction based on the replacement of amino acids in pentafragments (B).
| A. Without correction: Pig | A. Without correction: Alligator | B. With correction based on the replacement of amino acids: Alligator |
|---|---|---|
| 141 XXX D Asp 1111111111 | 142 XXX D Asp 1111111111 | 142 XXX D Asp 1111111111 |
| 140 XXX N Asn 1111111111 | 141 XXX N Asn 1111111111 | 141 XXX N Asn 1111111111 |
| 139 XXX R Arg 1111111111 | 140 XXX R Arg 1111111111 | 140 XXX R Arg 1111111111 |
| 138 XXX F Phe 1111111111 | 139 XXX F Phe 1111111111 | 139 XXX F Phe 1111111111 |
| 137 XXX L Leu 1111111111 | 138 XXX L Leu 1111111121 | 138 XXX L Leu 1111111111 |
| 136 XXX E Glu 1111111111 | 137 XXX E Glu | 137 XXX E Glu 1111111111 |
| 135 XXX L Leu 1111111111 | 136 XXX L Leu | 136 XXX L Leu 1111111111 |
| 134 XXX A Ala 1111111111 | 135 XXX A Ala | 135 XXX A Ala 1111111111 |
| 133 XXX K Lys 1111111111 | 134 XXX K Lys | 134 XXX K Lys 1111111111 |
| 131 XXX M Met 1111111101 | 132 XXX M Met | 132 XXX M Met 1111111101 |
| 130 XXX A Ala 1111110101 | 131 XXX A Ala | 131 XXX A Ala 1111110101 |
| 126 XXX D Asp 0101011000 | 127 XXX D Asp | 127 XXX D Asp 0101011030 |
| 125 XXX A Ala 0101100000 | 126 XXX A Ala | 126 XXX A Ala 0101103000 |
| 124 XXX G Gly 0110000010 | 125 XXX G Gly | 125 XXX G Gly 0110300000 |
| 123 XXX F Phe 1000001011 | 124 XXX F Phe | 124 XXX F Phe 1030000012 |
| 122 XXX D Asp 0000101110 | 123 XXX D Asp | 123 XXX D Asp 3000001210 |
| 120 XXX P Pro 1011101011 | 121 XXX P Pro 0000000000 | 121 XXX P Pro 0012101010 |
| 118 XXX K Lys 1010111111 | 119 XXX K Lys 0000000000 | 119 XXX K Lys 1010101111 ARG |
| 114 XXX V Val 1111111111 | 115 XXX V Val | 115 XXX V Val 1111111111 |
Bold indicates the amino acid substitutions in the polypeptide chain for which the prediction in column B was obtained. The substituted amino acids are shown to the right in that column.
Fig. 1. Fragment 114–141 of the polypeptide chain of porcine myoglobin [4].
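The correction in column B can be pictured as a database lookup that retries with a substituted residue when the original pentafragment is not found. The sketch below is only an illustration of that idea, not the published algorithm: the substitution table contains just the Lys to Arg replacement visible in the table, and the function names, the toy database, and its single entry are hypothetical.

```python
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical substitution table; only Lys -> Arg appears in the table above.
SUBSTITUTIONS: Dict[str, str] = {"K": "R"}


def variants(fragment: str) -> Iterator[str]:
    """Yield the fragment itself, then each single-residue substitution."""
    yield fragment
    for i, aa in enumerate(fragment):
        if aa in SUBSTITUTIONS:
            yield fragment[:i] + SUBSTITUTIONS[aa] + fragment[i + 1:]


def lookup_with_correction(
    fragment: str, database: Dict[str, str]
) -> Optional[Tuple[str, str]]:
    """Return (matched fragment, 10-digit description), or None if no match."""
    for candidate in variants(fragment):
        if candidate in database:
            return candidate, database[candidate]
    return None


# Toy database with one made-up entry, purely to exercise the code.
toy_db = {"VRGPA": "1010101111"}
result = lookup_with_correction("VKGPA", toy_db)
missing = lookup_with_correction("AAAAA", toy_db)
```

Here `result` holds the substituted fragment together with its description, while `missing` is `None` because no variant of the query is present in the toy database.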
Individual stages of obtaining the pentafragments to be inserted into the database.
| A | B | C | D |
|---|---|---|---|
| Fragment from a text file | Fragment from an inverted text file (inv_1MWD inverted text file.txt) | Examples of pentafragments obtained by cutting | Example of simplified file |
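The cutting step in column C can be sketched as a sliding window of five residues. This is a minimal sketch that assumes overlapping windows with a step of one residue, which the table does not state explicitly; `cut_pentafragments` is a hypothetical helper name, and the example sequence is the first eight residues of the pig myoglobin fragment listed in the formats table.

```python
from typing import List


def cut_pentafragments(sequence: str, size: int = 5) -> List[str]:
    """Cut a one-letter sequence into overlapping windows of `size` residues."""
    if size <= 0:
        raise ValueError("size must be positive")
    # A sequence shorter than `size` yields an empty list.
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


fragments = cut_pentafragments("MGLSDGEW")
```

For this eight-residue input the helper returns four pentafragments, each shifted by one residue relative to the previous one.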
Notations of bonds in text PDB files (A), types of H-bonds (B), their coding with Boolean pairs of variables (C), an example of a pentafragment (D), and its 10-digit description (E).
| A. Notations in text PDB-files | B. Types of H-bonds | C. Coding | D. An example of pentafragment and its coding | |
|---|---|---|---|---|
| X1X2 Abc | 00 | |||
| X1X2 Abc O–Y1Y2 Deh N | 01 | |||
| X1X2 Abc N–Y3Y4 Ehf O | 10 | |||
| X1X2 Abc O–Y1Y2 Deh N | 11 | |||
In cell D, the selected first two lines correspond to the highlighted designation 01 in cell E.
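The Boolean-pair coding can be sketched in a few lines. The sketch assumes, based on the rows above, that the first digit of a pair marks an H-bond through the residue's N atom (code 10) and the second digit marks an H-bond through its O atom (code 01), and that the 10-digit description is simply the five pairs concatenated in chain order. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def encode_residue(n_bond: bool, o_bond: bool) -> str:
    """Encode one residue as a two-digit Boolean pair: 00, 01, 10 or 11."""
    return f"{int(n_bond)}{int(o_bond)}"


def encode_pentafragment(bonds: Sequence[Tuple[bool, bool]]) -> str:
    """Concatenate five (N-bond, O-bond) pairs into a 10-digit description."""
    if len(bonds) != 5:
        raise ValueError("a pentafragment has exactly five residues")
    return "".join(encode_residue(n, o) for n, o in bonds)


# Three residues bonded through O only, followed by two without H-bonds.
description = encode_pentafragment(
    [(False, True)] * 3 + [(False, False)] * 2
)
```

The resulting string has the same form as the 10-digit descriptions shown in the tables of this article.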
Coding of types of H-Bonds in the form of binary combinations for an improved database of pentafragments.
| No. | Types of H-bonds | Binary combinations | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| | | Bonds | Code | Bonds | Code | Bonds | Code | Bonds | Code |
| α-helix | | | | | | | | | |
| 1. | NiH … Oi-4 | 0 | 00 | 0 | 01 | 1 | 10 | 1 | 11 |
| Inverted α-helix | | | | | | | | | |
| 2. | NiH … Oi+4 | 0 | 00 | 1 | 70 | 0 | 07 | 1 | 77 |
| 3₁₀ helix | | | | | | | | | |
| 3. | NiH … Oi-3 | 0 | 00 | 0 | 03 | 1 | 30 | 1 | 33 |
| Inverted 3₁₀ helix | | | | | | | | | |
| 4. | NiH … Oi+3 | 0 | 00 | 1 | 60 | 0 | 06 | 1 | 66 |
| Combination of α-helix and 3₁₀ helix | | | | | | | | | |
| 5. | NiH … Oi-4 … Oi-3 | 0 | 00 | 0 | 02 | 2 | 20 | 2 | 22 |
| Combination of inverted α-helix and 3₁₀ helix | | | | | | | | | |
| 6. | NiH … Oi+4 … Oi+3 | 0 | 00 | 2 | 40 | 0 | 04 | 2 | 44 |
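Read as a lookup table, the coding above assigns one digit to each bond type. The sketch below decodes a 10-digit description digit by digit. It assumes that 0 means no H-bond and that each digit can be interpreted independently of its neighbours; the dictionary and function names are hypothetical, and the example code is one of the corrected alligator myoglobin descriptions from the prediction table.

```python
from typing import Dict, List

# Digit -> bond type, taken from the codes in the table above.
DIGIT_MEANING: Dict[str, str] = {
    "0": "no H-bond",
    "1": "alpha-helix (NiH...Oi-4)",
    "7": "inverted alpha-helix (NiH...Oi+4)",
    "3": "3-10 helix (NiH...Oi-3)",
    "6": "inverted 3-10 helix (NiH...Oi+3)",
    "2": "alpha-helix + 3-10 helix (NiH...Oi-4...Oi-3)",
    "4": "inverted alpha-helix + 3-10 helix (NiH...Oi+4...Oi+3)",
}


def describe(code: str) -> List[str]:
    """Translate each digit of a 10-digit description into a bond type."""
    if len(code) != 10:
        raise ValueError("expected a 10-digit description")
    # Characters outside the table (for example 'b') are reported as unknown.
    return [DIGIT_MEANING.get(digit, "unknown") for digit in code]


labels = describe("0101103000")
```

In this example the seventh digit is 3, so `labels[6]` reports a 3₁₀-helix bond, while the positions holding 1 report α-helix bonds.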
Pentafragment database structure.
| Folder numbering (Database.JPG) | Pentafragment files. | Pentafragments of the file 37 | |||
|---|---|---|---|---|---|
| No. | Folder | No. | Folder | ||
| 1 | 00-XX | 20 | 30-XX | DKK | |
Formats used by the program PREDICTO @ DESIGNER.
| A | B | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| A fragment of the pig myoglobin protein (1MWC file) in .dbk format | Recording the result of the program in .dbkx format | |||||||||
| 15 | XXX | G | GLY | bbbbbbbbbb | 15 | XXX | G | GLY | 1112121011 | 3K9Z 1DMR |
| 14 | XXX | W | TRP | bbbbbbbbbb | 14 | XXX | W | TRP | 1212101111 | 3K9Z 1DMR |
| 13 | XXX | V | VAL | bbbbbbbbbb | 13 | XXX | V | VAL | 1210111111 | 3K9Z 1MWC |
| 12 | XXX | N | ASN | bbbbbbbbbb | 12 | XXX | N | ASN | 1011111111 | 3K9Z 1MWC |
| 11 | XXX | L | LEU | bbbbbbbbbb | 11 | XXX | L | LEU | 1111111111 | 3K9Z 1MWC |
| 10 | XXX | V | VAL | bbbbbbbbbb | 10 | XXX | V | VAL | 1111111101 | 3K9Z 1MWC |
| 9 | XXX | L | LEU | bbbbbbbbbb | 9 | XXX | L | LEU | 1111110101 | 3K9Z 1MWC |
| 8 | XXX | Q | GLN | bbbbbbbbbb | 8 | XXX | Q | GLN | 1111010101 | 3K9Z 1DMR |
| 7 | XXX | W | TRP | bbbbbbbbbb | 7 | XXX | W | TRP | 1101010101 | 3K9Z 1DMR |
| 6 | XXX | E | GLU | bbbbbbbbbb | 6 | XXX | E | GLU | 0101010100 | 3K9Z 1DMR |
| 5 | XXX | G | GLY | bbbbbbbbbb | 5 | XXX | G | GLY | 0101010000 | 3K9Z 1DMR |
| 4 | XXX | D | ASP | bbbbbbbbbb | 4 | XXX | D | ASP | bbbbbbbbbb | |
| 3 | XXX | S | SER | bbbbbbbbbb | 3 | XXX | S | SER | bbbbbbbbbb | |
| 2 | XXX | L | LEU | bbbbbbbbbb | 2 | XXX | L | LEU | bbbbbbbbbb | |
| 1 | XXX | G | GLY | bbbbbbbbbb | 1 | XXX | G | GLY | bbbbbbbbbb | |
| 0 | ATG | M | MET | bbbbbbbbbb | 0 | ATG | M | MET | bbbbbbbbbb | |
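The rows above suggest a simple record layout: position, codon, one-letter code, three-letter code, a 10-character description, and (in the result format) trailing identifiers that look like PDB entry codes. The parser below is a sketch under those assumptions. It treats fields as whitespace-separated and `bbbbbbbbbb` as an unfilled description; the actual on-disk layout of .dbk and .dbkx files is not specified here, and the `Record` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

BLANK = "b" * 10  # placeholder seen where no description has been assigned


@dataclass
class Record:
    position: int
    codon: str
    one_letter: str
    three_letter: str
    description: Optional[str]  # None while the description is still blank
    sources: List[str]          # trailing identifiers, if any


def parse_record(line: str) -> Record:
    """Parse one whitespace-separated record line."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}")
    position, codon, one, three, desc = fields[:5]
    return Record(
        position=int(position),
        codon=codon,
        one_letter=one,
        three_letter=three.upper(),
        description=None if desc == BLANK else desc,
        sources=fields[5:],
    )


filled = parse_record("13 XXX V VAL 1210111111 3K9Z 1MWC")
blank = parse_record("0 ATG M MET bbbbbbbbbb")
```

Representing the blank placeholder as `None` keeps downstream code from mistaking it for a real description.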
Fig. 2. The startup screen and workspaces of the PREDICTO @ DESIGNER program. a – program startup screen; b – PREDICTO section workspace; c – DESIGNER section workspace.
Specifications Table
| Subject | Biology |
| Specific subject area | Database of protein pentafragments and computer software |
| Type of data | Table |
| How data were acquired | Computer software |
| Data format | Raw and Analysed |
| Parameters for data collection | The primary structure of the protein |
| Description of data collection | By using a database and computer programs |
| Data source location | Source of protein isolation (animal or plant species) |
| Data accessibility | Data are with this article |
| Related research article | Vladimir Karasev, BioSystems 180 (2019) 7–18, |
A database of protein pentafragments, sorted according to a binary description of their structure, has been created. A computer program, Predicto @ Designer, that uses this database and algorithm has been written. This program may be useful for predicting and designing protein structure. The obtained data can contribute to the further development of the database and the computer software.