Jayanta Kumar Das1, Provas Das2, Korak Kumar Ray3, Pabitra Pal Choudhury1, Siddhartha Sankar Jana2.
Abstract
Comparison of amino acid sequence similarity is the fundamental concept behind protein phylogenetic tree construction. This method explains evolutionary relationships, but further explanation is not possible unless sequences are studied through the chemical nature of individual amino acids. Here we develop a new methodology to characterize protein sequences on the basis of the chemical nature of their amino acids. We design algorithms for studying the variation of chemical group transitions and various chemical group combinations as patterns in protein sequences. The amino acid sequences of the conventional myosin II head domain of 14 family members are taken to illustrate this approach. We find two blocks of maximum length 6 aa, 'FPKATD' and 'Y/FTNEKL', without repetition of the same chemical nature, and one block of maximum length 20 aa with repetition of chemical nature, all of which are common among the 14 members. We also check commonality with another motor protein sub-family, the kinesin KIF1A. Based on our analysis we find a common block of length 8 aa in both myosin II and KIF1A. This motif is located in the neck linker region, which could be responsible for the generation of mechanical force, enabling us to find the unique blocks which remain chemically conserved across the family. We also validate our methodology with different protein families such as MYOI, myosin light chain kinase (MLCK), Rho-associated protein kinase (ROCK), Na+/K+-ATPase and Ca2+-ATPase. Altogether, our studies provide a new methodology for investigating conserved amino acid patterns in different proteins.
Year: 2016 PMID: 27930687 PMCID: PMC5145171 DOI: 10.1371/journal.pone.0167651
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Amino acid categorization based on their chemical nature.
| Chemical Nature of the Group | Amino Acids |
|---|---|
| Acidic | Aspartate (D), Glutamate (E) |
| Basic | Arginine (R), Histidine (H), Lysine (K) |
| Aromatic side chain | Tyrosine (Y), Phenylalanine (F), Tryptophan (W) |
| Aliphatic side chain | Isoleucine (I), Leucine (L), Valine (V), Alanine (A), Glycine (G) |
| Cyclic | Proline (P) |
| Sulfur containing | Methionine (M), Cysteine (C) |
| Hydroxyl containing | Serine (S), Threonine (T) |
| Acidic amide | Glutamine (Q), Asparagine (N) |
Mapping of the twenty standard amino acids to eight chemical groups.
| Chemical group | G1 | G2 | G3 | G4 | G5 | G6 | G7 | G8 |
|---|---|---|---|---|---|---|---|---|
| Numeric code | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| Amino acids | D, E | R, H, K | Y, F, W | I, L, V, A, G | P | M, C | S, T | Q, N |
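The mapping can be illustrated with a short Python sketch that translates a one-letter amino acid sequence into its chemical group string. The group assignments follow the categorization table above; the names `GROUPS`, `AA_TO_GROUP` and `encode` are illustrative choices, not taken from the paper.

```python
# Chemical groups numbered 1-8 as in the categorization table.
# The dictionary layout and function name are assumptions made for illustration.
GROUPS = {
    1: "DE",     # acidic
    2: "RHK",    # basic
    3: "YFW",    # aromatic side chain
    4: "ILVAG",  # aliphatic side chain
    5: "P",      # cyclic
    6: "MC",     # sulfur containing
    7: "ST",     # hydroxyl containing
    8: "QN",     # acidic amide
}
AA_TO_GROUP = {aa: str(code) for code, members in GROUPS.items() for aa in members}


def encode(sequence: str) -> str:
    """Translate a one-letter amino acid sequence into its chemical group string."""
    return "".join(AA_TO_GROUP[aa] for aa in sequence.upper())


print(encode("FPKATD"))  # 352471
print(encode("FPKASD"))  # 352471 (S and T share group 7)
print(encode("YTNEKL"))  # 378124
```

Under this encoding 'FPKATD' and 'FPKASD' yield the same group string, which is why a block can be chemically conserved even when the residues differ.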
Fig 1. Conventional myosin II family as previously described in [16–18].
Based on heavy chain amino acid sequence similarities, myosin II is broadly divided into two groups: (a) the non-muscle myosin group and (b) the muscle myosin group, as detailed in Table 3.
Details of conventional myosin II family members in human.
| Seq. Nos. | Gene Name | Length of the Head Domain (aa) | Accession Number | Protein Names and Remarks |
|---|---|---|---|---|
| 1 | MYH14 | 860 | Q7Z406 | Myosin-14 (NMHC II-C) |
| 2 | MYH11 | 843 | P35749 | Myosin-11 (SM MyHC) |
| 3 | MYH9 | 836 | P35579 | Myosin-9 (NMHC II-A) |
| 4 | MYH10 | 843 | P35580 | Myosin-10 (NMHC II-B) |
| 5 | MYH15 | 850 | Q9Y2K3 | Myosin-15 |
| 6 | MYH7B | 845 | A7E2Y1 | Myosin-7B (Cardiac |
| 7 | MYH2 | 844 | Q9UKX2 | Myosin-2 (IIa MYH2) |
| 8 | MYH4 | 842 | Q9Y623 | Myosin-4 (IIb MYH4) |
| 9 | MYH8 | 841 | P13535 | Myosin-8 (Perinatal MyHC) |
| 10 | MYH1 | 842 | P12882 | Myosin-1 (IIx/d MYH1) |
| 11 | MYH3 | 829 | P11055 | Myosin-3 (Embryonic MyHC) |
| 12 | MYH13 | 842 | Q9UKX3 | Myosin-13 (Extraocular MyHC) |
| 13 | MYH7 | 838 | P12883 | Myosin-7 (Cardiac |
| 14 | MYH6 | 840 | P13533 | Myosin-6 (Cardiac |
This myosin family has two sub-groups: a) Seq. Nos. 1-4 form the non-muscle myosin group, and b) Seq. Nos. 5-14 form the muscle myosin group (aa: amino acid).
Details of unconventional myosin sub-family MYOI class members in human.
| Seq. Nos. | Gene Name | Length of the Head Domain (aa) | Accession Number | Protein Names and Remarks |
|---|---|---|---|---|
| 1 | MYO1A | 694 | Q9UBC5 | Unconventional myosin-Ia |
| 2 | MYO1B | 701 | O43795 | Unconventional myosin-Ib |
| 3 | MYO1D | 695 | O94832 | Unconventional myosin-Id |
| 4 | MYO1G | 707 | B0I1T2 | Unconventional myosin-Ig |
| 5 | MYO1C | 731 | O00159 | Unconventional myosin-Ic |
| 6 | MYO1H | 690 | Q8N1T3 | Unconventional myosin-Ih |
| 7 | MYO1E | 692 | Q12965 | Unconventional myosin-Ie |
| 8 | MYO1F | 690 | O00160 | Unconventional myosin-If |
Details of kinesin sub-family KIF1A class members in human.
| Seq. Nos. | Gene Name | Length of the Head Domain (aa) | Accession Number | Protein Names and Remarks |
|---|---|---|---|---|
| 1 | KIF1A | 365 | Q12756 | Kinesin-like protein KIF1A |
| 2 | KIF1B | 364 | O60333 | Kinesin-like protein KIF1B |
| 3 | KIF1C | 358 | O43896 | Kinesin-like protein KIF1C |
Details of MLCK protein family members in human.
| Seq. Nos. | Gene Name | Length (aa) | Accession Number | Protein Names and Remarks |
|---|---|---|---|---|
| 1 | MYLK1 | 1914 | Q15746 | Myosin light chain kinase 1 |
| 2 | MYLK2 | 596 | Q9H1R3 | Myosin light chain kinase 2 |
| 3 | MYLK3 | 819 | Q32MK0 | Myosin light chain kinase 3 |
| 4 | MYLK4 | 388 | Q86YV6 | Myosin light chain kinase 4 |
Details of Rho-associated protein kinase family members in human.
| Seq. Nos. | Gene Name | Length (aa) | Accession Number | Protein Names and Remarks |
|---|---|---|---|---|
| 1 | ROCK1 | 1354 | Q13464 | Rho-associated protein kinase 1 |
| 2 | ROCK2 | 1388 | O75116 | Rho-associated protein kinase 2 |
Details of Na+/K+-ATPase family members in human.
| Seq. Nos. | Gene Name | Length (aa) | Accession Number | Protein Names and Remarks |
|---|---|---|---|---|
| 1 | ATP1A1 | 1023 | P05023 | Sodium/potassium-transporting ATPase subunit alpha-1 |
| 2 | ATP1A2 | 1020 | P50993 | Sodium/potassium-transporting ATPase subunit alpha-2 |
| 3 | ATP1A3 | 1013 | P13637 | Sodium/potassium-transporting ATPase subunit alpha-3 |
| 4 | ATP1A4 | 1029 | Q13733 | Sodium/potassium-transporting ATPase subunit alpha-4 |
| 5 | ATP1B1 | 303 | P05026 | Sodium/potassium-transporting ATPase subunit beta-1 |
| 6 | ATP1B2 | 290 | P14415 | Sodium/potassium-transporting ATPase subunit beta-2 |
| 7 | ATP1B3 | 279 | P54709 | Sodium/potassium-transporting ATPase subunit beta-3 |
| 8 | ATP1B4 | 357 | B7ZKV8 | Sodium/potassium-transporting ATPase subunit beta-4 |
Details of Ca2+-ATPase family members in human.
| Seq. Nos. | Gene Name | Length (aa) | Accession Number | Protein Names and Remarks |
|---|---|---|---|---|
| 1 | ATP2A1 | 1001 | O14983 | Sarcoplasmic/endoplasmic reticulum calcium ATPase 1 |
| 2 | ATP2A2 | 1042 | P16615 | Sarcoplasmic/endoplasmic reticulum calcium ATPase 2 |
| 3 | ATP2A3 | 1043 | Q93084 | Sarcoplasmic/endoplasmic reticulum calcium ATPase 3 |
| 4 | ATP2B1 | 1258 | P20020 | Plasma membrane calcium-transporting ATPase 1 |
| 5 | ATP2B2 | 1243 | Q01814 | Plasma membrane calcium-transporting ATPase 2 |
| 6 | ATP2B3 | 1220 | Q16720 | Plasma membrane calcium-transporting ATPase 3 |
| 7 | ATP2B4 | 1241 | P23634 | Plasma membrane calcium-transporting ATPase 4 |
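Percentage identity matrix for every pair of myosin II head domain sequences, obtained using www.ebi.ac.uk/Tools/msa/clustalw2/.
| Seq. Nos. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|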
| 1 | 100.0 | 73.21 | 76.47 | 78.57 | 45.47 | 49.04 | 47.29 | 46.56 | 47.83 | 47.17 | 48.61 | 48.25 | 47.34 | 46.02 |
| 2 | 73.21 | 100.0 | 84.33 | 83.51 | 46.86 | 49.88 | 49.57 | 49.57 | 50.37 | 49.82 | 50.31 | 50.06 | 49.88 | 48.66 |
| 3 | 76.47 | 84.33 | 100.0 | 85.77 | 48.01 | 49.08 | 48.77 | 48.77 | 49.20 | 48.89 | 50.37 | 50.12 | 48.95 | 48.21 |
| 4 | 78.57 | 83.51 | 85.77 | 100.0 | 47.72 | 49.39 | 49.08 | 48.35 | 49.63 | 48.71 | 50.55 | 49.57 | 50.12 | 47.80 |
| 5 | 45.47 | 46.86 | 48.01 | 47.72 | 100.0 | 68.95 | 63.15 | 64.14 | 63.34 | 62.70 | 66.18 | 66.27 | 65.06 | 64.30 |
| 6 | 49.04 | 49.88 | 49.08 | 49.39 | 68.95 | 100.0 | 65.76 | 65.91 | 66.99 | 65.79 | 69.26 | 68.38 | 66.75 | 65.55 |
| 7 | 47.29 | 49.57 | 48.77 | 49.08 | 63.15 | 65.76 | 100.0 | 92.28 | 93.92 | 93.59 | 79.47 | 79.52 | 85.94 | 85.75 |
| 8 | 46.56 | 49.57 | 48.77 | 48.35 | 64.14 | 65.91 | 92.28 | 100.0 | 92.36 | 95.49 | 79.95 | 80.45 | 85.08 | 87.02 |
| 9 | 47.83 | 50.37 | 49.20 | 49.63 | 63.34 | 66.99 | 93.92 | 92.36 | 100.0 | 93.32 | 80.62 | 80.55 | 86.17 | 86.16 |
| 10 | 47.17 | 49.82 | 48.89 | 48.71 | 62.70 | 65.79 | 93.59 | 95.49 | 93.32 | 100.0 | 80.31 | 80.21 | 84.84 | 85.95 |
| 11 | 48.61 | 50.31 | 50.37 | 50.55 | 66.18 | 69.26 | 79.47 | 79.95 | 80.62 | 80.31 | 100.0 | 90.69 | 79.07 | 79.19 |
| 12 | 48.25 | 50.06 | 50.12 | 49.57 | 66.27 | 68.38 | 79.52 | 80.45 | 80.55 | 80.21 | 90.69 | 100.0 | 79.24 | 79.83 |
| 13 | 47.34 | 49.88 | 48.95 | 50.12 | 65.06 | 66.75 | 85.94 | 85.08 | 86.17 | 84.84 | 79.07 | 79.24 | 100.0 | 82.46 |
| 14 | 46.02 | 48.66 | 48.21 | 47.80 | 64.30 | 65.55 | 85.75 | 87.02 | 86.16 | 85.95 | 79.19 | 79.83 | 82.46 | 100.0 |
Fig 2. A rooted phylogenetic tree based on percent sequence similarity analysis of the myosin heavy chain II head domain of humans.
Total number of amino acids from each chemical group, followed by the count of ordered pairs excluding the self-pair, for each sequence of the myosin head domain.
| Seq. Nos. | #G1-#X1 | #G2-#X2 | #G3-#X3 | #G4-#X4 | #G5-#X5 | #G6-#X6 | #G7-#X7 | #G8-#X8 |
|---|---|---|---|---|---|---|---|---|
| 1 | 102-90 | 129-114 | 76-71 | 311-192 | 48-43 | 31-31 | 77-70 | 85-81 |
| 2 | 106-89 | 129-114 | 81-75 | 290-191 | 27-24 | 38-37 | 83-74 | 88-83 |
| 3 | 106-91 | 132-118 | 81-75 | 290-184 | 30-27 | 38-36 | 67-64 | 91-85 |
| 4 | 107-93 | 138-121 | 84-78 | 285-191 | 29-26 | 38-37 | 72-68 | 89-84 |
| 5 | 103-88 | 127-108 | 92-83 | 298-199 | 25-23 | 42-41 | 82-75 | 80-74 |
| 6 | 99-88 | 129-113 | 88-82 | 292-186 | 34-31 | 37-34 | 81-78 | 84-78 |
| 7 | 102-89 | 130-114 | 89-83 | 290-192 | 29-27 | 33-31 | 93-83 | 77-72 |
| 8 | 102-89 | 129-112 | 91-85 | 281-189 | 31-29 | 38-35 | 93-85 | 76-69 |
| 9 | 100-87 | 129-113 | 91-85 | 280-192 | 30-28 | 36-34 | 98-86 | 76-70 |
| 10 | 101-88 | 128-112 | 90-84 | 283-190 | 30-28 | 38-35 | 93-83 | 78-70 |
| 11 | 103-89 | 134-116 | 90-85 | 274-187 | 29-27 | 35-34 | 98-87 | 75-68 |
| 12 | 102-90 | 126-109 | 91-85 | 272-180 | 31-29 | 45-41 | 89-78 | 85-76 |
| 13 | 102-88 | 131-116 | 89-83 | 285-187 | 32-30 | 36-35 | 82-74 | 80-72 |
| 14 | 103-88 | 131-115 | 89-82 | 284-190 | 27-25 | 37-36 | 83-76 | 85-77 |
Here, #Gi is the number of amino acids from chemical group Gi and #Xi is the number of ordered pairs excluding the pair (Gi, Gi).
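A minimal Python sketch of how such counts could be obtained from a chemical group string is given below. It assumes that an "ordered pair" means two consecutive positions (Gi, Gj) in the sequence, so that #Xi counts the positions where a residue of group Gi is followed by a residue of a different group; that reading, and the function name, are assumptions rather than the paper's stated definition.

```python
from collections import Counter


def group_and_transition_counts(encoded: str) -> dict:
    """Return {group: (#G, #X)} for a chemical group string such as '352471'.

    #G is the number of residues in the group; #X is the number of adjacent
    pairs (Gi, Gj) with j != i, counted under the group of the first residue
    (an assumed reading of "ordered pairs except the self pair").
    """
    totals = Counter(encoded)
    transitions = Counter(a for a, b in zip(encoded, encoded[1:]) if a != b)
    return {g: (totals[g], transitions[g]) for g in sorted(totals)}


# Toy group string, not taken from any real sequence.
print(group_and_transition_counts("44412447"))
# {'1': (1, 1), '2': (1, 1), '4': (5, 2), '7': (1, 0)}
```

With this definition #Xi can never exceed #Gi, which agrees with every entry in the table above.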
Distinct ranges for every branch of the phylogenetic tree obtained from Table 11, with the percentage (%) identity of each branching point from Fig 2.
| Percent identity of original sequence (%): From | Percent identity of original sequence (%): From | Hitting Groups | Left sub-tree | Distinct range (left sub-tree) | Right sub-tree | Distinct range (right sub-tree) | Sequences Comparison |
|---|---|---|---|---|---|---|---|
| 40% | 46% | | T2 | a) 88-92 | T1 | a) 76-84 | (1-4) Vs. (5-14) |
| 64% | 73% | G3, G4, | T3 | a) 48 | T4 | a) 27-30 | (1) Vs. (2-4) |
| 76% | 83% | G2, G5, | T5 | a) 83 | T6 | a) 67-72 | (2) Vs. (3-4) |
| 80% | 85% | All, | T7 | a) 290 | T8 | a) 285 | (3) Vs. (4) |
| - | 65% | G2, G3, G4, | T9 | a) 25 | T10 | a) 27-34 | (5) Vs. (6-14) |
| - | 67% | G1, | T11 | a) 34 | T10 | a) 27-29 | (6) Vs. (7-14) |
| 80% | 80% | | T13 | a) 89-93 | T14 | a) 82-83 | (7-12) Vs. (13-14) |
| 82% | 82% | | T15 | a) 129-134 | T16 | a) 126 | (7-11) Vs. (12) |
| 85% | 85% | | T17 | a) 128-130 | T18 | a) 112-114 | (7-10) Vs. (11) |
| 93% | 93% | G4, | T19 | a) 32 | T20 | a) 27 | (13) Vs. (14) |
| 93-94% | 93-94% | | T21-T22-T23-T24 | | | | (7-10) |
Conserved chemical patterns across all myosin II members, and comparison between the non-muscle and muscle groups.
| Length (Number Count) | Pattern of Length 5 aa and 6 aa | Existence of Pattern in Non-Muscle Group (Seq. Nos. 1-4) | Existence of Pattern in Muscle Group (Seq. Nos. 5-14) |
|---|---|---|---|
| 6 (2) | 352471 378124 | Yes | Yes |
| 5 (8) | 37812 52471 74281 78124 35247 48532 43827 64837 | Yes | Yes |
| 6 (11) | 524361 361428 286154 731846 184765 476518 651874 847651 628435 438276 827634 | Yes | No |
| 5 (49) | 16324 24361 36142 41632 12734 24731 27341 37241 47312 73412 24831 38214 82314 84321 45821 18426 61428 42718 48721 72184 28615 34571 54371 31846 48731 73184 84173 47651 14856 61548 85641 86154 51874 18476 76481 65187 76518 52436 28435 35824 43582 27634 46327 63274 62843 87243 38276 82763 84765 | Yes | No |
| 6 (3) | 813472 635247 748532 | No | Yes |
| 5 (9) | 13472 24318 43182 81347 63524 24678 67842 82467 74853 | No | Yes |
Given a pattern length L as input, all possible patterns of length L over the numerals 1-8 are generated without repetition using Algorithm 2.
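The generate-and-check idea can be sketched in Python as follows. This is not the paper's Algorithm 2, which is not reproduced here; it is a brute-force illustration that enumerates every arrangement of L distinct group codes and keeps those that occur in every encoded sequence. The function name and the toy inputs are hypothetical.

```python
from itertools import permutations


def common_patterns(encoded_seqs, length):
    """Return all patterns of `length` distinct group codes (1-8) that occur
    as a contiguous block in every chemical group string in `encoded_seqs`."""
    candidates = ("".join(p) for p in permutations("12345678", length))
    return [c for c in candidates if all(c in s for s in encoded_seqs)]


# Toy group strings, not real myosin sequences.
toy = ["44352471188", "2235247144"]
print(common_patterns(toy, 6))  # ['352471']
```

For L = 6 there are 8!/2! = 20160 candidate patterns, so exhaustive enumeration is cheap at the lengths used in the table.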
Two unique patterns of length 6 aa and one pattern of length 20 aa common to all 14 myosin II members, with their positions and corresponding amino acids.
| Seq. Nos. | Pattern 352471 (6 aa): Position-Sequence | Pattern 378124 (6 aa): Position-Sequence | Pattern 34842742818772342342 (20 aa): Position-Sequence |
|---|---|---|---|
| 1 | 560-FPKATD | 497-YTNEKL | 248-FGNAKTVKNDNSSRFGKFIR |
| 2 | 541-FPKATD | 478-YTNEKL | 228-FGNAKTVKNDNSSRFGKFIR |
| 3 | 534-FPKATD | 471-YTNEKL | 221-FGNAKTVKNDNSSRFGKFIR |
| 4 | 541-FPKATD | 478-YTNEKL | 228-FGNAKTVKNDNSSRFGKFIR |
| 5 | 552-FPKATD | 492-FTNEKL | 243-FGNAKTLRNDNSSRFGKFIR |
| 6 | 543-FPKASD | 483-FTNEKL | 233-FGNAKTLRNDNSSRFGKFIR |
| 7 | 543-FPKATD | 483-FTNEKL | 233-FGNAKTVRNDNSSRFGKFIR |
| 8 | 543-FPKATD | 483-FTNEKL | 233-FGNAKTVRNDNSSRFGKFIR |
| 9 | 543-FPKATD | 483-FTNEKL | 233-FGNAKTVRNDNSSRFGKFIR |
| 10 | 543-FPKATD | 483-FTNEKL | 233-FGNAKTVRNDNSSRFGKFIR |
| 11 | 541-FPKATD | 481-FTNEKL | 231-FGNAKTVRNDNSSRFGKFIR |
| 12 | 542-FPKATD | 482-FTNEKL | 232-FGNAKTVRNDNSSRFGKFIR |
| 13 | 540-FPKATD | 480-FTNEKL | 230-FGNAKTVRNDNSSRFGKFIR |
| 14 | 541-FPKATD | 481-FTNEKL | 231-FGNAKTVRNDNSSRFGKFIR |
The first two patterns, of length 6 aa, contain no repeated chemical group (found using Algorithm 2); the last pattern, of length 20 aa, contains repeated chemical groups (found using Algorithm 3).
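When repetition of chemical groups is allowed, the candidate set can no longer be enumerated up front, so one option is to search for the longest block shared by all encoded sequences. The Python sketch below does this naively by testing substrings of the shortest sequence from longest to shortest. It stands in for the idea behind Algorithm 3 and is not the paper's procedure; the function name and inputs are hypothetical.

```python
def longest_common_blocks(encoded_seqs):
    """Return the longest contiguous block(s) present in every chemical group
    string, allowing repeated group codes. Naive search for illustration."""
    shortest = min(encoded_seqs, key=len)
    for length in range(len(shortest), 0, -1):
        found = {
            shortest[i:i + length]
            for i in range(len(shortest) - length + 1)
            if all(shortest[i:i + length] in s for s in encoded_seqs)
        }
        if found:
            return sorted(found)
    return []


# Toy group strings, not real myosin sequences.
print(longest_common_blocks(["1134842742", "934842744", "7734842"]))  # ['34842']
```

The search is cubic in the worst case, which is acceptable for a handful of head domains of under a thousand residues but would need a suffix-based method for larger sets.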
Specific patterns of amino acids and their locations in different sub-domains of the myosin II head domain.
| Seq. Nos. | Length (aa) | Pattern | ATP Domain | Switch-1 | Switch-2 | Actin Domain |
|---|---|---|---|---|---|---|
| 5 | 74281 | Yes | Yes | |||
| 5 | 48532 | Yes | ||||
| 6 | 651874, 847651, 476518 | Yes | ||||
| 5 | 16324, 41632, 24731, 37241, 47312 | |||||
| 5 | 63274 | Yes | ||||
| 6 | 813472, 748532 | Yes | ||||
| 5 | 74853, 81347, 13472 | |||||
| 5 | 24678, 67824 | Yes |
Conserved chemical patterns in MYOI family members, with their positions and original amino acid sequences.
| Seq. Nos. | Pattern of length 10 aa: Pos.-Seq. | 444143431 (9 aa): Pos.-Seq. | 424224434 (9 aa): Pos.-Seq. | 344127244 (9 aa): Pos.-Seq. | 64258 (5 aa): Pos.-Seq. | 74853 (5 aa): Pos.-Seq. |
|---|---|---|---|---|---|---|
| 1 | 141-VLEAFGNAKT | 377-GVLDIYGFE | 618-VRVRRAGYA | 181-YLLEKSRLV | 588-CIKPN | 47-SVNPY |
| 2 | 148-VLEAFGNAKT | 384-GVLDIYGFE | 625-VRVRRAGYA | 188-YLLEKSRVV | 595-CIKPN | 54-SVNPY |
| 3 | 144-VLEAFGNAKT | 380-GVLDIYGFE | 619-VRVRRAGFA | 184-YLLEKSRVI | 589-CIKPN | 48-SVNPY |
| 4 | 144-VLEAFGNART | 392-GVLDIYGFE | 631-VRVRRAGFA | 184-YLLEKSRVL | 601-CIKPN | 48-SVNPY |
| 5 | 180-VLEAFGNAKT | 418-GLLDIYGFE | 655-LRVRRAGFA | 220-YLLEKSRVV | 625-CIKPN | 86-SVNPY |
| 6 | 145-VLEAFGNART | 388-GLLDIYGFE | 625-LRVRRAGFA | 185-YLIEKSRVV | 595-CIKPN | 51-SVNPY |
| 7 | 152-LLEAFGNAKT | 385-GVLDIYGFE | 616-IRVRRAGYA | 192-FLLEKSRVV | 586-CIKPN | 58-SVNPF |
| 8 | 150-LLEAFGNAKT | 383-GVLDIYGFE | 614-IRVRRAGFA | 190-FLLEKSRVV | 584-CIKPN | 56-SVNPF |
Common motifs in myosin II and the KIF1A class of kinesin, with their positions and sequences.
| Protein | Position-Sequence | Protein | Position-Sequence |
|---|---|---|---|
| | 353-QIRCNAVI | | 698-QLRCNGVL |
| | 347-QIKCNAVI | | 696-QLRCNGVL |
| | 347-QIRCNAII | | 695-QLRCNGVL |
| | 715-QLRCNGVL | | 696-QLRCNGVL |
| | 698-QLRCNGVL | | 693-QLRCNGVL |
| | 691-QLRCNGVL | | 696-QLRCNGVL |
| | 698-QLRCNGVL | | 692-QLRCNGVL |
| | 704-QLRCNGVL | | 694-QLRCNGVL |
| | 699-QLRCNGVL | | |
Comparison of common motifs from myosin II with the MYOI class of myosin, with their positions and sequences.
| Seq. Nos. | Similarity (%) | Pattern of length 20 aa (“34842742818772342342”): Position-Sequence |
|---|---|---|
| MYH14 | | 248-FGNAKTVKNDNSSRFGKFIR |
| 1 | 85 | 145-FGNAKTIRNNNSSRFGKYMD |
| 2 | 90 | 152-FGNAKTVRNDNSSRFGKYMD |
| 3 | 85 | 148-FGNAKTNRNDNSSRFGKYMD |
| 4 | 80 | 148-FGNARTNRNHNSSRFGKYMD |
| 5 | 90 | 184-FGNAKTLRNDNSSRFGKYMD |
| 6 | 90 | 149-FGNARTLRNDNSSRFGKYMD |
| 7 | 85 | 156-FGNAKTVRNNNSSRFGKYFE |
| 8 | 85 | 154-FGNAKTVRNNNSSRFGKYFE |
Here, MYH14 (Seq. No. 1, Table 14) is taken as reference sequence from Myosin II family and Seq. Nos. 1-8 are MYOI family members.
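The similarity column is consistent with a position-wise comparison of chemical groups rather than of residue identity. The Python sketch below implements that reading, which is an inference from the tabulated values and not a definition stated in the source; the group assignments follow the categorization table, and the names are illustrative.

```python
# Chemical group codes 1-8 per the categorization table (layout is illustrative).
GROUPS = {"DE": 1, "RHK": 2, "YFW": 3, "ILVAG": 4, "P": 5, "MC": 6, "ST": 7, "QN": 8}
AA_TO_GROUP = {aa: code for members, code in GROUPS.items() for aa in members}


def group_similarity(ref: str, other: str) -> float:
    """Percentage of positions at which two equal-length motifs carry
    residues from the same chemical group (assumed similarity measure)."""
    if len(ref) != len(other):
        raise ValueError("motifs must have equal length")
    matches = sum(AA_TO_GROUP[a] == AA_TO_GROUP[b] for a, b in zip(ref, other))
    return 100.0 * matches / len(ref)


reference = "FGNAKTVKNDNSSRFGKFIR"  # MYH14 motif from the table
print(group_similarity(reference, "FGNAKTIRNNNSSRFGKYMD"))  # 85.0 (MYOI Seq. No. 1)
print(group_similarity(reference, "FGNAKTVRNDNSSRFGKYMD"))  # 90.0 (MYOI Seq. No. 2)
print(group_similarity(reference, "FGNARTNRNHNSSRFGKYMD"))  # 80.0 (MYOI Seq. No. 4)
```

For Seq. No. 1, plain residue identity would give 14 of 20 positions (70%), whereas counting V/I, K/R and F/Y as matches gives 17 of 20 (85%), the value reported in the table.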
Fig 3. Proposed phylogenetic tree of KIF1A and conventional myosins.