Morihiro Hayashida, Mayumi Kamada, Jiangning Song, Tatsuya Akutsu.
Abstract
BACKGROUND: To uncover molecular functions and networks in biological cellular systems, it is important to dissect interactions between proteins and RNAs. Many studies have investigated and analyzed interactions between protein amino acid residues and RNA bases. For residue-residue interactions within proteins, it is generally accepted that an amino acid residue at an interacting site has coevolved with its partner residue so as to maintain the interaction. Based on this hypothesis, in our previous study to identify residue-residue contact pairs in interacting proteins, we calculated the mutual information (MI) between amino acid residues from multiple sequence alignments of homologous proteins and combined it with a discriminative random field (DRF) approach. A DRF is a special type of conditional random field (CRF) that has proven useful in image processing for extracting distinctive regions from a photograph. Recently, evolutionary correlation of interactions between residues and DNA bases has also been found in certain transcription factors and their DNA-binding sites.
Year: 2013 PMID: 24564966 PMCID: PMC3866258 DOI: 10.1186/1752-0509-7-S2-S15
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. Calculation of mutual information. Illustration of the calculation of mutual information between position i in the multiple sequence alignment for protein amino acid sequence A and position j in the multiple sequence alignment for RNA base sequence B. An arrow connects sequences that belong to the same organism. The third sequence in the alignment for RNA B is ignored in the calculation because it has no partner protein sequence from the same organism. Sequences A and B appear in the first row of their respective alignments, and gaps inserted by the alignment algorithm are deleted together with their columns.
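The pairing rule in Figure 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes each alignment is a dictionary mapping an organism name to a single aligned sequence, that gap-containing columns have already been deleted, and it uses the plain plug-in MI estimate with natural logarithms. The names `paired_columns` and `mutual_information` are hypothetical.

```python
import math
from collections import Counter


def paired_columns(protein_msa, rna_msa, i, j):
    """Collect (residue, base) symbol pairs from protein column i and RNA column j.

    protein_msa and rna_msa map an organism name to one aligned sequence
    (hypothetical input format). Only organisms present in both alignments
    contribute, so an RNA sequence without a partner protein from the same
    organism is ignored, as in Figure 1.
    """
    shared = sorted(protein_msa.keys() & rna_msa.keys())
    return [(protein_msa[org][i], rna_msa[org][j]) for org in shared]


def mutual_information(pairs):
    """Plug-in estimate of MI (natural log) from observed symbol pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


# Toy alignments: org5 has an RNA but no partner protein, so it is skipped.
protein = {"org1": "MKV", "org2": "MRV", "org3": "MKL", "org4": "MRL"}
rna = {"org1": "AG", "org2": "CG", "org3": "AU", "org4": "CU", "org5": "GG"}
print(mutual_information(paired_columns(protein, rna, 1, 0)))  # ln 2, about 0.693
```

In the toy data, protein column 1 (K/R) and RNA column 0 (A/C) co-vary perfectly across the four paired organisms, giving the maximum MI of ln 2 for two equiprobable symbols, while the fully conserved protein column 0 (all M) gives MI = 0.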
Figure 2. Neighboring residue-base pairs of (i, j). The neighbors of pair (i, j) are defined as (i ± 1, j) and (i, j ± 1).
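The neighborhood in Figure 2 can be written as a small helper. This is a sketch assuming 0-based indices and that candidates falling outside either sequence are simply dropped; the function name `neighbors` is hypothetical.

```python
def neighbors(i, j, len_a, len_b):
    """Return the neighboring residue-base pairs of (i, j), as in Figure 2.

    Neighbors are (i - 1, j), (i + 1, j), (i, j - 1) and (i, j + 1).
    Candidates outside the protein (length len_a) or the RNA (length len_b)
    are dropped. Indices are 0-based, an assumption of this sketch.
    """
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(p, q) for p, q in candidates if 0 <= p < len_a and 0 <= q < len_b]


# A corner pair has two neighbors, an interior pair has four.
print(neighbors(0, 0, 3, 4))  # [(1, 0), (0, 1)]
print(neighbors(1, 1, 3, 4))  # [(0, 1), (2, 1), (1, 0), (1, 2)]
```

Because each pair has at most four neighbors, the pairwise terms form a two-dimensional grid over the len_a × len_b residue-base pairs, much like pixels in the image-processing setting where DRFs were first applied.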
Figure 3. Relationship between the random variable r, the mutual information observations m, and the pair a of the i-th amino acid in protein sequence A and the j-th base in RNA sequence B, in our CRF model.
Dataset of thirteen interacting protein-RNA pairs
| UniProt (A) | Pfam (A) | chain (A) | length (A) | GenBank (B) | Rfam (B) | chain (B) | length (B) | PDB code | # sequences in MSA | # contacts (3 Å) | # contacts (5 Å) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| RL18_THETH | PF00861 | R | 110 | | RF00001 | B | 117 | | 1543 | 28 | 85 |
| RL27_THET8 | PF01016 | Z | 81 | | RF01118 | A | 108 | | 1356 | 20 | 67 |
| RL27_ECOLI | PF01016 | W | 77 | | RF01118 | 8 | 108 | | 1356 | 18 | 69 |
| RL33_THET8 | PF00471 | 5 | 48 | | RF01118 | A | 108 | | 1445 | 18 | 40 |
| RL35_ECOLI | PF01632 | 3 | 61 | | RF01118 | 8 | 108 | | 1337 | 12 | 38 |
| RS5_ECOLI | PF00333 | E | 67 | | RF00177 | A | 1530 | | 1701 | 13 | 57 |
| RS7_ECOLI | PF00177 | G | 147 | | RF00177 | A | 1530 | | 1941 | 25 | 127 |
| RS8_THET8 | PF00410 | K | 135 | | RF00177 | A | 1515 | | 1889 | 29 | 93 |
| RS10_THET8 | PF00338 | M | 97 | | RF00177 | A | 1515 | | 1711 | 20 | 84 |
| RS12_THET8 | PF00164 | O | 122 | M26923 | RF00177 | A | 1515 | 1yl4 | 1972 | 45 | 161 |
| RS15_ECO57 | PF00312 | O | 83 | | RF00177 | A | 1530 | | 1821 | 21 | 89 |
| RS17_ECOLI | PF00366 | Q | 69 | | RF00177 | A | 1530 | | 1690 | 18 | 85 |
| RS17_THET8 | PF00366 | T | 69 | | RF00177 | A | 1515 | | 1690 | 29 | 93 |
For each protein-RNA pair, the table lists the UniProt and Pfam identifiers, PDB chain, and length of protein sequence A; the GenBank and Rfam identifiers, PDB chain, and length of RNA sequence B; the PDB code; the number of sequences in the multiple sequence alignment (MSA) obtained by combining the protein and RNA alignments by organism; and the numbers of contacts within 3 Å and within 5 Å.
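The contact counts in the last two columns depend on a distance cutoff. The sketch below assumes a residue-base pair is a contact when any atom of the residue lies within the cutoff of any atom of the base; the exact atom selection used by the authors is not stated here, and the input format and the name `contact_pairs` are hypothetical. It needs Python 3.8+ for `math.dist`.

```python
import math


def contact_pairs(residues, bases, cutoff):
    """Return (i, j) index pairs of residues and bases in contact.

    residues[i] and bases[j] are lists of (x, y, z) atom coordinates in Å
    (hypothetical input format). A pair counts as a contact when any atom of
    the residue lies within `cutoff` of any atom of the base; this
    any-atom rule is an assumption of the sketch.
    """
    return [
        (i, j)
        for i, res_atoms in enumerate(residues)
        for j, base_atoms in enumerate(bases)
        if min(math.dist(p, q) for p in res_atoms for q in base_atoms) <= cutoff
    ]


residues = [[(0.0, 0.0, 0.0)], [(10.0, 0.0, 0.0)]]
bases = [[(2.0, 0.0, 0.0)], [(0.0, 4.0, 0.0)]]
print(len(contact_pairs(residues, bases, 3.0)))  # 1
print(len(contact_pairs(residues, bases, 5.0)))  # 2
```

Any pair within 3 Å is also within 5 Å, which is why every 5 Å count in the table is at least the corresponding 3 Å count.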
Figure 4. Example of residue-base contacts. (A) Protein RS12_THET8 (chain 'O' of PDB entry '1yl4') and the atoms of RNA M26923 (chain 'A') within 3 Å of the protein. (B) Protein RS12_THET8 and the atoms of RNA M26923 within 5 Å of the protein. Only RNA atoms within 3 Å (A) or 5 Å (B) of the protein are shown.
Classification of amino acids
| # groups | classification of amino acids |
|---|---|
| 2 | MLVICGATSPFYW/DENQRKH |
| 4 | MLVIC/GATSP/FYW/DENQRKH |
| 8 | MLVIC/GA/TS/P/FYW/DENQ/RK/H |
| 10 | MLVI/C/G/A/TS/P/FYW/DENQ/RK/H |
| 15 | MLVI/C/G/A/T/S/P/FY/W/D/E/N/Q/RK/H |
Classification of amino acids by Murphy et al. [34]. In the two-group classification, amino acids are split by the hydrophobicity or hydrophilicity of their side chains. The group (FYW) is aromatic and hydrophobic, and (TS), (DENQ), and (RK) are polar.
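The groupings in the table can be applied to a protein sequence by mapping each amino acid to the index of its group. This is a sketch: the group strings are copied from the classification table above, while treating the "20" rows of the result tables as one group per amino acid is our assumption, and the names `group_map` and `reduce_sequence` are hypothetical.

```python
# Groupings copied from the classification table (Murphy et al. [34]).
GROUPINGS = {
    2: "MLVICGATSPFYW/DENQRKH",
    4: "MLVIC/GATSP/FYW/DENQRKH",
    8: "MLVIC/GA/TS/P/FYW/DENQ/RK/H",
    10: "MLVI/C/G/A/TS/P/FYW/DENQ/RK/H",
    15: "MLVI/C/G/A/T/S/P/FY/W/D/E/N/Q/RK/H",
}
STANDARD = "ACDEFGHIKLMNPQRSTVWY"


def group_map(n_groups):
    """Map each amino acid letter to a group index for the chosen grouping.

    n_groups == 20 keeps every amino acid in its own group (assumption).
    """
    if n_groups == 20:
        return {aa: k for k, aa in enumerate(STANDARD)}
    blocks = GROUPINGS[n_groups].split("/")
    return {aa: k for k, block in enumerate(blocks) for aa in block}


def reduce_sequence(seq, n_groups):
    """Replace each residue of seq by its group index."""
    mapping = group_map(n_groups)
    return [mapping[aa] for aa in seq]


print(reduce_sequence("MKW", 2))  # [0, 1, 0]
print(reduce_sequence("MKW", 8))  # [0, 6, 4]
```

Coarser groupings shrink the alphabet, so fewer distinct residue-base label combinations need parameters.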
Results on average AUC scores for test pairs using the contact definition of 3 Å
| # groups | | | label | | |
|---|---|---|---|---|---|
| without lasso (C = 0) | | | | | |
| 2 | 0.550 | 0.557 | 0.503 | 0.511 | 0.502 |
| 4 | 0.534 | 0.517 | 0.547 | 0.505 | 0.502 |
| 8 | 0.541 | 0.555 | 0.535 | 0.512 | 0.521 |
| 10 | 0.528 | 0.557 | 0.519 | 0.529 | 0.536 |
| 15 | 0.538 | 0.533 | 0.498 | 0.523 | |
| 20 | 0.539 | 0.574 | 0.546 | 0.561 | 0.557 |
| lasso (C = 1) | | | | | |
| 2 | 0.556 | 0.570 | 0.505 | 0.520 | 0.492 |
| 4 | 0.525 | 0.542 | 0.611 | 0.615 | 0.596 |
| 8 | 0.509 | 0.562 | 0.610 | 0.603 | 0.600 |
| 10 | 0.525 | 0.553 | 0.634 | 0.633 | 0.629 |
| 15 | 0.510 | 0.569 | 0.634 | 0.621 | |
| 20 | 0.510 | 0.579 | 0.625 | 0.631 | 0.622 |
| lasso (C = 2) | | | | | |
| 2 | 0.533 | 0.521 | 0.510 | 0.504 | 0.508 |
| 4 | 0.533 | 0.543 | 0.620 | 0.623 | 0.620 |
| 8 | 0.550 | 0.529 | 0.632 | 0.624 | 0.618 |
| 10 | 0.525 | 0.527 | 0.625 | 0.628 | 0.633 |
| 15 | 0.516 | 0.524 | 0.640 | 0.640 | |
| 20 | 0.514 | 0.546 | 0.626 | 0.641 | 0.642 |
Average AUC scores for test pairs using the 3 Å contact definition, for two kinds of MI features, labels representing the kinds of amino acids and bases, and each grouping of amino acids, with lasso parameter C = 0, 1, and 2.
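An AUC score can be computed without drawing the ROC curve, as the probability that a randomly chosen contact pair is scored above a randomly chosen non-contact pair (the Mann-Whitney formulation). The sketch below assumes "average AUC" means the mean of per-test-pair AUCs across the protein-RNA test pairs; the names `auc` and `average_auc` are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: predicted contact scores for residue-base pairs.
    labels: 1 for a true contact (e.g. within 3 Å), 0 otherwise.
    Tied scores count as half a correctly ordered pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one contact and one non-contact")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_auc(test_sets):
    """Mean of per-pair AUCs; test_sets is a list of (scores, labels)."""
    values = [auc(s, y) for s, y in test_sets]
    return sum(values) / len(values)


print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking, which puts the values in the table (roughly 0.50 to 0.64) in perspective.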
Results on average AUC scores for test pairs using the contact definition of 5 Å
| # groups | | | label | | |
|---|---|---|---|---|---|
| without lasso (C = 0) | | | | | |
| 2 | 0.550 | 0.520 | 0.568 | 0.547 | 0.565 |
| 4 | 0.543 | 0.506 | 0.584 | 0.563 | 0.581 |
| 8 | 0.541 | 0.576 | 0.584 | 0.578 | 0.570 |
| 10 | 0.527 | 0.545 | 0.528 | 0.560 | |
| 15 | 0.527 | 0.587 | 0.539 | 0.526 | 0.518 |
| 20 | 0.530 | 0.570 | 0.539 | 0.506 | 0.508 |
| lasso (C = 1) | | | | | |
| 2 | 0.527 | 0.570 | 0.564 | 0.575 | 0.562 |
| 4 | 0.552 | 0.555 | 0.582 | 0.571 | 0.575 |
| 8 | 0.510 | 0.559 | 0.581 | 0.584 | 0.590 |
| 10 | 0.511 | 0.567 | 0.587 | 0.579 | 0.590 |
| 15 | 0.523 | 0.571 | 0.571 | 0.578 | 0.574 |
| 20 | 0.514 | 0.572 | 0.581 | 0.587 | |
| lasso (C = 2) | | | | | |
| 2 | 0.543 | 0.585 | 0.581 | 0.567 | 0.566 |
| 4 | 0.513 | 0.557 | 0.582 | 0.584 | 0.580 |
| 8 | 0.509 | 0.568 | 0.576 | 0.574 | 0.579 |
| 10 | 0.500 | 0.563 | 0.594 | 0.588 | 0.590 |
| 15 | 0.505 | 0.591 | 0.583 | 0.576 | 0.582 |
| 20 | 0.502 | 0.566 | 0.594 | 0.598 | |
Average AUC scores for test pairs using the 5 Å contact definition, for two kinds of MI features, labels representing the kinds of amino acids and bases, and each grouping of amino acids, with lasso parameter C = 0, 1, and 2.
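The lasso parameter C controls an L1 penalty on the model weights. The sketch below assumes the penalty enters as C · Σ|w_k| added to a smooth training loss and shows one proximal-gradient (ISTA) step, a standard way to optimize such objectives; it is not necessarily the optimizer used in this study, and the names `soft_threshold` and `proximal_step` are hypothetical.

```python
def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward 0 by t, clamping at 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def proximal_step(weights, grads, step, C):
    """One proximal-gradient (ISTA) update for loss(w) + C * sum_k |w_k|.

    grads is the gradient of the smooth loss (for example a CRF negative
    log-likelihood) at `weights`. With C = 0 this reduces to plain gradient
    descent, corresponding to the 'without lasso' rows.
    """
    return [soft_threshold(w - step * g, step * C) for w, g in zip(weights, grads)]


print(proximal_step([0.5, -0.05, 2.0], [0.0, 0.0, 0.0], step=0.1, C=1.0))
```

Weights whose magnitude after the gradient step stays below step · C are set exactly to zero, so larger C yields sparser models; with many label parameters (finer amino acid groupings) this sparsity is one plausible reason the lasso rows differ from the C = 0 rows.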
Figure 5. Average ROC curves of the best case in our experiments for training and test pairs. Average ROC curves for training and test pairs using both MI and labels, with the 15-group classification of amino acids, lasso parameter C = 2, and the 3 Å contact definition.
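One common way to average ROC curves over several test pairs is vertical averaging: compute each curve's true positive rate (TPR) at a fixed grid of false positive rates (FPR) and take the mean. The sketch below uses that approach with linear interpolation; whether the figure was produced this way is an assumption, and the names `roc_points`, `tpr_at`, and `average_roc` are hypothetical.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points obtained by lowering the threshold over the scores."""
    P = sum(labels)
    N = len(labels) - P
    if P == 0 or N == 0:
        raise ValueError("need at least one contact and one non-contact")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    k = 0
    while k < len(ranked):
        threshold = ranked[k][0]
        # Tied scores cross the threshold together.
        while k < len(ranked) and ranked[k][0] == threshold:
            if ranked[k][1] == 1:
                tp += 1
            else:
                fp += 1
            k += 1
        points.append((fp / N, tp / P))
    return points


def tpr_at(points, x):
    """TPR at FPR x, linearly interpolated on a ROC curve."""
    # Last point whose FPR does not exceed x; FPR is nondecreasing along the curve.
    k = max(idx for idx, (fpr, _) in enumerate(points) if fpr <= x)
    if k == len(points) - 1:
        return points[k][1]
    (x0, y0), (x1, y1) = points[k], points[k + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def average_roc(curves, grid):
    """Vertical averaging: mean TPR of all curves at each FPR in grid."""
    return [sum(tpr_at(c, x) for c in curves) / len(curves) for x in grid]


c1 = roc_points([0.9, 0.8, 0.1], [1, 1, 0])
c2 = roc_points([0.9, 0.5, 0.2], [0, 1, 1])
print(average_roc([c1, c2], [0.0, 0.5, 1.0]))  # [0.5, 0.5, 1.0]
```

Taking the last point at each FPR means a vertical segment contributes its highest TPR, which matches how a ROC curve is read at that false positive rate.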
Results on average elapsed time
| # groups | C = 0 | C = 1 | C = 2 | C = 0 | C = 1 | C = 2 | label, C = 0 | label, C = 1 | label, C = 2 |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 55.5 | 43.5 | 41.5 | 46.2 | 46.9 | 46.2 | 80.8 | 64.6 | 62.6 |
| 4 | 57.9 | 51.9 | 42.8 | 50.6 | 47.7 | 48.0 | 127.7 | 63.8 | 62.5 |
| 8 | 56.9 | 55.8 | 55.6 | 54.3 | 50.5 | 50.8 | 194.9 | 68.2 | 67.1 |
| 10 | 54.2 | 57.4 | 52.5 | 57.1 | 52.2 | 51.8 | 235.1 | 73.0 | 72.8 |
| 15 | 55.6 | 57.2 | 55.2 | 65.2 | 55.5 | 55.1 | 342.5 | 79.8 | 79.2 |
| 20 | 57.8 | 60.4 | 55.2 | 68.1 | 58.2 | 58.3 | 320.8 | 84.6 | 82.9 |
Average elapsed time (s) for one iteration of the cross validation using the 3 Å contact definition, for the MI features and labels representing the kinds of amino acids and bases, with each grouping of amino acids and lasso parameter C = 0, 1, and 2.