Abstract
Single genetic mutations are always followed by a set of compensatory mutations. Thus, multiple changes commonly occur in biological sequences and play crucial roles in maintaining conformational and functional stability. Although many methods are available to detect single mutations or covariant pairs, detecting non-synchronous multiple changes at different sites in sequences remains challenging. Here, we develop a novel algorithm, named Fastcov, to identify multiple correlated changes in biological sequences using an independent pair model followed by a tandem model of site-residue elements based on inter-restriction thinking. Fastcov performed exceptionally well at harvesting co-pairs and detecting multiple covariant patterns. By 10-fold cross-validation using datasets of different scales, the characteristic patterns successfully classified the sequences into target groups with an accuracy of greater than 98%. Moreover, we demonstrated that the multiple covariant patterns represent co-evolutionary modes corresponding to the phylogenetic tree, and provide a new understanding of protein structural stability. In contrast to other methods, Fastcov provides not only a reliable and effective approach to identify covariant pairs but also more powerful functions, including multiple covariance detection and sequence classification, that are most useful for studying the point and compensatory mutations caused by natural selection, drug induction, environmental pressure, etc.
Year: 2016 PMID: 27451921 PMCID: PMC4958985 DOI: 10.1038/srep30425
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Schematic of the correlated tandem model.
Clustering performance by covariant patterns.
| Cluster | Seq_num | Genotype | Accuracy | Matched genotypes | Patterns |
|---|---|---|---|---|---|
| 3 | 594 | A | | A: 588, B: 0, C: 0, D: 0 | |
| 1 | 594 | A | | A: 387, B: 2, C: 1, D: 2 | |
| 2 | 594 | A | | A: 483, B: 0, C: 0, D: 2 | |
| 20 | 2830 | BCD | | A: 34, B: 961, C: 1116, D: 749 | |
| 21 | 2830 | BCD | | A: 84, B: 961, C: 1116, D: 749 | |
| 8 | 961 | B | | A: 0, B: 958, C: 0, D: 0 | |
| 16 | 2463 | ACD | | A: 594, B: 0, C: 1112, D: 753 | |
| 9 | 1116 | C | | A: 0, B: 1, C: 1104, D: 0 | |
| 7 | 1116 | C | | A: 0, B: 0, C: 927, D: 0 | |
| 15 | 2308 | ABD | | A: 594, B: 960, C: 15, D: 753 | |
| 6 | 753 | D | | A: 0, B: 0, C: 0, D: 749 | |
| 4 | 753 | D | | A: 0, B: 0, C: 0, D: 741 | |
| 5 | 753 | D | | A: 2, B: 0, C: 1, D: 739 | |
| 19 | 2671 | ABC | | A: 579, B: 961, C: 1111, D: 8 | |
| 18 | 2671 | ABC | | A: 594, B: 961, C: 1070, D: 19 | |
| 17 | 2671 | ABC | | A: 589, B: 955, C: 1069, D: 7 | |
| 10 | 1555 | AB | | A: 592, B: 953, C: 2, D: 4 | |
| 11 | 1555 | AB | | A: 593, B: 959, C: 18, D: 2 | |
| 12 | 1555 | AB | | A: 593, B: 961, C: 21, D: 16 | |
| 14 | 1869 | CD | | A: 0, B: 1, C: 1112, D: 752 | |
| 13 | 1869 | CD | | A: 0, B: 1, C: 1089, D: 751 | |
Note: the sites are denoted by the order of the aligned data.
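
The per-cluster counts in the table above can be summarized numerically. The following is a minimal sketch, not the paper's own scoring: it assumes that `Seq_num` is the number of sequences in the listed genotype(s) and that the "Matched genotypes" counts are the sequences matched by that cluster's pattern. The helper name `cluster_scores` and the two ratios it returns (coverage and precision) are assumptions of this example and are not claimed to equal the table's Accuracy column.

```python
def cluster_scores(target, seq_num, matched):
    """Return (coverage, precision) for one cluster row.

    target  -- genotype label(s) of the cluster, e.g. "A" or "BCD"
    seq_num -- number of sequences in the target genotype(s) (assumed meaning)
    matched -- dict of genotype -> number of sequences matched by the pattern
    """
    # Matches that fall inside the cluster's own genotype(s).
    hits = sum(n for g, n in matched.items() if g in target)
    total = sum(matched.values())
    coverage = hits / seq_num                    # share of target sequences matched
    precision = hits / total if total else 0.0   # share of matches inside the target
    return coverage, precision


# Two rows copied from the table above (clusters 3 and 8).
rows = [
    ("A", 594, {"A": 588, "B": 0, "C": 0, "D": 0}),
    ("B", 961, {"A": 0, "B": 958, "C": 0, "D": 0}),
]
for target, seq_num, matched in rows:
    coverage, precision = cluster_scores(target, seq_num, matched)
    print(f"{target}: coverage={coverage:.3f}, precision={precision:.3f}")
```

Rows whose pattern also matches sequences outside the target genotype, such as cluster 1, would show a precision below 1 under these assumptions.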
Ten-fold cross-validation of performance for datasets of different scales
| Data size | Seq num | Avg. sensitivity | Avg. specificity | Avg. accuracy |
|---|---|---|---|---|
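
The averaged sensitivity, specificity and accuracy named in the table header can be obtained from a k-fold loop such as the one sketched here. This is a generic illustration under stated assumptions: the record does not say how the folds were drawn or how the classes were binarized, and `fit` and `predict` are hypothetical callables standing in for whatever pattern-based classifier is used.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for boolean labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    sens = tp / (tp + fn) if tp + fn else 0.0   # TP / (TP + FN)
    spec = tn / (tn + fp) if tn + fp else 0.0   # TN / (TN + FP)
    acc = (tp + tn) / len(y_true) if y_true else 0.0
    return sens, spec, acc


def cross_validate(labels, fit, predict, k=10):
    """Average the three metrics over k held-out folds.

    fit(train_idx) -> model and predict(model, i) -> bool are
    hypothetical callables supplied by the caller.
    """
    totals = [0.0, 0.0, 0.0]
    for held_out in k_fold_indices(len(labels), k):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit(train)
        y_true = [labels[i] for i in held_out]
        y_pred = [predict(model, i) for i in held_out]
        for j, value in enumerate(binary_metrics(y_true, y_pred)):
            totals[j] += value
    return tuple(t / k for t in totals)
```

In this sketch a fold that contains no positive (or no negative) examples contributes 0.0 to the corresponding average; stratified folds would avoid that.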
Performance of different algorithms
| Algorithm |
|---|
| Fastcov |
| PSIcov |
| CAPS |
Figure 2. Venn diagrams of co-pairs produced by Fastcov and PSIcov.
(A) At a threshold of 0.7 in Fastcov. (B) At a threshold of 0.5, with purity of one pair more than 0.8 in Fastcov. (C) Union of A and B in Fastcov.
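
The regions of such a Venn diagram reduce to set operations on co-pairs. The sketch below is illustrative only: the site indices are made up, the function names are hypothetical, and treating a pair as unordered is an assumption of this example rather than something stated in the caption.

```python
def normalize(pairs):
    """Treat (i, j) and (j, i) as the same co-pair by sorting each pair."""
    return {tuple(sorted(p)) for p in pairs}


def venn_regions(left, right):
    """Return (only_left, shared, only_right) for two collections of co-pairs."""
    a, b = normalize(left), normalize(right)
    return a - b, a & b, b - a


# Made-up site indices, used only to show the set operations.
fastcov_a = [(3, 17), (5, 42), (8, 21)]   # pairs passing setting (A)
fastcov_b = [(42, 5), (11, 30)]           # pairs passing setting (B)
psicov = [(17, 3), (11, 30), (2, 9)]

# Panel (C): union of the pairs from (A) and (B).
fastcov_c = normalize(fastcov_a) | normalize(fastcov_b)

only_fastcov, shared, only_psicov = venn_regions(fastcov_c, psicov)
print(len(only_fastcov), len(shared), len(only_psicov))
```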
Figure 3. Covariant patterns corresponding to different families in the phylogenetic tree.
Note: The sites are denoted by the order of the aligned data.
Figure 4. Covariant patterns corresponding to different families in the phylogenetic tree.
Note: The sites are denoted by the order of the aligned data.
Figure 5. Distribution of covariant sites (yellow) between human and mouse TLR4: (A) left view; (B) right view.