Elisa Cilia, Stefano Teso, Sergio Ammendola, Tom Lenaerts, Andrea Passerini.
Abstract
BACKGROUND: Viruses are typically characterized by high mutation rates, which allow them to quickly develop drug-resistant mutations. Mining relevant rules from mutation data can be extremely useful to understand the virus adaptation mechanism and to design drugs that effectively counter potentially resistant mutants.
Year: 2014 PMID: 25238967 PMCID: PMC4261881 DOI: 10.1186/1471-2105-15-309
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Background knowledge predicates
| Predicate | Description |
|---|---|
| position(AA,Pos) | Indicates an amino acid in the wild type sequence |
| mut(MutID,AA,Pos,AA1) | Indicates a mutation: mutation or mutant identifier, position and amino acids involved, before and after the substitution |
| res_against(MutID,Drug) | Indicates whether a mutation or mutant is resistant to a certain drug |
| color(Color,AA) | Indicates the coloring group of a natural amino acid |
| typeaa(T,AA) | Indicates the type (e.g. aliphatic, charged, aromatic, polar) of a natural amino acid |
| same_color_type(AA1,AA2) | Indicates whether two amino acids belong to the same coloring group |
| same_typeaa(AA1,AA2,T) | Indicates whether two amino acids are of the same type T |
| same_color_type_mut(MutID, Pos) | Indicates a mutation to an amino acid of the same coloring group |
| different_color_type_mut(MutID, Pos) | Indicates a mutation changing the coloring group of the amino acid |
| same_type_mut_t(MutID, Pos, T) | Indicates a mutation to an amino acid of the same type T |
| different_type_mut_t(MutID, Pos) | Indicates a mutation changing the type of the amino acid |
| aamutations(Pos,AA1,AA2,Num) | Indicates whether a given mutation requires at least a single, double, or triple nucleotide substitution |
| close_to_site(Pos) | Indicates whether a specific position is close to a binding or active site if any |
| location(L,Pos) | Indicates in which fragment of the primary sequence the amino acid is located |
| conservation(Pos,ConsClass) | Indicates whether a position is highly conserved or not |
| in_ss(SS,N,Pos) | Indicates whether a mutation occurs within the Nth secondary structure element of a given type |
| in_motif(Pos,Motif) | Indicates whether a mutation occurs within a known sequence motif |
| catalytic_propensity(AA,CP) | Indicates whether an amino acid has a high, medium or low catalytic propensity |
| mutated_residue_cp(Rw,Pos,Rm,CPold,CPnew) | Indicates how, in a mutated position, the catalytic propensity has changed (e.g. from low to high) |
Summary of the background knowledge facts and rules. MutID is a mutation or a mutant identifier depending on the type of the learning problem.
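The `aamutations(Pos,AA1,AA2,Num)` predicate above records whether a substitution needs at least one, two, or three nucleotide changes. The following is a minimal sketch of how such a count could be derived from the standard genetic code. It assumes the minimum is taken over all codons of both amino acids; the source does not say whether the actual codon at `Pos` is used instead, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1).
# Codons are enumerated with bases in the order T, C, A, G,
# first base varying slowest; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(aa):
    """Return all codons that encode the one-letter amino acid `aa`."""
    return [codon for codon, coded in CODON_TABLE.items() if coded == aa]


def min_nucleotide_substitutions(aa_from, aa_to):
    """Smallest number of nucleotide changes turning some codon of
    `aa_from` into some codon of `aa_to` (assumption: any codon pair)."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa_from)
        for c2 in codons_for(aa_to)
    )
```

Under this assumption, a lysine-to-asparagine substitution needs a single nucleotide change, whereas methionine-to-tyrosine needs all three codon positions to change.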
Amino acid types encoded in color classes
| Color class | Amino acids | Description |
|---|---|---|
| Red | AVFPMILW | Small and/or hydrophobic and/or aromatic |
| Blue | DE | Acidic |
| Magenta | RK | Basic |
| Green | STYHCNGQ | Hydroxyl and/or polar and/or basic |
Classification of amino acid types in color classes originally proposed in [33] and used to define the color/2 predicate.
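As an illustration, the color classes in the table can be encoded as a lookup from which `color/2`-style and `same_color_type/2`-style checks follow directly. This is a sketch in Python rather than the logic-programming form used in the paper; the dictionary and function names are assumptions made for the example.

```python
# Color classes copied from the table above (one-letter amino acid codes).
COLOR_CLASSES = {
    "red": "AVFPMILW",      # small and/or hydrophobic and/or aromatic
    "blue": "DE",           # acidic
    "magenta": "RK",        # basic
    "green": "STYHCNGQ",    # hydroxyl and/or polar and/or basic
}

# Inverted index: amino acid -> color class.
COLOR_OF = {aa: name for name, members in COLOR_CLASSES.items() for aa in members}


def color(aa):
    """Color class of a natural amino acid (analogue of color/2)."""
    return COLOR_OF[aa.upper()]


def same_color_type(aa1, aa2):
    """True if both amino acids fall in the same color class."""
    return color(aa1) == color(aa2)


def same_color_type_mut(wild_type, mutant):
    """True if a substitution keeps the residue in its color class."""
    return same_color_type(wild_type, mutant)
```

The four classes partition all 20 natural amino acids, so every substitution is either a same-color or a different-color mutation.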
Figure 1. Mutation engineering algorithm. Schema of the mutation engineering algorithm.
Figure 2. Model for the resistance to NNRTI learned from Dataset 1. An example of learned hypothesis for the NNRTI task with highlighted amino acid positions covered by the hypothesis clauses.
Figure 3. Model for the resistance to NNRTI learned from Dataset 2. An example of learned hypothesis for the NNRTI task with highlighted amino acid positions covered by the hypothesis clauses.
Figure 4. Mean recall trend by number of satisfied clauses (Dataset 1). Mean recall of the generated mutations on the resistance test set mutations from Dataset 1 by varying the number of satisfied clauses. The mean recall values in orange refer to the proposed generative algorithm. The mean recall values in green refer to a random generator of mutations.
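The recall reported in the caption can be read as the fraction of test-set resistance mutations that also appear among the generated ones. A minimal sketch follows, assuming mutations are represented as `(wild_type, position, mutant)` tuples and that the mean is taken over repeated runs or data splits, which the caption does not spell out. All names are hypothetical.

```python
def recall(generated, test_set):
    """Fraction of test-set mutations recovered by the generated set."""
    test_set = set(test_set)
    if not test_set:
        raise ValueError("test set must not be empty")
    return len(set(generated) & test_set) / len(test_set)


def mean_recall(runs):
    """Average recall over (generated, test_set) pairs, one per run."""
    runs = list(runs)
    return sum(recall(gen, test) for gen, test in runs) / len(runs)


# Toy, hypothetical inputs used only to show the call shape.
example_generated = {("K", 103, "N"), ("V", 106, "A")}
example_test = {("K", 103, "N"), ("Y", 181, "C")}
example_recall = recall(example_generated, example_test)
```

A random generator of mutations, as used for the baseline in the figure, would be scored with the same function.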
Most frequent learned clauses (Dataset 1)
| # models | Learned clause |
|---|---|
| | |
| 21.8 | mut(A,B,C,D) AND strand(C) |
| 20.5 | mut(A,B,C,D) AND location(11,C) |
| 17.1 | mut(A,B,C,D) AND strand(C) AND in_motif(C,'prf:RT_POL') |
| 9.9 | mut(A,B,C,D) AND in_motif(C,'pfam_fs:RVT_1') |
| 9.4 | mut(A,B,C,D) AND same_type_mut_t(A,C,neutral) AND strand(C) |
| 7.9 | mut(A,B,C,D) AND color(red,D) AND in_motif(C,'prf:RT_POL') |
| 7.3 | mut(A,B,C,D) AND same_type_mut_t(A,C,nonpolar) |
| 6.8 | mut(A,B,C,D) AND in_motif(C,'prf:RT_POL') |
| 6.1 | mut(A,B,C,D) AND color(red,B) |
| 5.9 | mut(A,y,C,D) |
| | |
| 25.2 | mut(A,B,C,D) AND location(7,C) |
| 18.8 | mut(A,B,C,D) AND in_motif(C,'prf:RT_POL') |
| 16.1 | mut(A,B,C,D) AND turn(C) AND in_motif(C,'prf:RT_POL') |
| 12.1 | mut(A,B,C,D) AND same_type_mut_t(A,C,neutral) AND in_motif(C,'prf:RT_POL') |
| 11.3 | mut(A,B,C,D) AND coil(C) AND conservation(C, high) |
| 11.1 | mut(A,B,C,D) AND conservation(C, high) |
| 11 | mut(A,B,C,D) AND same_color_type_mut(A,B) AND in_motif(B,'prf:RT_POL') |
| 8.7 | mut(A,B,C,D) AND same_color_type_mut(A,B) |
| 7.3 | mut(A,B,C,D) AND in_motif(C,'pfam_fs:RVT_1') |
| 7.3 | mut(A,B,C,D) AND color(red,B) AND in_motif(C,'prf:RT_POL') |
List of the ten most frequent rules learned on Dataset 1, sorted by average number of models they appear in.
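To make the clause notation concrete, the sketch below evaluates two of the listed clauses that depend only on amino acid identity, `mut(A,B,C,D) AND color(red,B)` and `mut(A,y,C,D)`, and counts how many a candidate mutation satisfies. It assumes that in `mut(A,B,C,D)` the variable `B` is the wild-type residue and `D` the mutant residue, and that the lowercase constant `y` denotes tyrosine. The clause encoding, function names, and example mutations are hypothetical and are not the paper's implementation.

```python
# Red color class from the color table (small/hydrophobic/aromatic residues).
RED = set("AVFPMILW")

# Each clause is encoded as a test over (wild_type, position, mutant).
# Only clauses needing no structural background facts are included here.
CLAUSES = {
    "mut(A,B,C,D) AND color(red,B)": lambda wt, pos, mt: wt in RED,
    "mut(A,y,C,D)": lambda wt, pos, mt: wt == "Y",
}


def satisfied_clauses(mutation):
    """Return the clauses whose body holds for the given mutation."""
    wt, pos, mt = mutation
    return [name for name, holds in CLAUSES.items() if holds(wt, pos, mt)]


def filter_by_support(candidates, min_satisfied):
    """Keep candidates that satisfy at least `min_satisfied` clauses."""
    return [m for m in candidates if len(satisfied_clauses(m)) >= min_satisfied]


# Illustrative candidates, chosen only to exercise the two clauses.
candidates = [("K", 103, "N"), ("V", 106, "A"), ("Y", 181, "C")]
supported = filter_by_support(candidates, min_satisfied=1)
```

Raising `min_satisfied` narrows the candidate set, which corresponds to the kind of threshold varied along the x-axis of Figure 4.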
Figure 5. Mean recall of the generated mutations on the resistance test set mutations from Dataset 2 by varying the threshold on the prediction confidence, and the corresponding average number of overall generated mutations (i.e., not necessarily in the test set), in blue. The red line refers to the random generator of mutants. (a) Left panel: results for the NNRTI case. (b) Right panel: results for the NRTI case.
Most frequent learned clauses (Dataset 2)
| # models | Learned clause |
|---|---|
| | |
| All | mut(A,B,C,D) AND position(C,X) |
| 9 | mut(A,B,C,D) AND position(C,103) AND typeaa(neutral,D) |
| 6 | mut(A,B,C,D) AND position(C,106) AND typeaa(tiny,D) |
| 6 | mut(A,y,C,D) AND typeaa(neutral,D) AND strand(C) |
| 6 | mut(A,y,C,D) AND strand(C) |
| 5 | mut(A,B,C,a) AND position(C,106) |
| 5 | mut(A,y,C,D) AND typeaa(neutral,D) |
| 4 | mut(A,B,C,D) AND position(C,90) AND correlated_mut(A,C,E) |
| 4 | mut(A,B,C,D) AND position(C,143) AND same_type_aa(D,B,polar) |
| 3 | mut(A,B,C,D) AND typeaa(aromatic,B) AND strand(C) AND typeaa(neutral,D) |
| | |
| All | mut(A,B,C,D) AND position(C,X) |
| 17 | mut(A,m,C,D) AND same_type_aa(B,D,nonpolar) |
| 13 | mut(A,m,C,D) AND highconservation(C) |
| 12 | mut(A,w,C,D) |
| 9 | mut(A,m,C,D) AND inMotif(C,pfam_ls:RVT_1) |
| 9 | mut(A,m,C,D) |
| 9 | mut(A,p,C,D) |
| 6 | mut(A,B,C,D) AND position(C,165) AND correlated_mut(A,C,E) |
| 6 | mut(A,B,C,D) AND position(C,188) AND correlated_mut(A,C,E) |
| 6 | mut(A,m,C,D) AND inMotif(C,prf:RT_POL) |
| 6 | mut(A,m,C,D) AND inMotif(C,pfam_fs:RVT_1) |
List of the ten most frequent learned rules for Dataset 2, sorted by number of models they appear in. The table also includes the clause position(C,X), which is present in all models for different values of X.