Mina Cintho Ozahata, Ester Cerdeira Sabino, Ricardo Sobhie Diaz, Roberto M Cesar, João Eduardo Ferreira.
Abstract
BACKGROUND: In this study, clustering was performed on a bitmap representation of HIV reverse transcriptase and protease sequences to produce an unsupervised classification of HIV sequences. This classification is intended to improve our understanding of the interactions between mutations and drug resistance. A total of 10,229 HIV genomic sequences from the protease and reverse transcriptase regions of the pol gene were analyzed, with antiretroviral resistance-related mutations represented in an 82-dimensional binary vector space.
Year: 2015 PMID: 25652056 PMCID: PMC4344997 DOI: 10.1186/s12859-015-0452-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Pipeline summarizing the proposed framework: (1) protease and reverse transcriptase sequences were gathered from patients throughout Brazil; (2) the sequences were binarized; (3) the mutations were clustered; (4) the clusters were characterized; and (5) the clusters were compared with the predictions of the Brazilian look-up table.
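Step 2 of the pipeline (binarization) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each sequence is already aligned to a reference of the same length, treats a position as mutated whenever its amino acid differs from the reference, and counts ambiguous ('X') or gap ('-') residues as non-mutated. The function name `binarize` and the toy sequences are hypothetical.

```python
def binarize(seq: str, reference: str, positions: list[int]) -> list[int]:
    """Return one bit per selected position: 1 if `seq` differs from `reference`.

    Positions are 1-based, following HIV numbering (e.g. PR90, RT184).
    Assumption: ambiguous ('X') and gap ('-') residues count as non-mutated.
    """
    if len(seq) != len(reference):
        raise ValueError("sequence and reference must be aligned to the same length")
    bits = []
    for pos in positions:
        residue = seq[pos - 1]
        bits.append(int(residue not in ("X", "-") and residue != reference[pos - 1]))
    return bits


# Toy, hypothetical 10-residue sequences (not real patient data).
reference = "PQITLWQRPL"
sample = "PQVTLWQRPM"
print(binarize(sample, reference, [1, 3, 10]))  # [0, 1, 1]
```

Concatenating the bits obtained for the selected reverse transcriptase and protease positions gives one binary vector per patient sequence.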
Related works
| Study | Region | Drug class | Positions analyzed | Mutation groups reported | Sequences (subtype) | Method |
|---|---|---|---|---|---|---|
| Liu et al. 2008 [ | Protease | PI | PR1 to PR99 | (PR30 PR75 PR88); (PR1–PR9 PR12–PR15 PR17 PR19 PR20 PR22 PR25 PR26 PR28 PR31 PR35–PR42 PR45 PR49 PR52); (PR56 PR57 PR59 PR61 PR65 PR68–PR70 PR77 PR83 PR87 PR89 PR96–PR99); (PR1 PR2 PR9 PR26 PR30 PR40 PR45 PR56 PR59 PR75 PR81 PR88 PR98); (PR13–PR15 PR20 PR35–PR38 PR41 PR42 PR49 PR57 PR69 PR70 PR77 PR83 PR89); (PR10 PR23 PR24 PR27 PR32–PR34 PR43 PR46–PR48 PR50 PR53–PR55 PR58 PR71 PR76 PR80 PR82); (PR30 PR75 PR88); (PR1 PR2 PR9 PR26 PR40 PR45 PR59 PR87 PR98); (PR13–PR15 PR20 PR35–PR38 PR41 PR49 PR57 PR69 PR70 PR77 PR83 PR89); (PR10 PR23 PR24 PR27 PR32–PR34 PR42 PR43 PR46–PR48 PR50 PR53–PR55 PR58 PR71 PR76 PR80 PR82) | 7758+8761 (Subtype B and non-Subtype B) | k-way clustering |
| Reuman et al. 2010 [ | Reverse transcriptase | NNRTI | RT90, RT94, RT98, RT100, RT101, RT102, RT103, RT105, RT106, RT108, RT138, RT139, RT178, RT179, RT181, RT188, RT190, RT221, RT223, RT225, RT227, RT230, RT232, RT234, RT236, RT237, RT238, RT241, RT242, RT318 | (RT101,RT181,RT190); (RT103,RT181,RT190); (RT108,RT181,RT221); (RT98,RT181,RT190); (RT181,RT190,RT221); (RT103,RT181,RT221); (RT103,RT108,RT221); (RT101,RT108,RT181); (RT101,RT108,RT190); (RT103,RT108,RT181); (RT108,RT190,RT221); (RT98,RT108,RT181); (RT98,RT101,RT190); (RT98,RT101,RT181); (RT101,RT181,RT190); (RT101,RT181,RT221); (RT98,RT103,RT108); (RT101,RT181,RT190); (RT108,RT181,RT190); (RT98,RT103,RT181) | 13039 (10504 Subtype B, 747 Subtype C, 363 CRF01_AE, 210 Subtype A, 320 CRF02_AG, 895 others) | Jaccard similarity coefficient, Holm’s correction, Poissonness plot |
| Wu et al. 2003 [ | Protease | PI | PR1 to PR99 | (PR10 PR63 PR71 PR73 PR90); (PR10 PR63 PR71 PR90 PR93); (PR10 PR62 PR63 PR90 PR93); (PR10 PR62 PR63 PR73 PR90); (PR10 PR20 PR71 PR73 PR90); (PR10 PR20 PR62 PR73 PR90); (PR10 PR46 PR71 PR90 PR93); (PR10 (PR30) PR73 PR84 PR90); (PR10 (PR30) PR46 PR84 PR90); (PR10 PR71 PR73 PR84 PR90); (PR10 PR46 PR71 PR84 PR90); (PR10 PR24 PR46 PR10 PR46 PR90); (PR10 (PR30) PR46 PR54 PR82); (PR10 PR48 PR54 PR82); (PR10 PR24 PR46 PR54 PR82); (PR32 PR46 PR82); (PR10 PR46 PR53 PR54 PR71 PR82); (PR30 (PR82) PR88); (PR13 PR30 PR88); (PR30 PR75 PR88); (PR10 PR46 PR63 PR71 PR93); (PR20 PR36 PR54); (PR10 PR20 PR54 PR71); (PR63 (PR64) PR71); (PR10 PR77 PR93); (PR20 PR36 PR62); (PR20 PR35 PR36 (PR77)); (PR15 PR20 PR36 (PR77)); (PR10 PR24 PR89); (PR10 PR20 PR73); (PR10 PR73 PR77) | 2244 (Subtype B) | Binomial correlation coefficients, PCA |
| Rhee et al. 2004 [ | Protease and reverse transcriptase | PI, NRTI, NNRTI | PR24, PR30, PR32, PR46, PR47, PR48, PR50, PR53, PR54, PR73, PR82, PR84, PR88, PR90, RT41, RT44, RT62, RT65, RT67, RT69, RT70, RT74, RT115, RT116, RT118, RT151, RT184, RT210, RT215, RT219 | (PR30,PR88); (PR46,PR90); (PR73,PR90); (PR54,PR82,PR90); (PR24,PR46,PR54,PR82); (PR73,PR84,PR90); (PR46,PR54,PR82,PR90); (PR84,PR90); (PR46,PR88); (PR46,PR73,PR90); (PR54,PR82); (PR46,PR84,PR90); (PR46,PR54,PR82,PR90); (PR46,PR73,PR84,PR90); (PR30,PR88,PR90); (PR48,PR54,PR82); (PR32,PR46,PR82,PR90); (PR24,PR46,PR54,PR82); (PR53,PR54,PR82,PR90); (PR24,PR46,PR82); (PR46,PR82); (PR46,PR90); (PR30,PR46,PR88); (RT41,RT184,RT215); (RT41,RT184,RT210); (RT41,RT215); (RT67,RT70,RT184,RT219); (RT70,RT184); (RT41,RT210,RT215); (RT184,RT215); (RT41,RT118,RT184); (RT210,RT215); (RT41,RT67,RT118,RT210,RT215); (RT74,RT184); (RT67,RT70,RT184); (RT67,RT69,RT70,RT184,RT219); (RT41,RT67,RT184,RT210,RT215); (RT41,RT184); (RT62,RT184); (RT41,RT44,RT67,RT118); (RT184,RT210,RT215); (RT67,RT70,RT184,RT215,RT219); (RT67,RT70,RT219); (RT67,RT70); (RT41,RT184,RT215); (RT41,RT118,RT210,RT215); (RT41,RT67,RT210,RT215); (RT69,RT70); (RT41,RT44,RT67,RT118,RT210,RT215); (RT41,RT74,RT184,RT215,RT69); (RT103,RT181); (RT100,RT103); (RT103,RT108); (RT101,RT190); (RT103,RT225); (RT103,RT181,RT190); (RT103,RT190); (RT181,RT190); (RT103,RT238); (RT101,RT103); (RT108,RT181); (RT101,RT181,RT190); (RT98,RT103); (RT103,RT108,RT181); (RT103,RT188); (RT103,RT230) | 2795 (27 Subtype C, 15 Subtype A, 7 Subtype D, 2746 Subtype B) | |
| Gonzales et al. 2003 [ | Protease and reverse transcriptase | PI, NRTI, NNRTI | RT41, RT62, RT65, RT67, RT69, RT70, RT74, RT75, RT77, RT115, RT116, RT151, RT184, RT210, RT215, RT219, PR24, PR30, PR32, PR46, PR47, PR48, PR50, PR53, PR54, PR73, PR88, PR82, PR84, PR90 | (RT41,RT184,RT215); (RT41,RT184,RT210,RT215); (RT67,RT70,RT215,RT219); (RT41,RT67,RT69,RT210,RT215); (RT41,RT67,RT184,RT210,RT215,RT219); (RT41,RT67,RT69,RT70,RT184,RT215,RT219); (RT65,RT70,RT75,RT77,RT115,RT116,RT151,RT184,RT219); (PR54,PR73,PR84,PR90); (PR46,PR84,PR90); (PR24,PR46,PR54,PR82); (PR46,PR54,PR82,PR90); (PR48,PR54,PR82) | 487 (Subtype B) | Fisher’s exact test, Benjamini-Hochberg, K-medoids |
| Sing et al. 2005 [ | Reverse transcriptase | NRTI | RT41, RT43, RT44, RT62, RT67, RT69, RT70, RT74, RT75, RT77, RT116, RT118, RT151, RT203, RT208, RT210, RT215, RT215, RT218, RT219, RT219, RT223, RT228, RT228 | (RT41,RT210,RT215); (RT67,RT70,RT219) | 1355 | Hierarchical clustering, Fisher’s exact test |
| Brehm et al. 2012 [ | Reverse transcriptase | NNRTI | | (RT184,RT348) | 12 (Subtype C) | |
| Hoffman et al. 2003 [ | Protease | PI | PR10, PR12, PR13, PR14, PR15, PR19, PR20, PR30, PR32, PR35, PR36, PR37, PR41, PR46, PR48, PR54, PR57, PR60, PR62, PR63, PR64, PR69, PR71, PR72, PR73, PR77, PR82, PR84, PR88, PR90, PR93 | (PR10,PR93); (PR12,PR19); (PR35,PR38); (PR63,PR64); (PR37,PR41); (PR62,PR71); (PR71,PR77); (PR71,PR93); (PR77,PR93); (PR12,PR19); (PR15,PR77); (PR20,PR36); (PR30,PR88); (PR35,PR36); (PR35,PR37); (PR36,PR62); (PR36,PR77); (PR46,PR82); (PR46,PR84); (PR48,PR54); (PR48,PR82); (PR54,PR82); (PR63,PR64); (PR63,PR90); (PR77,PR93); (PR84,PR90); (PR73,PR90) | 1179 (Subtype B) | Mutual information |
| Alteri et al. 2009 [ | Reverse transcriptase | PI, NRTI, NNRTI | RT41, RT65, RT67, RT69, RT70, RT74, RT75, RT77, RT100, RT101, RT103, RT106, RT115, RT116, RT151, RT181, RT184, RT188, RT190, RT210, RT215, RT219, RT225, RT230, RT236, | (RT215,RT41,RT210); (RT60,RT103) | 213 (Subtype B) | Binomial correlation coefficient, Benjamini-Hochberg method |
| Doherty et al. 2011 [ | Protease | PI | PR10, PR24, PR30, PR32, PR33, PR43, PR46, PR47, PR48, PR50, PR53, PR54, PR71, PR73, PR74, PR76, PR82, PR83, PR84, PR88, PR90 | (PR10,PR32,PR33,PR46,PR47,PR54,PR71,PR73,PR84,PR90); (PR10,PR33,PR43,PR46,PR54,PR71,PR82,PR84,PR90); (PR10,PR24,PR46,PR54,PR71,PR74,PR82); (PR32,PR33,PR46,PR53,PR54,PR71,PR84,PR90); (PR10,PR30,PR32,PR33,PR46,PR54,PR71,PR84,PR88,PR90); (PR10,PR33,PR43,PR46,PR48,PR50,PR54,PR71,PR82); (PR10,PR32,PR46,PR71,PR82,PR84); (PR10,PR46,PR54,PR82,PR90); (PR10,PR48,PR54,PR71,PR73,PR76,PR84,PR90); (PR10,PR24,PR32,PR33,PR43,PR46,PR54,PR71,PR82,PR84); (PR10,PR24,PR30,PR33,PR43,PR53,PR88); (PR10,PR43,PR47,PR48,PR53,PR54,PR71,PR82,PR84); (PR10,PR32,PR46,PR47,PR71,PR82,PR90); (PR10,PR33,PR54,PR73,PR84,PR90); (PR10,PR46,PR71,PR84,PR90); (PR10,PR54,PR71,PR73,PR82,PR90); (PR10,PR32,PR33,PR47,PR71,PR82,PR90); (PR10,PR46,PR54,PR71,PR82,PR90); (PR10,PR24,PR33,PR46,PR54,PR71,PR82); (PR10,PR48,PR54,PR82,PR90); (PR10,PR32,PR43,PR46,PR47,PR82); (PR10,PR54,PR71,PR82); (PR10,PR46,PR47,PR71,PR88,PR90); (PR10,PR33,PR43,PR46,PR50,PR54,PR71,PR73,PR82,PR90); (PR10,PR33,PR46,PR54,PR71,PR88,PR90); (PR10,PR46,PR71,PR74,PR88,PR90); (PR10,PR54,PR74,PR76,PR82); (PR73,PR90); (PR10,PR46,PR90); (PR10,PR71,PR90); (PR10,PR46,PR71); (PR10,PR24,PR46,PR54,PR82) | 398 | Optimal integer programming-based clustering |
| Heider et al. 2013 [ | Reverse transcriptase | NRTI | RT1 to RT240 | (RT41,RT70,RT210,RT215); (RT41,RT65,RT67,RT70,RT210,RT215,RT219); (RT65,RT74,RT115); (RT151,RT62,RT69,RT75,RT77,RT116) | 600 (Subtype B) | Multilabel classification |
| Yahi et al. 1999 [ | Protease and reverse transcriptase | PI, NRTI, NNRTI | PR63, PR77, PR71, PR10, PR93, PR36, PR82, PR46, PR20, PR90, PR54, RT215, RT41, RT67, RT69, RT70, RT184, RT210, RT219 | (PR10,PR46); (PR46,PR71); (PR46,PR90); (PR71,PR82); (PR10,PR82); (PR54,PR82); (PR82,PR90); (PR71,PR90); (PR10,PR90); (PR46,PR90); (PR54,PR90); (PR77,PR90); (PR82,PR90); (RT41,RT210); (RT67,RT70); (RT69,RT70); (RT70,RT219); (RT41,RT210); (RT184,RT210); (RT210,RT215); (RT70,RT219); (RT67,RT219); (RT69,RT219) | 287 | Chi-square or Kendall, and Fisher’s two-tailed test |
| Melikian et al. 2013 [ | Reverse transcriptase | NNRTI | | (RT101,RT103,RT106,RT181,RT188,RT190); (RT100,RT101,RT103,RT106,RT188,RT190); (RT101,RT181,RT190,RT227); (RT100,RT101,RT181,RT190,RT227) | 1752 (1681 Subtype B) | Least angle regression (LARS) |
Protease positions are represented by the prefix PR and reverse transcriptase positions by the prefix RT.
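Several of the studies summarized above score how often mutations at two positions co-occur; Reuman et al., for example, list the Jaccard similarity coefficient. As a rough sketch (not any of these authors' implementations), the coefficient between two positions of a binary sequence-by-position matrix is the number of sequences mutated at both positions divided by the number mutated at either. The function name is hypothetical, and the sketch omits the multiple-testing correction (Holm's method) that the table also lists.

```python
def jaccard(rows: list[list[int]], i: int, j: int) -> float:
    """Jaccard similarity between positions (columns) i and j of a binary matrix.

    rows: one 0/1 vector per sequence. Returns |both mutated| / |either mutated|,
    or 0.0 when neither position is mutated in any sequence.
    """
    both = sum(1 for r in rows if r[i] and r[j])
    either = sum(1 for r in rows if r[i] or r[j])
    return both / either if either else 0.0


# Toy matrix: 4 sequences x 3 positions (hypothetical values).
rows = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
]
print(jaccard(rows, 0, 1))  # 2 of the 3 sequences mutated at either position
print(jaccard(rows, 0, 2))  # 0.25
```

Scoring every pair of positions this way yields a similarity matrix that can then be tested or clustered.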
Protease and reverse transcriptase amino acid positions considered in the present study
| No. | Protein | Position | Protein | Position |
|---|---|---|---|---|
| 1 | Reverse transcriptase | 41 | Protease | 8 |
| 2 | Reverse transcriptase | 44 | Protease | 10 |
| 3 | Reverse transcriptase | 50 | Protease | 11 |
| 4 | Reverse transcriptase | 65 | Protease | 13 |
| 5 | Reverse transcriptase | 67 | Protease | 15 |
| 6 | Reverse transcriptase | 69 | Protease | 16 |
| 7 | Reverse transcriptase | 70 | Protease | 20 |
| 8 | Reverse transcriptase | 74 | Protease | 24 |
| 9 | Reverse transcriptase | 75 | Protease | 30 |
| 10 | Reverse transcriptase | 77 | Protease | 32 |
| 11 | Reverse transcriptase | 98 | Protease | 33 |
| 12 | Reverse transcriptase | 100 | Protease | 34 |
| 13 | Reverse transcriptase | 101 | Protease | 35 |
| 14 | Reverse transcriptase | 103 | Protease | 36 |
| 15 | Reverse transcriptase | 106 | Protease | 41 |
| 16 | Reverse transcriptase | 108 | Protease | 43 |
| 17 | Reverse transcriptase | 115 | Protease | 45 |
| 18 | Reverse transcriptase | 116 | Protease | 46 |
| 19 | Reverse transcriptase | 118 | Protease | 47 |
| 20 | Reverse transcriptase | 151 | Protease | 48 |
| 21 | Reverse transcriptase | 157 | Protease | 50 |
| 22 | Reverse transcriptase | 179 | Protease | 53 |
| 23 | Reverse transcriptase | 180 | Protease | 54 |
| 24 | Reverse transcriptase | 181 | Protease | 57 |
| 25 | Reverse transcriptase | 184 | Protease | 58 |
| 26 | Reverse transcriptase | 188 | Protease | 60 |
| 27 | Reverse transcriptase | 190 | Protease | 62 |
| 28 | Reverse transcriptase | 208 | Protease | 63 |
| 29 | Reverse transcriptase | 210 | Protease | 67 |
| 30 | Reverse transcriptase | 211 | Protease | 69 |
| 31 | Reverse transcriptase | 214 | Protease | 70 |
| 32 | Reverse transcriptase | 215 | Protease | 71 |
| 33 | Reverse transcriptase | 219 | Protease | 73 |
| 34 | Reverse transcriptase | 225 | Protease | 74 |
| 35 | Reverse transcriptase | 227 | Protease | 76 |
| 36 | Reverse transcriptase | 230 | Protease | 77 |
| 37 | Reverse transcriptase | 236 | Protease | 82 |
| 38 | Reverse transcriptase | 333 | Protease | 83 |
| 39 | | | Protease | 84 |
| 40 | | | Protease | 85 |
| 41 | | | Protease | 88 |
| 42 | | | Protease | 89 |
| 43 | | | Protease | 90 |
| 44 | | | Protease | 93 |
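The table above fixes the 38 reverse transcriptase and 44 protease positions, 82 in total, that define the binary vector space described in the abstract. The sketch below encodes that layout so that a set of mutated positions can be turned into an 82-bit vector. The column order (RT first, then protease) and the names `FEATURES`, `INDEX` and `to_bitmap` are assumptions for illustration.

```python
# The 82 features from the table: 38 reverse transcriptase (RT) and 44 protease (PR) positions.
RT_POSITIONS = [41, 44, 50, 65, 67, 69, 70, 74, 75, 77, 98, 100, 101, 103, 106,
                108, 115, 116, 118, 151, 157, 179, 180, 181, 184, 188, 190, 208,
                210, 211, 214, 215, 219, 225, 227, 230, 236, 333]
PR_POSITIONS = [8, 10, 11, 13, 15, 16, 20, 24, 30, 32, 33, 34, 35, 36, 41, 43,
                45, 46, 47, 48, 50, 53, 54, 57, 58, 60, 62, 63, 67, 69, 70, 71,
                73, 74, 76, 77, 82, 83, 84, 85, 88, 89, 90, 93]

# Assumption: RT columns first, then PR; the paper's column order may differ.
FEATURES = [f"RT{p}" for p in RT_POSITIONS] + [f"PR{p}" for p in PR_POSITIONS]
INDEX = {name: i for i, name in enumerate(FEATURES)}


def to_bitmap(mutated: set[str]) -> list[int]:
    """Encode mutated positions such as {"RT184", "PR90"} as an 82-bit vector."""
    unknown = mutated.difference(INDEX)  # names not among the 82 features
    if unknown:
        raise KeyError(f"positions outside the feature set: {sorted(unknown)}")
    bits = [0] * len(FEATURES)
    for name in mutated:
        bits[INDEX[name]] = 1
    return bits


print(len(FEATURES))  # 82
print(sum(to_bitmap({"RT184", "PR90"})))  # 2
```

Rejecting unknown names, rather than silently ignoring them, makes it obvious when a resistance report mentions a position that is outside this feature set.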
Figure 2. Black-and-white representation of the k-means clusters for subtype B sequences of the HIV protease. The figure displays the mutation patterns that characterize each subtype B protease cluster. Columns represent the amino acid positions selected for clustering and rows represent the protein sequences. Blue lines delimit the six clusters; black pixels represent mutations and white pixels represent the absence of mutations. The cluster number is shown on the left and the number of sequences in the cluster on the right.
Figure 3. Black-and-white representation of the k-means clusters for subtype B sequences of the HIV reverse transcriptase. The figure displays the mutation patterns that characterize each subtype B reverse transcriptase cluster. Columns represent the amino acid positions selected for clustering and rows represent the protein sequences. Blue lines delimit the six clusters; black pixels represent mutations and white pixels represent the absence of mutations. The cluster number is shown on the left and the number of sequences in the cluster on the right.
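Figures 2 and 3 show partitions produced by k-means with k = 6. As a rough illustration of that step on binary mutation vectors, the following is a plain Lloyd's-algorithm sketch. The random initialization from k observed sequences, the absence of restarts, and the function names are assumptions; this excerpt does not describe the authors' implementation.

```python
import random


def sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance; for two 0/1 vectors this equals the Hamming distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows: list[list[int]], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's k-means on binary mutation vectors.

    Returns (labels, centroids). Each centroid coordinate is the fraction of
    member sequences mutated at that position.
    """
    rng = random.Random(seed)
    # Assumption: initialize with k randomly chosen sequences (no k-means++, no restarts).
    centroids = [[float(x) for x in r] for r in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sequence joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centroids[c])) for r in rows]
        if new_labels == labels:
            break  # converged: no sequence changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: two obvious groups of mutation patterns (hypothetical values).
rows = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
labels, centroids = kmeans(rows, k=2)
print(labels)
```

Because the inputs are 0/1, each centroid coordinate after convergence is the fraction of member sequences mutated at that position, which is the per-cluster quantity plotted (as a percentage) in Figures 6 and 7.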
Figure 4. Black-and-white representation of the k-medoids clusters for subtype B sequences of the HIV protease. The figure displays the mutation patterns that characterize each subtype B protease cluster. Columns represent the amino acid positions selected for clustering and rows represent the protein sequences. Blue lines delimit the six clusters; black pixels represent mutations and white pixels represent the absence of mutations.
Figure 5. Black-and-white representation of the k-medoids clusters for subtype B sequences of the HIV reverse transcriptase. The figure displays the mutation patterns that characterize each subtype B reverse transcriptase cluster. Columns represent the amino acid positions selected for clustering and rows represent the protein sequences. Blue lines delimit the six clusters; black pixels represent mutations and white pixels represent the absence of mutations.
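Figures 4 and 5 repeat the analysis with k-medoids, in which each cluster is represented by one of the observed sequences. The sketch below uses Hamming distance and the simple alternating (Voronoi-iteration) update rather than the full PAM swap search; the distance, the random initialization and the function names are assumptions, not details taken from the paper.

```python
import random


def hamming(a: list[int], b: list[int]) -> int:
    """Number of positions at which two binary mutation vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(rows: list[list[int]], medoids: list[int]) -> list[int]:
    """Label each sequence with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: hamming(r, rows[medoids[c]])) for r in rows]


def kmedoids(rows: list[list[int]], k: int, max_iter: int = 100, seed: int = 0):
    """Alternating (Voronoi-iteration) k-medoids with Hamming distance.

    Returns (labels, medoids); medoids are row indices, so every cluster is
    represented by an observed sequence rather than by an average.
    """
    rng = random.Random(seed)
    medoids = rng.sample(range(len(rows)), k)
    for _ in range(max_iter):
        labels = assign(rows, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # empty cluster: keep its current medoid
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the other members.
            new_medoids.append(
                min(members, key=lambda i: sum(hamming(rows[i], rows[j]) for j in members))
            )
        if new_medoids == medoids:
            break  # converged
        medoids = new_medoids
    return assign(rows, medoids), medoids
```

A usage call such as `kmedoids(rows, k=6)` returns one label per sequence plus the indices of the six representative sequences.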
Figure 6. Histograms of mutation frequencies in the protease k-means clusters. Each histogram corresponds to one of the six clusters found by k-means (k = 6) in the subtype B protease sequences. Each bar represents one selected amino acid position, and its height is the percentage of sequences in the cluster that carry a mutation at that position.
Figure 7. Histograms of mutation frequencies in the reverse transcriptase k-means clusters. Each histogram corresponds to one of the six clusters found by k-means (k = 6) in the subtype B reverse transcriptase sequences. Each bar represents one selected amino acid position, and its height is the percentage of sequences in the cluster that carry a mutation at that position.
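Each bar in Figures 6 and 7 is the percentage of a cluster's sequences that carry a mutation at a given position. Given a binary sequence-by-position matrix and one cluster label per sequence, from whichever clustering method, that percentage can be computed as in this sketch; the function name and input layout are illustrative assumptions.

```python
def mutation_frequencies(rows: list[list[int]], labels: list[int], cluster: int) -> list[float]:
    """Percentage of sequences in `cluster` mutated at each position (one value per column)."""
    members = [r for r, lab in zip(rows, labels) if lab == cluster]
    if not members:
        raise ValueError(f"cluster {cluster} has no sequences")
    # zip(*members) walks the matrix column by column, i.e. position by position.
    return [100.0 * sum(col) / len(members) for col in zip(*members)]


# Toy data: 5 sequences x 3 positions, two clusters (hypothetical values).
rows = [[1, 0, 1], [1, 1, 0], [0, 0, 0], [1, 0, 0], [1, 1, 1]]
labels = [0, 0, 1, 0, 0]
print(mutation_frequencies(rows, labels, 0))  # [100.0, 50.0, 50.0]
```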
Figure 8. Color representation of the k-means clusters for subtype B sequences of the HIV protease. The figure displays the drug resistance predictions of the Brazilian look-up table for each cluster. Columns represent the seven drugs selected (ATV/R, DRV/R, FPV/R, IDV/R, LPV/R, SQV/R and TPV/R, in that order) and rows represent the protein sequences. Black lines delimit the clusters. The cluster number is shown on the left and the number of sequences in the cluster on the right.
Figure 9. Color representation of the k-means clusters for subtype B sequences of the HIV reverse transcriptase. The figure displays the drug resistance predictions of the Brazilian look-up table for each cluster. Columns represent the nine drugs selected (3TC, ABC, AZT, d4T, ddI, TDF, EFV, ETV and NVP, in that order) and rows represent the protein sequences. Black lines delimit the clusters. The cluster number is shown on the left and the number of sequences in the cluster on the right.
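Figures 8 and 9 overlay the look-up-table predictions on the clusters. One way to summarize such an overlay numerically is to count, for each cluster and drug, how many sequences fall into each predicted resistance level. In this sketch the three-level coding ("S", "I", "R"), the input format and the function name are assumptions; the categories actually used by the Brazilian look-up table are not given in this excerpt.

```python
from collections import Counter


def summarize_predictions(
    labels: list[int], predictions: list[dict[str, str]], drugs: list[str]
) -> dict[int, dict[str, Counter]]:
    """Count predicted resistance levels for every (cluster, drug) pair.

    labels[i] is the cluster of sequence i; predictions[i] maps each drug to
    the level predicted for sequence i (assumed here to be "S", "I" or "R").
    """
    summary: dict[int, dict[str, Counter]] = {}
    for lab, pred in zip(labels, predictions):
        per_drug = summary.setdefault(lab, {d: Counter() for d in drugs})
        for d in drugs:
            per_drug[d][pred[d]] += 1
    return summary


labels = [0, 0, 1]
predictions = [
    {"3TC": "R", "AZT": "S"},
    {"3TC": "R", "AZT": "I"},
    {"3TC": "S", "AZT": "S"},
]
summary = summarize_predictions(labels, predictions, ["3TC", "AZT"])
print(summary[0]["3TC"])  # Counter({'R': 2})
```

Dividing each count by the cluster size gives the share of a cluster predicted resistant to each drug, which is what the color bands in the figures convey visually.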
Reverse transcriptase amino acid positions mutated in at least 50% of the sequences of each k-means cluster
| Cluster | Sequences | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cluster B6.1 | 2010 | X | X | X | X | |||||||
| Cluster B6.2 | 2195 | X | X | |||||||||
| Cluster B6.3 | 823 | X | X | X | X | X | ||||||
| Cluster B6.4 | 1570 | X | X | X | X | X | X | |||||
| Cluster B6.5 | 1639 | X | ||||||||||
| Cluster B6.6 | 1992 | X | X | X | X | X | X | X | ||||
| Cluster C6.1 | 89 | X | X | |||||||||
| Cluster C6.2 | 60 | X | X | X | X | X | X | |||||
| Cluster C6.3 | 37 | X | X | X | X | X | X | X | X | X | X | |
| Cluster C6.4 | 106 | X | X | X | X | |||||||
| Cluster C6.5 | 53 | X | X | X | X | X | X | X | X | |||
| Cluster C6.6 | 59 | X | X | X | X | X | X | |||||
| Cluster F6.1 | 159 | X | X | X | X | X | ||||||
| Cluster F6.2 | 164 | X | X | |||||||||
| Cluster F6.3 | 99 | X | X | X | X | X | X | X | ||||
| Cluster F6.4 | 54 | X | X | |||||||||
| Cluster F6.5 | 162 | X | X | X | X | X | X | X | ||||
| Cluster F6.6 | 94 | X | X | X | X | |||||||
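The X marks in these tables flag positions mutated in at least 50% of a cluster's sequences. Assuming a binary sequence-by-position matrix and one cluster label per sequence, that rule can be sketched as follows; the helper name is hypothetical, and "at least" is read as an inclusive threshold (≥ 0.5).

```python
def marker_positions(
    rows: list[list[int]],
    labels: list[int],
    cluster: int,
    names: list[str],
    threshold: float = 0.5,
) -> list[str]:
    """Names of positions mutated in at least `threshold` of the cluster's sequences."""
    members = [r for r, lab in zip(rows, labels) if lab == cluster]
    if not members:
        return []
    n = len(members)
    # Keep a position when its mutated fraction reaches the threshold (inclusive).
    return [name for name, col in zip(names, zip(*members)) if sum(col) / n >= threshold]


rows = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [1, 0, 1]]
labels = [0, 0, 0, 0]
print(marker_positions(rows, labels, 0, ["RT41", "RT184", "RT215"]))  # ['RT41', 'RT184']
```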
Protease amino acid positions mutated in at least 50% of the sequences of each k-means cluster
| Cluster | Sequences | | | | | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cluster B6.1 | 2425 | X | |||||||||||||||||
| Cluster B6.2 | 1952 | X | X | X | X | X | X | X | X | X | X | X | |||||||
| Cluster B6.3 | 1071 | X | X | X | X | X | X | X | |||||||||||
| Cluster B6.4 | 1752 | X | X | ||||||||||||||||
| Cluster B6.5 | 1663 | X | |||||||||||||||||
| Cluster B6.6 | 1366 | X | X | X | X | X | X | X | |||||||||||
| Cluster C6.1 | 53 | X | X | X | X | X | X | X | X | ||||||||||
| Cluster C6.2 | 138 | X | X | X | X | ||||||||||||||
| Cluster C6.3 | 114 | X | X | X | X | ||||||||||||||
| Cluster C6.4 | 31 | X | X | X | X | X | X | X | X | X | |||||||||
| Cluster C6.5 | 52 | X | X | X | X | X | X | X | X | X | X | X | |||||||
| Cluster C6.6 | 16 | X | X | X | X | X | X | X | |||||||||||
| Cluster F6.1 | 89 | X | X | X | X | X | X | X | X | X | |||||||||
| Cluster F6.2 | 70 | X | X | X | X | X | X | X | X | X | |||||||||
| Cluster F6.3 | 81 | X | X | X | X | X | X | X | X | X | X | X | X | ||||||
| Cluster F6.4 | 247 | X | X | X | X | X | |||||||||||||
| Cluster F6.5 | 98 | X | X | X | X | X | X | X | X | X | X | X | |||||||
| Cluster F6.6 | 147 | X | X | X | X | ||||||||||||||