María C Méndez-Ortega, Silvia Restrepo, Luis M Rodríguez-R, Iván Pérez, Juan C Mendoza, Andrés P Martínez, Roberto Sierra, Gloria J Rey-Benito.
Abstract
BACKGROUND: HIV-1 can be inhibited by RNA interference in vitro through the expression of short hairpin RNAs (shRNAs) that target conserved genome sequences. In silico shRNA design for HIV has lacked a detailed study of virus variability, which constitutes a possible breaking point in a clinical setting. We designed shRNAs against HIV-1 considering the variability observed in naïve and drug-resistant isolates available in public databases.
Year: 2010 PMID: 21172023 PMCID: PMC3022682 DOI: 10.1186/1743-422X-7-369
Source DB: PubMed Journal: Virol J ISSN: 1743-422X Impact factor: 4.099
Figure 1. Crystallographic structure of RT indicating selected regions. (a) RT crystallographic structure 2ZD1 (1.8 Å), highlighting the residues within the selected regions. Dark gray = p66 subunit, light gray = p51 subunit, dark blue = active site residues involved in dNTP binding (K65, R72, D110, V111, G112, D113, A114, Y115, Q151), green = active site residues involved in DNA binding (L74, V75, D76, R78, N81, E89, Q91, L92, I94, G152, K154, P157, M230, G231), purple = active site residues with no specific annotations (W24, P25, F61), pink = YMDD motif (Y183, M184, D185, D186), and light blue = residues involved in NNRTI binding (L100, K101, K102, K103, V179, Y188, G190, F227; not conserved). The ribbon shows continuity between amino acid chains.
Target regions within Conserved Domain RT_rtv
| Region | Residue position | Residue | Function annotation | HXB2 coordinates | Mutation in RT² and prevalence³ | Evaluated region in MSA⁴ |
|---|---|---|---|---|---|---|
| 1 | 24, 25 | **W, P** | DBS | 2619-2622 | - | 2610-2630 |
| 2 | 60 | V | - | 2727 | I(14) | 2700-2760 |
| | 61 | **F** | AS | 2730 | - | |
| | 62 | A | - | 2733 | V(14) | |
| | 64 | K | - | 2736 | R(1.9) | |
| | 65 | **K** | dBS | 2742 | R(2.1) | |
| | 67 | D | - | 2748 | N(38), G(2.5) | |
| 3 | 72 | **R** | dBS | 2763 | - | 2750-2800 |
| | 74-76, 78, 81 | **L, V, D, R, N** | DBS-AS | 2769-2790 | - | |
| 4 | 89 | **E** | DBS-AS | 2814 | - | 2800-2840 |
| | 90 | V | - | 2828 | I(3.3) | |
| | 91, 92, 94 | **Q, L, I** | DBS-AS | 2820-2829 | - | |
| 5 | 100-103 | L, K, K, K | NNBS | 2847-2856 | - | 2835-2870 |
| 6 | 110-115 | **D, V, G, D, A, Y** | dBS-AS | 2877-2892 | - | 2865-2905 |
| 7 | 151 | **Q** | dBS-AS | 3000 | M(3.4) | 2985-3020 |
| | 152, 154, 157 | **G, K, P** | DBS-AS | 3003-3018 | - | |
| 8 | 178 | I | - | 3081 | M(7.5), L(6.4) | 3070-3130 |
| | 179 | V | NNBS | 3084 | I(7.9), D(1.3) | |
| | 181 | Y | - | 3090 | C(14) | |
| | 183 | **Y** | NNBS-DBS-AS | 3096 | - | |
| | 184 | M | variable | 3099 | V(50), I(1.4) | |
| | 185 | **D** | dBS-AS | 3102 | - | |
| | 186 | **D** | AS | 3105 | - | |
| | 188 | Y | NNBS | 3111 | L(3.5) | |
| | 190 | G | NNBS | 3117 | - | |
| 9 | 227 | **F** | NNBS | 3228 | L(1.6) | 3210-3255 |
| | 228 | L | - | 3231 | H(12), R(4.2) | |
| | 230 | **M** | DBS-AS | 3237 | L(1.8) | |
| | 231 | **G** | DBS-AS | 3240 | - | |
¹ Positions according to the HXB2 reference genome numbering system (coordinate map)
² The prevalence of RT drug resistance mutations was calculated from 17,167 sequences exposed to either of these drug types (HIV Drug Resistance Database)
³ Mutation prevalence (percent) data are available at the HIV Drug Resistance Database
⁴ MSA: multiple sequence alignment
In bold, residues directly involved in enzyme activity
DBS, DNA binding site
dBS, dNTP binding site
NNBS, non-nucleoside reverse transcriptase inhibitor binding site
AS, active site
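The HXB2 coordinates in the table follow a regular pattern: most rows give the first nucleotide of the codon for the listed RT residue (for example, residue 24 at 2619 and residue 184 at 3099). A minimal Python sketch of that mapping follows. The offset 2550 for the first nucleotide of RT codon 1 is an assumption inferred from those rows, and the function name is hypothetical.

```python
# Assumed offset: first nucleotide of RT codon 1 in HXB2 numbering,
# inferred from table rows such as residue 24 -> 2619.
RT_START_HXB2 = 2550


def rt_codon_start(residue: int) -> int:
    """Return the HXB2 coordinate of the first nucleotide of an RT codon."""
    if residue < 1:
        raise ValueError("residue positions are 1-based")
    return RT_START_HXB2 + 3 * (residue - 1)


if __name__ == "__main__":
    for residue in (24, 65, 151, 184, 231):
        print(residue, rt_codon_start(residue))
```

Under this assumption, a residue range such as 110-115 maps to the codon starts 2877 and 2892, which is how ranges appear in the table.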
MSA coverage by shRNAs
| a MSA | b NSI | c VV | d W | e E | f SV | g ST-SV | h PC-SV (%) | i ST-DV | j PC-DV (%) |
|---|---|---|---|---|---|---|---|---|---|
| Pol Subtype B no Recombinants | 747 | 35 | 1 | 1.36 | 11 | 712 | 95.31 | 588 | 78.71 |
| Pol Group M plus Recombinants | 1143 | 46 | 1 | 1.42 | 12 | 1088 | 95.18 | 913 | 79.88 |
| Pol All Subtypes | 1160 | 52 | 1 | 1.59 | 14 | 1102 | 95 | 916 | 78.97 |
| Genome Subtype B no Recombinants | 760 | 35 | 1 | 1.35 | 12 | 728 | 95.79 | 599 | 78.82 |
| Genome Group M plus Recombinants | 1153 | 46 | 1 | 1.41 | 12 | 1098 | 95.22 | 918 | 79.62 |
| Genome All Subtypes | 1169 | 52 | 1 | 1.60 | 13 | 1107 | 94.7 | 920 | 78.69 |
| ZDV-3TC-EFV | 1185 | 27 | 1 | 1.27 | 10 | 1169 | 98.65 | 1013 | 85.49 |
| | 1201 | 30 | 2 | 1.52 | 14 | 1177 | 98 | 926 | 77.1 |
| | 1348 | 53 | 3 | 1.94 | 13 | 1303 | 96.66 | 741 | 54.97 |
| 2299_resistant_isolates | 1547 | 26 | 1 | 1.72 | 14 | 1552 | 98.84 | 1255 | 80.86 |
| D4T-3TC-NVP | 79 | 13 | 1 | 1.9 | 4 | 68 | 86.08 | 52 | 65.82 |
| ZDV-3TC-ABC | 52 | 10 | 1 | 1.78 | 3 | 41 | 78.85 | 33 | 63.46 |
| ZDV-3TC-NVP | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
a MSA, multiple sequence alignment
b NSI, number of sequences included in the analysis (sequences having gaps and ambiguous codons were discarded)
c VV, total number of viral variants (defined as sequences having nucleotide changes with respect to HXB2)
d W, number of selected windows throughout the MSA, with a score threshold of 2 (windows satisfied specific requirements, see Methods)
e E, entropy per window
f SV, number of subdominant variants (sequences that appear more than 4 times in an MSA, see Methods)
g ST-SV, sequences targeted by the group of subdominant variants.
h PC-SV, percentage of coverage by SV
i ST-DV, number of sequences targeted by the dominant variant.
j PC-DV, percentage of coverage by the dominant variant
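The coverage percentages in the table are consistent with dividing the number of targeted sequences by NSI (for example, 712 / 747 gives 95.31%). The Python sketch below illustrates these quantities for a single window. It is only an illustration under stated assumptions: entropy is computed here as the Shannon entropy of the variant frequencies in the window, which may differ from the definition in Methods, and the function names and the `min_count` parameter are hypothetical.

```python
from collections import Counter
from math import log2


def coverage_percent(targeted: int, total: int) -> float:
    """Percentage of analyzed sequences that are targeted."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * targeted / total


def window_stats(sequences: list[str], min_count: int = 5) -> dict:
    """Summarize one alignment window (gap-free, unambiguous sequences).

    min_count=5 mirrors 'appear more than 4 times'; treating the entropy
    as Shannon entropy over variant frequencies is an assumption.
    """
    if not sequences:
        raise ValueError("window has no sequences")
    counts = Counter(sequences)
    total = len(sequences)
    dominant, dominant_n = counts.most_common(1)[0]
    frequent_n = sum(n for n in counts.values() if n >= min_count)
    entropy = -sum((n / total) * log2(n / total) for n in counts.values())
    return {
        "dominant": dominant,
        "pc_dv": coverage_percent(dominant_n, total),
        "pc_sv": coverage_percent(frequent_n, total),
        "entropy": entropy,
    }


if __name__ == "__main__":
    window = ["AGCAG"] * 6 + ["AGCAA"] * 5 + ["AGTAG"]
    print(window_stats(window))
```

In this toy window the dominant variant covers half of the sequences, while the variants seen more than four times together cover 11 of 12.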
Best shRNAs targeting sequences in more than one MSA.
| a HXB2 Coordinates | b shRNA Sequence | c Score | d Targeted MSAs | e Min_ST | f Max_ST | g Total |
|---|---|---|---|---|---|---|
| | h GCCTGAAAATCCATACAATACTCC | 5 | 7,8,9,10 | 33 (7) | 741 (9) | 848 |
| | GCCTGAAAATCCATA | 6.5 | 2,9 | 6 (2) | 223 (9) | 229 |
| | GCCTGAAAA | 5 | 2,9 | 4 (2) | 62 (9) | 66 |
| n 2333-2356 | AGCAGATGATACAGTA | 6 | 1,2,3,4,5,6 | 10 (1) | 18 (6) | 85 |
| | AGCAGATGATACAGT | 6 | 1,2,3,4,5,6 | 23 (1) | 33 (4,6) | 174 |
| | AGCAGATGATACAGTATTAGA | 3 | 1,2,3,4,5,6 | 12 (1,2) | 15 (3,5) | 82 |
| | AGCAGATGATACAGTA | 6 | 1,2,4,5,6,10 | 6 (10) | 17 (4,5,6) | 81 |
| | AGCAGATGA | 7 | 1,2,3,4,5,6 | 21 (1,2) | 31 (3,4,5,6) | 166 |
| | AGCAGATGATACAGTATT | 6 | 1,2,3,4,5,6 | 11 (1,2) | 15 (3,4,5,6) | 82 |
| | | 7 | 1,2,3,4,5,6 | 16 (1) | 27 (6) | 139 |
| | h AGCAGATGATACAGTATTAGAAGA | 7 | 1,2,3,4,5,6,10 | 21 (10) | 920 (6) | 4854 |
| r 2556-2579 | AG | 2.5 | 9 | 1013 (9) | 1013 (9) | 1013 |
| r 2574-2597 | CCAGTAAAATTAAA | 2 | 9, 10 | 60 (9) | 74 (10) | 208 |
| | h CCAGTAAAATTAAAGCCAGGAATG | 3 | 9, 10 | 926 (9) | 1267 (10) | 3448 |
| | CCAGTAAAATT | 2 | 9, 10 | 48 (9) | 77 (10) | 202 |
| r 2702-2725 | GCCTGAAAATCC | 3.5 | 9 | 67 | 67 | 67 |
a Genome position according to the HXB2 numbering system
b In lowercase, nucleotides different from HXB2 reference genome
c Score is given by the accomplishment of specific sequence features
d Multiple sequence alignments numbered as follows:
1. POL_DNA_No_Recombinants
2. GENOME_DNA_No_Recombinants
3. POL_DNA_GroupM_Recombinants
4. GENOME_DNA_GroupM_Recombinants
5. POL_DNA_All_Subtypes
6. GENOME_DNA_All_Subtypes
7. ZDV-3TC-ABC
8. D4T-3TC-NVP
9. ZDV-3TC-EFV
10. 2299_Resistant_Isolates
e Min_ST, minimum number of sequences targeted in an MSA. In parentheses, the number of the MSA to which the targeted sequences belong.
f Max_ST, maximum number of sequences targeted in an MSA. In parentheses, the number of the MSA to which the targeted sequences belong.
g Total number of sequences targeted in all the MSAs
h shRNA sequence corresponds to HXB2 reference genome
n shRNAs from these regions were found in non-resistant MSAs, although some of them might target resistant viral sequences.
r shRNAs from these regions were found in resistant MSAs.
Multiple Comparisons for Score, and Proportion of dominant variants.
| MSA | 2299 Resistant Isolates w1 | 2299 Resistant Isolates w2 | AZT-3TC-ABC | D4T-3TC-NVP | GENOME DNA All Subtype | GENOME DNA GroupM plus Recombinants | GENOME DNA No Recombinants | POL DNA All Subtypes | POL DNA GroupM plus Recombinants | Pol DNA No recombinants | ZDV-3TC-EFV w1 | ZDV-3TC-EFV w2 | ZDV-3TC-EFV w3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | (A) | (B) | (C) | (D) | (E) | (F) | (G) | (H) | (I) | (J) | (K) | (L) | (M) |
| A B K L | A B K L | A B C D K L M | A B C D K L M | A B C D K L M | A B C D K L M | A B C D K L M | A B C D K L M | A B L | A B K L | ||||
| K | K | K | K | K | K | K | K | K | A B E F G H I J K L | ||||
| M | M | M | M | M | M | M | M | C D E F G H I J L M | M | ||||
a. The weighted average of the score was used for multiple comparisons between the MSAs
b. In the comparisons of the proportion of dominant variants, number 1 represents the dominant viral variants while number 0 represents the rest of viral variants (subdominant and infrequent).
For the weighted average score, a multiple comparison Student t-test was used to evaluate mean equality between each pair of groups. The MSA was assigned as the segmenting categorical variable and the score was the continuous variable for which the mean was calculated. For the comparison between pairs of proportions of dominant variants, a Z-test was used. The MSA was assigned as the segmenting categorical variable, and the proportion was assigned as the categorical variable that revealed the presence or absence of the event of interest. The second and third rows list the letters of the groups that showed significant differences with the MSA of that column. In both cases, p values were corrected with the Bonferroni-Dunn test with an alpha of 0.05. See Methods for details on how weighted average scores were calculated.
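To make the pairwise comparison of proportions concrete, the Python sketch below runs a pooled two-proportion Z-test on every pair of groups and applies a Bonferroni-style adjustment (each p value multiplied by the number of comparisons). It is not the statistical software used in the study. The function names are hypothetical, and the counts in the example are the ST-DV and NSI values from the coverage table, used only as illustrative input.

```python
from itertools import combinations
from math import erfc, sqrt


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion Z-test; returns (z, two-sided p value)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; test undefined")
    z = (x1 / n1 - x2 / n2) / se
    # Two-sided p value under the standard normal distribution.
    return z, erfc(abs(z) / sqrt(2))


def pairwise_bonferroni(groups: dict, alpha: float = 0.05) -> dict:
    """Compare every pair of groups; groups maps name -> (events, total)."""
    pairs = list(combinations(groups, 2))
    results = {}
    for a, b in pairs:
        z, p = two_proportion_z(*groups[a], *groups[b])
        p_adj = min(1.0, p * len(pairs))  # Bonferroni-style adjustment
        results[(a, b)] = (z, p_adj, p_adj < alpha)
    return results


if __name__ == "__main__":
    # (sequences targeted by the dominant variant, sequences analyzed)
    groups = {
        "ZDV-3TC-EFV w1": (1013, 1185),
        "ZDV-3TC-EFV w3": (741, 1348),
        "Pol Subtype B": (588, 747),
    }
    for pair, (z, p_adj, significant) in pairwise_bonferroni(groups).items():
        print(pair, round(z, 2), p_adj, significant)
```

Each group is then reported against the letters of the groups it differs from, which is the layout used in the comparison table.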
Figure 2. Score distribution among MSAs. No scores under 2.0 are shown because this score value was the threshold used for selection by the algorithm. Circles indicate outlier values and stars indicate extreme outlier values.
Figure 3. Proportion of dominant or most frequent viral variants. The total number of sequences is the number of sequences that the algorithm analyzed. In the case of MSAs that have more than one window, the total number of analyzed sequences may differ between windows. Other viral variants correspond to subdominant or totally infrequent viral sequences.
Figure 4. Information entropy and scores correlation. The ellipses highlight the score distribution for resistant MSAs (a) and the correlation observed for non-resistant MSAs (b).
Figure 5. Silencing model. Targeting dominant variants from two or more regions leaves several subdominant viral variants untargeted. The optimal approach would be a cocktail of carefully selected molecules targeting dominant as well as subdominant variants from more than one conserved region. The figure shows a schematic representation of the HIV-1 genome and an MSA of the HIV-1 pol gene, on which the silencing strategy is drawn. Some sequences would be targeted by two shRNAs, some by just one, and a few infrequent ones would not be targeted at all. W1 and W2 represent the hypothetical targeted regions, where "W" stands for "window".
Figure 6. shRNA diagram. (a) Schematic representation of an shRNA showing important sequence features with the corresponding positions in the antisense strand. (b) DNA antisense strand indicating the correction of positions with respect to the antisense strand of the shRNA. (c) DNA vector (only the shRNA is shown, partial sequence). Correcting the positions of the sequence features is fundamental for searching for them in the MSAs with the algorithm, since the MSAs are DNA sequences.
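The position correction described for Figure 6 can be illustrated with a short Python sketch. It assumes the simplest case, in which the antisense strand is the exact reverse complement of the DNA target and positions are counted 5' to 3' on each strand. The function names are hypothetical, and the actual feature positions used by the algorithm are described in Methods.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def antisense_rna(target_dna: str) -> str:
    """Antisense RNA strand for a sense-strand DNA target."""
    return reverse_complement(target_dna).replace("T", "U")


def antisense_to_sense_position(antisense_pos: int, length: int) -> int:
    """Map a 1-based antisense position to the paired sense-strand position."""
    if not 1 <= antisense_pos <= length:
        raise ValueError("position outside the target")
    return length - antisense_pos + 1


if __name__ == "__main__":
    target = "AGCAG"
    print(antisense_rna(target))
    print(antisense_to_sense_position(1, len(target)))
```

With this mapping, a feature required at a given antisense position can be searched for directly in the DNA alignment at the paired sense position, after complementing the expected nucleotide.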