Sinu Paul, Helen Piontkivska.
Abstract
BACKGROUND: Studies have shown that in the genome of human immunodeficiency virus type 1 (HIV-1), regions responsible for interactions with the host's immune system, namely cytotoxic T-lymphocyte (CTL) epitopes, tend to cluster together in relatively conserved regions. In contrast, "epitope-less" regions, or regions with a relatively low density of epitopes, tend to be more variable. However, very little is known about relationships among epitopes from different genes, that is, whether particular epitopes from different genes occur together in the same viral genome. To identify CTL epitopes in different genes that co-occur in HIV genomes, association rule mining was used.
Year: 2009 PMID: 19580659 PMCID: PMC2716299 DOI: 10.1186/1742-4690-6-62
Source DB: PubMed Journal: Retrovirology ISSN: 1742-4690 Impact factor: 4.602
List of 62 HIV-1 reference sequences included in the study: 44 non-recombinant sequences, grouped by subtype, and 18 circulating recombinant forms (CRFs), taken from the 2005 subtype reference set of the HIV sequence database, Los Alamos National Laboratory.
| Subtype | Sequence name | Subtype | Sequence name |
| A1 | A1.KE.94.Q23_17.AF004885 | J | J.SE.93.SE7887.AF082394 |
| | A1.SE.94.SE7253.AF069670 | | J.SE.94.SE7022.AF082395 |
| | A1.UG.92.92UG037.U51190 | K | K.CD.97.EQTB11C.AJ249235 |
| | A1.UG.98.98UG57136.AF484509 | | K.CM.96.MP535.AJ249239 |
| A2 | A2.CD.97.97CDKTB48.AF286238 | O | O.BE.87.ANT70.L20587 |
| | A2.CY.94.94CY017_41.AF286237 | | O.CM.91.MVP5180.L20571 |
| B | B.FR.83.HXB2-LAI-IIIB-BRU.K03455 | | O.CM.98.98CMU2901.AY169812 |
| | B.NL.00.671_00T36.AY423387 | | O.SN.99.SEMP1300.AJ302647 |
| | B.TH.90.BK132.AY173951 | N | N.CM.02.DJO0131.AY532635 |
| | B.US.98.1058_11.AY331295 | | N.CM.95.YBF30.AJ006022 |
| C | C.BR.92.BR025-d.U52953 | | N.CM.97.YBF106.AJ271370 |
| | C.ET.86.ETH2220.U46016 | | |
| | C.IN.95.95IN21068.AF067155 | CRFs | 01_AE.TH.90.CM240.U54771 |
| | C.ZA.04.SK164B1.AY772699 | | 02_AG.NG.-.IBNG.L39106 |
| D | D.CD.83.ELI.K03454 | | 03_AB.RU.97.KAL153_2.AF193276 |
| | D.CM.01.01CM_4412HAL.AY371157 | | 04_CPX.CY.94.CY032.AF049337 |
| | D.TZ.01.A280.AY253311 | | 05_DF.BE.-.VI1310.AF193253 |
| | D.UG.94.94UG114.U88824 | | 06_CPX.AU.96.BFP90.AF064699 |
| F1 | F1.BE.93.VI850.AF077336 | | 07_BC.CN.97.CN54.AX149771 |
| | F1.BR.93.93BR020_1.AF005494 | | 08_BC.CN.97.97CNGX_6F.AY008715 |
| | F1.FI.93.FIN9363.AF075703 | | 09_CPX.GH.96.96GH2911.AY093605 |
| | F1.FR.96.MP411.AJ249238 | | 10_CD.TZ.96.96TZ_BF061.AF289548 |
| F2 | F2.CM.02.02CM_0016BBY.AY371158 | | 11_CPX.GR.-.GR17.AF179368 |
| | F2.CM.95.MP255.AJ249236 | | 12_BF.AR.99.ARMA159.AF385936 |
| | F2.CM.95.MP257.AJ249237 | | 13_CPX.CM.96.1849.AF460972 |
| | F2.CM.97.CM53657.AF377956 | | 14_BG.ES.99.X397.AF423756 |
| G | G.BE.96.DRCBL.AF084936 | | 15_01B.TH.99.99TH_MU2079.AF516184 |
| | G.KE.93.HH8793_12_1.AF061641 | | 16_A2D.KR.97.97KR004.AF286239 |
| | G.NG.92.92NG083.U88826 | | 18_CPX.CM.97.CM53379.AF377959 |
| | G.SE.93.SE6165.AF061642 | | 19_CPX.CU.99.CU38.AY588970 |
| H | H.BE.93.VI991.AF190127 | | |
| | H.BE.93.VI997.AF190128 | | |
| | H.CF.90.056.AF005496 | | |
The last dot-separated field in each sequence name is the GenBank accession number.
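The naming scheme above can be unpacked programmatically. The following is a minimal sketch: only the position of the accession number is stated in the table note, while the interpretation of the remaining fields as subtype, country, year and isolate name is an assumption based on the visible pattern, and `ReferenceSequence` and `parse_sequence_name` are hypothetical helper names.

```python
from typing import NamedTuple


class ReferenceSequence(NamedTuple):
    subtype: str     # e.g. "B" or "02_AG" (assumed meaning of field 1)
    country: str     # two-letter code (assumed meaning of field 2)
    year: str        # two-digit year, or "-" when absent (assumed)
    isolate: str     # isolate name (assumed meaning of field 4)
    accession: str   # GenBank accession number (stated in the table note)


def parse_sequence_name(name: str) -> ReferenceSequence:
    """Split a reference sequence name into its dot-separated fields."""
    # The accession is the last field, so split it off from the right first.
    rest, accession = name.rsplit(".", 1)
    # Assumed layout of the remaining fields: subtype.country.year.isolate
    subtype, country, year, isolate = rest.split(".", 3)
    return ReferenceSequence(subtype, country, year, isolate, accession)


hxb2 = parse_sequence_name("B.FR.83.HXB2-LAI-IIIB-BRU.K03455")
crf = parse_sequence_name("02_AG.NG.-.IBNG.L39106")
print(hxb2.accession, crf.subtype, crf.year)
```

Splitting from the right keeps the accession intact even if an isolate name were to contain additional dots.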
Figure 1. Criteria for the inclusion of CTL epitopes. When two epitopes overlapped completely, the longer epitope was selected if there were no amino acid sequence differences between them, whereas both epitopes were included if at least one amino acid difference existed.
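The inclusion criteria in Figure 1 can be sketched as a small filter. This is an illustration under stated assumptions, not the authors' procedure: "completely overlapping" is taken to mean that one epitope's coordinates lie entirely within the other's, the `Epitope` type and function names are hypothetical, and the peptides in the example are invented toy sequences.

```python
from typing import NamedTuple


class Epitope(NamedTuple):
    start: int      # first amino acid position (1-based, inclusive)
    end: int        # last amino acid position (inclusive)
    sequence: str   # amino acid sequence, one-letter code


def is_redundant(short: Epitope, long: Epitope) -> bool:
    """True if `short` lies entirely inside `long` with identical residues."""
    inside = long.start <= short.start and short.end <= long.end
    if short == long or not inside:
        return False
    offset = short.start - long.start
    return long.sequence[offset:offset + len(short.sequence)] == short.sequence


def select_epitopes(epitopes: list[Epitope]) -> list[Epitope]:
    """Drop epitopes fully contained, without differences, in another one."""
    return [e for e in epitopes
            if not any(is_redundant(e, other) for other in epitopes)]


# Toy data (not from the paper): a long epitope, an identical nested
# epitope, and a nested variant with one amino acid difference.
long_ep = Epitope(10, 19, "ACDEFGHIKL")
nested = Epitope(12, 17, "DEFGHI")
variant = Epitope(12, 17, "DEFGHV")

selected = select_epitopes([long_ep, nested, variant])
print(selected)
```

In this toy case the identical nested epitope is dropped in favor of the longer one, while the variant is retained because it differs at one position.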
Summary of the discovered CTL epitope association rules.
| Number of epitope associations with support >= 0.75 * and confidence >= 0.95 | 1961 | 1095 | 1867 | 1944 |
| Unique epitope associations # | | | | |
| Associations with 2 epitopes $ | 46 | 48 | 45 | 46 |
| Associations with 3 epitopes | 217 | 166 | 71 | 217 |
| Associations with 4 epitopes | 153 | 102 | 59 | 151 |
| Associations with 5 epitopes | 59 | 26 | 27 | 58 |
| Associations with 6 epitopes | 9 | 2 | 7 | 9 |
| Associations with 7 epitopes | 0 | 0 | 1 | 0 |
| Total | 484 | 344 | 210 | 481 |
| Unique epitope associations with epitopes from only one gene | | | | |
| Epitopes from Gag | 9 | 12 | 3 | 9 |
| Epitopes from Pol | 94 | 81 | 47 | 94 |
| Epitopes from Nef | 0 | 0 | 0 | 0 |
| Total | 103 | 93 | 50 | 103 |
| Unique epitope associations with epitopes from two genes | | | | |
| Gag and Pol | 329 | 234 | 145 | 326 |
| Nef and Pol | 26 | 11 | 7 | 26 |
| Gag and Nef | 3 | 1 | 1 | 3 |
| Total | 358 | 246 | 153 | 355 |
| Unique epitope associations with epitopes from all three genes (Gag, Pol and Nef) | 23 | 5 | 7 | 23 |
* Total number of associations includes all identified association rules that had a minimum support of 75% and 95% confidence. For CRFs, the support is 95%.
# "Unique" rules combine associations between the same epitopes into a single, "unique" rule regardless of the order of epitopes within a rule (i.e., A occurs with B and B occurs with A are considered the same "unique" rule).
$ i.e., association rules that involve two distinct CTL epitopes.
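The thresholds and the notion of a "unique" rule in the notes above can be made concrete with the standard definitions of support and confidence. The sketch below uses those textbook definitions and invented presence/absence data; the epitope labels, the `genomes` list and the function names are hypothetical, and the code does not reproduce the mining software used in the study.

```python
def support(itemset: frozenset, genomes: list) -> float:
    """Fraction of genomes that contain every epitope in `itemset`."""
    return sum(itemset <= g for g in genomes) / len(genomes)


def confidence(antecedent: frozenset, consequent: frozenset, genomes: list) -> float:
    """Share of genomes with the antecedent that also carry the consequent."""
    return support(antecedent | consequent, genomes) / support(antecedent, genomes)


# Hypothetical data: one set of epitope labels per genome.
genomes = [
    {"gag_1", "pol_6", "nef_15"},
    {"gag_1", "pol_6", "nef_15"},
    {"gag_1", "pol_6"},
    {"gag_1", "pol_6", "nef_15"},
]

# Candidate rules written as (antecedent, consequent).
rules = [
    (frozenset({"gag_1"}), frozenset({"pol_6"})),
    (frozenset({"pol_6"}), frozenset({"gag_1"})),
    (frozenset({"nef_15"}), frozenset({"gag_1"})),
]

# Keep rules with support >= 0.75 and confidence >= 0.95.
kept = [(a, c) for a, c in rules
        if support(a | c, genomes) >= 0.75 and confidence(a, c, genomes) >= 0.95]

# "A occurs with B" and "B occurs with A" collapse into one unique rule
# once the order of epitopes is ignored.
unique = {a | c for a, c in kept}
print(len(kept), len(unique))
```

Representing each rule as an unordered `frozenset` is what merges the two directions of the same association into a single unique rule.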
Figure 2. Twenty-three association rules that include epitopes from three genes, and the respective amino acid sequences of the involved CTL epitopes (level of support >= 75%, confidence >= 95%), identified in 62 reference sequences of HIV-1 genomes (including 18 CRFs). Amino acid coordinates within each gene (Gag, Pol or Nef) are given relative to the epitope position in the HXB2 reference sequence (GenBank accession number K03455). Each line corresponds to a single association rule, and dashes designate amino acid sites that are NOT involved in the association rule. Not drawn to scale; "//" marks long stretches of non-included amino acid residues, and "|" indicates the border of a protein-coding gene. The numbers on the right side indicate the presence of the respective epitope association in other data sets: 1: 44-non-CRFs, 2: 18-CRFs and 3: both 44-non-CRFs and 18-CRFs.
Figure 3. Venn diagram showing the number of epitope association rules involving each gene. Out of the 484 unique epitope associations, 9 involved epitopes from the Gag gene only (shown in red) and 94 involved epitopes from the Pol gene only (blue). No association involved epitopes solely from Nef (green). There were 329 associations in which epitopes from Gag and Pol took part, whereas in 26 associations epitopes were only from the Nef and Pol genes, and in 3 associations epitopes were only from the Gag and Nef genes. There were 23 associations in which epitopes from all three genes were involved.
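The counts in the Venn diagram partition the 484 unique associations, which can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the caption; the per-gene totals it derives are simple sums over the diagram's regions and are not figures reported in the caption itself.

```python
# Number of unique associations per gene combination, from the caption.
venn = {
    frozenset({"Gag"}): 9,
    frozenset({"Pol"}): 94,
    frozenset({"Nef"}): 0,
    frozenset({"Gag", "Pol"}): 329,
    frozenset({"Nef", "Pol"}): 26,
    frozenset({"Gag", "Nef"}): 3,
    frozenset({"Gag", "Pol", "Nef"}): 23,
}

# The seven regions are disjoint, so they should sum to the overall total.
total = sum(venn.values())

# Associations that involve a given gene, alone or with other genes.
per_gene = {
    gene: sum(n for genes, n in venn.items() if gene in genes)
    for gene in ("Gag", "Pol", "Nef")
}

print(total, per_gene)
```

The regions sum to 484, and by this tally 364 associations involve Gag, 472 involve Pol and 52 involve Nef.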
Average pairwise dN and dS values estimated at non-epitope and CTL epitope regions.
| | dN | SE # | dS | SE | P value * |
| CTL epitopes involved in association rules | 0.01696 | 0.00982 | 0.37794 | 0.20974 | < 0.01 |
| CTL epitopes not involved in association rules | 0.12168 | 0.06814 | 0.50929 | 0.18780 | < 0.01 |
| Non-epitope regions | 0.14698 | 0.10288 | 0.53472 | 0.12572 | < 0.01 |
These values cover all CTL epitope and non-epitope regions from all the HIV-1 genomic sequences included in the study. CTL epitope regions are divided into those involved in association rules and those not involved.
# Standard errors were estimated with 100 bootstrap replications in MEGA4.
* In pairwise t-tests, the null hypothesis of dS = dN was rejected in all three comparisons.
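A common way to summarize such values is the dN/dS ratio, which the table does not report directly. The short sketch below computes it from the tabulated averages; reading a ratio below 1 as a sign of purifying selection is the conventional interpretation and is stated here as background, not as a result taken from the table.

```python
# Average pairwise (dN, dS) values copied from the table above.
rates = {
    "CTL epitopes involved in association rules": (0.01696, 0.37794),
    "CTL epitopes not involved in association rules": (0.12168, 0.50929),
    "Non-epitope regions": (0.14698, 0.53472),
}

# Ratio of nonsynonymous to synonymous substitution rates per region type.
ratios = {region: dn / ds for region, (dn, ds) in rates.items()}

for region, ratio in ratios.items():
    print(f"{region}: dN/dS = {ratio:.3f}")
```

Under these averages the ratio is lowest for epitopes involved in association rules and highest for non-epitope regions, and it stays below 1 in all three categories.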
Properties of 22 CTL epitopes that frequently co-occur in the reference HIV-1 genomes (per the 62-all sequence set).
| Gene | Protein | Genomic region | CTL epitope | HLA allele * | Start | End | Unique association rules per epitope | Unique association rules per genomic region |
| Gag | p24 | 1 | SPRTLNAWV | B*0702 | 16 | 24 | 2 | 2 |
| | | 2 | SEGATPQDL# | B*4001 | 44 | 52 | 127 | 72 |
| | | 3 | GHQAAMQML# | B*1510, B*3901 | 61 | 69 | 214 | 110 |
| | | 4 | KRWIILGLNK## | B*2705 | 131 | 140 | 7 | 95 |
| | | | GLNKIVRMY | B*1501 | 137 | 145 | 195 | |
| | | | VRMYSPVSI | Cw18 | 142 | 150 | 1 | |
| Pol | RT | 5 | IETVPVKL | B*4001 | 5 | 12 | 9 | 9 |
| | | 6 | KLVDFRELNK | A*0301 | 73 | 82 | 188 | 100 |
| | | 7 | GIPHPAGLK | A*0301 | 93 | 101 | 124 | 73 |
| | | 8 | TVLDVGDAY### | B*3501 | 107 | 115 | 38 | 31 |
| | | 9 | NETPGIRYQY | B18 | 137 | 146 | 14 | 9 |
| | | | IRYQYNVL | B*1401 | 142 | 149 | 11 | |
| | | 10 | LVGKLNWASQIY | B*1501 | 260 | 271 | 112 | 51 |
| | | | KLNWASQIY | A*3002 | 263 | 271 | 114 | |
| | RT-RNase | 11 | IVTDSQYAL | Cw*0802 | 495 | 503 | 149 | 69 |
| | | | VTDSQYALGI | B*1503 | 496 | 505 | 153 | |
| | RT-Integrase | 12 | LFLDGIDKA | B81 | 560 | 8 | 121 | 68 |
| | Integrase | 13 | KTAVQMAVF | B*5701 | 173 | 181 | 52 | 39 |
| | | | AVFIHNFKRK | A*0301, A*1101 | 179 | 188 | 15 | |
| | | | FKRKGGIGGY | B*1503 | 185 | 194 | 4 | |
| | | 14 | VPRRKAKII | B42 | 260 | 268 | 2 | 2 |
| Nef | | 15 | FLKEKGGL### | B*0801 | 90 | 97 | 52 | 34 |
These epitopes are located in 15 different protein-coding genomic regions. For each epitope and genomic region, the number of "unique" association rules (as defined in Table 2, the summary of the discovered CTL epitope association rules) is shown, together with the HLA alleles that recognize that particular CTL epitope.
* HLA alleles as defined in the HIV Molecular Immunology Database, Los Alamos National Laboratory.
#, ## and ### designate potentially promiscuous CTL epitopes that are: (#) recognized by alternative HLA alleles, (##) potentially embedded, or (###) shared between allele pairs (per [59]).