Youlin Xia, Yoshio Yamaoka, Qi Zhu, Ivan Matha, Xiaolian Gao.
Abstract
Chronic Helicobacter pylori infection is known to be associated with the development of peptic ulcer, gastric cancer, and gastric lymphoma. Bacterial factors of H. pylori are reported to be important in the development of gastroduodenal diseases. The CagA protein, encoded by the cagA gene, is the best-studied virulence factor of H. pylori. Pathogenic CagA contains a highly polymorphic Glu-Pro-Ile-Tyr-Ala (EPIYA) repeat region in its C-terminal region. This repeat region is reported to be involved in the pathogenesis of gastroduodenal diseases. The segments containing EPIYA motifs have been designated as segments A, B, C, and D; however, their classification and relation to disease are still unclear. This study used 560 unique CagA sequences containing 1,796 EPIYA motifs collected from public resources, including 274 Western and 286 East Asian strains, with clinical data available for 433 entries. Fifteen types of EPIYA or EPIYA-like sequences are defined. In addition to the four previously reported major segment types, several minor segment types (e.g., segments B′ and B′′) and more than 30 sequence types (e.g., ABC, ABD) were defined using our classification method. We confirm that the sequences from Western and East Asian strains contain segments C and D, respectively. We also confirm that strains with two EPIYA-C segments are associated with a greater chance of gastric cancer than those with one segment C. Our results shed light on the relationships between the types of CagA, the country of origin of each sequence type, and the frequency of gastric disease.
Year: 2009 PMID: 19893742 PMCID: PMC2768901 DOI: 10.1371/journal.pone.0007736
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Definitions of segments around EPIYA motif (EPIYA or EPIYA-like sequences).
The upper sequences are typical Western-type CagA sequences and the lower sequences are typical East Asian-type CagA sequences. Segments A, B, B′, and B′′ carry a subscript C or D, indicating that the CagA sequence containing the segment also contains segment C or segment D, respectively. For example, the notation EPIYA-AC denotes segment A from a CagA sequence that contains segment C.
Frequencies of the 15 types of EPIYA motifs.
| Motif | EPIYA | EPIYT | ESIYA | ESIYT | EPIYV | EHIYA | ELIYA | EPVYA |
| Freq. | 1657 | 92 | 24 | 7 | 3 | 2 | 2 | 2 |
| Motif | EPIYD | EPIYS | EPKYA | EPRYA | ETIYA | KPIYA | NPIYA | Total |
| Freq. | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1,796 |
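As a rough illustration of how the motifs tabulated above could be located in a CagA sequence, the following Python sketch performs a literal search for the 15 listed motifs. It is not the authors' procedure, which is not described in this record; the names `find_motifs` and `count_motifs` are hypothetical.

```python
import re
from collections import Counter

# The 15 EPIYA and EPIYA-like motifs listed in the frequency table.
MOTIFS = [
    "EPIYA", "EPIYT", "ESIYA", "ESIYT", "EPIYV", "EHIYA", "ELIYA", "EPVYA",
    "EPIYD", "EPIYS", "EPKYA", "EPRYA", "ETIYA", "KPIYA", "NPIYA",
]
MOTIF_RE = re.compile("|".join(MOTIFS))


def find_motifs(sequence):
    """Return (position, motif) pairs for every listed motif in the sequence."""
    return [(m.start(), m.group()) for m in MOTIF_RE.finditer(sequence)]


def count_motifs(sequences):
    """Tally motif occurrences over an iterable of protein sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(motif for _, motif in find_motifs(seq))
    return counts


# Representative segment C sequence as given in this record.
segment_c = "FPLKRHDKVDDLSKVGRSVSPEPIYATIDDLGGP"
for pos, motif in find_motifs(segment_c):
    print(pos, motif)
```

A literal search like this only finds motifs that are already known; discovering new EPIYA-like variants would need a looser pattern and manual review.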
Representative segments of EPIYA motifs.
| Type | Freq. | Representative sequence |
| AC | 272 | KELNAKLGNFNNNNNNGLKN..EPIYAKVNKKK |
| AD | 295 | KELNEKLFGNSNNNNNGLKNNTEPIYAQVNKKK |
| BC | 262 | TGQVASPEEPIYAQVAKKVNAKIDRLNQIASGLGGVGQAAG |
| BD | 281 | TGQATSPEEPIYAQVAKKVSAKIDQLNEATS |
| C | 343 | FPLKRHDKVDDLSKVGRSVSPEPIYATIDDLGGP |
| D | 284 | AINRKIDRINKIASAGKGVGGFSGAGRSASPEPIYATIDFDEAN |
| B′C | 10 | AGQAASPEEPIYAKVNKKK |
| B′D | 14 | AGQATSPEEPIYAQVNKKK |
| B′′D | 19 | AINRKIDRINKIASAGKGVGGFSGAGRSANPEPIYAQVARKVSA-KIDQLNEATS |
| Total | 1,780 | |
Note: the values in the table are the frequencies of similar sequences, not the number of identical sequences within a sequence type. The segments of the remaining 16 EPIYA motifs are listed in Table S1.
Frequencies of the 32 sequence types.
| Seq. Type | Freq. | Seq. Type | Freq. | Seq. Type | Freq. | Seq. Type | Freq. |
| ABD | 240 | AB′-ABD | 4 | C | 2 | ABCCCC | 1 |
| ABC | 167 | A-D | 4 | A | 1 | A-B″D | 1 |
| ABCC | 51 | A-ABD | 3 | AB′B′BC | 1 | AB-D | 1 |
| ABB″D | 16 | AB-ABD | 2 | ABB″BD | 1 | ABD-ABD | 1 |
| AB | 15 | AB′B′BD | 2 | AB′BCC | 1 | ABD-BD | 1 |
| ABCCC | 10 | AB′BD | 2 | AB′-C | 1 | ABD-D | 1 |
| AB′BC | 6 | ABCCCCC | 2 | AB-C | 1 | A-CCC | 1 |
| A-C | 5 | AB′D | 2 | ABCB″CC | 1 | CC | 1 |
All sequence types are listed in Table S3. Other sequence types are listed in Table S4.
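Because the sequence-type labels encode the order of segments, the number of C segments in a type (the quantity behind the abstract's comparison of one versus two C segments) can be read directly from the label. A minimal Python sketch, using only a few rows copied from the table; the helper names are hypothetical, and the approach is restricted to segment C because labels such as B′ and B″ make simple letter counting ambiguous for B.

```python
from collections import Counter

# A few rows copied from the sequence-type table (type -> frequency).
SEQ_TYPE_FREQ = {
    "ABD": 240,
    "ABC": 167,
    "ABCC": 51,
    "ABB″D": 16,
    "ABCCC": 10,
}


def count_c_segments(seq_type):
    """Count C segments in a sequence-type label such as 'ABCC'."""
    return seq_type.count("C")


def tally_by_c_copies(freqs):
    """Sum frequencies of sequence types grouped by their number of C segments."""
    tally = Counter()
    for seq_type, freq in freqs.items():
        tally[count_c_segments(seq_type)] += freq
    return dict(tally)


print(tally_by_c_copies(SEQ_TYPE_FREQ))
```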
Two most frequent EPIYA segments.
| Segment | Sequence | Ratio |
| AC | KELNAKLGNFNNNNNNGLKN..EPIYAKVNKKK | 53/272 |
| AC | KELNAKLGNFNNNNNNGLKNSTEPIYAKVNKKK | 22/272 |
| AD | KELNEKLFGNSNNNNNGLKNNTEPIYAQVNKKK | 53/272 |
| AD | XXXXXKLFGNSNNNNNGLKNNTEPIYAQVNKKK | 22/272 |
| BC | TGQVASPEEPIYAQVAKKVNAKIDRLNQIASGLGGVGQAAG | 25/262 |
| BC | AGQAASPEEPIYAQVAKKVNAKIDRLNQIASGLGGVGQAAG | 19/262 |
| BD | TGQATSPEEPIYAQVAKKVSAKIDQLNEATS | 25/262 |
| BD | TGQVASPEEPIYAQVAKKVSAKIDQLNEATS | 19/262 |
| C | FPLKRHDKVDDLSKVGRSVSPEPIYATIDDLGGP | 144/343 |
| C | FPLKRHDKVDDLSKVGRAVSPEPIYATIDDLGGP | 50/343 |
| D | AINRKIDRINKIASAGKGVGGFSGAGRSASPEPIYATIDFDEAN | 144/343 |
| D | AINRKIDRINKIASAGKGVGGFSGAGRSASPEPIYATIDFDETN | 50/343 |
X represents unknown amino acids; amino acids that differ between the two sequences shown for each segment are highlighted; Ratio = (Frequency of the type)/(Total frequency).
Figure 2. WebLogos of aligned segments of EPIYA-A, -B, and -C/D.
The number of sequences for each WebLogo is indicated. The sequences were aligned using BioEdit. Z represents a space inserted by BioEdit and X represents unknown amino acids.
Frequency of CagA sequences by country.
| Country | Total no. of sequences | No. containing EPIYA-C | No. containing EPIYA-D |
| Japan | 249 | 21 | 228 |
| China | 48 | 4 | 44 |
| Korea | 6 | 1 | 5 |
| Viet Nam | 4 | 0 | 4 |
| Thailand | 5 | 2 | 3 |
| Malaysia | 3 | 2 | 1 |
| Iran | 5 | 5 | 0 |
| India | 4 | 4 | 0 |
| Kazakhstan | 3 | 3 | 0 |
| Greece | 100 | 100 | 0 |
| Italy | 34 | 34 | 0 |
| Sweden | 5 | 5 | 0 |
| Ireland | 3 | 3 | 0 |
| USA | 22 | 22 | 0 |
| Costa Rica | 33 | 33 | 0 |
| Colombia | 24 | 24 | 0 |
Austria, Chile, and Germany each have one strain. The country information of 11 sequences or strains is not available.
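The geographic split in the table can be summarized as the share of each country's sequences that contain EPIYA-D. The sketch below copies the tabulated counts into a Python dictionary; the names `COUNTRY_COUNTS` and `fraction_d` are hypothetical, and the three single-strain countries and the 11 sequences without country information are left out.

```python
# country -> (total, containing EPIYA-C, containing EPIYA-D), copied from the table.
COUNTRY_COUNTS = {
    "Japan": (249, 21, 228),
    "China": (48, 4, 44),
    "Korea": (6, 1, 5),
    "Viet Nam": (4, 0, 4),
    "Thailand": (5, 2, 3),
    "Malaysia": (3, 2, 1),
    "Iran": (5, 5, 0),
    "India": (4, 4, 0),
    "Kazakhstan": (3, 3, 0),
    "Greece": (100, 100, 0),
    "Italy": (34, 34, 0),
    "Sweden": (5, 5, 0),
    "Ireland": (3, 3, 0),
    "USA": (22, 22, 0),
    "Costa Rica": (33, 33, 0),
    "Colombia": (24, 24, 0),
}


def fraction_d(country):
    """Fraction of a country's sequences that contain EPIYA-D."""
    total, _, with_d = COUNTRY_COUNTS[country]
    return with_d / total


for country in COUNTRY_COUNTS:
    print(f"{country}: {fraction_d(country):.0%} EPIYA-D")
```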
Frequency and percentage of strains by disease type.
| Disease | G | DU | GU | GC | E | MALT | Total |
| Occurrence | 181 | 90 | 43 | 87 | 21 | 5 | 433 |
| Percentage | 42% | 21% | 10% | 20% | 5% | 1% | 100% |
The diseases are designated in the text.
EPIYA types and clinical outcomes.
| Type | Total | G | PU | GC |
| ABC | 129 | 65, 50%, 1.0 | 42, 33%, 1.0 | 22, 17%, 1.0 |
| ABD | 168 | 66, 39%, 0.8 | 64, 38%, 1.2 | 38, 23%, 1.3 |
| ABCC | 43 | 18, 42%, 0.8 | 8, 19%, 0.6 | 17, 40%, 2.4 |
PU = DU + GU. Other diseases are designated in the text. The strains with unavailable disease information are not included.
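Each cell in the table above appears to hold three values: the number of strains, that number as a percentage of the row total, and the percentage relative to the corresponding ABC percentage. This reading is an inference from the numbers, not something stated in the record. The Python sketch below recomputes the values under that assumption; `OUTCOMES` and `summarize` are hypothetical names. The recomputed ratios agree with the table to within about 0.1 (for example, roughly 2.3 rather than 2.4 for GC in ABCC), a gap that likely comes from rounding the percentages before dividing.

```python
# Strain counts per sequence type and outcome, copied from the table.
OUTCOMES = {
    "ABC": {"G": 65, "PU": 42, "GC": 22},
    "ABD": {"G": 66, "PU": 64, "GC": 38},
    "ABCC": {"G": 18, "PU": 8, "GC": 17},
}


def summarize(seq_type, reference="ABC"):
    """Return {disease: (count, percent of row total, ratio to reference share)}."""
    counts = OUTCOMES[seq_type]
    ref = OUTCOMES[reference]
    total = sum(counts.values())
    ref_total = sum(ref.values())
    result = {}
    for disease, n in counts.items():
        share = n / total
        ref_share = ref[disease] / ref_total
        result[disease] = (n, round(100 * share), share / ref_share)
    return result


for seq_type in OUTCOMES:
    for disease, (n, pct, ratio) in summarize(seq_type).items():
        print(f"{seq_type} {disease}: {n}, {pct}%, {ratio:.1f}")
```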