Vo Phuoc Tuan, Koji Yahara, Ho Dang Quy Dung, Tran Thanh Binh, Pham Huu Tung, Tran Dinh Tri, Ngo Phuong Minh Thuan, Vu Van Khien, Tran Thi Huyen Trang, Bui Hoang Phuc, Evariste Tshibangu-Kabamba, Takashi Matsumoto, Junko Akada, Rumiko Suzuki, Tadayoshi Okimoto, Masaaki Kodama, Kazunari Murakami, Hirokazu Yano, Masaki Fukuyo, Noriko Takahashi, Mototsugu Kato, Shin Nishiumi, Takashi Azuma, Yoshitoshi Ogura, Tetsuya Hayashi, Atsushi Toyoda, Ichizo Kobayashi, Yoshio Yamaoka.
Abstract
Genome-wide association studies (GWASs) can reveal genetic variations associated with a phenotype in the absence of any hypothesis of candidate genes. The problem of false-positive sites linked with the responsible site might be bypassed in bacteria with a high homologous recombination rate, such as Helicobacter pylori, which causes gastric cancer. We conducted a small-sample GWAS (125 gastric cancer cases and 115 controls) followed by prediction of gastric cancer and control (duodenal ulcer) H. pylori strains. We identified 11 single nucleotide polymorphisms (eight amino acid changes) and three DNA motifs that, combined, allowed effective disease discrimination. They were often informative of the underlying molecular mechanisms, such as electric charge alteration at the ligand-binding pocket, alteration in subunit interaction, and mode-switching of DNA methylation. We also identified three novel virulence factors/oncoprotein candidates. These results provide both defined targets for further informatic and experimental analyses to gain insights into gastric cancer pathogenesis and a basis for identifying a set of biomarkers for distinguishing these H. pylori-related diseases.
Keywords: GWAS; Helicobacter pylori; duodenal ulcer; gastric cancer; population genomics; recombination
Year: 2021 PMID: 34846284 PMCID: PMC8743543 DOI: 10.1099/mgen.0.000680
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Fig. 1. Core-genome phylogeny and metadata of the 240 strains from an hspEAsia population. Left: mid-point rooted core-genome phylogeny. Heatmap: column 1, host disease status (DU or GC). Column 2: country of isolation. Columns 3–14 correspond to the genes carrying a nucleotide associated with GC. GC, gastric cancer; DU, duodenal ulcer.
Fig. 2. Q-Q plot to assess the GWAS results. Each dot in (a) indicates a SNP, whereas each dot in (b) indicates a unitig. Y-axis: observed -log10(P) of each SNP or unitig, where P is its P-value. X-axis: expected -log10(P) under the null hypothesis of no association. Non-synonymous and intergenic SNPs with P<10⁻⁴ that are associated with GC are shown in red and green, respectively. GC, gastric cancer; SNP, single nucleotide polymorphism.
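The axes of a Q-Q plot like this one can be computed as in the minimal sketch below. It assumes the common convention that, under the null hypothesis, the i-th smallest of n P-values is expected at i/(n+1); the source does not say which plotting convention the authors used, and the function name `qq_points` is hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs, sorted in ascending order.

    Assumes every P-value lies in (0, 1]; a P-value of 0 would make
    math.log10 raise ValueError.
    """
    n = len(p_values)
    # Observed values: transform each P-value, then sort ascending.
    observed = sorted(-math.log10(p) for p in p_values)
    # Under the null, P-values are uniform on (0, 1), so the i-th smallest
    # of n is expected at i / (n + 1) (an assumed convention).
    expected = sorted(-math.log10(i / (n + 1)) for i in range(1, n + 1))
    return list(zip(expected, observed))


# Toy P-values for illustration only; they are not data from the study.
points = qq_points([0.5, 0.01, 0.2])
```

Points that rise above the diagonal at the upper right are the tests whose observed -log10(P) exceeds the null expectation.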
Synonymous SNPs and less significant non-synonymous SNPs identified by GWAS.
| -log10(P) | Genomic position | Locus including or closest to the SNP | Gene name | Description | Function | Variant associated with gastric cancer | Position in the gene | Amino acid change | Corresponding locus in the strain 26695 |
|---|---|---|---|---|---|---|---|---|---|
| 4.7 | 485 026 | HPF57_0479 [483 859–485079] (-) | | bifunctional 2-C-methyl- | isoprenoid synthesis | T | 54 | – | HP1020 |
| 4.6 | 549 551 | HPF57_0538 [549 128–550432] (+) | | plasminogen binding protein | anti-immunity | C | 424 | – | HP0508 |
| 4.5 | 163 266 | HPF57_0158 [162 843–164288] (-) | | L-lactate dehydrogenase to pyruvate | anaerobic catabolism | C | 1023 | – | HP0138 |
| 4.5 | 713 996 | HPF57_0678 [711 939–714599] (+) | | outer membrane protein assembly factor; surface antigen D15 | assembly of outer membrane β-barrel proteins | C | 2058 | – | HP0655 |
| 4.4 | 101 839 | HPF57_0101 [101 587–103584] (-) | | RNA polymerase sigma factor σ70 | transcription initiation | C | 1746 | – | HP0088 |
| 4.2 | 389 519 | HPF57_0382 [389 375–390157] (-) | | NH3-dependent NAD+ synthetase | NAD synthesis | G | 639 | – | HP0329 |
| 4.2 | 598 098 | HPF57_0574 [594 999–598511] (+) | | cytotoxin-associated gene A | interference with signal transduction | G | 3100 | A1034T,X,S | HP0547 |
| 4.2 | 146 365 | HPF57_0139 [145 564–146508] (-) | | human C-terminal binding protein homolog | interference with gene expression? oncoprotein? | G | 144 | E48EDX | HP0096 |
| 4.1 | 1 102 964 | HPF57_1035 [1 102 345–1 103 568] (+) | | zinc-metallo protease acting on isoprenylated protein | | A | 620 | K207R | HP0382 |
| 4.1 | 1 292 278 | HPF57_1206 [1 292 110–1 293 132] (-) | | DNA polymerase III subunit delta | DNA replication | G | 855 | – | HP1247 |
| 4.1 | 1 111 855 | HPF57_1042 [1 111 772–1 112 203] (+) | | | haem interaction | C | 84 | – | HP0375 |
| 4.1 | 686 823 | HPF57_0653 [686 461–688509] (+) | | modification-specific restriction | restriction | C | 363 | – | no ortholog |
position in F57 reference genome.
Genetic variations associated with gastric cancer identified by GWAS.
| Type | -log10(P) | Genomic position | Locus including or closest to the SNP | Gene name | Description | Function | Variant associated with gastric cancer | Position in the gene | Amino acid change | Corresponding locus in the strain 26695 |
|---|---|---|---|---|---|---|---|---|---|---|
| SNP | 5.5 | 533 482 | 91 bp upstream of HPF57_0521 [532 255–533391] (-) | | potassium channel | potassium conductance regulator | G | | | HP0490 |
| SNP | 5.5 | 256 111 | HPF57_0250 [255 679–256476] (+) | | thiol:disulfide interchange protein | disulfide bond formation for secretion | A | 433 | K145E | HP0231 |
| SNP | 5.2 | 155 782 | HPF57_0151 [155 438–156307] (+) | | BIR, Dps/NapA, RAD21 similarity | host interference? | A | 345 | K115K,X | HP0130 |
| SNP | 5.1 | 96 796 | HPF57_0094 [95 423–96958] (-) | | chemotaxis sensor | chemotaxis | A | 163 | K55E,Q | HP0082 |
| SNP | 5.0 | 1 459 449 | HPF57_1382 [1 459 397–1460092] (+) | | outer membrane protein of OmpA family | uptake and outer membrane structure | T | 53 | V53A | HP1467 |
| SNP | 4.7 | 96 807 | HPF57_0094 [95 423–96958] (-) | | methyl-accepting chemotaxis sensor | chemotaxis | C | 152 | N51S | HP0082 |
| SNP | 4.6 | 1 434 839 | HPF57_1355 [1 434 577–1435371] (-) | | inactive Ser protease | inhibitor of proteases/chaperones? | G | 533 | G178E | HP1440 |
| SNP | 4.3 | 904 207 | 14 bp downstream of HPF57_0865 | | thiamine-phosphate synthase | supplying vitamin B1 | A | | | HP0776 |
| SNP | 4.2 | 1 296 088 | HPF57_1209 [1 295 855–1296457] (-) | | cell shape determinant | helical cell shape | A | 370 | N124H,Y,D | HP1250 |
| SNP | 4.2 | 871 135 | HPF57_0829 [870 769–873144] (-) | | iron importer in outer membrane | iron uptake | C | 2010 | S670X,S | HP0807 |
| SNP | 4.2 | 839 132 | 29 bp upstream of HPF57_0798 [838 879–839103] (-) | | RNA polymerase subunit omega | prophage silencing? stringent control? | A | | | HP0915 |
| motif | 6.9 | 498631–498661 | HPF57_0490 [497 344–498975] (-) | | Type I restriction enzyme M protein | DNA methyltransferase | not TAACGATAAC GATTTACACCT AAAGCTAGAC | 315–345 | multiple, e.g. D115D,X | HP0463 |
| motif | 5.6 | 1522425–1522455 | HPF57_1436 [1 522 369–1524234] (-) | | DNA recombinase | | not ATTGACTTAG CCAAAGATGA AAACATTATCG | 1780–1810 | multiple | HP1523 |
| motif | 5.5 | 978037–978077 | HPF57_0925 [978 055–980244] (-) | | iron importer in outer membrane | | TTGAAATTTC TTATAAGTTT TAATAATGGA TCTAAAAATGA | 2168 to C terminus, plus 18 bp downstream | multiple | HP_0915 |
position in F57 reference genome.
∗designated in this work.
isp, inactive serine protease; triH, triple halves.
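The "Position in the gene" and "Amino acid change" columns can be related to the genomic coordinates with simple arithmetic. The sketch below is an illustration, not the authors' pipeline: it assumes 1-based inclusive coordinates and counts from the 5' end of the coding strand, a convention inferred from several rows (for example HPF57_0250 and HPF57_0094) rather than stated in the source. The function names are hypothetical.

```python
def position_in_gene(genomic_pos, gene_start, gene_end, strand):
    """1-based nucleotide position within a gene, counted from its 5' end.

    Coordinates are assumed to be 1-based and inclusive, with
    gene_start < gene_end regardless of strand.
    """
    if not gene_start <= genomic_pos <= gene_end:
        raise ValueError("position lies outside the gene")
    if strand == "+":
        return genomic_pos - gene_start + 1
    if strand == "-":
        # On the reverse strand the gene begins at its highest coordinate.
        return gene_end - genomic_pos + 1
    raise ValueError("strand must be '+' or '-'")


def codon_number(pos_in_gene):
    """1-based codon (amino acid residue) containing a nucleotide position."""
    return (pos_in_gene - 1) // 3 + 1


# HPF57_0250 [255 679-256476] (+), SNP at 256 111: position 433, residue 145.
fwd = position_in_gene(256111, 255679, 256476, "+")
# HPF57_0094 [95 423-96958] (-), SNP at 96 796: position 163, residue 55.
rev = position_in_gene(96796, 95423, 96958, "-")
print(fwd, codon_number(fwd), rev, codon_number(rev))
```

For these two rows the results match the table entries K145E and K55E,Q.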
Fig. 3. Manhattan plot summarizing the GWAS results. The non-synonymous SNPs, intergenic SNPs, and DNA motifs associated with gastric cancer are colored in red, light green, and orange, respectively. The black dots correspond to all the other SNPs used in the SNP-GWAS.
Fig. 4. Predicted structures of proteins with discriminatory non-synonymous SNPs. (a) TlpC (HPF57_0094). (i) Model on the homolog of strain 26695 (PDB 5wbf) [38]. K55 in HPF57_0094 corresponds to E217 in HP0082, which is split into two genes in the Japanese reference strain F57. (ii)–(iii) Surface electric charges. E55 mutant protein was generated from the model by mutagenesis in silico (PyMOL). (iv) N51 has direct interaction with lactose. (v) S51 from mutagenesis (PyMOL). S51 is farther from lactose and would accommodate larger ligands. (b) HsdM (HPF57_0490). (i) Reaction steps of a Type I modification enzyme [41]. (ii) Model on 5ybb in PDB, two methyltransferase molecules, each 2M+1S, of . D115 corresponds to L95. (iii)–(v) Model on 2y7h in PDB, a model of EcoKI methyltransferase based on EMD-1534 [74]. D115 corresponds to R72. SNP, single nucleotide polymorphism.
Fig. 5. Predicted structures of three new virulence factor/oncoprotein candidates. (a) Isp (inactive Ser protease). (i) HPF57_1355 modelled on E. coli DegP. (ii) Active site of HPF57_1355 modelled on 3lgi.1 in PDB (E. coli DegS). The three amino acids (HDS triad) responsible for activity are all replaced. (iii)–(v) Surface electric charge distribution in E. coli DegS without PDZ [75] (3lgi.1 in PDB), HPF57_1355 modelled on it, and the E178G mutant generated in silico. (b) TriH (triple halves), HPF57_0151 (HP0130). (i) Map. ‘Disordered’ is from UniProt. Nuclear localization signal is by cNLS Mapper. ‘Diversifying selection’ is from a previous study [49]. (ii)–(iv) Similarity to three half domains. (iii) Modelled on NapA (strain YS39, 4evd in PDB) and aligned with iron-soaked NapA (YS39, 3ta8 in PDB). Fe-interacting residues as well as the GWAS residues are shown as sticks. The 2c6r in PDB is Dps2 in . Note the difference in NapA coordinates in the literature [48, 76]. (iv) HPF57_0151 modelled on PDB 4pjw (human).