
Comparative Genomics of H. pylori and Non-Pylori Helicobacter Species to Identify New Regions Associated with Its Pathogenicity and Adaptability.

De-Min Cao¹, Qun-Feng Lu², Song-Bo Li¹, Ju-Ping Wang³, Yu-Li Chen³, Yan-Qiang Huang³, Hong-Kai Bi⁴.

Abstract

The genus Helicobacter is a group of Gram-negative, helical-shaped pathogens consisting of at least 36 bacterial species. Helicobacter pylori (H. pylori), which infects more than 50% of the human population, is considered the major cause of gastritis, peptic ulcer, and gastric cancer. However, the genetic underpinnings of its large-scale epidemic spread and its adaptation to the human gastrointestinal environment remain unclear. Core-pan genome analysis was performed on 75 representative H. pylori genomes and 24 non-pylori Helicobacter genomes. There were 1173 conserved protein families among the H. pylori strains and 673 among all 99 Helicobacter strains. We found 79 unique genome regions, totaling 202,359 bp, that are shared by at least 80% of the H. pylori strains but absent from non-pylori Helicobacter species. The operons, genes, and sRNAs within these H. pylori unique regions were considered potentially associated with its pathogenicity and adaptability, and this association was partially confirmed by functional annotation analysis. However, the functions of at least 54 genes and 10 sRNAs remain unclear. Our protein-protein interaction analysis showed that 30 of these genes may cooperate with one another.

Substances：
Bacterial Proteins

Year: 2016 PMID: 28078297 PMCID: PMC5203880 DOI: 10.1155/2016/6106029

Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411

1. Introduction

H. pylori is a Gram-negative, spiral-shaped epsilon-proteobacterium. It colonizes about 50% of the world's human population, and as much as 80% in developing countries, making it one of the most successful pathogens [1, 2]. This bacterium can cause gastrointestinal diseases such as gastritis, peptic ulcer disease, gastric adenocarcinoma, and mucosa-associated lymphoid tissue (MALT) lymphoma [3-5]. As research has continued, a great number of non-pylori Helicobacter species (NPHS) inhabiting a wide variety of hosts, including humans, other mammals, and birds, have been found [6]. To date, at least 36 species of the genus Helicobacter have been described (http://www.bacterio.net/helicobacter.html), and Helicobacter strains have been detected in more than 142 vertebrate species [7]. Among them, H. pylori is the major human pathogen. Besides H. pylori, some NPHS have also been associated with human disease [8]. For instance, H. heilmannii, H. winghamensis, H. pullorum, and H. canis have been considered causative agents of stomach and intestinal diseases [9-11].

Many regions of the H. pylori genome involved in pathogenesis and adaptation to the host environment have been identified and studied. The well-known cag pathogenicity island, an approximately 40 kb DNA region that encodes a type IV secretion system (T4SS) and the effector protein cytotoxin-associated gene A (CagA), plays a significant role in pathogenicity [12, 13]. The urease encoded by the urease gene cluster catalyzes the hydrolysis of urea to ammonia and carbon dioxide; it is an important colonization factor and contributes to gastric acid resistance [14]. Vacuolating cytotoxin (VacA) is a pore-forming toxin implicated in altering host cell biology, including autophagy, apoptosis, cell vacuolation, and inhibition of T-cell proliferation [15-17].

In the past two decades, the genomes of H. pylori and NPHS have been widely sequenced, which gives us a broader field of view for studying their pathogenicity and adaptation mechanisms. Previous studies indicated that H. pylori has a high rate of gene recombination and unusual genetic flexibility, and those traits were considered helpful for adaptation to a dynamic environment [18, 19]. Even though many of its virulence factors have been studied, how the essential genome components of H. pylori give rise to its large-scale epidemic spread and its adaptation to the human gastrointestinal environment remains to be elucidated. In this study, comparative whole-genome analysis was used to reveal the general characteristics of the genus Helicobacter [20]. H. pylori and NPHS genomes available in public databases were used in the analysis. We aimed to identify potential regions of H. pylori genomes that are responsible for its epidemicity and adaptability. In addition, comparative genome analysis among Helicobacter species can give comprehensive insight into the genomic diversity of each species and help clarify the relationships among them.

2. Materials and Methods

2.1. Data Selection and Management

The genus Helicobacter comprises at least 36 species, of which H. pylori is the most medically important. Multiple complete genomes are available in public databases; in this study the genomic data were acquired from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/). Ninety-nine genomes were selected: 75 complete H. pylori genomes and 24 NPHS genomes belonging to 20 species (those released at the time of analysis). To ensure the accuracy and consistency of the initial data, the chromosome, plasmids, and scaffolds of each candidate strain were concatenated with the spacer sequence “NNNNNCATTCCATTCATTAATTAATTAATGAATGAATGNNNNN” to build a pseudochromosome for further analysis [21]. To obtain a consistent dataset and avoid discrepancies caused by the different gene prediction methods applied in different projects, a single gene-finding program, Glimmer version 3.02 [22], was used to predict open reading frames (ORFs). ORFs whose start or end position fell inside the spacer sequence were removed. The predicted results were cross-checked against the annotation in the source databases. The program RNAmmer-1.2 [23] was used to predict full-length rRNA gene sequences. The size, GC content, number of genes, source, and other characteristics of all selected genomes are listed in Table 1.
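The concatenation and ORF filtering steps can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the 0-based half-open coordinate convention, and the toy sequences are assumptions made for illustration; only the spacer sequence is taken from the text.

```python
# Spacer sequence quoted in the Methods; everything else here is illustrative.
LINKER = "NNNNNCATTCCATTCATTAATTAATTAATGAATGAATGNNNNN"


def build_pseudochromosome(replicons):
    """Join replicon sequences with the spacer.

    Returns the pseudochromosome and the spacer intervals as
    0-based, half-open (start, end) tuples.
    """
    parts = []
    linker_spans = []
    pos = 0
    for i, seq in enumerate(replicons):
        if i > 0:
            linker_spans.append((pos, pos + len(LINKER)))
            parts.append(LINKER)
            pos += len(LINKER)
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), linker_spans


def drop_orfs_in_linker(orfs, linker_spans):
    """Keep ORFs (start, end) whose first and last base both lie outside every spacer."""
    def inside(position):
        return any(start <= position < end for start, end in linker_spans)

    return [(s, e) for s, e in orfs if not inside(s) and not inside(e - 1)]


if __name__ == "__main__":
    pseudo, spans = build_pseudochromosome(["ACGT" * 5, "TTGA" * 5])
    print(len(pseudo), spans)
    print(drop_orfs_in_linker([(0, 15), (10, 30), (65, 80)], spans))
```

In practice the ORF coordinates would come from the Glimmer output for the pseudochromosome; parsing that output is left out here.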

Table 1

Helicobacter species genomic information used in the present study.

Organism	Size (bp)	GC (%)	Scaffolds	Plasmids	CDS	rRNA	tRNA	Natural host
H. acinonychis str. Sheeba	1557588	38.17	1	1	1706	6	36	Cheetah, tiger
H. ailurogastricus	1578404	47.05	9	0	1633	3	36	Feline
H. bilis ATCC 43879	2530521	34.7	9	0	2728	3	36	Mice
H. bilis WiWa	2559659	34.68	17	0	2751	9	40	Mice
H. bizzozeronii CIII-1 *	1807534	45.66	1	1	1998	6	36	Dog, cat
H. canadensis MIT 98-5491 *	1623845	33.69	1	0	1624	9	40	Barnacle, geese, rodent
H. canis NCTC 12740 *	1932823	44.82	1	0	1914	6	39	Dogs
H. cetorum MIT 00-7128	1960111	34.53	1	1	1897	6	38	Dolphin, whale
H. cetorum MIT 99-5656	1847790	35.54	1	1	1852	6	36	Dolphin, whale
H. cinaedi CCUG 18818 ATCC BAA-847 *	2240130	38.34	1	0	2510	6	39	Human
H. cinaedi PAGU611 *	2101402	38.55	1	1	2329	6	39	Human
H. felis ATCC 49179 *	1672681	44.51	1	0	1776	5	36	Cat, dog, rabbit, cheetah
H. fennelliae MRY12-0050 *	2155647	37.9	49	0	2503	3	38	Human
H. heilmannii ASB1.4 *	1804601	47.38	1	0	2113	7	41	Human
H. hepaticus ATCC 51449	1799146	35.93	1	0	1863	3	37	Mice
H. himalayensis strain YS1	1829936	39.89	1	0	1896	6	39	Marmota himalayana
H. macacae MIT 99-5501	2369528	40.41	4	0	2669	6	39	Macaques
H. mustelae 12198	1578097	42.47	1	0	1745	6	38	Ferret
H. pametensis ATCC 51478	1435066	40.08	11	0	1432	8	38	Pig, bird
H. pullorum 229313-12 *	1691799	34.56	60	0	1754	3	36	Poultry
H. pullorum MIT 98-5489 *	1951667	33.58	44	0	2105	3	36	Poultry
H. suis HS1 *	1635292	39.91	136	0	1814	5	38	Pig, macaque
H. typhlonius	1920832	38.85	1	0	2109	6	39	Mouse
H. winghamensis ATCC BAA-430 *	1690216	34.74	21	0	1742	3	36	Human liver
H. pylori 2017	1548238	39.3	1	0	1595	3	36	Human
H. pylori 2018	1562832	39.29	1	0	1604	3	36	Human
H. pylori 26695-1CH	1667302	38.87	1	0	1667	7	36	Human
H. pylori 26695-1CL	1667239	38.87	1	0	1667	7	36	Human
H. pylori 26695-1	1667638	38.87	1	0	1667	7	36	Human
H. pylori 26695-1MET	1667303	38.87	1	0	1669	7	36	Human
H. pylori 26695	1667867	38.87	1	0	1681	7	36	Human
H. pylori 29CaP	1667159	38.81	1	0	1704	7	36	Human
H. pylori 35A	1566655	38.87	1	0	1583	6	36	Human
H. pylori 51	1589954	38.77	1	0	1606	6	36	Human
H. pylori 52	1568826	38.94	1	0	1578	6	36	Human
H. pylori 7C	1631276	39.01	1	1	1627	7	36	Human
H. pylori 83	1617426	38.72	1	0	1634	6	36	Human
H. pylori 908	1549666	39.3	1	0	1605	3	36	Human
H. pylori Aklavik117	1636125	38.73	1	2	1607	6	36	Human
H. pylori Aklavik86	1507930	39.21	1	2	1487	6	36	Human
H. pylori B38	1576758	39.16	1	0	1582	7	36	Human
H. pylori B8	1680029	38.78	1	1	1673	6	36	Human
H. pylori BM012A	1660425	38.88	1	0	1679	7	36	Human
H. pylori BM012B	1659060	38.88	1	0	1676	7	36	Human
H. pylori BM012S	1660469	38.88	1	0	1683	7	36	Human
H. pylori BM013A	1604233	38.96	1	0	1584	7	36	Human
H. pylori BM013B	1604212	38.96	1	0	1586	7	36	Human
H. pylori Cuz20	1635449	38.86	1	0	1616	6	36	Human
H. pylori ELS37	1669876	38.88	1	1	1676	6	36	Human
H. pylori F16	1575399	38.88	1	0	1593	6	36	Human
H. pylori F30	1579693	38.8	1	1	1582	6	36	Human
H. pylori F32	1581461	38.86	1	1	1587	6	36	Human
H. pylori F57	1609006	38.73	1	0	1619	6	36	Human
H. pylori G27	1663013	38.87	1	1	1672	7	36	Human
H. pylori Gambia9424	1712468	39.12	1	1	1694	6	36	Human
H. pylori Hp238	1586473	38.7	1	0	1616	5	36	Human
H. pylori HPAG1	1605736	39.07	1	1	1595	6	36	Human
H. pylori HUP-B14	1607584	39.04	1	1	1597	6	36	Human
H. pylori India7	1675918	38.9	1	0	1664	6	36	Human
H. pylori J166	1650561	38.93	1	0	1630	6	36	Human
H. pylori J99	1643831	39.19	1	0	1629	6	36	Human
H. pylori Lithuania75	1640673	38.87	1	1	1659	6	36	Human
H. pylori ML1	1629815	38.69	1	0	1701	6	36	Human
H. pylori ML2	1562125	38.92	1	0	1764	6	36	Human
H. pylori ML3	1635334	38.64	1	1	1744	4	36	Human
H. pylori NY40	1696917	38.81	1	0	1751	6	36	Human
H. pylori OK113	1616617	38.73	1	0	1649	6	36	Human
H. pylori OK310	1595436	38.77	1	1	1595	6	36	Human
H. pylori oki102	1633212	38.81	1	0	1630	6	36	Human
H. pylori oki112	1637925	38.81	1	0	1635	6	36	Human
H. pylori oki128	1553826	38.97	1	0	1565	6	36	Human
H. pylori oki154	1599700	38.8	1	0	1626	6	36	Human
H. pylori oki422	1634852	38.83	1	0	1641	6	36	Human
H. pylori oki673	1595058	38.82	1	0	1623	6	36	Human
H. pylori oki828	1600345	38.8	1	0	1618	6	36	Human
H. pylori oki898	1634875	38.83	1	0	1612	6	36	Human
H. pylori P12	1684038	38.79	1	1	1688	6	36	Human
H. pylori PeCan18	1660685	39.02	1	0	1629	6	36	Human
H. pylori PeCan4	1638269	38.91	1	1	1622	6	36	Human
H. pylori Puno120	1637762	38.9	1	1	1617	6	36	Human
H. pylori Puno135	1646139	38.82	1	0	1616	6	36	Human
H. pylori Rif1	1667883	38.87	1	0	1678	7	36	Human
H. pylori Rif2	1667890	38.87	1	0	1674	7	36	Human
H. pylori Sat464	1567570	39.09	1	1	1553	6	36	Human
H. pylori Shi112	1663456	38.77	1	0	1651	6	36	Human
H. pylori Shi169	1616909	38.86	1	0	1593	6	36	Human
H. pylori Shi417	1665719	38.77	1	0	1623	6	36	Human
H. pylori Shi470	1608548	38.91	1	0	1612	6	36	Human
H. pylori SJM180	1658051	38.9	1	0	1640	6	36	Human
H. pylori SNT49	1610830	39	1	1	1599	6	36	Human
H. pylori SouthAfrica20	1622903	38.57	1	0	1701	6	36	Human
H. pylori SouthAfrica7	1679829	38.42	1	1	1689	6	36	Human
H. pylori UM032	1593537	38.82	1	0	1613	6	36	Human
H. pylori UM037	1692794	38.89	1	0	1708	6	36	Human
H. pylori UM066	1658047	38.62	1	0	1651	6	36	Human
H. pylori UM298	1594544	38.82	1	0	1618	6	36	Human
H. pylori UM299	1594569	38.82	1	0	1617	6	36	Human
H. pylori v225d	1595604	38.94	1	1	1608	6	36	Human
H. pylori XZ274	1656544	38.57	1	1	1798	7	36	Human

Note: (1) Strains marked with an asterisk are NPHS associated with gastric disease in humans.

(2) Latin name, genome size, GC-content, scaffolds number, plasmid number, information of genes, and natural host are listed.

2.2. Phylogenetic Analysis of 16S rRNA

To better understand the phylogenetic relationships among Helicobacter species, a phylogenetic tree was constructed from the 16S rRNA genes obtained from the 99 genomes, with Campylobacter jejuni and Campylobacter fetus as outgroups. Multiple sequence alignment of the 101 16S rRNA genes was performed using MAFFT version 7.123b [24]. The phylogenetic tree was inferred by the Neighbor-Joining method [25] using MEGA7 [26], and the consensus tree was estimated from 1000 bootstrap replicates.

2.3. Cluster Analysis of Core and Pan Genome

Orthologous group analyses were performed with OrthoMCL version 2.0.9 [27], which generates a similarity matrix normalized by the species relationships of the sequences and then groups the sequences using the Markov Clustering Algorithm (MCL) [28]. All-against-all BLASTP comparisons of the protein dataset provided the sequence pairs used as input to OrthoMCL. Hits were retained with an E-value cutoff of 1e-5 and an aligned length covering more than 50% of the query sequence. A family matrix, generated from pairwise comparison of the gene contents of any two genomes, was visualized. The gene families obtained from OrthoMCL were used to derive the core and pan genome datasets. The number of unique genes and gene families for each individual species relative to the other 98 genomes was calculated and visualized as a bar graph.
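The hit-filtering criteria can be written down as a short Python sketch. The `Hit` record and the `passes_filter` function are hypothetical names, not part of OrthoMCL or BLAST; the thresholds are the ones stated above, and coverage is assumed to mean aligned length divided by query length.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLASTP hit, reduced to the fields the filter needs (illustrative)."""
    query: str
    subject: str
    evalue: float
    aligned_len: int
    query_len: int


def passes_filter(hit, max_evalue=1e-5, min_coverage=0.5):
    """True if the hit meets the E-value cutoff and covers more than half of the query."""
    coverage = hit.aligned_len / hit.query_len
    return hit.evalue <= max_evalue and coverage > min_coverage


if __name__ == "__main__":
    hits = [
        Hit("q1", "s1", 1e-20, 80, 100),  # kept
        Hit("q1", "s2", 1e-3, 80, 100),   # E-value too high
        Hit("q1", "s3", 1e-20, 50, 100),  # coverage exactly 50%, not more
    ]
    print([h.subject for h in hits if passes_filter(h)])
```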

2.4. Functional Classification of the Core and Accessory Genome

The dataset was divided into three groups: the 75 H. pylori genomes alone, the 24 NPHS genomes alone, and all 99 Helicobacter genomes. For the core and accessory genomes of each group, functional annotation and categorization were performed by BLASTP searches against the Clusters of Orthologous Groups database (COGs, 2014 update, https://www.ncbi.nlm.nih.gov/COG/) [29, 30]. The percentage of each functional category was illustrated with a bar chart. All heatmaps and bar charts were plotted in R (https://www.r-project.org/).
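As a rough illustration of how per-category percentages might be tabulated, the Python sketch below counts one COG letter per protein family. The function name and input layout are assumptions; following the Results, families without a COG hit are pooled with category S (function unknown).

```python
from collections import Counter


def category_percentages(assignments):
    """Percentage of families per COG category.

    `assignments` holds one COG letter per family, or None when no COG hit
    was found; unassigned families are counted under "S" (function unknown).
    """
    counts = Counter(cat if cat is not None else "S" for cat in assignments)
    total = sum(counts.values())
    return {cat: 100.0 * n / total for cat, n in counts.items()}


if __name__ == "__main__":
    toy = ["J", "J", "M", None]  # toy data, not from the study
    for cat, pct in sorted(category_percentages(toy).items()):
        print(cat, f"{pct:.1f}%")
```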

2.5. Unique Regions Analysis of H. pylori

Each genome was aligned to H. pylori 26695 using BLASTN. Genome regions shared by at least 80% of the H. pylori strains and absent from NPHS were then detected with a Perl script. Only unique regions longer than 200 bp were considered. If the gap between two adjacent unique regions was shorter than 300 bp, the gap was regarded as part of the unique region. DOOR (Database for prOkaryotic OpeRons) [31] was used to predict the operons of the H. pylori 26695 genome. The virulence factor database (VFDB) [32], COG database [29], InterProScan [33], and the nonredundant (NR) protein database [34] were used to annotate and predict the functions of the genes within the target regions. Furthermore, Pfam [35], KEGG [36], GO [37], and TrEMBL [38] were used to learn more about the putative functions of the hypothetical proteins. Small noncoding RNAs (sRNAs) are ubiquitous regulators found in all living organisms. They affect various biological processes by interacting with mRNA targets or binding to regulatory proteins [39, 40]. RNAspace.org (http://RNAspace.org/), a comprehensive ncRNA prediction and annotation tool [41], was used to predict the ncRNAs of H. pylori, and those contained in the unique regions of H. pylori (URHP) were then identified. The results were visualized with the BLAST Ring Image Generator (BRIG) [42]. Five H. pylori strains, 26695, Cuz20, J99, PeCan4, and SouthAfrica7, were drawn on the inner rings to represent the H. pylori species. The URHP were drawn on the outer ring, and the twenty-four NPHS genomes were drawn between them.
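The region-calling rules can be sketched in Python as follows. This is not the authors' Perl script: the function names and the 0-based half-open interval convention are assumptions, and the order of operations (merge neighbouring segments first, then apply the length filter) is one plausible reading of the description.

```python
def is_candidate(hp_hits, hp_total, nphs_hits, min_fraction=0.8):
    """A segment qualifies if at least 80% of H. pylori genomes carry it and no NPHS genome does."""
    return hp_hits / hp_total >= min_fraction and nphs_hits == 0


def merge_and_filter(intervals, max_gap=300, min_len=200):
    """Merge candidate segments separated by fewer than `max_gap` bp,
    then keep merged regions longer than `min_len` bp.

    Intervals are (start, end) on the reference genome, 0-based and half-open.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] < max_gap:
            # Close enough to the previous segment: absorb the gap.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged if e - s > min_len]


if __name__ == "__main__":
    segments = [(0, 150), (400, 500), (1000, 1100), (1350, 1600), (3000, 3100)]
    print(merge_and_filter(segments))
```

Applying the length filter before merging would give different results for short segments that lie close together, so the order should be checked against the original script before reuse.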

2.6. Protein-Protein Interaction Network Analysis of URHP Proteins

To better understand the role of URHP proteins in H. pylori adaptation and pathogenicity, protein-protein interaction network analysis of the URHP proteins was carried out using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING version 10.0) [43]. The STRING database (http://string-db.org/) provides a critical assessment and integration of protein-protein interactions, including physical as well as functional associations.

3. Results and Discussion

3.1. Genome Statistics and Features

H. pylori was discovered by Warren and Marshall in 1983 and shown to be a pathogen that causes gastritis [44]. The genome of the important pathogenic strain H. pylori 26695 was completely sequenced in 1997 [45]. Altogether, ninety-nine genomes were used in this study (Table 1), including 75 complete H. pylori genomes and 24 NPHS genomes; plasmids were identified in 27 genomes. The NPHS genomes, which represent 20 Helicobacter species, include 11 completed genomes. The average genome size of all strains is 1,689,380 bp, ranging from 1,435,066 bp (H. pametensis ATCC 51478) to 2,559,659 bp (H. bilis WiWa). The genomes are relatively small and compact compared with those of other bacteria, which may indicate a specific adaptation to an obligate pathogenic lifestyle [46, 47]. The genus has a low GC content, averaging 38.91% and ranging from 33.58% (H. pullorum MIT 98-5489) to 47.38% (H. heilmannii ASB1.4). The average number of predicted protein-coding sequences is 1,730, ranging from 1,432 (H. pametensis ATCC 51478) to 2,751 (H. bilis WiWa). The hosts of the genus are highly varied. All the H. pylori strains, as well as H. cinaedi, H. fennelliae, H. heilmannii, and H. winghamensis, were originally isolated from humans. The natural hosts of H. canis, H. bizzozeronii, H. canadensis, H. felis, H. pullorum, and H. suis are mammals or birds, including pigs, cats, dogs, and geese; these six NPHS have also been associated with gastric disease in humans [48-51]. H. acinonychis, H. ailurogastricus, H. bilis, H. cetorum, H. hepaticus, H. himalayensis, H. macacae, H. mustelae, H. pametensis, and H. typhlonius have been isolated only from nonhuman sources and have not been reported in human infections [52-54].

3.2. Phylogenetic Analysis of 16S rRNA

Helicobacter species have a wide range of hosts, and H. pylori is one of the most prevalent pathogenic bacteria, having comigrated and coevolved with humans all around the world [55]. Each Helicobacter species has its own specific or broad host range, or even survives only in particular host organs [56], suggesting that each has reached an adaptive balance with its hosts. To better understand the pattern of evolution in this genus, a phylogenetic tree based on 16S rRNA was constructed for the 99 Helicobacter strains, with Campylobacter fetus and Campylobacter jejuni as outgroups. After multiple alignment, common gaps and missing data were masked, leaving 1,489 bp in each aligned sequence of the final dataset. As shown in Figure 1, H. acinonychis and H. cetorum, whose natural hosts are big cats and aquatic mammals, respectively, are the species closest to H. pylori, and the H. pylori strains are very closely related to one another.

Figure 1

16S rRNA phylogenetic tree of 99 Helicobacter strains and 2 Campylobacter species, constructed with the Neighbor-Joining (NJ) algorithm. The sum of branch lengths of the optimal tree is 0.47957369. The evolutionary distances were computed using the p-distance method.
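For readers unfamiliar with the p-distance, it is simply the proportion of compared alignment sites at which two sequences differ. The Python sketch below illustrates the definition only; it is not how MEGA7 is invoked, and skipping any site where either sequence has a gap (pairwise deletion) is an assumption, since the text states only that common gaps and missing data were masked.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped (pairwise deletion,
    an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


if __name__ == "__main__":
    # One mismatch over eight compared sites.
    print(p_distance("ACGTACGT", "ACGTACGA"))
```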

3.3. Homologous Proteome Analysis by Pairwise Comparisons

The complete set of predicted proteins (proteome) of each strain was compared with that of every other strain to estimate the proportion of proteins they share. Homology between any two different proteomes ranged from 43.71% (H. heilmannii ASB1.4 versus H. bilis ATCC 43879) to 99.87% (H. pylori BM013A versus H. pylori BM013B) and was generally above 80% among H. pylori strains (Figure 2). H. acinonychis (average 81.7%) and H. cetorum (average 75.59%) had the highest similarity to H. pylori. These relationships are consistent with the phylogenetic tree. Internal homology, that is, homology of a proteome against itself, ranged from 1.45% (H. pullorum 229313-12) to 9.52% (H. heilmannii ASB1.4) with an average of 3.50%, which indicates that the genomes of this genus have low redundancy.

Figure 2

Analysis of homologous proteins between proteomes (orthologous) and within proteomes (paralogous) in Helicobacter species. The blocks on the diagonal represent paralogous data and the others represent orthologous data. The percentages of orthologous and paralogous proteins are shown in red and green, respectively, and the similarity is indicated by the depth of color. The number of homologs and the percentage of similarity between or within proteomes are shown in the corresponding block.

3.4. Core-Pan Genome Analysis

The core genome, which is responsible for basic life processes and major phenotypic characteristics, is composed of the gene families shared by all Helicobacter strains. The pan genome is the full set of gene families present in any Helicobacter strain. The pan genome of the 75 H. pylori genomes contains 4,409 gene families, with an average of about 39 new gene families added with each additional genome. This rate of pan genome growth is almost the same as that reported by Ali et al., whose sample comprised 39 genomes [57]. For the 24 NPHS genomes alone, the pan genome size is 12,010, including 4,412 singleton genes. When all NPHS and H. pylori genomes were used, the pan genome size increased sharply to 14,686, including 8,243 singleton genes, more than three times the size of the pan genome of the 75 H. pylori genomes. These results suggest that the genomes of Helicobacter species are open and diverse. The core genome size, in contrast, is relatively stable. There are 1,173 gene families shared by all the H. pylori genomes, which represent more than 74% of their average gene family content (~1,565). For the NPHS genomes alone, the core genome size is 682, almost the same as the size (673) for the H. pylori and NPHS genomes together. The core genome of H. pylori is thus markedly larger than that of the NPHS. This may indicate that the gene families shared only by H. pylori strains are highly relevant to their adaptation to a unique living environment, their pathogenicity, and their epidemic spread. The numbers of unique genes and gene families of each individual species relative to all 99 genomes were also estimated (Figure 3). H. macacae MIT 99-5501 has the largest numbers of unique genes and gene families, 1,016 and 964, respectively; the unique genes account for 38.07% of its gene content. The number of unique genes in each H. pylori strain is relatively small, probably because many closely related H. pylori genomes were compared with each other. For example, the H. pylori BM013A and BM013B genomes are highly similar, so few genes are unique to either of them. For the NPHS, the average numbers of unique genes and gene families are 325 and 303, respectively. This again implies marked genomic plasticity among Helicobacter species living in different habitats and possessing diverse lifestyles.
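The core and pan genome definitions used here reduce to set operations on gene-family membership. The Python sketch below is illustrative only: the function name, the input layout (genome name mapped to a set of family identifiers, as could be derived from OrthoMCL groups), and the toy data are assumptions, and "genome-specific" here means a family found in exactly one genome, which may not match the paper's exact definition of singleton or unique genes.

```python
from collections import Counter


def core_pan(families_by_genome):
    """Return (core, pan, genome_specific) family sets.

    core: families present in every genome
    pan: families present in at least one genome
    genome_specific: families present in exactly one genome
    """
    sets = [set(fams) for fams in families_by_genome.values()]
    pan = set().union(*sets)
    core = set.intersection(*sets)
    counts = Counter(fam for fams in sets for fam in fams)
    genome_specific = {fam for fam, n in counts.items() if n == 1}
    return core, pan, genome_specific


if __name__ == "__main__":
    toy = {
        "genome1": {"A", "B", "C"},
        "genome2": {"A", "B", "D"},
        "genome3": {"A", "E"},
    }
    core, pan, specific = core_pan(toy)
    accessory = pan - core
    print(len(core), len(pan), len(accessory), sorted(specific))
```

The accessory genome is then the pan genome minus the core genome, which is consistent with the sizes reported in the text (for example, 14,686 - 673 = 14,013 for all 99 genomes).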

Figure 3

The number of unique genes and gene families for each individual species relative to all 99 genomes. Orange and turquoise bar graphs represent unique genes and gene families for each individual species, respectively.

3.5. COG Category of Core Genome and Accessory Genome

The core and accessory genomes of the 99 Helicobacter strains comprised 673 and 14,013 protein families, respectively. For the 75 H. pylori genomes alone, the core and accessory genome sizes were 1,173 and 3,236; for the 24 NPHS genomes alone, they were 682 and 11,328. COG category analysis of these six datasets identified the possible functions of their gene clusters and subdivided them into 23 subcategories. Unassigned gene clusters were placed in the same class as those of unknown function (Figure 4). In the three core genome datasets, more than 90% of the protein clusters were assigned to a COG functional category. In the three accessory datasets, however, only 28.1% of the protein clusters on average were assigned, suggesting that plenty of proteins without clear biological functions remain to be studied.

Figure 4

Functional classification of core genome and accessory genome by COG database. Core genome and accessory genome of 99 Helicobacter genomes and core genome and accessory genome of 75 H. pylori genomes, along with core genome and accessory genome of 24 NPHS genomes are shown using different colors, respectively.

As expected, a large proportion of the protein clusters belonging to the core genome were assigned to housekeeping functions. In the core genome of the 99 Helicobacter strains, translation, ribosomal structure, and biogenesis (category J) and cell wall/membrane/envelope biogenesis (category M) account for 17.26% and 9.65%, respectively, far more than in the accessory genome. In contrast, for extracellular structures (category W), mobilome: prophages and transposons (category X), and defense mechanisms (category V), the proportion in the accessory genome is greater than in the core genome. Most of these protein clusters are closely related to the interaction of the strains with their living environment [58-60]. For instance, type IV pilus (TFP) assembly proteins (category W) are important components of the TFP, which helps H. pylori colonize [61]; multiple transposase genes (category X), which can mediate transposition and antibiotic resistance, are also important for creating genetic diversity within species and adaptability to dynamic living conditions [62]; and ABC-type multidrug transport system proteins (category V) contribute to drug resistance [63]. In addition, the poorly characterized portion, which accounts for more than 70% of the accessory genome, may be involved in specific adaptations that help Helicobacter species survive in novel environments.

3.6. Identification of H. pylori Unique Regions

A reasonable hypothesis often made in studying bacterial evolution is that the host-specific adaptations displayed by a bacterial species are correlated with regions and genes specific to that species [64]. In this study, seventy-nine sequence segments, with a total length of 202,359 bp (about 12.4% of the H. pylori genome), were identified as unique regions. These regions are shared by H. pylori strains but absent from NPHS. The lengths of the unique regions range from 211 bp to 27,269 bp, with a median of 1,502 bp. They contain a total of 155 genes. These genes were functionally annotated with VFDB, the COG database, InterProScan, and the NR database. The results were then integrated (Table S1, in Supplementary Material available online at http://dx.doi.org/10.1155/2016/6106029) and classified into functional categories (Figure 5). In addition, a total of 28 sRNAs were identified within the URHP (Table S2).

Figure 5

Regions conserved in H. pylori and absent from NPHS. From inside to outside, rings 1 and 2 are the GC content and GC skew of the reference genome H. pylori 26695, respectively; rings 3–7 represent H. pylori strains, while rings 8–31 represent NPHS. The depth of color of rings 3–31 indicates the sequence similarity. The outer ring shows the regions unique to H. pylori and absent from NPHS. Inside the outer ring, different colors represent different function categories: purple: Cag-PAI; blue: membrane genes; green: transport and metabolism genes; gray: cell growth, division, and basic metabolism; aqua: other functional genes; black: hypothetical genes; red: sRNAs.

In the circular map, the largest H. pylori unique region, UR_26, which contains 28 genes, is clearly visible. Each unique region contains about two genes on average, and about 82.3% of the unique regions contain two genes or fewer. Operons, the basic units of transcription and cellular function, are widespread in the H. pylori genome [65]. Within H. pylori, sixty unique genes, more than three quarters, are contained in nineteen unique polycistrons. Twenty-three polycistrons are located partly in URHP, in addition to seventy-one monocistrons (Table S1). Known acid-induced adaptability and virulence operons of H. pylori, such as the cag pathogenicity island and those encoding a transcriptional regulator (tenA), catalase, and a membrane protein (hopT), are among them [65-67]. These results indicate that H. pylori can regulate the expression of these unique genes at the operon level in response to environmental conditions. A total of 101 genes within the URHP received a definite functional annotation from the four databases above. Unique region UR_26 corresponds to the T4SS, which delivers the effector protein cytotoxin-associated gene A (CagA) into gastric epithelial cells. The T4SS is reported to play a crucial role in the pathogenesis of gastric cancer [12, 60]. Besides the T4SS, the unique regions contain many genes that have been shown to correlate strongly with pathogenicity and adaptation. For instance, the membrane proteins babB/hopT, sabB/hopO, and sabA/hopP, among others, are involved in cell adhesion; these genes facilitate colonization by H. pylori and increase the immune response, resulting in enhanced mucosal inflammation [68-70]. Abundant restriction-modification (RM) system proteins have large effects on gene expression and genome maintenance and give H. pylori the ability to adapt to dynamic environmental conditions during long-term colonization [71]. ABC transporters, an MFS transporter, a sugar efflux transporter, a short-chain fatty acid transporter, and others are important virulence factors because they play roles in nutrient uptake and the secretion of toxins and antimicrobial agents, and they are important for interactions with complicated and changeable environments [72-74]. Even though the Pfam, KEGG, GO, and TrEMBL databases were also used for functional annotation, the other 54 genes, about a third of all URHP genes, still have no clear functional information.

Noncoding small RNAs act as posttranscriptional regulators that fine-tune important physiological processes in pathogens, allowing them to adapt to dynamic, intricate environments [75, 76]. To investigate the regulatory roles of the putative unique sRNAs, we mapped them to the genome of H. pylori 26695 [76]. Unexpectedly, eighteen of them match genes (Table S2). Ten sRNAs (SR1, SR2, SR6, SR15, SR20, SR21, SR22, SR23, SR13, and SR25) match perfectly with known acid-induced genes, including those encoding eight membrane proteins, DNA polymerase III subunits gamma and tau, and an adenine-specific DNA methyltransferase [67, 77]. In addition, SR5 matches HcpA, which is considered a virulence factor that triggers the release of a concerted set of cytokines to activate the inflammatory response [78]. The small CRISPR RNAs SR7 and SR18 are guides of the CRISPR-Cas system, which has been reported as a potential participant in bacterial stress responses and virulence [79]. Altogether, most of the operons, genes, and sRNAs within the URHP are closely associated with the adaptability or virulence of H. pylori. However, some of them have no definite functional information in current databases, which indicates that our genetic knowledge is still insufficient to fully explain the pathogenicity and adaptation mechanisms of H. pylori, and these genes of unknown function need further study.

3.7. Protein-Protein Interaction Network Analysis

The 155 URHP genes and the 54 genes of unknown function were each analyzed with STRING to build protein-protein interaction maps. As shown in Figure S1, a total of 125 genes were assigned to an interaction network. Two main interaction groups are apparent: one is the well-known cag pathogenicity island, and the other is centered on succinyl-CoA-3-ketoacid CoA transferase (encoded by scoA and scoB of operon UO_54), acetone carboxylase (encoded by C694_03570, C694_03590, and C694_03595 of operon UO_55), and acetyl-CoA acetyltransferase (encoded by C694_03555 of operon UO_54). The genes in the second group are involved in acetone metabolism. Brahmachary et al. showed that these genes play an important role in the survival and colonization of H. pylori in the gastric mucosa [80, 81]. Figure 6 shows a possible protein-protein interaction map of the 54 URHP genes of unknown function. Thirty proteins fell into two separate interaction networks, one containing 18 proteins and the other 12. These genes may act synergistically in the survival of H. pylori and are the most promising candidates for further exploring the common pathogenic behavior of this pathogen.
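Separating proteins into distinct interaction networks amounts to finding the connected components of the interaction graph. The Python sketch below shows that step on a toy edge list; it does not query STRING, the function name is hypothetical, and the protein identifiers are placeholders rather than data from the study.

```python
from collections import defaultdict


def connected_components(edges):
    """Group nodes of an undirected graph into connected components, largest first."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    components = []
    for node in list(graph):
        if node in seen:
            continue
        # Depth-first traversal from an unvisited node collects one component.
        stack = [node]
        seen.add(node)
        component = set()
        while stack:
            current = stack.pop()
            component.add(current)
            for neighbour in graph[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


if __name__ == "__main__":
    toy_edges = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
    for comp in connected_components(toy_edges):
        print(sorted(comp))
```

Proteins with no predicted interaction partner never appear in the edge list and so fall outside every component, which is one way 54 input genes could yield only 30 networked proteins.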

Figure 6

Protein-protein interaction networks of URHP genes of unknown function. Thirty proteins are shown in two interaction networks. Network nodes represent proteins and edges represent protein-protein associations. Different colors represent the types of evidence for the interaction.

4. Conclusions

H. pylori is an ancient, highly adaptable pathogen that has infected more than half of the human population. In this study, we presented a comparative genomic analysis of 75 representative complete H. pylori genomes and 24 NPHS genomes. Pan-genome analysis showed that both the Helicobacter genus as a whole and the H. pylori species alone have an open and diverse genome, which may result from different strains coping with their specific living conditions. The core genome, however, is relatively well conserved: we found 1173 conserved protein families for the 75 H. pylori strains and 673 for all 99 Helicobacter genus strains. The regions and genes that are conserved among H. pylori genomes but absent from NPHS genomes were considered potential targets associated with H. pylori pathogenicity and adaptation. Functional annotation of the 155 genes within the 79 URHP indicated that most of them are well-known pathogenicity- and adaptation-associated ones, such as the cag-pathogenicity island, babB, sabB, and ABC transporters, whereas the biological functions of 54 genes remain unclear. Protein-protein interaction network analysis showed that 30 of these could be assigned to two different interaction networks. In addition, functional analysis of the operons and sRNAs unique to H. pylori showed a close association between these genomic structures and its pathogenicity and adaptation. All the URHP, especially the components whose functions remain unclear, are potential candidates for further study of the mechanisms underlying the widespread epidemic and pathogenicity of H. pylori. The analysis tools and pipeline used in this study could also serve as a reference for other species. Supplemental Information includes: Table S1, unique regions of H. pylori (URHP) and functional annotations of the related genes; Table S2, sRNAs shared by all H. pylori but absent from NPHS; Fig. S1, protein-protein interaction networks of the 155 URHP genes.

77 in total

1. RNAspace.org: An integrated environment for the prediction, annotation, and analysis of ncRNA.

Authors: Marie-Josée Cros; Antoine de Monte; Jérôme Mariette; Philippe Bardou; Benjamin Grenier-Boley; Daniel Gautheret; Hélène Touzet; Christine Gaspin
Journal: RNA Date: 2011-09-23 Impact factor: 4.942

2. The complete genome sequence of the gastric pathogen Helicobacter pylori.

Authors: J F Tomb; O White; A R Kerlavage; R A Clayton; G G Sutton; R D Fleischmann; K A Ketchum; H P Klenk; S Gill; B A Dougherty; K Nelson; J Quackenbush; L Zhou; E F Kirkness; S Peterson; B Loftus; D Richardson; R Dodson; H G Khalak; A Glodek; K McKenney; L M Fitzegerald; N Lee; M D Adams; E K Hickey; D E Berg; J D Gocayne; T R Utterback; J D Peterson; J M Kelley; M D Cotton; J M Weidman; C Fujii; C Bowman; L Watthey; E Wallin; W S Hayes; M Borodovsky; P D Karp; H O Smith; C M Fraser; J C Venter
Journal: Nature Date: 1997-08-07 Impact factor: 49.962

Review 3. Pathogenesis of Helicobacter pylori Infection.

Authors: Dionyssios N Sgouras; Tran Thi Huyen Trang; Yoshio Yamaoka
Journal: Helicobacter Date: 2015-09 Impact factor: 5.753

4. Characterization of the Helicobacter pylori cysteine-rich protein A as a T-helper cell type 1 polarizing agent.

Authors: Ludwig Deml; Michael Aigner; Jochen Decker; Alexander Eckhardt; Christian Schütz; Peer R E Mittl; Sascha Barabas; Stefanie Denk; Gertrud Knoll; Norbert Lehn; Wulf Schneider-Brachert
Journal: Infect Immun Date: 2005-08 Impact factor: 3.441

5. The neighbor-joining method: a new method for reconstructing phylogenetic trees.

Authors: N Saitou; M Nei
Journal: Mol Biol Evol Date: 1987-07 Impact factor: 16.240

Review 6. Helicobacter pylori VacA, a paradigm for toxin multifunctionality.

Authors: Timothy L Cover; Steven R Blanke
Journal: Nat Rev Microbiol Date: 2005-04 Impact factor: 60.633

Review 7. CagA-mediated pathogenesis of Helicobacter pylori.

Authors: Abolghasem Tohidpour
Journal: Microb Pathog Date: 2016-01-12 Impact factor: 3.738

Review 8. Gastric helicobacters in domestic animals and nonhuman primates and their significance for human health.

Authors: Freddy Haesebrouck; Frank Pasmans; Bram Flahou; Koen Chiers; Margo Baele; Tom Meyns; Annemie Decostere; Richard Ducatelle
Journal: Clin Microbiol Rev Date: 2009-04 Impact factor: 26.132

9. Persistent infection of rhesus monkeys with 'Helicobacter macacae' and its isolation from an animal with intestinal adenocarcinoma.

Authors: Robert P Marini; Sureshkumar Muthupalani; Zeli Shen; Ellen M Buckley; Cynthia Alvarado; Nancy S Taylor; Floyd E Dewhirst; Mark T Whary; Mary M Patterson; James G Fox
Journal: J Med Microbiol Date: 2010-04-22 Impact factor: 2.472

10. A comprehensive analysis of Helicobacter pylori plasticity zones reveals that they are integrating conjugative elements with intermediate integration specificity.

Authors: Wolfgang Fischer; Ute Breithaupt; Beate Kern; Stella I Smith; Carolin Spicher; Rainer Haas
Journal: BMC Genomics Date: 2014-04-27 Impact factor: 3.969
