Abstract
Flagellin, the agent of prokaryotic flagellar motion, is very widely distributed and is the H antigen of serology. Flagellin molecules have a variable region, encoded by the middle of the gene, that confers serotype specificity, and conserved regions encoded by the two ends of the gene. We collected all available prokaryotic flagellin protein sequences and found that variable-region diversity occurs at two levels. In each species investigated, there are hypervariable region (HVR) forms whose protein sequences share no detectable homology with one another. There is also considerable variation within HVR forms, indicating that some have been diverging for thousands of years and that interphylum horizontal gene transfers make a major contribution to the evolution of such atypical diversity.
IMPORTANCE Bacterial and archaeal flagellins are remarkable in having a shared region whose variation is comparable to that of housekeeping proteins and a region with extreme diversity, perhaps greater than that of any other protein. Analysis of the 113,285 available full-length flagellin gene sequences from published bacterial and archaeal sequences revealed the nature and enormous extent of flagellin diversity. There were 35,898 unique amino acid sequences, which were resolved into 187 clusters. Analysis of the Escherichia coli and Salmonella enterica flagellins revealed that the variation occurs at two levels. The first is the division of the variable regions into sequence forms so divergent that no meaningful alignment is possible even within a species; these forms correspond to the E. coli or S. enterica H-antigen groups. The second level is variation within these groups, which is extensive in both species. The shared (conserved) sequences would allow PCR amplification of the variable regions and thus strain-level analysis of microbiome DNA.
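As a minimal sketch of the first reduction step described above (collapsing many full-length gene records into a smaller set of unique amino acid sequences), the Python below counts identical protein sequences. The normalization rules and the toy sequences are assumptions for illustration only, not the authors' pipeline, and the later grouping into 187 clusters is not reproduced here.

```python
from collections import Counter


def collapse_unique(sequences):
    """Collapse identical protein sequences and count how often each occurs."""
    # Assumption: upper-casing, trimming whitespace, and dropping a trailing
    # stop symbol ("*") is enough to treat records of one protein as identical.
    normalized = (s.strip().upper().rstrip("*") for s in sequences)
    return Counter(normalized)


# Hypothetical toy fragments standing in for translated flagellin genes.
records = ["MAQVINTNSL", "maqvintnsl", "MAQVINTNSL*", "MSLIINTNVA"]
unique = collapse_unique(records)
print(len(records), "records ->", len(unique), "unique sequences")
```

Keeping a count per unique sequence, rather than a plain set, preserves how often each protein was observed, which is the kind of information needed for proportions such as the per-phylum pie segments in FIG 2.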
Keywords: evolution and diversity; hypervariable region of flagellin; prokaryotic flagellin
Year: 2020 PMID: 32047063 PMCID: PMC7018530 DOI: 10.1128/mSystems.00705-19
Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 6.496
FIG 1 Structure of typical flagellins. Typical crystal structures of an archaeal flagellin (from Methanospirillum hungatei, PDB accession no. 5TFY) (A) and a bacterial flagellin (from S. enterica serovar Typhimurium, PDB accession no. 3A5X) (B) are shown. In each panel, the flagellin is depicted on the left with color-coded secondary structures, and dashed boxes link each domain to the corresponding region of the amino acid sequence on the right. In panel B, the gene is also shown, with the two conserved regions colored gray and the variable region colored dark yellow.
FIG 2 Minimum spanning tree of all flagellin clusters. One bacterial subtree (A, left) and one archaeal subtree plus three minor groups of archaea (B, right) are shown as linked pie charts, each representing a flagellin cluster. In each pie chart, colored segments represent the phyla and their proportions within the cluster, and the area of the circle reflects the total number of unique sequences in the cluster. The names of clusters mentioned in the text are displayed in a large font. Twenty-five phyla, listed below, account for small proportions of the observed flagellins, so they are combined and shown in gray; proteins without taxonomic information are shown in white. The clusters including E. coli flagellins are circled in green. Branch thickness reflects the distance between clusters (see Materials and Methods). The rare phyla shown in gray are as follows: “Candidatus Glassbacteria,” “Candidatus Handelsmanbacteria,” “Candidatus Margulisbacteria,” Fusobacteria, candidate division NC10, Calditrichaeota, “Candidatus Omnitrophica,” Coprothermobacterota, Elusimicrobia, Rhodothermaeota, “Candidatus Latescibacteria,” “Candidatus Magasanikbacteria,” Chlamydiae, Balneolaeota, Cyanobacteria, “Candidatus Hydrogenedentes,” “Candidatus Raymondbacteria,” “Candidatus Kryptonia,” “Candidatus Marsarchaeota,” “Candidatus Wallbacteria,” “Candidatus Lindowbacteria,” “Candidatus Rokubacteria,” Gemmatimonadetes, Lentisphaerae, and Nanoarchaeota.
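The caption does not state how distances between clusters were computed (it defers to Materials and Methods), so the sketch below simply assumes a precomputed symmetric distance matrix and builds a minimum spanning tree from it with Prim's algorithm. The function name `minimum_spanning_tree` and the four-cluster toy matrix are hypothetical; this illustrates the type of tree drawn in FIG 2, not the software the authors used.

```python
import heapq


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense, symmetric distance matrix.

    Returns a list of (parent, child, distance) edges that connect all
    nodes with the smallest possible total distance.
    """
    n = len(dist)
    in_tree = [False] * n
    edges = []
    # Heap entries: (distance, candidate node, node already in the tree).
    # Start from node 0; -1 marks "no parent" for the root.
    heap = [(0, 0, -1)]
    while heap:
        d, node, parent = heapq.heappop(heap)
        if in_tree[node]:
            continue  # a shorter link to this node was already accepted
        in_tree[node] = True
        if parent != -1:
            edges.append((parent, node, d))
        # Offer every link from the new tree node to nodes not yet in the tree.
        for other in range(n):
            if not in_tree[other]:
                heapq.heappush(heap, (dist[node][other], other, node))
    return edges


# Hypothetical distances between four flagellin clusters.
cluster_distances = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
edges = minimum_spanning_tree(cluster_distances)
print(edges)
```

In a drawing like FIG 2, each returned edge would become a branch between two pie charts, with its distance mapped to branch thickness.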
FIG 3 Distribution of E. coli flagellins across strain phylogenetic groups. The heat map shows the relative numbers of E. coli genomes with each serotype on specific branches. Phylogenetic groups and serotypes are ordered according to the phylogenetic trees shown on the left and at the top, respectively (see Materials and Methods). Branches for major E. coli and Shigella phylogroups are highlighted in red and yellow, respectively. Morphotypes of each serotype are color coded, and the boxes representing the presence of each serotype use the same color scheme, with shading gradations indicating the relative numbers of genomes.
FIG 4 Diversity of flagellins in each phylum. For each phylum, the logarithm of the number of unique flagellin proteins is plotted against the logarithm of the total number of genome sequences on a two-dimensional scatter plot, with each circle color coded by phylum. The regression line indicates the average level of flagellin diversity within a phylum; it accounts for 98.21% of the variation and is highly statistically significant (P < 0.001). The gray band shows the 95% confidence interval of the regression line.
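A minimal sketch of the kind of fit summarized in FIG 4: ordinary least squares on log-transformed counts, reporting the slope, intercept, and R². It assumes that the 98.21% quoted in the caption is an R² value and uses base-10 logarithms (the slope and R² do not depend on the base as long as both axes use the same one, but the intercept does). The P value and the 95% confidence band are not computed, and the input data are synthetic.

```python
import math


def log_log_fit(genome_counts, unique_counts):
    """OLS fit of log10(unique flagellins) on log10(genomes).

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log10(g) for g in genome_counts]
    ys = [math.log10(u) for u in unique_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Synthetic per-phylum data following an exact power law: unique = genomes ** 0.5.
genomes = [10, 100, 1000, 10000]
unique_flagellins = [g ** 0.5 for g in genomes]
slope, intercept, r2 = log_log_fit(genomes, unique_flagellins)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.4f}")
```

On a log-log scale a slope below 1 would mean that the number of unique flagellins grows more slowly than the number of sequenced genomes; with real phylum data the points scatter around the line and R² falls below 1.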