Divergent Host-Microbe Interaction and Pathogenesis Proteins Detected in Recently Identified Liberibacter Species.

Allison K Hansen, Ariana N Sanchez, Younghwan Kwak.

Abstract

Candidatus (Ca.) Liberibacter taxa are economically important bacterial plant pathogens that are not culturable; however, genome-enabled insights can help us develop a deeper understanding of their host-microbe interactions and evolution. The draft genome of a recently identified Liberibacter taxon, Ca. Liberibacter capsica, was curated and annotated here with a total draft genome size of 1.1 Mb with 1,036 proteins, which is comparable to other Liberibacter species with complete genomes. A total of 459 orthologous clusters were identified among Ca. L. capsica, Ca. L. asiaticus, Ca. L. psyllaurous, Ca. L. americanus, Ca. L. africanus, and L. crescens, and the genes within these clusters consisted of housekeeping and environmental response functions. We estimated the rates of molecular evolution for each of the 443 one-to-one ortholog clusters and found that all Ca. L. capsica orthologous pairs were under purifying selection when the synonymous substitutions per synonymous site (dS) were not saturated. These results suggest that these genes are largely maintaining their conserved functions. We also identified the most divergent single-copy orthologous proteins in Ca. L. capsica by analyzing the ortholog pairs that represented the highest nonsynonymous substitutions per nonsynonymous site (dN) values for each pairwise comparison. From these analyses, we found that 21 proteins that are known to be involved in pathogenesis and host-microbe interactions, including the Tad pilus complex, were consistently divergent between Ca. L. capsica and the majority of other Liberibacter species. These results further our understanding of the evolutionary genetics of Ca. L. capsica and, more broadly, the evolution of Liberibacter.

IMPORTANCE "Candidatus" (Ca.) Liberibacter taxa are economically important plant pathogens vectored by insects; however, these host-dependent bacterial taxa are extremely difficult to study because they are unculturable. Recently, we identified a new Ca. Liberibacter lineage (Ca. Liberibacter capsica) from a rare insect metagenomic sample. In this current study, we report that the draft genome of Ca. Liberibacter capsica is similar in genome size and protein content to the other Ca. Liberibacter taxa. We provide evidence that many of their shared genes, which encode housekeeping and environmental response functions, are evolving under purifying selection, suggesting that these genes are maintaining similar functions. Our study also identifies 21 proteins that are rapidly accumulating amino acid changes in Ca. Liberibacter capsica compared to the majority of other Liberibacter taxa. Many of these proteins represent key genes involved in Liberibacter-host interactions and pathogenesis and are valuable candidate genes for future studies.

Keywords:  Ca. Liberibacter capsica; Candidatus (Ca.) Liberibacter; Tad pilus complex; horizontal gene transfer; purifying selection

Year:  2022        PMID: 35900091      PMCID: PMC9430466          DOI: 10.1128/spectrum.02091-22

Source DB:  PubMed          Journal:  Microbiol Spectr        ISSN: 2165-0497


INTRODUCTION

Microbiome sequencing during the last few decades has revealed two critical observations. First is that the overwhelming majority of microbes have never been cultured in a laboratory, and second is that this “unseen majority” have important ecological and economic impacts (1). For these fastidious microbes, genome-enabled technologies facilitate a deeper understanding of gene function, host-microbe interactions, and evolution (2). One group of microbes that are unculturable but extremely important to identify and understand at an evolutionary genetic level are the economically important group of plant pathogens that belong to the bacterial genus Liberibacter (3, 4). These bacteria are harbored and exclusively vectored by sap-sucking insects in the superfamily Psylloidea (5). In consequence, uncovering Liberibacter genes that are important for both insect and plant survival is important for elucidating how Liberibacter responds and interacts in both of its hosts’ environments (6). A total of nine Liberibacter species have been identified to date, and these include Candidatus (Ca.) Liberibacter capsica, Ca. L. psyllaurous, Ca. L. asiaticus, Ca. L. africanus, Ca. L. americanus, Ca. L. europaeus, Ca. L. brunswickensis, Ca. L. ctenarytainae, and L. crescens (5). Only one culturable Liberibacter species, L. crescens, has been identified and sequenced to date, and it is the most divergent species within the Liberibacter genus (7). The Liberibacter taxa Ca. L. africanus, Ca. L. americanus, and especially Ca. L. asiaticus are all associated with the devastating disease of citrus called Huanglongbing (also known as citrus greening disease) (8), and Ca. L. psyllaurous widely infects plants within the families Solanaceae (9), Apiaceae (10), and Urticaceae (11) and is associated with psyllid yellows (9) and zebra chip disease (12) in the economically important crop of potato. 
In addition to not being culturable, Candidatus Liberibacter species are hard to sequence with high coverage because these fastidious microbes need to be sequenced in either their psyllid or plant host tissues (13). Nevertheless, four unculturable Liberibacter species have been fully sequenced to date: Ca. L. asiaticus, Ca. L. psyllaurous, Ca. L. americanus, and Ca. L. africanus (14). The most recent Ca. Liberibacter species identified using a metagenomic and phylogenetic approach is Ca. Liberibacter capsica (5). Its metagenome was sequenced from a rare DNA sample extracted from a single pair of adult male and female psyllids that belonged to the psyllid species Russelliana capsici. These psyllids were collected from Brazil for use in a 16S rRNA analysis and were unexpectedly infected with the new Ca. Liberibacter taxon, Ca. L. capsica (5). The psyllid R. capsici is a pest on pepper (Capsicum annuum) in Argentina and Brazil (15); however, it is currently unknown whether Ca. Liberibacter capsica can infect the psyllid’s host plant and/or cause plant disease. To further understand Ca. Liberibacter capsica’s biology and evolution through a genome-enabled approach, we curated and annotated the draft genome assembly of Ca. Liberibacter capsica to determine how this new taxon is similar to and different from other fully sequenced Liberibacter genomes, such as L. crescens, Ca. L. asiaticus, Ca. L. psyllaurous, Ca. L. americanus, and Ca. L. africanus. We also identified orthologous gene clusters in Ca. Liberibacter capsica and determined which one-to-one orthologs are evolving under natural selection. We further identified orthologous proteins in Ca. Liberibacter capsica that display the highest rates of nonsynonymous substitutions per nonsynonymous site (dN) compared to other Liberibacter taxa. Based on these evolutionary analyses, we highlight candidate proteins in Ca. Liberibacter capsica that may be important for host-microbe interactions in the evolution of this Liberibacter lineage.

RESULTS

Annotation of Ca. Liberibacter capsica’s draft genome.

The draft genome of Ca. Liberibacter capsica (5) is similar in total nt length, 1,111,229 nt (1,626 total contigs), compared to other fully sequenced reference Liberibacter species that are fastidious and nonculturable (Table 1). A total of 1,036 proteins were annotated here from the draft genome of Ca. Liberibacter capsica (5), which is comparable to the average number of genes (~1,122) annotated in other Liberibacter species with complete genomes (Table 1). Nevertheless, 40% of the 736 single copy orthologs which are present in the five reference Liberibacter species genomes, based on OrthoVenn2, are missing from the draft genome of Ca. Liberibacter capsica. These missing single copy orthologs are primarily comprised of genes involved in genetic information processing, indicating that the draft genome from this rare and limited metagenomic sample is not fully complete (Table S1). In consequence, the evolutionary analyses presented here are conservative and are based only on Liberibacter homologs that are present in the Ca. Liberibacter capsica draft genome.
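The comparison of protein counts above can be checked with a few lines of arithmetic. The sketch below uses only the protein counts reported in Table 1; the variable names are illustrative and are not part of the original analysis.

```python
# Protein counts for the five reference genomes, copied from Table 1.
reference_protein_counts = {
    "Ca. L. africanus": 1075,
    "Ca. L. americanus": 992,
    "Ca. L. asiaticus": 1082,
    "Ca. L. psyllaurous": 1132,
    "L. crescens": 1331,
}
capsica_protein_count = 1036  # draft genome annotated in this study

# Mean protein count across the reference genomes.
mean_reference = sum(reference_protein_counts.values()) / len(reference_protein_counts)

# Draft-genome protein count relative to the reference mean.
ratio = capsica_protein_count / mean_reference

print(f"mean reference protein count: {mean_reference:.1f}")
print(f"Ca. L. capsica / mean reference: {ratio:.2f}")
```

Averaging over all five reference genomes, including L. crescens, gives the ~1,122 figure quoted in the text.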
TABLE 1

Genome statistics of the draft genome of Ca. Liberibacter capsica compared to fully sequenced reference genomes of Liberibacter species

Species | NCBI ID | nt length | Number of proteins | Clusters | Singletons
Ca. Liberibacter capsica | This study | 1,111,229 | 1,036 | 580 | 409
Ca. Liberibacter africanus | NZ_CP004021.1 | 1,192,232 | 1,075 | 912 | 131
Ca. Liberibacter americanus | NC_022793.1 | 1,195,201 | 992 | 897 | 69
Ca. Liberibacter asiaticus | NZ_CP019958.1 | 1,225,162 | 1,082 | 916 | 131
Ca. Liberibacter psyllaurous | NC_014774.1 | 1,258,278 | 1,132 | 955 | 95
Liberibacter crescens | NZ_CP010522.1 | 1,522,119 | 1,331 | 871 | 396

Identification of orthologous gene clusters in Ca. Liberibacter capsica.

A total of 459 orthologous clusters containing 2,791 genes (Table S2), which include 443 single copy orthologs (Table S3), are shared among Ca. L. capsica, Ca. L. asiaticus, Ca. L. psyllaurous, Ca. L. americanus, Ca. L. africanus, and L. crescens (Fig. 1A). Approximately 91.2% of these gene clusters belong to GO term categories (Table S2). The GO term categories that contain the most proteins include those that are involved in housekeeping functions, stress response, pathogenesis, and host-microbe interactions (Table S2; Fig. 1B). A total of 10, 8, 26, 4, 8, and 26 unique orthologous gene clusters were found in Ca. L. capsica, Ca. L. asiaticus, Ca. L. psyllaurous, Ca. L. americanus, Ca. L. africanus, and L. crescens, respectively (Fig. 1A). The 10 unique orthologous clusters in Ca. L. capsica were composed of 51 genes that have no annotations based on Swiss-Prot, GO terms, or the best BLAST hit based on the NCBI nr database. Two of these clusters contained the majority of these genes (14 and 10 genes), and they were relatively short in amino acid length and annotated as hypothetical. Some of these hypothetical proteins occurred on contigs alone or with other Liberibacter orthologous proteins. The majority of novel orthologous clusters found in other Liberibacter species also were not associated with annotations based on Swiss-Prot, GO terms, or the best BLAST hit based on the NCBI nr database. However, there were several clusters identified in Ca. L. americanus (N = 1), Ca. L. asiaticus (N = 2), Ca. L. psyllaurous (N = 1), and L. crescens (N = 9) for which we could provide functional assignments (Table 2). Liberibacter crescens annotations were primarily associated with transporters, cell membrane components, and pathogenicity-related factors (Table 2). In Ca. L. americanus, one cluster consisted of three proteins that belong to the Major Facilitator Superfamily (MFS) based on Swiss-Prot (P16482) and NCBI BLASTP; these proteins were annotated by Prokka as two proline/betaine (ProP) transporters and one glycine betaine/proline/ectoine/pipecolic acid transporter (OusA) (Table 2). Interestingly, based on our phylogenetic analysis, all three of these genetically distinct MFS transporters appear to be related to a Wolbachia ancestor, potentially suggesting that there was a horizontal gene transfer event from Wolbachia into the genome of Ca. L. americanus (Fig. 2).
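The distinction drawn above between shared clusters, shared single copy orthologs, and species-specific clusters can be made concrete with a short sketch. This is not OrthoVenn2's implementation; it only assumes that each cluster is available as a list of (species, gene) members. The function name, labels, and gene identifiers are hypothetical toy values.

```python
from collections import Counter

SPECIES = ("capsica", "asiaticus", "psyllaurous", "americanus", "africanus", "crescens")

def classify_cluster(members, species=SPECIES):
    """Classify one orthologous cluster given its (species, gene_id) members."""
    counts = Counter(sp for sp, _gene in members)
    if set(counts) == set(species):
        # Present in every genome: single copy only if each species has one gene.
        single_copy = all(counts[sp] == 1 for sp in species)
        return "shared_single_copy" if single_copy else "shared_multi_copy"
    if len(counts) == 1:
        # All members come from one genome (a lineage-specific cluster).
        return "species_specific"
    return "partially_shared"

# Toy clusters with made-up gene identifiers (assumption, for illustration only).
toy_clusters = {
    "cluster_A": [(sp, f"{sp}_0001") for sp in SPECIES],
    "cluster_B": [(sp, f"{sp}_0002") for sp in SPECIES] + [("capsica", "capsica_0003")],
    "cluster_C": [("capsica", "capsica_0004"), ("capsica", "capsica_0005")],
    "cluster_D": [("asiaticus", "asiaticus_0006"), ("africanus", "africanus_0007")],
}

labels = {name: classify_cluster(members) for name, members in toy_clusters.items()}
for name, label in labels.items():
    print(name, label)
```

Under this scheme, the 459 shared clusters correspond to the two "shared" labels combined, and the 443 single copy orthologs to "shared_single_copy".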
FIG 1

Orthologous gene clusters among Ca. L. capsica, Ca. L. asiaticus, Ca. L. psyllaurous, Ca. L. americanus, Ca. L. africanus, and L. crescens. (A) Overlap of orthologous gene clusters that are unique and shared between the six Liberibacter species. Numbers on the Venn diagram indicate the number of orthologous clusters shared by the species with overlapping sectors. Note that multiple genes per taxa can be present within an orthologous gene cluster. See Tables S2 and S3 for detail. (B) A total of 459 orthologous clusters are shared among all six Liberibacter taxa, and these clusters belong to 172 unique biological GO term categories, which are represented as circles in the figure (see Table S2 for detail). Multidimensional scaling (MDS) was used in Revigo to reduce the dimensionality of the matrix for these 172 GO terms based on pairwise similarities in biological function. The size of the circle and color indicates the relative number of proteins within each unique GO term category. The top 17 gene clusters that contain the most proteins are labeled in the figure.

TABLE 2

Annotated protein clusters that are unique for each Liberibacter species

Species | Protein count | Swiss-Prot hit | GO annotation
Ca. Liberibacter americanus | 3 | P16482 | GO:0006101; P:citrate metabolic process
Ca. Liberibacter asiaticus | 3 | Q9PL92 | GO:0006260; P:DNA replication
Ca. Liberibacter asiaticus | 2 | P10484 | GO:0009307; P:DNA restriction-modification system
Ca. Liberibacter psyllaurous | 3 | P78285 | GO:0009253; P:peptidoglycan catabolic process
Ca. Liberibacter psyllaurous | 2 | P44189 | NA
Liberibacter crescens | 6 | Q44664 | GO:0016021; C:integral component of membrane
Liberibacter crescens | 4 | Q9T1T5 | GO:0009253; P:peptidoglycan catabolic process
Liberibacter crescens | 2 | Q6GJ96 | GO:0015293; F:symporter activity
Liberibacter crescens | 2 | P0A535 | GO:0006535; P:cysteine biosynthetic process from serine
Liberibacter crescens | 2 | P31224 | GO:0042493; P:response to drug
Liberibacter crescens | 2 | P23214 | GO:0016747; F:transferase activity
Liberibacter crescens | 2 | B9JXW8 | GO:0000105; P:histidine biosynthetic process
Liberibacter crescens | 2 | P45082 | GO:0042883; P:cysteine transport
Liberibacter crescens | 2 | Q8G3D4 | GO:0006865; P:amino acid transport
FIG 2

Phylogenetic analysis of MFS transporter proteins in Ca. L. americanus. The tree was rooted with the Firmicute outgroup. Bootstrap values are indicated for nodes with 50% or above.

A total of 409 proteins from Ca. L. capsica were singletons (Table 1) and did not form significant orthologous clusters with proteins from other Liberibacter species or itself. Only 48 of these singletons had significant hits to the NCBI nr database, and most of them (16) had a best hit to hypothetical proteins, phage tail proteins, and integrases in other Liberibacter species; however, they were not clustered as significant orthologous clusters by OrthoVenn2. The remaining three singletons had 60 to 90% sequence similarity to hypothetical proteins that belong to other Alphaproteobacterial species.

Identification of genes experiencing natural selection in Ca. Liberibacter capsica.

We analyzed the single copy orthologous genes (N = 443) that were found in all six Liberibacter species for signatures of natural selection by calculating the dN (nonsynonymous substitutions per nonsynonymous site) and dS (synonymous substitutions per synonymous site) values for each species pair that included our target species, Ca. L. capsica. For all 1:1 single copy orthologous pairs in which dS was not saturated, the dN/dS ratio was <1, indicating that purifying selection is widespread among Liberibacter single copy gene orthologs. For the species pair Ca. L. capsica and Ca. L. americanus, we found 439 gene pairs with an average dN/dS ratio of 0.13 (stdev = 0.08). A total of 365 gene pairs are under purifying selection between Ca. L. capsica and Ca. L. psyllaurous and had an average dN/dS ratio of 0.09 (stdev = 0.04). For the species pair Ca. L. capsica and Ca. L. asiaticus, we found 315 gene pairs with an average dN/dS ratio of 0.09 (stdev = 0.06), and for the species pair Ca. L. capsica and Ca. L. africanus, we found 285 gene pairs with an average dN/dS ratio of 0.09 (stdev = 0.04). For the most divergent species pair, Ca. L. capsica and L. crescens, although we found only 38 gene pairs that did not have saturated dS values, the average dN/dS ratio was still just 0.12 (stdev = 0.09). As expected, based on Swiss-Prot and GO term annotations, all gene pairs under purifying selection primarily represent housekeeping genes (Table S3). We also identified the most divergent single copy orthologous proteins in Ca. L. capsica by analyzing ortholog pairs that represented the highest dN values (top 10%) for each pairwise comparison (N = 44) (Fig. 3). Within the top 10% of dN values for each pairwise comparison, the same 16 proteins were consistently found between Ca. L. capsica and the other five Liberibacter species (Table 3). When comparing Ca. L. capsica with all four unculturable Ca. Liberibacter species, 21 proteins were consistently found in the top 10% of dN (Table 3; Fig. S1). Seven of these latter proteins may be important for host-microbe interactions, such as the three Tad pilus apparatus proteins, CpaC, TadG/RcpC, and CpaI (Table 3). The Tad pilus complex is conserved in all Liberibacter species and is involved in host cell interactions in addition to the uptake of DNA (17, 18). Phylogenetic analyses for all three rapidly evolving Tad pilus proteins show significant clustering of all Liberibacter taxa, with branch supports between 81 and 100%, and all share a most recent common ancestor with taxa in the Alphaproteobacteria (Fig. 4; Fig. S1 to S3). Upon further investigation of Tad pilus genes, we identified a total of 9 out of 13 Tad pilus genes in Ca. L. capsica’s draft genome using BLASTP: cpaC, cpaD, cpaE, cpaF, cpaI, tadB, tadC, tadD, and tadG (Table S4). Out of these Tad pilus proteins, CpaD was also in the top 10% of dN values for all Ca. L. capsica pairwise comparisons, except for that with its nearest relative, Ca. L. americanus (Table S4).
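The two filtering steps described above (keeping ortholog pairs with unsaturated dS before computing dN/dS, then intersecting the top 10% of dN values across pairwise comparisons) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the dS saturation cutoff of 2.0, the function names, and all dN and dS values are assumptions chosen for the toy example. The toy data use a top 20% cutoff so that ten genes yield two hits per comparison.

```python
def dn_ds_ratios(pairs, ds_saturation=2.0):
    """Return {gene: dN/dS} for pairs whose dS is positive and below the cutoff.

    The cutoff value is an assumption; the source does not state one.
    """
    return {g: dn / ds for g, (dn, ds) in pairs.items() if 0 < ds < ds_saturation}

def top_by_dn(pairs, top_percent=10):
    """Return the genes with the highest dN values (top `top_percent`, rounded down)."""
    n_top = len(pairs) * top_percent // 100  # e.g. 443 genes -> 44 genes at 10%
    ranked = sorted(pairs, key=lambda g: pairs[g][0], reverse=True)
    return set(ranked[:n_top])

def consistently_divergent(comparisons, top_percent=10):
    """Intersect the top-dN gene sets across all pairwise comparisons."""
    top_sets = [top_by_dn(p, top_percent) for p in comparisons.values()]
    return set.intersection(*top_sets)

# Toy (dN, dS) values per gene for two hypothetical pairwise comparisons.
vs_species_a = {
    "g1": (0.90, 1.2), "g2": (0.10, 0.8), "g3": (0.05, 0.5), "g4": (0.40, 2.5),
    "g5": (0.08, 0.9), "g6": (0.85, 1.1), "g7": (0.12, 1.0), "g8": (0.07, 0.6),
    "g9": (0.09, 0.7), "g10": (0.11, 0.9),
}
vs_species_b = {
    "g1": (0.95, 1.5), "g2": (0.20, 1.0), "g3": (0.06, 0.6), "g4": (0.80, 1.9),
    "g5": (0.10, 1.1), "g6": (0.30, 1.3), "g7": (0.15, 1.2), "g8": (0.09, 0.8),
    "g9": (0.12, 0.9), "g10": (0.14, 1.0),
}

ratios_a = dn_ds_ratios(vs_species_a)
shared_top = consistently_divergent(
    {"A": vs_species_a, "B": vs_species_b}, top_percent=20
)
print(sorted(ratios_a))
print(shared_top)
```

In the toy data, gene g4 is dropped from the dN/dS calculation for comparison A because its dS exceeds the assumed cutoff, and only g1 ranks in the top dN set of both comparisons.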
FIG 3

Box and whisker plots showing the divergence between 1:1 orthologs of Ca. L. capsica and other Liberibacter species based on nonsynonymous substitutions per nonsynonymous site (dN). All points shown are outliers, and red points are dN values for proteins that are consistently in the top 10% per species pair for all Liberibacter species comparisons. All dN values are shown per pairwise comparison (N = 443).

TABLE 3

Annotations of the most diverged one-to-one orthologs between Ca. L. capsica and other Liberibacter species

Protein description (a) | Scientific name | % identity | E value | KO (b)
4-hydroxybenzoate octaprenyltransferase | Ca. L. europaeus | 73 | 8.00E-70 | K03179
1 pyruvate dehydrogenase complex dihydrolipoamide acetyltransferase | Ca. L. americanus | 69.9 | 2.55E-72 | K00627
trypsin-like peptidase domain-containing protein, DegP* | Ca. L. americanus | 64.1 | 6.30E-216 | K04771
hypothetical protein* | Ca. L. americanus | 68.1 | 1.18E-72 |
DNA polymerase III subunit chi* | Ca. L. europaeus | 67.9 | 3.05E-19 |
PAS domain S-box protein* | Ca. L. americanus | 72.9 | 0 |
hypothetical protein, COG5462* | Ca. L. americanus | 65.5 | 7.63E-28 |
Flp pilus assembly protein, secretin CpaC | Ca. L. americanus | 74.8 | 6.17E-208 | K02280
Ribosomal protein L10* | Ca. L. americanus | 75 | 2.66E-47 | K02864
preprotein translocase subunit SecG* | Ca. L. americanus | 60.3 | 1.51E-37 | K03075
hypothetical protein* | Ca. L. americanus | 70.8 | 7.00E-236 |
pilus assembly protein, TadG/RcpC* | Ca. L. americanus | 51.6 | 7.22E-23 |
DUF1217 domain-containing protein; similar to FlgF* | Ca. L. americanus | 68.3 | 5.47E-145 |
50S ribosomal protein L9 | Ca. L. americanus | 65.7 | 4.80E-20 | K02939
pilus assembly protein, N-terminal domain-containing, CpaI* | Ca. L. americanus | 44.8 | 4.58E-36 |
hypothetical protein* | Ca. L. americanus | 70.5 | 2.66E-85 | K11719
outer membrane protein assembly factor BamA* | Ca. L. americanus | 70.6 | 2.69E-306 | K07277
RIP metalloprotease RseP* | Ca. L. americanus | 62.4 | 2.18E-118 | K11749
M23 family metallopeptidase | Ca. L. americanus | 61.9 | 6.15E-109 |
disulfide bond formation protein B* | Ca. L. americanus | 73.7 | 8.12E-78 |
peptidylprolyl isomerase* | Ca. L. americanus | 69.3 | 1.56E-89 | K03771

(a) Proteins that are consistently the most divergent (top 10% of dN values) for all Liberibacter pairwise comparisons with Ca. L. capsica are indicated with an asterisk (*). Proteins that are not marked with an asterisk are consistently the highest values of dN (top 10%) for all Liberibacter pairwise comparisons except for the most divergent species, L. crescens. Protein descriptions, closest sequence match, identity, and E values are based on BLASTP against the NCBI nr database.

(b) KO term based on BlastKOALA's best hit.

FIG 4

Phylogenetic analyses of Tad pilus genes in Liberibacter taxa and relatives for (A) TadG, (B) CpaI, and (C) CpaC genes. Branches are colored according to their bacterial classes within the Proteobacteria, and ranges of bootstrap values are indicated by filled or open circles as indicated in the key. See Fig. S1, S2, and S3 in the supplemental material for trees with detailed taxa names and accession numbers.

The remaining four proteins that may be involved in host-microbe interactions and were consistently found in the top 10% of dN values are SecG, DegP, RseP, and an M23 family metallopeptidase protein (Table 3; Fig. S4). SecG is associated with the SecYEG complex, which is a Sec-dependent secretory pathway that is important for translocating virulence factors in intracellular pathogens (19). The two proteins DegP and RseP are known to protect bacterial pathogens from temperature and other stressors in the host environment (20, 21).
Based on NCBI conserved domain searches, the DegP protein had significant hits to the periplasmic serine protease and PDZ domains, and the RseP protein had significant hits to the peptidase M50 and PDZ domains (Table 3). The M23 family metallopeptidase protein had significant hits to the NlpD domain and to the M23 peptidase superfamily (Table 3). M23 family metallopeptidases are known to be important for lysing bacterial cell wall peptidoglycan for strain competition and for peptidoglycan turnover, and this type of protein was also identified as a virulence factor for Pseudomonas aeruginosa (22).

DISCUSSION

Here, we identify patterns of genome-wide sequence evolution in Ca. L. capsica, including a group of proteins that are diverging rapidly (Table 3; Fig. S4). Several of these proteins are known to be important for host-microbe interactions and pathogenesis in Liberibacter (6, 17, 18, 23) and therefore have potential to further inform us on the evolution of Liberibacter-host interactions. Moreover, protein changes in these candidate genes may have major implications for the lineage-specific adaptation of Ca. L. capsica to its host environment(s). Currently, Russelliana capsici is the only known psyllid host of Ca. L. capsica. The psyllid R. capsici has been observed to feed on Capsicum annuum across its native distribution in Brazil (Minas Gerais, Paraná, Santa Catarina, São Paulo) and Argentina (Buenos Aires, Entre Ríos) (24). As such, it will be of interest for future studies to determine if this new Ca. Liberibacter taxon can infect plant tissue as well and potentially be a plant pathogen and/or commensal on the psyllid’s host plant (5).

Interestingly, four proteins (CpaC, CpaD, CpaI, and TadG) that work together to form the Tad pilus complex are rapidly diverging in Ca. L. capsica’s genome compared to the majority of Liberibacter taxa (Table S4). The Tad pilus complex is hypothesized to be important in the attachment and colonization of psyllid midguts because the Tad pilus genes are conserved in all Liberibacter taxa, and these Tad pilus genes are expressed more highly in psyllids compared to plants (17). Recent findings in Cai et al. (18) suggest that the Tad pilus complex may also be important in DNA uptake in both psyllid and plant host environments due to the demonstration of twitching motility in Liberibacter and the fact that Liberibacter taxa have lost the machinery to make their own nucleotides. Here, two Tad pilus-associated proteins that we identified to be highly diverging in Ca. L. capsica are associated with the outer membrane of the Tad pilus complex (Table S4). Specifically, CpaC is the outer membrane secretin complex, and CpaD is linked to CpaC on the outer membrane and helps assemble CpaC (17, 25). As for the remaining two Tad pilus proteins that are diverging in Ca. L. capsica, the TadG protein may serve as an anchor for the Flp pilus, and CpaI is hypothesized to be important in the Tad pilus assembly in Liberibacter (17). It will be of interest for future studies to determine if these amino acid changes have changed the function, binding specificity, and/or role of the Tad pilus complex in Ca. L. capsica compared to other Liberibacter taxa.

Other proteins that are highly diverged in Ca. L. capsica, such as DegP and RseP, are important for a microbe’s adaptation to the host environment because of their role in stress response. For example, DegP is a periplasmic serine protease that is essential for survival in high temperature environments (26). DegP has also been associated with virulence in Salmonella typhimurium because degP mutants are unable to survive and replicate in host cell tissues due to oxidative stress (27). The protein DegP is regulated by the alternative sigma factor E (RpoE), which responds to extracellular stress, such as heat shock and other stressors that result in misfolded proteins (28, 29). The other divergent protein, RseP, is an integral membrane zinc-containing site-2 metalloprotease and is important in activating the sigma factor E (RpoE) by cleaving the anti-sigma factor RseA with DegS (21). Currently, the roles of RseP, DegP, and the alternative sigma factor in Liberibacter taxa are unknown; however, we hypothesize that these proteins may be important for the successful colonization of both psyllid and plant host tissues.

The Sec-dependent secretion systems in Liberibacter are hypothesized to be important in the secretion of pathogenesis factors, as Liberibacter does not encode a Type III secretion system (23).
In this study, we found that SecG is highly divergent in Ca. L. capsica compared to other Liberibacter species (Table 3; Fig. S4). The protein SecG forms a complex with SecYE and stimulates translocation in bacteria, especially at low temperatures (30). Previous studies in Liberibacter have revealed that Sec-dependent proteins likely play a different role in their psyllid versus plant host environments. For example, differential expression of Sec-dependent complex proteins and effectors in Ca. L. asiaticus was observed between psyllid and plant host tissues (6, 23). In regard to the SecYEG complex, Yan et al. (23) revealed that secE and secD/F mRNAs were both upregulated in sweet orange (Citrus sinensis) compared to the Asian citrus psyllid (Diaphorina citri) (23). It will be of interest for future studies in Liberibacter to determine how amino acid mutations in Sec-dependent complex proteins, such as SecYEG complex, may impact Liberibacter pathogenesis and host-microbe interactions. Shared and unique orthologous protein clusters were also identified among Liberibacter taxa in this study (Fig. 1). As expected, most of the shared orthologous protein clusters are single copy genes, primarily under purifying selection, that are important in both housekeeping functions and environmental response (Fig. 1B; Table S2 and S3). This phenomenon of widespread purifying selection for orthologous genes has been documented in many host-restricted insect endosymbionts with reduced genomes, such as Buchnera, Blochmannia, Carsonella, and Wolbachia (31–33), potentially because these genes are critical for the life and function of these microbes as mutualists or pathogens. Most orthologous gene cluster functions that are unique to each Liberibacter lineage have no known relatives and/or are hypothetical or unknown. 
It is currently unclear whether this orphan gene status in Liberibacter lineages is due to limited sampling effort in databases, fast-evolving genes, and/or recent de novo gene formation. Nevertheless, we identified an orthologous MFS transporter gene cluster in Ca. L. americanus that is unique to this Liberibacter lineage and appears to have been acquired from a donor that is closely related to Wolbachia. These results suggest that these genes may have been horizontally transferred from Wolbachia into Ca. L. americanus’s genome. Alternatively, the MFS transporter genes may have been lost from all other known Liberibacter lineages but retained in Ca. L. americanus. Deeper sampling of related genomes can help resolve this evolutionary relationship. The Wolbachia lineages that were most closely related to these putative horizontally transferred MFS transporters are harbored by insect, nematode, and spider hosts (Fig. 2). These MFS transporters, such as ProP and OusA, are important in protecting cells from dehydration under hyperosmotic stress by importing proline and glycine betaine into the cells. As a result, Ca. L. americanus may be more robust to osmotic stress compared to other Liberibacter lineages.

In summary, the draft genome of the new Liberibacter species Ca. L. capsica was opportunistically derived from a limited amount of insect metagenomic material (5), and we further refined and annotated Ca. L. capsica’s draft genome here. Moreover, we estimated the rates of molecular evolution between Ca. L. capsica and five other Liberibacter species (Ca. L. asiaticus, Ca. L. psyllaurous, Ca. L. americanus, Ca. L. africanus, and L. crescens) to identify important protein candidates in Ca. L. capsica that may be involved in insect-plant interactions. Future research will extend the results from this study to further understand the genetics of Ca. L. capsica and, more broadly, the evolution of Liberibacter lineages.

MATERIALS AND METHODS

Identification of Ca. Liberibacter capsica contigs and proteins.

The metagenome of Ca. Liberibacter capsica was assembled previously in Kwak et al. (5) (accession number: SRR15069915). To further refine and annotate metagenomic contigs identified in Kwak et al. (5), we used Prokka v1.14.5 (34) to identify putative Ca. Liberibacter capsica proteins, followed by a BLASTP analysis against the NCBI nr database (downloaded in 04/2021) using DIAMOND v2.0.13 (35) with an E value cutoff of 10e-10. The criteria for establishing whether annotated proteins belong to the Ca. L. capsica genome were the following: (i) if the best hit of the putative Ca. L. capsica protein was a Ca. Liberibacter species, the protein was binned as Ca. Liberibacter capsica; (ii) if the best hit was a gene from a genus other than Liberibacter, we determined whether the protein was surrounded by other Liberibacter proteins (based on BLASTP results) on the same contig, and if so, the protein was binned as Ca. Liberibacter capsica; and (iii) if the putative Ca. L. capsica protein was the only gene on a contig and had a best hit to a bacterium belonging to Alphaproteobacteria with a percent identity of less than 95%, it was binned as a Ca. Liberibacter capsica candidate gene. The final contigs that possessed proteins binned as Ca. Liberibacter capsica were submitted to NCBI Genome as a draft Ca. Liberibacter capsica genome (accession number: JAMJGA000000000). EMBOSS 6.6.0 (36) infoseq was used on the final draft of the Ca. Liberibacter capsica genome to determine total nucleotide length.
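
The three binning criteria above can be read as a small decision rule. The following Python sketch is illustrative only and is not the pipeline used in this study: the `Hit` record, the `bin_protein` function, and the label strings are hypothetical names, and it assumes that "surrounded by" in criterion (ii) means that every neighboring protein supplied for the contig has a Liberibacter best hit.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Hit:
    """Best BLASTP hit for one predicted protein (hypothetical record layout)."""
    genus: str               # genus of the best-hit organism
    tax_class: str           # taxonomic class of the best-hit organism
    percent_identity: float  # percent identity of the best hit


def _is_liberibacter(genus: str) -> bool:
    # Accept both "Liberibacter" and "Candidatus Liberibacter" style labels.
    return "liberibacter" in genus.lower()


def bin_protein(hit: Optional[Hit], neighbor_genera: Sequence[str]) -> str:
    """Return 'capsica', 'candidate', or 'unbinned' for one protein.

    neighbor_genera holds the best-hit genera of the neighboring proteins on
    the same contig; it is empty when the protein is the only gene there.
    """
    if hit is None:
        return "unbinned"  # no hit passed the E value cutoff
    if _is_liberibacter(hit.genus):
        return "capsica"  # criterion (i)
    if neighbor_genera:
        # Criterion (ii); assumption: all supplied neighbors must be Liberibacter.
        if all(_is_liberibacter(g) for g in neighbor_genera):
            return "capsica"
        return "unbinned"
    # Criterion (iii): only gene on the contig, Alphaproteobacteria hit < 95%.
    if hit.tax_class == "Alphaproteobacteria" and hit.percent_identity < 95.0:
        return "candidate"
    return "unbinned"
```

For example, a lone protein on a contig whose best hit is a Wolbachia protein at 70% identity would be returned as a candidate under this reading, whereas the same protein flanked by Liberibacter-like proteins would be binned directly.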

Identification of orthologous clusters in Ca. L. capsica.

Protein orthologs were identified in Ca. L. capsica using all of the Liberibacter species with fully sequenced genomes that were available in NCBI. These species include Ca. L. asiaticus (NZ_CP019958.1), Ca. L. psyllaurous (also known as Ca. L. solanacearum) (NC_014774), Ca. L. americanus (NC_022793.1), Ca. L. africanus (NZ_CP004021.1), and L. crescens (NZ_CP010522.1). To standardize the protein and nucleotide coding sequence data sets for the analysis of orthologous genes among these Liberibacter species, we used Prokka v1.14.5 (34) to identify and annotate proteins and nucleotide coding sequences for each Liberibacter and Ca. Liberibacter species. Orthologous clusters of proteins from Ca. L. capsica, Ca. L. asiaticus, Ca. L. psyllaurous, Ca. L. americanus, Ca. L. africanus, and L. crescens were determined using OrthoVenn2 (37) with the default settings. Orthologous clusters were further assigned GO terms (38, 39) and Swiss-Prot terms (40, 41) using OrthoVenn2 (37) with the default settings. All Ca. L. capsica proteins assigned to clusters were annotated further using BlastKOALA (42), Revigo (43), and NCBI BLAST (16) against nr with an E value cutoff of 10e-10. For the phylogenetic analyses of putative horizontally transferred genes and highly diverging Tad pilus genes, sequences were retrieved from NCBI GenBank and from this study, and they were aligned with MAFFT v7.505 (44). Phylogenetic tree construction was conducted with RAxML v8.2.12 (45), using the GTR substitution model and default bootstrap parameters.
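
Once clusters are exported, they can be sorted into those shared by all six species, those unique to one lineage, and those that are single copy in every species. The Python sketch below is a hedged illustration of that bookkeeping, not OrthoVenn2's own logic or output format: the `(species, protein_id)` tuple layout, the short species labels, and the `classify_cluster` function are assumptions made for this example.

```python
from collections import Counter

# Short labels for the six species compared in this study (labels are illustrative).
SPECIES = ("capsica", "asiaticus", "psyllaurous", "americanus", "africanus", "crescens")


def classify_cluster(members):
    """Classify one orthologous cluster.

    members: list of (species, protein_id) tuples (hypothetical layout).
    Returns a dict of three booleans describing the cluster.
    """
    counts = Counter(species for species, _ in members)
    present = set(counts)
    shared_by_all = present == set(SPECIES)
    return {
        # At least one member from every species.
        "shared_by_all": shared_by_all,
        # All members come from a single species (lineage-specific cluster).
        "lineage_unique": len(present) == 1,
        # Exactly one member per species: usable for pairwise dN and dS estimates.
        "one_to_one": shared_by_all and all(n == 1 for n in counts.values()),
    }
```

Under this layout, a cluster holding one protein from each of the six species is flagged as one-to-one, while a cluster containing only two Ca. L. americanus proteins is flagged as lineage unique.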

Estimating rates of molecular evolution in Ca. L. capsica proteins.

To estimate the rates of molecular evolution between Ca. L. capsica and the other five Liberibacter species (Ca. L. asiaticus, Ca. L. psyllaurous, Ca. L. americanus, Ca. L. africanus, and L. crescens), we calculated the nonsynonymous substitutions per nonsynonymous site (dN) and the synonymous substitutions per synonymous site (dS) for one-to-one coding sequence orthologs from OrthoVenn2 (37). Similar to Degnan et al. (46), nucleotide sequences for each set of orthologous coding sequences were aligned using MAFFT (44). Then, all gaps and stop codons were removed, and pairwise estimates of dN and dS were calculated in PAML (47) based on the method of Goldman and Yang (48). Gene pairs were excluded when dS was saturated (dS ≥ 3.0).
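
The post-processing of pairwise estimates can be sketched as follows. This Python example is a minimal illustration under stated assumptions rather than the scripts used in this study: the `(ortholog_id, dN, dS)` input layout and the `summarize_pairs` function are hypothetical, pairs with dS of zero are skipped only to avoid division by zero, and a dN/dS ratio below 1 is used as the conventional indicator of purifying selection. The dS ≥ 3.0 saturation cutoff is taken from the text above.

```python
DS_SATURATION = 3.0  # from the text: pairs with dS >= 3.0 are treated as saturated


def summarize_pairs(pairs, top_n=3):
    """Filter saturated pairs, compute dN/dS, and rank by dN.

    pairs: iterable of (ortholog_id, dN, dS) tuples for one pairwise species
    comparison (hypothetical layout).
    Returns (kept, most_divergent), where kept is sorted by dN, highest first,
    and most_divergent is the first top_n entries of kept.
    """
    kept = []
    for ortholog_id, dn, ds in pairs:
        if ds >= DS_SATURATION:
            continue  # saturated synonymous sites: estimate is unreliable
        if ds <= 0:
            continue  # assumption: skip pairs where dN/dS is undefined
        omega = dn / ds
        kept.append({
            "id": ortholog_id,
            "dN": dn,
            "dS": ds,
            "omega": omega,
            "purifying": omega < 1.0,  # conventional reading of dN/dS < 1
        })
    kept.sort(key=lambda row: row["dN"], reverse=True)
    return kept, kept[:top_n]
```

Ranking by dN rather than by the dN/dS ratio mirrors how the most divergent single-copy orthologs are described in this study; the value of `top_n` is arbitrary here.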

Data availability.

The draft genome assembly of Ca. Liberibacter capsica curated here was submitted to NCBI Genome under the accession number JAMJGA000000000. The raw metagenomic data of Ca. Liberibacter capsica are available under the accession number SRR15069915 (5). The Prokka annotation GFF file and protein FASTA files for the draft genome of Ca. Liberibacter capsica are in Table S5 and Table S6, respectively.

1.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

2.  Crystal structure of DegP (HtrA) reveals a new protease-chaperone machine.

Authors:  Tobias Krojer; Marta Garrido-Franco; Robert Huber; Michael Ehrmann; Tim Clausen
Journal:  Nature       Date:  2002-03-28       Impact factor: 49.962

Review 3.  Sec-translocase mediated membrane protein biogenesis.

Authors:  Ross E Dalbey; Minyong Chen
Journal:  Biochim Biophys Acta       Date:  2004-11-11

4.  UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View.

Authors:  Emmanuel Boutet; Damien Lieberherr; Michael Tognolli; Michel Schneider; Parit Bansal; Alan J Bridge; Sylvain Poux; Lydie Bougueleret; Ioannis Xenarios
Journal:  Methods Mol Biol       Date:  2016

5.  The Citrus Huanglongbing Crisis and Potential Solutions.

Authors:  Nian Wang
Journal:  Mol Plant       Date:  2019-04-01       Impact factor: 13.164

6.  Systematics, biogeography and host-plant relationships of the Neotropical jumping plant-louse genus Russelliana (Hemiptera: Psylloidea).

Authors:  Liliya Serbina; Daniel Burckhardt
Journal:  Zootaxa       Date:  2017-05-12       Impact factor: 1.091

7.  A codon-based model of nucleotide substitution for protein-coding DNA sequences.

Authors:  N Goldman; Z Yang
Journal:  Mol Biol Evol       Date:  1994-09       Impact factor: 16.240

8.  RseP (YaeL), an Escherichia coli RIP protease, cleaves transmembrane sequences.

Authors:  Yoshinori Akiyama; Kazue Kanehara; Koreaki Ito
Journal:  EMBO J       Date:  2004-10-21       Impact factor: 11.598

9.  Genome-wide analyses of Liberibacter species provides insights into evolution, phylogenetic relationships, and virulence factors.

Authors:  Shree P Thapa; Agustina De Francesco; Jessica Trinh; Fatta B Gurung; Zhiqian Pang; Georgios Vidalakis; Nian Wang; Veronica Ancona; Wenbo Ma; Gitta Coaker
Journal:  Mol Plant Pathol       Date:  2020-02-28       Impact factor: 5.663

10.  OrthoVenn2: a web server for whole-genome comparison and annotation of orthologous clusters across multiple species.

Authors:  Ling Xu; Zhaobin Dong; Lu Fang; Yongjiang Luo; Zhaoyuan Wei; Hailong Guo; Guoqing Zhang; Yong Q Gu; Devin Coleman-Derr; Qingyou Xia; Yi Wang
Journal:  Nucleic Acids Res       Date:  2019-07-02       Impact factor: 16.971
