
The genome of the cotton bacterial blight pathogen Xanthomonas citri pv. malvacearum strain MSCT1.

Kurt C Showmaker1,2, Mark A Arick1, Chuan-Yu Hsu1, Brigitte E Martin3, Xiaoqiang Wang2, Jiayuan Jia2, Martin J Wubben4, Robert L Nichols5, Tom W Allen6, Daniel G Peterson1,7, Shi-En Lu2.   

Abstract

Xanthomonas citri pv. malvacearum is a major pathogen of cotton, Gossypium hirsutum L. In this study we report the complete genome of X. citri pv. malvacearum strain MSCT1, assembled from long read DNA sequencing data. The MSCT1 genome is the first X. citri pv. malvacearum genome with complete coding regions for X. citri pv. malvacearum transcription activator-like (TAL) effectors. In addition, functional and structural annotations are presented that will provide a foundation for future pathogenesis studies with MSCT1.

Keywords:  Bacterial blight; Cotton; Long read DNA sequencing; TAL effectors; Xanthomonas citri pv. malvacearum

Year:  2017        PMID: 28770027      PMCID: PMC5525278          DOI: 10.1186/s40793-017-0253-3

Source DB:  PubMed          Journal:  Stand Genomic Sci        ISSN: 1944-3277


Introduction

Xanthomonas citri pv. malvacearum is the causal agent of bacterial blight of cotton (Gossypium hirsutum L.). X. citri pv. malvacearum infects plant tissues and organs of cotton during all stages of development, beginning with the seedling stage [1]. Typical disease symptoms caused by X. citri pv. malvacearum include cotyledon/seedling blight, angular leaf spot, systemic vein blight, black arm (of petioles and main stems), boll shedding, and internal boll rot [1]. Histology studies reported that host cotton plant cells begin to degenerate 3 days post-infection [2]. Over this 3-day period, degradation of host cells proceeds in stages: first the host tissue appears to loosen, then the granal and stromal membranes of the chloroplasts disappear, followed by degeneration of the chloroplasts and other organelles [2, 3]. At 6 days post-infection, cellular degeneration, along with the production of a hydrophilic extracellular polymeric substance by the bacterium, causes water to accumulate in the infected tissues, forming lesions known as “water soaked spots”, a classical plant pathogen-associated symptom [2-4].

Resistance to X. citri pv. malvacearum has been identified in cotton, as well as in additional Gossypium species. Currently, most lines resistant to X. citri pv. malvacearum are G. hirsutum cultivars, since breeding for resistance has been ongoing since 1939 [5] and continues today as G. hirsutum cultivars and germplasm releases are screened for X. citri pv. malvacearum resistance [6-8]. At least 18 genes participate in resistance to X. citri pv. malvacearum [1, 9]. The ability of X. citri pv. malvacearum strains to escape specific resistance genes resulted in a classification scheme of races. To date, 22 races have been reported and assigned numerical names (i.e. 1 to 22) [9]. Most races are geographically distinct. Of note, bacterial blight in the U.S. is predominantly caused by race 18. Genetic resistance within cotton cultivars is generally specific to a single race or to multiple races of X. citri pv. malvacearum. The ability of G. hirsutum to mount a defense response to X. citri pv. malvacearum is, at least in some cases, dependent upon the transcription activator-like (TAL) effector avrBs3/pthA gene family in X. citri pv. malvacearum, indicating the presence of a gene-for-gene relationship in X. citri pv. malvacearum-G. hirsutum interactions [9, 10].

With the ever increasing understanding of the importance of TAL effectors in pathogenesis [11-13], the objective of this study was to generate the first genome sequence for an X. citri pv. malvacearum strain that contains the TAL effector complement, to serve as a foundation for a better understanding of the X. citri pv. malvacearum-G. hirsutum interaction. To date, four draft genomes of X. citri pv. malvacearum have been published. However, all sequenced X. citri pv. malvacearum isolates were obtained from outside the United States [14, 15]. The four previously reported draft genomes comprise two race 18 isolates, one race 20 isolate, and a highly virulent strain.

The project described here was undertaken to provide the first X. citri pv. malvacearum genome sequence from the Mid-South region of the United States, a major production area of upland cotton. The isolate, MSCT1, was obtained during the 2011 outbreak of X. citri pv. malvacearum in the Mississippi Delta (i.e. the Mississippi River’s alluvial plain). This outbreak resulted in the greatest estimated X. citri pv. malvacearum-based losses (52,000 bales) in Arkansas and Mississippi as reported by the National Cotton Council Disease Database [16]. The genome sequence of strain MSCT1 was generated to identify protein candidates that may be involved in the pathogenesis of bacterial blight of cotton. It will also serve as a template for further studies of the genetic diversity of X. citri pv. malvacearum in the United States.

Organism information

Classification and features

X. citri pv. malvacearum has gone through a series of name changes over time as additional information has been learned about its biology and genetics. In chronological order, X. citri pv. malvacearum has previously been classified as Pseudomonas malvacearum, Bacterium malvacearum, and Xanthomonas malvacearum [9]. In 2009, Ah-You et al. assigned the X. citri pv. malvacearum moniker [9, 17]. X. citri pv. malvacearum is a motile, Gram-negative, rod-shaped bacterium that produces yellow, copiously mucoid, wet, shining growth on 2% w/v peptone-sugar agar [1]. X. citri pv. malvacearum, like other Xanthomonas species (xanthomonads), produces the heteropolysaccharide xanthan [4]. Additional characteristics of X. citri pv. malvacearum are provided in Table 1.
Table 1

Classification and general features of Xanthomonas citri pv. malvacearum strain MSCT1 [75]

| MIGS ID | Property | Term | Evidence code (a) |
|  | Classification | Domain Bacteria | TAS [76] |
|  |  | Phylum Proteobacteria | TAS [77] |
|  |  | Class Gammaproteobacteria | TAS [78] |
|  |  | Order “Xanthomonadales” | TAS [79] |
|  |  | Family “Xanthomonadaceae” | TAS [79] |
|  |  | Genus Xanthomonas | TAS [80] |
|  |  | Species Xanthomonas citri | TAS [17] |
|  |  | Pathovar malvacearum strain: MSCT1 |  |
|  | Gram stain | Negative | TAS [1] |
|  | Cell shape | Rod | TAS [1] |
|  | Motility | Motile | TAS [1] |
|  | Sporulation | Not reported |  |
|  | Temperature range | 10-38 °C | TAS [1, 81] |
|  | Optimum temperature | 25-30 °C | TAS [1, 81] |
|  | pH range; Optimum | Optimum 6.0 | TAS [1] |
|  | Carbon source | Glucose, sucrose, fructose, arabinose, galactose, maltose, cellobiose, and glycerol | TAS [1] |
| MIGS-6 | Habitat | Plant-associated | TAS [1] |
| MIGS-6.3 | Salinity | Not reported |  |
| MIGS-22 | Oxygen requirement | Not reported |  |
| MIGS-15 | Biotic relationship | Parasitic | TAS [1] |
| MIGS-14 | Pathogenicity | Pathogenic | IDA |
| MIGS-4 | Geographic location | Mississippi, USA | IDA |
| MIGS-5 | Sample collection | 2011 | IDA |
| MIGS-4.1 | Latitude | Not reported |  |
| MIGS-4.2 | Longitude | Not reported |  |
| MIGS-4.4 | Altitude | Not reported |  |

(a) Evidence codes - IDA: inferred from direct assay; TAS: traceable author statement (i.e., a direct report exists in the literature); NAS: non-traceable author statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [82]

For specimen isolation, cotton leaves with typical blight symptoms (Fig. 1) were collected from a field located north of Yazoo City, Mississippi, in Yazoo County, during the 2011 growing season. Strain MSCT1 was isolated using a routine method for foliar bacterial pathogens. In brief, the disease lesions were cut into small pieces (3 × 3 mm) from the junction of diseased and healthy tissues. The cut pieces were transferred into a sterile 1.5 ml microcentrifuge tube and surface-sterilized using 10% sodium hypochlorite (bleach; Clorox) for 1 min. The sterilized tissues were washed twice using sterile water, and then stabbed with a sterile lab needle in 200 μl of sterile water. A full loop of the resulting bacterial suspension was streaked on nutrient broth-yeast extract (NBY) plates [18]. The streaked NBY plates were incubated at 20 °C for 2 days under ambient laboratory temperatures and a 16:8 day:night photoperiod. Single colonies of the resulting bacterium were picked with a sterilized loop and streaked onto fresh NBY plates for purification.

The pathogenicity of MSCT1 as a cause of bacterial blight of cotton was confirmed by fulfilling Koch’s postulates. Briefly, cotton seedlings (cotton cultivar PHY499WRF) were grown in the greenhouse until they reached the three-leaf growth stage. A vacuum system (20″ psi for 10 s) was used to inoculate the seedling leaves with a suspension of MSCT1 (OD 0.3 at 420 nm) in sterile phosphate buffer (0.01 M; pH 7.0). After 10 days the characteristic symptoms of bacterial blight were observed on the inoculated leaf tissues. The X. citri pv. malvacearum strain MSCT1 described in this manuscript was deposited in the USDA Agricultural Research Service Culture Collection under deposition number NRRL B-65440.

The isolate MSCT1 was confirmed to be X. citri pv. malvacearum based on 16S rRNA sequence analysis, as described previously [19]. Multilocus sequence typing (MLST) was used to construct a phylogenetic tree for Xanthomonas strains based upon three genes from the MLST scheme described by Ah-You et al. 2009 [17]: atpD, encoding ATP synthase β chain; dnaK, encoding heat shock protein 70; and gyrB, encoding gyrase subunit β (Fig. 2). A transmission electron micrograph of MSCT1 was generated by the Mississippi State University Institute for Imaging & Analytical Technologies (Fig. 3).
Fig. 1

Top (a) and bottom (b) of a cotton leaf displaying the bacterial blight disease symptom caused by Xanthomonas citri pv. malvacearum

Fig. 2

The phylogenetic tree indicating current placement of the source organism. The phylogenetic tree was constructed based on the sequences of genes coding for ATP synthase β chain (atpD), heat shock protein 70 (dnaK), and gyrase subunit β (gyrB) for Xanthomonas species. MAFFT (version 7) [85] was used to align the sequences; the evolutionary history was inferred by using the Maximum Likelihood, with 100 bootstraps, method based on the Tamura-Nei model [86] with MEGA6 [87] software. Sequence identifiers of each subunit are as reported by Ah-You et al. 2009 [17]. Type (T) and Pathovar Type (PT) strains are noted in superscript

Fig. 3

Transmission electron micrograph of Xanthomonas citri pv. malvacearum strain MSCT1

Genome sequencing information

Genome project history

The MSCT1 sequencing project arose from the 2011 outbreak of bacterial blight in the cotton growing regions of the Mississippi Delta. Following MSCT1 isolation, additional testing determined that MSCT1 was capable of producing disease symptoms on several cultivars of upland cotton commonly grown in the Mid-South. Preliminary bioinformatics investigations determined that X. citri pv. malvacearum assemblies generated from short reads lacked detectable TAL effectors, although TAL effectors have previously been described in X. citri pv. malvacearum [20-22]. To better understand the pathology of X. citri pv. malvacearum, and more specifically of isolate MSCT1, we conducted a long read genome sequencing project to identify MSCT1’s effector complement, including the TAL effectors that do not assemble well with short read DNA sequencing technology. The resultant complete genome sequence has been deposited in the NCBI genome database under genome assembly accession GCF_001719155.1.

Growth conditions and genomic DNA preparation

An MSCT1 colony was isolated from an LB plate (peptone 10 g/L, yeast extract 5 g/L, sodium chloride 10 g/L, agarose 15 g/L) and used to inoculate 1.5 ml of LB medium (peptone 10 g/L, yeast extract 5 g/L, sodium chloride 10 g/L) in a sterile, plastic culture tube. The culture tube was incubated overnight at 25 °C with orbital shaking at 200 rpm. The resulting bacterial culture was pelleted by centrifugation at 5000 rpm for 10 min. The pellet was washed twice to remove LB medium; each wash consisted of resuspending the pellet in 1 ml of phosphate buffered saline (PBS; NaCl 8 g/L, KCl 0.2 g/L, Na2HPO4 1.42 g/L, KH2PO4 0.24 g/L), centrifuging the suspension at 5000 rpm for 10 min, and discarding the supernatant.

Genomic DNA was isolated using a modified version of the method described in Chen and Kuo 1993 [23]. Briefly, the cell pellet was resuspended in 300 μl of extraction buffer (40 mM Tris-HCl, 1 mM EDTA, 1% w/v SDS, pH 7.8). After adding 50 μl of 10 mg/ml lysozyme (Sigma-Aldrich; St. Louis, MO, USA), the cell suspension was incubated at 37 °C for 30 min with occasional mixing until it became clear. The bacterial nucleic acid sample was further purified using a series of phenol, phenol/chloroform, and chloroform extraction steps, then precipitated with two volumes of 100% ethanol. DNA was pelleted by centrifugation at 12,000 rpm for 10 min. After two washes with 70% ethanol (v/v), the nucleic acid pellet was air-dried (approximately 15 min). The pellet was then dissolved in 50 μl of 10 mM Tris buffer (pH 7.5). The bacterial nucleic acid sample was treated with 20 μl of 30 mg/ml RNase A (Sigma-Aldrich; St. Louis, MO, USA) at 37 °C for 20 min, followed by phenol/chloroform and chloroform extraction steps to remove the enzyme. The DNA was precipitated with 100% ethanol and washed with 70% ethanol as described above. The air-dried genomic DNA pellet was dissolved in 50 μl of 10 mM Tris buffer (pH 7.5). The resultant DNA was visualized on a 0.8% w/v agarose gel.

Genome sequencing and assembly

Two long read technologies, PacBio (Pacific Biosciences of California, Menlo Park, CA, USA) and Nanopore (Oxford Nanopore Technologies, Oxford, UK), were used to sequence MSCT1. A 20 kb PacBio library was prepared and sequenced on two P6-C4 SMRT cells at the University of Delaware Sequencing & Genotyping Center (Newark, DE, USA). Additionally, a Nanopore library was prepared and sequenced on an R9 Nanopore flowcell at the Mississippi State Institute for Genomics, Biocomputing, and Biotechnology (Mississippi State, MS, USA). The PacBio and Nanopore reads were assembled with the Canu long read assembler [24]. The resultant contigs were aligned against themselves with blastn to identify the overlapping ends of each contig for circularization of the DNA molecules.

Following circularization, open reading frames (ORFs) were predicted with the getorf program within the EMBOSS software package [25], and the coding region for the DnaA protein was identified with blastn [26]. The chromosome was rearranged to place the start of the molecule 41 bp from the start of the dnaA coding region. The plasmid molecules were rearranged to put the ends joined during circularization within the middle of the molecule, while ensuring that the new cut site fell outside any predicted ORF. To ensure the circularization was correct, PacBio reads longer than 4000 bp were aligned to the circularized assembly with blasr [27] and manually checked with IGV [28, 29].

For additional error correction, an Illumina PCR-free DNA library with a DNA insert size of 416 bp was prepared at the Institute for Genomics, Biocomputing and Biotechnology (Mississippi State, MS, USA). The Illumina library was paired-end sequenced (2 × 300 bp) using the Illumina MiSeq. The short read pairs were trimmed with Trimmomatic [30] and subsequently used to error correct the Canu assembly with Pilon [31]. After Pilon error correction, the resultant assembly was polished with the 20 kb PacBio reads using the Quiver algorithm within the PacBio SMRT Analysis software suite (version 2.3.0.140936, Pacific Biosciences of California). The Minimum Information about a Genome Sequence (MIGS) specification was used to report the MSCT1 genome sequencing and assembly methods (Table 2) [32].
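The re-origination step described above (placing the start of a circular molecule a fixed distance from a landmark gene) amounts to rotating the sequence string. The following is a minimal sketch of that idea on a toy sequence, not the authors' actual procedure; the function names and the assumption that the new origin lies upstream of the feature on the forward strand are illustrative only.

```python
def rotate_circular(seq: str, new_start: int) -> str:
    """Rotate a circular sequence so 0-based index `new_start` becomes position 0."""
    if not seq:
        return seq
    new_start %= len(seq)  # wrap negative or oversized indices around the circle
    return seq[new_start:] + seq[:new_start]


def start_before_feature(seq: str, feature_start: int, offset: int) -> str:
    """Rotate so that the feature begins `offset` bp after the new origin.

    Assumes the feature is on the forward strand (hypothetical simplification).
    """
    return rotate_circular(seq, feature_start - offset)


# Toy example: put the origin 2 bp upstream of the "ATG" landmark.
toy = "GGGGATGCCCTTTT"
rotated = start_before_feature(toy, feature_start=toy.index("ATG"), offset=2)
print(rotated)
```

For the MSCT1 chromosome, the analogous call would use the dnaA start coordinate and an offset of 41, applied to the circularized contig after the overlapping ends have been trimmed.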
Table 2

Project information

| MIGS ID | Property | Term |
| MIGS-31 | Finishing quality | Complete genome |
| MIGS-28 | Libraries used | Paired-end (Illumina), PacBio 20 kb, Nanopore |
| MIGS-29 | Sequencing platforms | Illumina MiSeq, PacBio, Nanopore |
| MIGS-31.2 | Fold coverage | 2378.74X total: 1820.26X Illumina, 516.58X PacBio, 41.90X Nanopore |
| MIGS-30 | Assemblers | Canu v1.3, Pilon v1.17, Quiver v2.3.0 |
| MIGS-32 | Gene calling method | NCBI Prokaryotic Genome Annotation Pipeline |
|  | Locus Tag | BGK55 |
|  | GenBank ID | GCA_001719145.1 |
|  | GenBank Date of Release | 06-SEP-2016 |
|  | GOLD ID | Gp0177725 |
|  | BIOPROJECT | PRJNA299817 |
| MIGS-13 | Source Material Identifier | MSCT1 |
|  | Project relevance | Agricultural |

Genome annotation

Proteins and noncoding RNAs (including rRNA, tRNA, ncRNA) were predicted with the NCBI Prokaryotic Genome Annotation Pipeline [33]. Clusters of Orthologous Groups (COG) annotation of the predicted proteins against the COG position-specific scoring matrices downloaded from the NCBI Conserved Domain Database was conducted with RPS-BLAST [34-36]. InterProScan v51.0 was used to add Pfam annotations using the Pfam applet [37]. Signal peptides and transmembrane helices were predicted with SignalP [38] and TMHMM [39], respectively. Clustered regularly interspaced short palindromic repeat (CRISPR) sequences were identified using CRISPRFinder [40]. Plant-inducible promoter (PIP) sequences in the promoter region (both strands) of genes were identified with the regular expression ‘TTCG-N16-TTCG’, where N is any nucleotide, as described by Lee et al. 2005 [41-43]. EffectiveDB was used to determine whether MSCT1 contains functional T3SS, T4SS, and T6SS secretion systems. EffectiveDB also identified eukaryotic-like domains, potential T3SS secreted proteins, and potential T4SS secreted proteins in the MSCT1 predicted proteome. Additionally, blastp was used to align the proteins of the MSCT1 predicted proteome to the 502 proteins representing 53 effector families of Xanthomonas species found in the effector database (Xanthomonas.org) [34]. Transcription activator-like effectors and Repeat Variable Diresidues (RVDs) were predicted with AnnoTALE [44]. TALgetter [45] was used to identify the DNA target domain (DTD) on the G. hirsutum line TM-1 promoterome [46].
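The PIP-box search described above, two TTCG motifs separated by 16 arbitrary nucleotides and scanned on both strands, can be expressed directly as a regular expression. The sketch below is illustrative only and is not the authors' script; the function names and the toy promoter are assumptions, and positions reported for the minus strand are relative to the reverse-complemented sequence.

```python
import re

# TTCG, 16 arbitrary nucleotides, TTCG; the lookahead allows overlapping matches.
PIP_BOX = re.compile(r"(?=(TTCG[ACGT]{16}TTCG))")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_pip_boxes(promoter: str):
    """Return (strand, 0-based start, matched sequence) for each PIP-box hit."""
    promoter = promoter.upper()
    hits = []
    for strand, seq in (("+", promoter), ("-", reverse_complement(promoter))):
        for match in PIP_BOX.finditer(seq):
            hits.append((strand, match.start(1), match.group(1)))
    return hits


# Toy promoter (hypothetical sequence) with one PIP-box on the forward strand.
toy_promoter = "AA" + "TTCG" + "A" * 16 + "TTCG" + "CC"
print(find_pip_boxes(toy_promoter))
```

In practice the input would be the upstream region extracted for each annotated gene, and the hit position would be converted to a distance from the start codon.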

Genome properties

The MSCT1 long read assembly had a total length of 5,123,946 bp distributed across one large circular chromosome approximately 5 Mb in length (Fig. 4) and three circular plasmids (60, 44, and 15 kb in length) (Table 3). Sequencing depth was 558.48 genome equivalents for the long read sequencing technologies and 1820.26 genome equivalents for the Illumina PCR-free DNA library (Table 2). Dot plots showed that the MSCT1 chromosome exhibits a high degree of sequence similarity to the circular chromosomes reported in previous X. citri pv. malvacearum assemblies (Fig. 5). A total of 4410 genes were predicted for MSCT1, including 4102 protein coding genes, 95 RNA genes, and 213 pseudogenes (Table 4). The NCBI Prokaryotic Genome Annotation Pipeline added functional annotation to 2843 proteins.
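As a quick consistency check, the depth and gene tallies quoted above can be recomputed from the per-platform values in Table 2 and the gene categories in Table 4. This is a small arithmetic sketch using only numbers stated in this report; the variable names are illustrative.

```python
# Fold coverage per platform, as reported in Table 2.
long_read_depth = {"PacBio": 516.58, "Nanopore": 41.90}
illumina_depth = 1820.26

long_total = round(sum(long_read_depth.values()), 2)
total_depth = round(long_total + illumina_depth, 2)

# Gene categories, as reported in Table 4.
genes = {"protein coding": 4102, "RNA": 95, "pseudo": 213}
total_genes = sum(genes.values())
share = {name: round(100 * count / total_genes, 2) for name, count in genes.items()}

print(long_total, total_depth, total_genes, share)
```

The recomputed long read depth, total depth, gene total, and percentages agree with the values given in the text and tables.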
Fig. 4

The genomic map of MSCT1 Chromosome 1. The outer and inner dark blue rings represent protein coding genes on the (+) and (−) strands, respectively. The light red, green, and blue rings represent blastn alignments of MSCT1 against X. citri pv. malvacearum strains R18 from Nicaragua (GCF_000309905.1), R18 from Burkina Faso (GCF_000454505.1), and R20 from Burkina Faso (GCF_000454525.1), respectively. The black ring represents the GC content, while the inner green and purple ring represents the GC skew. The genomic map was created with cgview [88]

Table 3

Summary of genome: one chromosome and 3 plasmids

| Label | Size | Topology | INSDC identifier | RefSeq ID |
| Chromosome | 5.0 Mb | Circular | CP017020.1 | NZ_CP017020.1 |
| pMSCT15kb | 15,263 bp | Circular | CP017021.1 | NZ_CP017021.1 |
| pMSCT44kb | 43,946 bp | Circular | CP017022.1 | NZ_CP017022.1 |
| pMSCT60kb | 60,533 bp | Circular | CP017023.1 | NZ_CP017023.1 |
Fig. 5

Dot plot of X. citri pv. malvacearum strain MSCT1 chromosome (NZ_CP017020.1) (X-Axis) compared to X. citri pv. malvacearum strain X18 (NZ_CM002136.1) (left, Y-Axis) and X. citri pv. malvacearum strain X20 (NZ_CM002029.1) (right, Y-Axis) Chromosomes. Dot plot produced with YASS web server using default settings [89]

Table 4

Genome statistics

| Attribute | Value | % of Total |
| Genome size (bp) | 5,123,946 | 100.00 |
| DNA coding (bp) | 4,365,468 | 85.20 |
| DNA G + C (bp) | 3,313,791 | 64.67 |
| DNA scaffolds | 4 | 100.00 |
| Total genes | 4410 | 100.00 |
| Protein coding genes | 4102 | 93.02 |
| RNA genes | 95 | 2.15 |
| Pseudo genes | 213 | 4.83 |
| Genes in internal clusters | - | - |
| Genes with function prediction | 3375 | 76.53 |
| Genes assigned to COGs | 3228 | 73.20 |
| Genes with Pfam domains | 3302 | 74.88 |
| Genes with signal peptides | 553 | 12.54 |
| Genes with transmembrane helices | 911 | 20.66 |
| CRISPR repeats | 1 | - |

The predominant COG functional classifications were R (general function prediction only), E (amino acid transport and metabolism), M (cell wall/membrane biogenesis), and H (coenzyme transport and metabolism), representing 16.31, 11.68, 10.36, and 9.68% of the predicted proteome, respectively (Table 5). InterProScan identified 3302 proteins containing at least one Pfam domain. In total, 3375 proteins had at least one functional annotation from either the Pfam or COG annotations (Table 4). The rRNA genes comprised two copies each of the 23S, 5S, and 16S rRNA subunits. At least one tRNA for each of the 20 standard amino acids was identified among the 54 predicted tRNA loci. Transmembrane helix prediction identified 911 proteins with at least one predicted transmembrane helix. Signal peptides were identified on 553 proteins; of these, after in silico cleavage of the predicted signal peptide, 23 contained a predicted transmembrane helix, leaving 530 proteins that could be secreted from the cell. A single CRISPR sequence 298 bp in length was predicted in the 27,394 to 27,692 bp region of the MSCT1 chromosome. As is common in Xanthomonas species, multiple copies of transposase-coding genes were identified dispersed throughout the genome [47]. In total, 26 transposase genes were predicted in MSCT1, making transposase the fourth most abundant functional annotation in the proteome (Table 6).
Table 5

Number of genes associated with general COG functional categories

| Code | Value | % | Description |
| J | 349 | 8.51 | Translation, ribosomal structure and biogenesis |
| A | 1 | 0.02 | RNA processing and modification |
| K | 348 | 8.48 | Transcription |
| L | 208 | 5.07 | Replication, recombination and repair |
| B | 3 | 0.07 | Chromatin structure and dynamics |
| D | 129 | 3.14 | Cell cycle control, cell division, chromosome partitioning |
| V | 168 | 4.10 | Defense mechanisms |
| T | 344 | 8.39 | Signal transduction mechanisms |
| M | 425 | 10.36 | Cell wall/membrane biogenesis |
| N | 241 | 5.88 | Cell motility |
| U | 198 | 4.83 | Intracellular trafficking and secretion |
| O | 295 | 7.19 | Posttranslational modification, protein turnover, chaperones |
| C | 369 | 9.00 | Energy production and conversion |
| G | 366 | 8.92 | Carbohydrate transport and metabolism |
| E | 479 | 11.68 | Amino acid transport and metabolism |
| F | 129 | 3.14 | Nucleotide transport and metabolism |
| H | 397 | 9.68 | Coenzyme transport and metabolism |
| I | 266 | 6.48 | Lipid transport and metabolism |
| P | 358 | 8.72 | Inorganic ion transport and metabolism |
| Q | 243 | 5.92 | Secondary metabolites biosynthesis, transport and catabolism |
| R | 669 | 16.31 | General function prediction only |
| S | 263 | 6.41 | Function unknown |
| - | 874 | 21.31 | Not in COGs |

The percentages are based on the total number of protein coding genes in the genome

Table 6

Ten most represented functional annotations

| Annotation | Count |
| Membrane protein | 64 |
| TonB-dependent receptor | 42 |
| MFS transporter | 33 |
| Transposase | 26 |
| Transcriptional regulator | 25 |
| ABC transporter ATP-binding protein | 23 |
| Oxidoreductase | 19 |
| LysR family transcriptional regulator | 19 |
| GGDEF domain-containing protein | 16 |
| Two-component sensor histidine kinase | 15 |

Insights from the genome sequence

Functional T3SS, T4SS, and T6SS secretion systems were predicted in MSCT1. Comparison of the MSCT1 predicted proteins with previously described effectors resulted in the identification of 7 families of effectors common among species of Xanthomonas (Table 7). These families are AvrBs2, XopAG, XopK, XopP, XopR, XopT, and XopZ1. Effectors AvrBs2, XopK, XopP, XopR, and XopZ1 have been shown to suppress the host disease resistance response and immunity in other plant-Xanthomonas interactions [48-54]. XopAG effector family members have been shown to be responsible for eliciting the hypersensitive response in grapefruit [55]. The protein sequence WP_033481547.1, predicted from the MSCT1 genome, exhibited homology to AvrBs2 effector proteins from several species of Xanthomonas and contained a predicted glycerophosphoryl diester phosphodiesterase family (PF03009) domain characteristic of the AvrBs2 effector family [10]. AvrBs2 produced by pv. vesicatoria is recognized by an NBS-LRR protein in peppers carrying the Bs2 resistance gene; however, field strains of pv. vesicatoria have been identified that evade this recognition [56, 57].
Table 7

Xanthomonad non-TAL effector families found in MSCT1

| Family | RefSeq ID | BlastP hit | Notes | Ref. |
| XopAG | WP_033479491.1 | CAP49915.1 | HR response in grapefruit | [55] |
| XopK | WP_005915119.1 | CAP50604.1 | Unclear role in virulence | [52, 83, 84] |
| XopP | WP_069288200.1 | CAJ22867.1 | Suppresses immune response in rice | [49] |
| XopR | WP_005923840.1 | BAE70889.1 | Suppression of MAMP-triggered immune responses | [48, 53, 54] |
| XopT | WP_069288215.1 | BAE68965.1 | - | [83] |
| XopZ1 | WP_005914471.1 | BAE69157.1 | Contributes to virulence in rice | [51, 52] |
| AvrBs2 | WP_033481547.1 | CAJ21683.1 | Suppresses rice immunity | [50] |

EffectiveDB predicted 408 T3SS and 44 T4SS secreted proteins. MSCT1 predicted secreted proteins that have previously been associated with diseases in G. hirsutum and other plant systems include endoglucanase [58], polygalacturonase [59], glutathione S-transferase [60], pectate lyase [61], and glutathione peroxidase [62], as well as catabolic enzymes such as peptidases and lipases. These proteins likely help modulate the host disease response and break down host tissues. The PIP-box sequence was identified 78 bp upstream of the start codon of the HrpB1 gene, which indicates that gene regulation via PIP-targeted transcription factors is present in MSCT1. EffectiveDB also identified 22 eukaryotic-like domains among 36 MSCT1 proteins. The most represented eukaryotic-like domains were those of the M13 peptidase family (PF01431 and PF05649); however, M13 peptidases are commonly identified among bacteria [63].

Extended insights

AnnoTALE identified 8 CDS regions in the MSCT1 genome that could potentially code for TAL effectors (Table 8). AnnoTALE did not predict any TAL sequences in the four draft X. citri pv. malvacearum genomes reported previously (GCF_000454505.1 (strain: X18), GCF_000454525.1 (strain: X20), GCF_000309925.1 (strain: GSPB22388), and GCF_000309915.1 (strain: GSPB1386)). Interestingly, 7 of the 8 TAL effectors in X. citri pv. malvacearum MSCT1 are located on plasmids. This arrangement is in contrast to other xanthomonads such as X. oryzae pv. oryzae and X. oryzae pv. oryzicola, where the vast majority of TAL effectors are located on the large chromosome. The presence of TAL effectors on X. citri pv. malvacearum plasmids can be traced back to the initial report by De Feyter et al. 1991 [64], which described 6 avirulence genes on a 90 kb X. citri pv. malvacearum plasmid [20-22]. However, X. citri and species such as X. oryzae pv. oryzae and X. oryzae pv. oryzicola exhibit evolutionary divergence and fall into different clades among the other sequenced xanthomonads in phylogenetic analysis [65]. Although the total number of TAL effectors found in MSCT1 (n = 8) is less than what has been previously reported for some X. oryzae pv. oryzae and X. oryzae pv. oryzicola strains, it is similar to that of strains of other xanthomonads [43, 47, 66].
Table 8

MSCT1 potential TAL effectors

| TAL | Molecule | RefSeq ID | Start (a) | Stop (a) | Strand (a) |
| MSCT1-TAL1 | pMSCT44kb | WP_069288206.1 | 36,431 | 40,333 | 1 |
| MSCT1-TAL2 | pMSCT60kb | WP_069288209.1 | 16,043 | 19,127 | 1 |
| MSCT1-TAL3 | Chromosome | WP_069288181.1 (b) | 2,568,181 | 2,571,268 | -1 |
| MSCT1-TAL4 | pMSCT44kb | WP_069288204.1 | 15,111 | 19,626 | -1 |
| MSCT1-TAL5 | pMSCT60kb | WP_069288212.1 | 41,404 | 44,689 | -1 |
| MSCT1-TAL6 | pMSCT60kb | WP_069288211.1 | 44,689 | 34,259 | -1 |
| MSCT1-TAL7 | pMSCT60kb | WP_069288210.1 | 21,870 | 21,870 | -1 |
| MSCT1-TAL8 | pMSCT60kb | WP_069288208.1 | 3,549 | 8,064 | -1 |

(a) Start, Stop, and Strand annotations by AnnoTALE

(b) NCBI annotation differs from the AnnoTALE prediction; the MSCT1-TAL3 NCBI start codon begins at 2,570,908

The repeat variable diresidues (RVDs) were identified in the 8 MSCT1 TAL sequences so that the TAL DNA target domain (DTD) could be predicted with the previously reported TAL code (Table 9). Due to the inherently degenerate nature of TAL DTD prediction [12, 45, 67–70], potential TAL DTDs reported in this study are limited to the top 2 DTD site predictions for each TAL, with the additional constraint of being within 150 bp of the gene start codon. Interestingly, the MSCT1 TALs with a DTD prediction (MSCT1-TAL2 and MSCT1-TAL8) had predictions on corresponding sections of the A and D sub-genomes of the G. hirsutum TM-1 assembly [46]. However, these in silico predictions still need to be confirmed with RNA expression data from studies of G. hirsutum undergoing infection by MSCT1. Of note, no MSCT1 TAL DTD was predicted to target any promoter region on G. hirsutum chromosome 14 or 20, which contain the B2, B3, and B12 genes that are a major source of resistance to X. citri pv. malvacearum [71-73].
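The filtering rule described above (keep the two best-scoring sites per TAL, then require each to lie within 150 bp of a start codon) could be sketched as follows. This is one possible reading of the rule, not the authors' code; the `Hit` record, its field names, the assumption that a higher score is better, and the toy data are all hypothetical and do not reflect actual TALgetter output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    tal: str                # TAL effector name
    gene: str               # gene whose promoter contains the predicted site
    distance_to_start: int  # bp between predicted site and start codon
    score: float            # prediction score (assumed: higher is better)


def select_hits(hits, max_distance=150, top_n=2):
    """Keep the top_n hits per TAL, then drop those farther than max_distance."""
    by_tal = {}
    for hit in hits:
        by_tal.setdefault(hit.tal, []).append(hit)
    selected = {}
    for tal, group in by_tal.items():
        ranked = sorted(group, key=lambda h: h.score, reverse=True)[:top_n]
        selected[tal] = [h for h in ranked if h.distance_to_start <= max_distance]
    return selected


# Toy data (hypothetical values).
toy_hits = [
    Hit("TAL-A", "g1", 21, 9.0),
    Hit("TAL-A", "g2", 400, 8.0),
    Hit("TAL-A", "g3", 30, 7.0),
    Hit("TAL-B", "g4", 500, 5.0),
]
print(select_hits(toy_hits))
```

Applying the distance filter before ranking instead would retain `g3` for TAL-A in this toy case, so the order of the two steps matters when interpreting how many TALs end up with a reported target.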
Table 9

Repeat Variable Diresidues of MSCT1 TAL effectors

TAL         Repeat Variable Diresidues
MSCT1-TAL1  HD-NI-NG-NI-NI-NS-NG-NG-NI-NG-NS-HD-NS-HD-NS-NG-NS-NG-HD-NG-NG-NG
MSCT1-TAL2  NI-NI-NI-NN-NI-NS-HD-NG-HD-NS-NG-HD-HD-NG
MSCT1-TAL3  NI-NG-NI-HD-NG-NG-NG-NG-HD-NS-HD-HD-NG-NG
MSCT1-TAL4  NI-NG-NI-NG-NS-NS-NS-NG-HD-NS-HD-HD-HD-HD-HD-NG-NI-NG-NS-NG-NS-HD-HD-HD-HD-NG-NG-NG
MSCT1-TAL5  NI-NI-NI-NN-NI-NS-HD-NG-NN-NS-NN-NN-HD-NG-N*-NN
MSCT1-TAL6  NI-NG-NI-NI-NI-NG-NG-NS-NG-NS-NS-NG-NS-NG-HD-NS-HD-HD-NG-NS-NG-NG-NG-NG-NG-NG
MSCT1-TAL7  HD-NI-NG-NI-NI-NI-HD-HD-HD-NS-NS-HD-HD-NS-NS-NG-NS-NG-NG
MSCT1-TAL8  NI-NG-NI-NI-NI-NG-HD-HD-NS-NI-HD-NI-HD-HD-NI-NS-NG-HD-NS-NS-NS-NG-NS-NG-NG-NG-NG-NG
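As a rough illustration of how the RVD strings in Table 9 relate to DNA targets, the sketch below translates an RVD string into a degenerate target sequence. It assumes the commonly cited simplified RVD-to-base associations (HD to C, NI to A, NG to T, NN to G or A, NS to any base, N* to C or T), written with IUPAC ambiguity codes. It is not the scoring method used in this study, and the names `RVD_TO_BASE` and `rvds_to_target` are hypothetical.

```python
# Simplified RVD-to-base associations (an assumption for illustration;
# real target predictors use weighted scores rather than a fixed lookup).
RVD_TO_BASE = {
    "HD": "C",
    "NI": "A",
    "NG": "T",
    "NN": "R",  # IUPAC R = G or A
    "NS": "N",  # IUPAC N = any base
    "N*": "Y",  # IUPAC Y = C or T
}


def rvds_to_target(rvd_string: str) -> str:
    """Translate a hyphen-separated RVD string into a degenerate DNA target."""
    bases = []
    for rvd in rvd_string.split("-"):
        if rvd not in RVD_TO_BASE:
            raise ValueError(f"RVD not covered by this simplified code: {rvd}")
        bases.append(RVD_TO_BASE[rvd])
    return "".join(bases)


# RVD strings copied from Table 9.
MSCT1_TAL2 = "NI-NI-NI-NN-NI-NS-HD-NG-HD-NS-NG-HD-HD-NG"
MSCT1_TAL5 = "NI-NI-NI-NN-NI-NS-HD-NG-NN-NS-NN-NN-HD-NG-N*-NN"

if __name__ == "__main__":
    print(rvds_to_target(MSCT1_TAL2))
    print(rvds_to_target(MSCT1_TAL5))
```

Under these assumptions MSCT1-TAL2 translates to AAARANCTCNTCCT. The ambiguous positions (R, N) show why a single RVD string can match many genomic sites.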
Of the predicted TALs, only two, MSCT1-TAL2 and MSCT1-TAL8, had target sequences that fall within 100 bp of the start codon. MSCT1-TAL2 was predicted to target a site 21 bp from the start codon of the genes encoding two paralogous proteins (Gh_A04G1143, Gh_D04G1757) found on chromosome 4 of each of the respective sub-genomes of tetraploid cotton. The proteins that MSCT1-TAL2 potentially targets contain the ProSiteProfiles NAC domain profile (PS51005). The NAC domain has been reported to participate in both biotic and abiotic stress-related responses [74]. MSCT1-TAL8 was predicted to target sites 19 and 20 bp upstream of the genes encoding the paralogous proteins (Gh_A01G1702, Gh_D01G1952) in the A and D sub-genomes of G. hirsutum, respectively.
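The filtering criteria described for the DTD predictions (keep the top-ranked sites per TAL, then require the site to lie within a fixed distance of a gene start codon) can be sketched as follows. This is a minimal sketch and not the authors' pipeline: the `Prediction` record, its fields, the function name, and all example values are hypothetical, and it assumes the rank cut-off is applied before the distance cut-off.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One predicted TAL binding site (hypothetical record layout)."""
    tal: str          # TAL effector name
    gene: str         # nearest gene identifier
    rank: int         # 1 = best-scoring site for this TAL
    distance_bp: int  # distance from the site to the gene start codon


def filter_predictions(preds, top_n=2, max_distance_bp=150):
    """Keep each TAL's top_n sites, then drop those too far from a start codon."""
    by_tal = defaultdict(list)
    for p in preds:
        by_tal[p.tal].append(p)

    kept = []
    for tal in sorted(by_tal):
        best = sorted(by_tal[tal], key=lambda p: p.rank)[:top_n]
        kept.extend(p for p in best if p.distance_bp <= max_distance_bp)
    return kept


# Entirely made-up example records, for illustration only.
EXAMPLE = [
    Prediction("TAL_A", "gene_1", 1, 21),
    Prediction("TAL_A", "gene_2", 2, 480),
    Prediction("TAL_A", "gene_3", 3, 10),   # close, but outside the top 2
    Prediction("TAL_B", "gene_4", 1, 900),
    Prediction("TAL_B", "gene_5", 2, 1200),
]

if __name__ == "__main__":
    for p in filter_predictions(EXAMPLE):
        print(p.tal, p.gene, p.distance_bp)
```

Both cut-offs are parameters, so the same function can be rerun with a stricter or looser distance window to see how sensitive the candidate list is to that choice.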

Conclusions

The MSCT1 genome reported in this study is the first X. citri pv. malvacearum genome to be completed with long read DNA sequencing technology. The long read sequencing and assembly strategy allowed for the identification of eight TAL effectors in X. citri pv. malvacearum and makes the MSCT1 genome assembly the only X. citri pv. malvacearum genome with assembled TAL effectors. In addition to the TAL effector identification, many T3SS effectors were identified in the MSCT1 genome. The genome assembly, as outlined in this paper, provides a basis for future epidemiological and pathogenesis studies of the X. citri pv. malvacearum-G. hirsutum pathogen-host complex.