Edward W Davis II (1, 2), Alexandra J Weisberg (1), Javier F Tabima (1), Niklaus J Grunwald (1, 2, 3, 4), Jeff H Chang (1, 2, 3).
Abstract
Understanding the population structure and genetic diversity of plant pathogens, as well as the effect of agricultural practices on pathogen evolution, is important for disease management. Advances in molecular methods have increased the resolution of pathogen identification, but methods based on analysis of DNA sequences can be less straightforward to use. To address this, we developed Gall-ID, a web-based platform that uses DNA sequence information from 16S rDNA, multilocus sequence analysis (MLSA), and whole genome sequences to assign disease-associated bacteria to their taxonomic units. Gall-ID was developed with a particular focus on gall-forming bacteria belonging to Agrobacterium, Pseudomonas savastanoi, Pantoea agglomerans, and Rhodococcus. Members of these groups cause growth deformities in plants, and some can infect many species of field, orchard, and nursery crops. Gall-ID also enables the use of high-throughput sequencing reads to search for evidence of homologs of characterized virulence genes, and provides downloadable software pipelines for automating multilocus sequence analysis, calculating average nucleotide identity between genome sequences, and constructing core genome phylogenies. Lastly, additional databases were included in Gall-ID to help identify other plant pathogenic bacteria that may be present in gall-associated microbial communities or that cause disease in other plant tissues. The URL for Gall-ID is http://gall-id.cgrb.oregonstate.edu/.
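As a rough illustration of sequence-based typing, the sketch below concatenates per-locus MLSA sequences and ranks curated reference entries by uncorrected p-distance to the query. This is a simplification under stated assumptions: the sequences are assumed to be aligned to equal length already, the locus and taxon names are hypothetical, and Gall-ID itself places the query in a phylogenetic tree (see Figure 3A) rather than using this nearest-match shortcut.

```python
from typing import Dict, List, Tuple

MISSING = set("-N")  # gap or ambiguous base; such sites are skipped


def concatenate(loci: List[str], alleles: Dict[str, str]) -> str:
    """Join per-locus sequences in a fixed locus order (a concatenated MLSA sequence)."""
    return "".join(alleles[locus] for locus in loci)


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring sites with a gap or N in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in MISSING or y in MISSING:
            continue
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def closest_references(query: str, database: Dict[str, str], k: int = 3) -> List[Tuple[str, float]]:
    """Rank reference sequences by p-distance to the query and return the k closest."""
    scored = [(name, p_distance(query, seq)) for name, seq in database.items()]
    return sorted(scored, key=lambda item: item[1])[:k]


# Hypothetical loci and reference entries, for illustration only.
LOCI = ["locus_1", "locus_2"]
database = {
    "taxon_A": "ACGTACGTACGT",
    "taxon_B": "ACGAACGTTCGA",
    "taxon_C": "TCGAACCTTCGA",
}
query = concatenate(LOCI, {"locus_1": "ACGTAC", "locus_2": "GTACGA"})
print(closest_references(query, database, k=2))
```

Concatenating loci in a fixed order keeps homologous positions aligned across isolates, which is why every database entry must use the same locus order as the query.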
Keywords: Agrobacterium; Average nucleotide identity; Genomes; Molecular diagnostics; Multilocus sequence analysis; Rhodococcus; Taxonomy
Year: 2016 PMID: 27547538 PMCID: PMC4958008 DOI: 10.7717/peerj.2222
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Statistics for the WGS Pipeline.
| WGS Pipeline step | Statistic | Value |
|---|---|---|
| generate_pileup.sh (1 cpu) | Number of input paired read sets | 19 |
| | Average runtime per pileup (hh:mm:ss) | 00:42:01 |
| | Total runtime (hh:mm:ss) | 13:18:14 |
| generate_core_alignment.sh (1 cpu) | Total pileup alignment length | 5,947,114 bp |
| | 90%-shared core alignment length | 855,355 bp |
| | Total runtime (hh:mm:ss) | 00:15:32 |
| remove_recombination.sh (10 cpus) | Number of core polymorphic sites | 177,961 bp |
| | Core SNP alignment length (without putative recombinant SNPs) | 174,819 bp |
| | Computational time (hh:mm:ss) | 04:25:32 |
| | Actual runtime (hh:mm:ss) | 00:29:28 |
| | Figure output runtime (hh:mm:ss) | 00:13:03 |
| generate_phylogeny.sh (raxmlHPC-PTHREADS-AVX, 10 cpus) | Time to optimize RAxML parameters (hh:mm:ss) | 00:02:32 |
| | Time to compute 20 ML searches (hh:mm:ss) | 00:34:53 |
| | Number of bootstrap replicates (RAxML autoMRE) | 50 |
| | Time to compute 50 bootstrap searches (hh:mm:ss) | 01:02:09 |
| | Total runtime (hh:mm:ss) | 01:39:34 |
| All | Total runtime (hh:mm:ss) | 15:55:51 |
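The runtimes in the table can be cross-checked with a short Python sketch. The "All" total equals the sum of the per-step wall-clock runtimes when the figure-output time of remove_recombination.sh is counted, and the three RAxML sub-steps add up to the generate_phylogeny.sh total. The helper names are hypothetical, and reading "Computational time" as CPU time summed over the 10 cpus (so that its ratio to "Actual runtime" approximates the parallel speedup) is an interpretation, not something the table states.

```python
def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Format a number of seconds as hh:mm:ss."""
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Per-step runtimes copied from the table above.
step_runtimes = {
    "generate_pileup.sh": "13:18:14",
    "generate_core_alignment.sh": "00:15:32",
    "remove_recombination.sh (actual)": "00:29:28",
    "remove_recombination.sh (figure output)": "00:13:03",
    "generate_phylogeny.sh": "01:39:34",
}
total = sum(to_seconds(t) for t in step_runtimes.values())
print(to_hms(total))  # 15:55:51, matching the "All" row

# The RAxML sub-steps sum to the generate_phylogeny.sh total.
raxml = ["00:02:32", "00:34:53", "01:02:09"]
print(to_hms(sum(map(to_seconds, raxml))))  # 01:39:34

# Ratio of computational time to actual runtime for the 10-cpu recombination step.
speedup = to_seconds("04:25:32") / to_seconds("00:29:28")
print(f"{speedup:.1f}x")  # 9.0x

# Core polymorphic sites dropped as putatively recombinant.
removed = 177_961 - 174_819
print(removed, f"{removed / 177_961:.2%}")  # 3142 1.77%
```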
Strain identity of 14 isolates associated with crown gall.
| Isolate name | Host | Positive ID based on | # high-quality read pairs | Clade (based on 16S rDNA) | # of virulence genes identified |
|---|---|---|---|---|---|
| 13-2099-1-2 | Quaking Aspen | | 1,244,074 | | 63 |
| 13-626 | Pear | | 220,903 | | 2 (…) |
| AC27/96 | Pieris | Not pathogenic | 826,690 | | 1 (…) |
| AC44/96 | Pieris | No reaction to hybridization probes | 1,404,002 | | 0 |
| B131/95 | Peach/Almond Rootstock | Pathogenicity assay | 539,283 | | 46 |
| B133/95 | Peach/Almond Rootstock | Pathogenicity assay | 1,199,902 | | 46 |
| B140/95 | Peach/Almond Rootstock | Response to 20 different biochemical and physiological tests | 448,314 | | 51 |
| N2/73 | Cranberry gall | Response to 20 different biochemical and physiological tests | 1,345,404 | | 64 |
| W2/73 | Euonymus | Response to 20 different biochemical and physiological tests | 1,244,159 | | 51 |
| 15-1187-1-2a | Yarrow | | 508,223 | | 39 |
| 15-1187-1-2b | Yarrow | | 299,970 | | 38 |
| 14-2641 | Rose | No data | 698,756 | | 0 |
| 15-172 | Leucanthemum | Colony morphology on selective media | 384,308 | | 56 |
| 15-174 | Leucanthemum | Colony morphology on selective media | 753,570 | | 58 |
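The read-pair counts can be translated into an approximate mean sequencing depth with the standard Lander-Waterman estimate C = N × L / G, for N reads of length L over a genome of size G. The sketch below assumes every read is a full-length 300-bp read, as in the 2 × 300 bp MiSeq runs described for isolate 13-2099-1-2 in Figure 3, and uses a hypothetical genome size of 5.5 Mb. Neither value is reported for these isolates here, and quality trimming would make the true depth lower.

```python
def expected_coverage(read_pairs: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Mean fold coverage C = N * L / G, where N counts individual reads (2 per pair)."""
    total_bases = read_pairs * 2 * read_length_bp
    return total_bases / genome_size_bp


# Read-pair counts copied from the table above.
read_pairs = {
    "13-2099-1-2": 1_244_074,
    "13-626": 220_903,
    "AC44/96": 1_404_002,
}

ASSUMED_READ_LENGTH = 300             # bp per read; assumes untrimmed 2 x 300 bp reads
HYPOTHETICAL_GENOME_SIZE = 5_500_000  # bp; illustrative value, not from the study

for isolate, n in read_pairs.items():
    depth = expected_coverage(n, ASSUMED_READ_LENGTH, HYPOTHETICAL_GENOME_SIZE)
    print(f"{isolate}: ~{depth:.0f}x")
```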
Figure 1. Overview of Gall-ID diagnostic tools.
DNA sequence information can be used to reveal the identity of the causative agent (unknown isolate) of a disease. Tools under “Gall Isolate Typing” and “Phytopath-type” use 16S rDNA or pathogen-specific MLSA gene sequences to infer the identity of an isolate by comparing its sequences to manually curated sequence databases. Tools under “Whole Genome Analysis” and “Vir-Search” use Illumina short reads to characterize pathogenic isolates. The “Whole Genome Analysis” tab provides downloadable tools to infer genetic relatedness based on SNPs (WGS Pipeline) or average nucleotide identity (Auto ANI). The “Vir-Search” tab provides an online tool to quickly map short reads against a database of virulence gene sequences.
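The caption does not say how Auto ANI computes average nucleotide identity. As a hedged sketch of one common definition, the BLAST-based ANIb approach, the query genome is cut into roughly 1,020-bp fragments, each fragment is aligned to the reference genome, and ANI is the mean identity of fragments whose best hit exceeds about 30% identity over at least 70% of the fragment length. The alignment step is assumed to have been done by an external tool; the `FragmentHit` and `ani` names are hypothetical, and the thresholds are exposed as parameters rather than fixed.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Optional

FRAGMENT_LENGTH = 1020  # bp; fragment size conventionally used by ANIb


@dataclass
class FragmentHit:
    """Best local alignment of one query-genome fragment against the reference genome."""
    identity: float      # percent identity of the alignment
    aligned_length: int  # bp of the fragment covered by the alignment


def ani(hits: Iterable[FragmentHit],
        fragment_length: int = FRAGMENT_LENGTH,
        min_identity: float = 30.0,
        min_aligned_fraction: float = 0.70) -> Optional[float]:
    """Mean identity of fragment hits that pass the identity and alignable-length filters.

    Returns None when no fragment passes, i.e. the genomes share no comparable fragments.
    """
    kept = [
        hit.identity
        for hit in hits
        if hit.identity > min_identity
        and hit.aligned_length / fragment_length >= min_aligned_fraction
    ]
    return mean(kept) if kept else None


# Toy fragment hits (made-up numbers, not data from the study).
hits = [
    FragmentHit(identity=98.0, aligned_length=1020),
    FragmentHit(identity=96.0, aligned_length=1000),
    FragmentHit(identity=90.0, aligned_length=500),   # under 70% of the fragment aligned
    FragmentHit(identity=25.0, aligned_length=1020),  # below the identity floor
]
print(ani(hits))
```

Because fragmenting and aligning are not symmetric, ANI is often computed in both directions (query against reference and reference against query) and the two values reported or averaged.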
Manually curated datasets developed for Gall-ID.
| Database | Bacterial group | # of isolates used in Gall-ID | References |
|---|---|---|---|
| MLSA (…) | Rhizobiaceae | 199 | |
| MLSA (…) | Rhizobiaceae | 188 | |
| | Rhizobiaceae | 198 | |
| 16S rDNA | Rhizobiaceae | 245 | |
| MLSA (…) | | 85 | |
| 16S rDNA | | 66 | |
| MLSA (…) | | 356 | |
| 16S rDNA | | 352 | |
| MLSA (…) | | 158 | |
| MLSA (…) | | 153 | |
| 16S rDNA | | 161 | |
| 16S rDNA | | 345 | |
| MLSA (…) | | 7 | |
| MLSA (…) | | 7 | |
| MLSA (…) | | 40 | |
| MLSA (…) | | 54 | |
| MLSA (…) | | 28 | |
| MLSA (…) | | 348 | |
| MLSA (…) | | 17 | |
Figure 2. Flowchart for the WGS Pipeline.
Scripts, and the programs that each script runs, are boxed and shown on the left. The logic flow of the WGS Pipeline tool is shown on the right. Rectangles with rounded corners denote inputs and outputs; boxes outlined in red denote processes. Each input, output, and process is matched to the corresponding script and program.
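The core-alignment step summarized in the WGS Pipeline table (a 90%-shared core alignment, then core polymorphic sites) can be illustrated with a short sketch. It assumes that reads have already been mapped to a reference and that each isolate's pileup has been turned into a reference-length sequence with `-` or `N` at uncalled positions, and it reads "90%-shared" as "called in at least 90% of isolates". The function names are hypothetical, this is not the code in generate_core_alignment.sh, and the separate recombination-filtering step is not shown.

```python
from typing import Dict, List

MISSING = set("-N")  # gap or uncalled position in a pileup-derived sequence


def core_columns(aln: Dict[str, str], min_shared: float = 0.9) -> List[int]:
    """Indices of columns with a called base in at least `min_shared` of the isolates."""
    seqs = list(aln.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must be aligned to the same length")
    keep = []
    for i in range(length):
        called = sum(1 for s in seqs if s[i].upper() not in MISSING)
        if called / len(seqs) >= min_shared:
            keep.append(i)
    return keep


def polymorphic_columns(aln: Dict[str, str], columns: List[int]) -> List[int]:
    """Subset of `columns` where the isolates carry more than one distinct called base."""
    polymorphic = []
    for i in columns:
        bases = {s[i].upper() for s in aln.values()} - MISSING
        if len(bases) > 1:
            polymorphic.append(i)
    return polymorphic


def snp_alignment(aln: Dict[str, str], columns: List[int]) -> Dict[str, str]:
    """Extract the given columns from every sequence."""
    return {name: "".join(s[i] for i in columns) for name, s in aln.items()}


# Toy reference-anchored alignment of four isolates (not data from the study).
aln = {
    "ref": "ACGTACGT",
    "iso1": "ACGTTCGT",
    "iso2": "AC-TACGA",
    "iso3": "ACGTACNA",
}
core = core_columns(aln)  # with 4 isolates, 90% means all 4 must be called
snps = polymorphic_columns(aln, core)
print(core, snps, snp_alignment(aln, snps))
```

Columns 2 and 6 drop out of the core because one isolate has a gap or `N` there; of the remaining six columns, only 4 and 7 vary, so the SNP alignment is two sites long.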
Figure 3. Validation of the Agro-type and Vir-Search tools.
(A) An unrooted neighbor-joining phylogenetic tree based on 16S rDNA sequences from Agrobacterium spp. The 16S rDNA sequence was identified in and extracted from the genome assembly of Agrobacterium isolate 13-2099-1-2 and analyzed using the tool in the Agro-type tab. The isolate is labeled in red as “query_isolate”; the inset shows the clade that circumscribes the isolate. (B) Screenshot of output from Vir-Search. Paired 2 × 300 bp MiSeq reads from Agrobacterium isolate 13-2099-1-2 were analyzed using the Vir-Search tool in Gall-ID. Reference virulence gene sequences to which reads aligned are indicated with a green plus (“+”) icon, and the length and depth of read coverage are reported (alignments must meet user-specified cutoffs, which were set to 90% minimum coverage and 20% maximum sequence divergence). Virulence genes whose read alignments did not meet the user-specified cutoffs are indicated with a red “X”. Virulence genes are grouped into categories based on their function in virulence.
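The present/absent calls in panel B can be sketched as a simple filter using the two cutoffs from the caption (90% minimum coverage, 20% maximum divergence). The caption does not define how Vir-Search measures coverage or divergence, so this sketch assumes coverage means the fraction of reference-gene positions covered by at least one read and divergence means the fraction of mismatched bases at covered positions. The class, function, gene, and category names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class GeneHit:
    """Read-alignment summary for one reference virulence gene (illustrative fields)."""
    gene: str
    category: str
    gene_length: int     # length of the reference gene, bp
    covered_bases: int   # reference positions covered by at least one read
    mean_depth: float    # average read depth over covered positions
    divergence: float    # fraction of mismatched bases at covered positions, 0-1


def is_present(hit: GeneHit, min_coverage: float = 0.90, max_divergence: float = 0.20) -> bool:
    """Call a gene present when breadth of coverage and divergence both meet the cutoffs."""
    breadth = hit.covered_bases / hit.gene_length
    return breadth >= min_coverage and hit.divergence <= max_divergence


def summarize(hits: List[GeneHit], **cutoffs: float) -> Dict[str, List[Tuple[str, str]]]:
    """Group genes by functional category and mark each '+' (present) or 'X' (absent)."""
    table: Dict[str, List[Tuple[str, str]]] = {}
    for hit in hits:
        mark = "+" if is_present(hit, **cutoffs) else "X"
        table.setdefault(hit.category, []).append((hit.gene, mark))
    return table


# Toy alignment summaries (made-up values, not Vir-Search output).
toy_hits = [
    GeneHit("gene_1", "category_A", 1200, 1200, 85.0, 0.02),  # fully covered, low divergence
    GeneHit("gene_2", "category_A", 900, 600, 40.0, 0.03),    # only ~67% covered
    GeneHit("gene_3", "category_B", 1500, 1450, 60.0, 0.25),  # covered but too divergent
]
for category, genes in summarize(toy_hits).items():
    print(category, genes)
```

Both cutoffs are keyword arguments, so relaxing one (for example `summarize(toy_hits, min_coverage=0.6)`) mirrors adjusting the user-specified settings.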
Figure 4. Maximum likelihood tree based on vertically inherited polymorphic sites core to 20 Rhodococcus isolates.
The WGS Pipeline was used to automate the processing of paired-end short reads from 20 previously sequenced Rhodococcus isolates and to generate an unrooted maximum likelihood tree. Reads were aligned to R. fascians strain A44a as the reference, and SNPs potentially acquired via recombination were removed. The tree is shown midpoint-rooted. Scale bar = 0.05 average substitutions per site; nonparametric bootstrap support values (%) are indicated at each node. Major clades and sub-clades are labeled consistently with previously used labels.
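Midpoint rooting, used to display the otherwise unrooted tree, places the root halfway along the longest leaf-to-leaf path. A minimal sketch, assuming the tree is stored as a hypothetical adjacency map of branch lengths; the caption does not say which software rooted the figure, and in practice a library routine such as Biopython's `Tree.root_at_midpoint()` does this on a parsed Newick tree. The longest path is found with two farthest-node searches, which is valid for trees with nonnegative branch lengths.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, Dict[str, float]]  # node -> {neighbor: branch length}


def build(edges: List[Tuple[str, str, float]]) -> Tree:
    """Build a symmetric adjacency map from (node, node, branch length) triples."""
    tree: Tree = {}
    for a, b, length in edges:
        tree.setdefault(a, {})[b] = length
        tree.setdefault(b, {})[a] = length
    return tree


def farthest(tree: Tree, start: str) -> Tuple[str, float, List[str]]:
    """Return the node farthest from `start`, its path length, and the path to it."""
    best = (start, 0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[1]:
            best = (node, dist, path)
        for nbr, length in tree[node].items():
            if nbr != parent:
                stack.append((nbr, node, dist + length, path + [nbr]))
    return best


def midpoint(tree: Tree) -> Tuple[str, str, float]:
    """Return (node_a, node_b, offset): the root lies on branch a-b, `offset` from node_a."""
    any_leaf = next(n for n, nbrs in tree.items() if len(nbrs) == 1)
    u, _, _ = farthest(tree, any_leaf)      # one end of the longest path
    _, diameter, path = farthest(tree, u)   # the other end, plus the path itself
    half = diameter / 2
    travelled = 0.0
    for a, b in zip(path, path[1:]):
        length = tree[a][b]
        if travelled + length >= half:
            return a, b, half - travelled
        travelled += length
    raise ValueError("tree has no branches")


# Toy unrooted tree with leaves A-D and internal nodes X, Y (hypothetical lengths).
tree = build([("A", "X", 0.125), ("B", "X", 0.25), ("X", "Y", 0.5),
              ("Y", "C", 1.0), ("Y", "D", 0.0625)])
print(midpoint(tree))
```

Here the longest path runs from C to B (length 1.75), so the root falls on branch C-Y, 0.875 from C; both C and B are then 0.875 from the root.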