Javier F Tabima, Sydney E Everhart, Meredith M Larsen, Alexandra J Weisberg, Zhian N Kamvar, Matthew A Tancos, Christine D Smart, Jeff H Chang, Niklaus J Grünwald.
Abstract
Development of tools to identify species, genotypes, or novel strains of invasive organisms is critical for monitoring emergence and implementing rapid response measures. Molecular markers, although critical to identifying species or genotypes, require bioinformatic tools for analysis. However, user-friendly analytical tools for fast identification are not readily available. To address this need, we created a web-based set of applications called Microbe-ID that allows users to customize a toolbox for rapid species identification and strain genotyping using any genetic markers of choice. Two components of Microbe-ID, named Sequence-ID and Genotype-ID, implement species and genotype identification, respectively. Sequence-ID allows identification of species by using BLAST to query sequences for any locus of interest against a custom reference sequence database. Genotype-ID allows placement of an unknown multilocus marker in either a minimum spanning network or a dendrogram with bootstrap support from a user-created reference database. Microbe-ID can be used for identification of any organism based on nucleotide sequences or any molecular marker type, and several examples are provided. We created a public website for demonstration purposes called Microbe-ID (microbe-id.org) and provided a working implementation for the genus Phytophthora (phytophthora-id.org). In Phytophthora-ID, the Sequence-ID application allows identification based on ITS or cox spacer sequences. Genotype-ID groups individuals into clonal lineages based on simple sequence repeat (SSR) markers for the two invasive plant pathogen species P. infestans and P. ramorum. All code is open source and available on GitHub and CRAN. Instructions for installation and use are provided at https://github.com/grunwaldlab/Microbe-ID.
Keywords: Genotyping; Identification; Molecular diagnostics; Pathogen; Phytophthora; Taxonomy
Year: 2016 PMID: 27602267 PMCID: PMC4994078 DOI: 10.7717/peerj.2279
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Open source computational tools required to install and deploy Microbe-ID on a server.
| Tool | Description | Source | References |
|---|---|---|---|
| ape | R package for phylogenetic and evolutionary analysis | | |
| BLAST | Basic Local Alignment Search Tool, implemented as an algorithm for comparing DNA, RNA or protein query sequences against a reference database | | |
| adegenet | R package for multivariate analysis of genetic data | | |
| Bootstrap | A framework for developing responsive, mobile-first projects on the web | | – |
| Microbe-ID | Set of web-apps for identification of species, genotypes, and strains of any organism | | This paper. |
| MAFFT | Multiple sequence alignment algorithm to find homology between sequences using Fourier algorithms | | |
| | R package for analysis of population genetic data | | |
| poppr | R package for genetic analysis of populations with mixed reproduction | | |
| shiny | Interactive web application framework for R | | – |

Notes. –: reference not available; see website provided for information.
Figure 1. Diagram representing the implementation of Genotype-ID, which comprises a user interface file (index.html) and a server file (server.R).
Each file communicates with the R framework (via shiny) and the user (via HTML5). On the user side (left), the user provides input by pasting a query and selecting or specifying the desired application modifiers (seed number, genetic distance calculation). The server file receives and processes this information, prompting the application to run in R. On the server side (right), a database file (Marker DB), R packages, and functions are retrieved and executed. When the run is complete, the server file returns the output to the user interface file, which displays it in the app output.
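The request flow in Figure 1 can be sketched in a few lines. The sketch below is in Python rather than the R/shiny code the application actually uses, and every name in it (`parse_query`, `handle_request`, `mismatch_distance`, the `DISTANCES` registry, the input layout) is a hypothetical stand-in chosen for illustration, not part of Microbe-ID.

```python
# Illustrative sketch only: Microbe-ID itself is written in R with shiny.
# All names and the input layout below are assumptions, not the real API.

def mismatch_distance(a, b):
    """Fraction of allele positions that differ (a toy stand-in distance)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Hypothetical registry mapping a user-selected distance name to a function.
DISTANCES = {"mismatch": mismatch_distance}

def parse_query(pasted):
    """Parse pasted text: one sample per line, a name then integer allele sizes."""
    samples = {}
    for line in pasted.strip().splitlines():
        fields = line.split()
        if fields:
            samples[fields[0]] = [int(v) for v in fields[1:]]
    return samples

def handle_request(pasted, distance_name, seed, reference_db):
    """Server side: validate the modifiers, run the analysis, return the output."""
    if distance_name not in DISTANCES:
        raise ValueError(f"unknown distance: {distance_name}")
    dist = DISTANCES[distance_name]
    result = {"seed": seed, "distance": distance_name, "matches": {}}
    for name, alleles in parse_query(pasted).items():
        # Report the reference sample closest to each query.
        result["matches"][name] = min(
            reference_db, key=lambda ref: dist(alleles, reference_db[ref])
        )
    return result
```

With a made-up reference database such as `{"ref_A": [100, 102, 200], "ref_B": [104, 106, 210]}`, calling `handle_request("query1\t100\t102\t210", "mismatch", 42, db)` matches `query1` to `ref_A`. The seed is only carried through here; in a fuller version it would presumably initialise the random number generator used for bootstrap resampling.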
Genetic distances implemented in the Genotype-ID module of Microbe-ID.
Each of the distances included in Microbe-ID is specific to a given molecular marker type used in the web application.
| Distance model | Module | R package | References |
|---|---|---|---|
| Felsenstein 81 (F81) | MLST-ID | ape | |
| Felsenstein 84 (F84) | MLST-ID | ape | |
| Indel | MLST-ID | ape | |
| Jukes-Cantor (JC69) | MLST-ID | ape | |
| Kimura 80 (K80) | MLST-ID | ape | |
| Kimura 81 (K81) | MLST-ID | ape | |
| Raw | MLST-ID | ape | |
| Tamura and Nei 93 (TN93) | MLST-ID | ape | |
| Transitions (TS) | MLST-ID | ape | |
| Transversions (TV) | MLST-ID | ape | |
| Bruvo | SSR-ID | poppr | |
| Edwards | Binary-ID | poppr/adegenet | |
| Nei | Binary-ID | poppr/adegenet | |
| Prevosti | Binary-ID | poppr/adegenet | |
| Reynolds | Binary-ID | poppr/adegenet | |
| Rogers | Binary-ID | poppr/adegenet |
Notes. MLST-ID: multilocus sequence typing. SSR-ID: SSR/microsatellite loci. Binary-ID: AFLP/SNP loci.
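To make the nucleotide models in the table concrete, here is a minimal Python sketch of three of them (raw, JC69, and K80) using their standard textbook formulas. It is not the ape implementation that Microbe-ID calls. The function names are hypothetical, and skipping sites with gaps or ambiguous bases is a simplifying assumption.

```python
import math

# Purine-purine (A/G) and pyrimidine-pyrimidine (C/T) changes are transitions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def site_proportions(seq1, seq2):
    """Return (P, Q): proportions of transition and transversion differences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: sites with gaps or ambiguity codes are skipped.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    ts = sum(1 for a, b in pairs if a != b and frozenset((a, b)) in TRANSITIONS)
    tv = sum(1 for a, b in pairs if a != b and frozenset((a, b)) not in TRANSITIONS)
    return ts / len(pairs), tv / len(pairs)

def raw_distance(seq1, seq2):
    """Raw (p) distance: proportion of sites that differ."""
    p_ts, q_tv = site_proportions(seq1, seq2)
    return p_ts + q_tv

def jc69_distance(seq1, seq2):
    """Jukes-Cantor: d = -(3/4) ln(1 - 4p/3)."""
    p = raw_distance(seq1, seq2)
    return -0.75 * math.log(1 - 4 * p / 3)

def k80_distance(seq1, seq2):
    """Kimura 2-parameter: d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q))."""
    p_ts, q_tv = site_proportions(seq1, seq2)
    return -0.5 * math.log((1 - 2 * p_ts - q_tv) * math.sqrt(1 - 2 * q_tv))
```

Both corrected distances exceed the raw distance for diverged sequences because they account for multiple substitutions at the same site. They are undefined (the logarithm's argument becomes non-positive) when sequences are too divergent.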
Figure 2. Results of SSR-ID for NA1 and NA2 queries of P. ramorum provided in the example data file.
Each color represents a clonal lineage pre-assigned to each reference sample (NA1, NA2, EU1, EU2) with queries colored in red. (A) UPGMA tree with 1,000 bootstrap replicates and support values above branches. Queries are represented in red and all are correctly placed with reference samples of the presumptive clonal lineage while also representing the relationship between clonal lineages in the reference dataset. (B) Minimum spanning network reconstruction. Edge shade and width are inversely proportional to Bruvo’s distance as shown in the horizontal scale bar. Queries are represented in red and placed in nodes with the most similar reference sample in the dataset, indicating the NA1 query is most similar to the PR-12-044 reference sample and the NA2 query is more closely related to the PR-05-156 and PR-12-103 samples, which also belong to the NA2 clonal lineage.
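How a query ends up next to its most similar reference can be illustrated with a small Python sketch of Bruvo's distance for complete genotypes of equal ploidy. It assumes alleles are already expressed as integer repeat counts and does not handle missing alleles, which the poppr implementation used by SSR-ID does handle. The function names and the toy genotypes in the usage note are invented for illustration.

```python
from itertools import permutations

def bruvo_locus(g1, g2):
    """Bruvo distance at one locus for two genotypes of equal ploidy.

    Alleles are integer repeat counts. Each allele pair differing by x
    repeats contributes 1 - 2**(-|x|); the pairing of alleles that
    minimises the total is used, and the total is divided by the ploidy.
    """
    if len(g1) != len(g2):
        raise ValueError("genotypes must have the same ploidy")
    best = min(
        sum(1 - 2 ** (-abs(a - b)) for a, b in zip(g1, perm))
        for perm in permutations(g2)
    )
    return best / len(g1)

def bruvo_distance(sample1, sample2):
    """Mean per-locus distance; each sample is a list of per-locus allele tuples."""
    per_locus = [bruvo_locus(a, b) for a, b in zip(sample1, sample2)]
    return sum(per_locus) / len(per_locus)

def nearest_reference(query, references):
    """Name of the reference sample with the smallest distance to the query."""
    return min(references, key=lambda name: bruvo_distance(query, references[name]))
```

For example, with made-up references `{"ref_NA1": [(10, 12), (5, 5)], "ref_EU1": [(14, 16), (8, 9)]}`, the query `[(10, 13), (5, 5)]` is closest to `ref_NA1`, at a distance of 0.125.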
Figure 3. Results of SSR-ID queries for strains placed into the US8 and US23 clonal lineages of the potato late blight pathogen, P. infestans.
Colors correspond to clonal lineages assigned to each reference sample (B, C, EU-13, EU-14, etc.) except for the queries, which are colored in red. (A) UPGMA tree with 1,000 bootstrap replicates with support values above branches. Queries are represented in red and all are correctly placed with samples of the presumptive clonal lineage while also representing relationships between clonal lineages in the reference dataset. (B) Minimum spanning network reconstruction. Edge shade and width are proportional to Bruvo’s distance shown in the horizontal scale bar. Queries are represented in red nodes and appear in the legend as ‘???’. Queries are placed in nodes with the most similar reference sample, indicating that the US8 query is most similar to the PI-12-016 reference sample (US-8 clonal lineage) and the US23 query is most closely related to the PI-12-023 sample, part of the US-23 lineage.
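The network panels connect samples through their smallest pairwise distances. As a rough illustration, the Python sketch below builds a minimum spanning tree from a distance matrix with Prim's algorithm. This is a simplification: a minimum spanning network may also keep edges that tie for the minimum, and this is not the routine Microbe-ID uses. The function name and the sample labels in the usage note are hypothetical.

```python
def minimum_spanning_tree(names, dist):
    """Prim's algorithm on a symmetric distance matrix.

    names: list of sample labels; dist[i][j]: distance between samples i and j.
    Returns a list of (name_a, name_b, distance) edges in the order added.
    """
    n = len(names)
    in_tree = {0}  # start from the first sample
    edges = []
    while len(in_tree) < n:
        # Cheapest edge joining a sample in the tree to one outside it.
        d, i, j = min(
            (dist[i][j], i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        in_tree.add(j)
        edges.append((names[i], names[j], d))
    return edges
```

Given labels `["query", "ref_1", "ref_2", "ref_3"]` and a matrix in which the query is at distance 0.1 from `ref_1` and at least 0.6 from the others, the first edge returned links `query` to `ref_1`. That mirrors how a query node attaches to its most similar reference sample in the figure.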