Lee S Katz, Chris R Bolen, Brian H Harcourt, Susanna Schmink, Xin Wang, Andrey Kislyuk, Robert T Taylor, Leonard W Mayer, I King Jordan.
Abstract
The Meningococcus Genome Informatics Platform (MGIP) is a suite of computational tools for the analysis of multilocus sequence typing (MLST) data, at http://mgip.biology.gatech.edu. MLST is used to generate allelic profiles to characterize strains of Neisseria meningitidis, a major cause of bacterial meningitis worldwide. Neisseria meningitidis strains are characterized with MLST as specific sequence types (ST) and clonal complexes (CC) based on the DNA sequences at defined loci. These data are vital to molecular epidemiology studies of N. meningitidis, including outbreak investigations and population biology. MGIP analyzes DNA sequence trace files, returns individual allele calls and characterizes the STs and CCs. MGIP represents a substantial advance over existing software in several respects: (i) ease of use--MGIP is user friendly, intuitive and thoroughly documented; (ii) flexibility--because MGIP is a website, it is compatible with any computer with an internet connection, can be used from any geographic location, and there is no installation; (iii) speed--MGIP takes just over one minute to process a set of 96 trace files; and (iv) expandability--MGIP has the potential to expand to more loci than those used in MLST and even to other bacterial species.
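As a rough illustration of how an allelic profile maps to a sequence type and clonal complex, the sketch below looks up a profile in a small table. The seven locus names are the standard N. meningitidis MLST loci; the profile table, the ST/CC labels and the function name `lookup_st` are hypothetical placeholders and do not come from MGIP or from any real allele database.

```python
from typing import Dict, Optional, Tuple

# The seven housekeeping loci of the N. meningitidis MLST scheme.
LOCI = ("abcZ", "adk", "aroE", "fumC", "gdh", "pdhC", "pgm")

# Hypothetical profile table: allele numbers (in LOCI order) -> (ST, CC).
# These are placeholder values for illustration only, not real typing data.
PROFILES: Dict[Tuple[int, ...], Tuple[str, str]] = {
    (1, 1, 1, 1, 1, 1, 1): ("ST-A", "CC-A"),
    (1, 1, 2, 1, 1, 1, 1): ("ST-B", "CC-A"),
}


def lookup_st(alleles: Dict[str, int]) -> Optional[Tuple[str, str]]:
    """Return (ST, CC) for a complete allelic profile, or None if unknown."""
    if set(alleles) != set(LOCI):
        raise ValueError("profile must contain exactly the loci in LOCI")
    key = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(key)
```

A profile that is absent from the table returns `None`, which corresponds to a combination of alleles that has not yet been assigned an ST.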
Year: 2009 PMID: 19468047 PMCID: PMC2703879 DOI: 10.1093/nar/gkp288
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. MLST+ workflow on MGIP. Users upload trace files to the MGIP server for analysis. First, Phred makes base calls on each trace to produce a sequence FASTA file and a quality file. Next, Phrap aligns the reads and produces a consensus sequence FASTA file and other associated files. BLAST is then used to match the consensus sequence against a database of known MLST+ alleles. Allelic FASTA files are extracted from the database using fastacmd and individually aligned to the consensus sequences using pairwise BLAST to determine coordinates, mismatches and indels. Alignments between the consensus sequences, the called allelic sequences and the underlying trace files are displayed using the trace file viewer. The trace file viewer can be used to manually edit consensus sequences based on the aligned trace files (see Figure 3).
Figure 2. Viewing MGIP+ results. (A) The user can select a set of results to view. For each set, all strain/loci and their allele calls are shown. (B) Options are shown for each strain/locus that allow the user to view more details. (C) When an allele call is not a perfect match, a flag appears. (D) On mouseover (when the mouse pointer hovers over the flag), a message giving information as to why it is not a perfect match appears.
Figure 3. The trace viewer and editor applet. The consensus sequence acts as a backbone when aligning the allelic sequence and the traces. The applet tools allow users to (1) alter the amplitude of the traces, (2) edit the consensus sequence, (3) insert/delete consensus sequence nucleotides, (4) undo/redo any action and (5) save a modified consensus sequence. Sequences of interest are embedded below the applet so that they can be copied and pasted.
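The figure captions describe aligning each consensus sequence to its called allele to find mismatches and indels, and flagging calls that are not perfect matches. A minimal sketch of that idea follows, assuming the two sequences are already available as a gapped pairwise alignment of equal length with `-` as the gap character. The function names and the message wording are hypothetical, and MGIP itself derives this information from pairwise BLAST output rather than from code like this.

```python
from typing import Tuple


def compare_alignment(consensus: str, allele: str) -> Tuple[int, int]:
    """Count mismatch columns and indel columns in a gapped pairwise alignment."""
    if len(consensus) != len(allele):
        raise ValueError("aligned sequences must have equal length")
    mismatches = 0
    indels = 0
    for c, a in zip(consensus.upper(), allele.upper()):
        if c == "-" or a == "-":
            indels += 1      # a gap in either sequence marks an indel column
        elif c != a:
            mismatches += 1  # two different bases in the same column
    return mismatches, indels


def flag_message(consensus: str, allele: str) -> str:
    """Return an empty string for a perfect match, else a short explanation."""
    mismatches, indels = compare_alignment(consensus, allele)
    if mismatches == 0 and indels == 0:
        return ""
    return (f"not a perfect match: {mismatches} mismatch(es), "
            f"{indels} indel column(s)")
```

An empty return value corresponds to an unflagged allele call; any other value is the kind of text a mouseover message could carry.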
MGIP is more sensitive and faster than other commonly used methods
| Method | TP | FN | Sn (%) | Speed (s)ᵃ |
|---|---|---|---|---|
| MGIP versus STARSᵇ | | | | |
| MGIP | 660 | 18 | 97.4 | 63 ± 0.58 |
| STARS | 653 | 35 | 94.9 | 323 ± 30 |
| MGIP versus SeqMan methodᶜ | | | | |
| MGIP | 323 | 6 | 98.2 | 75 ± 2 |
| SeqMan | 319 | 8 | 97.6 | 1520 ± 173 |
TP: true positives; FN: false negatives; Sn: sensitivity.
ᵃ Speed is shown as an average per trace file set (84 traces in the STARS comparison, 96 traces in the SeqMan comparison), plus or minus standard deviation. The speed tests were performed over a 1 Gigabyte per second network connection, so the upload time was negligible. Over a slower connection, the upload will add to the time needed to process a set of trace files. Approximate times for uploading a set of traces are given in Supplementary Table 2.
ᵇ For the MGIP versus STARS comparison, 17 sets of MLST data were tested, comprising trace files for 691 strain/loci. The speed test was performed on three randomly selected sets, comprising 126 strain/loci.
ᶜ For the MGIP versus SeqMan method comparison, 10 sets of fetA data were tested, totaling 331 loci. The speed test was performed on three randomly selected sets, comprising 103 strain/loci.
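The sensitivity and speed figures in the table can be recomputed from the reported counts. The sketch below assumes the standard definition Sn = TP / (TP + FN); the table does not state its formula, so this is an assumption, and the recomputed values should agree with the Sn column only to within about 0.1 percentage point. The dictionary layout and function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity in percent, assuming Sn = TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("TP + FN must be positive")
    return 100.0 * tp / (tp + fn)


def speedup(other_seconds: float, mgip_seconds: float) -> float:
    """How many times faster MGIP is, from mean seconds per trace file set."""
    return other_seconds / mgip_seconds


# (TP, FN, mean seconds per set), copied from the table above.
ROWS = {
    "MGIP (vs STARS)": (660, 18, 63.0),
    "STARS": (653, 35, 323.0),
    "MGIP (vs SeqMan)": (323, 6, 75.0),
    "SeqMan": (319, 8, 1520.0),
}

if __name__ == "__main__":
    for name, (tp, fn, secs) in ROWS.items():
        print(f"{name}: Sn = {sensitivity(tp, fn):.1f}%, {secs:.0f} s per set")
    print(f"MGIP vs STARS speedup:  {speedup(323.0, 63.0):.1f}x")
    print(f"MGIP vs SeqMan speedup: {speedup(1520.0, 75.0):.1f}x")
```

Because the two comparisons used different trace file sets (84 and 96 traces per set), the two speedup ratios are not directly comparable with each other.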