Madison I Dunitz¹, Jenna M Lang¹, Guillaume Jospin¹, Aaron E Darling², Jonathan A Eisen¹, David A Coil¹
Abstract
The sequencing, assembly, and basic analysis of microbial genomes, once a painstaking and expensive undertaking, has become much easier for research labs with access to standard molecular biology and computational tools. However, there is a confusing variety of options available for DNA library preparation and sequencing, and inexperience with bioinformatics can pose a significant barrier to entry for many who may be interested in microbial genomics. The objective of the present study was to design, test, troubleshoot, and publish a simple, comprehensive workflow from the collection of an environmental sample (a swab) to a published microbial genome, empowering even a lab or classroom with limited resources and bioinformatics experience to perform it.
Keywords: Bioinformatics; Genome assembly; Genome sequencing; Microbial genomics; Workflow
Year: 2015 PMID: 26020012 PMCID: PMC4435499 DOI: 10.7717/peerj.960
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Overview of the workflow.
All the steps required to go from a swab to a genome.
Figure 2. A model phylogenetic tree.
A phylogenetic tree is often helpful in assigning taxonomy to an unknown sequence.
Figure 3. SeqTrace options.
This screenshot shows an example of manually entered primer information in SeqTrace.
Figure 4. SeqTrace trimming setting.
An example of reducing the minimum confidence score in SeqTrace.
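SeqTrace trims the low-quality ends of a Sanger read using per-base confidence scores, so lowering the minimum confidence score keeps more of the read. The Python sketch below illustrates that trade-off with a simplified rule: trim from each end until a window of consecutive bases all meet the threshold. This is an assumed stand-in, not SeqTrace's actual trimming algorithm; `trim_ends`, the window size, and the example read are hypothetical.

```python
def trim_ends(seq: str, quals: list[int], min_conf: int, window: int = 3) -> str:
    """Trim both ends of a read until `window` consecutive bases all have
    confidence >= min_conf (simplified, hypothetical rule)."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    good = [q >= min_conf for q in quals]

    # First position that starts a run of `window` acceptable bases.
    start = next((i for i in range(len(seq) - window + 1)
                  if all(good[i:i + window])), None)
    if start is None:
        return ""  # no acceptable window anywhere: discard the read

    # Last position that ends a run of `window` acceptable bases.
    end = next(j for j in range(len(seq), window - 1, -1)
               if all(good[j - window:j]))
    return seq[start:end]


# Example read with noisy ends (made-up scores for illustration).
READ = "NNACGTACGTNN"
QUALS = [5, 8, 30, 32, 35, 40, 38, 36, 33, 31, 6, 4]
```

With `min_conf=20` the example read trims to `ACGTACGT`; dropping the threshold to 5 retains almost the whole read, including the uncertain `N` calls, which is why lowering the setting should be done with care.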
Figure 5. Sanger chromatogram.
This screenshot from SeqTrace shows both the chromatogram (trace) and the consensus sequence.
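The consensus sequence combines the forward and reverse Sanger reads of the same amplicon. A minimal Python sketch of the idea, under simplifying assumptions: the reverse read is reverse-complemented, the two reads are assumed to be already aligned base-for-base with no gaps, and at each disagreement the base with the higher confidence score is kept. Real tools, SeqTrace included, align the reads first; `reverse_complement`, `consensus`, and the example reads are hypothetical.

```python
# Uppercase DNA plus N; lowercase input is not handled in this sketch.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def consensus(fwd: str, fwd_q: list[int], rev: str, rev_q: list[int]) -> str:
    rc = reverse_complement(rev)
    rc_q = rev_q[::-1]  # scores must be reversed along with the bases
    if len(fwd) != len(rc):
        raise ValueError("reads must already be aligned to equal length")
    out = []
    for a, qa, b, qb in zip(fwd, fwd_q, rc, rc_q):
        # Agreeing calls pass through; otherwise keep the more confident call.
        out.append(a if a == b or qa >= qb else b)
    return "".join(out)


# Forward read miscalls position 3 (T, low confidence); reverse read is clean.
FWD, FWD_Q = "ACGTTA", [40, 40, 40, 10, 40, 40]
REV, REV_Q = "TATCGT", [30] * 6
```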
Figure 6. BLAST options.
The recommended settings for using BLAST in this workflow.
Figure 7. GOLD search.
Sample “Quick Search” page on GOLD.
Figure 8. GOLD results.
Sample results for Brachybacterium on GOLD.
Figure 9. Sample “Align Sequences Nucleotide BLAST” results.
In this example, our sequence of interest is 98% identical to the target sequence.
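BLAST reports percent identity as the number of identical positions divided by the length of the alignment. The Python sketch below performs that calculation on two sequences that are assumed to be already aligned (gaps written as `-`, so gap columns count against identity); `percent_identity` is a hypothetical helper, not part of BLAST.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical columns divided by alignment length, as a percentage."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        raise ValueError("empty alignment")
    matches = sum(1 for a, b in zip(aln_a, aln_b) if a == b and a != "-")
    return 100.0 * matches / len(aln_a)
```

For example, a 50-column alignment with a single mismatch gives 49/50, i.e., 98% identity.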
Figure 10. RDP options.
Here we show our recommended options for RDP.
Figure 11. Dendroscope options.
The circle shows the options for expanding/shrinking the tree, while the arrow points to the “phylogram” option.
Figure 12. An informative phylogenetic tree.
This phylogenetic tree shows our sequence of interest to be in a clade where everything has the same name.
Figure 13. An uninformative phylogenetic tree.
In this phylogenetic tree, our sequence of interest falls in a clade containing several species, some of which also appear in other clades.
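The distinction between Figures 12 and 13 can be expressed as a simple check: find the smallest clade containing the query plus at least one other tip, then ask whether all of its neighbours share one name and whether that name also turns up elsewhere in the tree. The Python sketch below is a heuristic under stated assumptions: trees are nested tuples (a tip is a string) rather than parsed Newick, tip labels are assumed to read "Genus species strain", and every function name and example label is a hypothetical placeholder, not real data.

```python
def leaves(tree):
    """All tip labels under a node (a tip is a str, an internal node a tuple)."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def smallest_clade_with(tree, query):
    """Tips of the smallest clade holding `query` plus at least one other tip."""
    if isinstance(tree, str):
        return None
    for child in tree:
        found = smallest_clade_with(child, query)
        if found is not None:
            return found
    names = leaves(tree)
    return names if query in names and len(names) > 1 else None


def species_name(label):
    # Assumed label format "Genus species strain": keep the first two words.
    return " ".join(label.split()[:2])


def clade_is_informative(tree, query):
    clade = smallest_clade_with(tree, query)
    if clade is None:
        raise ValueError(f"{query!r} not found in tree")
    names = {species_name(x) for x in clade if x != query}
    if len(names) != 1:
        return False  # several different names share the clade
    (name,) = names
    # The same name appearing outside the clade also makes it ambiguous.
    elsewhere = [x for x in leaves(tree)
                 if x not in clade and species_name(x) == name]
    return not elsewhere


# Toy trees with placeholder names.
INFORMATIVE = (("query", ("Genusa alpha 1", "Genusa alpha 2")),
               ("Genusb beta 1", "Genusc gamma 1"))
MIXED_CLADE = (("query", ("Genusa alpha 1", "Genusa delta 1")),
               ("Genusa alpha 2", "Genusa epsilon 1"))
NAME_ELSEWHERE = (("query", "Genusa alpha 1"),
                  ("Genusa alpha 2", "Genusb beta 1"))
```

A check like this only flags ambiguity; it does not replace inspecting the tree itself.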
Estimated materials costs of bacterial genome sequencing.
This table shows the estimated materials (i.e., without labor) cost of performing a genome sequencing project with this workflow in 2014. The “Best Case” shows the marginal cost of sequencing one genome in a case where you are multiplexing 48 samples, and have the appropriate kits and reagents on hand. The “Worst Case” shows the cost of doing a single genome, with no multiplexing, in a lab where every reagent needed to be purchased new and was not used for anything else.
| Item | Projected cost, best case (per sample) | Projected cost, worst case (per sample) |
|---|---|---|
| DNA extraction | $1.66 | $166.00 |
| PCR | $0.60 | $150.00 |
| PCR cleanup | $2.00 | $100.00 |
| Sanger | $14.00 | $14.00 |
| Library prep | $58.33 | $2,800.00 |
| Illumina sequencing | $35.42 | $1,700.00 |
| Total | $112.01 | $4,930.00 |
Notes.
Specific assumptions are as follows:
DNA extraction: assumes purchase of a standard DNA extraction kit good for 100 samples.
PCR: assumes purchase of a standard 200 U PCR reagent kit.
PCR cleanup: can be performed in several ways (e.g., gel extraction, beads, or columns); here we assume purchase of a standard column-based kit.
Sanger: the cost is the price per reaction ($7 at our sequencing facility) multiplied by two, for the forward and reverse reactions.
Library prep: assumes purchase of a 48-sample Nextera or TruSeq kit from Illumina; however, kits from other manufacturers can be cheaper.
Illumina sequencing: assumes purchase of an Illumina MiSeq run from a sequencing facility.
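The two cost columns follow from one rule: in the best case each kit's price is spread over every sample it can serve, while in the worst case a single sample pays for the whole kit. The Python sketch below reproduces both totals under stated assumptions. Kit prices are taken from the worst-case column; the $166 extraction-kit price is what the $4,930 worst-case total implies (100 samples at $1.66 each); and the PCR (250 reactions) and cleanup (50 preps) capacities are back-calculated from the table rather than stated in the source. The names are hypothetical.

```python
# (item, kit price in USD, samples one kit or run covers); assumed inputs.
KITS = [
    ("DNA extraction", 166.00, 100),
    ("PCR", 150.00, 250),                 # capacity back-calculated (assumption)
    ("PCR cleanup", 100.00, 50),          # capacity back-calculated (assumption)
    ("Library prep", 2800.00, 48),        # 48-sample library prep kit
    ("Illumina sequencing", 1700.00, 48), # one MiSeq run shared by 48 samples
]
SANGER_PRICE_PER_REACTION = 7.00
SANGER_REACTIONS = 2  # forward + reverse


def per_sample_costs(best_case: bool) -> dict[str, float]:
    costs = {}
    for item, price, capacity in KITS:
        # Best case: marginal cost when the kit is fully used across samples.
        # Worst case: one sample bears the full purchase price.
        costs[item] = round(price / capacity, 2) if best_case else price
    # Sanger is billed per reaction, so it is the same in both scenarios.
    costs["Sanger"] = SANGER_PRICE_PER_REACTION * SANGER_REACTIONS
    return costs


def total(costs: dict[str, float]) -> float:
    return round(sum(costs.values()), 2)
```

Changing a capacity (e.g., multiplexing fewer than 48 samples on a run) shows how quickly the per-sample cost climbs toward the worst case.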