Po-E Li¹, Chien-Chi Lo¹, Joseph J Anderson²,³, Karen W Davenport¹, Kimberly A Bishop-Lilly³,⁴, Yan Xu¹, Sanaa Ahmed¹, Shihai Feng¹, Vishwesh P Mokashi³, Patrick S G Chain⁵.
Abstract
Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the ease of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. This bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research.
Year: 2016 PMID: 27899609 PMCID: PMC5224473 DOI: 10.1093/nar/gkw1027
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. An overview of the EDGE Bioinformatics Environment. The only inputs required from the user are raw sequencing data and a project name. The user can create specific workflows from any combination of the modules and can modify the parameters that control how each module runs. EDGE outputs a variety of files, tables and graphics, which can be viewed on screen or downloaded. A more detailed overview is shown in Supplementary Figure S1. All modules are described in the Methods section.
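The caption describes workflows as user-chosen combinations of modules whose parameters the user can adjust. The Python sketch below illustrates that idea under stated assumptions: the `MODULES` table, the `build_workflow` function, parameter names such as `min_quality` and `assembler`, and the file names are all hypothetical and are not EDGE's actual code or configuration format. Only the module names and their numbering come from the table footnote.

```python
from typing import Any

# Module numbers and names follow the table footnote. The default parameters
# are hypothetical placeholders, not EDGE's real settings.
MODULES: dict[int, tuple[str, dict[str, Any]]] = {
    1: ("Pre-Processing", {"min_quality": 20}),
    2: ("Assembly and Annotation", {"assembler": "IDBA"}),
    3: ("Reference-Based Analysis", {}),
    4: ("Taxonomic Classification", {}),
    5: ("Phylogenetic Analysis", {}),
    6: ("PCR Primer Analysis", {}),
}


def build_workflow(project_name, reads, selected, overrides=None):
    """Return an ordered list of (module name, parameters) steps.

    Only a project name and raw reads are required; any subset of modules
    may be selected, and per-module overrides replace default parameters.
    """
    if not project_name:
        raise ValueError("a project name is required")
    if not reads:
        raise ValueError("raw sequencing data are required")
    overrides = overrides or {}
    steps = []
    # Run the selected modules in their numbered order, each at most once.
    for number in sorted(set(selected)):
        name, defaults = MODULES[number]
        # Copy the defaults, then apply the user's overrides for this module.
        params = {**defaults, **overrides.get(number, {})}
        steps.append((name, params))
    return steps


# Hypothetical project: pre-processing, assembly, and taxonomy only.
workflow = build_workflow(
    "example_project",
    ["sample_R1.fastq", "sample_R2.fastq"],
    selected=[4, 1, 2],
    overrides={1: {"min_quality": 30}},
)
for name, params in workflow:
    print(name, params)
```

Sorting the selected module numbers keeps the steps in a fixed order (pre-processing before assembly, for example) regardless of the order in which they were chosen; that ordering rule is itself an assumption of this sketch.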
Descriptions of samples and EDGE modules tested
| Sample description | Sample type (material) | No. of reads (millions) | Sequence type | No. of EDGE modules runᵃ | CPUs | Run time (hh:mm:ss) |
|---|---|---|---|---|---|---|
|  | Isolate (gDNA) | 28.6 | HiSeq 2×101 nt | 6 (all) | 8 | 04:12:03 |
|  | Isolate (gDNA) | 28.6 | HiSeq 2×101 nt | 6 (all) | 20 | 03:33:52 |
|  | Isolate (gDNA) | 15.0 | GAII 2×110 nt | 6 (all) | 8 | 03:35:39 |
| Human Microbiome Project (staggered mock community) SRR172903 | Metagenome (DNA) | 7.93 | GAII 75 nt | 3 | 8 | 00:53:59 |
| Patient plasma sample, 2014 | Metagenome (RNA) | 0.930 | HiSeq 2×100 nt | 4 | 12 | 00:38:07 |
| Patient plasma sample, 2014 | Metagenome (RNA) | 0.930 | HiSeq 2×100 nt | 4 | 12 | 00:47:24 |
| Patient fecal sample, 2011 | Metagenome (DNA) | 273 | HiSeq 2×100 nt | 3 | 8 | 34:43:30 |
| Patient nasal swab, acute respiratory illness (SRP062772ᵇ) | Metagenome (DNA) | 2.52 | MiSeq 2×300 nt | 3 | 8 | 00:20:59 |

ᵃ EDGE modules are described in Materials and Methods: 1. Pre-Processing; 2. Assembly and Annotation; 3. Reference-Based Analysis; 4. Taxonomic Classification; 5. Phylogenetic Analysis; 6. PCR Primer Analysis.
ᵇ These samples were retrieved directly from the NCBI SRA.
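The first two table rows process the same 28.6-million-read isolate dataset and differ only in CPU count, so their wall-clock times give a rough measure of how well the full six-module workflow scaled. A minimal Python calculation from the table values (the helper `to_seconds` is introduced here for illustration and is not part of EDGE):

```python
def to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' run time from the table into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Same 28.6-million-read HiSeq isolate dataset, run with 8 and with 20 CPUs.
t8 = to_seconds("04:12:03")
t20 = to_seconds("03:33:52")

speedup = t8 / t20            # observed speedup from 8 to 20 CPUs
ideal = 20 / 8                # speedup if run time shrank in proportion to CPUs
efficiency = speedup / ideal  # fraction of that ideal actually achieved
print(f"{t8} s vs {t20} s: {speedup:.2f}x speedup, {efficiency:.0%} of ideal")

# Throughput for the largest dataset: 273 million reads in 34:43:30 on 8 CPUs.
hours = to_seconds("34:43:30") / 3600
print(f"{273 / hours:.1f} million reads per hour")
```

With these values the speedup is about 1.18× against an ideal of 2.5×, roughly 47% of linear scaling, which suggests that much of the run time did not benefit from the extra CPUs; a single pair of runs cannot show which modules were responsible. The 273-million-read fecal sample works out to about 7.9 million reads per hour on 8 CPUs.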
Figure 2. Taxonomic and phylogenetic evaluation of bacterial isolates. Panels A and B show taxonomic classification of reads for (A) the Y. pestis Harbin35 sample and (B) the B. anthracis SK-102 sample. Stars indicate the dominant taxonomic call on which all tools agree, while the black arrow and bracket indicate contamination identified in the B. anthracis sample. Panels C and D show the inferred phylogenetic trees for (C) Y. pestis and (D) B. anthracis; black arrows point to the read dataset (pink) and the contigs (blue) placed in these trees.
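Stars in panels A and B mark the dominant taxon that every classifier agrees on. One simple way to express that agreement check, plus a crude flag for secondary taxa such as the contamination marked in panel B, is sketched below in Python. The tool names, read counts, the `Taxon_X`/`Taxon_Y` labels, and the 2% threshold are invented for illustration and do not reproduce EDGE's or the paper's analysis.

```python
from collections import Counter

# Invented read counts per taxon from three hypothetical classifiers.
profiles = {
    "tool_A": Counter({"Bacillus anthracis": 9500, "Taxon_X": 400}),
    "tool_B": Counter({"Bacillus anthracis": 8700, "Taxon_X": 350, "Taxon_Y": 60}),
    "tool_C": Counter({"Bacillus anthracis": 9100, "Taxon_X": 380}),
}


def dominant_calls(profiles):
    """Map each tool to the taxon with the most reads."""
    return {tool: counts.most_common(1)[0][0] for tool, counts in profiles.items()}


def consensus(profiles):
    """Return the dominant taxon if every tool agrees on it, else None."""
    calls = set(dominant_calls(profiles).values())
    return calls.pop() if len(calls) == 1 else None


def secondary_taxa(profiles, dominant, min_fraction=0.02):
    """Non-dominant taxa holding at least min_fraction of reads in any tool."""
    flagged = set()
    for counts in profiles.values():
        total = sum(counts.values())
        for taxon, n in counts.items():
            if taxon != dominant and n / total >= min_fraction:
                flagged.add(taxon)
    return sorted(flagged)


top = consensus(profiles)
print(top, secondary_taxa(profiles, top))
```

A fraction threshold alone cannot tell contamination from misclassification of close relatives; in practice that judgment also draws on the phylogenetic placement shown in panels C and D.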
Figure 3. Taxonomic classification of the HMP staggered mock sample. (A) Read-based classification using various taxonomy profiling tools; (B) contig-based classification displaying the length of all classified contigs per taxon; and (C) a scatterplot of contig GC content (%) versus contig fold coverage, colored by taxon.
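Panel C plots each contig's GC content against its fold coverage. A minimal Python sketch of both quantities follows, assuming fold coverage is approximated as total aligned read bases divided by contig length (a common simplification that ignores clipping and ambiguous bases). The contigs, taxon labels, and read lengths are toy values, not data from the HMP sample.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def fold_coverage(contig_length: int, aligned_read_lengths: list[int]) -> float:
    """Approximate mean depth: aligned read bases divided by contig length."""
    return sum(aligned_read_lengths) / contig_length


# Toy contigs: (contig id, taxon label, sequence, lengths of reads aligned to it).
contigs = [
    ("contig_1", "Taxon_A", "GCGCGCATGC", [10, 10, 10, 10, 10]),
    ("contig_2", "Taxon_B", "ATATATGCAT", [10, 10]),
    ("contig_3", "Taxon_A", "GCGCGCTAGC", [10, 10, 10, 10]),
]

# One (taxon, GC %, fold coverage) point per contig, ready to plot and color by taxon.
points = [
    (taxon, gc_percent(seq), fold_coverage(len(seq), reads))
    for _, taxon, seq, reads in contigs
]
for point in points:
    print(point)
```

Contigs from the same organism tend to share a similar GC content and coverage, which is why coloring these points by taxon can visually separate the members of a mock community.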
Figure 4. Interactive genome browsing view of a reference-based analysis in EDGE with a human clinical sample containing Ebolavirus. (A) An Ebola reference genome and its genes (green lines) are displayed together with contig-based (using IDBA) and read-based comparisons. The two IDBA contigs (blue lines) and the reads (red and blue) are shown aligned along the length of the reference. (B) A zoomed-in view of one section of the genome where SNPs were identified. The SNP and the coding difference are outlined beneath the contig alignment, and the variants are indicated beneath the read alignments.
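Panel B shows SNPs together with the coding difference they cause. The Python sketch below compares a contig to a reference and translates the affected codons, under simplifying assumptions: gap-free aligned sequences of equal length, a single forward-strand coding region starting at the hypothetical parameter `cds_start`, and the standard genetic code. It is not how EDGE calls variants; real pipelines work from read and contig alignments and handle indels, strand, and gene annotation.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_codon(codon: str) -> str:
    """Translate one DNA codon to a one-letter amino acid ('*' = stop).

    Raises ValueError for bases other than A, C, G, T.
    """
    i, j, k = (BASES.index(base) for base in codon.upper())
    return AMINO_ACIDS[16 * i + 4 * j + k]


def find_snps(reference: str, contig: str, cds_start: int = 0):
    """List mismatches between two gap-free, equal-length aligned sequences.

    Each entry is (0-based position, ref base, alt base, ref amino acid,
    alt amino acid). Amino acids are None outside the assumed coding region,
    which starts at cds_start on the forward strand.
    """
    if len(reference) != len(contig):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos, (ref_base, alt_base) in enumerate(zip(reference, contig)):
        if ref_base == alt_base:
            continue
        # First base of the codon containing this position.
        codon_start = cds_start + 3 * ((pos - cds_start) // 3)
        if pos < cds_start or codon_start + 3 > len(reference):
            ref_aa = alt_aa = None
        else:
            ref_aa = translate_codon(reference[codon_start:codon_start + 3])
            alt_aa = translate_codon(contig[codon_start:codon_start + 3])
        snps.append((pos, ref_base, alt_base, ref_aa, alt_aa))
    return snps


reference = "ATGGCTGATTAA"  # toy coding sequence: Met-Ala-Asp-stop
contig = "ATGGCAAATTAA"
for snp in find_snps(reference, contig):
    print(snp)
```

In this toy example the first mismatch is synonymous (GCT to GCA, both Ala) and the second changes Asp to Asn (GAT to AAT), the kind of coding difference panel B highlights.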
Figure 5. Phylogenetic and taxonomic analysis of human clinical samples with suspected and unknown causative agents. (A) A circular phylogenetic tree clearly places both the raw reads and the contigs obtained from a clinical fecal sample within the E. coli O104 group. (B) A comparative heatmap of taxa identified in a nasal swab sample shows the abundance of typical nasal cavity organisms. (C) The E. coli that GOTTCHA identified in the nasal swab sample (in B) is described in greater detail in the tool-specific EDGE view (red arrow), which shows the percent of hits to plasmids for each identified taxon. Below are a taxonomic dendrogram of the detected taxa, with circles representing relative abundance, and a Krona plot of the same data.
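Panel C reports, for each taxon, the percentage of hits to plasmid sequences, and the dendrogram sizes circles by relative abundance. A toy Python calculation of both figures from a list of hits follows. The hit list and the `summarize` helper are hypothetical, and relative abundance here is simply each taxon's share of classified hits, which is not necessarily how GOTTCHA estimates abundance.

```python
from collections import Counter

# Hypothetical classifier hits: (taxon, type of reference sequence matched).
hits = [
    ("Taxon_A", "chromosome"),
    ("Taxon_A", "chromosome"),
    ("Taxon_A", "plasmid"),
    ("Taxon_A", "chromosome"),
    ("Taxon_B", "plasmid"),
    ("Taxon_B", "chromosome"),
]


def summarize(hits):
    """Per taxon: share of all hits and percentage of its hits to plasmids."""
    total = len(hits)
    per_taxon = Counter(taxon for taxon, _ in hits)
    plasmid = Counter(taxon for taxon, kind in hits if kind == "plasmid")
    return {
        taxon: {
            "relative_abundance": n / total,
            "percent_plasmid_hits": 100.0 * plasmid[taxon] / n,
        }
        for taxon, n in per_taxon.items()
    }


for taxon, stats in summarize(hits).items():
    print(taxon, stats)
```

A missing key in a `Counter` returns 0, so taxa with no plasmid hits get 0% without special handling.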