Eoghan D Harrington, Manimozhiyan Arumugam, Jeroen Raes, Peer Bork, David A Relman.
Abstract
SUMMARY: Recent advances in single-cell manipulation technology, whole genome amplification and high-throughput sequencing have now made it possible to sequence the genome of an individual cell. The bioinformatic analysis of these genomes, however, is far more complicated than the analysis of those generated using traditional, culture-based methods. In order to simplify this analysis, we have developed SmashCell (Simple Metagenomics Analysis SHell for sequences from single Cells). It is designed to automate the main steps in microbial genome analysis (assembly, gene prediction, functional annotation) in a way that allows parameter and algorithm exploration at each step in the process. It also manages the data created by these analyses and provides visualization methods for rapid analysis of the results.
AVAILABILITY: The SmashCell source code and a comprehensive manual are available at http://asiago.stanford.edu/SmashCell
CONTACT: eoghanh@stanford.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Year: 2010 PMID: 20966005 PMCID: PMC2982155 DOI: 10.1093/bioinformatics/btq564
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. (A) The data model used in SmashCell is designed to reduce redundancy and facilitate the comparison of results obtained with different parameters and/or algorithms [MC: metagenome collection, MG: metagenome (equivalent to a SAG), AS: assembly, GP: gene prediction, FUNC: functional annotation]. (B) K-mer frequency statistics supplement sequence similarity information to identify potential contaminants. The panel shows a self-organizing map (SOM) trained on the tetramer frequencies of an assembly. The left panel shows a series of pie charts highlighting the taxonomic identity of the contigs assigned to each neuron (determined by best hit in GenBank; contigs with no hits are uncoloured). The right panel shows the U-matrix of the SOM. (C) The abundance of single-copy COGs can be used to assess genome completeness, the presence of contamination and the quality of the assembly. (D) SmashCell uses different graphs to aid in parameter and algorithm selection. Here the results from two different gene prediction algorithms are presented, along with GC-content, quality scores and read depth.
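The two statistics named in panels (B) and (C) can be illustrated with a minimal Python sketch. This is not SmashCell's code or API: the function names `tetramer_frequencies` and `single_copy_summary`, the marker identifiers, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration. The sketch also does not merge reverse-complement k-mers, which real analyses often do.

```python
from collections import Counter
from itertools import product


def tetramer_frequencies(sequence, k=4):
    """Return normalized k-mer frequencies for one contig (k=4 gives tetramers).

    Hypothetical helper, not part of SmashCell.
    """
    sequence = sequence.upper()
    alphabet = set("ACGT")
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # Assumption: windows with ambiguous bases (e.g. N) are skipped.
        if set(kmer) <= alphabet:
            counts[kmer] += 1
    total = sum(counts.values())
    # Emit all 4**k k-mers so every contig yields a vector of equal length,
    # which is what a SOM or any other clustering method would need as input.
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


def single_copy_summary(marker_counts, expected_markers):
    """Summarize expected single-copy markers found at least once or more than once.

    Hypothetical helper: `marker_counts` maps a marker ID to its copy number
    in the assembly; `expected_markers` lists the markers expected exactly once.
    """
    n = len(expected_markers)
    present = sum(1 for m in expected_markers if marker_counts.get(m, 0) >= 1)
    duplicated = sum(1 for m in expected_markers if marker_counts.get(m, 0) > 1)
    return {
        "completeness": present / n if n else 0.0,
        "duplicated_fraction": duplicated / n if n else 0.0,
    }


# Toy inputs; the marker IDs are placeholders, not real COG identifiers.
freqs = tetramer_frequencies("ACGTACGTAC")
summary = single_copy_summary(
    {"marker_a": 1, "marker_b": 2},
    ["marker_a", "marker_b", "marker_c", "marker_d"],
)
```

Under these assumptions, a low `completeness` suggests a partial genome, while a high `duplicated_fraction` may point to contamination or assembly problems, in line with the interpretation given for panel (C).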