Robert M Bowers, Devin F R Doud, Tanja Woyke.
Abstract
Single-cell genome sequencing of individual archaeal and bacterial cells is a vital approach to deciphering the genetic makeup of uncultured microorganisms. In this review, we describe single-cell genome analysis, focusing on the unique properties of single-cell sequence data and emphasizing quality assessment and assurance.
Keywords: assembly; genome sequence; quality control; single cell
Year: 2017 PMID: 33525806 PMCID: PMC7289031 DOI: 10.1042/ETLS20160028
Source DB: PubMed Journal: Emerg Top Life Sci ISSN: 2397-8554
Figure 1. A schematic representation of the single-cell workflow, with a focus on the analysis following sequencing.
The left panels (blue background) represent the production of single-cell genomes, while the rest of the workflow relates specifically to the analysis of single-cell sequence data, from raw reads to public database submission. The bottom row of analysis boxes shows the steps considered mandatory for any single-cell analysis pipeline, while the top row shows context-dependent steps. For example, if multiplexing was not performed, poolmate decontamination (Library Quality Control) is not necessary. However, in nearly all cases a single amplified genome (SAG) will benefit from contamination screening (Assembly Quality Control): even the cleanest SAGs may contain a few contaminating contigs, and if none are found, this step validates that the SAG is clean and nearly ready for submission to the public databases.
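The split between mandatory and context-dependent steps in Figure 1 can be sketched as a small planning function. This is an illustrative Python sketch, not the authors' software: the `SagRun` class, the `plan_analysis` function, and the step labels are hypothetical, and the step list is simplified to steps named or implied by the caption. The only rules it encodes come from the caption: poolmate decontamination applies only to multiplexed libraries, and contamination screening runs by default because nearly every SAG benefits from it.

```python
from dataclasses import dataclass


@dataclass
class SagRun:
    """Hypothetical record describing one sequenced single amplified genome (SAG)."""
    name: str
    multiplexed: bool  # pooled with other SAGs ("poolmates") in one sequencing run?
    screen_contamination: bool = True  # recommended in nearly all cases


def plan_analysis(run: SagRun) -> list[str]:
    """Return an ordered, simplified list of analysis steps for one SAG (illustrative labels)."""
    steps = []
    if run.multiplexed:
        # Library Quality Control: only needed when SAGs shared a sequencing run.
        steps.append("library QC: poolmate decontamination")
    steps.append("assembly")
    if run.screen_contamination:
        # Assembly Quality Control: removes contaminating contigs, or confirms the SAG is clean.
        steps.append("assembly QC: contamination screening")
    steps.append("public database submission")
    return steps


if __name__ == "__main__":
    for run in (SagRun("SAG_1", multiplexed=False), SagRun("SAG_2", multiplexed=True)):
        print(run.name, "->", " | ".join(plan_analysis(run)))
```

Keeping the rules as explicit flags on the run record makes it easy to see why a step was skipped for a given SAG.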
Figure 2. Tetranucleotide principal component analysis (top) and GC content analysis (bottom) of target SAGs (blue) alongside additional contaminating sequence (red) and integrated phage sequences (green).
(A) Target SAG containing contamination; (B) target SAG after the contamination was removed; (C) target SAG with an outlying rRNA gene and an integrated phage. In (C), chromosomal elements such as the highly conserved rRNA genes (blue outlying points) often have tetranucleotide frequencies that differ from the rest of the genome. Integrated phage genes can also appear as outlying points, with distinct nucleotide composition and unique taxonomy (green outlying points). Contigs were split into 5000 bp fragments, and each point in the plot represents one fragment. Colored points (top panel) and bars (bottom panel) represent contigs that could be taxonomically classified, whereas white points or bars represent contigs with no taxonomic assignment.
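The composition-based screening shown in Figure 2 can be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions, not the authors' pipeline: contigs are cut into non-overlapping 5000 bp fragments (trailing pieces shorter than that are discarded, which is an assumption), each fragment is described by its 256 tetranucleotide frequencies, the profiles are projected onto the first two principal components via SVD, and GC content is computed per fragment. The function names and the synthetic sequences are hypothetical.

```python
from itertools import product

import numpy as np

# All 256 tetranucleotides in a fixed order, so every fragment maps to the same feature vector.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
TETRA_INDEX = {kmer: i for i, kmer in enumerate(TETRAMERS)}


def split_contig(seq, size=5000):
    """Split a contig into consecutive, non-overlapping fragments of `size` bp.

    Assumption: a trailing piece shorter than `size` is discarded.
    """
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def tetra_freqs(seq):
    """Relative frequencies of the 256 tetranucleotides; windows containing non-ACGT bases are skipped."""
    counts = np.zeros(len(TETRAMERS))
    seq = seq.upper()
    for i in range(len(seq) - 3):
        idx = TETRA_INDEX.get(seq[i:i + 4])
        if idx is not None:
            counts[idx] += 1
    total = counts.sum()
    return counts / total if total else counts


def gc_content(seq):
    """Fraction of G+C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def tnf_pca(fragments, n_components=2):
    """Project fragment tetranucleotide profiles onto their leading principal components."""
    X = np.array([tetra_freqs(f) for f in fragments])
    Xc = X - X.mean(axis=0)  # center each tetranucleotide feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T  # shape: (n_fragments, n_components)


if __name__ == "__main__":
    import random

    random.seed(0)
    # Synthetic stand-ins: an AT-rich "target" contig and a GC-rich "contaminant" contig.
    contigs = {
        "target": "".join(random.choice("AAATTTGC") for _ in range(20000)),
        "contaminant": "".join(random.choice("GGGCCCAT") for _ in range(10000)),
    }
    labels, frags = [], []
    for name, seq in contigs.items():
        for frag in split_contig(seq):
            labels.append(name)
            frags.append(frag)
    coords = tnf_pca(frags)
    for name, frag, (pc1, pc2) in zip(labels, frags, coords):
        print(f"{name}\tGC={gc_content(frag):.2f}\tPC1={pc1:.3f}\tPC2={pc2:.3f}")
```

Fragments from a contaminant with a different composition should separate from the target along PC1 and in GC content, much as the red points do in panel (A). Some binning tools merge each tetranucleotide with its reverse complement (136 canonical features); this sketch keeps all 256 for simplicity.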