Simon M Dittami, Erwan Corre.
Abstract
Modern genome sequencing strategies are highly sensitive to contamination, making the detection of foreign DNA sequences an important part of analysis pipelines. Here we use Taxoblast, a simple pipeline with a graphical user interface, for the post-assembly detection of contaminating sequences in the published genome of the kelp Saccharina japonica. Analyses were based on multiple blastn searches with short sequence fragments. They revealed a number of probable bacterial contaminants as well as hybrid scaffolds that contain both bacterial and algal sequences. This or similar types of analysis, in combination with manual curation, may thus constitute a useful complement to standard bioinformatics analyses prior to submission of genomic data to public repositories. Our analysis pipeline is open-source and freely available at http://sdittami.altervista.org/taxoblast and via SourceForge (https://sourceforge.net/projects/taxoblast).
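The fragment-based search strategy mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the Taxoblast implementation: the function names, the 500 bp fragment length, and the FASTA header format are assumptions chosen only for illustration. The idea shown is simply that each scaffold is cut into short, consecutive pieces that can then be searched individually with blastn.

```python
def split_into_fragments(sequence, fragment_length=500):
    """Yield (start, fragment) pairs covering the sequence without overlap.

    fragment_length=500 is a hypothetical default, not a value taken
    from the Taxoblast pipeline.
    """
    if fragment_length <= 0:
        raise ValueError("fragment_length must be positive")
    for start in range(0, len(sequence), fragment_length):
        # The last fragment may be shorter than fragment_length.
        yield start, sequence[start:start + fragment_length]


def fragments_to_fasta(scaffold_id, sequence, fragment_length=500):
    """Format the fragments of one scaffold as FASTA text.

    Each header records the scaffold and the fragment start position
    (hypothetical naming scheme) so that hits can be mapped back later.
    """
    records = []
    for start, fragment in split_into_fragments(sequence, fragment_length):
        records.append(f">{scaffold_id}_{start}\n{fragment}")
    return "\n".join(records)
```

The resulting FASTA text could then be used as the query of a blastn search; keeping the start position in each header makes it possible to locate bacterial and algal regions within a hybrid scaffold.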
Keywords: Basic local alignment search tool (BLAST); Brown algae; Contaminating sequences; Genome assembly; Horizontal gene transfer
Year: 2017 PMID: 29158994 PMCID: PMC5695246 DOI: 10.7717/peerj.4073
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Overview of the Taxoblast pipeline (A), the corresponding graphical user interface (B) and the generated output (C).
Figure 2. Taxoblast analysis of the S. japonica genome.
Application of the Taxoblast pipeline to identify potential bacterial sequences in the published S. japonica genome (Ye et al., 2015). (A) shows the percentage of bacterial and eukaryotic BLAST hits for the 6,731 scaffolds >2 kbp with BLAST hits (254 scaffolds >2 kbp had no hits). Dotted lines show the 90% cutoff proposed for considering a sequence a “contaminant”. (B) and (C) illustrate the different distributions of GC content in the sequences considered bacterial and in those considered eukaryotic or unclassified.
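The 90% cutoff and the GC-content comparison described in this caption can be sketched as follows. This is a hedged illustration, not the authors' code: the label strings, the function names, the use of "greater than or equal to" at the cutoff, and the application of the same cutoff to eukaryotic hits are all assumptions.

```python
def classify_scaffold(hit_labels, cutoff=0.9):
    """Classify a scaffold from the taxonomic labels of its fragment hits.

    hit_labels holds one label per fragment with a BLAST hit, e.g.
    "bacteria" or "eukaryote" (hypothetical label names).
    """
    if not hit_labels:
        return "no hits"
    total = len(hit_labels)
    bacterial_fraction = hit_labels.count("bacteria") / total
    eukaryote_fraction = hit_labels.count("eukaryote") / total
    if bacterial_fraction >= cutoff:
        return "bacterial"
    if eukaryote_fraction >= cutoff:
        return "eukaryotic"
    # Mixed scaffolds, including possible hybrids, stay unclassified.
    return "unclassified"


def gc_content(sequence):
    """Return the G+C fraction of a sequence, ignoring non-ACGT symbols."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted
```

Scaffolds labelled "unclassified" by such a rule are the natural candidates for manual curation, since they may be hybrids containing both bacterial and algal sequence.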