Bo Segerman
Abstract
Whole genome sequencing has become a powerful tool in modern microbiology, and bacterial genomes in particular are sequenced in large numbers. Whole genome sequencing is used not only in research projects but also in surveillance projects and outbreak investigations. Many whole genome analysis workflows begin with the production of a genome assembly. To accomplish this, a number of different sequencing technologies and assembly methods are available. Here, a summary is provided of the most frequently used sequencing technologies and genome assembly approaches reported for the bacterial RefSeq genomes and for the bacterial genomes submitted as belonging to a surveillance project. The data are presented both in total and broken down per year. Information associated with over 400,000 publicly available genomes dated April 2020 or earlier was used. The information summarized includes (i) the most frequently used sequencing technologies, (ii) the most common combinations of sequencing technologies, (iii) the most commonly reported sequencing depth, and (iv) the most frequently used assembly software solutions. In all, this mini review provides an overview of the currently most common workflows for producing bacterial whole genome sequence assemblies.
Keywords: RefSeq; assembly methods; bacterial genomes; sequencing technologies; surveillance
Year: 2020 PMID: 33194784 PMCID: PMC7604302 DOI: 10.3389/fcimb.2020.527102
Source DB: PubMed Journal: Front Cell Infect Microbiol ISSN: 2235-2988 Impact factor: 5.293
Figure 1. (A) The growth of the RefSeq microbial genomic databases and the database of bacterial genomes excluded from RefSeq for the reason “derived from surveillance project.” Dotted lines represent number of “complete genomes.” The data for 2020 includes genomes submitted before April 17. (B) Relative proportions between the different assembly levels in the bacterial RefSeq genome database. (C) The most frequently used sequencing techniques in the bacterial RefSeq database. (D) Relative proportions between the different Illumina platforms in the bacterial RefSeq genome database. (E) Relative proportions between sequencing techniques used in bacterial RefSeq divided by years. (F) Frequencies of pseudogenes in bacterial RefSeq genomes reported to be produced by one technique alone. (G) Relative proportions between genomes produced by a single sequencing technique and combinations of techniques. (H) Relative proportions between the most frequently used combinations of sequencing techniques in the bacterial RefSeq genome database. (I) Histogram of the reported sequence depth (coverage) used in the bacterial RefSeq genome database and in the bacterial surveillance project genome database.
Figure 2. Heatmap of the most frequently used genome assembly software solutions.
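
The kind of per-year tally behind panels such as Figure 1 (E), (G), and (H) can be illustrated with a small sketch. This is not the author's pipeline: the records, the keyword map `LABELS`, and the helper names are all hypothetical, and real submission metadata is free text that would need far more normalization than shown here.

```python
from collections import Counter, defaultdict

# Hypothetical records: (submission year, reported sequencing technology text).
# These values are made up for illustration and are not taken from the review.
records = [
    (2018, "Illumina MiSeq"),
    (2018, "PacBio; Illumina HiSeq"),
    (2019, "Illumina NovaSeq"),
    (2019, "Oxford Nanopore MinION; Illumina MiSeq"),
    (2019, "PacBio RSII"),
]

# Illustrative keyword-to-label map (an assumption, not from the source).
LABELS = {
    "illumina": "Illumina",
    "pacbio": "PacBio",
    "nanopore": "Oxford Nanopore",
}


def technologies(text):
    """Return the set of technology labels whose keyword occurs in the text."""
    lowered = text.lower()
    return frozenset(label for key, label in LABELS.items() if key in lowered)


def tally_by_year(records):
    """Count single technologies and combinations of technologies per year."""
    counts = defaultdict(Counter)
    for year, text in records:
        combo = " + ".join(sorted(technologies(text))) or "Unknown"
        counts[year][combo] += 1
    return counts


def proportions(counter):
    """Convert raw counts into relative proportions that sum to 1."""
    total = sum(counter.values())
    return {name: n / total for name, n in counter.items()}


by_year = tally_by_year(records)
for year in sorted(by_year):
    print(year, proportions(by_year[year]))
```

Treating each combination as its own sorted, joined label keeps hybrid assemblies (for example long reads plus Illumina) distinct from single-technology genomes, which is the distinction panels (G) and (H) draw.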