Complete Genome Sequence of the Arcobacter molluscorum Type Strain LMG 25693.

William G Miller, Emma Yee, James L Bono.

Abstract

As components of freshwater and marine microflora, Arcobacter spp. are often recovered from shellfish, such as mussels, clams, and oysters. Arcobacter molluscorum was isolated from mussels from the Ebro Delta in Catalonia, Spain. This article describes the whole-genome sequence of the A. molluscorum strain LMG 25693T (= F98-3T = CECT 7696T).

Year:  2018        PMID: 30533749      PMCID: PMC6256585          DOI: 10.1128/MRA.01293-18

Source DB:  PubMed          Journal:  Microbiol Resour Announc        ISSN: 2576-098X


ANNOUNCEMENT

Members of the genus Arcobacter are often recovered from shellfish (1–7). The prevalence of Arcobacter species in environmental waters (8) suggests that contamination of shellfish by these organisms might be the result of filter feeding-associated bioaccumulation, with this contamination potentially resulting in human illness following the consumption of raw or partially cooked shellfish. Arcobacter molluscorum was isolated from farmed shellfish harvested in Catalonia, Spain (4). In this article, we report the first closed genome sequence of the A. molluscorum type strain LMG 25693T (= F98-3T = CECT 7696T), isolated in 2009 from farmed mussels from the Ebro Delta in Catalonia, Spain.

The genome of A. molluscorum strain LMG 25693T was completed using the Roche GS FLX+, Illumina HiSeq, and PacBio RS II next-generation sequencing platforms. Genomic DNA was isolated with the Wizard genomic DNA purification kit (Promega, Madison, WI) using a loop (∼5 μl) of cells taken from cultures grown (aerobic environment, 48 h, 30°C) on anaerobe basal agar (Oxoid) amended with 5% horse blood. Shotgun and paired-end Roche 454 libraries were constructed following the manufacturer’s protocols, and 454 sequencing was performed using the Titanium chemistry and standard methods. PacBio SMRTbell libraries were prepared from 10 μg of genomic DNA using the standard 20-kb PacBio protocol (9). Single-molecule real-time (SMRT) cell sequencing was performed using standard protocols, the 20-kb libraries, P6-C4 sequencing chemistry, and the 360-min data collection mode. Illumina HiSeq reads were obtained from SeqWright (Houston, TX).

Shotgun and paired-end Roche 454 reads were assembled using Newbler v. 2.6 (Roche) and default parameters into 88 total contigs; 5 low-quality contigs consisting of <100 reads were deleted. PacBio reads were assembled with RS Hierarchical Genome Assembly Process (HGAP) v. 3 (Pacific Biosciences) with default settings, which yielded a single chromosomal contig that was polished, using the RS.Resequencing.1 module (Pacific Biosciences) with default parameters, and circularized. Reads were quality controlled within the Newbler or RS HGAP assemblers; 99.8% to 99.99% of the bases in the assembled 454 and Illumina contigs had base call quality scores of ≥40 (Table 1). The custom Perl script contig_extender3 (10) was used to order and orient the 454 contigs into a single circular sequence. Verification of this 454 contig order was performed through a BLASTN analysis of these contigs using the PacBio contig as a reference. The 55 unique 454 contigs and the PacBio contig were assembled together using SeqMan Pro v. 8.0 (DNASTAR, Madison, WI), with the remaining 28 contigs that represent repeat regions added to the assembly manually at two or more locations. This assembly was confirmed using an optical restriction map (restriction enzyme XbaI; OpGen, Gaithersburg, MD).

Verification and error correction of base calls within the composite 454/PacBio assembly were performed using the HiSeq reads. These reads were assembled de novo within Newbler using the same parameters as with the 454 reads; small contigs represented by <20 reads were deleted. The remaining contigs were assembled into the SeqMan 454/PacBio assembly described above, with base calls adjusted to the Illumina consensus sequence. Single nucleotide polymorphisms within the repeat contigs and sequences between the Illumina contigs were assessed/verified by assembling the Illumina reads onto these regions within Geneious v. 8.1 (Biomatters, Auckland, NZ) and using the “find variations/SNPs” module, with a default minimum variant frequency parameter of 0.3. The final coverage across the genome was 1,089×.
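
The two Newbler metrics referred to here and in Table 1 (Q40PlusBases and N50ContigSize) can be illustrated with a short sketch. A Phred quality score Q corresponds to an estimated base call error probability of 10^(-Q/10), so Q40 means roughly one miscalled base in 10,000. N50 is the length of the contig at which the running total of contig lengths, sorted longest first, reaches half of the total assembly length. The function names and the example contig lengths below are hypothetical and chosen only for illustration; they are not taken from the Newbler output.

```python
def phred_error_probability(q: float) -> float:
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def n50(contig_lengths):
    """Length of the contig that brings the running total to >= 50% of the assembly."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Walk the contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Q40 corresponds to an error probability of about 1 in 10,000.
    print(phred_error_probability(40))

    # Hypothetical contig lengths (not from the paper): total 300, half 150,
    # running totals 100 then 180, so the N50 is 80.
    example_contigs = [100, 80, 60, 40, 20]
    print(n50(example_contigs))
```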
TABLE 1

Sequencing metrics and genomic data for A. molluscorum strain LMG 25693T

Feature                                                           Value(s)^a
Sequencing metrics
    454 (shotgun) platform
        No. of reads                                              177,873
        No. of bases                                              73,714,660
        Average length (bases)                                    414.4
        Coverage (×)                                              26.3
    454 (paired-end) platform
        No. of reads                                              150,593
        No. of bases                                              46,384,064
        Average length (bases)                                    308.0
        Coverage (×)                                              16.6
    Illumina HiSeq 2000 platform
        No. of reads                                              25,306,576
        No. of bases                                              2,530,657,600
        Average length (bases)                                    100
        Coverage (×)                                              903.6
    PacBio platform
        No. of reads                                              129,047
        No. of bases                                              399,548,656
        Average length (bases)                                    3,096.1^b
        Coverage (×)                                              142.7
    Newbler metrics^c
        N50ContigSize (454) (bases)                               90,324
        Q40PlusBases (454) (%)                                    99.84
        N50ContigSize (HiSeq pool 1) (bases)                      78,972
        Q40PlusBases (HiSeq pool 1) (%)                           99.99
        N50ContigSize (HiSeq pool 2) (bases)                      90,503
        Q40PlusBases (HiSeq pool 2) (%)                           99.96
        N50ContigSize (HiSeq pool 3) (bases)                      79,027
        Q40PlusBases (HiSeq pool 3) (%)                           99.97
Genomic data
    Chromosome
        Size (bp)                                                 2,800,582
        G+C content (%)                                           26.25
        No. of CDS^d                                              2,666
            Assigned function (% CDS)                             1,044 (39.2)
            General function annotation (% CDS)                   995 (37.3)
            Domain/family annotation only (% CDS)                 199 (7.5)
            Hypothetical (% CDS)                                  428 (16.1)
        Pseudogenes                                               31
    Genomic islands/CRISPR
        No. of genetic islands                                    3
        No. of CDS in genetic islands                             71, [1]
        CRISPR-Cas loci                                           I-B, [III-A]
    Gene content/pathways
        IS elements, mobile elements, or transposases             3 (IS1595); 1, [1] (other)
        Signal transduction
            Che proteins                                          cheABDRVW(Y)2
            No. of methyl-accepting chemotaxis proteins           26
            No. of response regulators                            57
            No. of histidine kinases                              62
            No. of response regulator/histidine kinase fusions    7
            No. of diguanylate cyclases                           17
            No. of diguanylate phosphodiesterases (HD-GYP, EAL)   4, 5
            No. of diguanylate cyclase/phosphodiesterases         8
            No. of other                                          11
        Motility
            Flagellin genes                                       fla1 to fla6
        Restriction/modification
            No. of type I systems (hsd)                           1
            No. of type II systems                                1, [1]
            No. of type III systems                               0
        Transcription/translation
            No. of transcriptional regulatory proteins            64
            Non-ECF^e σ factors                                   σ54, σ70
            No. of ECF σ factors                                  0
            No. of tRNAs                                          56
            No. of ribosomal loci^f                               3 (A), 3 (B)
        CO dehydrogenase (coxSLF)                                 Yes
        Ethanolamine utilization (eutBCH)                         Yes
        Nitrogen fixation (nif)                                   Yes
        Osmoprotection                                            BCCT3, ectABC
        Pyruvate → acetyl-CoA
            Pyruvate dehydrogenase (E1/E2/E3)                     Yes
            Pyruvate:ferredoxin oxidoreductase                    por
        Urease                                                    ureAB
        Vitamin B12 biosynthesis                                  Yes

^a Numbers in square brackets indicate pseudogenes or fragments.

^b Maximum length, 25,747 bases.

^c Features and values taken from largeContigMetrics within 454NewblerMetrics.txt for each assembly. Large contigs were defined as ≥500 bases. Due to the large number of HiSeq reads, the total reads were split into three pools and assembled independently.

^d Numbers do not include pseudogenes; CDS, coding sequences.

^e ECF, extracytoplasmic function.

^f A: 16S-tRNAIle-tRNAAla-23S-5S; B: 16S-23S-5S.
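
The coverage values in Table 1 can be cross-checked with simple arithmetic. The sketch below assumes that coverage was calculated as the total number of sequenced bases divided by the chromosome size; the source does not state the formula, so this is an assumption, and the variable and function names are illustrative only.

```python
GENOME_SIZE_BP = 2_800_582  # chromosome size from Table 1

# Total bases per platform, copied from Table 1.
BASES_PER_PLATFORM = {
    "454 (shotgun)": 73_714_660,
    "454 (paired-end)": 46_384_064,
    "Illumina HiSeq 2000": 2_530_657_600,
    "PacBio": 399_548_656,
}


def coverage(total_bases, genome_size=GENOME_SIZE_BP):
    """Fold coverage, assumed here to be total bases / genome size."""
    return total_bases / genome_size


def total_coverage(bases_per_platform, genome_size=GENOME_SIZE_BP):
    """Combined fold coverage across all platforms."""
    return sum(bases_per_platform.values()) / genome_size


if __name__ == "__main__":
    for platform, bases in BASES_PER_PLATFORM.items():
        print(f"{platform}: {coverage(bases):.1f}x")
    print(f"Total: {total_coverage(BASES_PER_PLATFORM):.0f}x")
```

Under this assumption the per-platform values round to 26.3×, 16.6×, 903.6×, and 142.7×, which agrees with Table 1, and the combined value rounds to the reported final coverage of 1,089×.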

A. molluscorum strain LMG 25693T has a circular genome of 2,800,582 bp with an average G+C content of 26.25%. Protein-, rRNA-, and tRNA-encoding genes were identified and annotated as described (11, 12). Briefly, putative coding sequences (CDSs), tRNA/transfer-messenger RNA (tmRNA) genes, and rRNA loci were identified using GeneMark, ARAGORN, and RNAmmer, respectively (13–15). The genome sequence and the CDS coordinates from GeneMark were used to create a preliminary GenBank-formatted file which was entered into Artemis v. 16 (16) to identify putative pseudogenes and genes missed in the original GeneMark analysis and to manually curate the start codon of each putative CDS. Initial annotation was accomplished by comparing the proteome of strain LMG 25693T to proteomes derived from other Arcobacter genomes (primarily A. butzleri strain RM4018 and A. nitrofigilis [GenBank accession numbers CP000361 and CP001999, respectively]) and to proteins in the NCBI nonredundant (nr) database using BLASTP. Annotation was further refined, e.g., through an analysis of Pfam motifs (17) and a BLASTP analysis that utilized a larger custom protein database that also included proteomes from all current completed Campylobacter genomes.

The LMG 25693T genome is predicted to encode 2,666 putative protein-coding genes and 31 pseudogenes. Additionally, the LMG 25693T genome contains 56 tRNA-encoding genes and 6 rRNA operons; however, 3 of these rRNA operons do not contain the isoleucyl-tRNA or alanyl-tRNA genes that are commonly found in other rRNA loci. Three genomic islands were identified in the LMG 25693T genome; one genomic island is a putative integrated plasmid containing genes for a P-type type IV conjugative transfer system, while a second 28-kb island putatively encodes a type VI secretion system. The LMG 25693T genome also contains a type I-B CRISPR-Cas system. A second CRISPR-Cas system (type III-A) was identified; however, although this locus contains the cas6, csm2, csm3, csm4, and csm5 genes, it does not contain cas1 or cas2, and the cas10 gene is presumably nonfunctional. No plasmids were identified in the strain LMG 25693T genome.
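
Two of the summary statistics reported for the genome, the G+C content and the share of CDSs in each annotation category, are simple ratios. The sketch below shows one way such values could be computed; it is not the authors' pipeline. The function names and the toy sequence are hypothetical, the choice to ignore ambiguous bases such as N is an assumption, and the CDS counts are copied from Table 1.

```python
def gc_content(sequence):
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100 * gc / (gc + at)


# CDS counts per annotation category, copied from Table 1.
CDS_BY_CATEGORY = {
    "Assigned function": 1_044,
    "General function annotation": 995,
    "Domain/family annotation only": 199,
    "Hypothetical": 428,
}


def category_percentages(counts):
    """Percentage of all CDSs falling in each category, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    # Toy sequence for illustration only (2 of 10 bases are G or C).
    print(gc_content("ATGCATTTAA"))
    print(sum(CDS_BY_CATEGORY.values()))
    print(category_percentages(CDS_BY_CATEGORY))
```

The four category counts sum to the 2,666 CDSs reported for the chromosome, and the rounded percentages reproduce the values listed in Table 1 (39.2, 37.3, 7.5, and 16.1).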

Data availability.

The complete genome sequence of A. molluscorum strain LMG 25693T has been deposited in GenBank under the accession number CP032098. HiSeq, 454, and PacBio sequencing reads have been deposited in the NCBI Sequence Read Archive (SRA; accession number SRP155187).
REFERENCES

1.  Arcobacter bivalviorum sp. nov. and Arcobacter venerupis sp. nov., new species isolated from shellfish.

Authors:  Arturo Levican; Luis Collado; Carmen Aguilar; Clara Yustes; Ana L Diéguez; Jesús L Romalde; Maria José Figueras
Journal:  Syst Appl Microbiol       Date:  2012-03-06       Impact factor: 4.022

2.  ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences.

Authors:  Dean Laslett; Bjorn Canback
Journal:  Nucleic Acids Res       Date:  2004-01-02       Impact factor: 16.971

3.  Arcobacter molluscorum sp. nov., a new species isolated from shellfish.

Authors:  Maria José Figueras; Luis Collado; Arturo Levican; Jessica Perez; Maria Josep Solsona; Clara Yustes
Journal:  Syst Appl Microbiol       Date:  2010-12-24       Impact factor: 4.022

4.  Arcobacter ellisii sp. nov., isolated from mussels.

Authors:  Maria José Figueras; Arturo Levican; Luis Collado; Maria Isabel Inza; Clara Yustes
Journal:  Syst Appl Microbiol       Date:  2011-07-01       Impact factor: 4.022

5.  Arcobacter lekithochrous sp. nov., isolated from a molluscan hatchery.

Authors:  Ana L Diéguez; Sabela Balboa; Thorolf Magnesen; Jesús L Romalde
Journal:  Int J Syst Evol Microbiol       Date:  2017-05-30       Impact factor: 2.747

6.  Prevalence of Arcobacter in meat and shellfish.

Authors:  Luis Collado; Josep Guarro; Maria José Figueras
Journal:  J Food Prot       Date:  2009-05       Impact factor: 2.077

7.  Higher water temperature and incubation under aerobic and microaerobic conditions increase the recovery and diversity of Arcobacter spp. from shellfish.

Authors:  Arturo Levican; Luis Collado; Clara Yustes; Carme Aguilar; Maria José Figueras
Journal:  Appl Environ Microbiol       Date:  2013-11-01       Impact factor: 4.792

8.  The Pfam protein families database.

Authors:  Marco Punta; Penny C Coggill; Ruth Y Eberhardt; Jaina Mistry; John Tate; Chris Boursnell; Ningze Pang; Kristoffer Forslund; Goran Ceric; Jody Clements; Andreas Heger; Liisa Holm; Erik L L Sonnhammer; Sean R Eddy; Alex Bateman; Robert D Finn
Journal:  Nucleic Acids Res       Date:  2011-11-29       Impact factor: 16.971

9.  Comparative genomics of the Campylobacter lari group.

Authors:  William G Miller; Emma Yee; Mary H Chapman; Timothy P L Smith; James L Bono; Steven Huynh; Craig T Parker; Peter Vandamme; Khai Luong; Jonas Korlach
Journal:  Genome Biol Evol       Date:  2014-11-08       Impact factor: 3.416

10.  RNAmmer: consistent and rapid annotation of ribosomal RNA genes.

Authors:  Karin Lagesen; Peter Hallin; Einar Andreas Rødland; Hans-Henrik Staerfeldt; Torbjørn Rognes; David W Ussery
Journal:  Nucleic Acids Res       Date:  2007-04-22       Impact factor: 16.971
