
Complete Genome Sequence of Cellulomonas sp. Strain Y8, a High-GC-Content Plasmid-Free Heavy Metal-Resistant Bacterium Isolated from Farmland Soil.

Jinghao Chen^1,2, Chao Xing^1,2, Xin Zheng^3, Xiaofang Li^3.

Abstract

We report the complete genome sequence of cadmium-resistant Cellulomonas sp. strain Y8, isolated from farmland soil. The 4.5-Mbp genome contains 4,074 genes, with an approximate GC content of 75%. This work might help in understanding how strain Y8 survives under heavy metal stress.


Year: 2019 PMID: 31727705 PMCID: PMC6856271 DOI: 10.1128/MRA.01066-19

Source DB: PubMed Journal: Microbiol Resour Announc ISSN: 2576-098X

ANNOUNCEMENT

Cellulomonas sp. strain Y8, which has strong cadmium (Cd) resistance, was isolated from farmland soil of the Agro-Ecosystem Experimental Station located in Luancheng, Shijiazhuang, China (37°53′N, 114°41′E), in 2017. Colonies of Y8 appear as smooth, opaque, pale-yellow, moist spheres. Y8 had a high growth rate on a Luria-Bertani plate at 37°C under aerobic conditions, and no significant growth inhibition was observed when it was inoculated on a Luria-Bertani plate containing 2 mM CdCl2. It was therefore desirable to obtain the genomic sequence of this Cellulomonas strain, which was able to thrive under Cd stress.

Genomic DNA was extracted from Y8 using the PureLink Pro 96 genomic DNA purification kit (Thermo Fisher, USA), following the standard instructions. Using the genomic DNA as the template, the 16S rRNA gene was then amplified and sequenced to verify the quality of the genomic DNA. The complete genome of Cellulomonas sp. strain Y8 was sequenced using both the Illumina HiSeq (USA) and PacBio RS II (USA) platforms, according to standard protocols (1).

For next-generation sequencing, libraries were prepared following the manufacturer’s protocol. For each sample, 100 ng genomic DNA was randomly fragmented to <500 bp by sonication (Covaris S220). The fragments were treated with end prep enzyme mix. Size selection of adaptor-ligated DNA was performed, and fragments of ∼470 bp (with an approximate insert size of 350 bp) were recovered. Each sample was then amplified by PCR for 8 cycles using the P5 and P7 primers. The PCR products were cleaned up, validated using an Agilent 2100 Bioanalyzer (USA), and quantified with a Qubit 3.0 fluorometer (Invitrogen, USA). Libraries with different indices were then multiplexed and loaded onto an Illumina HiSeq instrument according to the manufacturer’s instructions.
Cutadapt 1.9.1 (2) was employed for quality control of the pass filter data: low-quality bases (quality score below 20) at both read ends, reads containing more than 10% N bases, and reads shorter than 75 bp were removed.

For PacBio sequencing, the genomic DNA was sheared, and 10-kb double-stranded DNA fragments were selected. The DNA fragments were end repaired and ligated with universal hairpin adapters. The library was sequenced on a PacBio RS II instrument (1, 3). The PacBio reads were assembled using Falcon with wgs-assembler 8.2 (4–6). Then, the genome was corrected with Pilon 1.22 (7) using Illumina data (SRA accession number SRR9639642) or with Quiver using PacBio reads (SRA accession number SRR9639643).

The GC content was calculated using an in-house Perl script. Prodigal gene-finding software was used to identify coding genes (8). Transfer RNAs were detected in the genome using tRNAscan-SE (9). rRNAs were identified using RNAmmer (10). Default parameters were used except where otherwise noted. Protein-coding genes were functionally assigned using BLASTp against the following databases: the Reference Sequence nonredundant protein (nr) (11), Kyoto Encyclopedia of Genes and Genomes (KEGG) (12), Clusters of Orthologous Groups of proteins (COG) (13), Gene Ontology (GO) (14), and Carbohydrate-Active enZYmes (CAZy) databases (15).

Next-generation sequencing yielded 20,382,470 clean reads with an average length of 148 bp, which were used mainly for error correction. PacBio sequencing generated 232,404 sequences with an average length of 3,620 bp and an N50 value of 4,551 bp. The complete genome was 4,475,991 bp long, with a GC content of 75.35%. Annotation by Prodigal identified 4,074 protein-coding genes and 94 noncoding RNAs in the Y8 genome.
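
The read-filtering criteria stated for the Illumina data (quality score below 20 at read ends, more than 10% N bases, length under 75 bp) can be illustrated with a short Python sketch. This is a simplified stand-in, not Cutadapt's actual trimming algorithm and not the authors' command line: it assumes that low-quality bases are trimmed base by base from each end and that the N and length checks are applied after trimming. The function names `trim_ends` and `filter_read` are hypothetical.

```python
from typing import List, Optional, Tuple

MIN_QUAL = 20          # Phred threshold at read ends (value from the text)
MAX_N_FRACTION = 0.10  # reads with more than 10% N bases are dropped (from the text)
MIN_LENGTH = 75        # reads shorter than 75 bp are dropped (from the text)


def trim_ends(seq: str, quals: List[int]) -> Tuple[str, List[int]]:
    """Trim bases with quality below MIN_QUAL from both ends of a read.

    Assumption: a naive per-base trim; Cutadapt itself uses a different
    (running-sum) quality-trimming algorithm.
    """
    start, end = 0, len(seq)
    while start < end and quals[start] < MIN_QUAL:
        start += 1
    while end > start and quals[end - 1] < MIN_QUAL:
        end -= 1
    return seq[start:end], quals[start:end]


def filter_read(seq: str, quals: List[int]) -> Optional[Tuple[str, List[int]]]:
    """Return the trimmed read, or None if it fails the length or N-content check."""
    seq, quals = trim_ends(seq, quals)
    if len(seq) < MIN_LENGTH:
        return None
    if seq.upper().count("N") / len(seq) > MAX_N_FRACTION:
        return None
    return seq, quals
```

For example, a 100-bp read whose first and last five bases have quality 10 would be kept as a 90-bp read, whereas a 74-bp read would be discarded regardless of quality.
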
A total of 3,872 genes were assigned to the COG functional categories for (i) transport and metabolism of amino acids (261), carbohydrates (424), inorganic ions (186), lipids (82), and coenzymes (105); (ii) transcription (351); (iii) signal transduction (205); (iv) cell wall/membrane biogenesis (164); and (v) general function prediction only (381).

Three copies of the 16S rRNA gene were detected in the Y8 genome. The 16S rRNA gene sequence of Y8 exhibited a high level of similarity to Cellulomonas pakistanensis (99.0%, GenBank accession number NR_125452) and Cellulomonas hominis (98.3%, GenBank accession number NR_029288). We also calculated its average nucleotide identity (ANI) and DNA-DNA hybridization (DDH) values against the reported draft genome sequence of Cellulomonas pakistanensis (NCBI assembly number ASM131550v1), using the ANI calculator (16) (https://www.ezbiocloud.net/tools/ani) and the Genome-to-Genome Distance Calculator (17) (http://ggdc.dsmz.de/ggdc.php) with default parameter settings. Both results (93.03% and 52.5%, respectively) were below the corresponding thresholds. This index-based taxonomic assignment of Y8 suggested that it might be a novel species in the genus Cellulomonas (18).
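
The threshold comparison behind this conclusion can be written out as a small Python check. The announcement does not state the cutoff values it used; the sketch below assumes the commonly cited species boundaries of roughly 95% for ANI and 70% for DDH, and both constants as well as the function name `below_species_thresholds` are illustrative assumptions rather than values taken from the text.

```python
# Assumed species-delineation cutoffs (not stated in the announcement):
ANI_SPECIES_THRESHOLD = 95.0  # percent; commonly cited as about 95-96%
DDH_SPECIES_THRESHOLD = 70.0  # percent; commonly cited DDH boundary


def below_species_thresholds(ani: float, ddh: float) -> bool:
    """Return True when both indices fall below the assumed species cutoffs,
    which is consistent with the two genomes belonging to different species."""
    return ani < ANI_SPECIES_THRESHOLD and ddh < DDH_SPECIES_THRESHOLD


# Values reported for strain Y8 versus Cellulomonas pakistanensis.
y8_vs_pakistanensis = below_species_thresholds(ani=93.03, ddh=52.5)
```

With the reported values (93.03% ANI, 52.5% DDH), the check evaluates to true under these assumed cutoffs, in line with the suggestion that Y8 might represent a novel species.
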

Data availability.

The complete sequence of Cellulomonas sp. Y8 has been deposited in GenBank under accession number CP041203 (chromosome), BioProject number PRJNA550281, and SRA accession numbers SRR9639642 (Illumina) and SRR9639643 (PacBio).
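
As one possible way to re-check the reported GC content from the deposited chromosome, the Python sketch below builds an NCBI E-utilities `efetch` URL for accession CP041203 and computes the GC percentage of a FASTA record. It is not the authors' in-house Perl script. The helper names are hypothetical, and counting only A, C, G, and T in the denominator is an assumption about how ambiguous bases should be treated.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str) -> str:
    """Build an E-utilities efetch URL returning a nucleotide record as FASTA."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def read_fasta_sequence(text: str) -> str:
    """Concatenate the sequence lines of a single-record FASTA string."""
    return "".join(
        line.strip() for line in text.splitlines() if line and not line.startswith(">")
    )


def gc_content(seq: str) -> float:
    """GC percentage over unambiguous bases (assumption: N and other codes ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def fetch_gc(accession: str) -> float:
    """Download a record from NCBI and return its GC percentage (needs network access)."""
    with urlopen(efetch_fasta_url(accession)) as response:
        return gc_content(read_fasta_sequence(response.read().decode("ascii")))
```

Calling `fetch_gc("CP041203")` requires network access; the result could then be compared with the 75.35% GC content reported above.
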

References (17 in total)

1. The Gene Ontology (GO) database and informatics resource.

Authors: M A Harris; J Clark; A Ireland; J Lomax; M Ashburner; R Foulger; K Eilbeck; S Lewis; B Marshall; C Mungall; J Richter; G M Rubin; J A Blake; C Bult; M Dolan; H Drabkin; J T Eppig; D P Hill; L Ni; M Ringwald; R Balakrishnan; J M Cherry; K R Christie; M C Costanzo; S S Dwight; S Engel; D G Fisk; J E Hirschman; E L Hong; R S Nash; A Sethuraman; C L Theesfeld; D Botstein; K Dolinski; B Feierbach; T Berardini; S Mundodi; S Y Rhee; R Apweiler; D Barrell; E Camon; E Dimmer; V Lee; R Chisholm; P Gaudet; W Kibbe; R Kishore; E M Schwarz; P Sternberg; M Gwinn; L Hannick; J Wortman; M Berriman; V Wood; N de la Cruz; P Tonellato; P Jaiswal; T Seigfried; R White
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

2. Third generation DNA sequencing: Pacific Biosciences' single molecule real time technology.

Authors: Alice McCarthy
Journal: Chem Biol Date: 2010-07-30

3. Identifying bacterial genes and endosymbiont DNA with Glimmer.

Authors: Arthur L Delcher; Kirsten A Bratke; Edwin C Powers; Steven L Salzberg
Journal: Bioinformatics Date: 2007-01-19 Impact factor: 6.937

4. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.

Authors: Konstantin Berlin; Sergey Koren; Chen-Shan Chin; James P Drake; Jane M Landolin; Adam M Phillippy
Journal: Nat Biotechnol Date: 2015-05-25 Impact factor: 54.908

5. A genomic perspective on protein families.

Authors: R L Tatusov; E V Koonin; D J Lipman
Journal: Science Date: 1997-10-24 Impact factor: 47.728

6. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence.

Authors: T M Lowe; S R Eddy
Journal: Nucleic Acids Res Date: 1997-03-01 Impact factor: 16.971

7. A large-scale evaluation of algorithms to calculate average nucleotide identity.

Authors: Seok-Hwan Yoon; Sung-Min Ha; Jeongmin Lim; Soonjae Kwon; Jongsik Chun
Journal: Antonie Van Leeuwenhoek Date: 2017-02-15 Impact factor: 2.271

8. Genome sequence-based species delimitation with confidence intervals and improved distance functions.

Authors: Jan P Meier-Kolthoff; Alexander F Auch; Hans-Peter Klenk; Markus Göker
Journal: BMC Bioinformatics Date: 2013-02-21 Impact factor: 3.169

9. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

Authors: Kim D Pruitt; Tatiana Tatusova; Donna R Maglott
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971

10. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement.

Authors: Bruce J Walker; Thomas Abeel; Terrance Shea; Margaret Priest; Amr Abouelliel; Sharadha Sakthikumar; Christina A Cuomo; Qiandong Zeng; Jennifer Wortman; Sarah K Young; Ashlee M Earl
Journal: PLoS One Date: 2014-11-19 Impact factor: 3.240
