Literature DB >> 31699756

Improved Draft Genome Sequence of Microbacterium sp. Strain LKL04, a Bacterial Endophyte Associated with Switchgrass Plants.

Mohammad Radhi Sahib1,2, Piao Yang3, Norbert Bokros1, Nicole Shapiro4, Tanja Woyke4, Nikos C Kyrpides4, Ye Xia5, Seth DeBolt6.   

Abstract

We report here the genome assembly and analysis of Microbacterium strain sp. LKL04, a Gram-positive bacterial endophyte isolated from switchgrass plants (Panicum virgatum) grown on a reclaimed coal-mining site. The 2.9-Mbp genome of this bacterium was assembled into a single contig encoding 2,806 protein coding genes.
Copyright © 2019 Sahib et al.

Entities:  

Year:  2019        PMID: 31699756      PMCID: PMC6838614          DOI: 10.1128/MRA.00927-19

Source DB:  PubMed          Journal:  Microbiol Resour Announc        ISSN: 2576-098X


ANNOUNCEMENT

Members of the genus Microbacterium have previously been isolated from a wide range of environments, including soils, marine ecosystems, air, and sewage, and from plants and insects (1–5). We report here information about the sequenced and assembled genome of the bacterial endophyte Microbacterium sp. strain LKL04, a Gram-positive actinobacterium, isolated from leaves of switchgrass plants grown on a reclaimed coal-mining site in western Kentucky (6). Switchgrass samples were collected from the coal-mining site in July 2010. Leaf samples were cut into 1- to 1.5-cm-long segments, surface sterilized with a 20% bleach solution, and rinsed 5 times with autoclaved tap water. The surface-sterilized segments were incubated on tryptic soy agar (TSA) plates for 3 to 5 days at 26°C before the individual colonies were isolated and restreaked at least three times on new TSA plates (6). Single purified colonies were then isolated and grown at room temperature for 1 to 2 days in tryptic soy broth (TSB). A modified cetyltrimethylammonium bromide (CTAB) bacterial DNA isolation protocol (7; https://1ofdmq2n8tc36m6i46scovo2e-wpengine.netdna-ssl.com/wp-content/uploads/2014/02/JGI-Bacterial-DNA-isolation-CTAB-Protocol-2012.pdf) was followed to isolate the bacterial DNA for sequencing. The genome of Microbacterium sp. strain LKL04 was sequenced at 212× coverage using Pacific Biosciences (PacBio) sequencing technology (8). A PacBio SMRTbell library was constructed and sequenced with the PacBio RS platform, generating 198,113 filtered subreads with an average read length of 3,930 bp ± 2,621 bp, totaling 778.5 Mbp. Reads were trimmed and assembled using Hierarchical Genome Assembly Process (HGAP) v.2.3.0 (9). The final genome assembly contains a single contig spanning the complete 2.922-Mbp length of the bacterial genome, with a GC content of 69.7%, which is characteristic of actinobacteria. The genome is precited to be circular. Genes were identified using Prodigal v.2.5, followed by a round of manual curation using GenePRIMP, resulting in a total of 2,862 predicted genes (10, 11). From these, 2,806 predicted protein coding genes were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant, UniProt, TIGRFam, Pfam, Kyoto Encyclopedia of Genes and Genomes (KEGG), Clusters of Orthologous Genes (COG), PANTHER, and InterPro databases (12–18). For the remaining 56 genes, the tRNAScan-SE tool was used to further identify 45 tRNA genes, 6 rRNA genes, and 5 noncoding RNAs. For the noncoding RNAs, the RNA components of the protein secretion complex and RNase P were identified by searching the genome for the corresponding Rfam profiles using Infernal (19, 20). CheckM v.1.0.8, hosted on KBase, was used to estimate the completeness of the LKL04 genome (21, 22). Overall, the LKL04 genome returned a completeness score of 99.5% and a contamination level of only 0.67%. Using the PANTHER hidden Markov model (HMM) scoring tool pantherScore v.2.1, the protein sequences were further mapped against the PANTHER HMM database v.14.1 to functionally annotate the LKL04 genes and query for significantly overrepresented genes (23). Default parameters were used for each software program, unless otherwise specified. Selected annotations and genome characteristics are shown in Fig. 1. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG) platform developed by the Joint Genome Institute (Walnut Creek, CA) (24).
FIG 1

Circular representation of the LKL04 genome using Circos (25). The circles, from outside to inside, denote protein coding genes colored by size (A), RNA genes (B), transmembrane helix regions (C), GC content along a 1-kb window, with red lines indicating regions above the 69.7% genome average and black lines indicating regions below the genome average (D) GC skew, with red lines indicating a skew greater than zero and black lines indicating a skew less than zero (E), and genes annotated into distinct PANTHER protein classes (F). The repository for storage of scripts used to construct the figure can be found at https://github.com/nbo245/LKL04/tree/master/circos_plot.

Circular representation of the LKL04 genome using Circos (25). The circles, from outside to inside, denote protein coding genes colored by size (A), RNA genes (B), transmembrane helix regions (C), GC content along a 1-kb window, with red lines indicating regions above the 69.7% genome average and black lines indicating regions below the genome average (D) GC skew, with red lines indicating a skew greater than zero and black lines indicating a skew less than zero (E), and genes annotated into distinct PANTHER protein classes (F). The repository for storage of scripts used to construct the figure can be found at https://github.com/nbo245/LKL04/tree/master/circos_plot.

Data availability.

The whole-genome sequence has been deposited in DDBJ/EMBL/GenBank under the accession no. PRJNA322991. Original forward and reverse sequencing reads can be retrieved from NCBI under SRA accession no. SRR4232145 and SRR4232146. The associated sequence data can also be found at the Joint Genome Institute (JGI) portal with the IMG taxon identifier (ID) 2667527218 (https://genome.jgi.doe.gov/portal/MicspLKL04/MicspLKL04.info.html) or at https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=912630. Scripts used to construct Fig. 1 can be found at https://github.com/nbo245/LKL04/tree/master/circos_plot.
  24 in total

1.  GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes.

Authors:  Amrita Pati; Natalia N Ivanova; Natalia Mikhailova; Galina Ovchinnikova; Sean D Hooper; Athanasios Lykidis; Nikos C Kyrpides
Journal:  Nat Methods       Date:  2010-05-02       Impact factor: 28.547

2.  Circos: an information aesthetic for comparative genomics.

Authors:  Martin Krzywinski; Jacqueline Schein; Inanç Birol; Joseph Connors; Randy Gascoyne; Doug Horsman; Steven J Jones; Marco A Marra
Journal:  Genome Res       Date:  2009-06-18       Impact factor: 9.043

3.  Novel alkali-tolerant GH10 endo-β-1,4-xylanase with broad substrate specificity from Microbacterium trichothecenolyticum HY-17, a gut bacterium of the mole cricket Gryllotalpa orientalis.

Authors:  Do Young Kim; Dong-Ha Shin; Sora Jung; Hyangmi Kim; Jong Suk Lee; Han-Young Cho; Kyung Sook Bae; Chang-Keun Sung; Young Ha Rhee; Kwang-Hee Son; Ho-Yong Park
Journal:  J Microbiol Biotechnol       Date:  2014-07       Impact factor: 2.351

4.  Microbacterium aquimaris sp. nov., isolated from seawater.

Authors:  Kwang Kyu Kim; Keun Chul Lee; Hee-Mock Oh; Jung-Sook Lee
Journal:  Int J Syst Evol Microbiol       Date:  2008-07       Impact factor: 2.747

5.  Microbacterium insulae sp. nov., isolated from soil.

Authors:  Jung-Hoon Yoon; Peter Schumann; So-Jung Kang; Chang-Soo Lee; Soo-Young Lee; Tae-Kwang Oh
Journal:  Int J Syst Evol Microbiol       Date:  2009-06-19       Impact factor: 2.747

6.  NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

Authors:  Kim D Pruitt; Tatiana Tatusova; Donna R Maglott
Journal:  Nucleic Acids Res       Date:  2006-11-27       Impact factor: 16.971

7.  UniProt: a worldwide hub of protein knowledge.

Authors: 
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971

8.  The SILVA ribosomal RNA gene database project: improved data processing and web-based tools.

Authors:  Christian Quast; Elmar Pruesse; Pelin Yilmaz; Jan Gerken; Timmy Schweer; Pablo Yarza; Jörg Peplies; Frank Oliver Glöckner
Journal:  Nucleic Acids Res       Date:  2012-11-28       Impact factor: 16.971

9.  The COG database: an updated version includes eukaryotes.

Authors:  Roman L Tatusov; Natalie D Fedorova; John D Jackson; Aviva R Jacobs; Boris Kiryutin; Eugene V Koonin; Dmitri M Krylov; Raja Mazumder; Sergei L Mekhedov; Anastasia N Nikolskaya; B Sridhar Rao; Sergei Smirnov; Alexander V Sverdlov; Sona Vasudevan; Yuri I Wolf; Jodie J Yin; Darren A Natale
Journal:  BMC Bioinformatics       Date:  2003-09-11       Impact factor: 3.169

10.  Pfam: the protein families database.

Authors:  Robert D Finn; Alex Bateman; Jody Clements; Penelope Coggill; Ruth Y Eberhardt; Sean R Eddy; Andreas Heger; Kirstie Hetherington; Liisa Holm; Jaina Mistry; Erik L L Sonnhammer; John Tate; Marco Punta
Journal:  Nucleic Acids Res       Date:  2013-11-27       Impact factor: 16.971

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.