
Correcting names of bacteria deposited in National Microbial Repositories: an analysed sequence data necessary for taxonomic re-categorization of misclassified bacteria-ONE example, genus Lysinibacillus.

Bhagwan N Rekadwad1,2, Juan M Gonzalez3.   

Abstract

A report on the re-analysis and digitalization of 16S rRNA gene sequences is presented, using as one example the Lysinibacillus species deposited in National Microbial Repositories in India. The 16S rRNA gene sequences of Lysinibacillus species were digitalized to provide quick response (QR) codes, Chaos Game Representation (CGR) and Frequency of Chaos Game Representation (FCGR). GC percentage, phylogenetic analysis and principal component analysis (PCA) were the tools used for the differentiation and reclassification of the strains under investigation. This paper gives seven reasons supporting our statement that Lysinibacillus species deposited in National Microbial Repositories are misclassified. Based on these seven reasons, bacteria deposited in National Microbial Repositories, such as Lysinibacillus and many others, need reanalysis to establish their exact identity. Levels of identity with type strains of related species show differences of 2 to 8%, suggesting that reclassification is needed to correctly assign species names to the analyzed Lysinibacillus strains available in National Microbial Repositories.


Keywords:  16S rRNA; Bacteria; Culture collection; DDH; Digitalization

Year:  2017        PMID: 28761909      PMCID: PMC5520958          DOI: 10.1016/j.dib.2017.06.042

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the data

Generated datasets are useful for visual interpretation and comparative analyses.
The data highlight features that support the differentiation and reclassification of individual species.
The data give the exact visual distribution and a thorough analysis of each base pair, show their relevance for strain differentiation, and serve as a prerequisite for classification.

Data

Data analysis started in early 2016. The accession numbers of the 16S rRNA gene sequences of Lysinibacillus species were taken from the web catalogues of the respective microbial repositories. The 16S rRNA gene sequences of Lysinibacillus species were downloaded from the NCBI website (https://www.ncbi.nlm.nih.gov/nuccore) between January and May 2016. The dataset examined in this article provides information on misclassified and misplaced bacteria in the microbial culture collections/repositories in India. Figs. 1–6 and Tables 1–6 present the datasets for the misclassified bacteria.
Fig. 1

Quick response (QR) codes of Lysinibacillus strains.

Fig. 2

Chaos Game Representation (CGR) of Lysinibacillus strains.

Fig. 3

Frequency of Chaos Game Representation (FCGR) for Lysinibacillus strains.

Fig. 4

GC plots of Lysinibacillus strains based on their 16S rRNA gene sequences.

Fig. 5

Evolutionary relationships amongst the evaluated Lysinibacillus species and type strains from related species showing two lineages and the differentiation of distinct strains.

Fig. 6

Principal component analysis (PCA) of Lysinibacillus strains.

Table 1

Lysinibacillus species from National Microbial Repositories in India and their 16S rRNA gene sequences.

Culture collection | Accession number (a) | Assigned species | Nucleotide length (a)
Gujarat Biodiversity Gene Bank, Gujarat State Biotechnology Mission (GSBTM), Gandhinagar, India | GU815938 | Lysinibacillus sphaericus | 280
 | GU815944 | Lysinibacillus fusiformis | 415
 | HQ827832 | Lysinibacillus fusiformis | 809
 | JF925136 | Lysinibacillus fusiformis | 586
 | JN099793 | Lysinibacillus fusiformis | 436
 | JQ389685 | Lysinibacillus fusiformis | 513
 | JQ389688 | Lysinibacillus fusiformis | 819
 | JQ389694 | Lysinibacillus fusiformis | 493
 | JQ964026 | Lysinibacillus fusiformis | 244
 | JQ964029 | Lysinibacillus fusiformis | 218
 | JX081370 | Lysinibacillus sphaericus | 491
 | JX081387 | Lysinibacillus fusiformis | 507
 | JX081455 | Lysinibacillus fusiformis | 931
 | KC250125 | Lysinibacillus fusiformis | 505
 | KC250126 | Lysinibacillus fusiformis | 502
 | KC250127 | Lysinibacillus fusiformis | 503
 | KF889293 | Lysinibacillus fusiformis | 1242
 | KF913669 | Lysinibacillus xylanilyticus | 1450
National Centre for Microbial Resource, National Centre for Cell Science, Pune, India | AF169495 | Lysinibacillus sphaericus | 1410
 | AF169537 | Lysinibacillus fusiformis | 1412
 | FJ477040 | Lysinibacillus xylanilyticus strain XDB9 | 1349
 | JX130370 | Lysinibacillus fusiformis strain R-2-1 | 859
 | KR809552 | Lysinibacillus sphaericus strain S2R3C4 | 1414
National Collection of Industrial Microorganisms (NCIM), National Chemical Laboratory (NCL), Pune, India | KJ363190 | Lysinibacillus sp. IT4(2011) | 1572

(a) 16S rRNA gene sequences.

Table 2

Lysinibacillus strains from culture collections compared by using GC calculation tool.

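Accession number (a) | Nucleotide length (a) | Maximum GC (%) | Minimum GC (%) | Average GC (%)
GU815938 | 280 | 60 | 50 | 56.8
GU815944 | 415 | 65 | 40 | 53.7
HQ827832 | 809 | 67.5 | 33.3 | 52.9
JF925136 | 586 | 65 | 42.3 | 53.8
JN099793 | 436 | 65 | 37.5 | 52.7
JQ389685 | 513 | 65 | 35 | 52
JQ389688 | 819 | 68.4 | 35 | 54
JQ389694 | 493 | 65 | 37.5 | 53.2
JQ964026 | 244 | 60 | 32.5 | 53.6
JQ964029 | 218 | 60 | 33.3 | 51.4
JX081370 | 491 | 65 | 36 | 53.4
JX081387 | 502 | 67.5 | 42.5 | 56.1
JX081455 | 931 | 65 | 42.5 | 53.7
KC250125 | 505 | 65 | 40 | 52.5
KC250126 | 502 | 67.5 | 42.5 | 53.3
KC250127 | 503 | 65 | 42.5 | 53.4
KF889293 | 1242 | 65 | 42.5 | 54.1
KF913669 | 1450 | 65 | 37.5 | 53.8
AF169495 | 1410 | 65 | 37.5 | 53.5
AF169537 | 1412 | 65 | 41.7 | 53.4
FJ477040 | 1349 | 67.5 | 37.5 | 53.3
JX130370 | 859 | 65 | 36 | 53.4
KR809552 | 1414 | 65 | 35 | 53.5
KJ363190 | 1572 | 83.3 | 22.5 | 53.3

(a) 16S rRNA gene sequence.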

Table 3

NCBI-BLAST Analysis report: Lysinibacillus sp.

SL | Accession number | Closest match (accession number) | Identity
1 | JQ964026 | Lysinibacillus macroides strain Se2 (KX959975) | 93%
2 | JQ964029 | Lysinibacillus fusiformis (KX397625) | 92%
3 | JX081387 | Lysinibacillus sp. 20088723339 (KT254135) | 90%
4 | GU815938 | Bacterium enrichment culture clone ALO1 (JF687759) | 90%

Table 4

GC content difference between Lysinibacillus fusiformis strains and the type strain for this species.

Accession number | Strain | Difference of % GC
AF169537 | Lysinibacillus fusiformis | 16.34
GU815944 | Lysinibacillus fusiformis | 16.42
HQ827832 | Lysinibacillus fusiformis | 16.33
JF925136 | Lysinibacillus fusiformis | 16.78
JN099793 | Lysinibacillus fusiformis | 15.44
JQ389685 | Lysinibacillus fusiformis | 15.51
JQ389694 | Lysinibacillus fusiformis | 16.64
JQ964026 | Lysinibacillus fusiformis | 20.86
JQ964029 | Lysinibacillus fusiformis | 20.69
JX081387 | Lysinibacillus fusiformis | 17.91
JX081455 | Lysinibacillus fusiformis | 16.39
KC250125 | Lysinibacillus fusiformis | 15.56
KC250126 | Lysinibacillus fusiformis | 16.27
KC250127 | Lysinibacillus fusiformis | 16.17
KF889293 | Lysinibacillus fusiformis | 16.96
KJ363190 | Lysinibacillus sp. IT4(2011) | 15.42
JX130370 | Lysinibacillus fusiformis strain R-2-1 | 16.05

Table 5

GC content difference between Lysinibacillus sphaericus strains and the type strain for this species.

Accession number | Strain | Difference of % GC
AF169495 | Lysinibacillus sphaericus | 16.23
GU815938 | Lysinibacillus sphaericus | 19.47
JX081370 | Lysinibacillus sphaericus | 16.05
KR809552 | Lysinibacillus sphaericus strain S2R3C4 | 16.22

Table 6

GC content difference between Lysinibacillus xylanilyticus strains and the type strain for this species.

Accession number | Strain | Difference of % GC
KF913669 | Lysinibacillus xylanilyticus | 18.31
FJ477040 | Lysinibacillus xylanilyticus strain XDB9 | 17.75
JQ389688 | Lysinibacillus sp. | 18.05

The output of the sequence data from EzBioCloud's Identify service (http://www.ezbiocloud.net/identify), which supports our findings, is tabulated in Table 7.
Table 7

Output of sequence data on EzBioCloud's Identify service (http://www.ezbiocloud.net/identify) database supporting our finding.

Experimental design, materials and methods

Twenty-four Lysinibacillus strains deposited in renowned microbial culture collections in India were used as a model case for this study (Table 1). Of these twenty-four strains, eighteen were from the Microbial Repository (Biogene), Gujarat Biodiversity Gene Bank, Gujarat State Biotechnology Mission, Gandhinagar, Gujarat; five were from the National Centre for Microbial Resource, National Centre for Cell Science, Pune; and one was deposited in the National Collection of Industrial Microorganisms, National Chemical Laboratory, Pune. No Lysinibacillus species was found in MTCC-IMTECH, Chandigarh, India. The 16S rRNA gene sequences of these strains were retrieved from the international repository (https://www.ncbi.nlm.nih.gov/nuccore/) in FASTA format. These FASTA sequences were used to generate QR codes, CGR and FCGR, and for GC percent determination, phylogenetic analysis, principal component analysis [3], [4], [5] and DNA–DNA hybridization (DDH) [6]. QR codes were prepared using the DNABarID tool (http://www.neeri.res.in/DNA_BarID/DNA_BarID.htm). CGR, FCGR and GC plots were drawn using web-based tools [7], [8]. The phylogenetic tree was constructed using the MEGA6.2 tool [9], [10], [11]. PCA was carried out using the multiple alignment program EMBL-EBI MUSCLE [12], [13], [14].
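The GC percent determination mentioned above can be illustrated with a sliding-window calculation. The following is a minimal sketch and not the web-based tool cited in [7], [8]: the window size of 40 nucleotides, the step of 1, the use of the whole-sequence GC percentage as the summary value, and the function names `gc_percent` and `gc_window_stats` are all assumptions made for illustration.

```python
def gc_percent(seq):
    """Return the GC percentage of a nucleotide string (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def gc_window_stats(seq, window=40, step=1):
    """Return (maximum, minimum, overall) GC percentage for a sequence.

    Maximum and minimum are taken over sliding windows; the overall value
    is the GC percentage of the whole sequence. The window and step sizes
    are assumed values, not parameters reported in the article.
    """
    seq = seq.upper()
    if len(seq) < window:
        windows = [seq]  # sequence shorter than one window: use it whole
    else:
        windows = [seq[i:i + window]
                   for i in range(0, len(seq) - window + 1, step)]
    values = [gc_percent(w) for w in windows]
    return max(values), min(values), gc_percent(seq)
```

Applied to a FASTA sequence with the header line removed, `gc_window_stats(sequence)` returns three numbers comparable in kind to the maximum, minimum and average columns of Table 2, although the exact values depend on the assumed window size.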

Background

At present, 16S rRNA genes are the key to the taxonomic categorization of Bacteria and Archaea. This is due to the extensive sequence information on 16S rRNA genes available in public repositories [1] and well-curated databases [2]. Nevertheless, the identification of unknown or newly sequenced strains involves comparison with these databases and often yields a subjective and/or ambiguous result when novel strains are differentiated by their 16S rRNA gene sequences. For instance, some 16S rRNA gene sequences are too short, which limits the information that can be extracted for comparison and identification. Thus, the accurate identification or classification of strains needs a simple and quick pipeline, in addition to more advanced procedures involving polyphasic approaches (including phenotypic and genomic techniques) for the definitive classification of species. The aim in microbial strain identification and differentiation is to have a pipeline available for unambiguous classification. This paper describes new types of analyses for strain differentiation that are based on sequence analysis and are easy to perform.

Results

QR codes prepared from the 16S rDNA sequences of Lysinibacillus species were unique. Any user can scan a QR code with a smartphone and retrieve the sequence (Fig. 1). CGR and FCGR were used for visual interpretation of the occurrence of nucleotides in 16S rRNA genes. Each CGR image has four corners: the upper two corners, from left to right, are C and T/U, while the lower two corners, from left to right, are A and G. Each CGR square has four sub-squares, one for each of the nucleotides C, G, A and T/U. The number of dots appearing in a sub-square is directly proportional to the number of the corresponding nucleotides. The distribution of each nucleotide among the sub-squares indicates the occurrence of bases in the analyzed gene, i.e. their sequence, number and percentage (Fig. 2). Unlike CGR, FCGR presents a different type of visual dataset. The distribution of nucleotides in these matrices is diverse among the studied strains. The FCGR scale ranges from poorly represented dinucleotides (white or light colored squares) to frequently observed dinucleotides (darkest squares) (Fig. 3).
The nucleotide sequences JQ964026, JQ964029, JX081387, GU815938 and JX081370 showed high GC percentages, with maxima of about 60–67.5%, while KJ363190 reached a maximum of 83.3% (Table 2, Fig. 4). The BLAST analysis of the JQ964026, JQ964029, JX081387 and GU815938 sequences showed 93%, 92%, 90% and 90% identity, respectively, with existing species and type strains. This was confirmed by phylogenetic analysis, principal component analysis and GGDC-DDH results. The phylogenetic tree was constructed including Lysinibacillus and phylogenetically related species, with bootstrap values corresponding to 1000 replicates (Fig. 5). The 16S rRNA gene sequences JQ964026, JQ964029, JX081387 and GU815938 showed identities lower than 97% (90–93% with existing species and type strains) (Table 3), suggesting that they could belong to different species. Table 3 thus shows a clear distinction between these Lysinibacillus strains and their closest matches, with identities below the level expected for members of the same species.
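The CGR and FCGR constructions described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the web-based tools used in the study: the corner layout follows the description in the text (C upper left, T/U upper right, A lower left, G lower right), the starting point at the centre of the square is the usual convention, and the function names `cgr_points` and `fcgr_counts` are hypothetical. The default `k=2` corresponds to the dinucleotide counts mentioned for FCGR.

```python
# Corner coordinates on the unit square, following the layout in the text:
# C upper left, T/U upper right, A lower left, G lower right.
CORNERS = {"A": (0, 0), "G": (1, 0), "C": (0, 1), "T": (1, 1), "U": (1, 1)}


def cgr_points(seq):
    """Return the list of (x, y) CGR points for a nucleotide string.

    Each point lies halfway between the previous point and the corner of
    the current base; characters other than A/C/G/T/U are skipped.
    """
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def fcgr_counts(seq, k=2):
    """Return a 2**k x 2**k grid of k-mer counts, indexed grid[row][col].

    Row 0 is the bottom edge of the CGR square. The cell of a k-mer is the
    sub-square that its CGR point falls into, computed with integers so
    that no floating-point rounding is involved.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNERS for base in kmer):
            continue  # ignore windows containing ambiguous characters
        # The last base of the k-mer carries the highest weight, because
        # it decides which quadrant the CGR point ends up in.
        col = sum(CORNERS[b][0] << j for j, b in enumerate(kmer))
        row = sum(CORNERS[b][1] << j for j, b in enumerate(kmer))
        grid[row][col] += 1
    return grid
```

Plotting the output of `cgr_points` as dots gives a CGR image, and shading the cells of `fcgr_counts` by their counts gives an FCGR matrix in which rarely observed k-mers appear light and frequent ones dark.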
Results of the principal component analysis comparing the 16S rRNA gene sequences (Fig. 6) revealed different groups, which could be related to major novel species or taxa within the genus Lysinibacillus. Most of these strains were isolated from environmental samples, such as boron-containing soil, forest humus collected from Gyeryong Mountain in Korea, the Environmental Treatment Plant at Naroda G.I.D.C., Ahmedabad, Gujarat (India), and soil contaminated with textile mill effluent, followed by acclimatization in the presence of different chemicals such as boron, sodium chloride, xylan and dyes [15], [16], [17], [18]. This information suggests that different adaptations could result in distinct strains with distinctive 16S rRNA gene sequences. GGDC-DDH analysis with type strains indicated that all strains had G+C differences ranging from 15.44 to 20.86 (Table 4, Table 5, Table 6). These analyses suggest that the Lysinibacillus strains deposited in Indian Microbial Repositories could represent distinct species. Thus, there is a gap in the information on accurate classification within this genus, and specifically on this group of strains, which have been used as a model case to describe this identification issue.
Special note: The reasons for our statement that Lysinibacillus species deposited in National Microbial Repositories are misclassified are:
(a) Erroneous sequences.
(b) Mismatch of identity with the top-hit taxon in NCBI nucleotide-nucleotide BLAST and the EzTaxon database.
(c) Very low percentage similarity, with less than 92.22–99.0 percent match with standard type strains.
(d) Very low completeness score.
(e) Very short 16S rRNA sequences.
(f) Very long sequences with chimaeras.
(g) Doubtful contigs or single long, unassembled sequences.
Based on the above seven reasons, bacteria deposited in National Microbial Repositories, such as Lysinibacillus, either need to be re-sequenced for the 16S rRNA gene and reanalysed against the EzBioCloud database to establish their exact identity, or need to be identified using appropriate valid techniques (Table 7).
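Two of the reasons listed above, short sequences and low identity with the closest match, lend themselves to a simple automated screen. The sketch below is only an illustration: the 97% identity cutoff is the level mentioned in the Results, whereas the minimum length of 1300 nucleotides and the names `flag_record`, `MIN_LENGTH` and `SPECIES_IDENTITY_CUTOFF` are assumptions that do not come from the article. The example records use lengths from Table 1 and identities from Table 3.

```python
SPECIES_IDENTITY_CUTOFF = 97.0  # percent; level mentioned in the Results
MIN_LENGTH = 1300  # nucleotides; assumed value, not taken from the article


def flag_record(length, identity=None):
    """Return a list of reasons why a 16S rRNA record may need re-checking."""
    flags = []
    if length < MIN_LENGTH:
        flags.append("short sequence")
    if identity is not None and identity < SPECIES_IDENTITY_CUTOFF:
        flags.append("low identity to closest match")
    return flags


# (length, identity) pairs: lengths from Table 1, identities from Table 3.
records = {
    "JQ964026": (244, 93.0),
    "JQ964029": (218, 92.0),
    "JX081387": (507, 90.0),
    "GU815938": (280, 90.0),
    "KF913669": (1450, None),  # no BLAST identity reported in Table 3
}

flagged = {
    accession: flag_record(length, identity)
    for accession, (length, identity) in records.items()
}
```

A record with an empty flag list passes this screen only; it would still need the other checks listed above, such as chimaera detection and comparison with type strains.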

Discussion

This study provides a pipeline for structuring 16S rRNA gene sequence information by constructing digitalized datasets on Lysinibacillus strains currently present in several culture collections in India (GSBTM Gujarat, NCMR-NCCS Pune and NCIM-NCL Pune) and in many other National Culture Collections in the world. This information helps to identify, compare, evaluate and interpret strain and species differentiation for novel isolates from environmental samples, and supports making it a compulsory rule to investigate the correct identity of deposited bacteria. Differentiating bacteria obtained from an environment becomes a relatively complicated task when those bacteria are phylogenetically closely related. This issue is aggravated when comparing and classifying bacteria associated with poorly curated sequence data and scarcely analyzed strains that do not fulfil polyphasic recommendations. An easy differentiation pipeline is a very useful tool for a large number of applications, including species classification of new isolates from natural and artificial environments. The type of digitalized data from this study can be produced for any prokaryotic or eukaryotic sequence data. It could be extended to genomes, different genes or sets of genes. Overall, the listed data and protocol will be useful to research and industry. The proposed pipeline contributes to simplifying the identification and differentiation of unclassified strains and to recognizing the need for reclassification of some previously isolated microorganisms, including the detection of microbes based on 16S rRNA gene sequence information from microbial community surveys. The specificity and applicability of the proposed approach can be increased as needed by using different genes or genome sequence information. Thus, this protocol allows phenotypic and genotypic characteristics to be used for the reintroduction and taxonomic categorization of species in the current pipeline.

Conflicts of interests

The authors declare that there are no conflicts of interest.

Specifications Table

Subject area | Microbiology
More specific subject area | Basic Microbiology
Type of data | Figure and Tables
How data was acquired | Through 16S rRNA sequence analysis and freeware
Data format | Raw and analyzed
Experimental factors | Not applicable
Experimental features | All analysis carried out for bacterial sequences using standard parameters
Data source location and analysis | All data analysis was carried out at the School of Life Sciences, S. R. T. M. University, Nanded (India) during 2016.
Data accessibility | Data is incorporated within this article

References (15 in total)

1.  Degradation of sulphonated azo dye Red HE7B by Bacillus sp. and elucidation of degradative pathways.

Authors:  Jyoti Kumar Thakur; Sangeeta Paul; Prem Dureja; K Annapurna; Jasdeep C Padaria; Madhuban Gopal
Journal:  Curr Microbiol       Date:  2014-03-30       Impact factor: 2.188

2.  Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB.

Authors:  T Z DeSantis; P Hugenholtz; N Larsen; M Rojas; E L Brodie; K Keller; T Huber; D Dalevi; P Hu; G L Andersen
Journal:  Appl Environ Microbiol       Date:  2006-07       Impact factor: 4.792

3.  MEGA6: Molecular Evolutionary Genetics Analysis version 6.0.

Authors:  Koichiro Tamura; Glen Stecher; Daniel Peterson; Alan Filipski; Sudhir Kumar
Journal:  Mol Biol Evol       Date:  2013-10-16       Impact factor: 16.240

4.  CONFIDENCE LIMITS ON PHYLOGENIES: AN APPROACH USING THE BOOTSTRAP.

Authors:  Joseph Felsenstein
Journal:  Evolution       Date:  1985-07       Impact factor: 3.694

5.  The neighbor-joining method: a new method for reconstructing phylogenetic trees.

Authors:  N Saitou; M Nei
Journal:  Mol Biol Evol       Date:  1987-07       Impact factor: 16.240

6.  Genome sequence-based species delimitation with confidence intervals and improved distance functions.

Authors:  Jan P Meier-Kolthoff; Alexander F Auch; Hans-Peter Klenk; Markus Göker
Journal:  BMC Bioinformatics       Date:  2013-02-21       Impact factor: 3.169

7.  Genomic Analysis of a Marine Bacterium: Bioinformatics for Comparison, Evaluation, and Interpretation of DNA Sequences.

Authors:  Bhagwan N Rekadwad; Juan M Gonzalez; Chandrahasya N Khobragade
Journal:  Biomed Res Int       Date:  2016-11-01       Impact factor: 3.411

8.  GenBank.

Authors:  Dennis A Benson; Mark Cavanaugh; Karen Clark; Ilene Karsch-Mizrachi; David J Lipman; James Ostell; Eric W Sayers
Journal:  Nucleic Acids Res       Date:  2012-11-27       Impact factor: 16.971

9.  Bioinformatics data supporting revelatory diversity of cultivable thermophiles isolated and identified from two terrestrial hot springs, Unkeshwar, India.

Authors:  Bhagwan N Rekadwad; Chandrahasya N Khobragade
Journal:  Data Brief       Date:  2016-04-23

10.  Digital data for quick response (QR) codes of alkalophilic Bacillus pumilus to identify and to compare bacilli isolated from Lonar Crator Lake, India.

Authors:  Bhagwan N Rekadwad; Chandrahasya N Khobragade
Journal:  Data Brief       Date:  2016-04-09