Digital data of quality control strains under general deposit at Microbial Culture Collection (MCC), NCCS, Pune, India: A bioinformatics approach.

Bhagwan N Rekadwad, Chandrahasya N Khobragade.

Abstract

A total of 13 short DNA sequences of quality control strains (MCC 2052, MCC 2077, MCC 2078, MCC 2080, MCC 2309, MCC 2322, MCC 2408, MCC 2409, MCC 2412, MCC 2413, MCC 2415, MCC 2483 and MCC 2515) were retrieved from the NCBI BioSample database, and quick response (QR) codes were generated for the sequences. The 16S rRNA sequences were used to create Chaos Game Representation (CGR) and Chaos Game Representation of Frequencies (FCGR) images and to measure GC percentage. Digital data in the form of QR codes, CGR, FCGR and GC plots would be useful for the identification, visual comparison and evaluation of newly isolated strains against quality control strains. The QR codes, CGR, FCGR and GC content data for all the quality control strains are made available to users through this paper. These digital data help to evaluate and compare newly isolated strains, make the comparison less laborious and help avoid misinterpretation of newly isolated species.

Keywords:  Chaos Game Representation; GC content; Microbial Culture Collection; QR codes; Standard type strains

Year:  2016        PMID: 27222847      PMCID: PMC4865665          DOI: 10.1016/j.dib.2016.04.048

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Value of the data

The digital information generated here provides comprehensive, analysed data on the quality control strains held under the general deposit facility of the Microbial Culture Collection, National Centre for Cell Sciences, Pune, Maharashtra, India. The data are valuable for identifying, evaluating and comparing other species against the quality control strains. The data are also valuable for analysing and interpreting the correlation between GC content and the thermophilic nature of bacteria. These analyses of the quality control strains were carried out by us for the first time and are made available to users.

Data

A total of 13 short DNA sequences of quality control strains (MCC 2052, MCC 2077, MCC 2078, MCC 2080, MCC 2309, MCC 2322, MCC 2408, MCC 2409, MCC 2412, MCC 2413, MCC 2415, MCC 2483 and MCC 2515) were obtained from the NCBI BioSample database. QR codes for these strains were generated with the DNABarID tool. Chaos Game Representation (CGR) and Chaos Game Representation of Frequencies (FCGR) images were produced with the BioPHP tool [1]. GC content, in percent, was calculated with the ENDMEMO GC calculating tool (Table 1 and Figs. 1-5) [2]. See also the NCBI repository http://www.ncbi.nlm.nih.gov/nuccore and the MCC list of bacteria under general deposit http://www.nccs.res.in/mcc/Bacteria.html.
Table 1

Details of digitised short DNA sequences of quality control strains under general deposit at MCC, NCCS, Pune, Maharashtra, India.

MCC accession no. | Name | NCBI BioSample ID (Accession no.) | Maximum GC content (%) | Average GC content (%)
2052 | A. hydrophila (ATCC 7966 T) | X74677 | 67.5 | 55.3
2077 | A. calcoaceticus | X81661 | 67.5 | 52.8
2078 | Citrobacter freundii 16S rRNA gene (strain DSM 30039) | AJ233408 | 65 | 52.2
2080 | Pseudomonas aeruginosa strain ATCC 27853 | AY268175 | 65 | 54.3
2309 | A. caviae (ATCC 15468 T) | X74674 | 67.5 | 55.1
2322 | Aeromonas caviae | X60409 | 65 | 54.9
2408 | Staphylococcus aureus subsp. aureus | AB594753 | 62.5 | 51.2
2409 | Citrobacter freundii | AJ233408 | 65 | 55.2
2412 | Escherichia coli (strain ATCC 25922) | DQ360844 | 67.5 | 54.7
2413 | Escherichia coli | AM980865 | 67.5 | 55
2415 | Streptococcus pneumoniae | AY281082 | 65 | 52.7
2483 | Stenotrophomonas maltophilia | AB008509 | 70.4 | 55.3
2515 | M. phlei | M29566 | 70 | 57.5

Fig. 1

Flow diagram for digitisation of quality control strains and their application.

Fig. 2

QR codes of quality control strains.

Fig. 3

Chaos Game Representation of quality control strains.

Fig. 4

Chaos Game Representation of Frequencies of quality control strains.

Fig. 5

GC percentage in short DNA sequences of quality control strains under general deposit at MCC, NCCS, Pune, Maharashtra, India.

Experimental design, materials and methods

Short DNA sequences in FASTA format were retrieved from the NCBI BioSample database. The data, in the form of QR codes, CGR and FCGR images, and GC plots, were generated with the DNABarID, BioPHP and ENDMEMO GC calculating and plotting tools, respectively. DNABarID was used to create unique QR codes. BioPHP was used for graphical representation of the nucleotides in the quality control strains. ENDMEMO was used to calculate GC content and to plot the GC distribution graph.
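The GC calculation described above can be illustrated with a minimal Python sketch. It is not the ENDMEMO implementation: the function names `gc_percent` and `gc_window_summary`, the window size of 40 bases and the step of 1 are assumptions chosen for illustration, and the example sequence is a toy string, not one of the MCC strains.

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def gc_window_summary(seq: str, window: int = 40, step: int = 1):
    """Slide a window along seq; return (max, average, min) window GC%.

    The window and step values are illustrative assumptions, not
    settings taken from the ENDMEMO tool.
    """
    if len(seq) < window:
        # Sequence shorter than one window: treat it as a single window.
        windows = [gc_percent(seq)]
    else:
        windows = [
            gc_percent(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)
        ]
    return max(windows), sum(windows) / len(windows), min(windows)


# Toy example: five windows of width 4 with GC% 100, 75, 50, 25 and 0.
print(gc_window_summary("GGCCAATT", window=4))  # (100.0, 50.0, 0.0)
```

The three returned values correspond to the maximum, average and minimum lines of a GC plot. Note that the average over windows is not necessarily equal to the GC percentage of the whole sequence.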

Interpretation of generated digital data

The QR codes, CGR, FCGR and GC plots generated from the 16S rRNA sequences of quality control strains would be useful for identification, evaluation and comparison of newly isolated species. The data encoded in the QR codes are identical to the data available on the NCBI web portal. The number of dots that appear in a field of a CGR image is directly proportional to the number of base pairs in that field. Like CGR, FCGR images also represent the concentration of base pairs quantitatively. The FCGR comprises twelve squares, which resembles Ramanujan's Magic Square but differs from it. A darker square indicates a higher frequency of bases in the 16S rRNA sequence: the darkness of a square is directly proportional to the number of base pairs in that square. The frequency tape given in each figure helps in understanding the distribution pattern of base pairs in the DNA. The GC plot shows three lines: the upper red line indicates the maximum GC percentage, the middle blue line the average GC percentage, and the lower red line the minimum GC percentage in the short DNA sequence.
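To make the relationship between dots, squares and base counts concrete, the following Python sketch computes CGR points and an FCGR count grid using the commonly described chaos game construction. It is not the BioPHP code: the corner assignment (A, C, G, T), the k-mer length `k=2` (a 4 x 4 grid) and the function names `cgr_points` and `fcgr_matrix` are assumptions for illustration, and the resulting layout may differ from the figures in this paper.

```python
# Assumed corner layout of the unit square: A bottom-left, C top-left,
# G top-right, T bottom-right.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_points(seq: str):
    """Return one (x, y) dot per base, starting from the square's centre."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Each dot lies halfway between the previous dot and the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def fcgr_matrix(seq: str, k: int = 2):
    """Count k-mers in a 2**k by 2**k grid; grid[row][col], row 0 is y = 0."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNERS for base in kmer):
            continue  # ignore k-mers containing ambiguous symbols
        col = row = 0
        # The last base selects the coarsest quadrant, so read it first.
        for base in reversed(kmer):
            cx, cy = CORNERS[base]
            col = 2 * col + cx
            row = 2 * row + cy
        grid[row][col] += 1
    return grid


# Toy sequence, not one of the MCC strains.
print(cgr_points("AG"))           # [(0.25, 0.25), (0.625, 0.625)]
print(fcgr_matrix("ACGT", k=1))   # [[1, 1], [1, 1]]
```

Each cell of the grid holds the number of times the corresponding k-mer occurs, so shading cells by count gives an image in which darker squares mean higher frequency.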
Subject area:  Microbiology
More specific subject area:  Bioinformatics
Type of data:  Table, figure and graphs
How data was acquired:  Through NCBI BioSample database
Data format:  Analyzed
Experimental factors:  Bioinformatics tools were used for creation of digital information
Experimental features:  Analysis and digitization of DNA sequences were carried out using bioinformatics approach
Data source location:  Microbial Culture Collection, National Centre for Cell Sciences, Pune, Maharashtra, India (18° 32′ 14.00″ N; 73° 47′ 29.79″ E)
Data accessibility:  Data available within this paper.

References

1.  Digital data for Quick Response (QR) codes of thermophiles to identify and compare the bacterial species isolated from Unkeshwar hot springs (India).

Authors:  Bhagwan N Rekadwad; Chandrahasya N Khobragade
Journal:  Data Brief       Date:  2015-11-24

2.  Determination of GC content of Thermotoga maritima, Thermotoga neapolitana and Thermotoga thermarum strains: A GC dataset for higher level hierarchical classification.

Authors:  Bhagwan N Rekadwad; Chandrahasya N Khobragade
Journal:  Data Brief       Date:  2016-05-27