
Genomics dataset of unidentified disclosed isolates.

Bhagwan N Rekadwad.

Abstract

Analysis of DNA sequences is necessary for higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to explore complexities in the unidentified DNA sequences disclosed in patents. A total of 17 unidentified DNA sequences were analyzed. Quick response (QR) codes were generated, and the AT/GC content of the DNA sequences was analyzed. The QR codes are helpful for quick identification of isolates, and the AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage codes and enzyme codes from the restriction digestion study is reported, which is helpful for performing studies using short DNA sequences. The dataset disclosed here is new revelatory data for exploration of unique DNA sequences for evaluation, identification, comparison and analysis.


Keywords:  BioLABs; Blunt ends; Genomics; NEB cutter; Restriction digestion; Short DNA sequences; Sticky ends

Year:  2016        PMID: 27408929      PMCID: PMC4930343          DOI: 10.1016/j.dib.2016.06.010

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the data

Data provides information on the AT and GC percentages of unidentified isolates.

This data would be valuable for qualitative and quantitative analysis of newly isolated and unidentified strains.

This data provides the exact positions of restriction sites that create blunt and sticky ends and gives an idea about cleavage affected by methylation.

Data

This paper contains data on QR codes, GC percentage and DNA sequence analysis of 17 unidentified strains. Genome sequences of unidentified bacterial strains disclosed in patents US 6596510 and WO 9906567 were retrieved in FASTA format from the NCBI nuccore database. The downloaded sequences were used to create quick response (QR) codes and were digitized using the ENDMEMO GC calculating and GC plotting tool. The AT and GC percentages, the number of cleavage codes (blunt end, 5′ and 3′ sticky ends) and the number of enzyme codes (cleavage affected by methylation) were determined using the BioLabs NEB cutter tool (New England BioLabs Inc., https://www.neb.com/).
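The AT and GC percentages mentioned above can be illustrated with a minimal Python sketch. It is not the ENDMEMO or NEB cutter implementation; the function names `parse_fasta` and `at_gc_percent` are hypothetical, the input is a toy record rather than one of the patent sequences, and ambiguous bases such as N are assumed to be excluded from the denominator.

```python
def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}


def at_gc_percent(seq):
    """Return (AT%, GC%) counted over unambiguous bases A, C, G, T only."""
    at = sum(seq.count(base) for base in "AT")
    gc = sum(seq.count(base) for base in "GC")
    total = at + gc
    if total == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * at / total, 100.0 * gc / total


# Toy record for illustration; not one of the 17 patent sequences.
example = ">toy\nATGCGC\nATAT\n"
for name, seq in parse_fasta(example).items():
    at, gc = at_gc_percent(seq)
    print(f"{name}: AT {at:.1f}%  GC {gc:.1f}%")
```

Because the two percentages are computed over the same denominator, they always sum to 100.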

Experimental design, materials and methods

A total of 17 genome sequences of disclosed unidentified bacteria (AR360580, AR360581, AR360582, AR360583, AR360584, AR360585, AR360586, AR360587, AR360588, AR360589, AR360590, AX000218, AX000220, AX000221, AX000222, AX000224 and AX000225) were saved in FASTA format via the NCBI BioSample DNA database. The DNABarID tool was used to create QR codes (Fig. 1). The ENDMEMO GC calculating and GC plotting tool was used to determine the percentage of nucleotides in the genome. The pattern of GC distribution across each complete DNA sequence is shown graphically in Fig. 2. The upper and lower red lines indicate the maximum and minimum percentage of GC content in the complete DNA sequence, while the middle blue line indicates the average GC percentage [1], [2], [3], [4], [5], [6]. The NEB cutter tool was used for analysis of the DNA sequences of the unidentified isolates. The number of possible cleavages producing blunt ends and 5′ and 3′ sticky ends was determined, as was the number of enzyme codes. This gives exact information about cleavage affected by CpG methylation and by other types of methylation possibly caused by biomolecules. Additionally, the BioLabs database determined the AT and GC percentage of the genome [7], [8] (Fig. 3; Table 1).
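The maximum, minimum and average GC lines described for the GC plot can be approximated with a sliding-window calculation. The sketch below is an assumption about how such a plot might be derived, not a description of the ENDMEMO tool: the window size of 30 bases, the step of 1, and the names `gc_windows` and `gc_summary` are hypothetical, and the average is taken here as the GC percentage of the whole sequence.

```python
def gc_windows(seq, window=30, step=1):
    """GC% for each window of `window` bases, advancing by `step` bases."""
    seq = seq.upper()
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(seq) < window:
        raise ValueError("sequence shorter than window")
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        values.append(100.0 * (chunk.count("G") + chunk.count("C")) / window)
    return values


def gc_summary(seq, window=30, step=1):
    """Max and min windowed GC%, plus GC% of the whole sequence."""
    values = gc_windows(seq, window, step)
    upper = seq.upper()
    overall = 100.0 * (upper.count("G") + upper.count("C")) / len(upper)
    return {"max": max(values), "min": min(values), "average": overall}


# Toy sequence for illustration; not one of the 17 patent sequences.
print(gc_summary("GGGGCCCCATATATAT", window=4, step=4))
```

A smaller window gives a noisier curve with more extreme maximum and minimum values, so the window size should be reported alongside any such figures.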
Fig. 1

QR codes of unidentified sequences (AR360580-AR360590, AX000218, AX000220, AX000221, AX000222, AX000224 and AX000225).

Fig. 2

GC plot of unidentified sequences (AR360580-AR360590, AX000218, AX000220, AX000221, AX000222, AX000224 and AX000225).

Fig. 3

NEB restriction enzyme digestion of unidentified sequences of patents (Accession No.: AR360580-AR360590, AX000218, AX000220, AX000221, AX000222, AX000224 and AX000225).
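How single-cutter enzymes are sorted into blunt-end, 5′ extension and 3′ extension classes can be sketched as follows. This is an illustrative assumption, not the NEB cutter algorithm: the enzyme table is a small hand-picked subset of well-known palindromic sites, the names `end_type`, `find_sites` and `single_cutters` are hypothetical, the sequence is treated as linear, and methylation sensitivity is not modeled.

```python
# name: (recognition site, top-strand cut offset within the site)
# Small illustrative subset; a real analysis would use a full enzyme database.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "SmaI": ("CCCGGG", 3),     # CCC^GGG
    "EcoRV": ("GATATC", 3),    # GAT^ATC
    "KpnI": ("GGTACC", 5),     # GGTAC^C
    "PstI": ("CTGCAG", 5),     # CTGCA^G
}


def end_type(site, cut):
    """Classify the end left by a palindromic site cut at offset `cut`."""
    half = len(site) / 2
    if cut == half:
        return "blunt"
    return "5' extension" if cut < half else "3' extension"


def find_sites(seq, site):
    """0-based start positions of `site` in `seq`, overlaps allowed."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def single_cutters(seq):
    """Map enzyme name to (top-strand cut position, end type) for enzymes cutting once."""
    seq = seq.upper()
    result = {}
    for name, (site, cut) in ENZYMES.items():
        hits = find_sites(seq, site)
        if len(hits) == 1:
            result[name] = (hits[0] + cut, end_type(site, cut))
    return result


# Toy sequence: EcoRI cuts twice (excluded), SmaI and KpnI cut once.
toy = "TTGAATTCAACCCGGGTTGGTACCAAGAATTC"
for name, (pos, kind) in single_cutters(toy).items():
    print(name, pos, kind)
```

Because every site in this table is palindromic, scanning the top strand alone finds all occurrences; non-palindromic sites would also require a scan of the reverse complement.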

Table 1

Unidentified sequences: Genomic analysis and restriction digestion using NEB single cutter restriction enzymes.

Columns: S. N. | Accession number | Name of sequence | Maximum GC% | Average GC% | Average AT% | Number of cleavage code (Blunt end cut, 5′ extension, 3′ extension) | Number of enzyme code (*: cleavage affected by CpG methylation, #: cleavage affected by other methylation)
The values of the numeric columns (Maximum GC% onward) are run together in the source and are given unsplit at the end of each row.

1 | AR360580 | Sequence 1 from patent US 6596510 | 42376310109134
2 | AR360581 | Sequence 3 from patent US 6596510 | 38.52267173
3 | AR360582 | Sequence 4 from patent US 6596510 | 6551492122207
4 | AR360583 | Sequence 5 from patent US 6596510 | 453664111114172
5 | AR360584 | Sequence 7 from patent US 6596510 | 73.3747533546
6 | AR360585 | Sequence 8 from patent US 6596510 | 30277362
7 | AR360586 | Sequence 9 from patent US 6596510 | 55.651494201125
8 | AR360587 | Sequence 10 from patent US 6596510 | 58.851495184106
9 | AR360588 | Sequence 11 from patent US 6596510 | 54.54159111510172
10 | AR360589 | Sequence 12 from patent US 6596510 | 44.5386291210174
11 | AR360590 | Sequence 13 from patent US 6596510 | 54.5415991911181
12 | AX000218 | Sequence 1 from patent WO 9906567 | 42376310119134
13 | AX000220 | Sequence 3 from patent WO 9906567 | 38.533671837
14 | AX000221 | Sequence 4 from patent WO 9906567 | 6551492121217
15 | AX000222 | Sequence 5 from patent WO 9906567 | 453664112312182
16 | AX000224 | Sequence 7 from patent WO 9906567 | 73.347533546
17 | AX000225 | Sequence 8 from patent WO 9906567 | 30277362

Specifications Table

Subject area: Life Sciences
More specific subject area: Microbiology, Genomics, Bioinformatics, Bacterial Systematics
Type of data: Table, Figures
How data was acquired: Through NCBI BioSample database
Data format: Raw and Analyzed
Experimental factors: Dataset obtained through bioinformatics tool
Experimental features: Only disclosed genome sequences were used
Data source location: School of Life Sciences, S. R. T. M. University, Nanded, India
Data accessibility: Data available within article and via the NCBI repository http://www.ncbi.nlm.nih.gov/nuccore.

  6 in total

1.  Digital data for Quick Response (QR) codes of thermophiles to identify and compare the bacterial species isolated from Unkeshwar hot springs (India).

Authors:  Bhagwan N Rekadwad; Chandrahasya N Khobragade
Journal:  Data Brief       Date:  2015-11-24

2.  Bioinformatics data supporting revelatory diversity of cultivable thermophiles isolated and identified from two terrestrial hot springs, Unkeshwar, India.

Authors:  Bhagwan N Rekadwad; Chandrahasya N Khobragade
Journal:  Data Brief       Date:  2016-04-23

3.  Data on true tRNA diversity among uncultured and bacterial strains.

Authors:  Bhagwan N Rekadwad; Chandrahasya N Khobragade
Journal:  Data Brief       Date:  2016-04-26

4.  Digital data for quick response (QR) codes of alkalophilic Bacillus pumilus to identify and to compare bacilli isolated from Lonar Crator Lake, India.

Authors:  Bhagwan N Rekadwad; Chandrahasya N Khobragade
Journal:  Data Brief       Date:  2016-04-09

5.  Digital data of quality control strains under general deposit at Microbial Culture Collection (MCC), NCCS, Pune, India: A bioinformatics approach.

Authors:  Bhagwan N Rekadwad; Chandrahasya N Khobragade
Journal:  Data Brief       Date:  2016-04-26

6.  Determination of GC content of Thermotoga maritima, Thermotoga neapolitana and Thermotoga thermarum strains: A GC dataset for higher level hierarchical classification.

Authors:  Bhagwan N Rekadwad; Chandrahasya N Khobragade
Journal:  Data Brief       Date:  2016-05-27
