Genomics dataset on unclassified published organism (patent US 7547531).

Mohammad Mahfuz Ali Khan Shawan, Md Ashraful Hasan, Md Mozammel Hossain, Md Mahmudul Hasan, Afroza Parvin, Salina Akter, Kazi Rasel Uddin, Subrata Banik, Mahbubul Morshed, Md Nazibur Rahman, S M Badier Rahman.

Abstract

Nucleotide (DNA) sequence analysis provides important clues about the characteristics and taxonomic position of an organism, and is therefore crucial for establishing its hierarchical classification. This dataset (patent US 7547531) was chosen to simplify the complex raw data buried in undisclosed DNA sequences, which may help open doors for new collaborations. A total of 48 unidentified DNA sequences from patent US 7547531 were selected and their complete sequences were retrieved from the NCBI BioSample database. A quick response (QR) code was constructed for each DNA sequence with the DNA BarID tool; QR codes are useful for identifying isolates and comparing them with other organisms. The AT/GC content of the DNA sequences, which indicates their stability at different temperatures, was determined using the ENDMEMO GC Content Calculator. The highest GC content was observed in GP445188 (62.5%), followed by GP445198 (61.8%) and GP445189 (59.44%), while the lowest was in GP445178 (24.39%). In addition, the New England BioLabs (NEB) database was used to identify the cleavage codes, indicating 5′, 3′ and blunt ends, and the enzyme codes, indicating the methylation sites of the DNA sequences. These data will be helpful for constructing the organisms' hierarchical classification, determining their phylogenetic and taxonomic position and revealing their molecular characteristics.

Keywords:  Cleavage code; GC content; Genomics dataset; Hierarchical classification; NCBI BioSample database; QR code; Taxonomic position; patent US 7547531

Year:  2016        PMID: 27766287      PMCID: PMC5066183          DOI: 10.1016/j.dib.2016.09.046

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the data

Data on the AT and GC percentages of the DNA sequences give an idea of their stability at different temperatures.

The QR codes would be useful for the identification and the qualitative and quantitative analysis of the isolates, and for their comparison with other organisms.

These data give the exact positions of the restriction sites that create blunt and sticky ends, and also indicate the sites where cleavage is affected by methylation.

Data

This paper contains data on quick response (QR) codes, guanine and cytosine (GC) content, analyzed DNA sequences and microorganisms sharing regions of similarity with 48 nucleotide sequences of the unclassified microorganism disclosed in patent US 7547531. All sequences of the unidentified microorganism disclosed in patent US 7547531 were downloaded in FASTA format from the NCBI nuccore database. The retrieved nucleotide sequences were used to generate QR codes, calculate GC content and GC plots, determine the number of cleavage codes (blunt-end cuts and 5′ and 3′ sticky-end extensions) and identify the number of enzyme codes (cleavage affected by CpG and other methylation).
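
As a rough illustration of the GC calculation applied to the downloaded FASTA records, the sketch below parses FASTA text and reports the GC percentage of each record, together with a windowed GC profile similar in spirit to a GC plot. It is not the ENDMEMO implementation: the function names, the window size and the use of non-overlapping windows are assumptions made for this example, and the two records are invented rather than taken from the patent.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {record_id: sequence}."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def gc_percent(seq: str) -> float:
    """Percentage of G and C among the A/C/G/T bases of seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def gc_windows(seq: str, window: int) -> List[float]:
    """GC percentage of consecutive non-overlapping windows (assumed scheme)."""
    return [gc_percent(seq[i:i + window]) for i in range(0, len(seq), window)]


# Invented records for illustration; these are NOT sequences from the patent.
example = """>demo1 hypothetical record
ATGCGCGC
ATAT
>demo2 hypothetical record
AATTAATT
"""

sequences = parse_fasta(example)
for name, seq in sequences.items():
    print(name, len(seq), round(gc_percent(seq), 2), gc_windows(seq, 4))
```

When only A, C, G and T are counted, the AT percentage is simply 100 minus the GC percentage, so a single pass over each sequence yields both values.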

Experimental design, materials and methods

First, a total of 48 nucleotide sequences (GP445164, GP445165, GP445166, GP445167, GP445168, GP445169, GP445170, GP445171, GP445172, GP445173, GP445174, GP445175, GP445176, GP445177, GP445178, GP445179, GP445180, GP445181, GP445182, GP445183, GP445184, GP445185, GP445186, GP445187, GP445188, GP445189, GP445190, GP445191, GP445192, GP445193, GP445194, GP445195, GP445196, GP445197, GP445198, GP445199, GP445200, GP445201, GP445202, GP445203, GP445204, GP445205, GP445206, GP445207, GP445208, GP445209, GP445210 and GP445211) of the unclassified published microorganism from patent US 7547531 were retrieved from NCBI (National Center for Biotechnology Information), a widely trusted biological database, via its Nucleotide database (http://www.ncbi.nlm.nih.gov/nuccore/?term=patent+US+7547531) and saved in FASTA format [1], [2]. The QR code for each nucleotide sequence was generated using the DNA BarID tool (http://www.neeri.res.in/DNA_BarID/DNA_BarID.htm) [3] (Supplementary Table 1). The GC content and GC plot of each nucleotide sequence were analyzed with the ENDMEMO DNA/RNA GC Content Calculator (http://www.endmemo.com/bio/gc.php). GC content was determined as the percentage of guanine (G) and cytosine (C) nucleotides in a given sequence (Supplementary Table 2), while the GC plot is a graphical representation of the distribution of G and C nucleotides along the sequence (Supplementary Fig. 1). Within the GC plot, the middle blue line shows the average GC percentage, while the upper and lower red lines indicate the maximum and minimum GC percentages, respectively [4], [5], [6], [7], [8]. Large non-overlapping open reading frames within each nucleotide sequence were analyzed using the NEBcutter V2.0 tool (http://nc2.neb.com/NEBcutter2/) [9].
For each sequence, NEBcutter determines the possible number of cleavages in the form of blunt-end cuts and 5′ and 3′ sticky-end extensions, while the number of enzyme codes identified indicates cleavage affected by CpG and other types of methylation (Supplementary Table 2 and Supplementary Fig. 2). Furthermore, the New England BioLabs (NEB) database was used to determine the A (adenine) and T (thymine) as well as the GC percentages of each nucleotide sequence [10]. Finally, a nucleotide BLAST search (https://blast.ncbi.nlm.nih.gov/Blast.cgi) was run for each disclosed unidentified nucleotide sequence to identify regions of similarity with other biological sequences [11], [12].
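
The distinction between blunt-end cuts and 5′ or 3′ sticky-end extensions can be sketched in a few lines. The example below is not NEBcutter's algorithm or enzyme set: it uses three well-known enzymes (EcoRI, PstI and SmaI) chosen only because each produces a different end type, and the table layout, the function names and the demo sequence are assumptions made for illustration. The `contains_cpg` helper merely flags recognition sites that include a CG dinucleotide as candidates for CpG-methylation effects; actual sensitivity is enzyme-specific.

```python
from typing import Dict, List, Tuple

# (recognition site, top-strand cut offset, bottom-strand cut offset).
# Both offsets are counted from the 5' end of the site as written.
# All three sites are palindromic, so scanning one strand is enough.
ENZYMES: Dict[str, Tuple[str, int, int]] = {
    "EcoRI": ("GAATTC", 1, 5),  # G^AATTC -> 5' overhang
    "PstI": ("CTGCAG", 5, 1),   # CTGCA^G -> 3' overhang
    "SmaI": ("CCCGGG", 3, 3),   # CCC^GGG -> blunt
}


def end_type(top_cut: int, bottom_cut: int) -> str:
    """Classify the ends produced by a pair of strand cut offsets."""
    if top_cut == bottom_cut:
        return "blunt"
    return "5' overhang" if top_cut < bottom_cut else "3' overhang"


def find_sites(seq: str, site: str) -> List[int]:
    """0-based start positions of every occurrence of site, overlaps included."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def contains_cpg(site: str) -> bool:
    """True if the site holds a CG dinucleotide (a candidate only)."""
    return "CG" in site.upper()


def digest_summary(seq: str) -> Dict[str, int]:
    """Count cleavage sites in seq by the type of end they produce."""
    seq = seq.upper()
    counts = {"blunt": 0, "5' overhang": 0, "3' overhang": 0}
    for site, top_cut, bottom_cut in ENZYMES.values():
        counts[end_type(top_cut, bottom_cut)] += len(find_sites(seq, site))
    return counts


# Invented sequence for illustration; NOT one of the patent sequences.
demo = "AAGAATTCTTCCCGGGAACTGCAGTTGAATTC"
print(digest_summary(demo))
print({name: contains_cpg(site) for name, (site, _, _) in ENZYMES.items()})
```

A full analysis would draw the enzyme list and the methylation sensitivities from a curated restriction enzyme database rather than from a hand-written table like the one above.
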

Specifications Table

Study area: Biological Sciences
More definite study area: Genomics, Microbiology, Bioinformatics
Data types: Table, figure, graph and QR code
How data was obtained: Through NCBI BioSample database
Data format: Raw and analyzed
Experimental factors: Dataset obtained through bioinformatics tool
Experimental features: Only disclosed genome sequences were used
Data source location: Department of Biochemistry and Molecular Biology, Jahangirnagar University, Savar, Dhaka-1342, Bangladesh.
Data accessibility: Data available within this article and via the NCBI repository http://www.ncbi.nlm.nih.gov/nuccore/?term=patent+US+7547531

References (7 in total)

1.  NEBcutter: A program to cleave DNA with restriction enzymes.

Authors:  Tamas Vincze; Janos Posfai; Richard J Roberts
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

2.  Digital data for Quick Response (QR) codes of thermophiles to identify and compare the bacterial species isolated from Unkeshwar hot springs (India).

Authors:  Bhagwan N Rekadwad; Chandrahasya N Khobragade
Journal:  Data Brief       Date:  2015-11-24

3.  Bioinformatics data supporting revelatory diversity of cultivable thermophiles isolated and identified from two terrestrial hot springs, Unkeshwar, India.

Authors:  Bhagwan N Rekadwad; Chandrahasya N Khobragade
Journal:  Data Brief       Date:  2016-04-23

4.  Data on true tRNA diversity among uncultured and bacterial strains.

Authors:  Bhagwan N Rekadwad; Chandrahasya N Khobragade
Journal:  Data Brief       Date:  2016-04-26

5.  Digital data for quick response (QR) codes of alkalophilic Bacillus pumilus to identify and to compare bacilli isolated from Lonar Crator Lake, India.

Authors:  Bhagwan N Rekadwad; Chandrahasya N Khobragade
Journal:  Data Brief       Date:  2016-04-09

6.  Digital data of quality control strains under general deposit at Microbial Culture Collection (MCC), NCCS, Pune, India: A bioinformatics approach.

Authors:  Bhagwan N Rekadwad; Chandrahasya N Khobragade
Journal:  Data Brief       Date:  2016-04-26

7.  Determination of GC content of Thermotoga maritima, Thermotoga neapolitana and Thermotoga thermarum strains: A GC dataset for higher level hierarchical classification.

Authors:  Bhagwan N Rekadwad; Chandrahasya N Khobragade
Journal:  Data Brief       Date:  2016-05-27