rbcL gene in global molecular data repository.

Conrad Omonhinmin, Chinedu Onuselogu.

Abstract

The volume of ribulose-bisphosphate carboxylase large subunit (rbcL) gene sequence data in the molecular data repository has increased significantly over the years, with contributions from different parts of the world. The abundance of the gene has broadened its applications in several ways. Bulk records were obtained from the National Center for Biotechnology Information (NCBI) GenBank database using the Entrez efetch utility as implemented in the Biopython package, version 1.77. Records matching the query "rbcL AND plants[filter] AND biomol_genomic[PROP] AND is_nuccore[filter]" were retrieved. The retrieved records were cleaned and then further analysed using the code file in the supplementary materials. Country information was obtained by searching the reference information of each record for matches to countries present in the pycountry package. Where no match was found, null was returned. This data article contains information about the plant families and species whose rbcL gene sequences have been deposited at NCBI and about the regions of the world that have contributed to the growth of the rbcL repository. The data can be used to analyse intra- and inter-family relatedness of plants and compare it with existing relationships, and to support molecular characterization of plants, evolutionary relationship studies, and inference of the biogeographic origin of plants.
© 2022 Published by Elsevier Inc.

Keywords:  Biogeography; Evolutionary; Molecular repository; Phylogeny; rbcL gene

Year:  2022        PMID: 35402671      PMCID: PMC8987485          DOI: 10.1016/j.dib.2022.108090

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Values of the Data

These data present information on the plant species, phyla, and families for which rbcL gene sequences have been deposited in NCBI GenBank. Molecular systematists can use the data to re-evaluate the relatedness of plants both within and between families and to compare it with existing relationships. The data are useful in the following fields: molecular characterization of plants, evolutionary relationship studies, inference of biogeography and plant origin, codon usage bias profiling, protein structure analysis, and ecological preference studies. The data can be used to determine the pattern of growth of rbcL gene sequences from different regions in the molecular repository. The data also show the least explored plant species and the need for their further exploration.

Data Description

The data in this article give an overview of the total number of plant species and families with rbcL gene sequences in GenBank and of the regions that have contributed to the growth of rbcL sequences in the repository. Sequence data of the rbcL gene are used to reconstruct phylogenies among the seed plants [1]. The rbcL gene is preferred over other plant genes for phylogenetic studies because of its slower rate of evolutionary change and because it has the lowest divergence among the plastid genes in flowering plants [2,3]. Ref. [4] described the suitability of the gene for resolving intergeneric and interspecific relationships and the absence of alignment difficulties. Applications of the gene in molecular investigations of plant species include tracing the molecular origin of plants [5] and the biogeographic origin of plants [6]. The datasets used in the study were collected as secondary data, and the Biopython code written for data collection can be accessed as Supplementary data. The rbcL gene data cover the period from the first report until 2020. Fig. 1 shows the most studied plant families with rbcL gene sequences in GenBank. Fig. 2 shows the plant phyla with rbcL gene sequences and the extent to which the sequences have been utilized for rbcL-related studies. The continents with rbcL sequence submissions and their percentage contributions to GenBank are represented in Fig. 3, while Fig. 4 shows the countries with higher rbcL sequence submissions and Fig. 5 maps the global concentration of contributions to GenBank. The plant species and other species with rbcL gene sequences can be accessed in the supplementary materials.
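
The tallies behind Fig. 1 (species per family) and Fig. 3 (percentage contribution per region) could be derived along the following lines. This is only a sketch under stated assumptions: the function names and the input shapes (family/species pairs, one region label per record) are hypothetical, and excluding records with no region from the percentages is an assumption, not something the article states.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def species_per_family(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count distinct species per family from (family, species) pairs."""
    unique_pairs = set(pairs)  # a species with many sequences is counted once
    return Counter(family for family, _species in unique_pairs)


def percentage_shares(labels: Iterable[Optional[str]]) -> Dict[str, float]:
    """Percentage of records per label; None labels are skipped (assumption)."""
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}
```

For instance, `species_per_family(...).most_common(20)` would give the kind of "most studied families" ranking that a tree map can display.
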
Fig. 1

Most studied plant families with rbcL gene sequence in GenBank.

*The numbers indicate the number of species in each family with rbcL gene deposited on NCBI GenBank.

*NB: The study found a total of 808 plant families with rbcL gene sequences submitted to NCBI GenBank, too many to include in the tree map, so Fig. 1 shows only the plant families with the most rbcL gene submissions.

Fig. 2

Percentage of plant phyla with rbcL gene data deposited on GenBank.

Fig. 3

Percentage of rbcL sequences contribution from different regions.

Fig. 4

Countries with higher submissions of rbcL sequences on the GenBank repository.

Fig. 5

Map showing global concentration of rbcL sequence contribution to GenBank repository.

* Regions in dark blue have a higher contribution of rbcL gene sequences to NCBI GenBank.

Experimental Design, Materials and Methods

Bulk records were obtained from the NCBI GenBank database using the Entrez efetch utility as implemented in the Biopython package, version 1.77. Records matching the query “rbcL AND plants[filter] AND biomol_genomic[PROP] AND is_nuccore[filter]” were retrieved. The retrieved records were cleaned and then further analysed using the code files in the supplementary material. Country information was obtained by searching the reference information of each record for matches to countries present in the pycountry package. Where no match was found, null was returned.
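
The retrieval and country-matching steps described above could look roughly like the following minimal sketch. It is not the authors' supplementary code. The helper names (`batch_starts`, `fetch_records`, `reference_text`, `pycountry_names`, `find_country`), the batch size, the use of `esearch` with the history server ahead of `efetch`, and the whole-word, longest-name-first matching rule are all assumptions made for illustration. Only the query string, the use of Biopython's Entrez efetch utility, and the pycountry lookup come from the text.

```python
import re
from typing import Iterable, Iterator, List, Optional

# Query string as given in the article.
QUERY = "rbcL AND plants[filter] AND biomol_genomic[PROP] AND is_nuccore[filter]"


def batch_starts(total: int, batch_size: int) -> List[int]:
    """Return the retstart offsets needed to page through `total` records."""
    return list(range(0, total, batch_size))


def fetch_records(email: str, batch_size: int = 500) -> Iterator:
    """Yield GenBank records matching QUERY (needs Biopython and network access)."""
    # Imported here so the pure helpers in this sketch work without Biopython.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks callers to identify themselves
    # Assumption: search first with usehistory="y", then page through efetch.
    search_handle = Entrez.esearch(db="nucleotide", term=QUERY, usehistory="y")
    search = Entrez.read(search_handle)
    search_handle.close()
    for start in batch_starts(int(search["Count"]), batch_size):
        handle = Entrez.efetch(
            db="nucleotide", rettype="gb", retmode="text",
            retstart=start, retmax=batch_size,
            webenv=search["WebEnv"], query_key=search["QueryKey"],
        )
        yield from SeqIO.parse(handle, "genbank")
        handle.close()


def reference_text(record) -> str:
    """Join title and journal fields of a record's references into one string."""
    refs = record.annotations.get("references", [])
    return " ".join(f"{ref.title} {ref.journal}" for ref in refs)


def pycountry_names() -> List[str]:
    """Country names known to pycountry (third-party package)."""
    import pycountry

    return [country.name for country in pycountry.countries]


def find_country(text: str, country_names: Iterable[str]) -> Optional[str]:
    """Return the first country name found in `text`, or None if none matches."""
    # Longest names first, so "Papua New Guinea" is preferred over "Guinea".
    for name in sorted(country_names, key=len, reverse=True):
        # Whole-word match, so "Niger" does not match inside "Nigeria".
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return name
    return None  # corresponds to the "null" returned when no match is found
```

With Biopython and pycountry installed, `find_country(reference_text(record), pycountry_names())` could be called for each record yielded by `fetch_records("you@example.org")`; the e-mail address is a placeholder. Plain name matching remains ambiguous for names shared by countries and other places (for example "Georgia"), so results from a sketch like this would still need manual checking.
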

CRediT Author Statement

Conrad Omonhinmin: Conceptualization, Methodology, Validation and Supervision; Chinedu Onuselogu: Data curation, Investigation, Software, Reviewing and Editing, Writing-Original draft preparation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.

Subject: Biological sciences
Specific subject area: Molecular phylogenetics, Phylogeny and Evolution
Type of data: Text, Table, Chart, Figure
How data were acquired: Biopython package version 1.77 was used to retrieve the rbcL gene sequence data from NCBI GenBank. The code written for retrieving the data from NCBI GenBank can be accessed in the supplementary materials.
Data format: Raw, Analysed and Filtered.
Description of data collection: Bulk data were obtained from the NCBI GenBank database using the Entrez efetch utility as implemented in the Biopython package, version 1.77. Records that did not match the keywords rbcL, plant and DNA were filtered out, leaving only records matching those keywords.
Data source location: The data were obtained from the NCBI GenBank database.
Data accessibility: With the article.
Repository name: Mendeley Data
Data identification number: 10.17632/wdmtpnwsrn.1
Direct link to the dataset: http://www.rbcLGeneinGlobalMolecularDataRepository.com
  3 in total

1.  Use of DNA barcodes to identify flowering plants.

Authors:  W John Kress; Kenneth J Wurdack; Elizabeth A Zimmer; Lee A Weigt; Daniel H Janzen
Journal:  Proc Natl Acad Sci U S A       Date:  2005-05-31       Impact factor: 11.205

2.  Phylogenetic relationships and DNA barcoding of nine endangered medicinal plant species endemic to Saint Katherine protectorate.

Authors:  Ahmed M Hashim; Aishah Alatawi; Faris M Altaf; Sameer H Qari; Mohamed E Elhady; Gamal H Osman; Heba H Abouseadaa
Journal:  Saudi J Biol Sci       Date:  2021-01-01       Impact factor: 4.219

3.  Mutational dynamics of aroid chloroplast genomes.

Authors:  Ibrar Ahmed; Patrick J Biggs; Peter J Matthews; Lesley J Collins; Michael D Hendy; Peter J Lockhart
Journal:  Genome Biol Evol       Date:  2012       Impact factor: 3.416
