EMBL2checklists: A Python package to facilitate the user-friendly submission of plant and fungal DNA barcoding sequences to ENA.

Michael Gruenstaeudl, Yannick Hartmaring.

Abstract

BACKGROUND: The submission of DNA sequences to public sequence databases is an essential but insufficiently automated step in the process of generating and disseminating novel DNA sequence data. Despite the centrality of database submissions to biological research, few software tools facilitate the preparation of sequence data for database submissions, especially for sequences generated via plant and fungal DNA barcoding. Current submission procedures can be complex and prohibitively time-consuming for any but a small number of input sequences. A user-friendly software tool is needed that streamlines file preparation for database submissions of DNA sequences that are commonly generated in plant and fungal DNA barcoding.
METHODS: A Python package was developed that converts DNA sequences from the common EMBL and GenBank flat file formats to submission-ready, tab-delimited spreadsheets (so-called 'checklists') for a subsequent upload to the annotated sequence section of the European Nucleotide Archive (ENA). The software tool, titled 'EMBL2checklists', automatically converts DNA sequences, their annotation features, and associated metadata into the idiosyncratic format of marker-specific ENA checklists and, thus, generates files that can be uploaded via the interactive Webin submission system of ENA.
RESULTS: EMBL2checklists provides a simple, platform-independent tool that automates the conversion of common DNA barcoding sequences into easily editable spreadsheets that require no further processing before their upload to ENA via the interactive Webin submission system. The software is equipped with an intuitive graphical interface as well as an efficient command-line interface. The utility of the software is illustrated by its application in four recent investigations, including plant phylogenetic and fungal metagenomic studies.
DISCUSSION: EMBL2checklists bridges the gap between common software suites for DNA sequence assembly and annotation and the interactive data submission process of ENA. It represents an easy-to-use solution for plant and fungal biologists without bioinformatics expertise to generate submission-ready checklists from common DNA sequence data. It allows the post-processing of checklists as well as work-sharing during the submission process and solves a critical bottleneck in the effort to increase participation in public data sharing.
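The conversion step described under METHODS, from parsed sequence records to a tab-delimited checklist, can be illustrated with a minimal sketch. This is not the EMBL2checklists implementation: the function name `records_to_checklist`, the dictionary keys, and the column names in `COLUMNS` are hypothetical placeholders, and the example records are made-up values. In practice the records would come from a flat-file parser (for instance Biopython's `SeqIO.parse` with the `"embl"` or `"genbank"` format), and the columns would follow the marker-specific ENA checklist template.

```python
import csv
import io

# Hypothetical column layout, for illustration only; real ENA checklists
# define their own marker-specific columns.
COLUMNS = ["ENTRYNUMBER", "ORGANISM_NAME", "ISOLATE", "SEQUENCE"]


def records_to_checklist(records):
    """Render already-parsed sequence records as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    # Number the entries consecutively, starting at 1.
    for number, record in enumerate(records, start=1):
        writer.writerow([
            number,
            record["organism"],
            record.get("isolate", ""),   # optional field, left empty if absent
            record["sequence"].upper(),  # normalise sequence case
        ])
    return buffer.getvalue()


# Made-up example records standing in for parsed flat-file entries.
example_records = [
    {"organism": "Sinapis alba", "isolate": "A1", "sequence": "acgtacgt"},
    {"organism": "Sinapis alba", "sequence": "ttgacc"},
]
checklist_text = records_to_checklist(example_records)
```

Writing one row per record into a plain tab-delimited table keeps the output editable in any spreadsheet program, which is the property the abstract emphasises for post-processing and work-sharing.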

Year:  2019        PMID: 30629718      PMCID: PMC6328100          DOI: 10.1371/journal.pone.0210347

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


