Indra Neil Sarkar, Michael Trizna.
Abstract
With the volume of molecular sequence data that is systematically being generated globally, there is a need for centralized resources for data exploration and analytics. DNA Barcode initiatives are on track to generate a compendium of molecular sequence-based signatures for identifying animals and plants. To date, the data exploration and analytic tools for these data have been available only in a boutique form, often representing a frustrating hurdle for researchers who may not have the resources to install or implement algorithms described by the analytic community. The Barcode of Life Data Portal (BDP) is a first step towards integrating the latest biodiversity informatics innovations with molecular sequence data from DNA barcoding. Through the establishment of community-driven standards, based on discussion with the Data Analysis Working Group (DAWG) of the Consortium for the Barcode of Life (CBOL), the BDP provides an infrastructure for incorporating existing and next-generation DNA barcode analytic applications in an open forum.
Year: 2011 PMID: 21818249 PMCID: PMC3144886 DOI: 10.1371/journal.pone.0014689
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Overview of Common Data Analytic Process.
Based on discussion with DAWG membership, the set of common steps (shown in black) shared across the range of data analytic applications was used to inform decisions on common file formats (shown in blue).
List of current tools available in the BDP.
| Tool Name | Category | Publication | Description |
| --- | --- | --- | --- |
| BLOG (Barcoding with LOGic formulas) | Identification | Bertolazzi P, Felici G, Weitschek E. | BLOG (Barcoding with LOGic formulas) is a character-based identification tool based on Logic Mining techniques. The identification process comprises two steps. The first step is feature selection, where the problem of selecting a small number of relevant features is formulated as an integer programming problem. The second step is the identification of the logic formulas that separate each class from all the others. This task is accomplished using the |
| DNA Barcode Linker | Identification | Hajibabaei M, Singer G. | This is an interface to the DNA Barcode Linker website hosted by the Bioinformatics Laboratory at Concordia University, Montreal, Quebec. It was developed by Gregory Singer and Hamid Nikbakht, with technical assistance from Lee Zamparo. Searches of the DNA barcode library are based on an algorithm developed by Gregory Singer, Mehrdad Hajibabaei and Donal Hickey. This search algorithm is informally called GoogleGene. |
| BLAST | Identification | Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden T. | BLAST stands for Basic Local Alignment Search Tool. It was originally designed at the NIH by Eugene Myers, Stephen Altschul, Warren Gish, David J. Lipman and Webb Miller in 1990. The Data Portal implements BLAST+ release 2.2.23 from March 2010. The algorithm emphasizes speed over sensitivity, so BLAST is not optimized for barcode identification, but it can still provide useful approximate identifications. |
| Barcode Data Release Report | Dataset Analysis | Guidelines to Authors of BARCODE Data Release Papers for Submission to PLoS ONE ( | This tool generates a report of all of the required statistical measures expected in a barcode data release report submitted to PLoS One. |
| Sequence Composition | Dataset Analysis | None | This tool gives a general statistical report of the nucleotide composition of the sequences in a dataset. Results are available for each sequence, as well as the dataset as a whole. |
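
The Sequence Composition entry in the table above describes a report of nucleotide composition for each sequence and for the dataset as a whole. The following is a minimal Python sketch of that kind of calculation, not the BDP's actual implementation. The function names `composition` and `dataset_composition`, the dictionary input format, and the decision to ignore ambiguity codes and gaps are all assumptions made for illustration.

```python
from collections import Counter

BASES = "ACGT"


def composition(seq):
    """Return the fraction of A, C, G and T in one sequence.

    Assumption: characters other than A/C/G/T (e.g. N or '-') are ignored.
    """
    counts = Counter(base for base in seq.upper() if base in BASES)
    total = sum(counts.values())
    return {base: (counts[base] / total if total else 0.0) for base in BASES}


def dataset_composition(seqs):
    """Return per-base fractions pooled over all sequences in a dataset.

    `seqs` is a hypothetical mapping of sequence identifier -> sequence string.
    """
    return composition("".join(seqs.values()))


if __name__ == "__main__":
    dataset = {"specimen_1": "AACG", "specimen_2": "TTTT"}
    for name, seq in dataset.items():
        print(name, composition(seq))
    print("dataset", dataset_composition(dataset))
```

Pooling the sequences before counting weights each base equally, so longer sequences contribute more to the dataset-level figures; averaging the per-sequence fractions instead would weight each specimen equally.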