Chang Liu, Linchun Shi, Xiaolan Xu, Huan Li, Hang Xing, Dong Liang, Kun Jiang, Xiaohui Pang, Jingyuan Song, Shilin Chen.
Abstract
DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers (ITS2, rbcL, matK, psbA-trnH, and CO1) was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and a relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interest, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications.
Year: 2012 PMID: 22574113 PMCID: PMC3344831 DOI: 10.1371/journal.pone.0035146
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Examples of the seven different types of 2D barcodes used in the current study.
An ITS2 sequence from P. ginseng (GenBank accession: HQ112416.1) was used as the input.
Comparisons of the characteristics of the seven different types of 2D barcodes.
| Name | Aztec Code | CodaBlock-F | Data Matrix | PDF417 | PDF417 Truncated | QR code | QR code-2005 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Code type | Matrix | Stacked | Matrix | Stacked | Stacked | Matrix | Matrix |
| Symbol size | 15×15 to 27×27 modules | 2 to 44 rows | 8×8 to 144×144 modules | 3 to 90 rows | 3 to 90 rows | 21×21 to 177×177 modules | 21×21 to 177×177 modules |
| Capacity (numeric) | 3832 | 5450 | 3116 | 2710 | 2710 | 7089 | 7089 |
| Capacity (alphanumeric) | 3067 | − | 2355 | 1850 | 1850 | 4296 | 4296 |
| Capacity (8-bit bytes) | 1914 | 2725 | 1556 | 1108 | 1108 | 2953 | 2953 |
| Error correction | 25%–50% | − | 15%–25% by fixed size | Level 0 to 8 | Level 0 to 8 | 4 steps of 7% | 4 steps of 7% |
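The capacity rows differ because each symbology packs characters differently depending on the encoding mode. As a rough illustration for the QR code, the sketch below estimates how many data bits a DNA sequence needs in byte mode versus alphanumeric mode. It relies on the standard QR packing rules (8 bits per character in byte mode; 11 bits per character pair plus 6 bits for a trailing single character in alphanumeric mode). The function name `payload_bits` is hypothetical, the estimate ignores the mode indicator, character-count field, and error-correction overhead, and it is not necessarily how the QRforDNA server encodes sequences.

```python
# Characters permitted in QR alphanumeric mode (digits, uppercase letters,
# space, and the symbols $ % * + - . / :).
QR_ALNUM = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")


def payload_bits(seq: str) -> dict:
    """Estimate the data bits needed for `seq` in each applicable QR mode.

    Hypothetical helper: excludes mode indicator, character-count field,
    and error-correction codewords.
    """
    n = len(seq)
    bits = {"byte": 8 * n}  # byte mode: 8 bits per character
    if set(seq) <= QR_ALNUM:
        # Alphanumeric mode: 11 bits per pair, 6 bits for an odd last character.
        bits["alphanumeric"] = 11 * (n // 2) + 6 * (n % 2)
    return bits


if __name__ == "__main__":
    # Uppercase nucleotide letters are all in the alphanumeric character set.
    print(payload_bits("ACGTACGTAC"))
```

Uppercase nucleotide strings qualify for alphanumeric mode, which needs about 5.5 bits per base instead of 8. Lowercase input would fall back to byte mode, so normalizing case before encoding is one way to fit longer sequences into a single symbol.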
Figure 2. Correlation of image file sizes of seven 2D barcode types and sequence length for five DNA barcode markers.
Figure 3. Sizes of six types of 2D barcodes shown as percentage of that of PDF417.
Figure 4. Screenshots of the QRforDNA web server.
The module numbers are shaded in blue squares. The module names are shown in red. The front and the results pages are framed in blue and red, respectively. Various components on the front page, final result page, and intermediate result pages are shaded in blue circles.
(1) Module “GetBarcode”: (1a) front page of the “Retrieve QR code for a species” module; (1b) result page of the module.
(2) Module “Encode”: (2a) front page of the “Convert a DNA sequence into a QR code” module; (2b) result page showing the generated 2D barcode.
(3) Module “Decode”: (3a) front page of the “Decode a QR code into a sequence” module; (3b) result page showing the original DNA sequence decoded from an input QR code.
(4) Module “IdentifybyBlast”: (4a) front page of the “Identify by BLAST” module; (4b) result page of the module; (4c) the actual BLAST search result; (4d) the best hit from the BLAST result, which is the predicted species identity for the given sample.
(5) Module “IdentifybyDistance”: (5a) front page of the “Identify by distance” module; (5b) result page of the module; (5c) the FASTA file showing the hits that are among the top 100 best hits and have an E-value < 1e-5 in the BLAST search (details described in the text); (5d) the tree file in Newick format; (5e) the tree file in SVG format; (5f) the BLAST result; (5g) the closest species found in the phylogenetic tree, which is the predicted identity of the query.
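The hit selection described for panel (5c), keeping hits that are among the top 100 and have an E-value below 1e-5, can be sketched as follows. This is a minimal illustration, not the server's code. It assumes the BLAST result is available in the standard 12-column tabular format (subject ID in column 2, E-value in column 11, bit score in column 12) and that "best" means highest bit score; the source states neither. The names `Hit`, `parse_tabular`, and `select_hits`, as well as the sample rows, are hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    subject: str
    evalue: float
    bitscore: float


def parse_tabular(text: str) -> List[Hit]:
    """Parse 12-column tab-separated BLAST output (assumed input format)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append(Hit(fields[1], float(fields[10]), float(fields[11])))
    return hits


def select_hits(hits: List[Hit], max_hits: int = 100,
                max_evalue: float = 1e-5) -> List[Hit]:
    """Keep hits that are in the top `max_hits` by bit score (assumed ranking)
    and whose E-value is below `max_evalue`."""
    top = sorted(hits, key=lambda h: h.bitscore, reverse=True)[:max_hits]
    return [h for h in top if h.evalue < max_evalue]


if __name__ == "__main__":
    # Hypothetical example rows, not real search results.
    sample = (
        "q1\tsubj_A\t99.0\t200\t1\t0\t1\t200\t1\t200\t1e-50\t350\n"
        "q1\tsubj_B\t80.0\t50\t10\t0\t1\t50\t1\t50\t0.01\t40\n"
    )
    for hit in select_hits(parse_tabular(sample)):
        print(hit.subject, hit.evalue)
```

The selected subject sequences would then be written to a FASTA file and passed to the tree-building step that produces the Newick and SVG outputs shown in (5d) and (5e).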