Chang Liu, Dong Liang, Ting Gao, Xiaohui Pang, Jingyuan Song, Hui Yao, Jianping Han, Zhihua Liu, Xiaojun Guan, Kun Jiang, Huan Li, Shilin Chen.
Abstract
BACKGROUND: DNA barcoding, which uses a short DNA sequence to identify species, has a wide range of applications. To date, a universal DNA barcode marker for plants remains elusive. The rbcL and matK regions have been proposed as the "core barcode" for plants, and the ITS2 and psbA-trnH intergenic spacer (PTIGS) regions were later added as supplemental barcodes. Use of the PTIGS region as a supplemental barcode has been limited by the lack of computational tools that can handle the significant insertions and deletions in PTIGS sequences. Here, we compared the most commonly used alignment-based and alignment-free methods and developed a web server that allows biologists to carry out PTIGS-based DNA barcoding analyses.
Year: 2011 PMID: 22373238 PMCID: PMC3278844 DOI: 10.1186/1471-2105-12-S13-S4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Taxonomy coverage of PTIGS sequences.
| Category | Families | Genera | Species | Samples | Related taxids |
|---|---|---|---|---|---|
| Angiosperms | 149 | 644 | 1961 | 9404 | 3398 |
| Dicotyledons | 108 | 451 | 1367 | 7468 | 71240 |
| Monocotyledons | 30 | 146 | 472 | 1603 | 4447 |
| Ferns | 12 | 19 | 42 | 124 | 3290 |
| Gymnosperms | 8 | 16 | 65 | 206 | 3312, 58020, 58021, 58022 |
| Mosses | 34 | 46 | 72 | 287 | 3208 |
Each cell gives the number of taxonomic units at the corresponding taxonomic level for that group in our analyses. Only taxonomic units whose species have at least two sequences are counted. The sequences can be downloaded from the web server.
Discrimination success rate and performance of the k-mer-based method at different k-mer sizes.
| k-mer size | Discrimination success rate | Performance (CPU time, ms) |
|---|---|---|
| 2 | 0.761 | 95.3 |
| 3 | 0.775 | 390.4 |
| 4 | 0.795 | 1564.2 |
| 5 | 0.823 | 5060.4 |
| 6 | 0.813 | 19007.1 |
| 7 | 0.611 | 74949.3 |
The test data are the same as those shown in Table 1.
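As a rough illustration of what a k-mer-based comparison involves, the sketch below builds a normalized frequency profile over all 4^k DNA k-mers and assigns a query to the reference with the nearest profile. The Euclidean distance and the names `kmer_profile`, `profile_distance`, and `nearest_reference` are assumptions made for illustration; the table does not specify the distance or scoring that the authors used. The profile length grows as 4^k, which may be one reason the CPU time in the table rises roughly fourfold with each increment of k.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k):
    """Return normalized frequencies for all 4**k DNA k-mers found in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing symbols other than A, C, G, T are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def profile_distance(a, b, k):
    """Euclidean distance between two k-mer profiles (an illustrative choice)."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))


def nearest_reference(query, references, k=5):
    """Return the label of the reference sequence closest to the query.

    `references` is a hypothetical dict mapping a label to its sequence.
    """
    return min(references,
               key=lambda label: profile_distance(query, references[label], k))
```

Because no alignment is computed, sequences of different lengths can be compared directly, which is what makes this family of methods attractive for regions with many insertions and deletions.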
Discrimination success rates and performance using various method combinations for the dataset containing all sequences shown in Table 1.
| Method | Included: Correct | Included: Wrong | Included: Ratio | Included: Time (s) | Excluded: Correct | Excluded: Wrong | Excluded: Ratio | Excluded: Time (s) |
|---|---|---|---|---|---|---|---|---|
| B | 6291 | 4846 | 0.5649 | 0.4213 | 5323 | 5814 | 0.4780 | 0.5653 |
| B+P | 7744 | 3393 | 0.6953 | 5.0552 | 6496 | 4641 | 0.5833 | 6.4200 |
| B+E | 8650 | 2487 | 0.7767 | 36.7524 | 7034 | 4103 | 0.6316 | 52.3093 |
| D | 8477 | 2660 | 0.7612 | 0.2496 | 6669 | 4468 | 0.5988 | 0.5347 |
| D+P | 8477 | 2660 | 0.7612 | 2.3828 | 6670 | 4467 | 0.5989 | 2.4413 |
| D+E | 8687 | 2450 | 0.7800 | 21.5453 | 7363 | 3774 | 0.6611 | 15.6762 |
| B+P+E | 8651 | 2486 | 0.7768 | 12.9270 | 7096 | 4041 | 0.6372 | 11.6186 |
| D+P+E | 8686 | 2451 | 0.7799 | 9.8835 | 7401 | 3736 | 0.6645 | 9.7989 |
Ratio is the number of correct identifications divided by the total number of tests. Time is the average time in seconds taken to complete a query. The base methods are B: BLAST; P: P distance; E: edit distance; D: DNFP. “Included” means that the query sequences are present in the reference database, while “excluded” means that the query sequences are not in the database when the analysis is performed.
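The Ratio column and two of the base methods can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it assumes that P distance is the proportion of differing sites between two sequences that have already been aligned to equal length, and that edit distance is the standard Levenshtein distance with unit costs. The table does not say which variants or alignment procedure were used, and the function names are hypothetical.

```python
def success_ratio(correct, wrong):
    """Ratio as defined for the table: correct / total number of tests."""
    return correct / (correct + wrong)


def p_distance(a, b):
    """Proportion of differing sites between two pre-aligned sequences.

    Assumption: columns containing a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def edit_distance(a, b):
    """Levenshtein distance (unit-cost insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]
```

For example, `round(success_ratio(6291, 4846), 4)` gives 0.5649, matching the B row for included queries. Edit distance works on unaligned sequences of different lengths, whereas this form of P distance requires an alignment first.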
Species identification using the BLAST method for an exemplar query sequence.
| Query | Target | % similarity | QS* | QE* | TS* | TE* | E value | Score |
|---|---|---|---|---|---|---|---|---|
| Query | GQ248374_129213 | 100.00 | 1 | 61 | 20 | 80 | 4e-32 | 121 |
| Query | EF590730_3213 | 100.00 | 1 | 61 | 1 | 61 | 4e-32 | 121 |
| Query | EF590731_129213 | 100.00 | 1 | 61 | 1 | 61 | 4e-32 | 121 |
*QS: Query Start; *QE: Query End; *TS: Target Start; *TE: Target End.
Species identification using the BLAST+P method for an exemplar query sequence.
| Query | Target | K2P distance |
|---|---|---|
| query | EF590731 | 0.000000 |
| query | EF590730 | 0.027541 |
| query | GQ248374 | 0.000000 |
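The K2P (Kimura two-parameter) distances in this table separate hits that BLAST scores identically: EF590730 has a nonzero distance while the other two targets remain at zero. The sketch below computes the textbook K2P distance, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. It assumes the two sequences are already aligned to equal length and skips any column with a gap or ambiguous base; how the authors aligned the sequences is not stated here, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a, b):
    """Kimura two-parameter distance between two pre-aligned DNA sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        # Assumption: skip gaps and ambiguous bases.
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    d = -0.5 * math.log(w1) - 0.25 * math.log(w2)
    return d + 0.0  # turns -0.0 into 0.0 for identical sequences
```

Identical sequences give a distance of exactly zero, so candidates tied at zero would still need another criterion to be told apart.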