Abstract
DNA barcodes are very useful for species identification, especially when identification by traditional morphological characters is difficult. However, the short mitochondrial and chloroplast barcodes currently in use often fail to distinguish between closely related species, are prone to lateral transfer, and provide inadequate phylogenetic resolution, particularly at deeper nodes. The deficiencies of short barcode identifiers are similar to those of the short year identifiers that caused the Y2K problem in computer science, which was resolved by increasing the size of the year identifiers. The performance of conventional mitochondrial COI barcodes for phylogenetics was compared with that of complete mitochondrial genomes and nuclear ribosomal RNA repeats obtained by genome skimming for a set of caddisfly taxa (insect order Trichoptera). The analysis focused on the trichopteran family Hydropsychidae, the net-spinning caddisflies, which demonstrates many of the frustrating limitations of current barcodes. To conduct phylogenetic comparisons, complete mitochondrial genomes (15 kb each) and nuclear ribosomal repeats (9 kb each) from six caddisfly species were sequenced and assembled, and are reported for the first time. These sequences were analyzed in comparison with eight previously published trichopteran mitochondrial genomes and two trichopteran rRNA repeats, plus outgroup sequences from the sister clade Lepidoptera (butterflies and moths). COI trees were not well-resolved, had low bootstrap support, and differed in topology from prior phylogenetic analyses of the Trichoptera. Phylogenetic trees based on mitochondrial genomes or rRNA repeats were well-resolved with high bootstrap support and were largely congruent with each other.
Because they are easily sequenced by genome skimming, provide robust phylogenetic resolution at various phylogenetic depths, can better distinguish between closely related species, and (in the case of mitochondrial genomes) are backwards compatible with existing mitochondrial barcodes, it is proposed that mitochondrial genomes and rRNA repeats be used as next-generation DNA barcodes.
Keywords: Cheumatopsyche; DNA barcoding; Genome skimming; Hydropsyche; Next Generation Barcodes; Potamyia; Trichoptera; millennium bug; mitochondrial genome evolution; year 2000 problem
Year: 2018 PMID: 31435510 PMCID: PMC6690253 DOI: 10.3934/genet.2018.1.1
Source DB: PubMed Journal: AIMS Genet ISSN: 2377-1143
Commonalities between challenges presented by the Y2K problem and by DNA barcoding.
| | Y2K problem | DNA barcoding |
| Identifier | two-digit year notation | animals: ∼658 bp |
| Identifier purpose | 1. To distinguish between years | 1. To facilitate species identification without reference to morphology |
| Constraints influencing the original design of the identifier | Extremely limited memory in early computers | 1. ∼500 bp maximum read length of radiolabeled dideoxy-terminated sequencing |
| Reason(s) for identifier maintenance | To maintain backwards compatibility with older software applications | To take advantage of the large database of existing DNA barcodes |
| Crisis | At the turn of the 21st century: 2000 > 1999, but 00 < 99. | 1. Many recent species pairs cannot be separated by 658 bp barcodes because there has not been enough time for mutations to accumulate within the regions |
| Resolution | Enlarge Identifiers: Worldwide effort to update software applications and change to four-digit identifiers in the late 1990s (acceptable until the year 9999) | Enlarge and diversify identifiers: The high copy number of organelle genomes and the nuclear rRNA repeat relative to the rest of the nuclear genome will cause these sequences to be very well represented among random reads of whole genome DNA extractions |
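The Y2K "Crisis" entry in the table (2000 > 1999, but 00 < 99) can be illustrated with a minimal sketch. The function names below are hypothetical and exist only for this illustration; the code shows how truncating a year to two digits reverses the ordering of 1999 and 2000, while the enlarged four-digit identifier preserves it.

```python
def two_digit_year(year: int) -> int:
    """Truncate a year to the two-digit identifier used by legacy software."""
    return year % 100


def is_later(year_a: int, year_b: int, two_digit: bool) -> bool:
    """Return True if year_a sorts after year_b under the chosen identifier size."""
    if two_digit:
        return two_digit_year(year_a) > two_digit_year(year_b)
    return year_a > year_b


# Four-digit identifiers keep the true order: 2000 comes after 1999.
print(is_later(2000, 1999, two_digit=False))  # True

# Two-digit identifiers compare 00 with 99, so 2000 appears to come first.
print(is_later(2000, 1999, two_digit=True))   # False
```

The analogy drawn in the table is that a short barcode, like a two-digit year, can lack the information needed to order or separate items that a longer identifier distinguishes without difficulty.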
Caddisfly species collected at the Living Prairie Museum and analyzed in this study.
| Scientific Name | Collection Date | Specimen Identifier | Total Reads (Million) | Mitochondrial Genome: # Reads | Mitochondrial Genome: Mean Fold Coverage | Mitochondrial Genome: Length (bp) | Nuclear: # Reads | Nuclear: Mean Fold Coverage | Nuclear: Length (bp) |
| | 14-Aug-15 | 2015.08.14.065A | 6.84 | 67275 | 1266 X | 15097 | 9412 | 308 X | 7791 |
| | 17-Jul-15 | 2015.07.17.021A | 5.89 | 67559 | 1275 X | 15100 | 6239 | 122 X | 8323 |
| | 14-Aug-15 | 2015.08.14.106A | 7.75 | 37191 | 184 X | 15098 | 29449 | 548 X | 8683 |
| | 14-Aug-15 | 2015.08.14.066A | 2.08 | 86392 | 458 X | 15185 | 29054 | 640 X | 9228 |
| | 14-Aug-15 | 2015.08.14.067 | 6.30 | 15864 | 326 X | 15237 | 31093 | 301 X | 7797 |
| | 14-Aug-15 | 2015.08.14.070B | 4.59 | 122730 | 600 X | 15160 | 53095 | 1222 X | 9244 |
| | 17-Jul-15 | 2015.07.17.018 | 8.29 | 40865 | 482 X | 15048 | 98766 | 3149 X | 9400 |
| | 14-Aug-15 | 2015.08.14.077 | 8.36 | 6952 | 35 X | 14963 | 82832 | 168 X | 9232 |
¹ Read length for C. analis and C. campyla was 300 bp. For all other species, read length was 75 bp.
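As a rough check on how the read counts, read lengths, and assembly lengths in the table relate to mean fold coverage, the standard approximation coverage ≈ (number of reads × read length) / target length can be sketched as follows. This is not necessarily how the authors computed the reported values, which may reflect trimming and mapping details. The function name is hypothetical, and because the species names are not preserved in the table, the 75 bp read length applied to the example row is an assumption.

```python
def expected_fold_coverage(num_reads: int, read_length_bp: int, target_length_bp: int) -> float:
    """Approximate mean fold coverage as (reads x read length) / target length."""
    if target_length_bp <= 0:
        raise ValueError("target_length_bp must be positive")
    return num_reads * read_length_bp / target_length_bp


# Mitochondrial values from the row for specimen 2015.08.14.106A:
# 37191 reads and a 15098 bp assembly. The 75 bp read length is assumed.
estimate = expected_fold_coverage(num_reads=37191, read_length_bp=75, target_length_bp=15098)
print(f"{estimate:.1f} X")  # 184.7 X, close to the 184 X reported in the table
```

Rows where this estimate departs from the reported coverage are to be expected, since reads may be trimmed or may align only partially to the assembled sequence.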
Figure 1. Phylogenetic trees reconstructed from COI barcodes (left) and complete mitochondrial genomes (right) using maximum likelihood and parsimony. Asterisks indicate where some of the four most parsimonious COI trees differ from the tree topology shown here. Portions of the phylogenetic tree that are congruent between the analyses of the COI and the mitochondrial genome datasets are indicated by bold lines on the tree. Maximum likelihood bootstrap values are shown above each node; parsimony bootstrap values are shown below the node.
Figure 2. Phylogenetic trees reconstructed from nuclear rRNA repeats using maximum likelihood (left) and parsimony (right) methods. Portions of the phylogenetic tree that are congruent between the analyses of the nuclear rRNA repeat and mitochondrial genome datasets are indicated by bold lines on the trees. Maximum likelihood bootstrap values are shown above each node; parsimony bootstrap values are shown below the node.