Amrita Srivathsan, Emily Hartop, Jayanthi Puniamoorthy, Wan Ting Lee, Sujatha Narayanan Kutty, Olavi Kurina, Rudolf Meier.
Abstract
BACKGROUND: More than 80% of all animal species remain unknown to science. Most of these species live in the tropics and belong to animal taxa that combine small body size with high specimen abundance and large species richness. For such clades, using morphology for species discovery is slow because large numbers of specimens must be sorted based on detailed microscopic investigations. Fortunately, species discovery could be greatly accelerated if DNA sequences could be used for sorting specimens to species. Morphological verification of such "molecular operational taxonomic units" (mOTUs) could then be based on dissection of a small subset of specimens. However, this approach requires cost-effective and low-tech DNA barcoding techniques because well-equipped, well-funded molecular laboratories are not readily available in many biodiverse countries.
Keywords: DNA barcoding; Large-scale species discovery; MinION; NGS barcoding; Nanopore sequencing
Year: 2019 PMID: 31783752 PMCID: PMC6884855 DOI: 10.1186/s12915-019-0706-9
Source DB: PubMed Journal: BMC Biol ISSN: 1741-7007 Impact factor: 7.431
Fig. 1 Flowchart for generating MinION barcodes from experimental set-up to final barcodes. The novel steps introduced in this study are highlighted in green, and the scripts available in miniBarcoder for analyses are further indicated
Number of reads and barcodes generated via MinION sequencing
| | Set 1: two flowcells | Set 2: one flowcell | Combined (sets 1 and 2)* |
|---|---|---|---|
| # Specimens | 4275 | 4519 | 8699 |
| Resequencing (re-pooled) | 2172 | 2211 | |
| # reads/# reads > 600 bp | 7,035,075/3,703,712 | 7,179,121/2,652,657 | NA |
| Initial sequencing (all) | 3,069,048/1,942,212 | 4,853,363/2,250,591 | |
| Resequencing (re-pooled) | 3,966,027/1,761,500 | 2,325,758/402,066 | |
| # demultiplexed reads | 898,979 (24.3%) | 647,152 (24.4%) | |
| Initial sequencing (all) | 562,434 (29%) | 561,383 (24.9%) | |
| Resequencing (re-pooled) | 336,545 (19%) | 85,769 (21.3%) | |
| Combined results of original and resequencing runs | |||
| # specimens with ≥ 5× coverage | 4227 (98.9%) | 4287 (94.9%) | 8428 (96.9%) |
| # MAFFT barcodes < 1% Ns | 3797 (88.8%) | 3476 (76.9%) | 7221 (83%) |
| # MAFFT+AA barcodes | 3771 (88.2%) | 3459 (76.5%) | 7178 (82.5%) |
| # RACON barcodes | 3797 (88.8%) | 3476 (76.9%) | 7221 (83%) |
| # RACON+AA barcodes | 3788 (88.6%) | 3461 (76.6%) | 7194 (82.7%) |
| # Consolidated barcodes | 3762 (88%) | 3446 (76.3%) | 7155 (82.3%) |
| # Consolidated barcodes (non-phorids removed) | 3727 (87.2%) | 3426 (75.8%) | 7059 (81.1%) |
| # mOTUs (2/3/4%) | 705/663/613 | ||
*One plate was accidentally sequenced in both runs, duplicates removed for combined set
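The mOTU counts at 2/3/4% in the table depend on grouping barcodes by pairwise distance. The following is a minimal sketch of that idea, assuming aligned, equal-length barcodes, uncorrected p-distances, and single-linkage clustering; the clustering tool actually used is not named in this record, and the function names (`p_distance`, `cluster_motus`, `mutate`) and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected distance over positions where both sequences have A/C/G/T."""
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":  # skip Ns and gaps
            compared += 1
            mismatches += x != y
    return mismatches / compared if compared else 1.0


def cluster_motus(barcodes: dict[str, str], threshold: float) -> list[set[str]]:
    """Single-linkage clustering: link any pair with distance <= threshold."""
    parent = {name: name for name in barcodes}

    def find(x: str) -> str:
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(barcodes, 2):
        if p_distance(barcodes[a], barcodes[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in barcodes:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


def mutate(seq: str, positions) -> str:
    """Toy helper: substitute the base at each given position."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    chars = list(seq)
    for i in positions:
        chars[i] = swap[chars[i]]
    return "".join(chars)


# Toy 100 bp barcodes (not real data)
base = "ACGT" * 25
barcodes = {
    "s1": base,
    "s2": mutate(base, [0]),               # 1% from s1
    "s3": mutate(base, [10, 20, 30]),      # 3% from s1, 4% from s2
    "s4": mutate(base, range(0, 100, 2)),  # 50% from s1
}

for threshold in (0.02, 0.03, 0.04):
    print(threshold, len(cluster_motus(barcodes, threshold)))
```

With single linkage, raising the threshold can only merge clusters, which is in line with the decreasing mOTU counts reported from 2% to 4%.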
Fig. 2 Effect of re-pooling on coverage of barcodes for both sets of specimens. Barcodes with coverage < 50× were re-pooled and hence the coverage of these barcodes increases
Accuracy of MinION as assessed by Illumina barcodes. The MinION barcodes were trimmed to the 313 bp that were sequenced using Illumina. The overall optimal strategy is “Consolidated (namino = 2)”. Optimal congruence values are highlighted in bold
| Dataset | # compared with Illumina / # 3% mOTUs | Accuracy | % Ns | # barcodes with errors / # > 3% errors | mOTU richness deviation between MinION and Illumina barcodes at 2% | at 3% | at 4% |
|---|---|---|---|---|---|---|---|
| MAFFT | 6291/641 | 99.6136 (0.37) | 0.18 (0.2) | 4473/30 | −3 (−0.44%) | | −11 (−1.85%) |
| RACON | 6291/645 | 99.5097 (0.48) | < 0.001 (0.01) | 4494/36 | 12 (1.72%) | | |
| MAFFT+AA (namino = 1) | 6269/635 | 99.9689 (0.19) | 1.19 (0.84) | 273/27 | −6 (−0.89%) | −6 (−0.94%) | −6 (−1.02%) |
| MAFFT+AA (namino = 2) | 6269/635 | 99.9802 (0.15) | 1.28 (0.92) | 216/28 | −8 (−1.19%) | −6 (−0.94%) | −14 (−2.37%) |
| MAFFT+AA (namino = 3) | 6269/633 | 99.9685 (0.18) | 1.25 (0.91) | 310/27 | −8 (−1.19%) | −8 (−1.26%) | −17 (−2.9%) |
| RACON+AA (namino = 1) | 6273/639 | 99.9636 (0.19) | 0.48 (0.47) | 392/26 | −4 (−0.59%) | −4 (−0.63%) | −13 (−2.19%) |
| RACON+AA (namino = 2) | 6273/635 | 99.9736 (0.16) | 0.55 (0.55) | 288/26 | −5 (−0.74%) | −8 (−1.26%) | −14 (−2.36%) |
| RACON+AA (namino = 3) | 6273/636 | 99.9684 (0.17) | 0.57 (0.6) | 345/28 | | −7 (−1.1%) | −13 (−2.19%) |
| Consolidated (namino = 1) | 6229/639 | 99.9849 (0.12) | 0.41 (0.44) | 191 | −2 (−0.29%) | | |
| Consolidated (namino = 2) | 6251/638 | | | | −2 (−0.29%) | −3 (−0.47%) | −5 (−0.83%) |
| Consolidated (namino = 3) | 6245/639 | 99.9795 (0.12) | 0.44 (0.49) | 285/26 | −4 (−0.59%) | | −4 (−0.67%) |
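The percentages in the richness-deviation columns can be checked with a short calculation. The record does not state the denominator; the sketch below assumes the deviation is divided by the MinION mOTU count at the same threshold, an assumption that reproduces the 3% values of the rows used here. The function name `richness_deviation_pct` is hypothetical.

```python
def richness_deviation_pct(deviation: int, minion_motus: int) -> float:
    """Deviation in mOTU count as a percentage of the MinION mOTU count.

    Assumption: the denominator is the MinION mOTU count at the same
    clustering threshold; the source does not say so explicitly.
    """
    return round(100 * deviation / minion_motus, 2)


# (dataset, # 3% mOTUs, deviation at 3%) taken from the table above
rows = [
    ("MAFFT+AA (namino = 1)", 635, -6),
    ("RACON+AA (namino = 1)", 639, -4),
    ("RACON+AA (namino = 2)", 635, -8),
]

for name, motus, deviation in rows:
    print(f"{name}: {deviation} ({richness_deviation_pct(deviation, motus)}%)")
```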
Fig. 3 Ambiguities in MAFFT+AA (purple), RACON+AA (yellow), and consolidated barcodes (green) with varying namino parameters (1, 2, and 3). One outlier value for Racon+3AA barcode was excluded from the plot. The plot shows that the consolidated barcodes have few ambiguities remaining
Fig. 4 The Malaise trap that revealed the estimated > 1000 mOTUs as shown by the species richness estimation curve. Green: Chao1 Mean, Pink: S (Mean), Orange: Singleton Mean, Purple: Doubleton mean
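The Chao1 curve in Fig. 4 builds on the singleton and doubleton counts that are also plotted there. As a minimal sketch, the classic Chao1 estimator can be computed from per-mOTU specimen counts as shown below. The software and exact variant used for the figure are not stated in this record, so the function `chao1` and the toy abundances are illustrative assumptions.

```python
def chao1(abundances: list[int]) -> float:
    """Classic Chao1 richness estimate from per-mOTU specimen counts.

    S_chao1 = S_obs + F1^2 / (2 * F2), where F1 and F2 are the numbers of
    singletons and doubletons. When F2 == 0 the bias-corrected form
    S_obs + F1 * (F1 - 1) / 2 is used to avoid division by zero.
    """
    counts = [n for n in abundances if n > 0]
    s_obs = len(counts)                       # observed mOTUs
    f1 = sum(1 for n in counts if n == 1)     # singletons
    f2 = sum(1 for n in counts if n == 2)     # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


# Toy data (not from the study): 8 observed mOTUs, 4 singletons, 2 doubletons
toy_abundances = [10, 5, 2, 2, 1, 1, 1, 1]
print(chao1(toy_abundances))
```

Many singletons relative to doubletons push the estimate well above the observed count, which is the pattern expected when a sample is far from saturated.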
Fig. 5 Lateral habitus (a) and diagnostic features of Megaselia sepsioides spec. nov.: (b) posterior view of the foreleg, (c) anterior view of the midleg, (d, e) anterior and postero-dorsal views of the hindleg, and (f) dorsal view of thorax and abdomen
Fig. 6 Haplotype variation of Megaselia sepsioides spec. nov.: (a) UGC0005996, (b) UGC0012244, and (c) UGC0012899. UGC numbers refer to specimen IDs
Fig. 7 Haplotype network for Megaselia sepsioides spec. nov. UGC numbers refer to specimen IDs