Ryuji J. Machida, Matthieu Leray, Shian-Lei Ho, Nancy Knowlton.
Abstract
Mitochondrial-encoded genes are increasingly targeted in studies that use high-throughput sequencing to characterize metazoan communities from environmental samples (e.g., plankton, meiofauna, filtered water). Yet, unlike for nuclear ribosomal RNA markers, no high-quality reference dataset for taxonomic assignment has been available to date. Here, we retrieved all metazoan mitochondrial gene sequences from GenBank, then quality-filtered them and formatted the resulting datasets for use with taxonomic assignment tools. The reference datasets, the 'Midori references', are available for download at www.reference-midori.info. Two versions are provided: (I) Midori-UNIQUE, which contains all unique haplotypes associated with each species, and (II) Midori-LONGEST, which contains a single sequence, the longest, for each species. Overall, the mitochondrial Cytochrome oxidase subunit I gene was the most sequence-rich gene. However, sequences of the mitochondrial large ribosomal subunit RNA and Cytochrome b apoenzyme genes were available for a large number of species in some phyla. The Midori reference is compatible with some taxonomic assignment software, so automated taxonomic assignment of high-throughput sequence data can be particularly effective using these datasets.
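The difference between the two versions can be illustrated with a small sketch. It assumes a hypothetical input of `(species, sequence)` pairs that have already been quality-filtered; the actual Midori pipeline (its filters, taxonomy handling, and output formats) is not described here and is not reproduced. Treating case-only differences as the same haplotype and keeping the earliest sequence when two are equally long are both assumptions of this sketch.

```python
from collections import defaultdict


def build_reference_sets(records):
    """Split (species, sequence) records into UNIQUE-style and LONGEST-style sets.

    Returns:
      unique:  species -> list of distinct sequences (haplotypes), first-seen order
      longest: species -> the single longest sequence (earliest one wins ties)
    """
    unique = defaultdict(list)
    seen = defaultdict(set)
    for species, seq in records:
        seq = seq.upper()  # assumption: case-only differences are not new haplotypes
        if seq not in seen[species]:
            seen[species].add(seq)
            unique[species].append(seq)
    # max() returns the first maximal element, so ties keep the earliest sequence
    longest = {sp: max(seqs, key=len) for sp, seqs in unique.items()}
    return dict(unique), longest


# Hypothetical toy records; real input would come from quality-filtered GenBank entries.
RECORDS = [
    ("Species A", "ACGTACGT"),
    ("Species A", "acgtacgt"),    # same haplotype, different case
    ("Species A", "ACGTACGTAA"),
    ("Species B", "TTGACC"),
]

unique, longest = build_reference_sets(RECORDS)
for sp in unique:
    print(sp, len(unique[sp]), "unique;", "longest =", longest[sp])
```

In this toy input, Species A contributes two haplotypes to the UNIQUE-style set but only its longest sequence to the LONGEST-style set, which is why a LONGEST-style count equals the number of species.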
Year: 2017; PMID: 28291235; PMCID: PMC5349245; DOI: 10.1038/sdata.2017.27
Source DB: PubMed; Journal: Sci Data; ISSN: 2052-4463; Impact factor: 6.444
Figure 1. Numbers of sequences included in the reference datasets.
Refer to the text for the abbreviation of gene names.
| Midori-UNIQUE | Midori-LONGEST |
|---|---|
| 66,937 | 30,866 |
| 146,164 | 54,728 |
| 23,819 | 8,947 |
| 13,187 | 8,135 |
| 583,043 | 110,704 |
| 44,046 | 18,320 |
| 15,940 | 7,436 |
| 223,247 | 35,079 |
| 34,090 | 14,038 |
| 72,482 | 18,880 |
| 15,397 | 9,025 |
| 9,987 | 6,729 |
| 36,819 | 10,892 |
| 28,657 | 8,793 |
| 12,223 | 6,795 |
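If the first column of the Figure 1 table counts Midori-UNIQUE sequences and the second counts Midori-LONGEST sequences (one per species, so effectively a species count), then dividing one by the other gives the average number of unique haplotypes per species for each gene. The sketch below does that arithmetic on the table values; the column interpretation is an assumption, and rows are numbered because the gene names are not given in the table.

```python
# (first column, second column) pairs copied from the Figure 1 table, top to bottom.
# Assumption: first = Midori-UNIQUE sequences, second = Midori-LONGEST sequences
# (one per species). Rows are numbered because gene names are not given in the table.
FIGURE1_ROWS = [
    (66_937, 30_866), (146_164, 54_728), (23_819, 8_947), (13_187, 8_135),
    (583_043, 110_704), (44_046, 18_320), (15_940, 7_436), (223_247, 35_079),
    (34_090, 14_038), (72_482, 18_880), (15_397, 9_025), (9_987, 6_729),
    (36_819, 10_892), (28_657, 8_793), (12_223, 6_795),
]


def haplotypes_per_species(unique_count, longest_count):
    """Mean number of unique sequences per species for one gene."""
    return unique_count / longest_count


ratios = [haplotypes_per_species(u, l) for u, l in FIGURE1_ROWS]
for row, r in enumerate(ratios, start=1):
    print(f"row {row:2d}: {r:.2f} unique sequences per species")

top = max(range(len(ratios)), key=ratios.__getitem__)
print(f"highest ratio: row {top + 1} ({ratios[top]:.2f})")
```

With these numbers every row averages more than one unique sequence per species, and row 8 has the highest ratio (roughly 6.4).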
Figure 2. A heat map of the observed number of gene sequences (Z-score-transformed percentage) per phylum.
Midori-LONGEST (one sequence per species) was used for this comparison. Percentages of species with sequence data are indicated in each column. Loricifera was removed because no mitochondrial sequence was found in the nt dataset. Refer to the text for the abbreviation of gene names.
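The caption does not say along which axis the Z-score transformation was applied. The sketch below assumes it was applied within each phylum, across that phylum's gene percentages, using the population standard deviation; standardizing each gene across phyla instead would only change which list of values is passed in. The example row is the Brachiopoda row from the per-phylum percentage table in this section.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:  # constant row: no variation to standardize
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


# Brachiopoda row: 15 gene columns, % of 550 described species with sequences.
brachiopoda = [3.45, 3.64, 1.09, 1.27, 8.00] + [1.09] * 10
z = zscore(brachiopoda)
print([round(v, 2) for v in z])
```

After the transformation, the gene column with the largest percentage (the fifth, 8.00%) receives the largest positive score, which is what makes sequence-rich genes stand out in a heat map regardless of how well sequenced the phylum is overall.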
Percentage of species with mitochondrial gene sequences out of the total described or estimated species in each phylum.
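Midori-LONGEST (longest sequence for each species) was used for this calculation. Numbers of described and estimated species follow Chapman [...]. Each of the 15 gene columns gives percentages relative to the described/estimated species counts.
| Phylum | Species (described/estimated) | | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|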
| Acanthocephala | (1,150/1,500) | 0.61/0.47 | 0.87/0.67 | 0.61/0.47 | 0/0 | 6.35/4.87 | 0.61/0.47 | 0.61/0.47 | 0.61/0.47 | 0.61/0.47 | 0.61/0.47 | 0.61/0.47 | 0.52/0.40 | 0.52/0.40 | 0.61/0.47 | 0.61/0.47 |
| Annelida | (16,763/30,000) | 2.86/1.60 | 7.93/4.43 | 0.29/0.16 | 0.38/0.21 | 10.90/6.09 | 1.18/0.66 | 0.36/0.20 | 0.76/0.42 | 1.77/0.99 | 0.40/0.22 | 0.30/0.17 | 0.29/0.16 | 0.42/0.23 | 0.30/0.17 | 0.33/0.18 |
| Arthropoda | (1,175,708/5,892,000) | 0.90/0.18 | 2.04/0.41 | 0.15/0.03 | 0.14/0.03 | 6.40/1.28 | 1.02/0.20 | 0.16/0.03 | 0.69/0.14 | 0.48/0.09 | 0.19/0.04 | 0.13/0.03 | 0.13/0.03 | 0.18/0.04 | 0.27/0.05 | 0.13/0.03 |
| Brachiopoda | (550) | 3.45 | 3.64 | 1.09 | 1.27 | 8.00 | 1.09 | 1.09 | 1.09 | 1.09 | 1.09 | 1.09 | 1.09 | 1.09 | 1.09 | 1.09 |
| Bryozoa | (5,700/5,000) | 0.89/1.02 | 2.61/2.98 | 0.12/0.14 | 0.07/0.08 | 3.09/3.52 | 0.12/0.14 | 0.61/0.70 | 0.95/1.08 | 0.12/0.14 | 0.12/0.14 | 0.12/0.14 | 0.12/0.14 | 0.12/0.14 | 0.12/0.14 | 0.12/0.14 |
| Chaetognatha | (121) | 4.13 | 6.61 | 0 | 0 | 23.97 | 5.79 | 4.13 | 4.13 | 4.13 | 4.13 | 4.13 | 4.13 | 4.13 | 4.13 | 4.13 |
| Chordata | (64,788/80,500) | 25.00/20.12 | 30.01/24.15 | 9.79/7.88 | 9.19/7.40 | 31.47/25.32 | 7.71/6.21 | 6.77/5.45 | 38.85/31.26 | 10.32/8.30 | 24.08/19.38 | 10.15/8.17 | 6.71/5.40 | 12.14/9.77 | 7.43/5.98 | 6.76/5.44 |
| Cnidaria | (9,795) | 3.54 | 12.15 | 1.57 | 1.43 | 11.47 | 1.47 | 2.55 | 2.65 | 1.97 | 3.11 | 1.84 | 1.85 | 1.35 | 1.55 | 2.14 |
| Ctenophora | (166/200) | 0.60/0.50 | 1.20/1.00 | 0/0 | 0/0 | 3.61/3.00 | 1.81/1.50 | 1.20/1.00 | 1.20/1.00 | 1.20/1.00 | 0.60/0.50 | 1.20/1.00 | 1.20/1.00 | 1.20/1.00 | 1.20/1.00 | 0.60/0.50 |
| Cycliophora | (1) | 0 | 100 | 0 | 0 | 200 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Echinodermata | (7,003/14,000) | 4.16/2.08 | 10.58/5.29 | 0.80/0.40 | 0.79/0.39 | 13.04/6.52 | 1.14/0.57 | 1.07/0.54 | 1.29/0.64 | 0.79/0.39 | 0.80/0.40 | 0.59/0.29 | 0.69/0.34 | 0.59/0.29 | 0.59/0.29 | 0.59/0.29 |
| Entoprocta | (170/170) | 1.18/1.18 | 1.18/1.18 | 1.18/1.18 | 1.18/1.18 | 10.00/10.00 | 1.18/1.18 | 1.18/1.18 | 1.18/1.18 | 1.18/1.18 | 1.18/1.18 | 1.18/1.18 | 1.18/1.18 | 1.18/1.18 | 1.18/1.18 | 1.18/1.18 |
| Gastrotricha | (400) | 0.25 | 0.25 | 0.25 | 0.25 | 10.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 |
| Gnathostomulida | (97) | 2.06 | 2.06 | 2.06 | 2.06 | 9.28 | 2.06 | 2.06 | 2.06 | 2.06 | 2.06 | 2.06 | 2.06 | 2.06 | 2.06 | 2.06 |
| Hemichordata | (108/110) | 3.70/3.64 | 24.07/23.64 | 3.70/3.64 | 3.70/3.64 | 4.63/4.55 | 3.70/3.64 | 3.70/3.64 | 8.33/8.18 | 3.70/3.64 | 3.70/3.64 | 3.70/3.64 | 3.70/3.64 | 3.70/3.64 | 3.70/3.64 | 3.70/3.64 |
| Kinorhyncha | (130) | 0 | 0 | 0 | 0 | 13.85 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Mollusca | (85,000/200,000) | 2.76/1.17 | 8.20/3.49 | 0.33/0.14 | 0.27/0.12 | 9.61/4.09 | 0.55/0.23 | 0.45/0.19 | 0.95/0.40 | 0.78/0.33 | 0.34/0.14 | 0.34/0.15 | 0.33/0.14 | 0.35/0.15 | 0.33/0.14 | 0.32/0.14 |
| Nematoda | (25,000/500,000) | 0.98/0.05 | 0.82/0.04 | 0.50/0.03 | 0.05/0 | 2.66/0.13 | 0.79/0.04 | 0.51/0.03 | 0.59/0.03 | 0.57/0.03 | 0.50/0.03 | 0.52/0.03 | 0.48/0.02 | 0.62/0.03 | 0.57/0.03 | 0.49/0.02 |
| Nematomorpha | (331/2,000) | 0/0 | 0/0 | 0/0 | 0/0 | 2.42/0.40 | 0/0 | 0/0 | 0.60/0.10 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
| Nemertea | (1,200/10,000) | 1.17/0.14 | 11.08/1.33 | 1.17/0.14 | 1.17/0.14 | 14.00/1.68 | 1.17/0.14 | 1.67/0.20 | 1.33/0.16 | 1.33/0.16 | 1.25/0.15 | 1.25/0.15 | 1.25/0.15 | 1.25/0.15 | 1.25/0.15 | 1.33/0.16 |
| Onychophora | (165/220) | 33.94/25.45 | 20.00/15.00 | 2.42/1.82 | 2.42/1.82 | 41.82/31.36 | 3.03/2.27 | 2.42/1.82 | 2.42/1.82 | 3.03/2.27 | 2.42/1.82 | 2.42/1.82 | 2.42/1.82 | 2.42/1.82 | 2.42/1.82 | 2.42/1.82 |
| Placozoa | (1) | 100 | 200 | 100 | 0 | 100 | 100 | 100 | 100 | 100 | 0 | 100 | 100 | 100 | 0 | 100 |
| Platyhelminthes | (20,000/80,000) | 0.76/0.19 | 1.45/0.36 | 0.44/0.11 | 0/0 | 4.07/1.02 | 0.43/0.11 | 0.39/0.10 | 0.64/0.16 | 1.36/0.34 | 0.42/0.10 | 0.62/0.16 | 0.40/0.10 | 0.42/0.11 | 0.40/0.10 | 0.43/0.11 |
| Porifera | (6,000/18,000) | 0.93/0.31 | 1.55/0.52 | 1.03/0.34 | 0.78/0.26 | 9.13/3.04 | 0.93/0.31 | 0.95/0.32 | 1.02/0.34 | 1.05/0.35 | 0.87/0.29 | 0.90/0.30 | 0.90/0.30 | 0.92/0.31 | 0.93/0.31 | 0.95/0.32 |
| Priapulida | (16) | 12.50 | 12.50 | 12.50 | 12.50 | 18.75 | 12.50 | 12.50 | 12.50 | 12.50 | 12.50 | 12.50 | 12.50 | 12.50 | 12.50 | 12.50 |
| Rotifera | (2,180) | 0.18 | 1.10 | 0.50 | 0 | 8.17 | 0.37 | 0.41 | 0.37 | 0.37 | 0.28 | 0.32 | 0.37 | 0.37 | 0.37 | 0.37 |
| Tardigrada | (212) | 1.42 | 1.42 | 0.94 | 0.94 | 28.30 | 0.94 | 0.94 | 0.94 | 0.94 | 1.42 | 0.94 | 0.94 | 0.94 | 0.94 | 0.94 |
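Each cell in the table above is a simple ratio: the number of species with at least one sequence for a gene, divided by the phylum's described (or estimated) species count, times 100. The sketch below reproduces that calculation. The species counts used in the usage lines are back-calculated from the table (for example, 6.35% of 1,150 Acanthocephala is about 73 species) and are not numbers reported in the source. Zero cells are printed as "0.00" here, whereas the table writes "0".

```python
def percent_with_sequences(n_with_seq, n_species):
    """Percentage of a phylum's species that have at least one sequence for a gene."""
    return 100.0 * n_with_seq / n_species


def format_cell(n_with_seq, described, estimated=None):
    """Format a cell as 'described' or 'described/estimated', two decimals each."""
    pct_desc = f"{percent_with_sequences(n_with_seq, described):.2f}"
    if estimated is None:
        return pct_desc
    pct_est = f"{percent_with_sequences(n_with_seq, estimated):.2f}"
    return f"{pct_desc}/{pct_est}"


# Back-calculated counts (illustrative, not from the source):
print(format_cell(73, 1_150, 1_500))  # Acanthocephala, fifth gene column
print(format_cell(29, 121))           # Chaetognatha, fifth gene column
print(format_cell(44, 550))           # Brachiopoda, fifth gene column
```

Because the formula does not cap at 100, values above 100 (e.g. Cycliophora and Placozoa) simply mean the number of species with sequences exceeds the species count used as the denominator.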
Number of clusters containing multiple higher-level taxonomic groups (phylum, class, order) after 99% similarity clustering.
Numbers of problematic sequences removed for this reason are also denoted. Midori-UNIQUE was used for this analysis.
| Phylum | Class | Order | Sequences removed |
|---|---|---|---|
| 0 | 5 | 23 | 17 |
| 7 | 12 | 44 | 67 |
| 0 | 0 | 9 | 9 |
| 0 | 0 | 8 | 7 |
| 33 | 61 | 210 | 245 |
| 0 | 0 | 9 | 16 |
| 0 | 0 | 8 | 3 |
| 13 | 19 | 66 | 101 |
| 0 | 1 | 5 | 4 |
| 0 | 0 | 9 | 8 |
| 0 | 0 | 6 | 6 |
| 0 | 0 | 4 | 2 |
| 0 | 0 | 4 | 3 |
| 0 | 0 | 6 | 6 |
| 0 | 0 | 3 | 3 |
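The counting step behind the table above can be sketched as follows. The source does not name the clustering tool or explain how problematic sequences were selected for removal, so this sketch assumes cluster memberships already exist (as a hypothetical mapping from cluster ID to member lineages) and only counts clusters whose members disagree at each rank. Because ranks are nested, a cluster mixed at phylum level is also mixed at class and order, which is consistent with the counts in the first three columns never decreasing from left to right.

```python
RANKS = ("phylum", "class", "order")


def mixed_cluster_counts(clusters):
    """Count clusters whose members span more than one taxon at each rank.

    clusters: mapping cluster_id -> list of lineage dicts with
              "phylum", "class" and "order" keys.
    Members lacking a name at a rank are ignored for that rank.
    """
    counts = {rank: 0 for rank in RANKS}
    for members in clusters.values():
        for rank in RANKS:
            names = {m[rank] for m in members if m.get(rank)}
            if len(names) > 1:
                counts[rank] += 1
    return counts


# Hypothetical clusters, as if produced by a 99% similarity clustering step.
CLUSTERS = {
    "c1": [
        {"phylum": "Chordata", "class": "Actinopteri", "order": "Perciformes"},
        {"phylum": "Chordata", "class": "Actinopteri", "order": "Gadiformes"},
    ],
    "c2": [
        {"phylum": "Arthropoda", "class": "Insecta", "order": "Diptera"},
        {"phylum": "Mollusca", "class": "Gastropoda", "order": "Neogastropoda"},
    ],
    "c3": [
        {"phylum": "Cnidaria", "class": "Anthozoa", "order": "Scleractinia"},
    ],
}

print(mixed_cluster_counts(CLUSTERS))
```

Here cluster c1 is mixed only at order level, c2 is mixed at every rank, and the singleton c3 is never counted.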