Abstract
Continued influx of metagenome-derived proteins with misannotated taxonomy into conventional databases, including RefSeq, threatens to eliminate the value of taxonomy identifiers. To prevent this, urgent efforts should be undertaken by submitters of metagenomic data sets as well as by database managers.
Keywords: MAG; RefSeq; binning; classification; metagenomics; taxonomy; transposons
Year: 2020 PMID: 33148820 PMCID: PMC7643828 DOI: 10.1128/mSphere.00854-20
Source DB: PubMed Journal: mSphere ISSN: 2379-5042 Impact factor: 4.389
Top 50 misassigned bacterial metagenome entries in the nr protein database (accessed 15 October 2020), retrieved by querying with the fungal non-LTR retrotransposon MGR583 (O13348) from Pyricularia (Magnaporthe) grisea, which harbors the RT_nLTR_like (cd01650), R1_I_EN (cd09077), and Rnase_HI_RT_non_LTR (cd09276) conserved domains
| DB | SD | Accession no. | Description | Taxonomy | PMID or date |
|---|---|---|---|---|---|
| gb | BCT |  | Hypothetical protein C2W62_09385 | “ |  |
| ref | BCT |  | Endonuclease/exonuclease/phosphatase | Tolypothrix sp. FACHB-123 |  |
| gb | ENV |  | Hypothetical protein DI628_08940 |  |  |
| gb | ENV |  | Hypothetical protein DI537_52575 |  |  |
| gb | ENV |  | Hypothetical protein E6H10_03430 |  |  |
| gb | ENV |  | Hypothetical protein E6J34_00855 |  |  |
| gb | ENV |  | Hypothetical protein |  |  |
| gb | ENV |  | Hypothetical protein E6H10_10105 |  |  |
| gb | ENV |  | Hypothetical protein |  |  |
| ref | BCT |  | Endonuclease/exonuclease/phosphatase |  |  |
| gb | ENV |  | Hypothetical protein E6J34_18185 |  |  |
| gb | ENV |  | Hypothetical protein E6H10_15575 |  |  |
| gb | ENV |  | Hypothetical protein EOP64_03250 |  |  |
| gb | ENV |  | Hypothetical protein |  |  |
| gb | ENV |  | Hypothetical protein |  |  |
| ref | BCT |  | Hypothetical protein |  |  |
| gb | BCT |  | Hypothetical protein C2W62_38760 | “ |  |
| gb | ENV |  | Reverse transcriptase family protein |  |  |
| emb | BCT |  | Hypothetical protein BBROOKSOX_612 |  |  |
| ref | BCT |  | Hypothetical protein |  |  |
| gb | ENV |  | Hypothetical protein DSY43_06760 |  |  |
| gb | ENV |  | Hypothetical protein EOP33_08145 |  |  |
| ref | BCT |  | Hypothetical protein |  |  |
| ref | BCT |  | Hypothetical protein |  |  |
| gb | ENV |  | Endonuclease/exonuclease/phosphatase |  | 2 June 2020 |
| gb | ENV |  | Hypothetical protein DME65_11230 |  |  |
| gb | ENV |  | Hypothetical protein DMG62_24900 |  |  |
| ref | BCT |  | Hypothetical protein, partial |  |  |
| gb | ENV |  | Hypothetical protein EOP45_02990 |  |  |
| ref | BCT |  | Hypothetical protein |  |  |
| ref | BCT |  | RNA-directed DNA polymerase |  |  |
| ref | BCT |  | RNA-directed DNA polymerase |  |  |
| ref | BCT |  | Endonuclease/exonuclease/phosphatase |  |  |
| gb | ENV |  | Hypothetical protein EHM20_03370 |  |  |
| ref | BCT |  | Reverse transcriptase family protein |  |  |
| gb | ENV |  | Hypothetical protein DI617_08800 |  |  |
| gb | ENV |  | Reverse transcriptase family protein |  | 14 September 2020 |
| gb | ENV |  | Hypothetical protein DMF62_17490 |  |  |
| gb | ENV |  | RNA-directed DNA polymerase |  | 2 June 2020 |
| tpg | ENV |  | TPA: hypothetical protein | Tenacibaculum sp. |  |
| gb | ENV |  | Hypothetical protein DMF62_17715 |  |  |
| tpg | ENV |  | TPA: hypothetical protein |  |  |
| ref | BCT |  | Reverse transcriptase family protein |  |  |
| ref | BCT |  | Endonuclease/exonuclease/phosphatase |  |  |
| gb | ENV |  | Hypothetical protein E6H10_00780 |  |  |
| gb | ENV |  | Hypothetical protein |  | 16 February 2020 |
| gb | ENV |  | Reverse transcriptase family protein |  | 2 June 2020 |
| ref | BCT |  | Reverse transcriptase family protein | Bacterium 2013Ark19i |  |
| ref | BCT |  | Reverse transcriptase-like protein |  |  |
| ref | BCT |  | Hypothetical protein, partial |  |  |
BLASTP search was limited to Bacteria (taxid:2) against the nr database, which comprises all nonredundant GenBank coding DNA sequence (CDS) translations plus PDB, Swiss-Prot, PIR, and PRF entries, excluding environmental samples from WGS projects; hits with E values of <1e−30 are shown. DB, database (GenBank, EMBL, RefSeq, or third party); SD, database subdivision (ENV, environmental samples; BCT, bacterial samples); TPA, third-party annotation. All RefSeq entries are linked to reference 2 (PMID 29112715); unpublished entries not linked to a PMID show their release date instead.
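The post-search step described in the footnote (an E-value cutoff applied to a BLASTP hit list, then a count of hits per source database, as in the DB column) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes hits were saved in BLAST+ tabular format (`-outfmt 6`, default 12 columns), that the 1e−30 threshold named in the footnote is applied as an upper bound on the E value, that subject IDs keep an NCBI-style `db|accession|` prefix such as `ref|` or `gb|`, and that ranking by E value then bit score is acceptable. The helper names (`parse_outfmt6`, `top_hits`, `tally_db`) and the `EXAMPLE_ROWS` data are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical made-up rows in BLAST+ -outfmt 6 layout (12 tab-separated columns):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
EXAMPLE_ROWS = [
    "O13348\tref|EXAMPLE_A|\t35.2\t900\t500\t20\t1\t900\t1\t880\t2e-120\t420",
    "O13348\tgb|EXAMPLE_B|\t30.1\t850\t520\t25\t1\t850\t1\t840\t5e-60\t230",
    "O13348\ttpg|EXAMPLE_C|\t25.0\t300\t200\t10\t1\t300\t1\t300\t1e-10\t60",
]


@dataclass(frozen=True)
class Hit:
    sseqid: str
    evalue: float
    bitscore: float


def parse_outfmt6(lines):
    """Parse default 12-column tabular BLAST lines, skipping blanks and comments."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # 0-based columns: 1 = sseqid, 10 = evalue, 11 = bitscore
        hits.append(Hit(cols[1], float(cols[10]), float(cols[11])))
    return hits


def top_hits(hits, max_evalue=1e-30, n=50):
    """Keep hits at or below the E-value cutoff and return the best n."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    # Assumed ranking: lowest E value first, ties broken by higher bit score
    kept.sort(key=lambda h: (h.evalue, -h.bitscore))
    return kept[:n]


def tally_db(hits):
    """Count hits per database prefix (assumes IDs such as 'ref|...' or 'gb|...')."""
    return Counter(h.sseqid.split("|", 1)[0] for h in hits)


if __name__ == "__main__":
    best = top_hits(parse_outfmt6(EXAMPLE_ROWS))
    for h in best:
        print(h.sseqid, h.evalue)
    print(dict(tally_db(best)))
```

Run on the made-up rows, the third hit (E value 1e−10) falls above the cutoff and is dropped, leaving one `ref` and one `gb` hit. Tallying by ID prefix only mirrors the DB column; the SD (ENV/BCT) subdivision is not part of the default tabular output and would need a separate lookup.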