Accurate identification of bacteriophages from metagenomic data using Transformer.

Jiayu Shang, Xubo Tang, Ruocheng Guo, Yanni Sun.

Abstract

MOTIVATION: Bacteriophages are viruses that infect bacteria. As key players in microbial communities, they can regulate the composition and function of the microbiome by infecting their bacterial hosts and mediating gene transfer. Recently, metagenomic sequencing, which can sequence all genetic material from various microbiomes, has become a popular means of discovering new phages. However, accurate and comprehensive detection of phages from metagenomic data remains difficult. High diversity and abundance, together with limited reference genomes, pose major challenges for recruiting phage fragments from metagenomic data. Existing alignment-based or learning-based models have either low recall or low precision on metagenomic data.
RESULTS: In this work, we adopt a state-of-the-art language model, the Transformer, to compute contextual embeddings for phage contigs. By constructing a protein-cluster vocabulary, we can feed both the protein composition and the proteins' positions from each contig into the Transformer. The Transformer can learn the protein organization and associations using the self-attention mechanism and predict the label for test contigs. We rigorously tested our tool, named PhaMer, on multiple datasets of increasing difficulty, including quality RefSeq genomes, short contigs, simulated metagenomic data, mock metagenomic data and the public IMG/VR dataset. All the experimental results show that PhaMer outperforms the state-of-the-art tools. In the real metagenomic data experiment, PhaMer improves the F1-score of phage detection by 27%.
© The Author(s) 2022. Published by Oxford University Press.
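
The protein-cluster tokenization described in the RESULTS paragraph can be illustrated with a small sketch. This is not PhaMer's implementation: the cluster names, the reserved PAD/UNK ids, the function names and the sequence length are all hypothetical, chosen only to show how a contig's ordered proteins might become a token sequence, with position indices, that a Transformer could consume.

```python
from typing import Dict, List, Optional

# Reserved token ids (an assumption of this sketch, not taken from the paper).
PAD, UNK = 0, 1


def build_vocab(cluster_names: List[str]) -> Dict[str, int]:
    """Map each distinct protein-cluster name to an integer token id."""
    vocab: Dict[str, int] = {}
    for name in cluster_names:
        if name not in vocab:
            vocab[name] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def encode_contig(
    protein_clusters: List[Optional[str]],
    vocab: Dict[str, int],
    max_len: int = 8,
) -> List[int]:
    """Encode the proteins of one contig, in genomic order, as token ids.

    protein_clusters[i] is the cluster matched by the i-th protein along
    the contig, or None when the protein matched no known cluster.
    The result is truncated or right-padded to exactly max_len tokens.
    """
    tokens = [vocab.get(c, UNK) for c in protein_clusters[:max_len]]
    return tokens + [PAD] * (max_len - len(tokens))


# Hypothetical cluster names, for illustration only.
vocab = build_vocab(["PC_capsid", "PC_terminase", "PC_tail", "PC_capsid"])
contig = ["PC_terminase", None, "PC_capsid", "PC_tail"]

token_ids = encode_contig(contig, vocab, max_len=6)
positions = list(range(len(token_ids)))  # index of each token along the contig

print(token_ids)  # [3, 1, 2, 4, 0, 0]
print(positions)  # [0, 1, 2, 3, 4, 5]
```

In a full model, each token id and each position index would typically be mapped to a learned embedding and summed before the self-attention layers, which is how both protein composition and protein order can reach the classifier.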

Keywords:  deep learning; phage identification; protein cluster-based token; transformer

Year:  2022        PMID: 35769000      PMCID: PMC9294416          DOI: 10.1093/bib/bbac258

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   13.994
