Deep learning for HGT insertion sites recognition.

Chen Li, Jiaxing Chen, Shuai Cheng Li.

Abstract

BACKGROUND: Horizontal gene transfer (HGT) is the sharing of genetic material between distant species that are not in a parent-offspring relationship. Knowing where HGT insertions occur is important for understanding HGT mechanisms. Recent studies of the main agents of HGT, such as transposons and plasmids, show that insertion sites usually have specific sequence features. This motivates a method that infers HGT insertion sites from sequence features.
RESULTS: We propose DeepHGT, a deep residual network that recognizes HGT insertion sites. To train DeepHGT, we extracted about 1.55 million sequence segments from 262 metagenomic samples, with a ratio of positive to negative instances of about 1:1. The segments were randomly partitioned into three subsets: 80% as the training set, 10% as the validation set, and the remaining 10% as the test set. The training loss of DeepHGT is 0.4163 and the validation loss is 0.423. On the test set, DeepHGT achieves an area under the curve (AUC) of 0.8782. To further evaluate how well DeepHGT generalizes, we constructed an independent test set of 689,312 sequence segments from another 147 gut metagenomic samples. On this set DeepHGT achieves an AUC of 0.8428, close to the AUC on the first test set. For comparison, the gradient boosting classifier implemented in PyFeat achieves AUC values of 0.694 and 0.686 on the two test sets, respectively. DeepHGT also learns discriminative sequence features; for example, it has learned a pattern of palindromic subsequences as a significant (P-value = 0.0182) local feature. Hence, DeepHGT is a reliable model for recognizing HGT insertion sites.
CONCLUSION: DeepHGT is the first deep learning model that can accurately recognize HGT insertion sites on genomes from their sequence patterns.
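
The evaluation protocol described in the abstract (a random 80/10/10 partition and an AUC score) can be illustrated with a minimal sketch. This is not the authors' code: the function names `split_80_10_10` and `roc_auc`, the fixed seed, and the rank-based AUC formulation are assumptions chosen for illustration, and the example uses only the Python standard library.

```python
import random


def split_80_10_10(items, seed=0):
    """Randomly partition items into train/validation/test (80%/10%/10%).

    Hypothetical helper; the seed and rounding rule are assumptions.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10   # 80% for training
    n_val = n // 10         # 10% for validation
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder (about 10%) for testing
    return train, val, test


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: iterable of 0/1 (1 = positive instance).
    scores: predicted scores, higher meaning "more likely positive".
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    train, val, test = split_80_10_10(range(100))
    print(len(train), len(val), len(test))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values (0.8782 and 0.8428 for DeepHGT, 0.694 and 0.686 for the PyFeat baseline) should be read.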

Keywords:  DNA sequence feature; Deep residual model; HGT insertion site

Year:  2020        PMID: 33372605      PMCID: PMC7771070          DOI: 10.1186/s12864-020-07296-1

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969

