RepLong: de novo repeat identification using long read sequencing data.

Rui Guo, Yan-Ran Li, Shan He, Le Ou-Yang, Yiwen Sun, Zexuan Zhu.

Abstract

Motivation: The identification of repetitive elements is important in genome assembly and phylogenetic analyses. Existing de novo repeat identification methods that rely on short reads are ineffective at identifying long repeats. Because long reads are more likely to span repeat regions completely, they are better suited to recognizing long repeats.
Results: In this study, we propose RepLong, a novel de novo repeat element identification method based on PacBio long reads. Because reads mapped to repeat regions overlap heavily with one another, identifying repeat elements is equivalent to discovering consensus overlaps between reads, which can in turn be cast as a community detection problem in the network of read overlaps. In RepLong, we first construct a network of read overlaps from pairwise alignment of the reads, where each vertex represents a read and each edge represents a substantial overlap between the two corresponding reads. Second, communities whose intra-community connectivity is greater than their inter-community connectivity are extracted by network modularity optimization. Finally, representative reads in each community are extracted to form the repeat library. Comparison studies on Drosophila melanogaster and human long-read sequencing data against genome-based and short-read-based methods demonstrate the efficiency of RepLong in identifying long repeats. RepLong can handle lower-coverage data and can serve as a complement to existing methods to improve repeat identification performance on long-read sequencing data.
Availability and implementation: The software of RepLong is freely available at https://github.com/ruiguo-bio/replong.
Contact: ywsun@szu.edu.cn or zhuzx@szu.edu.cn.
Supplementary information: Supplementary data are available at Bioinformatics online.
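
Illustrative sketch (not part of the published abstract): the three steps described under Results can be mimicked on toy data as follows. This is not the RepLong implementation. The input format (tuples of read names and overlap length), the `min_overlap` threshold, the naive greedy modularity merging, and the rule for choosing a representative read (largest total overlap with other members of its community) are all assumptions made for illustration; the abstract does not specify them.

```python
from itertools import combinations


def build_overlap_network(alignments, min_overlap):
    """Step 1: vertices are reads; an edge joins reads with a substantial overlap."""
    nodes, weights = set(), {}
    for a, b, overlap_len in alignments:
        nodes.update((a, b))
        if a != b and overlap_len >= min_overlap:
            weights[tuple(sorted((a, b)))] = overlap_len
    return sorted(nodes), weights


def modularity(nodes, edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    if m == 0:
        return 0.0
    label = {v: i for i, comm in enumerate(communities) for v in comm}
    degree = {v: 0 for v in nodes}
    intra = [0] * len(communities)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if label[u] == label[v]:
            intra[label[u]] += 1
    total_deg = [0] * len(communities)
    for v, d in degree.items():
        total_deg[label[v]] += d
    return sum(intra[i] / m - (total_deg[i] / (2 * m)) ** 2
               for i in range(len(communities)))


def greedy_communities(nodes, edges):
    """Step 2 (assumed variant): merge the pair of communities that raises Q most."""
    communities = [{v} for v in nodes]
    best_q = modularity(nodes, edges, communities)
    while len(communities) > 1:
        best = None
        for i, j in combinations(range(len(communities)), 2):
            merged = [c for k, c in enumerate(communities) if k not in (i, j)]
            merged.append(communities[i] | communities[j])
            q = modularity(nodes, edges, merged)
            if q > best_q + 1e-12 and (best is None or q > best[0]):
                best = (q, merged)
        if best is None:  # no merge improves modularity any further
            break
        best_q, communities = best
    return communities


def find_repeat_library(alignments, min_overlap=500):
    """Step 3 (assumed rule): one representative read per multi-read community."""
    nodes, weights = build_overlap_network(alignments, min_overlap)
    communities = greedy_communities(nodes, list(weights))
    library = []
    for comm in communities:
        if len(comm) < 2:  # an isolated read gives no evidence of a repeat
            continue

        def total_overlap(read):
            return sum(w for (a, b), w in weights.items()
                       if read in (a, b) and a in comm and b in comm)

        library.append(max(sorted(comm), key=total_overlap))
    return communities, sorted(library)


# Hypothetical pairwise overlaps: (read_a, read_b, overlap length in bp).
toy_alignments = [
    ("r1", "r2", 900), ("r1", "r3", 800), ("r2", "r3", 1200),
    ("r4", "r5", 700), ("r4", "r6", 1500), ("r5", "r6", 600),
    ("r3", "r4", 550),   # weak bridge between the two read clusters
    ("r6", "r7", 100),   # below the threshold, so r7 stays isolated
]
communities, library = find_repeat_library(toy_alignments, min_overlap=500)
print(sorted(sorted(c) for c in communities), library)
```

On this toy input the two densely overlapping groups of reads end up in separate communities even though one edge connects them, because merging across that edge would lower modularity. The exhaustive pairwise search is only practical for a handful of reads; real data would need a scalable modularity optimizer.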

Year:  2018        PMID: 29126180     DOI: 10.1093/bioinformatics/btx717

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  9 in total

1.  A new statistic for efficient detection of repetitive sequences.

Authors:  Sijie Chen; Yixin Chen; Fengzhu Sun; Michael S Waterman; Xuegong Zhang
Journal:  Bioinformatics       Date:  2019-11-01       Impact factor: 6.937

2.  RepAHR: an improved approach for de novo repeat identification by assembly of the high-frequency reads.

Authors:  Xingyu Liao; Xin Gao; Xiankai Zhang; Fang-Xiang Wu; Jianxin Wang
Journal:  BMC Bioinformatics       Date:  2020-10-19       Impact factor: 3.169

3.  TransposonUltimate: software for transposon classification, annotation and detection.

Authors:  Kevin Riehl; Cristian Riccio; Eric A Miska; Martin Hemberg
Journal:  Nucleic Acids Res       Date:  2022-06-24       Impact factor: 19.160

4.  DeepRepeat: direct quantification of short tandem repeats on signal data from nanopore sequencing.

Authors:  Li Fang; Qian Liu; Alex Mas Monteys; Pedro Gonzalez-Alegre; Beverly L Davidson; Kai Wang
Journal:  Genome Biol       Date:  2022-04-28       Impact factor: 17.906

5.  MGERT: a pipeline to retrieve coding sequences of mobile genetic elements from genome assemblies.

Authors:  Andrei S Guliaev; Seraphima K Semyenova
Journal:  Mob DNA       Date:  2019-05-14

6.  Methodologies for the De novo Discovery of Transposable Element Families. (Review)

Authors:  Jessica M Storer; Robert Hubley; Jeb Rosen; Arian F A Smit
Journal:  Genes (Basel)       Date:  2022-04-17       Impact factor: 4.141

7.  A sensitive repeat identification framework based on short and long reads.

Authors:  Xingyu Liao; Min Li; Kang Hu; Fang-Xiang Wu; Xin Gao; Jianxin Wang
Journal:  Nucleic Acids Res       Date:  2021-09-27       Impact factor: 16.971

8.  DeviaTE: Assembly-free analysis and visualization of mobile genetic element composition.

Authors:  Lukas Weilguny; Robert Kofler
Journal:  Mol Ecol Resour       Date:  2019-07-03       Impact factor: 7.090

9.  BigFiRSt: A Software Program Using Big Data Technique for Mining Simple Sequence Repeats From Large-Scale Sequencing Data.

Authors:  Jinxiang Chen; Fuyi Li; Miao Wang; Junlong Li; Tatiana T Marquez-Lago; André Leier; Jerico Revote; Shuqin Li; Quanzhong Liu; Jiangning Song
Journal:  Front Big Data       Date:  2022-01-18