RepLong: de novo repeat identification using long read sequencing data.

Rui Guo¹, Yan-Ran Li¹, Shan He²,³, Le Ou-Yang⁴, Yiwen Sun⁵, Zexuan Zhu¹.

Abstract

Motivation: The identification of repetitive elements is important in genome assembly and phylogenetic analyses. Existing de novo repeat identification methods that rely on short reads are ineffective at identifying long repeats. Because long reads are more likely to span repeat regions completely, they are better suited to recognizing long repeats.
Results: In this study, we propose RepLong, a novel de novo repeat identification method based on PacBio long reads. Because reads mapped to repeat regions overlap heavily with one another, identifying repeat elements is equivalent to discovering consensus overlaps between reads, which can in turn be cast as a community detection problem in the network of read overlaps. RepLong first constructs a network of read overlaps from pairwise alignment of the reads, where each vertex represents a read and each edge represents a substantial overlap between the two corresponding reads. Second, communities whose intra-community connectivity is greater than their inter-community connectivity are extracted by network modularity optimization. Finally, representative reads from each community are extracted to form the repeat library. Comparisons with genome-based and short-read-based methods on Drosophila melanogaster and human long-read sequencing data demonstrate the efficiency of RepLong in identifying long repeats. RepLong can handle lower-coverage data and can serve as a complement to existing methods to improve repeat identification on long-read sequencing data.
Availability and implementation: The software of RepLong is freely available at https://github.com/ruiguo-bio/replong.
Contact: ywsun@szu.edu.cn or zhuzx@szu.edu.cn.
Supplementary information: Supplementary data are available at Bioinformatics online.
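The three steps named in the Results (overlap network, modularity-based community extraction, representative reads) can be illustrated with a small sketch. This is not the RepLong implementation: the function names, the 500 bp `min_overlap` threshold, the naive greedy merging used as the modularity optimizer, and the choice of the read with the most intra-community overlaps as the representative are all assumptions made for illustration, and the overlaps are taken as precomputed input rather than produced by an aligner.

```python
from collections import defaultdict
from itertools import combinations


def build_overlap_graph(overlaps, min_overlap=500):
    """Build an undirected read-overlap graph.

    overlaps: iterable of (read_a, read_b, overlap_length) tuples.
    min_overlap is a hypothetical cutoff for a "substantial" overlap.
    """
    graph = defaultdict(set)
    for a, b, length in overlaps:
        if a != b and length >= min_overlap:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def modularity(graph, communities):
    """Newman modularity Q of a partition of an unweighted graph."""
    m = sum(len(nbrs) for nbrs in graph.values()) / 2  # number of edges
    if m == 0:
        return 0.0
    q = 0.0
    for members in communities:
        internal = sum(1 for u in members for v in graph[u] if v in members) / 2
        degree = sum(len(graph[u]) for u in members)
        q += internal / m - (degree / (2 * m)) ** 2
    return q


def greedy_communities(graph):
    """Merge the community pair that most increases Q until none does.

    A deliberately simple (and slow) stand-in for a real optimizer;
    suitable only for toy inputs.
    """
    communities = [frozenset([node]) for node in graph]
    best_q = modularity(graph, communities)
    while True:
        best_merge = None
        for i, j in combinations(range(len(communities)), 2):
            merged = (communities[:i] + communities[i + 1:j]
                      + communities[j + 1:]
                      + [communities[i] | communities[j]])
            q = modularity(graph, merged)
            if q > best_q + 1e-12:  # accept strict improvements only
                best_q, best_merge = q, merged
        if best_merge is None:
            return communities
        communities = best_merge


def representative_reads(graph, communities, min_size=3):
    """Pick one read per community (assumed rule: most intra-community overlaps)."""
    library = []
    for members in communities:
        if len(members) < min_size:  # hypothetical filter for tiny communities
            continue
        rep = max(sorted(members),
                  key=lambda r: sum(1 for v in graph[r] if v in members))
        library.append(rep)
    return sorted(library)


# Toy input: two groups of mutually overlapping reads joined by one overlap,
# plus one overlap that is too short to count.
group_a = ["r1", "r2", "r3", "r4"]
group_b = ["r5", "r6", "r7", "r8"]
toy_overlaps = [(a, b, 2000) for a, b in combinations(group_a, 2)]
toy_overlaps += [(a, b, 2000) for a, b in combinations(group_b, 2)]
toy_overlaps += [("r4", "r5", 1500), ("r1", "r8", 100)]

toy_graph = build_overlap_graph(toy_overlaps)
toy_communities = greedy_communities(toy_graph)
print(representative_reads(toy_graph, toy_communities))
```

On real long-read data the overlaps would come from a pairwise aligner and the partition from a scalable modularity optimizer; the quadratic pair scan here is only meant to make the objective explicit.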

Year: 2018 PMID: 29126180 DOI: 10.1093/bioinformatics/btx717

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited by

9 in total

1. A new statistic for efficient detection of repetitive sequences.

2. RepAHR: an improved approach for de novo repeat identification by assembly of the high-frequency reads.

3. TransposonUltimate: software for transposon classification, annotation and detection.

4. DeepRepeat: direct quantification of short tandem repeats on signal data from nanopore sequencing.

5. MGERT: a pipeline to retrieve coding sequences of mobile genetic elements from genome assemblies.

6. Methodologies for the De novo Discovery of Transposable Element Families. (Review)

7. A sensitive repeat identification framework based on short and long reads.

8. DeviaTE: Assembly-free analysis and visualization of mobile genetic element composition.

9. BigFiRSt: A Software Program Using Big Data Technique for Mining Simple Sequence Repeats From Large-Scale Sequencing Data.

Authors: Jinxiang Chen; Fuyi Li; Miao Wang; Junlong Li; Tatiana T Marquez-Lago; André Leier; Jerico Revote; Shuqin Li; Quanzhong Liu; Jiangning Song
Journal: Front Big Data Date: 2022-01-18