
Resolving complex tandem repeats with long reads.

Ajay Ummat, Ali Bashir.

Abstract

MOTIVATION: Resolving tandemly repeated genomic sequences is a necessary step in improving our understanding of the human genome. Short tandem repeats (TRs), or microsatellites, are often used as molecular markers in genetics, and clinically, variation in microsatellites can lead to genetic disorders such as Huntington's disease. Accurately resolving repeats, and in particular TRs, remains a challenging task in genome alignment, assembly and variation calling. Though tools have been developed for detecting microsatellites in short-read sequencing data, these are limited in the size and types of events they can resolve. Single-molecule sequencing technologies may resolve a broader spectrum of TRs given their increased read length, but their significantly higher raw error rates make it difficult to identify TRs accurately and to estimate their copy number, so new approaches are required.
RESULTS: Here we present PacmonSTR, a reference-based probabilistic approach to identify the TR region and estimate the number of TR elements in long DNA reads. The multistep approach requires as input a reference region and the reference TR element. First, the TR region is identified in the long DNA reads via a three-stage modified Smith-Waterman approach; then, the expected number of TR elements is calculated using a pair-hidden Markov model-based method. Finally, TR-based genotype selection (or clustering: homozygous/heterozygous) is performed with Gaussian mixture models, using the Akaike information criterion and coverage expectations.
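The final genotype-selection step can be illustrated with a minimal sketch. It is not PacmonSTR's implementation: it assumes each read has already been reduced to one estimated TR copy number, fits one- and two-component 1D Gaussian mixtures by plain EM, and keeps the model with the lower AIC. The function names, the variance floor `min_var`, and the example values are hypothetical, and the coverage expectations mentioned in the abstract are not modelled.

```python
import math

def _log_norm(x, mu, var):
    """Log-density of a 1D Gaussian with mean mu and variance var."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def _component_logs(x, w, mu, var):
    """Log of (weight * density) for every mixture component at x."""
    return [math.log(w[j]) + _log_norm(x, mu[j], var[j]) for j in range(len(w))]

def _logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def fit_gmm_1d(xs, k, iters=100, min_var=0.25):
    """Fit a k-component 1D Gaussian mixture by EM.

    Returns (means, log_likelihood). min_var is an assumed floor that
    keeps a component from collapsing onto a single read.
    """
    xs = sorted(xs)
    n = len(xs)
    mean_all = sum(xs) / n
    var_all = max(sum((x - mean_all) ** 2 for x in xs) / n, min_var)
    w = [1.0 / k] * k
    mu = [xs[int((j + 0.5) * n / k)] for j in range(k)]  # quantile initialisation
    var = [var_all] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each read
        resp = []
        for x in xs:
            logs = _component_logs(x, w, mu, var)
            total = _logsumexp(logs)
            resp.append([math.exp(v - total) for v in logs])
        # M-step: re-estimate weights, means and variances
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-9)
            w[j] = nj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nj
            var[j] = max(spread, min_var)
    log_lik = sum(_logsumexp(_component_logs(x, w, mu, var)) for x in xs)
    return mu, log_lik

def aic(log_lik, k):
    """AIC with 3k - 1 free parameters (k means, k variances, k - 1 weights)."""
    return 2 * (3 * k - 1) - 2 * log_lik

def select_genotype(counts):
    """Label per-read copy numbers as homozygous (k=1) or heterozygous (k=2)."""
    fits = {k: fit_gmm_1d(counts, k) for k in (1, 2)}
    best_k = min(fits, key=lambda k: aic(fits[k][1], k))
    label = "homozygous" if best_k == 1 else "heterozygous"
    return label, fits[best_k][0]
```

Calling `select_genotype` on per-read estimates that cluster around a single copy number should favour the one-component model, whereas two well-separated clusters should favour the two-component model, with the fitted means serving as the allele-level copy-number estimates.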
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Year:  2014        PMID: 25028725     DOI: 10.1093/bioinformatics/btu437

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

