
Discretized Gaussian mixture for genotyping of microsatellite loci containing homopolymer runs.

Hongseok Tae, Dong-Yun Kim, John McCormick, Robert E Settlage, Harold R Garner.

Abstract

MOTIVATION: Inferring lengths of inherited microsatellite alleles with single base pair resolution from short sequence reads is challenging due to several sources of noise caused by the repetitive nature of microsatellites and the technologies used to generate raw sequence data.
RESULTS: We have developed a program, GenoTan, that uses a discretized Gaussian mixture model combined with a rules-based approach to identify inherited variation at microsatellite loci from short sequence reads without paired-end information. It effectively distinguishes length variants from noise, including insertion/deletion errors in homopolymer runs, by addressing the bidirectional aspect of insertion and deletion errors in sequence reads. Here we first introduce a homopolymer decomposition method, which estimates error bias toward insertion or deletion in homopolymer sequence runs. Combining these approaches, GenoTan accurately genotyped 94.9% of microsatellite loci from simulated data with 40× sequence coverage, while the other programs made <90% correct calls on the same data and required 5 to 30× more computational time than GenoTan. It also showed the highest true-positive rate on real data, using mixed sequence data from two Drosophila inbred lines, which was a novel validation approach for genotyping.
AVAILABILITY: GenoTan is open-source software available at http://genotan.sourceforge.net.
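
ILLUSTRATION: The idea of a discretized Gaussian mixture over allele lengths can be sketched as follows. This is a minimal sketch and not GenoTan's implementation: the function names, the fixed standard deviation `sigma`, the equal 0.5/0.5 allele weights, the probability floor, and the example read counts are all assumptions made for illustration, and the rules-based filtering and homopolymer decomposition described above are not modeled. Each allele contributes a Gaussian integrated over unit-width bins centered on integer lengths, and the genotype is the pair of lengths that maximizes the likelihood of the observed read-length counts.

```python
import math
from itertools import combinations_with_replacement


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def discretized_gaussian_pmf(length, mean, sigma):
    """Probability mass of a Gaussian integrated over [length-0.5, length+0.5]."""
    upper = normal_cdf((length + 0.5 - mean) / sigma)
    lower = normal_cdf((length - 0.5 - mean) / sigma)
    return upper - lower


def log_likelihood(counts, alleles, sigma, floor=1e-12):
    """Log-likelihood of read-length counts under an equal-weight two-allele mixture.

    counts maps an observed integer length to the number of reads with that length.
    floor (an assumed value) keeps log() finite for lengths far from both alleles.
    """
    a1, a2 = alleles
    total = 0.0
    for length, n_reads in counts.items():
        p = 0.5 * discretized_gaussian_pmf(length, a1, sigma) \
            + 0.5 * discretized_gaussian_pmf(length, a2, sigma)
        total += n_reads * math.log(max(p, floor))
    return total


def call_genotype(counts, sigma=0.5):
    """Return the (a1, a2) pair of integer lengths with the highest likelihood.

    Candidates span the observed length range; a1 == a2 is a homozygous call.
    """
    candidates = range(min(counts), max(counts) + 1)
    pairs = combinations_with_replacement(candidates, 2)
    return max(pairs, key=lambda pair: log_likelihood(counts, pair, sigma))


if __name__ == "__main__":
    # Hypothetical read-length counts at two loci (lengths in base pairs).
    homozygous_like = {11: 3, 12: 34, 13: 3}
    heterozygous_like = {10: 2, 11: 18, 12: 3, 13: 2, 14: 17, 15: 2}
    print(call_genotype(homozygous_like))
    print(call_genotype(heterozygous_like))
```

Fixing `sigma` and the mixture weights keeps the sketch short; a fuller treatment would estimate them from the data (for example with expectation-maximization) and would need to account for the insertion or deletion bias that the abstract attributes to homopolymer runs.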


Year:  2013        PMID: 24135263      PMCID: PMC3933874          DOI: 10.1093/bioinformatics/btt595

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


