Confidence intervals for Markov chain transition probabilities based on next generation sequencing reads data.

Lin Wan, Xin Kang, Jie Ren, Fengzhu Sun.

Abstract

BACKGROUND: Markov chains (MC) have been widely used to model molecular sequences. Estimation of the MC transition matrix, and of confidence intervals for the transition probabilities, from long sequence data has been studied intensively over the past decades. Next generation sequencing (NGS) generates a large number of short reads. These reads can overlap, and some regions of the genome may not be sequenced at all, resulting in a new type of data. Based on NGS data, the MC transition probabilities can be estimated by moment estimators. However, the classical asymptotic distribution theory for MC transition probability estimators based on long sequences is no longer valid.
METHODS: In this study, we present the asymptotic distributions of several statistics related to MC based on NGS data. We show that, after scaling by the effective coverage d defined in a previous study by the authors, these statistics based on NGS data have approximately the same distributions as the corresponding statistics for long sequences.
RESULTS: We apply the asymptotic properties of these statistics to derive theoretical confidence regions for MC transition probabilities based on NGS short read data. We validate our theoretical confidence intervals using both simulated and real data sets, and compare the results with those obtained by the parametric bootstrap method.
CONCLUSIONS: We find that the asymptotic distributions of these statistics and the theoretical confidence intervals of transition probabilities based on NGS data given in this study are highly accurate, providing a powerful tool for NGS data analysis.
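
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a first-order MC over the alphabet ACGT, pools adjacent-letter counts over all reads to form the moment estimates, and builds a Wald-type interval whose variance is inflated by the effective coverage d. The function names (`transition_counts`, `moment_estimates`, `wald_interval`) are hypothetical, d is taken as a user-supplied number because its definition is given in the authors' earlier study and not in this abstract, and the exact form of the scaling is an assumption based only on the statement that the statistics are scaled by d.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def transition_counts(reads):
    """Count adjacent letter pairs (u, v), pooled over all reads."""
    counts = Counter()
    for read in reads:
        for u, v in zip(read, read[1:]):
            # Skip pairs containing ambiguous bases such as N.
            if u in ALPHABET and v in ALPHABET:
                counts[(u, v)] += 1
    return counts


def moment_estimates(counts):
    """Estimate p(u -> v) as N(uv) / sum over w of N(uw)."""
    row_totals = {u: sum(counts[(u, w)] for w in ALPHABET) for u in ALPHABET}
    p_hat = {}
    for u in ALPHABET:
        if row_totals[u] == 0:
            continue  # no information about transitions out of u
        for v in ALPHABET:
            p_hat[(u, v)] = counts[(u, v)] / row_totals[u]
    return p_hat, row_totals


def wald_interval(p, n, d=1.0, z=1.96):
    """Wald-type interval for p from n observed transitions.

    ASSUMPTION: the variance p(1 - p)/n of the long-sequence case is
    multiplied by the effective coverage d; d = 1 recovers the classical
    long-sequence interval.
    """
    half_width = z * math.sqrt(d * p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)
```

For example, `wald_interval(0.5, 100, d=4.0)` gives an interval twice as wide as `wald_interval(0.5, 100)`, which reflects how overlapping reads carry less independent information than the same number of transitions in one long sequence.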

Keywords: Markov chains; confidence intervals; next generation sequencing; transition probabilities

Year: 2020. PMID: 34262790. PMCID: PMC8277151. DOI: 10.1007/s40484-020-0200-y

Source DB: PubMed. Journal: Quant Biol. ISSN: 2095-4689
