
Hypothesis tests for point-mass mixture data with application to 'omics data with many zero values.

Sandra Taylor, Katherine Pollard.

Abstract

Data composed of a continuous component plus a point mass frequently arise in genomic studies. The distribution of such data is characterized by the proportion of observations in the point mass and the distribution of the continuous component. Standard statistical methods address only one of these two features at a time and can fail to detect differences between experimental groups. We propose a novel empirical likelihood ratio test (LRT) statistic for simultaneously testing the null hypothesis of no difference in point-mass proportions and no difference in means of the continuous component. This study evaluates the performance of the empirical LRT and three existing point-mass mixture statistics: 1) a two-part statistic with a t-test for testing mean differences (Two-part t), 2) a two-part statistic with a Wilcoxon test for testing mean differences (Two-part W), and 3) a parametric LRT. Our investigations begin with an analysis of metabolomics data from Arabidopsis thaliana, in which many metabolites have a large proportion of observed concentrations in a point mass at zero. All four point-mass mixture statistics identify more significant differences than standard t-tests and Wilcoxon tests. The empirical LRT appears particularly effective. These findings motivate a large simulation study that assesses the Type I and Type II error rates of the four test statistics under various choices of null distribution. The parametric LRT is frequently the most powerful test, as long as the model assumptions are correct. As is common in 'omics data, the Arabidopsis metabolites have widely varying concentration distributions. A single parametric distribution cannot effectively represent all of these distributions, and individually selecting the optimal parametric distribution to use in the LRT for each metabolite is not practical. The empirical LRT, which does not require parametric assumptions, provides an attractive alternative to parametric and standard methods.
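To make the idea of a two-part statistic concrete, the following is a minimal sketch of one common formulation, not necessarily the exact form evaluated in the paper (the abstract does not give the formula). It assumes the statistic is the sum of a squared pooled z-statistic for the difference in point-mass proportions and a squared Welch t-statistic for the difference in means of the continuous component, referred to a chi-square distribution with 2 degrees of freedom as a large-sample approximation. The function name `two_part_t` and its parameters are hypothetical.

```python
import math
from statistics import mean, variance


def two_part_t(x, y, point_mass=0.0):
    """Sketch of a two-part t statistic (assumed form: z^2 + t^2).

    Returns (statistic, degrees_of_freedom, approximate_p_value).
    """
    n1, n2 = len(x), len(y)
    x_cont = [v for v in x if v != point_mass]  # continuous component, group 1
    y_cont = [v for v in y if v != point_mass]  # continuous component, group 2
    m1, m2 = n1 - len(x_cont), n2 - len(y_cont)  # counts in the point mass

    stat, df = 0.0, 0

    # Part 1: pooled two-sample z-test on the point-mass proportions.
    pooled = (m1 + m2) / (n1 + n2)
    se_p = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se_p > 0:
        z = (m1 / n1 - m2 / n2) / se_p
        stat += z * z
        df += 1

    # Part 2: Welch t-statistic on the continuous (non-point-mass) values.
    if len(x_cont) >= 2 and len(y_cont) >= 2:
        se_t = math.sqrt(variance(x_cont) / len(x_cont)
                         + variance(y_cont) / len(y_cont))
        if se_t > 0:
            t = (mean(x_cont) - mean(y_cont)) / se_t
            stat += t * t
            df += 1

    # Chi-square survival function in closed form for df = 1 or 2.
    if df == 2:
        p_value = math.exp(-stat / 2)
    elif df == 1:
        p_value = math.erfc(math.sqrt(stat / 2))
    else:
        p_value = 1.0
    return stat, df, p_value


# Hypothetical concentrations for one metabolite in two groups.
group_a = [0, 0, 0, 0, 5.0, 6.0]
group_b = [1.0, 2.0, 3.0, 1.5, 2.5, 0]
print(two_part_t(group_a, group_b))
```

When one of the two parts cannot be computed (for example, all observations fall in the point mass in one group), this sketch drops that part and reduces the degrees of freedom accordingly. The chi-square reference distribution is only an approximation at small sample sizes, which is one reason the choice of null distribution matters in practice.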


Year:  2009        PMID: 19222391     DOI: 10.2202/1544-6115.1425

Source DB:  PubMed          Journal:  Stat Appl Genet Mol Biol        ISSN: 1544-6115


