
Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining.

Alfred O Hero, Bala Rajaratnam.

Abstract

When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far smaller than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much recent work has focused on understanding the computational complexity of proposed methods for "Big Data". Sample complexity, however, has received relatively less attention, especially in the setting where the sample size n is fixed and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into three categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the last regime applies to exascale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that is of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
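The sample-starved regime can be made concrete with a small simulation. The following is a minimal sketch, not the authors' method: it assumes independent standard Gaussian variables (so every true correlation is zero), a fixed sample size n = 10, and NumPy as the only dependency. The function name `max_spurious_correlation` and the chosen dimensions are hypothetical, introduced only for illustration. It reports the largest absolute sample correlation among the first p variables as p grows.

```python
import numpy as np


def max_spurious_correlation(X):
    """Largest absolute off-diagonal sample correlation among the columns of X."""
    R = np.corrcoef(X, rowvar=False)  # p x p sample correlation matrix
    np.fill_diagonal(R, 0.0)          # ignore the trivial self-correlations
    return float(np.max(np.abs(R)))


# Assumed setup: n is fixed and small, p grows (purely high dimensional regime).
rng = np.random.default_rng(0)
n = 10
dims = [10, 100, 1000]

# Independent variables, so all population correlations are exactly zero.
X = rng.standard_normal((n, max(dims)))

# Nested subsets: the first p columns are always contained in the next, larger set,
# so the maximum cannot decrease as p grows.
results = {p: max_spurious_correlation(X[:, :p]) for p in dims}

for p, r in results.items():
    print(f"n={n}, p={p}: max |sample correlation| = {r:.3f}")
```

Because the variable sets are nested, the reported maximum can only stay the same or increase with p, even though no variable is truly correlated with any other. This is the kind of false discovery that a sample complexity analysis for fixed n and growing p has to control.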


Keywords:  Big Data; asymptotic regimes; correlation estimation; correlation mining; correlation screening; correlation selection; graphical models; large scale inference; purely high dimensional; sample complexity; triple asymptotic framework; unifying learning theory

Year:  2015        PMID: 27087700      PMCID: PMC4827453          DOI: 10.1109/JPROC.2015.2494178

Source DB:  PubMed          Journal:  Proc IEEE Inst Electr Electron Eng        ISSN: 0018-9219            Impact factor:   10.961


