Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining.

Abstract

When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large-scale inference. In large-scale data applications such as genomics, connectomics, and eco-informatics, the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far smaller than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much recent work has focused on understanding the computational complexity of proposed methods for "Big Data". Sample complexity, however, has received relatively less attention, especially in the setting where the sample size n is fixed and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime, where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime, where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime, where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche, but only the last regime applies to exascale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that is of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
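The sample-starved regime described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's method: it is a minimal illustration, assuming independent standard Gaussian variables (so every true correlation is zero) and an arbitrary fixed sample size of n = 10. The function name `max_abs_offdiag_correlation` and all parameter values are hypothetical choices made for this example. It holds n fixed, lets p grow, and reports the largest sample correlation found among all variable pairs.

```python
import numpy as np


def max_abs_offdiag_correlation(n, p, seed=0):
    """Largest |sample correlation| among p independent Gaussian variables.

    Hypothetical helper for illustration only: the variables are drawn
    independently, so any large correlation found here is spurious.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))    # n samples (rows), p variables (columns)
    R = np.corrcoef(X, rowvar=False)   # p x p sample correlation matrix
    np.fill_diagonal(R, 0.0)           # ignore trivial self-correlations
    return float(np.max(np.abs(R)))


if __name__ == "__main__":
    n = 10  # fixed sample size (assumed value for the illustration)
    for p in (10, 100, 1000):
        # As p grows with n fixed, the largest spurious correlation tends to grow.
        print(p, round(max_abs_offdiag_correlation(n, p), 3))
```

In runs of this kind, the largest spurious correlation typically moves toward 1 as p increases, which suggests why screening thresholds have to account for the number of variables when the sample size cannot grow.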

Keywords: Big Data; asymptotic regimes; correlation estimation; correlation mining; correlation screening; correlation selection; graphical models; large scale inference; purely high dimensional; sample complexity; triple asymptotic framework; unifying learning theory

Year: 2015 PMID: 27087700 PMCID: PMC4827453 DOI: 10.1109/JPROC.2015.2494178

Source DB: PubMed Journal: Proc IEEE Inst Electr Electron Eng ISSN: 0018-9219 Impact factor: 10.961
