
Categorical Exploratory Data Analysis: From Multiclass Classification and Response Manifold Analytics Perspectives of Baseball Pitching Dynamics.

Fushing Hsieh, Elizabeth P Chou.

Abstract

Every feature, whatever its data type, has a categorical nature that is revealed through its histogram. A contingency table framed by two histograms affords directional and mutual associations, based on rescaled conditional Shannon entropies, for any feature-pair. The heatmap of the mutual association matrix of all features becomes a roadmap showing which features are highly associated with which others. We develop our data analysis paradigm, called categorical exploratory data analysis (CEDA), with this heatmap as a foundation. CEDA is demonstrated to provide new resolutions for two topics: multiclass classification (MCC) with a single categorical response variable and response manifold analytics (RMA) with multiple response variables. In both topics, we compute visible and explainable information content with multiscale and heterogeneous deterministic and stochastic structures. MCC involves all feature-group-specific mixing geometries of labeled high-dimensional point-clouds. For each identified feature-group, we devise an indirect distance measure, a robust label embedding tree (LET), and a series of tree-based binary competitions to discover and present asymmetric mixing geometries. Then, a chain of complementary feature-groups offers a collection of mixing geometric pattern-categories with multiple perspective views. RMA studies a system's regulating principles via multidimensional manifolds jointly constituted by the targeted response features and selected major covariate features. Each manifold is marked with categorical localities reflecting major effects. Diverse minor effects are checked and identified across all localities for heterogeneity. For both MCC and RMA, the data's information content is computed, with predictive inferences as by-products. We illustrate CEDA developments via the Iris data and demonstrate its applications on data taken from the PITCHf/x database.
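
The association measure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact rescaling, so the sketch assumes the directional association of X toward Y is H(Y|X)/H(Y) and that the mutual association is the average of the two directions. Equal-width binning via `numpy.histogram2d` and all function names here are likewise assumptions made for illustration.

```python
import numpy as np


def shannon_entropy(counts):
    """Shannon entropy (in nats) of a vector of nonnegative counts."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts[counts > 0] / total
    return float(-(p * np.log(p)).sum())


def contingency_table(x, y, bins_x=3, bins_y=3):
    """Cross-tabulate two numeric features after histogram binning.

    Assumption: equal-width bins; the paper may build its histograms differently.
    """
    table, _, _ = np.histogram2d(x, y, bins=(bins_x, bins_y))
    return table


def directional_association(table):
    """Assumed rescaling: H(Y|X) / H(Y), with rows = X bins and columns = Y bins.

    Values near 0 mean X nearly determines Y; values near 1 mean X says little about Y.
    Returns NaN when Y falls in a single bin, because the ratio is then undefined.
    """
    table = np.asarray(table, dtype=float)
    n = table.sum()
    h_y = shannon_entropy(table.sum(axis=0))
    if h_y == 0:
        return float("nan")
    row_totals = table.sum(axis=1)
    # Conditional entropy: row-weighted average of the per-row entropies.
    h_y_given_x = sum(
        (row_totals[i] / n) * shannon_entropy(table[i])
        for i in range(table.shape[0])
        if row_totals[i] > 0
    )
    return h_y_given_x / h_y


def mutual_association(table):
    """Assumed symmetric summary: average of the two directional associations."""
    table = np.asarray(table, dtype=float)
    return 0.5 * (directional_association(table) + directional_association(table.T))
```

Under these assumptions a perfectly diagonal table yields a directional association of 0 and a uniform table yields 1, so smaller values indicate stronger association. Evaluating `mutual_association` over every feature-pair would give a matrix of the kind the abstract visualizes as a heatmap.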


Keywords:  PITCHf/x; categorical exploratory data analysis; multiclass classification

Year:  2021        PMID: 34206624     DOI: 10.3390/e23070792

Source DB:  PubMed          Journal:  Entropy (Basel)        ISSN: 1099-4300            Impact factor:   2.524


  3 in total

1.  More is different.

Authors:  P W Anderson
Journal:  Science       Date:  1972-08-04       Impact factor: 47.728

2.  Complexity of possibly gapped histogram and analysis of histogram.

Authors:  Hsieh Fushing; Tania Roy
Journal:  R Soc Open Sci       Date:  2018-02-28       Impact factor: 2.963

3.  From patterned response dependency to structured covariate dependency: Entropy based categorical-pattern-matching.

Authors:  Hsieh Fushing; Shan-Yu Liu; Yin-Chen Hsieh; Brenda McCowan
Journal:  PLoS One       Date:  2018-06-14       Impact factor: 3.240

  2 in total

1.  Unraveling Hidden Major Factors by Breaking Heterogeneity into Homogeneous Parts within Many-System Problems.

Authors:  Elizabeth P Chou; Ting-Li Chen; Hsieh Fushing
Journal:  Entropy (Basel)       Date:  2022-01-24       Impact factor: 2.524

2.  Categorical Nature of Major Factor Selection via Information Theoretic Measurements.

Authors:  Ting-Li Chen; Elizabeth P Chou; Hsieh Fushing
Journal:  Entropy (Basel)       Date:  2021-12-15       Impact factor: 2.524

