Mohammad Tabatabai, Stephanie Bailey, Zoran Bursac, Habib Tabatabai, Derek Wilus, Karan P Singh.
Abstract
BACKGROUND: The most common measure of association between two continuous variables is the Pearson correlation (Maronna et al., Robust Statistics, 2019). When outliers are present, Pearson does not accurately measure association, and robust measures are needed. This article introduces three new robust measures of correlation: Taba (T), TabWil (TW), and TabWil rank (TWR). The correlation estimators T and TW measure a linear association between two continuous or ordinal variables, whereas TWR measures a monotonic association. The robustness of these proposed measures in comparison with Pearson (P), Spearman (S), Quadrant (Q), Median (M), and Minimum Covariance Determinant (MCD) is examined through simulation. Taba distance is used to analyze genes, and statistical tests are used to identify the genes most significantly associated with Williams Syndrome (WS).
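The sensitivity of Pearson to outliers can be illustrated with a minimal sketch. It compares only three of the established measures named above (Pearson, Spearman, and Quadrant) on simulated data. It does not implement T, TW, or TWR, whose definitions are not given in this abstract. The sample size, true correlation, contamination pattern, seed, and all function names are assumptions chosen for illustration, not the simulation design used in the article.

```python
import numpy as np


def pearson(x, y):
    """Pearson product-moment correlation."""
    return float(np.corrcoef(x, y)[0, 1])


def average_ranks(a):
    """Ranks starting at 1; tied values share their mean rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    ranks = np.empty(a.size, dtype=float)
    ranks[order] = np.arange(1, a.size + 1)
    for value in np.unique(a):
        tied = a == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def quadrant(x, y):
    """Quadrant correlation: mean product of signs about the medians."""
    sx = np.sign(np.asarray(x, dtype=float) - np.median(x))
    sy = np.sign(np.asarray(y, dtype=float) - np.median(y))
    return float(np.mean(sx * sy))


def simulate(n=200, rho=0.8, n_outliers=10, seed=0):
    """Return (clean, contaminated) samples; all settings are illustrative."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    clean = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    contaminated = clean.copy()
    # Replace a few points with a cluster lying far off the main trend.
    contaminated[:n_outliers] = rng.normal([8.0, -8.0], 0.5, size=(n_outliers, 2))
    return clean, contaminated


clean, contaminated = simulate()
measures = {"pearson": pearson, "spearman": spearman, "quadrant": quadrant}

# For each measure, store (estimate on clean data, estimate on contaminated data).
results = {
    name: (f(clean[:, 0], clean[:, 1]), f(contaminated[:, 0], contaminated[:, 1]))
    for name, f in measures.items()
}

for name, (r_clean, r_contaminated) in results.items():
    print(f"{name:9s} clean={r_clean:+.3f} contaminated={r_contaminated:+.3f}")
```

With a small cluster of gross outliers, the Pearson estimate typically moves much further from its clean-data value than the rank-based and sign-based estimates do, which is the kind of behavior that motivates robust alternatives.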
Keywords: Dissimilarity measures; Gene expression; Median correlation; Minimum covariance determinant correlation; Pearson correlation; Quadrant correlation; Spearman correlation; Williams syndrome
Year: 2021 PMID: 33789571 PMCID: PMC8011137 DOI: 10.1186/s12859-021-04098-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Graph generated using Wolfram Mathematica software version 12.1
Fig. 2 Graph of TabWil Correlation using Wolfram Mathematica software version 12.1
Fig. 3 Frequency of lowest measurement for simulated data stratified by sample size using IBM SPSS software version 27
Fig. 4 Frequency of lowest measurement for simulated data stratified by simulated value of correlation using IBM SPSS software version 27
Fig. 5 Frequency of lowest measurement for simulated data stratified by contamination level using IBM SPSS software version 27
Fig. 6 Frequency of lowest measurement for simulated data using IBM SPSS software version 27
Fig. 7 Heatmap of all 13,909 genes generated using RStudio software version 1.3.1073
Fig. 8 Ordered Forest Plot of 43 Genes (P value < 0.005) generated using RStudio software version 1.3.1073
Fig. 9 Clustered Heatmap of Genes with P values < 0.005 generated using RStudio software version 1.3.1073