Outlier detection in multivariate analytical chemical data.

W J Egan, S L Morgan.

Abstract

The unreliability of multivariate outlier detection techniques such as Mahalanobis distance and hat matrix leverage has been known in the statistical community for well over a decade. However, only within the past few years has a serious effort been made to introduce robust methods for the detection of multivariate outliers into the chemical literature. Techniques such as the minimum volume ellipsoid (MVE), multivariate trimming (MVT), and M-estimators (e.g., PROP), and others similar to them, such as the minimum covariance determinant (MCD), rely upon algorithms that are difficult to program and may require significant processing times. While MCD and MVE have been shown to be statistically sound, we found MVT unreliable due to the method's use of the Mahalanobis distance measure in its initial step. We examined the performance of MCD and MVT on selected data sets and in simulations and compared the results with two methods of our own devising. Both the proposed resampling by the half-means method and the smallest half-volume method are simple to use, are conceptually clear, and provide results superior to MVT and the current best-performing technique, MCD. Either proposed method is recommended for the detection of multiple outliers in multivariate data.
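As an illustration of the classical Mahalanobis distance that the abstract calls unreliable, here is a minimal sketch, not the authors' code. It computes the distance of each two-dimensional observation from the sample mean using the ordinary (non-robust) mean and covariance. The function name `mahalanobis_2d` and the sample data are hypothetical, chosen only for this example. Because the outliers themselves enter the mean and covariance, they can inflate both estimates and so shrink their own distances, which is the usual explanation for why this measure can miss multiple outliers.

```python
import math


def mahalanobis_2d(points):
    """Classical Mahalanobis distance of each 2-D point from the sample mean.

    Uses the ordinary sample mean and covariance (denominator n - 1),
    i.e. the non-robust estimates.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 sample covariance matrix S.
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] S^{-1} [dx dy]^T, with the 2x2 inverse written out.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(max(d2, 0.0)))
    return distances


if __name__ == "__main__":
    # Hypothetical data: six roughly collinear points plus one distant point.
    data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8),
            (5.0, 10.1), (6.0, 12.0), (20.0, 5.0)]
    for point, dist in zip(data, mahalanobis_2d(data)):
        print(point, round(dist, 3))
```

With these estimates, no distance can exceed (n - 1)/sqrt(n) for n observations, however extreme the point is. Robust methods such as those compared in the abstract instead estimate location and scatter from a subset of the data that is presumed clean.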

Year:  1998        PMID: 21644644     DOI: 10.1021/ac970763d

Source DB:  PubMed          Journal:  Anal Chem        ISSN: 0003-2700            Impact factor:   6.986


  7 in total

1.  Principal components analysis of protein structure ensembles calculated using NMR data.

Authors:  P W Howe
Journal:  J Biomol NMR       Date:  2001-05       Impact factor: 2.835

2.  ANVAS: artificial neural variables adaptation system for descriptor selection.

Authors:  Paolo Mazzatorta; Marjan Vracko; Emilio Benfenati
Journal:  J Comput Aided Mol Des       Date:  2003 May-Jun       Impact factor: 3.686

3.  Prediction of PKCθ inhibitory activity using the Random Forest Algorithm.

Authors:  Ming Hao; Yan Li; Yonghua Wang; Shuwei Zhang
Journal:  Int J Mol Sci       Date:  2010-09-20       Impact factor: 5.923

4.  Detection of gastric cancer with Fourier transform infrared spectroscopy and support vector machine classification.

Authors:  Qingbo Li; Wei Wang; Xiaofeng Ling; Jin Guang Wu
Journal:  Biomed Res Int       Date:  2013-08-13       Impact factor: 3.411

5.  Automated multigroup outlier identification in molecular high-throughput data using bagplots and gemplots.

Authors:  Jochen Kruppa; Klaus Jung
Journal:  BMC Bioinformatics       Date:  2017-05-02       Impact factor: 3.169

6.  Robust methods for population stratification in genome wide association studies.

Authors:  Li Liu; Donghui Zhang; Hong Liu; Christopher Arendt
Journal:  BMC Bioinformatics       Date:  2013-04-19       Impact factor: 3.169

7.  Insight into the Structural Determinants of Imidazole Scaffold-Based Derivatives as TNF-α Release Inhibitors by in Silico Explorations.

Authors:  Yuan Wang; Mingwei Wu; Chunzhi Ai; Yonghua Wang
Journal:  Int J Mol Sci       Date:  2015-08-25       Impact factor: 5.923
