MOTIVATION: DNA microarray technology typically generates many measurements, of which only a relatively small subset is informative for the interpretation of the experiment. To avoid false positive results, it is therefore critical to select the informative genes from the large, noisy data set before the actual analysis. Most currently available filtering techniques are supervised and therefore carry a risk of overfitting. Unsupervised filtering techniques, on the other hand, are either not very efficient or too stringent, as they may mix up signal with noise. We propose to use the multiple probes measuring the same target mRNA as repeated measures to quantify the signal-to-noise ratio of that specific probe set. A Bayesian factor analysis with specifically chosen prior settings, which models this probe-level information, provides an objective feature filtering technique, named informative/non-informative calls (I/NI calls).

RESULTS: Based on 30 real-life data sets (including various human, rat, mouse and Arabidopsis studies) and a spiked-in data set, I/NI calls is shown to be highly effective, with exclusion rates ranging from 70% to 99%. Consequently, it offers a critical solution to the curse of high dimensionality in the analysis of microarray data.

AVAILABILITY: This filtering approach is publicly available as a function implemented in the R package FARMS (www.bioinf.jku.at/software/farms/farms.html).
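The idea of treating the probes of one probe set as repeated measures can be illustrated with a minimal sketch. It is not the FARMS implementation and does not reproduce the Bayesian priors described above. It assumes a simplified one-factor model with equal loadings on standardized probes, in which the mean pairwise probe correlation estimates the squared loading. Under that assumption the signal-to-noise ratio of a probe set with P probes is P·r/(1 − r), and the posterior variance of the hidden factor is 1/(1 + SNR). The function names and the 0.5 cut-off are hypothetical choices made for illustration only.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def posterior_variance(probes):
    """Posterior variance of the hidden factor under an equal-loading one-factor model.

    probes: list of P lists, each holding one log-intensity per array.
    Assumption (not from the source): mean pairwise correlation r estimates the
    squared loading, so SNR = P * r / (1 - r) and var(z | x) = 1 / (1 + SNR).
    """
    p = len(probes)
    if p < 2:
        raise ValueError("need at least two probes per probe set")
    pairs = [(i, j) for i in range(p) for j in range(i + 1, p)]
    r_bar = sum(pearson(probes[i], probes[j]) for i, j in pairs) / len(pairs)
    # Clamp: negative agreement counts as no signal; avoid division by zero at r = 1.
    r_bar = min(max(r_bar, 0.0), 1.0 - 1e-9)
    snr = p * r_bar / (1.0 - r_bar)
    return 1.0 / (1.0 + snr)


def is_informative(probes, threshold=0.5):
    """Hypothetical call rule: informative if the data explain most of the factor."""
    return posterior_variance(probes) < threshold


# Toy probe sets (made-up numbers): three probes measured on several arrays.
informative = [
    [7.0, 7.1, 9.0, 9.2, 7.1, 9.1],
    [6.5, 6.4, 8.6, 8.4, 6.6, 8.5],
    [8.0, 8.2, 9.9, 10.1, 8.1, 10.0],
]
noise = [
    [8.0, 6.0, 8.0, 6.0],
    [8.0, 8.0, 6.0, 6.0],
    [8.0, 6.0, 6.0, 8.0],
]

print(is_informative(informative))  # True: probes rise and fall together
print(is_informative(noise))        # False: probes are mutually uncorrelated
```

Because the sketch never looks at sample labels, the filter stays unsupervised. A probe set is kept only when its probes agree with one another across arrays.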