MOTIVATION: Matching both the retention index (RI) and the mass spectrum of an unknown compound against a mass spectral reference library provides strong evidence for a correct identification of that compound. Data on retention indices are, however, available for only a small fraction of the compounds in such libraries. We propose a quantitative structure-RI model that enables putative identifications to be ranked, and those for which the predicted RI falls outside a predefined window to be filtered out.

RESULTS: We constructed multiple linear regression and support vector regression (SVR) models using a set of descriptors selected with a genetic algorithm as the variable selection method. The SVR model is a significant improvement over previous models built for structurally diverse compounds, as it covers a large range of RI values (360-4100) and gives better predictions for isomeric compounds. The hit list reduction varied from 41% to 60% and depended on the size of the original hit list: large hit lists were reduced to a greater extent than small hit lists.

AVAILABILITY: http://appliedbioinformatics.wur.nl/GC-MS.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
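The RI-window filtering described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Hit` structure, the symmetric tolerance around a measured RI, the ranking by spectral match score, and all numeric values are assumptions made for illustration only. The predicted RI values stand in for the output of any structure-RI model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One candidate from a spectral library search (hypothetical structure)."""
    name: str
    match_score: float   # spectral match score, higher is better (assumed)
    predicted_ri: float  # RI predicted by some structure-RI model (assumed)


def filter_and_rank(hits, measured_ri, window):
    """Keep hits whose predicted RI lies within +/- window of the measured RI,
    then rank the survivors by spectral match score, best first."""
    kept = [h for h in hits if abs(h.predicted_ri - measured_ri) <= window]
    return sorted(kept, key=lambda h: h.match_score, reverse=True)


def reduction_percent(n_before, n_after):
    """Percentage of the original hit list removed by the filter."""
    if n_before == 0:
        return 0.0
    return 100.0 * (n_before - n_after) / n_before


# Made-up hit list for an unknown with a measured RI of 1200.
hits = [
    Hit("candidate A", 0.91, 1210.0),
    Hit("candidate B", 0.88, 1540.0),
    Hit("candidate C", 0.86, 1195.0),
    Hit("candidate D", 0.80, 980.0),
    Hit("candidate E", 0.95, 1260.0),
]

filtered = filter_and_rank(hits, measured_ri=1200.0, window=100.0)
for h in filtered:
    print(h.name, h.match_score, h.predicted_ri)
print("reduction:", reduction_percent(len(hits), len(filtered)), "%")
```

In this toy example, candidates B and D fall outside the assumed window of 100 RI units and are dropped, so the hit list shrinks from five entries to three. A sensible window width would in practice depend on the prediction error of the model in use.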