Marc Kirchner (1), Buote Xu, Hanno Steen, Judith A J Steen. (1) Proteomics Center, Department of Pathology, Children's Hospital Boston, Harvard Medical School, Boston, MA 02115, USA. marc.kirchner@childrens.harvard.edu
Abstract
MOTIVATION: Algorithms for sparse data require fast search and subset selection capabilities to determine point neighborhoods. Space-partitioning data structures are a natural data representation for such cases. However, the associated range queries assume noise-free observations and cannot take into account the observation-specific uncertainty estimates that are present in, e.g., modern mass spectrometry data. To accommodate the inhomogeneous noise characteristics of sparse real-world datasets, point queries need to be reformulated as box intersection queries, where box sizes correspond to the uncertainty region of each observation.
RESULTS: This contribution introduces libfbi, a header-only template implementation in standard C++ for fast box intersection in an arbitrary number of dimensions, with arbitrary data types in each dimension. The implementation is applied to a data aggregation task on state-of-the-art liquid chromatography/mass spectrometry data, where it shows excellent run time properties.
AVAILABILITY: The library is available under an MIT license and can be downloaded from http://software.steenlab.org/libfbi.
CONTACT: marc.kirchner@childrens.harvard.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
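To make the reformulation from point queries to box intersection queries concrete, the following is a minimal sketch in standard C++. It does not use libfbi and does not reproduce its API or its fast algorithm. It is a naive quadratic baseline, and the names `Box`, `makeBox`, `intersects` and `allIntersections` are hypothetical. The two dimensions (an m/z interval stored as `double` and a scan-number interval stored as `int`) are assumptions chosen to illustrate per-dimension data types and observation-specific tolerances.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical observation type: one closed interval per dimension.
// The dimensions intentionally use different data types.
struct Box {
    double mzLo, mzHi;   // uncertainty interval in m/z
    int scanLo, scanHi;  // uncertainty interval in scan number
};

// Build the uncertainty box around a measured point from
// observation-specific tolerances (assumed to be non-negative).
inline Box makeBox(double mz, double mzTol, int scan, int scanTol) {
    assert(mzTol >= 0.0 && scanTol >= 0);
    return Box{mz - mzTol, mz + mzTol, scan - scanTol, scan + scanTol};
}

// Two axis-aligned boxes intersect if and only if their intervals
// overlap in every dimension.
inline bool intersects(const Box& a, const Box& b) {
    return a.mzLo <= b.mzHi && b.mzLo <= a.mzHi &&
           a.scanLo <= b.scanHi && b.scanLo <= a.scanHi;
}

// Naive O(n^2) reference: for each box, the indices of all other boxes
// that intersect it. A fast implementation would avoid testing all pairs.
inline std::vector<std::vector<std::size_t>>
allIntersections(const std::vector<Box>& boxes) {
    std::vector<std::vector<std::size_t>> adjacency(boxes.size());
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        for (std::size_t j = i + 1; j < boxes.size(); ++j) {
            if (intersects(boxes[i], boxes[j])) {
                adjacency[i].push_back(j);
                adjacency[j].push_back(i);
            }
        }
    }
    return adjacency;
}
```

Calling `allIntersections` on a vector of boxes built with `makeBox` yields one neighbor list per observation. Observations with large tolerances acquire larger neighborhoods than precise ones, which is the behavior that a fixed-radius point query cannot express. The adjacency-list output is a design choice of this sketch, picked because it feeds directly into downstream grouping steps such as connected-component labeling.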