MOTIVATION: Sampling the conformational space of biological macromolecules generates large and highly complex data sets. Data-mining techniques, such as clustering, can extract meaningful information from them. Among these techniques, the self-organizing map (SOM) algorithm has shown great promise, in particular because its computation time rises only linearly with the size of the data set. Whereas SOMs are generally used with few neurons, we investigate here their behavior with large numbers of neurons.

RESULTS: We present a Python library implementing the full SOM analysis workflow. Large SOMs can readily be applied to heavy data sets and, coupled with visualization tools, they have very interesting properties. Descriptors for each conformation of a trajectory are calculated and mapped onto a 3D landscape, the U-matrix, which reports the distance between neighboring neurons. To delineate clusters, we developed the flooding algorithm, which hierarchically identifies local basins of the U-matrix from the global minimum to the maximum.

AVAILABILITY AND IMPLEMENTATION: The Python implementation of the SOM library is freely available on GitHub: https://github.com/bougui505/SOM.

CONTACT: michael.nilges@pasteur.fr or guillaume.bouvier@pasteur.fr

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
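The two building blocks named in the abstract, SOM training and the U-matrix, can be illustrated with a minimal sketch. This is a generic textbook online SOM written with the Python standard library only. It is not the API of the bougui505/SOM library: the function names `train_som` and `u_matrix`, the linear decay schedules, the Gaussian neighborhood, and the non-periodic four-neighbor grid are all assumptions made for illustration. The flooding algorithm is not reproduced here, since the abstract does not give its details.

```python
import math
import random


def train_som(data, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    """Toy online SOM (hypothetical helper, not the library's API).

    data: sequence of equal-length descriptor vectors.
    Returns weights[i][j], the vector held by neuron (i, j).
    """
    rng = random.Random(seed)
    # Initialise each neuron with a copy of a randomly chosen input vector.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # assumed linear decay
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # assumed shrinking radius
        x = rng.choice(data)
        # Best-matching unit: the neuron closest to x in descriptor space.
        bi, bj = min(
            ((i, j) for i in range(rows) for j in range(cols)),
            key=lambda ij: math.dist(weights[ij[0]][ij[1]], x),
        )
        # Pull every neuron towards x, weighted by its grid distance to the BMU.
        for i in range(rows):
            for j in range(cols):
                h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2.0 * sigma ** 2))
                w = weights[i][j]
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


def u_matrix(weights):
    """Mean distance from each neuron to its 4-connected grid neighbours.

    Assumes a non-periodic grid with at least two neurons.
    """
    rows, cols = len(weights), len(weights[0])
    umat = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            neighbours = [
                (i + di, j + dj)
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= i + di < rows and 0 <= j + dj < cols
            ]
            umat[i][j] = sum(
                math.dist(weights[i][j], weights[a][b]) for a, b in neighbours
            ) / len(neighbours)
    return umat
```

Each training step touches one input vector and every neuron once, so the cost grows linearly with the number of vectors presented, which is the scaling property the abstract highlights. Calling `u_matrix(train_som(descriptors, 20, 20))` on a list of per-conformation descriptor vectors yields a 2D grid of heights: low values mark regions where neighboring neurons are similar (candidate basins), and high values mark the barriers between them.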
Authors: Mathias Ferber; Jan Kosinski; Alessandro Ori; Umar J Rashid; María Moreno-Morcillo; Bernd Simon; Guillaume Bouvier; Paulo Ricardo Batista; Christoph W Müller; Martin Beck; Michael Nilges. Journal: Nat Methods. Date: 2016-04-25. Impact factor: 28.547.
Authors: Andrea Cassioli; Benjamin Bardiaux; Guillaume Bouvier; Antonio Mucherino; Rafael Alves; Leo Liberti; Michael Nilges; Carlile Lavor; Thérèse E Malliavin. Journal: BMC Bioinformatics. Date: 2015-01-28. Impact factor: 3.169.
Authors: Emmanuel Bresso; Diana Fernandez; Deisy X Amora; Philippe Noel; Anne-Sophie Petitot; Maria-Eugênia Lisei de Sa; Erika V S Albuquerque; Etienne G J Danchin; Bernard Maigret; Natália F Martins. Journal: Molecules. Date: 2019-10-22. Impact factor: 4.411.