Tobias Kind, Tim Leamy, Julie A Leary, Oliver Fiehn.
Abstract
BACKGROUND: Modern chemistry laboratories operate with a wide range of software applications under different operating systems, such as Windows, LINUX or Mac OS X. Instead of installing software on different computers, it is possible to install those applications on a single computer using virtual machine software. Software platform virtualization allows a single host operating system to execute multiple guest operating systems on the same computer. We apply and discuss the use of virtual machines in chemistry research and teaching laboratories.
Year: 2009 PMID: 20150997 PMCID: PMC2820496 DOI: 10.1186/1758-2946-1-18
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Figure 1. Number of scientific papers and citations about virtualization and virtual machines. Source: ISI Web of Science, January 2009.
Figure 2. The virtual machine software installed on a host operating system allows the use of different operating systems on a single computer system. A Macintosh system could run native Windows or LINUX software or even multiple instances of the same operating system. All virtual machines can communicate with each other and are allowed to use all hardware resources of the computer, such as graphics cards, DVD drives and USB ports. (Logo sources: Wikipedia, TUX mascot: Larry Ewing).
List of common desktop virtual machines for Windows, LINUX and Mac OS X operating systems.
| Host OS | Virtualization Software | Windows as Guest OS | LINUX as Guest OS | Mac OS X as Guest OS |
|---|---|---|---|---|
| Windows OS | VMware Workstation | yes | yes | no |
| | Microsoft Virtual PC | yes | yes | no |
| | Sun VirtualBox | yes | yes | no |
| LINUX OS | VMware | yes | yes | no |
| | Citrix XEN | yes | yes | no |
| | Virtual Iron | yes | yes | no |
| Mac OS | VMware Fusion | yes | yes | yes* |
| | Parallels Server | yes | yes | yes* |
Mac OS X can in principle run on any host, but it is not officially supported. A star (*) denotes license issues (January 2009).
Figure 3. A Windows Vista host using Sun's VirtualBox runs three Ubuntu Linux, one Windows XP, one Windows Vista and one Windows Server guest operating system simultaneously. The hardware is an Intel Nehalem Core i7 950 quad core CPU (3 GHz) with 12 GByte RAM and 4 hard disks in RAID10. The system virtualizes a total of 41 CPUs.
Figure 4. Server consolidation: A single powerful computer runs multiple virtual machines and serves as compute server, backup server and web server. Such a setup improves maintenance efficiency and reduces hardware costs. The right picture shows a production server with a XEN Virtual Machine Monitor and 17 independently running systems (Actual VM names were replaced; Picture source: Zhi-Wei Lu; UC Davis Genome Center Bioinformatics Core).
List of system statistics and micro-benchmarks comparing an original Windows XP performance and Windows XP inside a virtual machine (Guest OS).
| ID | Task | Windows XP Host | Windows XP Guest | Performance of Guest VM |
|---|---|---|---|---|
| | System benchmarks | | | |
| 1 | Operating system start time | 2 min | 1 min | 50% less time |
| 2 | Size of Windows system folder | 6.95 GByte | 3.01 GByte | 57% less space |
| 3 | RAM memory requirement (idle) | 760 MByte | 150 MByte | 80% less RAM |
| 4 | Average hard disk transfer rate | 180 MByte/sec | 127 MByte/sec | 70% |
| | | | | |
| 5 | NIST SciMark 2.0a (JAVA 1.6 Server) | score of 661 | score of 621 | 94% |
| 6 | Molgen Demo -- count all 23862255 isomers of C12H12 | 42.23 sec | 46.20 sec | 91% |
| 7 | CDK Descriptor GUI -- Kier & Hall SMARTS for all C8H16O2 isomers | 100 sec | 95 sec | 95% |
| 8 | Seven Golden Rules -- generate all 28008691 formulas below 1000 Da | 42 sec | 42 sec | 100% |
| | | | | |
| 9 | ChemAxon Marvin -- calculate all stereoisomers of C8H16O2 | 21 sec | 42 sec | 50% |
| 10 | MZMine2 -- chromatographic alignment of LC-MS runs | 70 sec | 130 sec | 54% |
The comparison is between a two-year-old Windows XP installation (host OS) and a freshly installed Windows XP system (guest OS) running on Microsoft Virtual PC 2007 on a Dual Opteron 254 (2.8 GHz).
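The percentages in the last column can largely be reproduced from the two measurement columns. The following is a minimal sketch, not the authors' procedure: it assumes (inferred from the values, not stated in the source) that scores are compared as guest/host, run times as host/guest, and that rows 1 to 3 report the relative saving. The function names are hypothetical.

```python
def relative_performance(host, guest, higher_is_better):
    """Guest result as a percentage of the host result (assumed convention)."""
    ratio = guest / host if higher_is_better else host / guest
    return round(100 * ratio)


def saving(host, guest):
    """Percentage of time, disk space or RAM saved by the guest."""
    return round(100 * (1 - guest / host))


# Row 5: NIST SciMark score, higher is better
print(relative_performance(661, 621, higher_is_better=True))
# Row 6: Molgen run time in seconds, lower is better
print(relative_performance(42.23, 46.20, higher_is_better=False))
# Row 2: size of the Windows system folder in GByte
print(saving(6.95, 3.01))
```

Under these assumptions the three calls give 94, 91 and 57, matching rows 5, 6 and 2 of the table.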
Cheminformatics and mass spectrometry software course taught as part of an experimental mass spectrometry class. Some of the software was deployed using WIN XP virtual machines in the computer laboratory.
| General course | Topics covered |
|---|---|
| General Introduction | Fighting computer illiteracy -- bits, bytes, CPUs |
| | Regular expressions as emergency helpers |
| | Structures -- resonance forms, stereoisomers, tautomers |
| | Mass spectrometry publications via Yahoo Pipes |
| | |
| | Mass spectral data formats and conversion of mass spectra |
| | Open exchange formats for mass spectra (mzData, mzXML, JCAMP-DX, netCDF) |
| | Structure handling software and structure conversion (SMILES/SMARTS, SDF/MOL, InChI/InChIKey, PDB, CML) |
| | Chemical structure handling (Instant-JChem, BioClipse) |
| | |
| | Mass spectral databases (EI, ESI, APCI) and search algorithms (PBM, dot product, mass spectral trees) and library conversion |
| | Proteomics data analysis (database search, de-novo sequencing, hybrid methods) |
| | Molecule search (exact search, substructure search, similarity search, Markush search) |
| | Databases (PubChem, SciFinder, Beilstein, BlueObelisk) |
| | |
| | Resolving power, mass accuracy, isotopic pattern, charge states, charge state deconvolution |
| | Molecular formula space of small molecules |
| | Isotopic abundances as orthogonal filter for elemental compositions |
| | Molecular Isomer Generators, substructure predictions, simulation of mass spectra |
| | |
| | Automatic peak detection |
| | Peak picking and mass spectral deconvolution |
| | Comprehensive GCxGC-TOF-MS |
| | |
| | Deconvolution and evaluation of LC-MS data |
| | Adduct removal and detection during ESI-LC-MS |
| | Seven Golden Rules for generation of possible molecular formulas |
| | Structural isomer lookup example in ChemSpider |
| | |
| | Dendral -- Artificial intelligence and mass spectrometry |
| | Prediction of the isomer substructures from a given mass spectrum |
| | Simulation of mass spectra from given isomer structures |
Figure 5. The Windows Vista Ultimate host with Sun's VirtualBox virtualizes an Ubuntu Linux system with 32 CPU threads (left side) and a Windows Server system with 10 CPU threads (right side). The underlying host hardware is a quad core Nehalem Core i7 950 CPU with only 8 threads. Both guest systems work without problems, but they fully exhaust all underlying hardware resources when all parallel threads are in use.
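The degree of oversubscription in this setup follows directly from the numbers in the caption. A minimal sketch; the function name and the term "overcommit ratio" are illustrative and do not come from the source.

```python
def overcommit_ratio(guest_threads, host_threads):
    """Virtual CPU threads assigned to all guests per hardware thread of the host."""
    return sum(guest_threads) / host_threads


# Figure 5: an Ubuntu guest with 32 threads and a Windows Server guest
# with 10 threads on a host CPU that offers 8 hardware threads.
ratio = overcommit_ratio([32, 10], host_threads=8)
print(f"{ratio:.2f} virtual threads per hardware thread")
```

A ratio above 1 means the guests can request more parallel threads than the hardware provides, which is consistent with the caption's remark that the host is fully exhausted once all guest threads are busy.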
Figure 6. Hands-on labs: Virtual machines are used for teaching cheminformatics and mass spectrometry software classes. The hands-on class provides every participant with the same software and setup and hence avoids installation and configuration problems. All required software is installed and tested on a single virtual machine, and this software image is later deployed to all computers in the classroom. Right picture: Screenshot of the teaching VM with WIN XP and the AMDIS and MarvinView software running.
Figure 7. Popularity of cheminformatics vs. bioinformatics based on site-specific Google hit counts across 325 universities (US) with research chemistry faculty. For each of the 325 universities a site-specific Google search was performed and mapped on the graph. For example, cheminformatics site:berkeley.edu returned 93 hits and bioinformatics site:berkeley.edu returned 3620 hits. Because UC Berkeley hosts more bioinformatics-related material, it is safe to assume that bioinformatics is more popular than cheminformatics at UC Berkeley. Around 100 universities had no occurrence of the words cheminformatics or chemoinformatics on their global university websites (scores combined). Search date: August 2009.
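The site-specific queries and the comparison of hit counts described in this caption can be sketched as follows. This is an illustration only: the helper names are hypothetical, no search is actually performed, and the hit counts are the UC Berkeley values quoted in the caption.

```python
def site_query(term, domain):
    """Build a site-restricted query such as 'cheminformatics site:berkeley.edu'."""
    return f"{term} site:{domain}"


def hit_ratio(bio_hits, chem_hits):
    """Bioinformatics hits per cheminformatics hit; None if there are no cheminformatics hits."""
    return bio_hits / chem_hits if chem_hits else None


for term in ("cheminformatics", "chemoinformatics", "bioinformatics"):
    print(site_query(term, "berkeley.edu"))

# Hit counts quoted in the caption for UC Berkeley
print(round(hit_ratio(3620, 93), 1))
```

Returning `None` for a zero denominator covers the roughly 100 universities for which neither spelling of cheminformatics produced a hit.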