Lin Wang1,2, Xi Xing1,2, Li Chen1,2, Lifeng Yang1,2, Xiaoyang Su1,3, Herschel Rabitz2, Wenyun Lu1,2, Joshua D Rabinowitz1,2. 1. Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, United States. 2. Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States. 3. Department of Medicine, Robert Wood Johnson Medical School, Rutgers University, New Brunswick, New Jersey 08904, United States.
Abstract
Untargeted metabolomics can detect more than 10 000 peaks in a single LC-MS run. The correspondence between these peaks and metabolites, however, remains unclear. Here, we introduce a Peak Annotation and Verification Engine (PAVE) for annotating untargeted microbial metabolomics data. The workflow involves growing cells in 13C and 15N isotope-labeled media to identify peaks from biological compounds and to determine their carbon and nitrogen atom counts. Improved deisotoping and deadducting are enabled by algorithms that integrate positive mode, negative mode, and labeling data. To distinguish metabolites from their fragments, PAVE experimentally measures the response of each peak to weak in-source collision-induced dissociation, which increases the peak intensity for fragments while decreasing it for their parent ions. The molecular formulas of the putative metabolites are then assigned by database searching using both m/z and C/N atom counts. Application of this procedure to Saccharomyces cerevisiae and Escherichia coli revealed that more than 80% of peaks do not label, i.e., are environmental contaminants. More than 70% of the biological peaks are isotopic variants, adducts, fragments, or mass spectrometry artifacts, leaving ∼2000 apparent metabolites across the two organisms. About 650 of these match a known metabolite formula based on m/z and C/N atom counts, and 220 have structures assigned by matching MS/MS and/or retention time to authenticated standards. Thus, PAVE enables systematic annotation of LC-MS metabolomics data, with only ∼4% of peaks annotated as apparent metabolites.
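The ideas in the abstract can be illustrated with a small sketch. It is not the PAVE implementation; it only shows, under stated assumptions, how a labeling-induced mass shift implies an atom count, how a peak's response to in-source CID might be read, and how m/z plus C/N counts narrow a formula search. The function names, tolerances, fold-change threshold, and the toy candidate list (including the artificial decoy) are all assumptions made for illustration. The per-atom mass differences are the standard 13C-12C and 15N-14N values.

```python
# Illustrative sketch only; names, tolerances, and thresholds are assumptions,
# not taken from PAVE.

DELTA_13C = 1.0033548  # mass gained per carbon on full 13C labeling (Da)
DELTA_15N = 0.9970349  # mass gained per nitrogen on full 15N labeling (Da)


def atom_count(mz_unlabeled, mz_labeled, delta, charge=1, tol_da=0.003):
    """Return the integer atom count implied by a labeling shift, or None
    if the shift is not close to a whole multiple of `delta`."""
    shift = (mz_labeled - mz_unlabeled) * abs(charge)
    n = round(shift / delta)
    if n < 0 or abs(shift - n * delta) > tol_da:
        return None
    return n


def classify_cid_response(intensity_off, intensity_on, fold=1.5):
    """Read the response to weak in-source CID: fragments tend to rise,
    parent ions tend to fall. `fold` is a hypothetical threshold."""
    if intensity_on >= fold * intensity_off:
        return "fragment-like"
    if intensity_on * fold <= intensity_off:
        return "parent-like"
    return "unclear"


def match_formulas(mz, n_c, n_n, candidates, ppm=5.0):
    """Keep candidates within `ppm` of the observed m/z whose carbon and
    nitrogen counts equal the counts inferred from labeling.
    Each candidate is (label, ion_mz, carbons, nitrogens)."""
    hits = []
    for label, ion_mz, carbons, nitrogens in candidates:
        within_mass = abs(ion_mz - mz) / ion_mz * 1e6 <= ppm
        if within_mass and (carbons, nitrogens) == (n_c, n_n):
            hits.append(label)
    return hits


# Toy example: a [M-H]- ion of formula C5H9NO4 (e.g. glutamate).
mz_12c14n = 146.045883            # unlabeled
mz_13c = 151.062657               # all five carbons as 13C
mz_15n = 147.042918               # the single nitrogen as 15N

n_c = atom_count(mz_12c14n, mz_13c, DELTA_13C)
n_n = atom_count(mz_12c14n, mz_15n, DELTA_15N)

toy_candidates = [
    ("C5H9NO4", 146.04588, 5, 1),
    ("artificial decoy", 146.04590, 6, 0),  # same mass window, wrong counts
]
hits = match_formulas(mz_12c14n, n_c, n_n, toy_candidates)
print(n_c, n_n, hits)
```

In this toy case the decoy passes the mass tolerance but is rejected by the atom counts, which is the reason the abstract gives for combining m/z with C/N counts in the database search.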