
Creation and filtering of a recurrent spectral library of CHO cell metabolites and media components.

Kelly H Telu1, Ramesh Marupaka1, Nirina R Andriamaharavo1, Yamil Simón-Manso1, Yuxue Liang1, Yuri A Mirokhin1, Tallat H Bukhari1, Renae J Preston2, Lila Kashi2, Zvi Kelman2, Stephen E Stein1.   

Abstract

This paper reports the first implementation of a new type of mass spectral library for the analysis of Chinese hamster ovary (CHO) cell metabolites that allows users to quickly identify most compounds in any complex metabolite sample. We also describe an annotation methodology developed to filter out artifacts and low-quality spectra from recurrent unidentified spectra of metabolites. CHO cells are commonly used to produce biological therapeutics. Metabolic profiles of CHO cells and media can be used to monitor process variability and look for markers that discriminate between batches of product. We have created a comprehensive library of both identified and unidentified metabolites derived from CHO cells that can be used in conjunction with tandem mass spectrometry to identify metabolites. In addition, we present a workflow that can be used for assigning confidence to a NIST MS/MS Library search match based on prior probability of general utility. The goal of our work is to annotate and, when possible, identify all liquid chromatography-mass spectrometry generated metabolite ions, as well as to create automatable library building and identification pipelines for use by others in the field.
© 2021 The Authors. Biotechnology and Bioengineering Published by Wiley Periodicals LLC. This article has been contributed to by US Government employees and their work is in the public domain in the USA.

Keywords:  Chinese hamster ovary cells; global metabolite profiling; liquid chromatography-tandem mass spectrometry; nontargeted metabolomics; recurrent unidentified spectra

Year:  2021        PMID: 33404064      PMCID: PMC8048470          DOI: 10.1002/bit.27661

Source DB:  PubMed          Journal:  Biotechnol Bioeng        ISSN: 0006-3592            Impact factor:   4.530


INTRODUCTION

Chinese hamster ovary (CHO) cells are the predominant host cells for monoclonal antibody (mAb) production (Kunert & Reinhart, 2016). Metabolomics provides information on cellular phenotypes. Several metabolites have been demonstrated to be biomarkers of CHO cell status (Mohmad‐Saberi et al., 2013). Metabolomic analysis of CHO cells has primarily been used in process or media/feed development and has predominantly focused on targeted metabolite analysis of major metabolites, although there are several studies that utilized global metabolite analysis (Stolfa et al., 2018). A comprehensive assessment of CHO cell metabolic profiles could lead to improvements in product yield and quality by providing further understanding of the CHO cell metabolome (Stolfa et al., 2018). Mass spectral libraries have been extremely popular for more than 40 years for identifying volatile chemical compounds using gas chromatography‐mass spectrometry (GC‐MS). They are used to locate the most similar spectra in the reference library and present the compounds that generated them in a “hit list” sorted by their similarity to the acquired spectrum (S. Stein, 2012). Liquid chromatography‐mass spectrometry (LC‐MS) is a widely practiced method for identifying the chemical components in metabolomics (Gowda & Djukovic, 2014). For confident metabolite identifications, liquid chromatography‐tandem mass spectrometry (LC‐MS/MS) can be performed and the fragmentation pattern can be compared to an MS/MS spectral library. Commercial MS/MS libraries that contain curated spectra (the NIST Tandem [MS/MS] Mass Spectral Library and the Wiley MSforID Library) as well as free libraries that facilitate data sharing (MassBank, MassBank of North America [MoNA], LipidBlast, METLIN, mzCloud, GNPS, etc.) are available and have been reviewed recently (Kind et al., 2018). These libraries contain experimental spectra of known compounds; spectra of unidentified compounds are not documented there.
Other libraries such as LipidBlast, Greazy/LipidLama, CFM‐ID, and so forth are based on in silico prediction of the spectra of known or predicted metabolites (Kind et al., 2018). A comprehensive library of both known and unidentified CHO cell metabolites will be beneficial to the field of CHO cell metabolite analysis. In addition to producing the NIST MS/MS Library, the NIST Mass Spectrometry Data Center (MSDC) has recently begun creating material‐oriented libraries that are generated from the analysis of complex mixtures such as human plasma and urine (https://chemdata.nist.gov/dokuwiki/doku.php?id=chemdata:arus) to address the issue of unknown metabolites (metabolites not identified by library searching), identify cross‐platform metabolite signatures, and catalogue all spectra associated with a particular material of interest (Mallard et al., 2014; Remoroza et al., 2018; Simon‐Manso et al., 2013; Simon‐Manso et al., 2019; S. Stein, 2012; Telu et al., 2016). These material‐oriented libraries contain recurrent spectra (spectra that occur repeatedly in the sample) for all detectable metabolites, both known and unknown, which are processed to produce high‐quality consensus spectra for the library. The MSDC has also created spectral libraries (Dong et al., 2018; Dong, Yan, Liang, & Stein, 2016) of the NISTmAb, a humanized IgG1κ Monoclonal Antibody Reference Material (RM 8671; https://www.nist.gov/programs-projects/nist-monoclonal-antibody-reference-material-8671). Until recently, with the development of omics technologies, the use of tandem mass spectral libraries in biomedical and biomanufacturing applications was very limited. To date, there are no reports of libraries being used for optimizing biomanufacturing processes and very few of their use for discovering new metabolic pathways. Here, we implemented recurrent spectral libraries for use in CHO cell metabolite analysis that allow users to quickly identify most compounds in any complex metabolite sample.
We also developed an annotation strategy for these libraries to filter out artifacts and low‐quality spectra from recurrent unidentified spectra of metabolites. These libraries are focused on metabolite analysis; however, small peptides that extract along with the metabolites are also present. Furthermore, the limited coverage of tandem libraries is somewhat ameliorated by the use of the recently developed hybrid search (Burke et al., 2017; Cooper et al., 2019; Moorthy et al., 2017), which can identify compounds similar to, but not present in, the library. The recurrent spectral library is unique in that it can be used to determine whether an ion has been seen before in other analyses and to assign a class identification to compounds that are neither found in a library nor commercially available, and in that it enables library evolution based upon feedback from users. As more experiments are done, the library can continue to grow in coverage. The library and the associated metabolite identifications are freely available for download for use in the analysis of CHO cell metabolites by LC‐MS/MS. Although this study was demonstrated in CHO cells, the developed methods for filtering spectra and assigning match confidence can be applied not only to other cell types but also to other metabolomics studies. In addition, work is currently underway at NIST to create a metabolite identification pipeline and graphical user interface (GUI) that those in the biomanufacturing community can use to implement their own libraries.

EXPERIMENTAL METHODS*

For the coverage of metabolites to be broad, CHO cells were extracted by four different methods available in the literature: (1) 50% acetonitrile in water (Dietmair et al., 2012), (2) methanol (Dietmair, Timmins, Gray, Nielsen, & Kromer, 2010; Sellick et al., 2011), (3) methanol/methyl tert‐butyl ether (MTBE)/water, and (4) methanol/dichloromethane (DCM)/water (Matyash et al., 2008). Metabolites were separated with three different LC methods (reversed‐phase [C18], hydrophilic interaction liquid chromatography [HILIC], and a reversed‐phase method optimized for lipids [lipid C18]), and analyzed in positive and negative ionization mode with both higher‐energy C‐trap dissociation (HCD**) over a range of collision energies and ion trap (IT) collision‐induced dissociation. Media samples (fresh and spent) were resuspended in two different solvents (50% acetonitrile or pure methanol) after protein precipitation, separated with two different LC methods (C18 and HILIC), and analyzed with the same breadth of methods as the CHO cell metabolites.

Sample preparation

CHO‐S cells (Thermo Fisher Scientific) were grown in ProCHO5 protein‐free medium (Lonza) supplemented with 4 mmol/L L‐glutamine (Thermo Fisher Scientific). CHO cells and spent media were harvested and metabolite extractions were performed. Protein precipitation was performed on the media with 80% (vol/vol) methanol. After drying and before analysis, media samples were resuspended in either pure methanol or 50% acetonitrile (vol/vol). Metabolites were extracted by four different methods: 50% acetonitrile in water, methanol, methanol/methyl tert‐butyl ether (MTBE)/water, and methanol/dichloromethane (DCM)/water. Additional details regarding sample preparation can be found in the supporting information.

LC‐MS/MS analysis

The metabolites were separated by three different liquid chromatography methods. Extracts containing polar metabolites (50% acetonitrile, methanol, lower phase for the methanol/MTBE/water extraction, and upper phase for the methanol/DCM/water extraction) were separated by both C18 and HILIC. The organic phases of the two lipid extractions were separated by a lipid C18 method. Fresh and spent media samples were separated by C18 and HILIC. These separations were coupled to either a Q Exactive or Orbitrap Fusion Lumos (Thermo Fisher Scientific). The data were collected in positive and negative ionization mode with data‐dependent MS/MS acquisition. To provide as many spectra as possible for the library, HCD spectra were collected over a range of normalized collision energies from 10 to 50 using nitrogen as the collision gas. In addition, low‐resolution IT and high‐resolution IT spectra were acquired on the Lumos at a normalized collision energy of 35% using helium as the collision gas. The collision gases used were those recommended by the equipment manufacturer. Additional details regarding analysis can be found in the supporting information.

Data analysis

Data were analyzed to produce recurrent spectral libraries as reported previously (Telu et al., 2016). Briefly, all data were processed with the NIST MSQC pipeline (see below under “Annotation of Spectra” for a description of the pipeline). Recurrent spectra were exported from the output of the pipeline with a perfect score cutoff (1.0) to ensure all spectra (even identified ones) were included. Following this, consensus spectra were created from the experimental data using in‐house developed software after grouping the data by polarity, fragmentation type (HCD or IT), and collision energy. The similarity of the spectra was based on the precursor and the dot product (Yang et al., 2014). Only similar spectra (a cluster) were used to create the consensus spectrum. Spectra dissimilar to the given cluster were placed in another cluster or, if unique, were ignored. After the libraries were created, the consensus spectra were searched against the NIST17 Library to obtain metabolite identifications. In addition, an annotation strategy was developed following manual evaluation of a representative data file. The data file analyzed was a 50% acetonitrile extraction that was separated on a C18 column and fragmented at HCD 20. The file was searched against the NIST17 Library with the NIST MSPepSearch software to provide tandem mass spectral library identifications as discussed below.
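
As an illustration of the grouping step described above, the following Python sketch clusters spectra by precursor m/z and dot‐product (cosine) similarity and averages each cluster into a consensus spectrum. It is not the in‐house NIST software: the function names, the 0.01 m/z bin width, the precursor tolerance, the 0.7 similarity threshold, and the greedy assignment rule are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return dict(binned)


def dot_product(a, b):
    """Cosine similarity of two binned spectra (0 = no shared peaks, 1 = identical)."""
    shared = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return shared / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_spectra(spectra, precursor_tol=0.01, min_score=0.7):
    """Greedy clustering (hypothetical rule): a spectrum joins the first cluster
    whose seed has a matching precursor m/z and a dot product >= min_score."""
    clusters = []
    for precursor_mz, peaks in spectra:
        binned = bin_spectrum(peaks)
        for cluster in clusters:
            same_precursor = abs(cluster["precursor_mz"] - precursor_mz) <= precursor_tol
            if same_precursor and dot_product(cluster["seed"], binned) >= min_score:
                cluster["members"].append(binned)
                break
        else:
            # No similar cluster found: start a new one seeded by this spectrum.
            clusters.append({"precursor_mz": precursor_mz, "seed": binned, "members": [binned]})
    return clusters


def consensus(cluster, min_members=2):
    """Average binned intensities over a cluster; unique spectra return None (ignored)."""
    members = cluster["members"]
    if len(members) < min_members:
        return None
    bins = set().union(*members)
    return {k: sum(m.get(k, 0.0) for m in members) / len(members) for k in bins}
```

In this sketch, spectra would first be partitioned by polarity, fragmentation type, and collision energy, and `cluster_spectra` would be called once per partition. A production implementation would likely weight peaks and handle m/z tolerance more carefully than fixed binning does.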

RESULTS AND DISCUSSION

Identification of metabolites

The first goal of this study was to collect, organize, and, to the degree possible, identify all measurable tandem mass spectra in CHO cell metabolite and growth media extracts acquired using electrospray LC‐MS/MS methods. To do this, we developed an HCD and IT fragmentation spectral library containing consensus spectra in both positive and negative ionization mode using a spectral clustering method developed in‐house. The libraries contain data from both CHO cell metabolite analyses and media analyses and are annotated to show the origin of the spectra. In addition to metabolites, peptides that are co‐extracted are also present in the libraries, although these are not the focus of the work. The resulting HCD recurrent spectral libraries contain 109,601 and 61,677 spectra for the positive and negative ionization mode libraries, respectively. The IT libraries contain 15,703 and 12,499 spectra for the positive and negative ionization mode libraries, respectively. IT spectra are similar to low‐energy HCD spectra, except for their low mass cut‐off at about one‐third of the precursor mass and their higher degree of fragmentation at these low energies; fragment ions in IT spectra are therefore more intense than those in low‐energy HCD spectra. Note that low‐energy spectra are generally easier to interpret than higher‐energy spectra due to their simpler mechanisms. Additional information about the libraries, including collision energies, precursor ion types, and source (CHO cell, media, or both) of the consensus spectra can be found in the supporting information. The results of CHO cell metabolite and media analyses are highly orthogonal, as only 8%–13% of the consensus spectra in the libraries originate from both samples. The overlap would likely be higher if a chemically defined medium were used. To identify spectra, we searched the consensus spectra generated for the recurrent spectral libraries against the NIST17 MS/MS library (Yang et al., 2014; Yang et al., 2017).
To compare our results with those previously published in the literature, the CHO cell metabolite identifications were summarized and compared to a literature review of CHO cell metabolite identifications. To summarize the identifications, we sorted the library match identifications by name and library match score. We kept only the top‐scoring hit of each identification and then manually validated the library match result. Any poor matches were removed. In addition, because no comprehensive CHO cell metabolite library is available, we curated the data to remove identifications that have not previously been observed as endogenous metabolites by searching for each identification in the Human Metabolome Database (HMDB) (Wishart et al., 2013, 2018), PubChem (Kim et al., 2019), or the LIPID MAPS structure database (Sud et al., 2007). If there was no information on whether an identification was a metabolite, it was not removed. Spreadsheet 1 of supporting information contains all the library match identifications and can be mined for new or unexpected metabolites by experts in CHO cell metabolism. Our curated list resulted in 365 CHO cell metabolites (the majority identified by multiple ions or in multiple libraries) and an additional 304 di‐ or tri‐peptides. We split out the peptides into a separate list because they are likely less interesting than other metabolites. Metabolites identified are reported in Table 1. A literature search resulted in a list of 232 metabolites. Identifications made by HPLC, GC‐MS, MALDI‐MS, and LC‐MS were included. Of these 232 reported metabolites, we identified 43% in our data. Of those that were not identified, the majority (66%) were represented in the NIST17 library but not identified in our experiments, possibly because they were below the detection limit. The remaining literature identifications not present in the NIST17 library that are compatible with analysis by LC‐MS can be added to future versions of the NIST MS/MS library.
Lists of identified metabolites summarized from our data as well as the literature review can be found in Spreadsheets 2 and 3 of the supporting information, respectively. These spreadsheets also contain information demonstrating the percentages reported herein.
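
The summarization step described above (sort the library matches by name and score, then keep the top‐scoring hit for each name) can be sketched in Python as follows. The record fields `name`, `score`, and `library` are hypothetical, and the sketch does not attempt to represent the manual validation or the database curation that followed.

```python
def top_hit_per_name(hits):
    """Keep the highest-scoring library match for each compound name.

    `hits` is a list of dicts with hypothetical keys "name" and "score".
    """
    best = {}
    for hit in hits:
        name = hit["name"]
        if name not in best or hit["score"] > best[name]["score"]:
            best[name] = hit
    # Return the surviving hits sorted by name, as in a summary table.
    return sorted(best.values(), key=lambda h: h["name"])
```

A caller would then review each surviving hit by hand before accepting it, which is the step the authors describe as manual validation.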
Table 1

Compounds identified in the Recurrent Spectral Library created from CHO cell metabolite extracts

Metabolite | Library | PubChem ID
10Z‐Nonadecenoic acid | HCD‐Pos | 5312513
1‐Methylnicotinamide | HCD‐Pos | 457
1‐Methylxanthine | HCD‐Pos | 80220
2,3‐Dehydro‐2‐deoxy‐N‐acetylneuraminic acid | HCD‐Pos, IT‐Pos | 65309
2,3‐Diaminopropionic acid | HCD‐Pos | 364
2‐Arachidonyl glycerol ether | HCD‐Pos | 6483057
2'‐Deoxyguanosine 5'‐monophosphate | HCD‐Neg, IT‐Neg | 645
2‐hydroxy‐2‐(4‐hydroxy‐3‐methoxyphenyl)acetic acid | HCD‐Pos | 1245
2‐Hydroxyhexadecanoic acid | HCD‐Neg | 92836
2‐Hydroxyphenethylamine | HCD‐Pos | 1000
2‐Methylbutyrylcarnitine | HCD‐Pos | 6426901
2‐Methylhippuric acid | HCD‐Pos | 91637
2'‐O‐Methyladenosine | HCD‐Neg, IT‐Neg | 102213
2‐Phospho‐d‐glyceric acid | HCD‐Neg, IT‐Neg | 59
3,4‐Dihydroxymandelic acid | HCD‐Pos | 85782
3'‐AMP | HCD‐Neg, HCD‐Pos | 41211
3'‐CMP | HCD‐Neg, HCD‐Pos, IT‐Pos | 66535
3‐Deoxy‐d‐glycero‐d‐galacto‐2‐nonulosonic acid | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 123691
3‐Hexenedioic acid | HCD‐Pos | 107550
3‐Oxoglutaric acid | HCD‐Pos | 68328
3‐Phosphoglycerate | HCD‐Neg | 724
3‐Sialyl‐N‐acetyllactosamine | HCD‐Neg, HCD‐Pos, IT‐Neg | 4150746
4‐Coumaryl alcohol | HCD‐Pos | 5280535
4‐Hydroxybutyric acid | HCD‐Neg | 10413
4‐Hydroxyglutamic acid | HCD‐Neg | 439902
5α‐cholest‐7‐en‐3β‐ol | HCD‐Pos, IT‐Pos | 420
5‐Aminovaleric acid | HCD‐Pos | 138
5‐Hydroxyindole | HCD‐Pos | 16054
5‐Hydroxylysine | HCD‐Pos | 439437
5'‐Methylthioadenosine | HCD‐Pos, IT‐Pos | 149
5‐Phosphonatoribosyl 1‐pyrophosphate | HCD‐Neg | 1041
5‐Thymidylic acid | HCD‐Neg, IT‐Neg | 1139
6‐Phosphogluconic acid | HCD‐Pos | 91493
7‐Ketocholesterol | HCD‐Pos | 91474
7‐Methylguanine | HCD‐Pos | 135398679
7‐Methylguanosine | HCD‐Pos | 135445750
9,10‐Epoxyoctadecenoic acid | HCD‐Pos | 5283018
Acetylcholine | HCD‐Pos | 187
Acetyl‐CoA | HCD‐Neg, HCD‐Pos, IT‐Neg | 6302
Adenine | HCD‐Pos | 190
Adenosine | HCD‐Pos | 60961
Adenosine 2',3'‐cyclic phosphate | HCD‐Neg, HCD‐Pos, IT‐Pos | 2024
Adenosine 2'‐phosphate | HCD‐Neg, HCD‐Pos | 94136
Adenosine diphosphate ribose | HCD‐Neg, HCD‐Pos, IT‐Pos | 30243
Adenosine monophosphate | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 6083
Adenosine phosphosulfate | HCD‐Neg | 10238
Adenosine triphosphate | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 5957
Adenylsuccinic acid | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 440122
ADP | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 6022
Agmatine | HCD‐Pos | 199
α‐d‐Glucose 1,6‐bisphosphate | HCD‐Neg | 82400
α‐Ionone | HCD‐Pos | 24680
α‐Ketoisovaleric acid | IT‐Neg | 49
Aminoadipic acid | HCD‐Pos | 469
Arabinonic acid | HCD‐Neg, IT‐Neg | 122045
Aspartylglycosamine | HCD‐Pos | 123826
Asymmetric dimethylarginine | HCD‐Pos, IT‐Pos | 123831
β‐Carboline | HCD‐Pos | 64961
β‐Glycerophosphoric acid | HCD‐Neg, IT‐Neg | 2526
Betaine | HCD‐Pos | 247
Biopterin | HCD‐Pos | 135403659
But‐2‐enoic acid | HCD‐Neg | 637090
Carnosine | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 439224
CDP | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 6132
Cer(d18:1/24:1(15Z)) | HCD‐Pos, IT‐Pos | 5283568
Cholest‐5‐en‐3‐one | HCD‐Pos | 9908107
Cholesta‐4,6‐dien‐3‐one | HCD‐Pos, IT‐Pos | 3034666
Cholesterol | HCD‐Pos | 5997
Choline | HCD‐Pos, IT‐Pos | 305
cis‐Aconitic acid | HCD‐Neg | 309
cis‐Vaccenic acid | HCD‐Pos | 5282761
Citicoline | HCD‐Neg, HCD‐Pos, IT‐Pos | 13804
Citraconic acid | HCD‐Neg | 643798
Citric acid | HCD‐Neg, IT‐Neg | 311
Citrulline | HCD‐Neg, HCD‐Pos | 833
Coenzyme A | HCD‐Neg | 87642
Coenzyme Q9 | HCD‐Pos, IT‐Pos | 5280473
Cyclic ADP‐ribose | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 123847
Cyclic AMP | HCD‐Neg | 6076
Cytidine | HCD‐Neg, HCD‐Pos | 6175
Cytidine 5'‐diphosphate ethanolamine | HCD‐Neg, HCD‐Pos, IT‐Neg | 123727
Cytidine monophosphate | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 6131
Cytidine monophosphate N‐acetylneuraminic acid | HCD‐Neg | 448209
Cytidine triphosphate | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 6176
Cytosine | HCD‐Pos | 597
dCDP | HCD‐Neg, IT‐Neg | 150855
dCMP | HCD‐Neg, HCD‐Pos, IT‐Neg | 13945
Deoxyadenosine monophosphate | HCD‐Neg | 12599
Deoxycytidine | HCD‐Pos | 13711
Deoxyinosine | HCD‐Pos, IT‐Pos | 135398593
Dephospho‐CoA | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 444485
d‐Erythrose | HCD‐Neg | 94176
d‐Fructose | HCD‐Neg | 5984
DG(14:0/14:0/0:0) | HCD‐Pos | 10369168
DG(16:0/16:0/0:0) | HCD‐Pos | 644078
DG(16:0/18:1(9Z)/0:0) | HCD‐Pos, IT‐Pos | 5282283
DG(18:1(9Z)/18:1(9Z)/0:0) | HCD‐Pos, IT‐Pos | 9543716
d‐Galactose | HCD‐Neg, IT‐Neg | 6036
d‐Glucaro‐1,4‐lactone | HCD‐Neg, IT‐Neg | 122306
d‐Glucose | HCD‐Neg | 5793
d‐Glucuronic acid | HCD‐Neg | 94715
Diadenosine triphosphate | HCD‐Neg | 165381
Dihydrobiopterin | HCD‐Pos | 135402011
d‐Malic acid | HCD‐Neg, IT‐Neg | 525
d‐Maltose | HCD‐Neg, IT‐Pos | 294
d‐Mannose 1‐phosphate | HCD‐Neg, IT‐Neg | 644175
d‐Ornithine | HCD‐Pos | 71082
d‐Phenyllactic acid | HCD‐Neg | 643327
d‐Pipecolinic acid | HCD‐Pos | 736316
ε‐caprolactam | HCD‐Pos, IT‐Pos | 7768
Erucamide | HCD‐Pos, IT‐Pos | 5365371
Erucic acid | HCD‐Pos, IT‐Pos | 8216
FAPy‐adenine | HCD‐Pos | 114926
Flavin mononucleotide | HCD‐Pos, IT‐Pos | 710
Folic acid | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 135398658
Fructose 1,6‐bisphosphate | HCD‐Neg, IT‐Neg | 10267
Fructose‐6‐phosphate | HCD‐Neg | 69507
Galactaric acid | HCD‐Neg | 3037582
Galactinol | HCD‐Neg, IT‐Neg | 11727586
Galactitol | HCD‐Neg | 11850
Galactonic acid | HCD‐Neg | 128869
Galactose 1‐phosphate | HCD‐Neg, IT‐Neg | 123912
Galactosylsphingosine | HCD‐Pos | 5280458
Galβ1,3GlcNAc | HCD‐Pos | 440994
γ‐Aminobutyric acid | HCD‐Pos | 119
γ‐Glutamylglutamic acid | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 92865
GDP‐glucose | HCD‐Pos | 135398625
GDP‐l‐fucose | HCD‐Neg | 135398655
Glucaric acid | HCD‐Neg, IT‐Neg | 33037
Glucose 1‐phosphate | HCD‐Neg | 65533
Glucose 6‐phosphate | HCD‐Neg, HCD‐Pos, IT‐Neg | 5958
Glutathione | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 124886
Glyceraldehyde 3‐phosphate | IT‐Neg | 729
Glyceric acid | HCD‐Neg | 752
Glycerophosphocholine | HCD‐Pos, IT‐Pos | 71920
Glyceryl monooleate | HCD‐Pos | 33022
Guanidinosuccinic acid | HCD‐Neg, HCD‐Pos | 97856
Guanine | HCD‐Neg, HCD‐Pos | 135398634
Guanosine | HCD‐Neg, HCD‐Pos, IT‐Neg | 135398635
Guanosine diphosphate | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 135398619
Guanosine diphosphate mannose | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 135398627
Guanosine monophosphate | HCD‐Neg, HCD‐Pos, IT‐Pos | 135398631
Guanosine triphosphate | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 135398633
Helicin | HCD‐Pos | 101799
Hexadecanedioic acid | HCD‐Pos | 10459
Hydroxyphenyllactic acid | HCD‐Neg, HCD‐Pos, IT‐Neg | 9378
Hypoxanthine | HCD‐Pos | 135398638
Indole‐3‐carboxylic acid | HCD‐Pos | 69867
Indolelactic acid | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 92904
Indolepyruvate | HCD‐Pos | 803
Inosine | HCD‐Pos | 135398641
Inosinic acid | HCD‐Neg, HCD‐Pos, IT‐Neg | 135398640
Inositol 1,3,4‐trisphosphate | HCD‐Neg | 123680
Inositol 1,3‐bisphosphate | HCD‐Neg, IT‐Neg | 128419
Inositol 1,4,5‐trisphosphate | HCD‐Pos | 55310
Inositol 1,4‐bisphosphate | HCD‐Neg, HCD‐Pos | 123903
Inositol 1‐phosphate | IT‐Pos | 107737
Inositol 3‐phosphate | HCD‐Pos | 440194
Inositol 4‐phosphate | HCD‐Neg | 440043
Isobutyryl‐l‐carnitine | HCD‐Pos | 168379
Isocitric acid | HCD‐Neg | 1198
Isomaltose | HCD‐Neg | 872
Isovaleryl coenzyme A | HCD‐Neg | 165435
Isovaleryl‐l‐carnitine | HCD‐Pos | 169235
Ketoleucine | HCD‐Neg | 70
l2‐Hydroxyglutaric acid | HCD‐Neg | 43
Lacto‐N‐triaose | HCD‐Pos | 53477860
Lactose | HCD‐Neg | 6134
l‐Arginine | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 6322
l‐Asparagine | HCD‐Neg, HCD‐Pos | 6267
l‐Aspartic acid | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 424
l‐Carnitine | HCD‐Pos, IT‐Pos | 10917
l‐Cystathionine | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 834
l‐Cystine | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 67678
l‐Erythrulose | HCD‐Neg | 162406
Lewis A trisaccharide | HCD‐Pos | 4139998
Lewis X trisaccharide | HCD‐Pos, IT‐Pos | 4571095
l‐Glutamic acid | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 33032
l‐Glutamine | HCD‐Neg, HCD‐Pos | 5961
l‐Gulonolactone | HCD‐Neg, IT‐Neg | 439373
l‐Histidine | HCD‐Neg, HCD‐Pos | 6274
l‐Homoserine | HCD‐Pos, IT‐Pos | 12647
l‐Iditol | HCD‐Pos | 5460044
l‐Isoleucine | HCD‐Pos | 6306
l‐Kynurenine | HCD‐Pos | 161166
l‐Leucine | HCD‐Neg, HCD‐Pos, IT‐Neg | 6106
l‐Lysine | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 5962
l‐Methionine | HCD‐Pos, IT‐Pos | 6137
l‐Phenylalanine | HCD‐Neg, HCD‐Pos, IT‐Pos | 6140
l‐Proline | HCD‐Neg, HCD‐Pos | 145742
l‐Serine | HCD‐Neg, HCD‐Pos, IT‐Pos | 5951
l‐Threonine | HCD‐Neg, HCD‐Pos, IT‐Pos | 6288
l‐Tryptophan | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 6305
l‐Tyrosine | HCD‐Neg, HCD‐Pos, IT‐Pos | 6057
l‐Valine | HCD‐Pos | 6287
Maltotetraose | HCD‐Neg, HCD‐Pos, IT‐Pos | 870
Maltotriose | HCD‐Neg, IT‐Neg | 92146
Mannose 6‐phosphate | HCD‐Neg, HCD‐Pos, IT‐Neg | 65127
Melibiose | HCD‐Neg | 219994
Methionine sulfoxide | HCD‐Neg, HCD‐Pos, IT‐Pos | 158980
MG(0:0/16:0/0:0) | HCD‐Pos, IT‐Pos | 123409
myo‐Inositol | HCD‐Neg | 892
N8‐Acetylspermidine | HCD‐Pos, IT‐Pos | 123689
N‐Acetyl‐d‐galactosamine | HCD‐Pos | 35717
N‐Acetyl‐d‐glucosamine | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 439174
N‐Acetyl‐d‐glucosamine 6‐phosphate | HCD‐Neg | 439219
N‐Acetyl‐d‐lactosamine | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 9800166
N‐Acetyl‐l‐aspartic acid | HCD‐Pos | 65065
N‐Acetyl‐l‐carnosine | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 9903482
N‐Acetyl‐l‐glutamic acid | HCD‐Neg | 70914
N‐Acetyl‐l‐glutamine | HCD‐Pos, IT‐Pos | 182230
N‐Acetyl‐l‐methionine | HCD‐Neg, IT‐Neg | 448580
N‐Acetyl‐l‐phenylalanine | HCD‐Neg, HCD‐Pos, IT‐Pos | 74839
N‐Acetyl‐l‐tyrosine | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 68310
N‐Acetylmannosamine | HCD‐Pos, IT‐Pos | 65150
N‐Acetylneuraminic acid | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 906
NAD | HCD‐Neg, HCD‐Pos, IT‐Pos | 925
NADH | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 439153
NADP | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 4412
N‐alpha‐Acetyl‐l‐ornithine | HCD‐Pos | 907
N‐Formyl‐l‐methionine | HCD‐Neg, IT‐Neg | 439750
N‐Glycolylneuraminic acid | HCD‐Neg, IT‐Neg | 123802
Niacinamide | HCD‐Pos, IT‐Pos | 936
Nicotinamide riboside | HCD‐Pos | 439924
Nicotinamide ribotide | HCD‐Pos | 16219737
Nicotinic acid adenine dinucleotide | HCD‐Neg | 165490
Nicotinic acid mononucleotide | HCD‐Neg, IT‐Neg | 5288991
N‐Methyl‐l‐glutamic acid | HCD‐Pos | 439377
N‐Methyllysine | HCD‐Pos | 164795
N‐Methyltyramine | HCD‐Pos | 9727
N‐Palmitoyl‐d‐sphingosine | HCD‐Pos, IT‐Pos | 5353456
Oleamide | HCD‐Pos | 5283387
Oleic acid | HCD‐Pos | 445639
Oleoyl glycine | HCD‐Pos | 6436908
Oleoyl serine | HCD‐Neg, HCD‐Pos, IT‐Pos | 44190514
O‐Phosphotyrosine | IT‐Pos | 30819
Orotic acid | HCD‐Neg | 967
O‐Tyrosine | HCD‐Pos | 91482
Oxidized glutathione | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 65359
PA(16:0/16:0) | HCD‐Neg | 3099
PA(16:0/18:1(9Z)) | HCD‐Neg | 5283523
PA(18:1/0:0) | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 5311263
Palmitic amide | HCD‐Pos | 69421
Palmitoyl ethanolamide | HCD‐Pos | 4671
Palmitoyl sphingomyelin | HCD‐Neg, HCD‐Pos, IT‐Pos | 9939941
Pantothenic acid | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 6613
Paullinic acid | HCD‐Pos | 5312518
PC(14:0/0:0) | HCD‐Neg, HCD‐Pos, IT‐Pos | 460604
PC(14:0/14:0) | HCD‐Neg, HCD‐Pos, IT‐Neg | 5459377
PC(14:0/16:0) | HCD‐Neg, HCD‐Pos | 129657
PC(14:0/18:0) | HCD‐Pos | 131150
PC(15:0/15:0) | HCD‐Pos, IT‐Neg | 24778654
PC(16:0/0:0) | HCD‐Pos, IT‐Pos | 460602
PC(16:0/12:0) | HCD‐Pos | 10676014
PC(16:0/14:0) | HCD‐Neg, HCD‐Pos, IT‐Neg | 24778679
PC(16:0/16:0) | HCD‐Pos | 452110
PC(16:0/18:1(9Z)) | HCD‐Neg, HCD‐Pos, IT‐Pos | 5497103
PC(16:0/18:2(9Z,12Z)) | HCD‐Pos | 5287971
PC(16:1(9Z)/16:1(9Z)) | HCD‐Pos | 24778764
PC(18:0/0:0) | HCD‐Neg, HCD‐Pos | 497299
PC(18:0/14:0) | HCD‐Pos | 3082163
PC(18:0/18:0) | HCD‐Pos | 94190
PC(18:0/18:1(9Z)) | HCD‐Neg | 24778825
PC(18:0/18:2(9Z,12Z)) | HCD‐Pos | 6441487
PC(18:1(9Z)/0:0) | HCD‐Neg, HCD‐Pos, IT‐Pos | 16081932
PC(18:1(9Z)/14:0) | HCD‐Pos, IT‐Neg | 24778931
PC(18:1(9Z)/16:0) | HCD‐Neg, HCD‐Pos | 24778933
PC(20:1(11Z)/20:1(11Z)) | HCD‐Pos | 24779063
PC(22:0/0:0) | HCD‐Pos | 24779479
PC(22:1(13Z)/22:1(13Z)) | HCD‐Pos | 24779126
PC(24:0/0:0) | HCD‐Pos | 24779481
PC(O‐16:0/0:0) | HCD‐Pos, IT‐Pos | 162126
PC(O‐16:0/18:1(9Z)) | HCD‐Pos | 24779266
PC(O‐16:0/2:0) | HCD‐Pos | 108156
PC(O‐16:0/20:3(8Z,11Z,14Z)) | HCD‐Pos | 16759365
PC(O‐18:0/0:0) | HCD‐Pos | 2733532
PC(P‐16:0/0:0) | HCD‐Pos | 10917802
PC(P‐18:0/0:0) | HCD‐Neg, HCD‐Pos | 24779527
PC(P‐18:0/18:1(9Z)) | HCD‐Pos | 42607428
PE(14:0/0:0) | HCD‐Neg, HCD‐Pos, IT‐Neg | 9547070
PE(16:0/0:0) | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 9547069
PE(16:0/16:0) | HCD‐Neg, IT‐Neg | 445468
PE(16:0/18:1(9Z)) | HCD‐Pos | 5283496
PE(16:0/18:2(9Z,12Z)) | HCD‐Pos | 9546747
PE(18:0/0:0) | HCD‐Neg, HCD‐Pos, IT‐Pos | 9547068
PE(18:0/18:1(9Z)) | HCD‐Pos | 9546742
PE(18:0/18:2(9Z,12Z)) | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 9546749
PE(18:1(9Z)/0:0) | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 9547071
PE(18:1(9Z)/18:1(9Z)) | HCD‐Pos | 9546757
PE(O‐16:0/18:1(9Z)) | HCD‐Neg, HCD‐Pos | 42607455
PE(P‐18:0/18:1(9Z)) | HCD‐Neg | 42607457
PE(P‐18:0/22:6(4Z,7Z,10Z,13Z,16Z,19Z)) | HCD‐Neg, IT‐Neg | 42607458
PE‐NMe2(18:1(9Z)/18:1(9Z)) | HCD‐Neg, IT‐Neg | 9547022
PG(18:0/18:1) | HCD‐Neg, IT‐Neg | 24779551
Phenylacetic acid | HCD‐Neg | 999
Phenylacetylglutamine | HCD‐Neg, HCD‐Pos, IT‐Pos | 92258
Phosphoadenosine phosphosulfate | HCD‐Neg, IT‐Neg | 10214
Phosphorylcholine | HCD‐Pos, IT‐Pos | 135437
Phosphoserine | HCD‐Neg | 106
PI(16:0/18:1(9Z)) | HCD‐Neg, IT‐Neg | 5771758
Pip(18:1(9Z)/18:1(9Z)) | HCD‐Neg | 53480169
p‐Octopamine | HCD‐Pos | 4581
Proline betaine | IT‐Pos | 115244
PS(16:0/18:1(9Z)) | HCD‐Neg, HCD‐Pos, IT‐Neg | 5283499
PS(16:0/20:4) | HCD‐Neg | 24779544
PS(18:0/18:0) | HCD‐Neg | 9547096
PS(18:0/18:1(9Z)) | HCD‐Neg, HCD‐Pos, IT‐Neg | 9547087
PS(18:0/20:4(5Z,8Z,11Z,14Z)) | HCD‐Neg | 24779545
PS(18:1(9Z)/18:1(9Z)) | HCD‐Neg, HCD‐Pos, IT‐Neg | 6438639
Pterin | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 73000
Pyridoxal | HCD‐Pos, IT‐Pos | 1050
Pyridoxal 5'‐phosphate | HCD‐Neg, HCD‐Pos | 1051
Pyridoxamine | HCD‐Pos, IT‐Pos | 1052
Pyroglutamic acid | HCD‐Neg, HCD‐Pos, IT‐Pos | 7405
Raffinose | HCD‐Neg, HCD‐Pos | 10542
Ribitol | HCD‐Neg | 6912
Riboflavin | HCD‐Neg, HCD‐Pos, IT‐Pos | 493570
Ribono‐γ‐lactone | HCD‐Neg | 111064
Ribose 1‐phosphate | HCD‐Neg, IT‐Neg | 1074
Ribose 5‐phosphate | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 77982
Ribulose 5‐phosphate | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 439184
S‐Adenosylhomocysteine | HCD‐Pos, IT‐Pos | 13792
S‐Adenosylmethionine | HCD‐Pos, IT‐Pos | 34755
Sebacic acid | HCD‐Neg, IT‐Neg | 5192
Sedoheptulosan | HCD‐Neg | 5460956
Serotonin | HCD‐Pos | 5202
S‐Glutathionyl‐l‐cysteine | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 10455148
SM(d18:1/18:0) | HCD‐Pos | 6453725
SM(d18:1/18:1(9Z)) | HCD‐Pos | 6443882
SM(d18:1/24:1(15Z)) | HCD‐Pos | 53481791
Sorbitol | HCD‐Neg, IT‐Neg | 5780
Spermine | HCD‐Pos | 1103
Stachyose | HCD‐Pos, IT‐Pos | 439531
Stearoyl ethanolamide | HCD‐Pos | 27902
Succinic acid | HCD‐Neg, IT‐Neg | 1110
Succinic acid semialdehyde | HCD‐Neg | 1112
Sucrose | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 5988
Tetradecanoyl‐CoA | HCD‐Neg | 11966124
Thiamine | HCD‐Pos, IT‐Pos | 1130
Thiamine monophosphate | HCD‐Pos | 3382778
Thiamine pyrophosphate | HCD‐Neg, HCD‐Pos, IT‐Pos | 5431
Threonic acid | HCD‐Neg, IT‐Neg | 151152
trans‐13‐Octadecenoic acid | HCD‐Pos | 6161490
trans‐Vaccenic acid | HCD‐Pos | 5281127
Trehalose | HCD‐Neg | 7427
Trigonelline | HCD‐Pos | 5570
Triolein | HCD‐Pos, IT‐Pos | 5497163
Tripalmitolein | HCD‐Pos | 9543989
Ubiquinone‐1 | HCD‐Pos | 4462
Undecanedioic acid | HCD‐Neg, IT‐Neg | 15816
Uracil | HCD‐Pos | 1174
Uric acid | HCD‐Neg, HCD‐Pos, IT‐Neg | 1175
Uridine | HCD‐Pos | 6029
Uridine 5'‐diphosphate | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 6031
Uridine 5'‐monophosphate | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 6030
Uridine 5'‐triphosphate | HCD‐Neg, HCD‐Pos, IT‐Neg | 6133
Uridine diphosphate glucose | HCD‐Pos, IT‐Pos | 8629
Uridine diphosphate glucuronic acid | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 17473
Uridine diphosphategalactose | HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos | 6857410
Urocanic acid | HCD‐Pos | 736715
Xanthine | HCD‐Pos, IT‐Pos | 1188
Xanthosine | HCD‐Pos | 64959
Xanthylic acid | HCD‐Neg, HCD‐Pos, IT‐Pos | 73323
Xylulose 5‐phosphate | HCD‐Neg, HCD‐Pos, IT‐Pos | 439190
Zymosterol | HCD‐Pos | 92746

Improvement of accuracy of pipeline identifications

We developed a procedure to improve the accuracy of identifications obtained using the NIST MSQC pipeline by modifying the order of identifications in a hit list. The NIST pipeline, by default, sorts hits entirely by their score which reflects the quality of the spectral match between the experimental and library spectra. We identified four categories of errors in identification. For clarity, we labeled these as category A–D errors. Additional information on the errors, examples, and solutions for these errors can be found in the supporting information.

Hybrid search

To discover the identity of compounds not represented in the library, a hybrid search was performed. The hybrid search is a new search strategy available in the 2017 release of NIST MS Search software (version 2.3) (Burke et al., 2017; Cooper et al., 2019; Moorthy et al., 2017). This search finds compounds that differ by an inert chemical group and can therefore often match unidentified spectra with members of the same chemical class that are present in the library. The term delta mass is used to represent the difference in mass between the query spectrum and the library entry. An example of a hybrid match in the CHO cell metabolite data is the match of a spectrum (ion m/z = 472.0011) to a sodiated adenosine 5'‐diphosphate library spectrum with a delta mass of 21.9824 Da. This delta mass corresponds to the replacement of a hydrogen by a sodium, so the correct annotation of this ion is adenosine 5'‐diphosphate [M‐H+2Na]+. The hybrid search was also utilized to assist in the identification of two groups of related spectra. Information on these identifications can be found in the supporting information.
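
The delta mass in the example above can be checked with a short calculation. The Python sketch below assumes the neutral formula C10H15N5O10P2 for adenosine 5'‐diphosphate and uses standard monoisotopic masses; the variable and function names are illustrative only and are not part of any NIST software.

```python
# Monoisotopic masses in Da (standard values, rounded to 7 decimals or fewer).
MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737620,
    "Na": 22.9897693,
}
ELECTRON = 0.00054858  # mass of an electron, removed to form a singly charged cation


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass for a formula given as {element: count}."""
    return sum(MASS[element] * count for element, count in formula.items())


# Assumed neutral formula of adenosine 5'-diphosphate (ADP).
adp = {"C": 10, "H": 15, "N": 5, "O": 10, "P": 2}
m = monoisotopic_mass(adp)

mz_m_na = m + MASS["Na"] - ELECTRON                      # [M+Na]+, the sodiated library ion
mz_m_h_2na = m - MASS["H"] + 2 * MASS["Na"] - ELECTRON   # [M-H+2Na]+, the proposed query ion
delta_mass = mz_m_h_2na - mz_m_na                        # equals mass(Na) - mass(H)

print(f"[M+Na]+    m/z {mz_m_na:.4f}")
print(f"[M-H+2Na]+ m/z {mz_m_h_2na:.4f}")
print(f"delta mass {delta_mass:.4f} Da")
```

Under these assumptions, the theoretical [M‐H+2Na]+ value is about m/z 472.0006 and the theoretical delta mass is about 21.9819 Da, both within roughly 0.0005 of the reported 472.0011 and 21.9824 Da, which is consistent with the annotation given in the text.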

Utility of recurrent spectral libraries

Multiple metabolomics analysis software tools are available. A recent review summarized those that are freely available (Spicer et al., 2017). In addition, there are a variety of freely available packages for processing MS/MS spectra (Kind et al., 2018). One such tool, RAMClustR (Broeckling et al., 2014), can group features extracted via XCMS (Smith et al., 2006) into spectra in an unsupervised manner and thereby identify features that originate from the same compound in an indiscriminant MS/MS (idMS/MS) data acquisition. Spectra can then be searched against a reference library such as the NIST MS/MS Library.

The NIST MSQC pipeline (Rudnick et al., 2010; https://chemdata.nist.gov/dokuwiki/doku.php?id=peptidew:msqcpipeline) was used to process data files in this study. It is a fully integrated software pipeline that was developed for the analysis of a tryptic protein digest to assist in the identification of variability caused by issues with analytical platforms. We have extended the application of the pipeline to the identification of small molecule metabolites by modifying searching and scoring. The pipeline begins by reading a data file from a commercial instrument, extracting all spectral data, and searching the spectra against the NIST library using the NIST MS Search software. When multiple spectra are acquired for a single precursor ion, the most intense one is selected and its maximum MS1 abundance is recorded at its retention time.

Figure 1 shows ion plots generated from the pipeline output after searching against the NIST17 MS/MS library or the Recurrent library and provides a visual representation of the data. Each object represents a clustered mass spectrum. More detailed versions of the plots shown in Figure 1 can be found in the supporting information. The pipeline found 5335 ion clusters in this data file. When searched against the NIST17 MS/MS library, 80% of these clusters have no identification. When searched against the positive ion HCD recurrent spectral library, the proportion of clusters with no identification drops to 23%. Thirty‐eight percent of the clusters have a recurrent label, which indicates that they matched spectra in the recurrent spectral library by either direct or hybrid MS/MS search. This increase in cluster identification demonstrates the utility of the recurrent spectral libraries. Because we are cataloguing every observed ion in the libraries instead of just previously identified metabolites, we can identify these ions in future analyses of the same or similar materials.
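The selection of one representative spectrum per precursor ion can be sketched as follows. This is a minimal illustration, not the NIST MSQC pipeline code: the `Spectrum` record, the `precursor_id` grouping key, and the use of MS1 abundance as the measure of "most intense" are all assumptions, since the text does not specify how spectra are grouped or ranked internally.

```python
from dataclasses import dataclass


@dataclass
class Spectrum:
    precursor_id: str          # hypothetical key identifying the precursor ion
    retention_time_min: float  # retention time at which the spectrum was acquired
    ms1_abundance: float       # MS1 abundance of the precursor (assumed ranking value)


def select_representatives(spectra):
    """Keep, for each precursor, the spectrum with the highest abundance."""
    best = {}
    for spectrum in spectra:
        current = best.get(spectrum.precursor_id)
        if current is None or spectrum.ms1_abundance > current.ms1_abundance:
            best[spectrum.precursor_id] = spectrum
    return best


spectra = [
    Spectrum("p1", 5.0, 100.0),
    Spectrum("p1", 5.1, 250.0),
    Spectrum("p2", 9.0, 40.0),
]
representatives = select_representatives(spectra)
print(representatives["p1"].retention_time_min)
```

In this toy input, the second spectrum of precursor `p1` is kept, so its abundance and retention time are the values recorded for that cluster.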
Figure 1

Plot of a single LC‐MS/MS analysis of a 50% acetonitrile extract of CHO cell metabolites after searching against the NIST17 MS/MS library (left) or Recurrent Library (right). LC, liquid chromatography; MS, mass spectrometry [Color figure can be viewed at wileyonlinelibrary.com]

Annotation of spectra

The second goal of this study was to develop a comprehensive, automatable approach to annotate the spectra in the libraries of recurrent spectra so that artifacts and low‐quality spectra can be filtered out of the recurrent unidentified spectra of metabolites. This type of filtering is important because unknowns can be redundant signals, artifacts (man‐made signals), and contaminants (real chemicals), instead of metabolites that are not present in the library used for spectral matching (Sindelar & Patti, 2020). Credentialing features (Mahieu et al., 2014) and isotopic ratio outlier analysis (IROA; de Jong & Beecher, 2012) are two isotopic‐labeling techniques that have been developed to provide further confidence in metabolite identifications. Such techniques were not used in the creation of the mass spectral libraries. Therefore, an annotation strategy based on comparing the extracted ion chromatogram (EIC) between the sample and blank runs was developed to filter library spectra. Once spectra are filtered, efforts can be focused on identifying the unidentified recurrent spectra that are likely to originate from CHO cells and/or media (vs. the environment or instrument) by searching the spectra against other available tandem mass spectral libraries and in silico prediction libraries. MassBank of North America (MoNA) in combination with the NIST MS/MS Library and the in silico fragmentation tools CSI:FingerID and LipidBlast has been demonstrated to be effective in assigning structural annotation to MS/MS spectra (Blaženović et al., 2019).

To develop the annotation strategy, the search results of all the mass spectra contained in a representative data file were manually evaluated. Figure 2 is a graphical summary of the annotation strategy developed for filtering. First, spectra are removed if they do not have a sufficiently narrow chromatographic peak width (<30 s), unless they are identified. Second, spectra without sufficiently high spectral purity (>80%) are removed. Third, spectra without sufficient fragment ion abundances (summed product ion abundance/precursor ion abundance < 10) are removed. The data shown in Figure 1 were filtered using these parameters, which eliminated two‐thirds of the 5335 spectra. The spreadsheet used for sorting and eliminating spectra can be found in Spreadsheet 4 of the supporting information. Of those eliminated, 9.5% were background, 77.8% were possibly contaminated (due to the presence of another peak close in mass to the parent ion), and 12.8% contained insufficient fragmentation. Figure 3 shows the distribution of abundances of the 1752 identified (by direct and hybrid MS/MS match) and unidentified ion clusters. This figure shows that less abundant compounds are less likely to be identified. These unidentified ion clusters could consist of spectra of previously unidentified metabolites or metabolites that are not represented in the library, as well as spectra of background and artifacts, which is why annotation is crucial.
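The three filtering steps can be sketched as a single function. This is an illustrative reading of the text, not the authors' spreadsheet logic. The `Cluster` record and its field names are hypothetical. Two interpretations are assumed: each filter is mapped to one of the three reported elimination categories, and the third criterion is read as removing spectra whose product-to-precursor abundance ratio is below 10.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    peak_width_s: float    # chromatographic peak width in seconds
    purity_pct: float      # spectral purity in percent
    fragment_ratio: float  # summed product ion abundance / precursor ion abundance
    identified: bool       # True if the spectrum has a library identification


def filter_reason(cluster: Cluster,
                  max_width_s: float = 30.0,
                  min_purity_pct: float = 80.0,
                  min_fragment_ratio: float = 10.0) -> Optional[str]:
    """Return why a cluster is removed, or None if it passes all filters."""
    # Step 1: broad peaks are removed unless the spectrum is identified.
    if cluster.peak_width_s >= max_width_s and not cluster.identified:
        return "background"
    # Step 2: purity must exceed the threshold (>80% in the text).
    if cluster.purity_pct <= min_purity_pct:
        return "possibly contaminated"
    # Step 3: assumed reading, remove when the ratio is below the threshold.
    if cluster.fragment_ratio < min_fragment_ratio:
        return "insufficient fragmentation"
    return None


print(filter_reason(Cluster(45.0, 95.0, 20.0, identified=False)))
```

Applying the filters in a fixed order means each removed cluster gets exactly one reason, which matches the way the eliminated spectra are reported as three non-overlapping percentages.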
Figure 2

Workflow for annotation of spectra [Color figure can be viewed at wileyonlinelibrary.com]

Figure 3

Distribution of identified and unidentified ion clusters after filtering.

*0.2% of hybrid identified and 2.9% of unidentified spectra that were assigned abundance in this bin because the abundance could not be calculated were removed [Color figure can be viewed at wileyonlinelibrary.com]

In the next step of spectral annotation after filtering, the EIC of the ion corresponding to the spectrum is compared, by visual inspection, to the EIC of the same ion in a blank run. If the peak is not present in the blank with intensity within 100x that of the sample, then the spectrum is labeled either as a known (if it is identified by MS/MS match) and annotated with the identification, or as an unknown (metabolite not identified by library searching). If the peak is in the blank, then the spectrum is due either to an artifact/carryover or to background. During manual evaluation of EICs, we found that, for the purposes of spectral classification, artifact/carryover ions can be separated from background ions by examining the peak width. The background has a broad peak width (in regions where both hydrophilic and hydrophobic compounds elute), while an artifact/carryover has a narrow peak width. In most cases, differentiating between background and artifact/carryover was straightforward. In cases that were difficult to differentiate, we labeled ions as background if there was a substantial signal in both halves of the chromatogram (0–15 and 15–30 min). Separation using peak width is a quick method to classify spectra, but more accurate methods could be applied in an automated pipeline. Multiple algorithms (Cleary et al., 2019; Ho, Kuo, Wang, Chen, & Tseng, 2013; Zhang & Yang, 2008; Zhu et al., 2009) have been developed for subtracting the background from LC‐MS data. In addition, a hierarchical cluster analysis technique was developed to identify chemical interferents that are not removable by background subtraction (Caesar, Kvalheim, & Cech, 2018).

Figure 4 shows examples of each of the above‐mentioned classifications. For the unidentified recurrent spectra, these classifications are an effective, automatable way to annotate the spectra for the library. The labels allow us to prioritize which spectra to identify first through library and literature searching. Unknowns represent compounds that originate from the CHO cells or cell culture media and are the highest priority to attempt to identify. Artifacts/carryover are the next priority because these may still be compounds that originate from the CHO cells or cell culture media. Background spectra are likely not worth an analyst's time to try to identify, as the background will be different in analyses from different labs. Table S4 shows the resulting annotation of the 20 most abundant unidentified ion clusters. Fifteen percent of the clusters are unknowns; these would be the most useful to search for in the literature and online databases. Fifty percent of the clusters are artifacts/carryover and the remaining 35% are background.
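The labeling rules described above can be written as a small decision function. This is a sketch under stated assumptions, since the comparison in the study was done by visual inspection of EICs. The function name, its arguments, and the precomputed `peak_is_broad` flag are hypothetical, and "within 100x" is read as the blank intensity being at least one hundredth of the sample intensity.

```python
# Priority for follow-up identification, as described in the text
# (1 = highest priority). Knowns are already identified.
FOLLOW_UP_PRIORITY = {"unknown": 1, "artifact/carryover": 2, "background": 3}


def annotate_spectrum(sample_intensity: float,
                      blank_intensity: float,
                      peak_is_broad: bool,
                      identified: bool,
                      blank_ratio: float = 100.0) -> str:
    """Label a filtered spectrum from its sample and blank EIC intensities."""
    # Assumed reading of "within 100x": blank >= sample / blank_ratio.
    in_blank = blank_intensity > 0 and blank_intensity * blank_ratio >= sample_intensity
    if not in_blank:
        return "known" if identified else "unknown"
    # Peak present in the blank: a broad peak indicates background,
    # a narrow peak indicates an artifact or carryover.
    return "background" if peak_is_broad else "artifact/carryover"


label = annotate_spectrum(1.0e6, 5.0e4, peak_is_broad=False, identified=False)
print(label, FOLLOW_UP_PRIORITY[label])
```

How `peak_is_broad` would be computed in an automated pipeline is left open here, because the text gives no numeric width threshold for this step and suggests that background subtraction algorithms could replace the peak width heuristic.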
Figure 4

Examples of each type of annotated ion [Color figure can be viewed at wileyonlinelibrary.com]

Confidence in library match identifications

A framework for reporting confidence in metabolomics identifications was proposed in 2007 by the Chemical Analysis Working Group of the Metabolomics Standards Initiative (MSI) and is composed of four levels of metabolite confidence. These are identified compounds (Level 1), putatively annotated compounds (Level 2), putatively characterized compound classes (Level 3), and unknown compounds (Level 4) (Sumner et al., 2007). There has been discussion in the metabolomics community about providing more information about confidence by modifying or expanding the level system, introducing a quantitative system, or providing alphanumeric identification metrics, but no consensus has been reached (Creek et al., 2014; Schrimpe‐Rutledge et al., 2016; Schymanski et al., 2014; Sumner et al., 2014; Viant et al., 2017). Schymanski et al. (2014) proposed a framework for reporting confidence that was based on the MSI levels and adapted for high‐resolution mass spectrometry (HR‐MS). These HR‐MS specific confidence levels are the most appropriate for our data and consist of five confidence levels. These are confirmed structure (Level 1), probable structure (Level 2), tentative candidate(s) (Level 3), unequivocal molecular formula (Level 4), and exact mass (Level 5). In this study, we have Level 2, 3, and 5 confidences. Level 1 requires confirmation using two or more properties of reference standards measured under the same experimental conditions. Although the NIST17 MS/MS library is acquired using reference standards, the experimental data in this paper were not acquired on the same platform, so the identifications do not reach Level 1 confidence. This type of confirmation is unrealistic for our work, where we are trying to catalogue all metabolites and identify as many as possible. The direct identifications reported in this study represent Level 2 confidence structure identifications, as they are obtained by library matching. 
Hybrid match identifications are Level 3 because they are chemical class identifications made with library searching. To assign a Level 4 confidence, we would need to attempt to assign a chemical structure to the spectra in the libraries, which we have not done to date. All the spectra in the libraries are associated with accurate mass data, and spectra annotated as unknowns would have a Level 5 confidence. Some of the spectra annotated as artifact/carryover could be originating from the sample and have a Level 5 confidence but finding these could be challenging and a method for doing this would require further development. To provide additional detail about the confidence of both our direct and hybrid library MS/MS matches, we developed a workflow to assign a qualitative confidence level to each metabolite identification. The workflow starts with the match score and incorporates prior probability information about whether the identified compound has been previously observed as a metabolite. The workflow also incorporates the annotation as described above to ensure the identified spectrum originates from the sample. Match scoring performed by the pipeline is well documented in the literature for the NIST Tandem MS library and for the NIST MS Search program and is based upon the dot product of the spectra being compared (S. E. Stein, 1999). The match score has been validated by manual inspection of matches and correlates very well with the match quality as determined by visual inspection. A score cut‐off of 400 removes essentially all poor matches and has been chosen as the default cutoff for metabolites. To assign confidence, an identification can initially be classified as high, medium, or low confidence, depending on the match score. Scores of 400–599, 600–799, and 800–999 correspond to low, medium, and high confidence, respectively. 
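The score ranges given above translate directly into a lookup. The following sketch only restates those ranges; the function name `initial_confidence` and the choice to return `None` for scores below the cutoff are assumptions made for illustration.

```python
from typing import Optional


def initial_confidence(match_score: int, cutoff: int = 400) -> Optional[str]:
    """Map a library match score to the initial qualitative confidence.

    Scores below the cutoff of 400 are treated as poor matches and get
    no confidence level (returned as None, an assumed convention).
    """
    if match_score < cutoff:
        return None
    if match_score < 600:
        return "low"      # 400-599
    if match_score < 800:
        return "medium"   # 600-799
    return "high"         # 800-999


for score in (350, 450, 650, 900):
    print(score, initial_confidence(score))
```

This initial level is only the starting point; the text goes on to describe how prior probability and the spectrum annotation can raise or lower it.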
Of course, in cases of isomers with similar spectra, distinguishing them may not be possible without the use of reference standards. Prior probability as well as the spectrum annotation can be used to raise or lower the qualitative level of confidence and can, to some degree, assist in isomer identification. Figure 5 depicts the workflow that was developed for assigning confidence. The workflow starting with a medium confidence is depicted at the top of the figure and is described below. The workflow starting with a low or high confidence is depicted at the bottom of the figure, with the differences from the medium workflow highlighted in green. The first step in the workflow is to determine whether the identified compound is a known metabolite. For this study, we performed a literature search for reported CHO cell metabolites and searched the Human Metabolome Database (HMDB; Wishart et al., 2013, 2018) and/or PubChem (Kim et al., 2019) to see whether the compound was a reported human metabolite (a compound listed in HMDB but not as endogenous was not considered a metabolite). In addition, we searched for lipids using the LIPID MAPS structure database (Sud et al., 2007). If the compound was found in any of these places, the qualitative confidence level was increased, and if not, it was decreased. For an initial confidence of medium, an identification was elevated to high confidence if it was a known metabolite and lowered to low confidence if it was not. The next step is to determine whether the spectrum is annotated as a known/unknown. On the right side of the workflow, if the spectrum is not a known/unknown, confidence remains low, and if it is, confidence is elevated to medium. On the left side of the workflow, if the spectrum is a known/unknown, confidence remains high; if it is not, it is determined whether the spectrum is annotated as an artifact/carryover. If the spectrum is an artifact/carryover, then confidence remains high, and if it is not, confidence is lowered to medium. 
Confidence is only elevated once in the workflow to prevent a match with a low score from being elevated to high confidence. Table S5 shows the 20 most abundant identified ions from the data in Figure 1 and their associated confidence.
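The medium‐confidence branch of the workflow, as described in the text, can be sketched as follows. The function name and arguments are hypothetical. Only the branch that starts from medium confidence is covered, because the low and high starting branches are shown in Figure 5 but not spelled out in the text.

```python
def confidence_from_medium(is_known_metabolite: bool, annotation: str) -> str:
    """Final confidence for an identification whose initial level is medium.

    `annotation` is one of "known", "unknown", "artifact/carryover",
    or "background", following the spectral annotation labels.
    """
    from_sample = annotation in ("known", "unknown")
    if is_known_metabolite:
        # Left side: raised to high by prior probability, then kept high
        # for known/unknown or artifact/carryover, else lowered to medium.
        if from_sample or annotation == "artifact/carryover":
            return "high"
        return "medium"
    # Right side: lowered to low, then raised to medium only if the
    # spectrum is annotated as a known/unknown.
    return "medium" if from_sample else "low"


print(confidence_from_medium(True, "background"))
print(confidence_from_medium(False, "unknown"))
```

On either side the confidence is raised at most once, which reflects the stated rule that a match cannot be elevated more than one time in the workflow.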
Figure 5

Workflow for assigning confidence in MS/MS identifications. Initial confidence level is determined by the match score and initial medium confidence is shown at the top. MS, mass spectrometry [Color figure can be viewed at wileyonlinelibrary.com]

Automation

One of the goals of this study was to develop tools that could be automated after initial development. The two workflows for annotation of spectra in the library and assignment of a qualitative confidence level for library identifications are amenable to automation via development of software tools. This will drastically increase the speed at which annotation and confidence assignment can occur. In addition, development of software tools for assessing prior probability and tools for automatic detection of spectra that are likely to be originating from the Pluronic F‐68 in the cell culture media will be beneficial. However, expert evaluation of the output of any developed software tools will be required until the methods become routine.

CONCLUSIONS

We have created the first recurrent spectral library for use in identifying CHO cell metabolites and outlined a procedure for future extensions. The library contains metabolites originating from a single CHO cell variety in a single cell culture medium and represents the spectra of all compounds repeatedly observed in these samples. It can be used as a tool by others in the field to quickly identify compounds in a CHO cell metabolite sample. During this analysis, we have developed a method capable of identifying all components commonly found in the LC‐MS analysis of CHO cell metabolite extracts and media. An extension of this approach is expected to lead both to an automated way to extend this library and to similar libraries for other metabolite materials. Finally, we developed a strategy to assign qualitative confidence to NIST MS/MS library identifications. Although schemes for representing confidence have been developed for reporting individual metabolite identifications, they could not adequately represent the confidence needed to properly annotate the identifications made here, many of which cannot be regarded as definitive. The next step for this project will be automation of the workflows and release of the recurrent spectral libraries. The libraries can then be used in metabolomics studies of CHO cell metabolites using LC‐MS/MS analyses.

AUTHOR CONTRIBUTIONS

Kelly H. Telu and Stephen E. Stein contributed intellectually to project conceptualization and experiment design. Renae J. Preston and Lila Kashi grew the CHO cells used in the experiments. Zvi Kelman supervised CHO cell growth. Kelly H. Telu, Ramesh Marupaka, and Nirina R. Andriamaharavo performed the experiments. Kelly H. Telu, Yamil Simón‐Manso, and Yuxue Liang contributed to LC‐MS/MS method development. Yuri A. Mirokhin developed the algorithm Tallat H. Bukhari used to create the recurrent spectral libraries. The manuscript was drafted by Kelly H. Telu, revised by Stephen E. Stein, and then critiqued and approved by all co‐authors. Stephen E. Stein supervised the project.
REFERENCES (10 of 44 shown)

1.  Structure Annotation of All Mass Spectra in Untargeted Metabolomics.

Authors:  Ivana Blaženović; Tobias Kind; Michael R Sa; Jian Ji; Arpana Vaniya; Benjamin Wancewicz; Bryan S Roberts; Hrvoje Torbašinović; Tack Lee; Sajjan S Mehta; Megan R Showalter; Hosook Song; Jessica Kwok; Dieter Jahn; Jayoung Kim; Oliver Fiehn
Journal:  Anal Chem       Date:  2019-01-16       Impact factor: 6.986

2.  Towards quantitative metabolomics of mammalian cells: development of a metabolite extraction protocol.

Authors:  Stefanie Dietmair; Nicholas E Timmins; Peter P Gray; Lars K Nielsen; Jens O Krömer
Journal:  Anal Biochem       Date:  2010-05-21       Impact factor: 3.365

3.  Creation of libraries of recurring mass spectra from large data sets assisted by a dual-column workflow.

Authors:  W Gary Mallard; N Rabe Andriamaharavo; Yuri A Mirokhin; John M Halket; Stephen E Stein
Journal:  Anal Chem       Date:  2014-10-01       Impact factor: 6.986

Review 4.  Chemical Discovery in the Era of Metabolomics.

Authors:  Miriam Sindelar; Gary J Patti
Journal:  J Am Chem Soc       Date:  2020-05-11       Impact factor: 15.419

Review 5.  Overview of mass spectrometry-based metabolomics: opportunities and challenges.

Authors:  G A Nagana Gowda; Danijel Djukovic
Journal:  Methods Mol Biol       Date:  2014

6.  Hierarchical cluster analysis of technical replicates to identify interferents in untargeted mass spectrometry metabolomics.

Authors:  Lindsay K Caesar; Olav M Kvalheim; Nadja B Cech
Journal:  Anal Chim Acta       Date:  2018-03-19       Impact factor: 6.558

7.  BLANKA: an Algorithm for Blank Subtraction in Mass Spectrometry of Complex Biological Samples.

Authors:  Jessica L Cleary; Gordon T Luu; Emily C Pierce; Rachel J Dutton; Laura M Sanchez
Journal:  J Am Soc Mass Spectrom       Date:  2019-04-16       Impact factor: 3.109

Review 8.  CHO-Omics Review: The Impact of Current and Emerging Technologies on Chinese Hamster Ovary Based Bioproduction.

Authors:  Gino Stolfa; Matthew T Smonskey; Ryan Boniface; Anna-Barbara Hachmann; Paul Gulde; Atul D Joshi; Anson P Pierce; Scott J Jacobia; Andrew Campbell
Journal:  Biotechnol J       Date:  2017-11-15       Impact factor: 4.677

9.  Creating a Mass Spectral Reference Library for Oligosaccharides in Human Milk.

Authors:  Connie A Remoroza; Tytus D Mak; Maria Lorna A De Leoz; Yuri A Mirokhin; Stephen E Stein
Journal:  Anal Chem       Date:  2018-07-17       Impact factor: 6.986

Review 10.  How close are we to complete annotation of metabolomes?

Authors:  Mark R Viant; Irwin J Kurland; Martin R Jones; Warwick B Dunn
Journal:  Curr Opin Chem Biol       Date:  2017-01-21       Impact factor: 8.822
