
Creation and filtering of a recurrent spectral library of CHO cell metabolites and media components.

Kelly H Telu¹, Ramesh Marupaka¹, Nirina R Andriamaharavo¹, Yamil Simón-Manso¹, Yuxue Liang¹, Yuri A Mirokhin¹, Tallat H Bukhari¹, Renae J Preston², Lila Kashi², Zvi Kelman², Stephen E Stein¹.

Abstract

This paper reports the first implementation of a new type of mass spectral library for the analysis of Chinese hamster ovary (CHO) cell metabolites that allows users to quickly identify most compounds in any complex metabolite sample. We also describe an annotation methodology developed to filter out artifacts and low-quality spectra from recurrent unidentified spectra of metabolites. CHO cells are commonly used to produce biological therapeutics. Metabolic profiles of CHO cells and media can be used to monitor process variability and look for markers that discriminate between batches of product. We have created a comprehensive library of both identified and unidentified metabolites derived from CHO cells that can be used in conjunction with tandem mass spectrometry to identify metabolites. In addition, we present a workflow that can be used for assigning confidence to a NIST MS/MS Library search match based on prior probability of general utility. The goal of our work is to annotate and, when possible, identify all liquid chromatography-mass spectrometry generated metabolite ions, as well as to create automatable library building and identification pipelines for use by others in the field.

© 2021 The Authors. Biotechnology and Bioengineering Published by Wiley Periodicals LLC. This article has been contributed to by US Government employees and their work is in the public domain in the USA.

Keywords: Chinese hamster ovary cells; global metabolite profiling; liquid chromatography-tandem mass spectrometry; nontargeted metabolomics; recurrent unidentified spectra

Year: 2021; PMID: 33404064; PMCID: PMC8048470; DOI: 10.1002/bit.27661

Source DB: PubMed; Journal: Biotechnol Bioeng; ISSN: 0006-3592; Impact factor: 4.530

INTRODUCTION

Chinese hamster ovary (CHO) cells are the predominant host cells for monoclonal antibody (mAb) production (Kunert & Reinhart, 2016). Metabolomics provides information on cellular phenotypes, and several metabolites have been demonstrated to be biomarkers of CHO cell status (Mohmad‐Saberi et al., 2013). Metabolomic analysis of CHO cells has primarily been used in process or media/feed development and has predominantly focused on targeted analysis of major metabolites, although several studies have used global metabolite analysis (Stolfa et al., 2018). A comprehensive assessment of CHO cell metabolic profiles could lead to improvements in product yield and quality by providing further understanding of the CHO cell metabolome (Stolfa et al., 2018).

Mass spectral libraries have been extremely popular for more than 40 years for identifying volatile chemical compounds using gas chromatography‐mass spectrometry (GC‐MS). They are used to locate the most similar spectra in the reference library and present the compounds that generated them in a “hit list” sorted by their similarity to the acquired spectrum (S. Stein, 2012). Liquid chromatography‐mass spectrometry (LC‐MS) is a widely practiced method for identifying the chemical components in metabolomics (Gowda & Djukovic, 2014). For confident metabolite identifications, liquid chromatography‐tandem mass spectrometry (LC‐MS/MS) can be performed and the fragmentation pattern can be compared to an MS/MS spectral library. Commercial MS/MS libraries that contain curated spectra (the NIST Tandem [MS/MS] Mass Spectral Library and the Wiley MSforID Library) as well as free libraries that facilitate data sharing (MassBank, MassBank of North America [MoNA], LipidBlast, METLIN, mzCloud, GNPS, etc.) are available and have been reviewed recently (Kind et al., 2018). These libraries contain experimental spectra of known compounds; spectra of unidentified compounds are not documented in them.
Other libraries such as LipidBlast, Greazy/LipidLama, CFM‐ID, and so forth are based on in silico prediction of the spectra of known or predicted metabolites (Kind et al., 2018). A comprehensive library of both known and unidentified CHO cell metabolites will be beneficial to the field of CHO cell metabolite analysis.

In addition to producing the NIST MS/MS Library, the NIST Mass Spectrometry Data Center (MSDC) has recently begun creating material‐oriented libraries that are generated from the analysis of complex mixtures such as human plasma and urine (https://chemdata.nist.gov/dokuwiki/doku.php?id=chemdata:arus) to address the issue of unknown metabolites (metabolites not identified by library searching), identify cross‐platform metabolite signatures, and catalogue all spectra associated with a particular material of interest (Mallard et al., 2014; Remoroza et al., 2018; Simon‐Manso et al., 2013; Simon‐Manso et al., 2019; S. Stein, 2012; Telu et al., 2016). These material‐oriented libraries contain recurrent spectra (spectra that occur repeatedly in the sample) for all detectable metabolites, both known and unknown, which are processed to produce high‐quality consensus spectra for the library. The MSDC has also created spectral libraries (Dong et al., 2018; Dong, Yan, Liang, & Stein, 2016) of the NISTmAb, a humanized IgG1κ Monoclonal Antibody Reference Material (RM 8671; https://www.nist.gov/programs-projects/nist-monoclonal-antibody-reference-material-8671).

The use of tandem mass spectral libraries in biomedical and biomanufacturing applications was very limited until the recent development of omics technologies. To date, there are no reports of libraries being used for optimizing biomanufacturing processes and very few of their use for discovering new metabolic pathways. Here, we implemented recurrent spectral libraries for CHO cell metabolite analysis that allow users to quickly identify most compounds in any complex metabolite sample.
We also developed an annotation strategy for these libraries to filter out artifacts and low‐quality spectra from recurrent unidentified spectra of metabolites. These libraries are focused on metabolite analysis; however, small peptides that extract along with the metabolites are also present. Furthermore, the limited coverage of tandem libraries is somewhat ameliorated by the use of the recently developed hybrid search (Burke et al., 2017; Cooper et al., 2019; Moorthy et al., 2017), which can identify compounds similar to, but not present in, the library. The recurrent spectral library is unique in that it can be used to determine whether an ion has been seen before in other analyses and to assign a class identification to compounds that are neither found in a library nor commercially available, and in that it enables library evolution based upon feedback from users. As more experiments are done, the library can continue to grow in coverage. The library and the associated metabolite identifications are freely available for download for use in the analysis of CHO cell metabolites by LC‐MS/MS. Although this study was demonstrated in CHO cells, the developed methods for filtering spectra and assigning match confidence can be applied not only to other cell types but also to other metabolomics studies. In addition, work is currently underway at NIST to create a metabolite identification pipeline and graphical user interface (GUI) that those in the biomanufacturing community can use to implement their own libraries.

EXPERIMENTAL METHODS*

To keep metabolite coverage broad, CHO cells were extracted by four different methods available in the literature: (1) 50% acetonitrile in water (Dietmair et al., 2012), (2) methanol (Dietmair, Timmins, Gray, Nielsen, & Kromer, 2010; Sellick et al., 2011), (3) methanol/methyl tert‐butyl ether (MTBE)/water, and (4) methanol/dichloromethane (DCM)/water (Matyash et al., 2008). Metabolites were separated with three different LC methods (reversed‐phase [C18], hydrophilic interaction liquid chromatography [HILIC], and a reversed‐phase method optimized for lipids [lipid C18]), and analyzed in positive and negative ionization mode with both higher‐energy C‐trap dissociation (HCD**) over a range of collision energies and ion trap (IT) collision‐induced dissociation. Media samples (fresh and spent) were resuspended in two different solvents (50% acetonitrile or pure methanol) after protein precipitation, separated with two different LC methods (C18 and HILIC), and analyzed with the same breadth of methods as the CHO cell metabolites.

Sample preparation

CHO‐S cells (Thermo Fisher Scientific) were grown in ProCHO5 protein‐free medium (Lonza) supplemented with 4 mmol/L L‐glutamine (Thermo Fisher Scientific). CHO cells and spent media were harvested and metabolite extractions were performed. Protein precipitation was performed on the media with 80% (vol/vol) methanol. After drying and before analysis, media samples were resuspended in either pure methanol or 50% acetonitrile (vol/vol). Metabolites were extracted by four different methods: 50% acetonitrile in water, methanol, methanol/methyl tert‐butyl ether (MTBE)/water, and methanol/dichloromethane (DCM)/water. Additional details regarding sample preparation can be found in the supporting information.

LC‐MS/MS analysis

The metabolites were separated by three different liquid chromatography methods. Extracts containing polar metabolites (50% acetonitrile, methanol, lower phase for the methanol/MTBE/water extraction, and upper phase for the methanol/DCM/water extraction) were separated by both C18 and HILIC. The organic phases of the two lipid extractions were separated by a lipid C18 method. Fresh and spent media samples were separated by C18 and HILIC. These separations were coupled to either a Q Exactive or Orbitrap Fusion Lumos (Thermo Fisher Scientific). The data were collected in positive and negative ionization mode with data‐dependent MS/MS acquisition. To provide as many spectra as possible for the library, HCD spectra were collected over a range of normalized collision energies from 10 to 50 using nitrogen as the collision gas. In addition, low‐resolution IT and high‐resolution IT spectra were acquired on the Lumos at a normalized collision energy of 35% using helium as the collision gas. The collision gases used were those recommended by the equipment manufacturer. Additional details regarding analysis can be found in the supporting information.

Data analysis

Data were analyzed to produce recurrent spectral libraries as reported previously (Telu et al., 2016). Briefly, all data were processed with the NIST MSQC pipeline (see below under “Annotation of Spectra” for a description of the pipeline). Recurrent spectra were exported from the output of the pipeline with a perfect score cutoff (1.0) to ensure all spectra (even identified ones) were included. Following this, consensus spectra were created from the experimental data using in‐house developed software after grouping the data by polarity, fragmentation type (HCD or IT), and collision energy. Spectral similarity was assessed from the precursor m/z and the dot product (Yang et al., 2014). Only similar spectra (a cluster) were used to create the consensus spectrum. Spectra dissimilar to the given cluster were placed in another cluster or, if unique, were ignored. After the libraries were created, the consensus spectra were searched against the NIST17 Library to obtain metabolite identifications. In addition, an annotation strategy was developed following manual evaluation of a representative data file. The data file analyzed was a 50% acetonitrile extraction that was separated on a C18 column and fragmented at HCD 20. The file was searched against the NIST17 Library with the NIST MSPepSearch software to provide tandem mass spectral library identifications as discussed below.
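The clustering step described above can be illustrated with a minimal Python sketch. It is not the in‐house software: the greedy strategy, the m/z bin width, the precursor tolerance, the score threshold, and all function and field names are assumptions chosen for illustration, and the spectra are assumed to come from a single polarity, fragmentation type, and collision energy group.

```python
import math
from collections import defaultdict


def bin_peaks(peaks, bin_width=0.01):
    """Sum peak intensities into fixed-width m/z bins (bin_width is an assumed value)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned


def dot_product(peaks_a, peaks_b, bin_width=0.01):
    """Normalized dot product (cosine similarity) of two peak lists, from 0 to 1."""
    a = bin_peaks(peaks_a, bin_width)
    b = bin_peaks(peaks_b, bin_width)
    shared = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return shared / norm if norm else 0.0


def cluster_spectra(spectra, precursor_tol=0.01, min_score=0.7):
    """Greedy clustering: each spectrum joins the first cluster whose seed
    spectrum matches in precursor m/z and dot product, else starts a new one."""
    clusters = []
    for spec in spectra:
        for cluster in clusters:
            seed = cluster[0]
            same_precursor = abs(spec["precursor_mz"] - seed["precursor_mz"]) <= precursor_tol
            if same_precursor and dot_product(spec["peaks"], seed["peaks"]) >= min_score:
                cluster.append(spec)
                break
        else:
            clusters.append([spec])
    return clusters


def recurrent_clusters(clusters, min_size=2):
    """Drop singleton clusters, mirroring the rule that unique spectra are ignored."""
    return [c for c in clusters if len(c) >= min_size]
```

Each spectrum here is a dictionary with a `precursor_mz` value and a list of `(m/z, intensity)` peaks. A consensus spectrum would then be built from each cluster that survives `recurrent_clusters`; that step is omitted because the text does not describe how it is computed.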

RESULTS AND DISCUSSION

Identification of metabolites

The first goal of this study was to collect, organize, and to the degree possible, identify all measurable tandem mass spectra in CHO cell metabolite and growth media extracts acquired using electrospray LC‐MS/MS methods. To do this, we developed an HCD and IT fragmentation spectral library containing consensus spectra in both positive and negative ionization mode using a spectral clustering method developed in‐house. The libraries contain data from both CHO cell metabolite analyses and media analyses and are annotated to show the origin of the spectra. In addition to metabolites, peptides that are co‐extracted are also present in the libraries, although these are not the focus of the work. The resulting HCD recurrent spectral libraries contain 109,601 and 61,677 spectra for the positive and negative ionization mode libraries, respectively. The IT libraries contain 15,703 and 12,499 spectra for the positive and negative ionization mode libraries, respectively. IT spectra are similar to low energy HCD spectra, except for their low mass cut‐off at about one‐third of the precursor mass and their higher degree of fragmentation at these low energies; IT fragment ions are therefore more intense than those in low energy HCD spectra. Note that low energy spectra are generally easier to interpret than higher energy spectra due to their simpler fragmentation mechanisms. Additional information about the libraries, including collision energies, precursor ion types, and source (CHO cell, media, or both) of the consensus spectra can be found in the supporting information. The results of CHO cell metabolite and media analyses are highly orthogonal, as only 8%–13% of the consensus spectra in the libraries originate from both samples. The overlap would likely be higher if a chemically defined medium were used. To identify spectra, we searched the consensus spectra generated for the recurrent spectral libraries against the NIST17 MS/MS library (Yang et al., 2014; Yang et al., 2017). 
To compare our results with those previously published, the CHO cell metabolite identifications were summarized and compared to a literature review of CHO cell metabolite identifications. To summarize the identifications, we sorted the library match identifications by name and library match score. We kept only the top‐scoring hit of each identification and then manually validated the library match result. Any poor matches were removed. In addition, we curated the data to remove identifications that have not previously been observed as endogenous metabolites by searching for the identification in the Human Metabolome Database (HMDB) (Wishart et al., 2013, 2018), PubChem (Kim et al., 2019), or the LIPID MAPS structure database (Sud et al., 2007), as no comprehensive CHO cell metabolite library is available. If there was no information on whether an identification was a metabolite, it was not removed. Spreadsheet 1 of supporting information contains all the library match identifications and can be mined for new or unexpected metabolites by experts in CHO cell metabolism. Our curated list resulted in 365 CHO cell metabolites (the majority identified by multiple ions or in multiple libraries) and an additional 304 di‐ or tri‐peptides. We split out the peptides into a separate list because they are likely less interesting than other metabolites. Metabolites identified are reported in Table 1. A literature search resulted in a list of 232 metabolites. Identifications made by HPLC, GC‐MS, MALDI‐MS, and LC‐MS were included. Of these 232 reported metabolites, we identified 43% in our data. Of those that were not identified, the majority (66%) were represented in the NIST17 library but not identified in our experiments, possibly because they were below the detection limit. The remaining literature identifications not present in the NIST17 library that are compatible with analysis by LC‐MS can be added to future versions of the NIST MS/MS library. 
Lists of identified metabolites summarized from our data as well as the literature review can be found in Spreadsheets 2 and 3 of the supporting information, respectively. These spreadsheets also contain information demonstrating the percentages reported herein.
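
The summarization step described above (sort by name and score, keep the top‐scoring hit per identification, and note which libraries produced it) can be sketched in Python as follows. This is only an illustration: the dictionary keys `name`, `library`, and `score` and the function name are hypothetical, and the manual validation and database curation steps are not reproduced.

```python
def summarize_identifications(hits):
    """Collapse library-search hits to one row per compound name.

    Each hit is a dict with hypothetical keys: name, library, score.
    The row keeps the best score seen and every library that matched.
    """
    summary = {}
    for hit in hits:
        row = summary.setdefault(
            hit["name"],
            {"name": hit["name"], "best_score": hit["score"], "libraries": set()},
        )
        row["best_score"] = max(row["best_score"], hit["score"])
        row["libraries"].add(hit["library"])
    # Sort rows by name and list libraries alphabetically, as in Table 1.
    return [
        {**row, "libraries": sorted(row["libraries"])}
        for _, row in sorted(summary.items())
    ]
```

The output has the same shape as a row of Table 1: one compound name with the list of libraries (for example HCD-Pos and IT-Pos) in which it was matched.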

Table 1

Compounds identified in the Recurrent Spectral Library created from CHO cell metabolite extracts

Metabolite	Library	PubChem ID
10Z‐Nonadecenoic acid	HCD‐Pos	5312513
1‐Methylnicotinamide	HCD‐Pos	457
1‐Methylxanthine	HCD‐Pos	80220
2,3‐Dehydro‐2‐deoxy‐N‐acetylneuraminic acid	HCD‐Pos, IT‐Pos	65309
2,3‐Diaminopropionic acid	HCD‐Pos	364
2‐Arachidonyl glycerol ether	HCD‐Pos	6483057
2'‐Deoxyguanosine 5'‐monophosphate	HCD‐Neg, IT‐Neg	645
2‐hydroxy‐2‐(4‐hydroxy‐3‐methoxyphenyl)acetic acid	HCD‐Pos	1245
2‐Hydroxyhexadecanoic acid	HCD‐Neg	92836
2‐Hydroxyphenethylamine	HCD‐Pos	1000
2‐Methylbutyrylcarnitine	HCD‐Pos	6426901
2‐Methylhippuric acid	HCD‐Pos	91637
2'‐O‐Methyladenosine	HCD‐Neg, IT‐Neg	102213
2‐Phospho‐d‐glyceric acid	HCD‐Neg, IT‐Neg	59
3,4‐Dihydroxymandelic acid	HCD‐Pos	85782
3'‐AMP	HCD‐Neg, HCD‐Pos	41211
3'‐CMP	HCD‐Neg, HCD‐Pos, IT‐Pos	66535
3‐Deoxy‐d‐glycero‐d‐galacto‐2‐nonulosonic acid	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	123691
3‐Hexenedioic acid	HCD‐Pos	107550
3‐Oxoglutaric acid	HCD‐Pos	68328
3‐Phosphoglycerate	HCD‐Neg	724
3‐Sialyl‐N‐acetyllactosamine	HCD‐Neg, HCD‐Pos, IT‐Neg	4150746
4‐Coumaryl alcohol	HCD‐Pos	5280535
4‐Hydroxybutyric acid	HCD‐Neg	10413
4‐Hydroxyglutamic acid	HCD‐Neg	439902
5α‐cholest‐7‐en‐3β‐ol	HCD‐Pos, IT‐Pos	420
5‐Aminovaleric acid	HCD‐Pos	138
5‐Hydroxyindole	HCD‐Pos	16054
5‐Hydroxylysine	HCD‐Pos	439437
5'‐Methylthioadenosine	HCD‐Pos, IT‐Pos	149
5‐Phosphonatoribosyl 1‐pyrophosphate	HCD‐Neg	1041
5‐Thymidylic acid	HCD‐Neg, IT‐Neg	1139
6‐Phosphogluconic acid	HCD‐Pos	91493
7‐Ketocholesterol	HCD‐Pos	91474
7‐Methylguanine	HCD‐Pos	135398679
7‐Methylguanosine	HCD‐Pos	135445750
9,10‐Epoxyoctadecenoic acid	HCD‐Pos	5283018
Acetylcholine	HCD‐Pos	187
Acetyl‐CoA	HCD‐Neg, HCD‐Pos, IT‐Neg	6302
Adenine	HCD‐Pos	190
Adenosine	HCD‐Pos	60961
Adenosine 2',3'‐cyclic phosphate	HCD‐Neg, HCD‐Pos, IT‐Pos	2024
Adenosine 2'‐phosphate	HCD‐Neg, HCD‐Pos	94136
Adenosine diphosphate ribose	HCD‐Neg, HCD‐Pos, IT‐Pos	30243
Adenosine monophosphate	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	6083
Adenosine phosphosulfate	HCD‐Neg	10238
Adenosine triphosphate	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	5957
Adenylsuccinic acid	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	440122
ADP	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	6022
Agmatine	HCD‐Pos	199
α‐d‐Glucose 1,6‐bisphosphate	HCD‐Neg	82400
α‐Ionone	HCD‐Pos	24680
α‐Ketoisovaleric acid	IT‐Neg	49
Aminoadipic acid	HCD‐Pos	469
Arabinonic acid	HCD‐Neg, IT‐Neg	122045
Aspartylglycosamine	HCD‐Pos	123826
Asymmetric dimethylarginine	HCD‐Pos, IT‐Pos	123831
β‐Carboline	HCD‐Pos	64961
β‐Glycerophosphoric acid	HCD‐Neg, IT‐Neg	2526
Betaine	HCD‐Pos	247
Biopterin	HCD‐Pos	135403659
But‐2‐enoic acid	HCD‐Neg	637090
Carnosine	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	439224
CDP	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	6132
Cer(d18:1/24:1(15Z))	HCD‐Pos, IT‐Pos	5283568
Cholest‐5‐en‐3‐one	HCD‐Pos	9908107
Cholesta‐4,6‐dien‐3‐one	HCD‐Pos, IT‐Pos	3034666
Cholesterol	HCD‐Pos	5997
Choline	HCD‐Pos, IT‐Pos	305
cis‐Aconitic acid	HCD‐Neg	309
cis‐Vaccenic acid	HCD‐Pos	5282761
Citicoline	HCD‐Neg, HCD‐Pos, IT‐Pos	13804
Citraconic acid	HCD‐Neg	643798
Citric acid	HCD‐Neg, IT‐Neg	311
Citrulline	HCD‐Neg, HCD‐Pos	833
Coenzyme A	HCD‐Neg	87642
Coenzyme Q9	HCD‐Pos, IT‐Pos	5280473
Cyclic ADP‐ribose	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	123847
Cyclic AMP	HCD‐Neg	6076
Cytidine	HCD‐Neg, HCD‐Pos	6175
Cytidine 5'‐diphosphate ethanolamine	HCD‐Neg, HCD‐Pos, IT‐Neg	123727
Cytidine monophosphate	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	6131
Cytidine monophosphate N‐acetylneuraminic acid	HCD‐Neg	448209
Cytidine triphosphate	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	6176
Cytosine	HCD‐Pos	597
dCDP	HCD‐Neg, IT‐Neg	150855
dCMP	HCD‐Neg, HCD‐Pos, IT‐Neg	13945
Deoxyadenosine monophosphate	HCD‐Neg	12599
Deoxycytidine	HCD‐Pos	13711
Deoxyinosine	HCD‐Pos, IT‐Pos	135398593
Dephospho‐CoA	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	444485
d‐Erythrose	HCD‐Neg	94176
d‐Fructose	HCD‐Neg	5984
DG(14:0/14:0/0:0)	HCD‐Pos	10369168
DG(16:0/16:0/0:0)	HCD‐Pos	644078
DG(16:0/18:1(9Z)/0:0)	HCD‐Pos, IT‐Pos	5282283
DG(18:1(9Z)/18:1(9Z)/0:0)	HCD‐Pos, IT‐Pos	9543716
d‐Galactose	HCD‐Neg, IT‐Neg	6036
d‐Glucaro‐1,4‐lactone	HCD‐Neg, IT‐Neg	122306
d‐Glucose	HCD‐Neg	5793
d‐Glucuronic acid	HCD‐Neg	94715
Diadenosine triphosphate	HCD‐Neg	165381
Dihydrobiopterin	HCD‐Pos	135402011
d‐Malic acid	HCD‐Neg, IT‐Neg	525
d‐Maltose	HCD‐Neg, IT‐Pos	294
d‐Mannose 1‐phosphate	HCD‐Neg, IT‐Neg	644175
d‐Ornithine	HCD‐Pos	71082
d‐Phenyllactic acid	HCD‐Neg	643327
d‐Pipecolinic acid	HCD‐Pos	736316
ε‐caprolactam	HCD‐Pos, IT‐Pos	7768
Erucamide	HCD‐Pos, IT‐Pos	5365371
Erucic acid	HCD‐Pos, IT‐Pos	8216
FAPy‐adenine	HCD‐Pos	114926
Flavin mononucleotide	HCD‐Pos, IT‐Pos	710
Folic acid	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	135398658
Fructose 1,6‐bisphosphate	HCD‐Neg, IT‐Neg	10267
Fructose‐6‐phosphate	HCD‐Neg	69507
Galactaric acid	HCD‐Neg	3037582
Galactinol	HCD‐Neg, IT‐Neg	11727586
Galactitol	HCD‐Neg	11850
Galactonic acid	HCD‐Neg	128869
Galactose 1‐phosphate	HCD‐Neg, IT‐Neg	123912
Galactosylsphingosine	HCD‐Pos	5280458
Galβ1,3GlcNAc	HCD‐Pos	440994
γ‐Aminobutyric acid	HCD‐Pos	119
γ‐Glutamylglutamic acid	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	92865
GDP‐glucose	HCD‐Pos	135398625
GDP‐l‐fucose	HCD‐Neg	135398655
Glucaric acid	HCD‐Neg, IT‐Neg	33037
Glucose 1‐phosphate	HCD‐Neg	65533
Glucose 6‐phosphate	HCD‐Neg, HCD‐Pos, IT‐Neg	5958
Glutathione	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	124886
Glyceraldehyde 3‐phosphate	IT‐Neg	729
Glyceric acid	HCD‐Neg	752
Glycerophosphocholine	HCD‐Pos, IT‐Pos	71920
Glyceryl monooleate	HCD‐Pos	33022
Guanidinosuccinic acid	HCD‐Neg, HCD‐Pos	97856
Guanine	HCD‐Neg, HCD‐Pos	135398634
Guanosine	HCD‐Neg, HCD‐Pos, IT‐Neg	135398635
Guanosine diphosphate	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	135398619
Guanosine diphosphate mannose	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	135398627
Guanosine monophosphate	HCD‐Neg, HCD‐Pos, IT‐Pos	135398631
Guanosine triphosphate	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	135398633
Helicin	HCD‐Pos	101799
Hexadecanedioic acid	HCD‐Pos	10459
Hydroxyphenyllactic acid	HCD‐Neg, HCD‐Pos, IT‐Neg	9378
Hypoxanthine	HCD‐Pos	135398638
Indole‐3‐carboxylic acid	HCD‐Pos	69867
Indolelactic acid	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	92904
Indolepyruvate	HCD‐Pos	803
Inosine	HCD‐Pos	135398641
Inosinic acid	HCD‐Neg, HCD‐Pos, IT‐Neg	135398640
Inositol 1,3,4‐trisphosphate	HCD‐Neg	123680
Inositol 1,3‐bisphosphate	HCD‐Neg, IT‐Neg	128419
Inositol 1,4,5‐trisphosphate	HCD‐Pos	55310
Inositol 1,4‐bisphosphate	HCD‐Neg, HCD‐Pos	123903
Inositol 1‐phosphate	IT‐Pos	107737
Inositol 3‐phosphate	HCD‐Pos	440194
Inositol 4‐phosphate	HCD‐Neg	440043
Isobutyryl‐l‐carnitine	HCD‐Pos	168379
Isocitric acid	HCD‐Neg	1198
Isomaltose	HCD‐Neg	872
Isovaleryl coenzyme A	HCD‐Neg	165435
Isovaleryl‐l‐carnitine	HCD‐Pos	169235
Ketoleucine	HCD‐Neg	70
l‐2‐Hydroxyglutaric acid	HCD‐Neg	43
Lacto‐N‐triaose	HCD‐Pos	53477860
Lactose	HCD‐Neg	6134
l‐Arginine	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	6322
l‐Asparagine	HCD‐Neg, HCD‐Pos	6267
l‐Aspartic acid	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	424
l‐Carnitine	HCD‐Pos, IT‐Pos	10917
l‐Cystathionine	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	834
l‐Cystine	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	67678
l‐Erythrulose	HCD‐Neg	162406
Lewis A trisaccharide	HCD‐Pos	4139998
Lewis X trisaccharide	HCD‐Pos, IT‐Pos	4571095
l‐Glutamic acid	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	33032
l‐Glutamine	HCD‐Neg, HCD‐Pos	5961
l‐Gulonolactone	HCD‐Neg, IT‐Neg	439373
l‐Histidine	HCD‐Neg, HCD‐Pos	6274
l‐Homoserine	HCD‐Pos, IT‐Pos	12647
l‐Iditol	HCD‐Pos	5460044
l‐Isoleucine	HCD‐Pos	6306
l‐Kynurenine	HCD‐Pos	161166
l‐Leucine	HCD‐Neg, HCD‐Pos, IT‐Neg	6106
l‐Lysine	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	5962
l‐Methionine	HCD‐Pos, IT‐Pos	6137
l‐Phenylalanine	HCD‐Neg, HCD‐Pos, IT‐Pos	6140
l‐Proline	HCD‐Neg, HCD‐Pos	145742
l‐Serine	HCD‐Neg, HCD‐Pos, IT‐Pos	5951
l‐Threonine	HCD‐Neg, HCD‐Pos, IT‐Pos	6288
l‐Tryptophan	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	6305
l‐Tyrosine	HCD‐Neg, HCD‐Pos, IT‐Pos	6057
l‐Valine	HCD‐Pos	6287
Maltotetraose	HCD‐Neg, HCD‐Pos, IT‐Pos	870
Maltotriose	HCD‐Neg, IT‐Neg	92146
Mannose 6‐phosphate	HCD‐Neg, HCD‐Pos, IT‐Neg	65127
Melibiose	HCD‐Neg	219994
Methionine sulfoxide	HCD‐Neg, HCD‐Pos, IT‐Pos	158980
MG(0:0/16:0/0:0)	HCD‐Pos, IT‐Pos	123409
myo‐Inositol	HCD‐Neg	892
N8‐Acetylspermidine	HCD‐Pos, IT‐Pos	123689
N‐Acetyl‐d‐galactosamine	HCD‐Pos	35717
N‐Acetyl‐d‐glucosamine	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	439174
N‐Acetyl‐d‐glucosamine 6‐phosphate	HCD‐Neg	439219
N‐Acetyl‐d‐lactosamine	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	9800166
N‐Acetyl‐l‐aspartic acid	HCD‐Pos	65065
N‐Acetyl‐l‐carnosine	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	9903482
N‐Acetyl‐l‐glutamic acid	HCD‐Neg	70914
N‐Acetyl‐l‐glutamine	HCD‐Pos, IT‐Pos	182230
N‐Acetyl‐l‐methionine	HCD‐Neg, IT‐Neg	448580
N‐Acetyl‐l‐phenylalanine	HCD‐Neg, HCD‐Pos, IT‐Pos	74839
N‐Acetyl‐l‐tyrosine	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	68310
N‐Acetylmannosamine	HCD‐Pos, IT‐Pos	65150
N‐Acetylneuraminic acid	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	906
NAD	HCD‐Neg, HCD‐Pos, IT‐Pos	925
NADH	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	439153
NADP	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	4412
N‐alpha‐Acetyl‐l‐ornithine	HCD‐Pos	907
N‐Formyl‐l‐methionine	HCD‐Neg, IT‐Neg	439750
N‐Glycolylneuraminic acid	HCD‐Neg, IT‐Neg	123802
Niacinamide	HCD‐Pos, IT‐Pos	936
Nicotinamide riboside	HCD‐Pos	439924
Nicotinamide ribotide	HCD‐Pos	16219737
Nicotinic acid adenine dinucleotide	HCD‐Neg	165490
Nicotinic acid mononucleotide	HCD‐Neg, IT‐Neg	5288991
N‐Methyl‐l‐glutamic acid	HCD‐Pos	439377
N‐Methyllysine	HCD‐Pos	164795
N‐Methyltyramine	HCD‐Pos	9727
N‐Palmitoyl‐d‐sphingosine	HCD‐Pos, IT‐Pos	5353456
Oleamide	HCD‐Pos	5283387
Oleic acid	HCD‐Pos	445639
Oleoyl glycine	HCD‐Pos	6436908
Oleoyl serine	HCD‐Neg, HCD‐Pos, IT‐Pos	44190514
O‐Phosphotyrosine	IT‐Pos	30819
Orotic acid	HCD‐Neg	967
O‐Tyrosine	HCD‐Pos	91482
Oxidized glutathione	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	65359
PA(16:0/16:0)	HCD‐Neg	3099
PA(16:0/18:1(9Z))	HCD‐Neg	5283523
PA(18:1/0:0)	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	5311263
Palmitic amide	HCD‐Pos	69421
Palmitoyl ethanolamide	HCD‐Pos	4671
Palmitoyl sphingomyelin	HCD‐Neg, HCD‐Pos, IT‐Pos	9939941
Pantothenic acid	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	6613
Paullinic acid	HCD‐Pos	5312518
PC(14:0/0:0)	HCD‐Neg, HCD‐Pos, IT‐Pos	460604
PC(14:0/14:0)	HCD‐Neg, HCD‐Pos, IT‐Neg	5459377
PC(14:0/16:0)	HCD‐Neg, HCD‐Pos	129657
PC(14:0/18:0)	HCD‐Pos	131150
PC(15:0/15:0)	HCD‐Pos, IT‐Neg	24778654
PC(16:0/0:0)	HCD‐Pos, IT‐Pos	460602
PC(16:0/12:0)	HCD‐Pos	10676014
PC(16:0/14:0)	HCD‐Neg, HCD‐Pos, IT‐Neg	24778679
PC(16:0/16:0)	HCD‐Pos	452110
PC(16:0/18:1(9Z))	HCD‐Neg, HCD‐Pos, IT‐Pos	5497103
PC(16:0/18:2(9Z,12Z))	HCD‐Pos	5287971
PC(16:1(9Z)/16:1(9Z))	HCD‐Pos	24778764
PC(18:0/0:0)	HCD‐Neg, HCD‐Pos	497299
PC(18:0/14:0)	HCD‐Pos	3082163
PC(18:0/18:0)	HCD‐Pos	94190
PC(18:0/18:1(9Z))	HCD‐Neg	24778825
PC(18:0/18:2(9Z,12Z))	HCD‐Pos	6441487
PC(18:1(9Z)/0:0)	HCD‐Neg, HCD‐Pos, IT‐Pos	16081932
PC(18:1(9Z)/14:0)	HCD‐Pos, IT‐Neg	24778931
PC(18:1(9Z)/16:0)	HCD‐Neg, HCD‐Pos	24778933
PC(20:1(11Z)/20:1(11Z))	HCD‐Pos	24779063
PC(22:0/0:0)	HCD‐Pos	24779479
PC(22:1(13Z)/22:1(13Z))	HCD‐Pos	24779126
PC(24:0/0:0)	HCD‐Pos	24779481
PC(O‐16:0/0:0)	HCD‐Pos, IT‐Pos	162126
PC(O‐16:0/18:1(9Z))	HCD‐Pos	24779266
PC(O‐16:0/2:0)	HCD‐Pos	108156
PC(O‐16:0/20:3(8Z,11Z,14Z))	HCD‐Pos	16759365
PC(O‐18:0/0:0)	HCD‐Pos	2733532
PC(P‐16:0/0:0)	HCD‐Pos	10917802
PC(P‐18:0/0:0)	HCD‐Neg, HCD‐Pos	24779527
PC(P‐18:0/18:1(9Z))	HCD‐Pos	42607428
PE(14:0/0:0)	HCD‐Neg, HCD‐Pos, IT‐Neg	9547070
PE(16:0/0:0)	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	9547069
PE(16:0/16:0)	HCD‐Neg, IT‐Neg	445468
PE(16:0/18:1(9Z))	HCD‐Pos	5283496
PE(16:0/18:2(9Z,12Z))	HCD‐Pos	9546747
PE(18:0/0:0)	HCD‐Neg, HCD‐Pos, IT‐Pos	9547068
PE(18:0/18:1(9Z))	HCD‐Pos	9546742
PE(18:0/18:2(9Z,12Z))	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	9546749
PE(18:1(9Z)/0:0)	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	9547071
PE(18:1(9Z)/18:1(9Z))	HCD‐Pos	9546757
PE(O‐16:0/18:1(9Z))	HCD‐Neg, HCD‐Pos	42607455
PE(P‐18:0/18:1(9Z))	HCD‐Neg	42607457
PE(P‐18:0/22:6(4Z,7Z,10Z,13Z,16Z,19Z))	HCD‐Neg, IT‐Neg	42607458
PE‐NMe2(18:1(9Z)/18:1(9Z))	HCD‐Neg, IT‐Neg	9547022
PG(18:0/18:1)	HCD‐Neg, IT‐Neg	24779551
Phenylacetic acid	HCD‐Neg	999
Phenylacetylglutamine	HCD‐Neg, HCD‐Pos, IT‐Pos	92258
Phosphoadenosine phosphosulfate	HCD‐Neg, IT‐Neg	10214
Phosphorylcholine	HCD‐Pos, IT‐Pos	135437
Phosphoserine	HCD‐Neg	106
PI(16:0/18:1(9Z))	HCD‐Neg, IT‐Neg	5771758
Pip(18:1(9Z)/18:1(9Z))	HCD‐Neg	53480169
p‐Octopamine	HCD‐Pos	4581
Proline betaine	IT‐Pos	115244
PS(16:0/18:1(9Z))	HCD‐Neg, HCD‐Pos, IT‐Neg	5283499
PS(16:0/20:4)	HCD‐Neg	24779544
PS(18:0/18:0)	HCD‐Neg	9547096
PS(18:0/18:1(9Z))	HCD‐Neg, HCD‐Pos, IT‐Neg	9547087
PS(18:0/20:4(5Z,8Z,11Z,14Z))	HCD‐Neg	24779545
PS(18:1(9Z)/18:1(9Z))	HCD‐Neg, HCD‐Pos, IT‐Neg	6438639
Pterin	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	73000
Pyridoxal	HCD‐Pos, IT‐Pos	1050
Pyridoxal 5'‐phosphate	HCD‐Neg, HCD‐Pos	1051
Pyridoxamine	HCD‐Pos, IT‐Pos	1052
Pyroglutamic acid	HCD‐Neg, HCD‐Pos, IT‐Pos	7405
Raffinose	HCD‐Neg, HCD‐Pos	10542
Ribitol	HCD‐Neg	6912
Riboflavin	HCD‐Neg, HCD‐Pos, IT‐Pos	493570
Ribono‐γ‐lactone	HCD‐Neg	111064
Ribose 1‐phosphate	HCD‐Neg, IT‐Neg	1074
Ribose 5‐phosphate	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	77982
Ribulose 5‐phosphate	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	439184
S‐Adenosylhomocysteine	HCD‐Pos, IT‐Pos	13792
S‐Adenosylmethionine	HCD‐Pos, IT‐Pos	34755
Sebacic acid	HCD‐Neg, IT‐Neg	5192
Sedoheptulosan	HCD‐Neg	5460956
Serotonin	HCD‐Pos	5202
S‐Glutathionyl‐l‐cysteine	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	10455148
SM(d18:1/18:0)	HCD‐Pos	6453725
SM(d18:1/18:1(9Z))	HCD‐Pos	6443882
SM(d18:1/24:1(15Z))	HCD‐Pos	53481791
Sorbitol	HCD‐Neg, IT‐Neg	5780
Spermine	HCD‐Pos	1103
Stachyose	HCD‐Pos, IT‐Pos	439531
Stearoyl ethanolamide	HCD‐Pos	27902
Succinic acid	HCD‐Neg, IT‐Neg	1110
Succinic acid semialdehyde	HCD‐Neg	1112
Sucrose	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	5988
Tetradecanoyl‐CoA	HCD‐Neg	11966124
Thiamine	HCD‐Pos, IT‐Pos	1130
Thiamine monophosphate	HCD‐Pos	3382778
Thiamine pyrophosphate	HCD‐Neg, HCD‐Pos, IT‐Pos	5431
Threonic acid	HCD‐Neg, IT‐Neg	151152
trans‐13‐Octadecenoic acid	HCD‐Pos	6161490
trans‐Vaccenic acid	HCD‐Pos	5281127
Trehalose	HCD‐Neg	7427
Trigonelline	HCD‐Pos	5570
Triolein	HCD‐Pos, IT‐Pos	5497163
Tripalmitolein	HCD‐Pos	9543989
Ubiquinone‐1	HCD‐Pos	4462
Undecanedioic acid	HCD‐Neg, IT‐Neg	15816
Uracil	HCD‐Pos	1174
Uric acid	HCD‐Neg, HCD‐Pos, IT‐Neg	1175
Uridine	HCD‐Pos	6029
Uridine 5'‐diphosphate	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	6031
Uridine 5'‐monophosphate	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	6030
Uridine 5'‐triphosphate	HCD‐Neg, HCD‐Pos, IT‐Neg	6133
Uridine diphosphate glucose	HCD‐Pos, IT‐Pos	8629
Uridine diphosphate glucuronic acid	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	17473
Uridine diphosphate galactose	HCD‐Neg, HCD‐Pos, IT‐Neg, IT‐Pos	6857410
Urocanic acid	HCD‐Pos	736715
Xanthine	HCD‐Pos, IT‐Pos	1188
Xanthosine	HCD‐Pos	64959
Xanthylic acid	HCD‐Neg, HCD‐Pos, IT‐Pos	73323
Xylulose 5‐phosphate	HCD‐Neg, HCD‐Pos, IT‐Pos	439190
Zymosterol	HCD‐Pos	92746

Improvement of accuracy of pipeline identifications

We developed a procedure to improve the accuracy of identifications obtained using the NIST MSQC pipeline by modifying the order of identifications in a hit list. By default, the NIST pipeline sorts hits entirely by their score, which reflects the quality of the spectral match between the experimental and library spectra. We identified four categories of identification errors, which we labeled category A–D errors for clarity. Additional information on these errors, with examples and solutions, can be found in the supporting information.

Hybrid search

To discover the identity of compounds not represented in the library, a hybrid search was performed. The hybrid search is a new search strategy available in the 2017 release of the NIST MS Search software (version 2.3) (Burke et al., 2017; Cooper et al., 2019; Moorthy et al., 2017). This search finds compounds that differ by an inert chemical group and can therefore often match unidentified spectra with members of the same chemical class that are present in the library. The term delta mass is used to represent the difference in mass between the query spectrum and the library entry. An example of a hybrid match in the CHO cell metabolite data is the match of a spectrum (ion m/z = 472.0011) to a sodiated adenosine 5'‐diphosphate library spectrum with a delta mass of 21.9824 Da. This delta mass corresponds to the replacement of a hydrogen by a sodium, so the correct annotation of this ion is adenosine 5'‐diphosphate [M‐H+2Na]+. The hybrid search was also utilized to assist in the identification of two groups of related spectra. Information on these identifications can be found in the supporting information.
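
The delta mass reasoning in the adenosine 5'‐diphosphate example can be checked with a short Python sketch. The shift table, the tolerance, and the function names are assumptions made for illustration and are not part of the NIST MS Search software. The library precursor m/z of 450.0187 used in the usage note is not stated in the text; it is inferred here as 472.0011 minus 21.9824.

```python
# Monoisotopic masses in Da (standard values, rounded to six decimals).
H = 1.007825
C = 12.000000
O = 15.994915
NA = 22.989769
K = 38.963706

# A small illustrative table of mass shifts; not the list used by NIST MS Search.
KNOWN_SHIFTS = {
    "Na - H": NA - H,    # sodium replaces a hydrogen
    "K - H": K - H,      # potassium replaces a hydrogen
    "CH2": C + 2 * H,    # methylene unit
    "H2O": 2 * H + O,    # water
}


def delta_mass(query_mz, library_mz, charge=1):
    """Mass difference between query and library precursors, in Da."""
    return (query_mz - library_mz) * charge


def explain_delta(delta, tol=0.003):
    """Return the label of the closest known shift within tol Da, or None."""
    label, shift = min(KNOWN_SHIFTS.items(), key=lambda kv: abs(kv[1] - delta))
    return label if abs(shift - delta) <= tol else None
```

Calling `explain_delta(delta_mass(472.0011, 450.0187))` returns the label `"Na - H"`, since the theoretical Na minus H shift is about 21.9819 Da, within 0.001 Da of the reported delta mass.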

Utility of recurrent spectral libraries

There are multiple metabolomics analysis software tools available. A recent review summarized those that are freely available (Spicer et al., 2017). In addition, there are a variety of freely available packages for processing MS/MS spectra (Kind et al., 2018). One such tool, RAMClustR (Broeckling et al., 2014) can group features extracted via XCMS (Smith et al., 2006) into spectra in an unsupervised manner and therefore identify features that originate from the same compound in an indiscriminant MS/MS (idMS/MS) data acquisition. Spectra can then be searched against a reference library such as the NIST MS/MS Library. The NIST MSQC pipeline (Rudnick et al., 2010; https://chemdata.nist.gov/dokuwiki/doku.php?id=peptidew:msqcpipeline), a fully integrated software pipeline that was developed for the analysis of a tryptic protein digest to assist in the identification of variability caused by issues with analytical platforms, was used to process data files in this study. We have extended the application of the pipeline to identification of small molecule metabolites by modifying searching and scoring. The pipeline begins by reading a data file from a commercial instrument, extracting all spectral data, and searching the spectra against the NIST library using the NIST MS Search software. When multiple spectra are acquired for a single precursor ion, the most intense one is selected and its maximum MS1 abundance is recorded at its retention time. Figure 1 shows ion plots generated from the pipeline output after searching against the NIST17 MS/MS library or the Recurrent library and provides a visual representation of the data. Each object represents a clustered mass spectrum. More detailed plots of those shown in Figure 1 can be found in the supporting information. For these ion plots, the pipeline has found 5335 ion clusters in this data file. When searched against the NIST17 MS/MS library, 80% of these clusters have no identification. 
When searched against the positive ion HCD recurrent spectral library, the number of clusters with no identification drops to 23%. Thirty‐eight percent of the clusters have a recurrent label, which indicates they have matched spectra in the recurrent spectral library by either direct or hybrid MS/MS search. This increase in cluster identification demonstrates the utility of the recurrent spectral libraries. As we are cataloguing every observed ion in the libraries instead of just previously identified metabolites, we can identify these ions in future analyses of the same or similar materials.

Figure 1

Plot of a single LC‐MS/MS analysis of a 50% acetonitrile extract of CHO cell metabolites after searching against the NIST17 MS/MS library (left) or Recurrent Library (right). LC, liquid chromatography; MS, mass spectrometry [Color figure can be viewed at wileyonlinelibrary.com]

Annotation of spectra

The second goal of this study was to develop a comprehensive, automatable approach to annotate the spectra in the libraries of recurrent spectra for the purpose of filtering out artifacts and low‐quality spectra from recurrent unidentified spectra of metabolites. This type of filtering is important because unknowns can be redundant signals, artifacts (man‐made signals), and contaminants (real chemicals), instead of metabolites that are not present in the library used for spectral matching (Sindelar & Patti, 2020). Credentialing features (Mahieu et al., 2014) and isotopic ratio outlier analysis (IROA; de Jong & Beecher, 2012) are two isotopic‐labeling techniques that have been developed to provide further confidence in metabolite identifications. Such techniques were not used in the creation of the mass spectral libraries. Therefore, an annotation strategy based on the comparison of the extracted ion chromatogram (EIC) between the sample and blank runs was developed to filter library spectra. Once spectra are filtered, efforts can be focused on identifying compounds that are likely to be unidentified recurrent spectra originating from CHO cells and/or media (vs. the environment or instrument) by searching the spectra against other available tandem mass spectral libraries and in silico prediction libraries. MassBank of North America (MoNA) in combination with the NIST MS/MS Library and in silico fragmentation tools CSI:FingerID and LipidBlast has been demonstrated to be effective in assigning structural annotation to MS/MS spectra (Blaženović et al., 2019). To develop the annotation strategy, the search results of all the mass spectra contained in a representative data file were manually evaluated. Figure 2 is a graphical summary of the annotation strategy developed for filtering. First, spectra are removed if they do not have a sufficiently narrow chromatographic peak width (<30 s), unless they are identified. 
Second, spectra without sufficiently high spectral purity (>80%) are removed. Third, spectra without sufficient fragment ion abundances (summed product ion abundance/precursor ion abundance < 10) are removed. The data shown in Figure 1 was filtered using these parameters, which eliminated two‐thirds of the 5335 spectra. The spreadsheet used for sorting and eliminating spectra can be found in Spreadsheet 4 of the supporting information. Of those eliminated, 9.5% were background, 77.8% were possibly contaminated (due to the presence of another peak close in mass to the parent ion), and 12.8% contained insufficient fragmentation. Figure 3 shows the distribution of abundances of the 1752 identified (by direct and hybrid MS/MS match) and unidentified ion clusters. This figure shows that less abundant compounds are less likely to be identified. These unidentified ion clusters could comprise spectra of previously unidentified metabolites or metabolites that are not represented in the library, as well as spectra of background and artifacts, which is why annotation is crucial.
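The three filtering steps can be expressed as a small function. The sketch below is illustrative only and rests on several assumptions: the `Cluster` fields and function name are hypothetical, the mapping of each step to the elimination labels (background, possibly contaminated, insufficient fragmentation) is inferred from the order in which they are reported, the exemption for identified spectra is applied only to the peak width step, and the third criterion is read as removing spectra whose fragment-to-precursor ratio falls below the threshold. The wording of that third criterion is ambiguous in the text, so the comparison is kept in a single line that is easy to flip.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    peak_width_s: float     # chromatographic peak width in seconds
    purity_pct: float       # spectral purity in percent
    fragment_ratio: float   # summed product ion abundance / precursor ion abundance
    identified: bool        # True if matched by library search


def filter_reason(c, max_width_s=30.0, min_purity_pct=80.0, min_fragment_ratio=10.0):
    """Return why a cluster is removed, or None if it passes all three steps."""
    # Step 1: broad peaks are removed unless the spectrum is identified.
    if c.peak_width_s >= max_width_s and not c.identified:
        return "background"
    # Step 2: purity must exceed the threshold.
    if c.purity_pct <= min_purity_pct:
        return "possibly contaminated"
    # Step 3 (assumed direction): too little fragment ion signal.
    if c.fragment_ratio < min_fragment_ratio:
        return "insufficient fragmentation"
    return None


clusters = [
    Cluster(10, 95, 50, False),
    Cluster(60, 95, 50, False),
    Cluster(10, 70, 50, False),
]
kept = [c for c in clusters if filter_reason(c) is None]
```

Returning a reason rather than a boolean makes it straightforward to tabulate how many spectra each step eliminates.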

Figure 2

Workflow for annotation of spectra [Color figure can be viewed at wileyonlinelibrary.com]

Figure 3

Distribution of identified and unidentified ion clusters after filtering.

*0.2% of hybrid identified and 2.9% of unidentified spectra that were assigned abundance in this bin because the abundance could not be calculated were removed [Color figure can be viewed at wileyonlinelibrary.com]

In the next step in spectral annotation after filtering, the EIC of the ion of the corresponding spectrum is compared to the EIC of the same ion in a blank run via visual inspection. If the peak is not present in the blank with intensity within 100x that of the sample, then this spectrum is labeled either as a known (if it is identified by MS/MS match) and annotated with the identification, or as an unknown (metabolite not identified by library searching). If the peak is in the blank, then the spectrum can be due to either an artifact/carryover or background. During manual evaluation of EICs, we found that for the purposes of spectral classification, artifact/carryover ions can be separated from background ions by examining the peak width. The background has a broad peak width (in regions where both hydrophilic and hydrophobic compounds elute) while an artifact/carryover has a narrow peak width. For most cases, differentiating between background and artifact/carryover was straightforward; however, for cases that were difficult to differentiate, we labeled ions as background if there was a substantial signal in both halves of the chromatogram (0–15 and 15–30 min). Separation using peak width is a quick method to classify spectra, but more accurate methods could be applied in an automated pipeline. Multiple algorithms (Cleary et al., 2019; Ho, Kuo, Wang, Chen, & Tseng, 2013; Zhang & Yang, 2008; Zhu et al., 2009) have been developed for the purpose of subtracting the background from LC‐MS data. 
In addition, a hierarchical cluster analysis technique was developed to identify chemical interferants that are not removable by background subtraction (Caesar, Kvalheim, & Cech, 2018). Figure 4 shows examples of each of the above‐mentioned classifications. For the unidentified recurrent spectra, these classifications are an effective way (that can be automated) to annotate the spectra for the library. These labels allow us to prioritize spectra needing identification first through library and literature searching. Unknowns represent compounds that originate from the CHO cells or cell culture media and are the highest priority to attempt to identify. Artifacts/carryover are the next priority because these may still be compounds that originate from the CHO cells or cell culture media. Background spectra are likely not worth an analyst's time to try to identify as the background will be different in analyses from different labs. Table S4 shows the resultant annotation of the 20 most abundant unidentified ion clusters. Fifteen percent of the clusters are unknowns and would be the most useful to search the literature and online databases for the identities. Fifty percent of the clusters are artifacts/carryover and the remaining 35% are background.
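The classification into known, unknown, artifact/carryover, and background could be automated along the following lines. This is a rough sketch under stated assumptions rather than the procedure used in the study, where EICs were compared by visual inspection: the function name and arguments are hypothetical, "within 100x" is interpreted as the blank intensity being at least one hundredth of the sample intensity, and `narrow_width_s` is an assumed cutoff, since the text separates artifact/carryover from background by peak width without giving a numeric threshold for this step.

```python
def annotate_spectrum(sample_intensity, blank_intensity, blank_peak_width_s,
                      identified, blank_ratio=100.0, narrow_width_s=30.0):
    """Label a spectrum from its EIC in the sample and in a blank run."""
    # Peak counts as present in the blank if it is within blank_ratio of the sample.
    in_blank = blank_intensity > 0 and blank_intensity * blank_ratio >= sample_intensity
    if not in_blank:
        return "known" if identified else "unknown"
    # Present in the blank: a narrow peak suggests artifact/carryover,
    # a broad one suggests background (assumed width cutoff).
    if blank_peak_width_s < narrow_width_s:
        return "artifact/carryover"
    return "background"


# Priority order for identification effort described in the text.
PRIORITY = ["unknown", "artifact/carryover", "background"]

labels = [
    annotate_spectrum(1e6, 1e3, 10, False),   # blank signal 1000x lower
    annotate_spectrum(1e6, 5e4, 10, False),   # narrow peak also in blank
    annotate_spectrum(1e6, 5e4, 600, False),  # broad signal in blank
]
ranked = sorted(labels, key=PRIORITY.index)
```

A production version would likely replace the single width cutoff with one of the background subtraction algorithms cited above.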

Figure 4

Examples of each type of annotated ion [Color figure can be viewed at wileyonlinelibrary.com]

Confidence in library match identifications

A framework for reporting confidence in metabolomics identifications was proposed in 2007 by the Chemical Analysis Working Group of the Metabolomics Standards Initiative (MSI) and is composed of four levels of metabolite confidence. These are identified compounds (Level 1), putatively annotated compounds (Level 2), putatively characterized compound classes (Level 3), and unknown compounds (Level 4) (Sumner et al., 2007). There has been discussion in the metabolomics community about providing more information about confidence by modifying/expanding the level system, introducing a quantitative system, or providing alphanumeric identification metrics, but no consensus has been reached (Creek et al., 2014; Schrimpe‐Rutledge et al., 2016; Schymanski et al., 2014; Sumner et al., 2014; Viant et al., 2017). Schymanski et al. (2014) proposed a framework for reporting confidence that was based on the MSI levels and adapted for high‐resolution mass spectrometry (HR‐MS). These HR‐MS specific confidence levels are most appropriate for our data and consist of five confidence levels. These are confirmed structure (Level 1), probable structure (Level 2), tentative candidate(s) (Level 3), unequivocal molecular formula (Level 4), and exact mass (Level 5). In this study, we have Level 2, 3, and 5 confidences. Level 1 is confirmed using two or more properties of reference standards using the same experimental conditions. Although the NIST17 MS/MS library is acquired using reference standards, the experimental data in this paper was not acquired on the same platform, so it is not a Level 1 confidence. This type of confirmation is unrealistic for our work where we are trying to catalogue all metabolites and identify as many as possible. The direct identifications reported in this study represent Level 2 confidence structure identifications as they are obtained with library matching. 
Hybrid match identifications are Level 3 because they are chemical class identifications made with library searching. To assign a Level 4 confidence, we would need to attempt to assign a chemical structure to the spectra in the libraries, which we have not done to date. All the spectra in the libraries are associated with accurate mass data, and spectra annotated as unknowns would have a Level 5 confidence. Some of the spectra annotated as artifact/carryover could be originating from the sample and have a Level 5 confidence but finding these could be challenging and a method for doing this would require further development. To provide additional detail about the confidence of both our direct and hybrid library MS/MS matches, we developed a workflow to assign a qualitative confidence level to each metabolite identification. The workflow starts with the match score and incorporates prior probability information about whether the identified compound has been previously observed as a metabolite. The workflow also incorporates the annotation as described above to ensure the identified spectrum originates from the sample. Match scoring performed by the pipeline is well documented in the literature for the NIST Tandem MS library and for the NIST MS Search program and is based upon the dot product of the spectra being compared (S. E. Stein, 1999). The match score has been validated by manual inspection of matches and correlates very well with the match quality as determined by visual inspection. A score cut‐off of 400 removes essentially all poor matches and has been chosen as the default cutoff for metabolites. To assign confidence, an identification can initially be classified as high, medium, or low confidence, depending on the match score. Scores of 400–599, 600–799, and 800–999 correspond to low, medium, and high confidence, respectively. 
Of course, in cases of isomers with similar spectra, distinguishing them may not be possible without the use of reference standards. Prior probability as well as the spectrum annotation can be used to raise or lower the qualitative level of confidence and can, to some degree, assist in isomer identification. Figure 5 depicts the workflow that was developed for assigning confidence. The workflow starting with a medium confidence is depicted at the top of the figure and is described below. The workflow starting with a low or high confidence is depicted at the bottom of the figure with the differences from the medium highlighted in green. The first step in the workflow is to determine if the identified compound is a known metabolite. For this study, we performed a literature search for reported CHO cell metabolites and searched the Human Metabolome Database (HMDB; Wishart et al., 2013, 2018) and/or PubChem (Kim et al., 2019) to see if the compound was a reported human metabolite (it was not considered a metabolite if it was on HMDB, but not endogenous). In addition, we searched for lipids using the LIPID MAPS structure database (Sud et al., 2007). If it was found in any of these places, the qualitative confidence level was increased and if not, it was decreased. For the initial confidence of medium, an identification was elevated to high confidence if it was a known metabolite and lowered to low confidence if it was not. The next step is determining if the spectrum is annotated as a known/unknown. For the right side of the workflow, if the spectrum is not a known/unknown, confidence remains low and if it is, confidence is elevated to medium. On the left side of the workflow, if the spectrum is a known/unknown, confidence remains high and if it is not, it is determined if the spectrum is annotated as an artifact/carryover. If the spectrum is an artifact/carryover, then confidence remains high, and if it is not, confidence is lowered to medium. 
Confidence is only elevated once in the workflow to prevent a match with a low score from being elevated to high confidence. Table S5 shows the 20 most abundant identified ions from the data in Figure 1 and their associated confidence.
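The score bins and the medium-start branch of the workflow, as described in the text, can be written as two short functions. This is a sketch under assumptions, not software from the study: the function names and annotation strings are hypothetical, and only the branch that starts at medium confidence is covered because the low-start and high-start variants are shown in Figure 5 but not spelled out in the text.

```python
def initial_confidence(score):
    """Map a match score to an initial confidence; None if below the 400 cutoff."""
    if score < 400:
        return None
    if score < 600:
        return "low"
    if score < 800:
        return "medium"
    return "high"


def medium_start_confidence(is_known_metabolite, annotation):
    """Final confidence for an identification whose initial confidence is medium.

    annotation is one of: "known", "unknown", "artifact/carryover", "background".
    """
    if is_known_metabolite:
        # Left side: elevated to high, then kept unless the spectrum is background.
        if annotation in ("known", "unknown", "artifact/carryover"):
            return "high"
        return "medium"
    # Right side: lowered to low, then raised to medium only for known/unknown.
    if annotation in ("known", "unknown"):
        return "medium"
    return "low"


# Example: score 650, reported metabolite, spectrum annotated as background.
start = initial_confidence(650)
final = medium_start_confidence(True, "background")
```

In this example the identification starts at medium, rises to high because the compound is a reported metabolite, and returns to medium because the spectrum is annotated as background.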

Figure 5

Workflow for assigning confidence in MS/MS identifications. Initial confidence level is determined by the match score and initial medium confidence is shown at the top. MS, mass spectrometry [Color figure can be viewed at wileyonlinelibrary.com]

Automation

One of the goals of this study was to develop tools that could be automated after initial development. The two workflows for annotation of spectra in the library and assignment of a qualitative confidence level for library identifications are amenable to automation via development of software tools. This will drastically increase the speed at which annotation and confidence assignment can occur. In addition, development of software tools for assessing prior probability and tools for automatic detection of spectra that are likely to be originating from the Pluronic F‐68 in the cell culture media will be beneficial. However, expert evaluation of the output of any developed software tools will be required until the methods become routine.

CONCLUSIONS

We have created the first recurrent spectral library for use in identifying CHO cell metabolites and outlined a procedure for future extensions. The library contains metabolites originating from a single CHO cell variety in a single cell culture medium and represents the spectra of all compounds repeatedly observed in these samples. It can be used as a tool by others in the field to quickly identify compounds in a CHO cell metabolite sample. During this analysis, we have developed a method capable of identifying all components commonly found in the LC‐MS analysis of CHO cell metabolite extracts and media. An extension of this approach is expected to lead to both an automated way to extend this library and to develop similar libraries for other metabolite materials. Finally, we developed a strategy to assign qualitative confidence to NIST MS/MS library identifications. Although methods of representing the confidence of measurement have been developed for reporting individual metabolite identifications, these methods could not adequately represent the confidence needed to properly annotate the identifications made here, many of which cannot be regarded as definitive. The next step for this project will be automation of the workflows and release of the recurrent spectral libraries. The libraries can then be used in metabolomics studies of CHO cell metabolites using LC‐MS/MS analyses.

AUTHOR CONTRIBUTIONS

Kelly H. Telu and Stephen E. Stein contributed intellectually to project conceptualization and experiment design. Renae J. Preston and Lila Kashi grew the CHO cells used in the experiments. Zvi Kelman supervised CHO cell growth. Kelly H. Telu, Ramesh Marupaka, and Nirina R. Andriamaharavo performed the experiments. Kelly H. Telu, Yamil Simón‐Manso, and Yuxue Liang contributed to LC‐MS/MS method development. Yuri A. Mirokhin developed the algorithm Tallat H. Bukhari used to create the recurrent spectral libraries. The manuscript was drafted by Kelly H. Telu, revised by Stephen E. Stein, and then critiqued and approved by all co‐authors. Stephen E. Stein supervised the project.

44 in total

1. Structure Annotation of All Mass Spectra in Untargeted Metabolomics.

Authors: Ivana Blaženović; Tobias Kind; Michael R Sa; Jian Ji; Arpana Vaniya; Benjamin Wancewicz; Bryan S Roberts; Hrvoje Torbašinović; Tack Lee; Sajjan S Mehta; Megan R Showalter; Hosook Song; Jessica Kwok; Dieter Jahn; Jayoung Kim; Oliver Fiehn
Journal: Anal Chem Date: 2019-01-16 Impact factor: 6.986

2. Towards quantitative metabolomics of mammalian cells: development of a metabolite extraction protocol.

Authors: Stefanie Dietmair; Nicholas E Timmins; Peter P Gray; Lars K Nielsen; Jens O Krömer
Journal: Anal Biochem Date: 2010-05-21 Impact factor: 3.365

3. Creation of libraries of recurring mass spectra from large data sets assisted by a dual-column workflow.

Authors: W Gary Mallard; N Rabe Andriamaharavo; Yuri A Mirokhin; John M Halket; Stephen E Stein
Journal: Anal Chem Date: 2014-10-01 Impact factor: 6.986

Review 4. Chemical Discovery in the Era of Metabolomics.

Authors: Miriam Sindelar; Gary J Patti
Journal: J Am Chem Soc Date: 2020-05-11 Impact factor: 15.419

Review 5. Overview of mass spectrometry-based metabolomics: opportunities and challenges.

Authors: G A Nagana Gowda; Danijel Djukovic
Journal: Methods Mol Biol Date: 2014

6. Hierarchical cluster analysis of technical replicates to identify interferents in untargeted mass spectrometry metabolomics.

Authors: Lindsay K Caesar; Olav M Kvalheim; Nadja B Cech
Journal: Anal Chim Acta Date: 2018-03-19 Impact factor: 6.558

7. BLANKA: an Algorithm for Blank Subtraction in Mass Spectrometry of Complex Biological Samples.

Authors: Jessica L Cleary; Gordon T Luu; Emily C Pierce; Rachel J Dutton; Laura M Sanchez
Journal: J Am Soc Mass Spectrom Date: 2019-04-16 Impact factor: 3.109

Review 8. CHO-Omics Review: The Impact of Current and Emerging Technologies on Chinese Hamster Ovary Based Bioproduction.

Authors: Gino Stolfa; Matthew T Smonskey; Ryan Boniface; Anna-Barbara Hachmann; Paul Gulde; Atul D Joshi; Anson P Pierce; Scott J Jacobia; Andrew Campbell
Journal: Biotechnol J Date: 2017-11-15 Impact factor: 4.677

9. Creating a Mass Spectral Reference Library for Oligosaccharides in Human Milk.

Authors: Connie A Remoroza; Tytus D Mak; Maria Lorna A De Leoz; Yuri A Mirokhin; Stephen E Stein
Journal: Anal Chem Date: 2018-07-17 Impact factor: 6.986

Review 10. How close are we to complete annotation of metabolomes?

Authors: Mark R Viant; Irwin J Kurland; Martin R Jones; Warwick B Dunn
Journal: Curr Opin Chem Biol Date: 2017-01-21 Impact factor: 8.822
