Ian Walsh1, Matthew S F Choo1, Sim Lyn Chiin1, Amelia Mak1, Shi Jie Tay1, Pauline M Rudd1,2, Yang Yuansheng3, Andre Choo4,5, Ho Ying Swan5, Terry Nguyen-Khuong1.
Abstract
The accurate assessment of antibody glycosylation during bioprocessing requires the high-throughput generation of large amounts of glycomics data. Such data allow bioprocess engineers to identify the critical process parameters that control glycosylation critical quality attributes. Advances in protocols for capillary electrophoresis with laser-induced fluorescence detection (CE-LIF) of antibody N-glycans have increased the potential for generating large N-glycosylation datasets for assessment. With large cohorts of CE-LIF data, however, peak picking and peak area calculation remain obstacles to fast and accurate quantitation, even though internal and external standards reduce misalignment for qualitative analysis. These problems often arise because varying process conditions introduce fluctuations that produce heterogeneous peak shapes. In addition, co-eluting glycans can produce non-Gaussian peaks under some process conditions but not under others. Here, we describe an approach for the quantitative and qualitative curation of large-cohort CE-LIF glycomics data. For glycan identification, a previously reported method based on triple internal standards is used. To determine relative glycan quantities, our method uses a clustering algorithm to 'divide and conquer' highly heterogeneous electropherograms into groups of similar profiles, within which peaks are easier to define manually. Open-source software is then used to determine the areas of the manually defined peaks. We successfully applied this semi-automated method to a dataset of 391 glycoprofiles of monoclonal antibody biosimilars from a bioreactor optimization study. The key advantage of this computational approach is that all runs can be analyzed simultaneously with high accuracy in glycan identification and quantitation, and there is no theoretical limit to the scale of the method.
Keywords: capillary electrophoresis; clustering; data analysis; electropherogram; glycosylation; monoclonal antibodies; peak picking; process development
Year: 2020 PMID: 32952725 PMCID: PMC7476600 DOI: 10.3762/bjoc.16.176
Source DB: PubMed Journal: Beilstein J Org Chem ISSN: 1860-5397 Impact factor: 2.883
Figure 1. A single bioreactor run with defined culture conditions over twelve days. (A to B) Batch GU calculation using the triple standard approach. The orange star symbol marks the three bracketing standards. (B) Dot plot of GU value vs. migration time. (A to C to D) HappyTools made quantitation easier because all peaks can be aligned/calibrated and the start and end migration times of every peak can be defined before quantitation begins.
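The batch GU calculation in panels A to B can be pictured as a mapping from migration time to GU anchored on the three bracketing standards. The following minimal Python sketch assumes piecewise-linear interpolation between standards of known GU (with linear extrapolation beyond the outermost ones); the triple standard method itself may use a different model, and the standard times and GU values in the example are hypothetical.

```python
from bisect import bisect_right


def gu_from_migration_time(t, std_times, std_gus):
    """Convert one migration time to glucose units (GU).

    Assumption for this sketch: GU varies linearly between adjacent internal
    standards. std_times are the standards' migration times in this run
    (ascending); std_gus are the GU values assigned to those standards.
    Times outside the bracketed range are extrapolated from the nearest segment.
    """
    if len(std_times) != len(std_gus) or len(std_times) < 2:
        raise ValueError("need at least two standards with matching GU values")
    # Index of the segment [i, i+1] that brackets t, clamped to the first/last segment.
    i = bisect_right(std_times, t) - 1
    i = max(0, min(i, len(std_times) - 2))
    t0, t1 = std_times[i], std_times[i + 1]
    g0, g1 = std_gus[i], std_gus[i + 1]
    return g0 + (g1 - g0) * (t - t0) / (t1 - t0)


def calibrate_run(peak_times, std_times, std_gus):
    """Batch-convert every peak migration time of one electropherogram to GU."""
    return [gu_from_migration_time(t, std_times, std_gus) for t in peak_times]


# Hypothetical run: three standards observed at 4, 8 and 12 min with assumed GUs 2, 7 and 15.
print(calibrate_run([6.0, 10.0, 12.0], [4.0, 8.0, 12.0], [2.0, 7.0, 15.0]))
```

Because each run is converted against its own observed standards, runs whose absolute migration times drift can still be compared on the shared GU axis, which is what panel B plots.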
Figure 2. Problems when integrating poorly resolved peaks, using the FA1/FA2G2S1/A2 and M5 peaks as an example. (A) FA1/FA2G2S1/A2 and M5 had similar peak areas. (B) FA1/FA2G2S1/A2 had a greater peak area than M5. (C) FA1/FA2G2S1/A2 had a smaller peak area than M5. (D) Average peak area and standard deviation (error bars) for the 9 electropherograms in A. (E) Average peak area and standard deviation (error bars) for the 326 electropherograms in B. (F) Average peak area and standard deviation (error bars) for the 56 electropherograms in C. (G) Correlation between 32 Karat and HappyTools/Riemann sums for FA1/FA2G2S1/A2 only (red box: peak not picked by 32 Karat). (H) Correlation for M5 peaks only (red box: peak not picked by 32 Karat). (I) When the FA1/FA2G2S1/A2 and M5 peak areas were combined, the two approaches correlated excellently, suggesting that 32 Karat integrates the two peaks as one.
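Panels G to I compare 32 Karat with HappyTools/Riemann sums. A minimal sketch of a left Riemann sum over a manually defined start/end migration-time window follows. It is not HappyTools' implementation, it applies no baseline correction (an assumption), and the trace values and window boundaries are made up for illustration.

```python
def riemann_area(times, intensities, start, end):
    """Left Riemann sum of a sampled trace over the window [start, end).

    Each sample with start <= t_k < end contributes
    intensity_k * (t_{k+1} - t_k). No baseline is subtracted
    (an assumption made for this sketch).
    """
    area = 0.0
    for k in range(len(times) - 1):
        if start <= times[k] < end:
            area += intensities[k] * (times[k + 1] - times[k])
    return area


t = [0, 1, 2, 3, 4, 5]  # migration time (toy units)
y = [0, 2, 4, 2, 3, 1]  # fluorescence intensity (toy units)
first_peak = riemann_area(t, y, 1, 3)   # hypothetical window for the first peak
second_peak = riemann_area(t, y, 3, 5)  # hypothetical window for the second peak
both = riemann_area(t, y, 1, 5)         # one window spanning both peaks
print(first_peak, second_peak, both)
```

Because adjacent half-open windows partition the samples, the two separate areas always add up to the single-window area. This is why a tool that integrates two poorly resolved peaks as one window agrees with the summed per-peak areas (panel I) while disagreeing on each peak individually (panels G and H).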
Figure 3. Clustering grouped similar electropherograms and thereby cleaned up the HappyTools calibration output. (A) The HappyTools calibration of all 391 electropherograms was largely misaligned, so peaks did not line up and their positions were difficult to define for quantitation. (B) Clustering divided the HappyTools calibration into three clusters of similar electropherograms. Each cluster had well-defined peaks, which were determined manually.
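The clustering algorithm is not named in this caption. As a stand-in, the sketch below groups equal-length traces with Lloyd's k-means (squared Euclidean distance, deterministic farthest-point seeding) after scaling each trace to unit total intensity. It assumes the traces have already been resampled onto a common GU grid; the function names and the toy traces are hypothetical.

```python
def _dist2(a, b):
    """Squared Euclidean distance between two equal-length traces."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _normalize(trace):
    """Scale a trace to unit total intensity so clustering compares shape, not loading."""
    total = sum(trace)
    return [v / total for v in trace] if total else list(trace)


def kmeans_traces(traces, k, max_iter=50):
    """Group equal-length electropherogram traces into k clusters (Lloyd's k-means).

    Seeding is deterministic: the first trace, then repeatedly the trace
    farthest from all centers chosen so far. Returns one label (0..k-1) per trace.
    """
    data = [_normalize(t) for t in traces]
    centers = [data[0]]
    while len(centers) < k:
        centers.append(max(data, key=lambda p: min(_dist2(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each trace joins its nearest center.
        new_labels = [min(range(k), key=lambda j: _dist2(p, centers[j])) for p in data]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: each center becomes the mean trace of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


toy = [[10, 1, 1, 1], [9, 1, 1, 1],  # hypothetical profiles dominated by the first bin
       [1, 1, 10, 1], [1, 1, 9, 2],  # ... by the third bin
       [1, 1, 1, 10], [2, 1, 1, 9]]  # ... by the last bin
print(kmeans_traces(toy, k=3))  # each pair of similar profiles shares a label
```

Normalizing first means two runs with the same glycan distribution but different sample loading land in the same cluster, which matches the goal of grouping by profile shape so that peaks can be defined once per cluster.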
Figure 4. Comparison of automated peak picking with semi-automated clustering plus HappyTools quantitation for the 391 electropherograms. (A) Automated quantitation using a Gaussian approximation approach picked 10 peaks on average (min = 5, max = 14), whereas our HappyTools + clustering approach consistently picked 17 peaks. (B) Number of times automated Gaussian peak picking missed one of the 17 peaks listed in Supporting Information File 1, Table S3, as a function of % abundance.
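Panel A refers to automated quantitation by Gaussian approximation. One way such an approximation can fail on co-eluting glycans is sketched below. The sketch assumes a single-Gaussian model whose height is the trace maximum and whose width is the intensity-weighted standard deviation of migration time. This moment estimate was chosen here for illustration and is not necessarily the method used in the study. All peak parameters are invented.

```python
import math


def gaussian(t, height, center, sigma):
    """Value of a Gaussian peak shape at migration time t."""
    return height * math.exp(-0.5 * ((t - center) / sigma) ** 2)


def gaussian_area_estimate(times, intensities):
    """Approximate a window's area by assuming it holds exactly one Gaussian peak.

    height = maximum intensity; sigma = intensity-weighted standard deviation
    (moment estimate). The area of a Gaussian is height * sigma * sqrt(2*pi).
    """
    total = sum(intensities)
    mu = sum(t * y for t, y in zip(times, intensities)) / total
    var = sum(y * (t - mu) ** 2 for t, y in zip(times, intensities)) / total
    return max(intensities) * math.sqrt(var) * math.sqrt(2 * math.pi)


def true_area(components):
    """Exact area of a sum of Gaussians given as (height, center, sigma) tuples."""
    return sum(h * s * math.sqrt(2 * math.pi) for h, _, s in components)


def relative_error(components, start=0.0, end=10.0, step=0.01):
    """Relative error of the one-Gaussian estimate on a densely sampled trace."""
    n = int(round((end - start) / step)) + 1
    times = [start + i * step for i in range(n)]
    trace = [sum(gaussian(t, *c) for c in components) for t in times]
    ref = true_area(components)
    return (gaussian_area_estimate(times, trace) - ref) / ref


single = [(1.0, 5.0, 0.2)]                    # one well-shaped peak (hypothetical)
doublet = [(1.0, 4.7, 0.2), (1.0, 5.3, 0.2)]  # two co-eluting peaks, 3 sigma apart
print(f"{relative_error(doublet):+.1%}")  # roughly -9%: the doublet is underestimated
```

For the single peak the error is essentially zero. For the doublet, the one-Gaussian model widens sigma but keeps the height of only one apex, so it underestimates the combined area. A picker built on this assumption would also report one peak where two glycans are present.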
Figure 5. Glycans identified in anti-HER-2 samples using UPLC-MS and CE. (A) The UPLC chromatogram confirmed the 14 glycans by GU and mass. Green-boxed glycans were also identified by CE. (B) Glucose units vs. migration time for all 391 CE electropherograms. Database-matched glycans are shown in Oxford linear notation [19]. CE APTS database hits are marked with a circle and an error bar showing the GU tolerance. All core fucose residues were α-1→6-linked, all galactose residues β-1→4-linked, and all sialic acids α-2→3-linked. All glycans are drawn in SNFG notation [20].
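Panel B matches CE peaks to database entries within a GU tolerance. A minimal sketch of that lookup follows. The database contents, glycan names, and tolerance are hypothetical placeholders, not the CE APTS database values used in the figure.

```python
def match_gu(observed_gu, database, tolerance):
    """Return the names of database glycans whose reference GU lies within
    +/- tolerance of observed_gu, closest match first.

    database: mapping of glycan name -> reference GU (hypothetical values below).
    """
    hits = [(abs(ref - observed_gu), name)
            for name, ref in database.items()
            if abs(ref - observed_gu) <= tolerance]
    return [name for _, name in sorted(hits)]


# Placeholder database; real entries would come from a curated CE APTS GU library.
db = {"glycan_a": 5.80, "glycan_b": 6.05, "glycan_c": 7.40}
print(match_gu(5.95, db, 0.2))  # two candidates, nearest first
print(match_gu(7.0, db, 0.2))   # no entry within tolerance
```

Returning every candidate inside the tolerance, rather than only the nearest, leaves ambiguous peaks visible. Such peaks can then be resolved by orthogonal evidence, such as the UPLC-MS masses in panel A.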
Figure 6. Boxplots of glycan quantitation for the 11 bioreactor conditions. The boxplots show the peak area distribution (expressed as % relative abundance) for each condition. Each point is the average relative abundance of three replicates.
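The % relative abundance in the boxplots can be obtained by normalizing each peak area to the summed area of all quantified peaks and then averaging the three replicates. The sketch below assumes that normalization (the choice of denominator is an assumption) and uses toy peak names and areas.

```python
from statistics import mean


def relative_abundance(peak_areas):
    """Express each peak area as a percentage of the summed area of all quantified peaks."""
    total = sum(peak_areas.values())
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


def replicate_mean(replicates):
    """Average % relative abundance per peak across replicate runs of one condition."""
    per_run = [relative_abundance(run) for run in replicates]
    return {name: mean(run[name] for run in per_run) for name in per_run[0]}


# Three hypothetical replicates of one bioreactor condition (toy peak areas).
reps = [{"p1": 30, "p2": 70}, {"p1": 20, "p2": 80}, {"p1": 25, "p2": 75}]
print(replicate_mean(reps))
```

Normalizing each run before averaging keeps replicates with different total signal from being weighted unequally in the mean.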