AbDiver: a tool to explore the natural antibody landscape to aid therapeutic design.

Jakub Młokosiewicz1, Piotr Deszyński1, Wiktoria Wilman1, Igor Jaszczyszyn1,2, Rajkumar Ganesan3, Aleksandr Kovaltsuk4, Jinwoo Leem5, Jacob Galson5, Konrad Krawczyk1.   

Abstract

MOTIVATION: Rational design of therapeutic antibodies can be improved by harnessing the natural sequence diversity of these molecules. Our understanding of the diversity of antibodies has recently been greatly facilitated through the deposition of hundreds of millions of human antibody sequences in next-generation sequencing (NGS) repositories. Contrasting a query therapeutic antibody sequence to naturally observed diversity in similar antibody sequences from NGS can provide a mutational roadmap for antibody engineers designing biotherapeutics. Because of the sheer scale of the antibody NGS datasets, performing queries across them is computationally challenging.
RESULTS: To facilitate harnessing antibody NGS data, we developed AbDiver (http://naturalantibody.com/abdiver), a free portal allowing users to compare their query sequences to those observed in the natural repertoires. AbDiver offers three antibody-specific use-cases: (1) compare a query antibody to positional variability statistics precomputed from multiple independent studies; (2) retrieve close full variable sequence matches to a query antibody; and (3) retrieve CDR3 or clonotype matches to a query antibody. We applied our system to a set of 742 therapeutic antibodies, demonstrating that for each use-case our system can retrieve relevant results for most sequences. AbDiver facilitates the navigation of vast antibody mutation space for the purpose of rational therapeutic antibody design.
AVAILABILITY: AbDiver is freely accessible at http://naturalantibody.com/abdiver.
© The Author(s) 2022. Published by Oxford University Press.

Year:  2022        PMID: 35274671      PMCID: PMC9048670          DOI: 10.1093/bioinformatics/btac151

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.931


1 Introduction

Monoclonal antibodies are the largest class of biotherapeutics. Development of successful antibody therapeutics requires selection and engineering of candidate sequences with favorable functional and developability properties. Knowledge of biologically possible mutations at specific positions can be employed to engineer biophysical properties of these molecules (Venkataramani ). Next-generation sequencing (NGS) now allows us to capture millions of naturally sourced B cell receptor (BCR) sequences in a single experiment, providing insight into natural antibody diversity (Kovaltsuk ). The richness of NGS data has implications for rational selection and design; models trained on these data have already shown promise for humanization (Marks ) and binding prediction (Mason ). It has also been shown that close sequence matches to clinically approved antibodies can be found in NGS datasets (Krawczyk ), and clinically approved antibodies contain engineered mutations that can be recapitulated using the natural diversity from these datasets (Petersen et al., 2021). Exploration of natural antibody diversity from NGS datasets relative to a candidate therapeutic would therefore facilitate rapid and effective antibody engineering. The volume of the publicly available antibody NGS data makes investigation of their diversity mostly constrained to time-consuming bioinformatic endeavors. There exist online tools to retrieve antibody sequences from large databases, such as ClonoMatch (Jones ), PIRD (Zhang et al., 2019) or the AIRR Data Commons (Christley ). Here, we offer an orthogonal service with a therapeutic focus, called AbDiver, that allows users to discover and characterize the natural sequence diversity surrounding a query antibody of interest. 
We provide three different approaches for performing this mapping (i) annotation of the natural positional diversity statistics on a position-by-position basis for a given query sequence, (ii) finding close matches to a full variable-region sequence and (iii) identifying sequences that would be classed as belonging to the same clonotype as the query sequence (based on CDR3 and germline V/J-gene segments). AbDiver thus provides an accessible way to find a natural reference for a query therapeutic sequence, offering insights for sequence selection and rational design.

2 Implementation

Data: As the underlying data, we used publicly curated unpaired BCR NGS datasets from the Observed Antibody Space (OAS) (Kovaltsuk ). In May 2021, the dataset encompassed 81 studies with 906 933 358 unique BCR sequences (105 730 531 light chains and 801 202 827 heavy chains) numbered according to the IMGT scheme (see Supplementary Material). We envisage updates to the services as more datasets become available. For benchmarking, we used a set of 742 therapeutic antibodies, extending a set from our previous study (Krawczyk ). Certain therapeutics were multispecific or contained chain duplicates, resulting in 738 unique heavy chains, 707 unique light chains, 686 unique CDRH3s and 573 unique CDRL3s.
V-region profiling service: The AbDiver V-region natural profiling service annotates a query variable-region antibody sequence with the naturally observed amino acid frequency statistics for each position (Fig. 1, Supplementary Material). The frequency statistics are calculated from all antibodies comprising the same combination of V-gene and J-gene. Separate statistics were calculated for genes and their constituent alleles to allow for fine-grained allelic analysis, but also to reflect the ongoing effort in allele annotation (Smakaj ). Each IMGT position in each profile contains statistics from amino acid frequencies calculated for each study separately. The amino acid positional frequency for a study was incorporated only if the study included at least 100 observations at a given position. For each position, we calculated the study-specific Shannon entropy and the ranks of the amino acids by frequency. A query sequence sharing the germline genes or alleles of a given profile is then annotated at each IMGT position with the rank of the query amino acid and the positional entropy, each averaged over the individual studies.
This approach is designed to mitigate the effects of different numbers of sequences, techniques and disease states contributed by different studies, emphasizing frequency commonalities independent of study-specific biases. For benchmarking of the profiling service on 742 therapeutics, please see the Supplementary Material.
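The per-study statistics described above can be illustrated with a minimal sketch. It is not the AbDiver implementation: the function names, the base-2 logarithm, the alphabetical tie-breaking of ranks and the skipping of studies in which the query residue was never observed are all assumptions made for illustration. Only the 100-observation threshold and the averaging over studies come from the text.

```python
import math
from collections import Counter
from statistics import mean

MIN_OBSERVATIONS = 100  # threshold stated in the text


def study_position_stats(residues):
    """Entropy and frequency ranks for one study at one IMGT position.

    Returns None when the study has fewer than MIN_OBSERVATIONS residues.
    """
    total = len(residues)
    if total < MIN_OBSERVATIONS:
        return None
    counts = Counter(residues)
    # Shannon entropy in bits (log base is an assumption).
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Rank 1 = most frequent residue; ties broken alphabetically (assumption).
    ordered = sorted(counts, key=lambda aa: (-counts[aa], aa))
    ranks = {aa: i + 1 for i, aa in enumerate(ordered)}
    return entropy, ranks


def annotate_position(query_aa, per_study_residues):
    """Average entropy and rank of query_aa over studies passing the threshold."""
    entropies, ranks = [], []
    for residues in per_study_residues:
        stats = study_position_stats(residues)
        if stats is None:
            continue  # too few observations in this study
        entropy, study_ranks = stats
        entropies.append(entropy)
        # Studies that never observed query_aa contribute no rank (assumption).
        if query_aa in study_ranks:
            ranks.append(study_ranks[query_aa])
    return {
        "mean_entropy": mean(entropies) if entropies else None,
        "mean_rank": mean(ranks) if ranks else None,
        "n_studies": len(entropies),
    }
```

Averaging per-study values, rather than pooling all sequences, keeps a single very large study from dominating the profile, which is the motivation given in the text.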
Fig. 1. Visualization of the profiling service. Query sequence is compared to the frequency distribution statistics of amino acids of NGS sequences within the same gene or allele. Profiles are calculated for a specific gene or allele. Within each allele or gene, frequencies of amino acids are calculated from IMGT-aligned sequences. The individual study frequencies are used to calculate the mean, median and standard deviation (std) of the amino acid ranks, proportions and entropies, which are displayed for each residue. Upon clicking on an individual residue, a box plot displays positional frequencies aggregated from raw data from individual studies to reflect the observed variability among independent samples.

Sequence retrieval service: We created k-mer (k = 5) based indexes for CDRs in full variable-region sequences and for CDR3s separately. Variable-sequence matches are identified on the basis of same-length CDR1 and CDR2, with one residue discrepancy allowed for CDR3. Clonotypes are identified on the basis of the same V-gene and CDR3 sequence identity. The matches are presented using an IMGT-based multiple sequence alignment (Martin, 2014). Search results are presented in interactive tables that highlight the leading themes of the studies (e.g. studied disease, vaccine), facilitating further exploration of results. For benchmarking of the search service on 742 therapeutics, please see the Supplementary Material.
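A k-mer index of the kind mentioned above can be sketched as follows. This is a hypothetical illustration rather than the AbDiver code: the function names are invented, the mismatch is assumed to be a substitution between equal-length CDR3s, and the V-gene filter used for clonotypes is omitted. The index only proposes candidates that share at least one 5-mer with the query; each candidate is then verified by Hamming distance.

```python
from collections import defaultdict

K = 5  # k-mer size stated in the text


def kmers(seq, k=K):
    """Set of all overlapping k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(cdr3s, k=K):
    """Map each k-mer to the positions (in cdr3s) of the sequences containing it."""
    index = defaultdict(set)
    for idx, seq in enumerate(cdr3s):
        for km in kmers(seq, k):
            index[km].add(idx)
    return index


def hamming(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def search(query, cdr3s, index, max_mismatches=1, k=K):
    """Return same-length CDR3s within max_mismatches substitutions of query."""
    candidates = set()
    for km in kmers(query, k):
        candidates |= index.get(km, set())
    # Verification step: the shared k-mer only nominates a candidate.
    return sorted(
        cdr3s[i]
        for i in candidates
        if len(cdr3s[i]) == len(query)
        and hamming(cdr3s[i], query) <= max_mismatches
    )


# Toy database of CDRH3-like strings (made up for illustration).
db = ["ARDYYGSSYFDY", "ARDYYGSGYFDY", "ARDYYGSSYFDV", "ARGGYFDY", "AKDLLGSSWFAY"]
hits = search("ARDYYGSSYFDY", db, build_index(db))
```

One limitation of this sketch: for CDR3s shorter than 2k residues, a single substitution near the middle can destroy every shared k-mer, so such near-matches would not be nominated. How AbDiver treats short CDR3s is not described in the text.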

3 Discussion

We created an online portal that facilitates the navigation of natural antibody diversity. We envisage particular application of the service to drawing parallels between natural and therapeutic antibodies (Krawczyk ) for the purpose of engineering out post-translational modification risks while maintaining favorable biophysical properties. For instance, removing a deamidation motif would require one to introduce one of a few standard mutations (e.g. NA, QG) and reassess the function of the antibody. AbDiver can identify sequence-similar candidates of natural origin, increasing the chances that function and immunogenicity will not be compromised (Gutiérrez-González ). Beyond facilitating liability removal, AbDiver could uncover sequences with potentially better product profiles than the lead therapeutic. Using the CDR3 or clonotype search can accelerate the discovery of clones that share the therapeutic properties of the query, yet provide alternatives with potentially better product profiles. We hope that AbDiver will enable research-supporting applications to facilitate decision-making in rational design of therapeutics during lead optimization.
Financial Support: none declared.
Conflict of Interest: none declared.
References

1.  Optimization of therapeutic antibodies by predicting antigen specificity from antibody sequence via deep learning.

Authors:  Derek M Mason; Simon Friedensohn; Cédric R Weber; Christian Jordi; Bastian Wagner; Simon M Meng; Roy A Ehling; Lucia Bonati; Jan Dahinden; Pablo Gainza; Bruno E Correia; Sai T Reddy
Journal:  Nat Biomed Eng       Date:  2021-04-15       Impact factor: 25.671

2.  Observed Antibody Space: A Resource for Data Mining Next-Generation Sequencing of Antibody Repertoires.

Authors:  Aleksandr Kovaltsuk; Jinwoo Leem; Sebastian Kelm; James Snowden; Charlotte M Deane; Konrad Krawczyk
Journal:  J Immunol       Date:  2018-09-14       Impact factor: 5.422

3.  ClonoMatch: a tool for identifying homologous immunoglobulin and T cell receptor sequences in large databases.

Authors:  Taylor Jones; Samuel B Day; Luke Myers; James E Crowe; Cinque Soto
Journal:  Bioinformatics       Date:  2020-12-16       Impact factor: 6.937

4.  Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV).

Authors:  Andrew C R Martin
Journal:  F1000Res       Date:  2014-10-23

5.  Looking for therapeutic antibodies in next-generation sequencing repositories.

Authors:  Konrad Krawczyk; Matthew I J Raybould; Aleksandr Kovaltsuk; Charlotte M Deane
Journal:  MAbs       Date:  2019-07-17       Impact factor: 5.857

6.  Benchmarking immunoinformatic tools for the analysis of antibody repertoire sequences.

Authors:  Erand Smakaj; Lmar Babrak; Mats Ohlin; Mikhail Shugay; Bryan Briney; Deniz Tosoni; Christopher Galli; Vendi Grobelsek; Igor D'Angelo; Branden Olson; Sai Reddy; Victor Greiff; Johannes Trück; Susanna Marquez; William Lees; Enkelejda Miho
Journal:  Bioinformatics       Date:  2020-03-01       Impact factor: 6.937

7.  Data mining patented antibody sequences.

Authors:  Konrad Krawczyk; Andrew Buchanan; Paolo Marcatili
Journal:  MAbs       Date:  2021 Jan-Dec       Impact factor: 5.857

8.  Regulatory Approved Monoclonal Antibodies Contain Framework Mutations Predicted From Human Antibody Repertoires.

Authors:  Brian M Petersen; Sophia A Ulmer; Emily R Rhodes; Matias F Gutierrez-Gonzalez; Brandon J Dekosky; Kayla G Sprenger; Timothy A Whitehead
Journal:  Front Immunol       Date:  2021-09-27       Impact factor: 7.561

9.  The ADC API: A Web API for the Programmatic Query of the AIRR Data Commons.

Authors:  Scott Christley; Ademar Aguiar; George Blanck; Felix Breden; Syed Ahmad Chan Bukhari; Christian E Busse; Jerome Jaglale; Srilakshmy L Harikrishnan; Uri Laserson; Bjoern Peters; Artur Rocha; Chaim A Schramm; Sarah Taylor; Jason Anthony Vander Heiden; Bojan Zimonja; Corey T Watson; Brian Corrie; Lindsay G Cowell
Journal:  Front Big Data       Date:  2020-06-17

10.  Humanization of antibodies using a machine learning approach on large-scale repertoire data.

Authors:  Claire Marks; Alissa M Hummer; Mark Chin; Charlotte M Deane
Journal:  Bioinformatics       Date:  2021-06-10       Impact factor: 6.931
