Artificial intelligence for antibody reading comprehension: AntiBERTa.

Yoonjoo Choi

Abstract

Utilizing publicly available antibody big data resources, Leem et al. (2022) developed an antibody-specific language model, AntiBERTa, to understand the "language" of antibodies. Case studies reveal that AntiBERTa can be an extremely useful tool for antibody engineering.
© 2022 The Author(s).

Year:  2022        PMID: 35845838      PMCID: PMC9278504          DOI: 10.1016/j.patter.2022.100535

Source DB:  PubMed          Journal:  Patterns (N Y)        ISSN: 2666-3899


Main text

The immune system has many players that protect an organism from harmful foreign substances. Among those, the B cell is a main player in the adaptive immune system. B cells carry receptor molecules (B cell receptors) that can be secreted upon activation. The secreted molecules, antibodies, are known to be astronomically diverse (estimated 10^13–10^15). This high diversity is Janus-faced. On the one hand, the immune system can respond to nearly any type of antigen, mainly because of the large diversity of antibodies. According to Antibodypedia, 4.6 million monoclonal antibodies are currently available for 19,000 genes. The diversity also enables antibodies to be highly successful as biotherapeutics. In 2021, the FDA approved the 100th therapeutic antibody. The coronavirus pandemic has boosted the development of therapeutic antibodies for COVID-19, and several new antibodies are waiting to treat SARS-CoV-2-infected patients.
On the other hand, such rich diversity is not always advantageous. Although antibodies have been studied extensively (perhaps more than any other molecule) and the antibody-related biopharmaceutical industry continues to mature, there is still much to learn about them, as evidenced by the growing number of papers published with the keyword "antibody." It is practically impossible to explore the entire antibody repertoire experimentally. Thus, computational approaches using artificial intelligence (AI) techniques have become essential for antibody research. Advances in AI and big data are inseparable. Recent advances in next-generation sequencing technology now enable the construction of large antibody repertoire datasets. The Observed Antibody Space (OAS) database is a compilation of known repertoire studies and databases.
Since the release of OAS, many practical applications have been developed, including computational antibody humanization using AI. Antibody repertoire big data resources also provide an in-depth view of, and biological insights into, antibodies. Here, Leem et al. present a timely antibody-specific language model. AntiBERTa (antibody-specific bidirectional encoder representation from transformers) is a 12-layer transformer model pre-trained on the OAS database. A language model for general proteins (ProtBERT) already exists, as do other antibody-specific language models such as DeepAb and Sapiens. Compared with those existing methods, however, AntiBERTa is more versatile and specific, with deeper layers. It is remarkable that AntiBERTa cleanly partitions memory and naive B cells, whereas other models showed less distinct results; that is, the antibody-specific deep-layered model indeed learns the language of antibodies and can identify the origin of a B cell.
One direct application is the estimation of antibody humanness and immunogenicity for the development of safer therapeutic antibodies. It is well known that antibodies with high human content tend to be less immunogenic. As with the separation of memory and naive B cells, AntiBERTa successfully classifies antibodies by species origin (murine, chimeric, humanized, and human). The antibody-specific model generally provides better descriptions of antibodies than the general protein model. The authors found that residue pairs with high self-attention scores give structural insights into long-range interactions, which were not identified by the general protein model. This insight naturally leads to the prediction of paratopes, the antigen-binding sites. In several case studies, the authors showed that AntiBERTa successfully identifies paratope residues that lie outside the complementarity-determining regions (CDRs).
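The idea of reading long-range interactions from self-attention can be illustrated with a minimal sketch. This is not the authors' code: the function name `top_long_range_pairs`, the `min_separation` cutoff, the choice to average the two attention directions, and the toy matrix values are all assumptions made for illustration. Given a square attention map for one sequence, the sketch keeps only residue pairs that are far apart in sequence and ranks them by score.

```python
from typing import List, Tuple


def top_long_range_pairs(
    attention: List[List[float]],
    min_separation: int = 6,
    top_k: int = 3,
) -> List[Tuple[int, int, float]]:
    """Return the top_k pairs (i, j, score) with j - i >= min_separation.

    `attention` is assumed to be a square matrix of self-attention
    weights for a single sequence (one head, or an average over heads).
    """
    n = len(attention)
    pairs = []
    for i in range(n):
        for j in range(i + min_separation, n):
            # Attention is directional; averaging both directions is an
            # illustrative choice, not something taken from the paper.
            score = (attention[i][j] + attention[j][i]) / 2.0
            pairs.append((i, j, score))
    # Highest-scoring pairs first.
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs[:top_k]


# Toy 8-residue attention map (hypothetical values, not model output).
n = 8
toy = [[0.01] * n for _ in range(n)]
toy[0][7], toy[7][0] = 0.60, 0.40  # distant pair with strong attention
toy[1][7], toy[7][1] = 0.30, 0.10  # distant pair with weaker attention
toy[1][2] = 0.90                   # strong but short-range, so filtered out

result = top_long_range_pairs(toy, min_separation=6, top_k=2)
for i, j, score in result:
    print(f"residues {i} and {j}: mean attention {score:.2f}")
```

In practice the matrix would come from a trained model rather than hand-set values, and the separation cutoff would be chosen with the antibody numbering scheme in mind.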
While the authors demonstrated that AntiBERTa outperforms other methods and provided convincing rationales, the work also leaves something to be desired. As the authors state in the main manuscript, AntiBERTa could be applied directly to antibody structure prediction and humanization (or both at the same time), but they leave these as potential applications. In the near future, we hope to see practical tools based on the AntiBERTa model.
  9 in total

1.  Antibodypedia, a portal for sharing antibody and antigen validation data.

Authors:  Erik Björling; Mathias Uhlén
Journal:  Mol Cell Proteomics       Date:  2008-07-29       Impact factor: 5.911

2.  FDA approves 100th monoclonal antibody product.

Authors:  Asher Mullard
Journal:  Nat Rev Drug Discov       Date:  2021-07       Impact factor: 84.694

3.  Observed Antibody Space: A Resource for Data Mining Next-Generation Sequencing of Antibody Repertoires.

Authors:  Aleksandr Kovaltsuk; Jinwoo Leem; Sebastian Kelm; James Snowden; Charlotte M Deane; Konrad Krawczyk
Journal:  J Immunol       Date:  2018-09-14       Impact factor: 5.422

4.  How repertoire data are changing antibody science.

Authors:  Claire Marks; Charlotte M Deane
Journal:  J Biol Chem       Date:  2020-05-14       Impact factor: 5.157

5.  ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning.

Authors:  Ahmed Elnaggar; Michael Heinzinger; Christian Dallago; Ghalia Rehawi; Yu Wang; Llion Jones; Tom Gibbs; Tamas Feher; Christoph Angerer; Martin Steinegger; Debsindhu Bhowmik; Burkhard Rost
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2022-09-14       Impact factor: 9.322

6.  Observed Antibody Space: A diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences.

Authors:  Tobias H Olsen; Fergus Boyles; Charlotte M Deane
Journal:  Protein Sci       Date:  2021-10-29       Impact factor: 6.725

7.  BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning.

Authors:  David Prihoda; Jad Maamary; Andrew Waight; Veronica Juan; Laurence Fayadat-Dilman; Daniel Svozil; Danny A Bitton
Journal:  MAbs       Date:  2022 Jan-Dec       Impact factor: 5.857

8.  Antibody structure prediction using interpretable deep learning.

Authors:  Jeffrey A Ruffolo; Jeremias Sulam; Jeffrey J Gray
Journal:  Patterns (N Y)       Date:  2021-12-09

9.  Humanization of antibodies using a machine learning approach on large-scale repertoire data.

Authors:  Claire Marks; Alissa M Hummer; Mark Chin; Charlotte M Deane
Journal:  Bioinformatics       Date:  2021-06-10       Impact factor: 6.931
