
The ABCD database: a repository for chemically defined antibodies.

Wanessa C Lima¹, Elisabeth Gasteiger², Paolo Marcatili³, Paula Duek⁴, Amos Bairoch⁴, Pierre Cosson¹.

Abstract

The ABCD (for AntiBodies Chemically Defined) database is a repository of sequenced antibodies, integrating curated information about the antibody and its antigen with cross-links to standardized databases of chemical and protein entities. It is freely available to the academic community, accessible through the ExPASy server (https://web.expasy.org/abcd/). The ABCD database aims to improve reproducibility in academic research by providing a unique, unambiguous identifier associated with each antibody sequence. It also makes it possible to determine rapidly whether a sequenced antibody is available for a given antigen.

Substances：
Antibodies
Antigens

Year: 2020 PMID： 31410491 PMCID： PMC6943046 DOI： 10.1093/nar/gkz714

Source DB: PubMed Journal: Nucleic Acids Res ISSN： 0305-1048 Impact factor: 16.971

INTRODUCTION

Antibodies are one of the most widespread tools used in biological sciences. However, they are currently deemed one of the major culprits in the reproducibility crisis plaguing biomedical research (1). Problems include batch-to-batch variability; poorly characterized and/or non-validated antibodies that sometimes do not recognize the presumptive target, or recognize more than one target; lack of explicitly described procedures adapted to each antibody; decreasing scrutiny of results by scientists; and misleading antibody nomenclature. The 2 million antibodies available on the market might represent as few as 250'000 actual clones (1). Standardized guidelines for antibody validation have been proposed to reduce reproducibility issues. These guidelines delineate a working framework to define antibody specificity and functionality for different research applications (2). To apply these guidelines, each antibody must of course be identified easily and unambiguously.

Although the scientific community is well aware of this serious problem, few concerted solutions have appeared so far. The most advanced initiatives for centralizing information on antibodies are probably the portals Antibodypedia (3) and Antibody Registry (http://antibodyregistry.org/), but both still rely largely on information provided by commercial vendors (such as antibody clone names). They also include an overwhelming majority of unsequenced or polyclonal antibodies, whose identity is difficult to establish clearly. One solution to this problem is to employ only sequenced antibodies that are unambiguously defined by their primary amino-acid sequence (4,5). In this way, researchers can be sure that they are using exactly the same binding reagent. While it seems unlikely that systematic characterization of millions of antibodies will be achieved, the goal seems more attainable for the estimated 20 000 currently described chemically defined (i.e. sequenced) monoclonal antibodies.

The IMGT database (created decades ago by Marie-Paule Lefranc and colleagues (6)) is an invaluable knowledge resource on sequences of immunoglobulins, but it is primarily aimed at studying the diversity of immune molecules, rather than their binding specificity. Our goal is to provide the academic community with wider access to recombinant, chemically defined antibodies (7). To this end, the recently launched ABCD database lists publicly available sequenced antibodies, and provides for each antibody a unique identifier and a link to its antigenic target.

OVERVIEW OF DATABASE CONTENT

The ABCD database is, to our knowledge, the first effort to provide freely accessible, curated information on chemically defined antibodies (i.e. antibodies with a known primary amino-acid sequence) connected with their antigenic target, which can be either a protein (linked to a UniProtKB unique identifier (UID) [(8), https://www.uniprot.org/]) or a chemical entity (linked to a ChEBI UID [(9), https://www.ebi.ac.uk/chebi/]). Each ABCD entry corresponds to a unique primary amino-acid sequence, defined by a unique ABCD identifier. For each entry, information about the antigen and about the antibody is provided (Figure 1).

Figure 1.

Examples of antibody entries. Each entry has a unique identifier with the format ABCD_[A-Z][A-Z][0-9][0-9][0-9]. The Antibody table contains names and synonyms of the antibody, a published reference (with a link to PubMed, in case of scientific papers, or to the WIPO database, in case of a patent), and technical applications. The antibody sequence (see Figure 2 legend) is available on the Cross-references and Publications links provided, or upon request. (A) Target is a Protein: the Antigen table contains the name and species of target, a link to the UniProtKB UID, and information on the epitope when available. (B) Target is a Chemical: the Antigen table contains the target name, a link to the ChEBI UID, and information on the epitope when available.


Figure 2.

Antibody sequence information. (A) An immunoglobulin consists of constant (C, in gray) and variable (V, in blue and green) chains. The paratope (or specific binding site) of an antibody is located at the variable moiety of the light (VL) and heavy (VH) chains. (B) The ABCD database stores as sequence information the amino-acid sequence of both VL and VH chains (the example given corresponds to sequence of entry ABCD_AI179, the anti-cMyc 9E10 clone).

Regarding the antibody, in addition to its ABCD identifier, the following information is given: recommended name (most frequently, the name provided in the referenced publication) and a list of synonyms; technical applications for which the antibody has been used (by no means an exhaustive inventory, as it lists only the applications described in the referenced publications); at least one bibliographic reference in which the antibody sequence is provided (either a published scientific article, with a PubMed UID or a Digital Object Identifier (DOI), or a patent, with a link to the WIPO database; this is not meant to be a comprehensive list of all the publications describing a given antibody); and cross-references to other databases (listed in Table 1).
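The identifier format given in the Figure 1 legend (the prefix ABCD_ followed by two uppercase letters and three digits) lends itself to a simple programmatic check. The following is a minimal Python sketch, not part of the ABCD database itself; the function names are hypothetical, and the pattern is transcribed from the legend rather than from any official specification.

```python
import re
from typing import List

# Pattern transcribed from the Figure 1 legend:
# "ABCD_" + two uppercase letters + three digits.
ABCD_ID_PATTERN = re.compile(r"ABCD_[A-Z]{2}[0-9]{3}")


def is_valid_abcd_id(identifier: str) -> bool:
    """Return True if the whole string matches the published identifier format."""
    return ABCD_ID_PATTERN.fullmatch(identifier) is not None


def find_abcd_ids(text: str) -> List[str]:
    """Extract identifier-like tokens from free text (e.g. a methods section)."""
    return re.findall(r"\bABCD_[A-Z]{2}[0-9]{3}\b", text)


print(is_valid_abcd_id("ABCD_AI179"))
print(find_abcd_ids("Cells were stained with ABCD_AI179 (anti-cMyc)."))
```

A check of this kind only confirms that a string is well formed; it does not confirm that the identifier corresponds to an existing entry.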

Table 1.

List of databases and websites used as source of information or cross-reference

Database	Link	Data use	Ref.
Abysis	www.bioinf.org.uk/abysis2.7/	Source for Kabat sequences	(16)
Addgene	www.addgene.org	Source for antibody sequences inside vectors	(17)
Cellosaurus	web.expasy.org/cellosaurus/	X-ref for hybridomas	(18)
ChEBI	www.ebi.ac.uk/chebi/	X-ref for chemical targets	(9)
DigIt	circe.med.uniroma1.it/digit/	Source for sequences of annotated variable domains	(19)
IMGT/mAb-DB	imgt.org/mAb-DB/	Source for therapeutic antibody sequences	(6)
InterPro	www.ebi.ac.uk/interpro/	X-ref for domains	(20)
NCBI Taxonomy	www.ncbi.nlm.nih.gov/Taxonomy/	X-ref for species taxonomy	(21)
PROSITE	prosite.expasy.org	X-ref for domains	(22)
PubMed	www.ncbi.nlm.nih.gov/pubmed/	X-ref for publications; source for published sequences	(21)
RAN	recombinant-antibodies.org	Source for Recombinant Antibody Network antibodies	(12)
RCSB/PDB	www.rcsb.org/pdb/	X-ref for 3D structures; source for published sequences	(23)
UniProt	www.uniprot.org	X-ref for protein targets	(8)
WIPO Patents	patentscope.wipo.int	X-ref for patent publications	—

Regarding the antigen, the following is given: type of target (protein or chemical); name of the antigen (and, in the case of a protein, also the species against which the antibody was produced); link to the UniProtKB (for a protein) or ChEBI (for a chemical) database; and, when available, information about the epitope recognized (for example, a domain or a specific amino-acid subsequence).

The antibody amino-acid sequence can be obtained through the links to the publications and the databases used as source (this is explained in detail in our FAQ section, with links and examples on how to obtain any given sequence). Alternatively, the information is also available upon request by email (via our Contact form). The stored information corresponds to the sequence of the variable region of both the heavy and light chains (or, in the case of camelid antibodies or nanobodies, the sequence of the unique variable chain) (Figure 2). When needed, heavy and light chain boundaries were defined by alignment with germline sequences, using the VBASE2 server (10).

The ABCD database is populated with data from the following sources (see Table 1 for a list of source databases): (i) sequences published in scientific articles or patents; (ii) 3D structural data; (iii) a few publications and repositories of large-scale phage display or hybridoma sequencing projects (11–15). We only include sequenced antibodies with a known and defined target. However, the source of such information is of variable quality, and we encourage users to verify (and to publish) the reactivity of each antibody that they use.
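The entry content described above (antibody names, applications, references, variable-region sequences, and an antigen that is either a protein or a chemical) can be pictured as a small record type. The sketch below is an illustrative Python model only: the class names, field names and placeholder values are assumptions made for this example and do not reflect the actual ABCD schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Antigen:
    """Antigen side of an entry; field names are illustrative, not the ABCD schema."""
    target_type: str                 # "protein" or "chemical"
    name: str
    xref_id: str                     # UniProtKB UID (protein) or ChEBI UID (chemical)
    species: Optional[str] = None    # given for protein targets
    epitope: Optional[str] = None    # given only when available

    def __post_init__(self) -> None:
        if self.target_type not in ("protein", "chemical"):
            raise ValueError(f"unknown target type: {self.target_type!r}")

    @property
    def xref_database(self) -> str:
        """Name of the database the antigen is cross-referenced to."""
        return "UniProtKB" if self.target_type == "protein" else "ChEBI"


@dataclass
class AntibodyEntry:
    """One entry = one unique variable-region sequence."""
    abcd_id: str
    recommended_name: str
    antigen: Antigen
    vh_sequence: str                     # heavy-chain variable region (or the single variable chain)
    vl_sequence: Optional[str] = None    # None for camelid antibodies / nanobodies
    synonyms: List[str] = field(default_factory=list)
    applications: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)

    @property
    def is_single_domain(self) -> bool:
        return self.vl_sequence is None


# Hypothetical entry with placeholder values (not a real ABCD record).
example = AntibodyEntry(
    abcd_id="ABCD_ZZ000",
    recommended_name="example-nanobody",
    antigen=Antigen("chemical", "example hapten", xref_id="CHEBI:0"),
    vh_sequence="PLACEHOLDER",
    references=["PMID:0"],
)
print(example.antigen.xref_database, example.is_single_domain)
```

Modelling the light chain as optional mirrors the statement that nanobodies are stored as a single variable chain; other representations (for example, separate subclasses) would work equally well.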

DATABASE DESIGN AND IMPLEMENTATION

The ABCD database is developed by the Geneva Antibody Facility team (https://www.unige.ch/medecine/antibodies/), in collaboration with the CALIPHO and Swiss-Prot groups at the Swiss Institute of Bioinformatics (https://www.sib.swiss/). The database is available on the ExPASy web server (https://web.expasy.org/abcd/). Data are indexed for full-text search using the Apache Lucy search engine library in Perl (https://lucy.apache.org/), a ‘loose C’ port of the Apache Lucene™ search engine library for Java. The query interface and entry display are implemented on the ExPASy server using Perl CGI scripts. The ABCD database website has a simple, user-friendly interface. Each antibody page is dynamically linked to external resources and databases (see Table 1). Entries can be searched by antibody name, antigen name, antigen species, UniProtKB or ChEBI UIDs, epitope information and reference UID (PubMed, DOI or Patent), via a full-text search field. The current release (v 4.0) contains 10′525 entries, referencing 9′076 proteins (1′642 unique UniProtKB UIDs) and 1′203 chemicals (261 unique ChEBI UIDs).
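To illustrate what a single full-text search field over several entry fields involves, the following toy Python sketch builds an inverted index and answers multi-word queries. It is a conceptual illustration only: it does not use or reproduce the Apache Lucy API, the field names are assumptions, and the sample records are hypothetical placeholders rather than real ABCD entries.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(entries: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Map every token found in any field to the set of entry IDs containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for entry_id, fields in entries.items():
        for value in fields.values():
            for token in tokenize(value):
                index[token].add(entry_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the sorted IDs of entries that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return sorted(hits)


# Hypothetical records with placeholder values (not real ABCD entries).
sample_entries = {
    "ABCD_ZZ001": {"antibody_name": "example-mAb-1", "antigen_name": "c-Myc",
                   "antigen_species": "Homo sapiens"},
    "ABCD_ZZ002": {"antibody_name": "example-mAb-2", "antigen_name": "example hapten",
                   "antigen_species": ""},
}
sample_index = build_index(sample_entries)
print(search(sample_index, "myc sapiens"))
```

A production search library adds ranking, stemming and persistent storage on top of this basic idea; the sketch only shows why a query can match on any indexed field at once.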

CONCLUSION AND PERSPECTIVES

We believe that this initiative is a valuable step in setting up a centralized repository of sequenced antibodies, allowing the unique and unambiguous identification of binding reagents for research and publication purposes. Depositing or publishing the sequence information of any given antibody should be a required step during any antibody characterization procedure; careful and thorough validation is still obligatory, but knowing the precise identity of a given reagent would allow others to repeat the exact same experiment. All entries in the ABCD database are manually curated and, hence, the database growth is linear and slow. Using computational approaches is not a desirable strategy: defining the identity of a given antibody's target is a cumbersome process involving extensive literature mining, which is not easily automated. One approach to allow for a faster inclusion of entries is to promote the submission of sequences by colleagues around the world, originating from large-scale discovery projects or sequencing of hybridomas or purified antibodies.

23 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Antibodypedia, a portal for sharing antibody and antigen validation data.

Authors: Erik Björling; Mathias Uhlén
Journal: Mol Cell Proteomics Date: 2008-07-29 Impact factor: 5.911

3. Recombinant Antibodies for Academia: A Practical Approach.

Authors: Pierre Cosson; Oliver Hartley
Journal: Chimia (Aarau) Date: 2016-12-21 Impact factor: 1.509

4. A database of immunoglobulins with integrated tools: DIGIT.

Authors: Anna Chailyan; Anna Tramontano; Paolo Marcatili
Journal: Nucleic Acids Res Date: 2011-11-10 Impact factor: 16.971

5. VBASE2, an integrative V gene database.

Authors: Ida Retter; Hans Helmar Althaus; Richard Münch; Werner Müller
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971

6. IMGT®, the international ImMunoGeneTics information system® 25 years on.

Authors: Marie-Paule Lefranc; Véronique Giudicelli; Patrice Duroux; Joumana Jabado-Michaloud; Géraldine Folch; Safa Aouinti; Emilie Carillon; Hugo Duvergey; Amélie Houles; Typhaine Paysan-Lafosse; Saida Hadi-Saljoqi; Souphatta Sasorith; Gérard Lefranc; Sofia Kossida
Journal: Nucleic Acids Res Date: 2014-11-05 Impact factor: 19.160

Review 7. Quality Issues of Research Antibodies.

Authors: Michael G Weller
Journal: Anal Chem Insights Date: 2016-03-20

8. UniProt: the universal protein knowledgebase.

Authors:
Journal: Nucleic Acids Res Date: 2016-11-29 Impact factor: 16.971

9. Database resources of the National Center for Biotechnology Information.

Authors: Eric W Sayers; Richa Agarwala; Evan E Bolton; J Rodney Brister; Kathi Canese; Karen Clark; Ryan Connor; Nicolas Fiorini; Kathryn Funk; Timothy Hefferon; J Bradley Holmes; Sunghwan Kim; Avi Kimchi; Paul A Kitts; Stacy Lathrop; Zhiyong Lu; Thomas L Madden; Aron Marchler-Bauer; Lon Phan; Valerie A Schneider; Conrad L Schoch; Kim D Pruitt; James Ostell
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971

10. Application of phage display to high throughput antibody generation and characterization.

Authors: Darren J Schofield; Anthony R Pope; Veronica Clementel; Jenny Buckell; Susan Dj Chapple; Kay F Clarke; Jennie S Conquer; Anna M Crofts; Sandra R E Crowther; Michael R Dyson; Gillian Flack; Gareth J Griffin; Yvette Hooks; William J Howat; Anja Kolb-Kokocinski; Susan Kunze; Cecile D Martin; Gareth L Maslen; Joanne N Mitchell; Maureen O'Sullivan; Rajika L Perera; Wendy Roake; S Paul Shadbolt; Karen J Vincent; Anthony Warford; Wendy E Wilson; Jane Xie; Joyce L Young; John McCafferty
Journal: Genome Biol Date: 2007 Impact factor: 13.583
