Yanfang Zhang1,2,3,4,5, Tianjian Chen6, Huikun Zeng1,2,3,4,5, Xiujia Yang1,2,3,4,5, Qingxian Xu6, Yanxia Zhang1,2, Yuan Chen3, Minhui Wang1,7,8, Yan Zhu1,2, Chunhong Lan1,3, Qilong Wang3, Haipei Tang3, Yan Zhang2, Chengrui Wang2, Wenxi Xie1,2, Cuiyu Ma1,2, Junjie Guan1,2, Shixin Guo9, Sen Chen2, Wei Yang10, Lai Wei9, Jian Ren6, Xueqing Yu5,11, Zhenhai Zhang1,2,3,4,5.
Abstract
The antibody repertoire is a critical component of the adaptive immune system and is believed to reflect an individual's immune history and current immune status. Delineating the antibody repertoire has advanced our understanding of humoral immunity, facilitated antibody discovery, and shown great potential for improving the diagnosis and treatment of disease. However, no tool to date has effectively integrated big Rep-seq data and prior knowledge of functional antibodies to elucidate the remarkably diverse antibody repertoire. We developed a Rep-seq dataset Analysis Platform with an Integrated antibody Database (RAPID; https://rapid.zzhlab.org/), a free, web-based tool that allows researchers to process and analyse Rep-seq datasets. RAPID consolidates 521 WHO-recognized therapeutic antibodies, 88,059 antigen- or disease-specific antibodies, and 306 million clones extracted from 2,449 human IGH Rep-seq datasets generated from individuals with 29 different health conditions. RAPID also integrates a standardized Rep-seq dataset analysis pipeline that enables users to upload and analyse their own datasets. In the process, users can also select a set of existing repertoires for comparison. RAPID automatically annotates clones based on the integrated therapeutic and known antibodies, and users can easily query antibodies or repertoires by sequence or optional keywords. With its powerful analysis functions and rich set of antibody and antibody repertoire information, RAPID will benefit researchers in adaptive immune studies.
Keywords: Rep-Seq; antibody annotation; antibody database; comparative analysis; public clone
Year: 2021 PMID: 34484220 PMCID: PMC8414647 DOI: 10.3389/fimmu.2021.717496
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Figure 1. Schematic of the datasets and related features in RAPID. (A) Rep-seq datasets and their metadata (top) and repertoire (bottom) features. Metadata is linked to each dataset by its “Dataset ID” and can be retrieved with it. The “Clones” in the repertoire features were generated by the RAPID pipeline (Materials and Methods). (B) The antibody collections included in RAPID consist of three data sources: Rep-seq datasets, known antibodies, and therapeutic antibodies. All available information was extracted from these sources and stored. In addition, antibody sequences were analysed, and related information (such as VDJ gene usage and CDR3) was extracted and recorded. nt, nucleotide; aa, amino acid.
Figure 2. Functionalities of RAPID. (A) Low-level analysis. Germline genes and the antibody sequence derived from their recombination are shown schematically on top. CDR3s were identified using the RAPID bioinformatics pipeline, and the clonality of antibodies was defined according to sequence similarity and V/J gene segments (bottom). (B) High-level analysis of Rep-seq datasets. Repertoire features were extracted from the Rep-seq datasets (top). The features of the experimental and reference groups were compared and shown (middle). Public clones, if available, were extracted and displayed (bottom). (C) Antibody annotation based on CDR3 aa. Antibodies having the same amino acid CDR3 as known or therapeutic antibodies were extracted and annotated based on their matches in the database (middle). Enrichment of the annotated antibodies was analysed, and a p value was calculated. (D) Antibody and repertoire query function. The top panel shows several antibody queries and schematics of the results. The bottom panel shows the visualized results of a repertoire query.
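The annotation step in panel (C) can be illustrated with a minimal Python sketch. It assumes exact string matching on CDR3 amino acid sequences, as the caption describes, and uses a one-sided hypergeometric test for enrichment. The caption does not state which statistical test RAPID uses, so the test, the function names (`annotate_by_cdr3`, `hypergeom_enrichment_p`), and the record fields (`cdr3_aa`, `label`) are all assumptions made for illustration.

```python
from math import comb


def annotate_by_cdr3(sample_cdr3s, known_antibodies):
    """Map each sample CDR3 aa sequence to the labels of known antibodies
    that carry exactly the same CDR3 aa sequence.

    The field names "cdr3_aa" and "label" are hypothetical.
    """
    index = {}
    for antibody in known_antibodies:
        index.setdefault(antibody["cdr3_aa"], []).append(antibody["label"])
    return {cdr3: index[cdr3] for cdr3 in sample_cdr3s if cdr3 in index}


def hypergeom_enrichment_p(hits, sample_size, category_size, universe_size):
    """One-sided hypergeometric tail probability P(X >= hits).

    Assumed test, not confirmed by the source. X is the number of
    antibodies from one category (for example, one disease) among
    sample_size annotated clones drawn from universe_size antibodies,
    of which category_size belong to that category.
    """
    upper = min(sample_size, category_size)
    total = comb(universe_size, sample_size)
    tail = sum(
        comb(category_size, k) * comb(universe_size - category_size, sample_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy data, invented for illustration only.
known = [
    {"cdr3_aa": "CARDYW", "label": "disease_A"},
    {"cdr3_aa": "CARGGW", "label": "disease_B"},
]
annotated = annotate_by_cdr3(["CARDYW", "CAKKKW"], known)
p_value = hypergeom_enrichment_p(hits=1, sample_size=2, category_size=3, universe_size=10)
```

How the background universe is defined (all database antibodies, or only those matched in any sample) changes the p value, so that choice would need to follow the Materials and Methods of the original article.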
Figure 3. Repertoire features of COVID-19 patients compared with 32 references. (A) The distribution of V gene usage. The V-gene usage of the reference group is shown in the boxplot, and that of COVID-19 patients is indicated by the dots. (B) Length of CDR3nt sequences. The median fraction from the reference is indicated by the gray bars. The length of deletions (C) and insertions (E) at V3, D5, D3, and J5. (D) Mutation rates in each functional region. (F) The distribution of D50. (G) Number of shared clones. The X-axis indicates the number of references, and the Y-axis shows the number of COVID-19 samples.
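The D50 metric in panel (F) is not defined in this caption. A minimal Python sketch follows, assuming the commonly used definition: the percentage of unique clones, ranked from largest to smallest, needed to account for 50% of all sequences in a sample. The function name `d50` and the input format are illustrative assumptions, and RAPID's own calculation may differ.

```python
def d50(clone_counts):
    """Percentage of unique clones, ranked by size, that together cover
    at least 50% of all sequences (assumed definition of D50)."""
    counts = sorted(clone_counts, reverse=True)
    total = sum(counts)
    if total == 0:
        raise ValueError("clone_counts must contain at least one sequence")
    running = 0
    for n_clones, count in enumerate(counts, start=1):
        running += count
        # Integer comparison avoids floating-point rounding at the 50% mark.
        if running * 2 >= total:
            return 100.0 * n_clones / len(counts)


# One dominant clone covers half of the reads: low D50, low diversity.
skewed = d50([50, 30, 10, 10])
# Evenly sized clones: half of the clones are needed to reach 50%.
even = d50([1, 1, 1, 1])
```

Under this definition, a lower D50 indicates a repertoire dominated by a few expanded clones, and a higher D50 indicates a more even clone size distribution.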
Figure 4. Output of the Antibody annotation module. (A) The number of annotated clones for each sample. The pink circle represents the total number of clones identified in each sample, and the green circle indicates the number of heavy chains with detailed annotation in RAPID. (B) The distribution of diseases that are enriched in the samples.
Figure 5. “Sequence Query” schematic. (A) Input options for “Sequence Query”. Selected options are marked with blue dots. (B) Sequence query result. The subject ID shown in blue is a hyperlink that can be clicked to see details. Hit sequences can be sorted by any column by clicking the marker at the end of that column's header. (C) Details for subject CDR3_0010989842. (D) Metadata of the dataset containing this subject. The SRA accession number, BioProject accession number, and PubMed ID can be clicked to visit their original websites.