Literature DB >> 31588507

VDJdb in 2019: database extension, new analysis infrastructure and a T-cell receptor motif compendium.

Dmitry V Bagaev1,2, Renske M A Vroomans3,4, Jerome Samir5,6, Ulrik Stervbo7,8, Cristina Rius9, Garry Dolton9, Alexander Greenshields-Watson9, Meriem Attaf9, Evgeny S Egorov2, Ivan V Zvyagin1,2, Nina Babel7,8, David K Cole9,10, Andrew J Godkin9, Andrew K Sewell9, Can Kesmir11, Dmitriy M Chudakov1,2,12, Fabio Luciani5,6, Mikhail Shugay1,2,12.   

Abstract

Here, we report an update of the VDJdb database with a substantial increase in the number of T-cell receptor (TCR) sequences and their cognate antigens. The update further provides a new database infrastructure featuring two additional analysis modes that facilitate database querying and real-world data analysis. The increased yield of TCR specificity identification methods and the overall increase in the number of studies in the field has allowed us to expand the database more than 5-fold. Furthermore, several new analysis methods are included. For example, batch annotation of TCR repertoire sequencing samples allows for annotating large datasets on-line. Using recently developed bioinformatic methods for TCR motif mining, we have built a reduced set of high-quality TCR motifs that can be used for both training TCR specificity predictors and matching against TCRs of interest. These additions enhance the versatility of the VDJdb in the task of exploring T-cell antigen specificities. The database is available at https://vdjdb.cdr3.net.
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.

Entities:  

Mesh:

Substances:

Year:  2020        PMID: 31588507      PMCID: PMC6943061          DOI: 10.1093/nar/gkz874

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Knowing the exact antigen specificity of a given T-cell is key to solving numerous problems of both basic and applied immunology research: from discovering the specificity profile of TCR repertoire sequencing samples (1,2), to finding associations between autoimmunity and foreign mimics of self-antigens (3), and designing of personalized tumor immunotherapies (4). The field of molecular methods designed for studying antigen-specific T-cells is developing at a high pace: novel methodologies based on single-cell T-cell sequencing allow simultaneous detection of TCR sequence, T-cell phenotype and a vast array of antigen specificities (5). Resulting data, however, still need to be properly quality-controlled, and organized in the form of a database that is both comprehensive and easy to query. After the first version of VDJdb (6) and a pathology-associated TCR database (McPAS-TCR (7)) were published, a commonly used iEDB database that describes immunogenic antigens was also modified to add metadata related to TCR and B-cell receptor sequences (8), highlighting the overall demand for such data in the field. A number of methods for TCR specificity prediction were also reported recently, many of which rely on VDJdb data for training and validating classifiers (9–12). The latter demonstrates the overall potential of the VDJdb database for developing better bioinformatic methods for TCR sequence analysis. In this 2019 update, we focused on both accumulating the large amount of data generated by recent studies and providing an interface facilitating web-based analysis of adaptive immune receptor repertoire sequencing (AIRR-Seq, (13)) datasets. Given the large amount of data currently stored in VDJdb, we provided a reduced dataset of high-quality motifs that facilitates identification of TCR residues critical for recognition of certain antigens.

MATERIALS AND METHODS

Data acquisition and processing

All data acquired from published studies was manually parsed into VDJdb format according to VDJdb guidelines (https://github.com/antigenomics/vdjdb-db/blob/master/README.md) and quality-controlled both manually and using previously reported automated scripts (6). Summary statistics for VDJdb records were computed using an R notebook provided at https://github.com/antigenomics/vdjdb-db/blob/master/summary/vdjdb_summary.Rmd. Datasets from 10X genomics were downloaded from https://support.10xgenomics.com/single-cell-vdj/datasets (‘Application Note - A New Way of Exploring Immunity’ section, datasets ‘CD8+ T cells of Healthy Donor’ 1–4, available under the Creative Commons Attribution license) and processed using in-house scripts (available at https://bitbucket.org/kirbyvisp/10x-tcr/src/master/) that perform stringent filtering on the CDR3 length and composition, and tetramer read counts, yielding over 20 000 unique antigen-specific receptors. We’ve also performed several rounds of manual proofreading for the database and fixed a number of typos, mostly related to ambiguous segment naming (e.g. cases when V segments named according to Arden nomenclature were imported as IMGT segment names).

Updated VDJdb web browser implementation details

Since 2017, we upgraded VDJdb web server to run on the latest Play framework (v2.7.2, https://www.playframework.com) with Akka HTTP server to improve the overall performance. We have also fully re-implemented frontend using Angular (https://angular.io) to provide a faster and more responsive interface. Importantly, we have implemented a fully documented REST API that can be used to query the database and can be found at https://vdjdb-web.readthedocs.io/en/latest/api.html. We also facilitated local VDJdb web server installation by providing a Docker image available at https://cloud.docker.com/u/bvdmitri/repository/docker/bvdmitri/vdjdb-web.

TCR motif database

We have used the TCRNET implementation (14) in VDJtools (15) to identify TCR nodes in VDJdb TCR similarity network that have more neighbors than expected by chance, allowing for a single amino acid substitution in the CDR3 region. Only epitopes assigned to at least 30 distinct TCR amino acid sequences were considered. Selected nodes and their first neighbors were left in the TCR similarity network and sets of homologous TCR sequences (motifs) were defined for each epitope as connected components of the resulting graph. Position weight matrices (PWMs) for CDR3 amino acid sequences of inferred motifs were constructed using connected components of the graph. PWM normalization was performed by using the probability in a control set as the information measure, where control set is a set TCR sequences having the same V/J genes and CDR3 length coming from a pool of healthy donor samples. Details of this procedure are summarized in an R markdown notebook available at https://github.com/antigenomics/vdjdb-motifs.

RESULTS

Timeline of data accumulation and perspectives

VDJdb database is substantially expanded compared to the previous report (6): since the establishment of the database a total of 155 published studies were processed and added resulting in 61 049 TCR specificity records in 2019 compared to only 5491 in 2017. We used the publication dates of papers added to VDJdb to calculate the rate of accumulation of TCR specificity knowledge (Figure 1). The figure clearly shows that the number of records grows slowly prior to 2017, while a rapid growth in the number of records occurred in the last two years. The latter can be explained by the establishment of AIRR-Seq techniques producing very large amounts of TCR sequences (13) as a method of choice for performing the readout of TCR specificity assays.
Figure 1.

Summary statistics of VDJdb records according to July 2019 database release. Plots show the cumulative number of unique TCR sequences (up to unique V/J gene and CDR3 amino acid), antigens, MHC alleles and publications added to the database arranged by the publication date of corresponding papers. Colored lines represent records that only have TCR alpha (TRA, blue) or beta (TRB, green) chain or both chains (‘paired’ records, red). Note that as several VDJdb records can link a single TCR sequence to different metadata (e.g. another study and donor, or distinct epitope in case of cross-reactivity), the total number of unique TCR sequences (n = 42 211) is less than the total number of VDJdb records (n = 61 049).

Summary statistics of VDJdb records according to July 2019 database release. Plots show the cumulative number of unique TCR sequences (up to unique V/J gene and CDR3 amino acid), antigens, MHC alleles and publications added to the database arranged by the publication date of corresponding papers. Colored lines represent records that only have TCR alpha (TRA, blue) or beta (TRB, green) chain or both chains (‘paired’ records, red). Note that as several VDJdb records can link a single TCR sequence to different metadata (e.g. another study and donor, or distinct epitope in case of cross-reactivity), the total number of unique TCR sequences (n = 42 211) is less than the total number of VDJdb records (n = 61 049). Notably, while the number of unique epitopes with known specific TCRs increased, it is still dwarfed by the number of unique TCRs in the database, as there are currently only 212 epitopes in VDJdb. This highlights one of the most important factors currently limiting our ability to study and predict TCR specificity for a diverse range of antigens. We hope that in future this can be resolved using technologies allowing simultaneous testing for several antigen specificities (5). Prior to 2019, there was a substantial lack of paired TCR records as most high-throughput studies focused on TCR beta chain only. However, with the advent of the droplet based single-cell sequencing techniques and the subsequent AIRR-seq focused improvements, the number of paired TCR alpha and beta records increased a lot. For example, one of the most recent studies added to VDJdb, a paired-chain dataset from 10X Genomics compendium (see Materials and Methods section), features 40 unique epitopes, more than any other high-throughput dataset produced so far.

Online batch analysis of AIRR-Seq samples

Due to a growing number of immunological studies that choose AIRR-Seq technology to survey adaptive immunity, one of the main tasks for the current VDJdb update was to provide the ability to query a large TCR sequence set (e.g. whole blood or tissue specific T cell repertoire) against the entire database. We extended the functionality of the VDJdb browser and included batch upload and query options for AIRR-Seq datasets (Annotation tab of the web interface). Batch query supports several commonly used data formats (IMGT/HighV-QUEST, MiXCR, ImmunoSEQ, etc), borrowing functionality from the VDJviz browser (16). Upon uploading a sample, TCR sequences are aligned against VDJdb records and users are provided with a list of VDJdb hits containing all necessary information regarding alignment and metadata (Figure 2A). Users can specify parameters of the TCR sequence homology search and/or pre-filter the VDJdb database to search in the specific data subset. Charts with summary statistics are supplied together with the analysis results and can be customized using different normalization options (Figure 2B). The results of repertoire annotation can be easily exported as tab-delimited tables and used for downstream analysis.
Figure 2.

Batch query interface of the VDJdb web browser. (A) TCR sequence annotations provided for an example AIRR-Seq sample with default matching criteria. The sample (HIP02877) represents an individual carrying a HLA-B*35 allele and is taken from the Emerson et al.'s study (17). Note a prominent EBV-specific clonal expansion restricted to this allele at the top of the annotations list. (B) Summary statistics charts comparing HIP02877 (‘B35+.txt’, left) to HIP13994 (‘CMV+.txt’, right) sample representing a CMV+ individual from the same study.

Batch query interface of the VDJdb web browser. (A) TCR sequence annotations provided for an example AIRR-Seq sample with default matching criteria. The sample (HIP02877) represents an individual carrying a HLA-B*35 allele and is taken from the Emerson et al.'s study (17). Note a prominent EBV-specific clonal expansion restricted to this allele at the top of the annotations list. (B) Summary statistics charts comparing HIP02877 (‘B35+.txt’, left) to HIP13994 (‘CMV+.txt’, right) sample representing a CMV+ individual from the same study.

Evaluating and exploring TCR motifs

Similarity analysis of TCR sequences specific for a certain antigen reveals a complex network structure containing both large interconnected hubs of highly similar TCRs and a diverse set of unique TCR variants that cannot be co-clustered (6). Reducing the complexity of the VDJdb TCR network can both greatly speed-up database queries for large AIRR-Seq datasets and make it easier to visualize and interpret TCR binding motifs. The latter can be achieved by applying a recently developed de novo TCR motif discovery algorithm that allows to distinguish hubs of similar TCRs that arise due to antigen-specific enrichment from hubs that arise simply due to biases in the V(D)J rearrangement process (14). Upon selecting such hubs in the VDJdb TCR network with a stringent criterion that only allows for a single substitution in the CDR3 region (see Material and Methods section) we obtained a set of 501 motifs specific for 40 epitopes for human and mouse TCR alpha and beta chains. Additionally, we prepared a set of normalized motif PWMs that control for amino acid biases for a given V/J combination and CDR3 length. This normalization removes germline-encoded residues that are not unique to a given motif: the conserved Cys and Phe/Trp and other flanking CDR3 residues that rarely interact with an epitope. Users can navigate the TCR motifs using the VDJdb motif browser (Motifs tab of the web interface) as shown in Figure 3 and query both full-length or partial CDR3 sequences against the database. Note that we provide motif PWMs to illustrate epitope-specific TCR patterns and do not report their accuracy for TCR specificity classification as this is beyond the scope of the present work.
Figure 3.

VDJdb motif browser interface. (A) Navigation tab showing the tree of available epitope motifs that can be selected to view PWMs of CDR3 amino acid sequences. The top plot shows the most abundant TCR beta chain motif for the A*02:GIL Influenza epitope. Motifs normalized for V(D)J rearrangement background are shown. (B) An example of CDR3 sequence query with matching amino acids highlighted. CDR3 sequence (CAEDNNARLMF) of the TCR alpha chain from the 3O4L PDB structure (TCR bound to A*02:GLC EBV epitope) was used as a query.

VDJdb motif browser interface. (A) Navigation tab showing the tree of available epitope motifs that can be selected to view PWMs of CDR3 amino acid sequences. The top plot shows the most abundant TCR beta chain motif for the A*02:GIL Influenza epitope. Motifs normalized for V(D)J rearrangement background are shown. (B) An example of CDR3 sequence query with matching amino acids highlighted. CDR3 sequence (CAEDNNARLMF) of the TCR alpha chain from the 3O4L PDB structure (TCR bound to A*02:GLC EBV epitope) was used as a query.

DISCUSSION

During the last two years we have greatly increased the number of entries in the VDJdb, and made substantial improvements to the database structure and web interface. Currently, VDJdb represents the largest open data set of TCRs with characterized specificity. We believe that the comprehensiveness of the VDJdb make it an attractive database for the benchmarking of TCR specificity prediction algorithms and other basic studies investigating TCR:peptide:MHC interactions. The future development of the database will primarily follow two major directions. First, we aim at extending the set of existing records with putative antigen-specific TCRs using bioinformatic methods for de novo prediction of cognate TCR sequences. The latter is critical to provide a reasonable coverage for the overall diversity of an AIRR-Seq sample that can reach ∼106 TCR variants. Second, we aim at modelling TCR:peptide:MHC structures based on existing templates and VDJdb records in order to provide a basis for investigating the complexity of interactions involving residues of the TCR heterodimer and bring more insight into TCR recognition of antigens. We believe that the recent expansion of the VDJdb database both in terms of the underlying dataset and functionality, will make it a favorable resource for tasks related to annotation of T-cell AIRR-seq samples and basic studies of TCR specificity.

DATA AVAILABILITY

VDJdb database is available at https://vdjdb.cdr3.net.
  15 in total

Review 1.  Molecular mimicry and autoimmunity.

Authors:  Manuel Rojas; Paula Restrepo-Jiménez; Diana M Monsalve; Yovana Pacheco; Yeny Acosta-Ampudia; Carolina Ramírez-Santana; Patrick S C Leung; Aftab A Ansari; M Eric Gershwin; Juan-Manuel Anaya
Journal:  J Autoimmun       Date:  2018-10-26       Impact factor: 7.094

Review 2.  Using T Cell Receptor Repertoires to Understand the Principles of Adaptive Immune Recognition.

Authors:  Philip Bradley; Paul G Thomas
Journal:  Annu Rev Immunol       Date:  2019-01-30       Impact factor: 28.527

3.  High-resolution repertoire analysis reveals a major bystander activation of Tfh and Tfr cells.

Authors:  Paul-Gydeon Ritvo; Ahmed Saadawi; Pierre Barennes; Valentin Quiniou; Wahiba Chaara; Karim El Soufi; Benjamin Bonnet; Adrien Six; Mikhail Shugay; Encarnita Mariotti-Ferrandiz; David Klatzmann
Journal:  Proc Natl Acad Sci U S A       Date:  2018-08-29       Impact factor: 11.205

4.  Human T cell receptor occurrence patterns encode immune history, genetic background, and receptor specificity.

Authors:  William S DeWitt; Anajane Smith; Gary Schoch; John A Hansen; Frederick A Matsen; Philip Bradley
Journal:  Elife       Date:  2018-08-28       Impact factor: 8.140

Review 5.  Ushering in Integrated T Cell Repertoire Profiling in Cancer.

Authors:  Ning Jiang; Alexandra A Schonnesen; Ke-Yue Ma
Journal:  Trends Cancer       Date:  2018-12-26

6.  Large-scale detection of antigen-specific T cells using peptide-MHC-I multimers labeled with DNA barcodes.

Authors:  Amalie Kai Bentzen; Andrea Marion Marquard; Rikke Lyngaa; Sunil Kumar Saini; Sofie Ramskov; Marco Donia; Lina Such; Andrew J S Furness; Nicholas McGranahan; Rachel Rosenthal; Per Thor Straten; Zoltan Szallasi; Inge Marie Svane; Charles Swanton; Sergio A Quezada; Søren Nyboe Jakobsen; Aron Charles Eklund; Sine Reker Hadrup
Journal:  Nat Biotechnol       Date:  2016-08-29       Impact factor: 54.908

7.  McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences.

Authors:  Nili Tickotsky; Tal Sagiv; Jaime Prilusky; Eric Shifrut; Nir Friedman
Journal:  Bioinformatics       Date:  2017-09-15       Impact factor: 6.937

8.  Detecting T cell receptors involved in immune responses from single repertoire snapshots.

Authors:  Mikhail V Pogorelyy; Anastasia A Minervina; Mikhail Shugay; Dmitriy M Chudakov; Yuri B Lebedev; Thierry Mora; Aleksandra M Walczak
Journal:  PLoS Biol       Date:  2019-06-13       Impact factor: 8.029

9.  A Framework for Annotation of Antigen Specificities in High-Throughput T-Cell Repertoire Sequencing Studies.

Authors:  Mikhail V Pogorelyy; Mikhail Shugay
Journal:  Front Immunol       Date:  2019-09-26       Impact factor: 7.561

10.  VDJdb: a curated database of T-cell receptor sequences with known antigen specificity.

Authors:  Mikhail Shugay; Dmitriy V Bagaev; Ivan V Zvyagin; Renske M Vroomans; Jeremy Chase Crawford; Garry Dolton; Ekaterina A Komech; Anastasiya L Sycheva; Anna E Koneva; Evgeniy S Egorov; Alexey V Eliseev; Ewald Van Dyk; Pradyot Dash; Meriem Attaf; Cristina Rius; Kristin Ladell; James E McLaren; Katherine K Matthews; E Bridie Clemens; Daniel C Douek; Fabio Luciani; Debbie van Baarle; Katherine Kedzierska; Can Kesmir; Paul G Thomas; David A Price; Andrew K Sewell; Dmitriy M Chudakov
Journal:  Nucleic Acids Res       Date:  2018-01-04       Impact factor: 16.971

View more
  86 in total

1.  Postvaccination graft dysfunction/aplastic anemia relapse with massive clonal expansion of autologous CD8+ lymphocytes.

Authors:  Caitlin Ritz; Wenzhao Meng; Natasha L Stanley; Miren L Baroja; Chong Xu; Patrick Yan; Alexander C Huang; Ryan Hausler; Peter Nicholas; Jian-Meng Fan; David Lieberman; Beatriz M Carreno; Eline T Luning Prak; Timothy S Olson; Daria V Babushok
Journal:  Blood Adv       Date:  2020-04-14

2.  T-cell tracking, safety, and effect of low-dose donor memory T-cell infusions after αβ T cell-depleted hematopoietic stem cell transplantation.

Authors:  Sergey Blagov; Ivan V Zvyagin; Larisa Shelikhova; Rimma Khismatullina; Dmitriy Balashov; Ekaterina Komech; Viktoria Fomchenkova; Mikhail Shugay; Julia Starichkova; Elena Kurnikova; Dmitriy Pershin; Maria Fadeeva; Svetlana Glushkova; Yakov Muzalevskii; Alexei Kazachenok; Maria Efimenko; Elena Osipova; Galina Novichkova; Dmitriy Chudakov; Alexei Maschan; Michael Maschan
Journal:  Bone Marrow Transplant       Date:  2020-11-17       Impact factor: 5.483

3.  Current challenges for unseen-epitope TCR interaction prediction and a new perspective derived from image classification.

Authors:  Pieter Moris; Joey De Pauw; Anna Postovskaya; Sofie Gielis; Nicolas De Neuter; Wout Bittremieux; Benson Ogunjimi; Kris Laukens; Pieter Meysman
Journal:  Brief Bioinform       Date:  2021-07-20       Impact factor: 11.622

4.  Public and private human T-cell clones respond differentially to HCMV antigen when boosted by CD3 copotentiation.

Authors:  Laura R E Becher; Wendy K Nevala; Shari Lee Sutor; Megan Abergel; Michele M Hoffmann; Christopher A Parks; Larry R Pease; Adam G Schrum; Svetomir N Markovic; Diana Gil
Journal:  Blood Adv       Date:  2020-11-10

Review 5.  An overview of immunoinformatics approaches and databases linking T cell receptor repertoires to their antigen specificity.

Authors:  Ivan V Zvyagin; Vasily O Tsvetkov; Dmitry M Chudakov; Mikhail Shugay
Journal:  Immunogenetics       Date:  2019-11-18       Impact factor: 2.846

6.  TCRdb: a comprehensive database for T-cell receptor sequences with powerful search function.

Authors:  Si-Yi Chen; Tao Yue; Qian Lei; An-Yuan Guo
Journal:  Nucleic Acids Res       Date:  2021-01-08       Impact factor: 16.971

7.  Progressive immune dysfunction with advancing disease stage in renal cell carcinoma.

Authors:  David A Braun; Kelly Street; Kelly P Burke; David L Cookmeyer; Thomas Denize; Christina B Pedersen; Satyen H Gohil; Nicholas Schindler; Lucas Pomerance; Laure Hirsch; Ziad Bakouny; Yue Hou; Juliet Forman; Teddy Huang; Shuqiang Li; Ang Cui; Derin B Keskin; John Steinharter; Gabrielle Bouchard; Maxine Sun; Erica M Pimenta; Wenxin Xu; Kathleen M Mahoney; Bradley A McGregor; Michelle S Hirsch; Steven L Chang; Kenneth J Livak; David F McDermott; Sachet A Shukla; Lars R Olsen; Sabina Signoretti; Arlene H Sharpe; Rafael A Irizarry; Toni K Choueiri; Catherine J Wu
Journal:  Cancer Cell       Date:  2021-03-11       Impact factor: 31.743

8.  A framework for highly multiplexed dextramer mapping and prediction of T cell receptor sequences to antigen specificity.

Authors:  Wen Zhang; Peter G Hawkins; Jing He; Namita T Gupta; Jinrui Liu; Gabrielle Choonoo; Se W Jeong; Calvin R Chen; Ankur Dhanik; Myles Dillon; Raquel Deering; Lynn E Macdonald; Gavin Thurston; Gurinder S Atwal
Journal:  Sci Adv       Date:  2021-05-14       Impact factor: 14.136

9.  DeepImmuno: deep learning-empowered prediction and generation of immunogenic peptides for T-cell immunity.

Authors:  Guangyuan Li; Balaji Iyer; V B Surya Prasath; Yizhao Ni; Nathan Salomonis
Journal:  Brief Bioinform       Date:  2021-05-03       Impact factor: 11.622

10.  Molecular signatures of antitumor neoantigen-reactive T cells from metastatic human cancers.

Authors:  Frank J Lowery; Sri Krishna; Rami Yossef; Neilesh B Parikh; Praveen D Chatani; Nikolaos Zacharakis; Maria R Parkhurst; Noam Levin; Sivasish Sindiri; Abraham Sachs; Kyle J Hitscherich; Zhiya Yu; Nolan R Vale; Yong-Chen Lu; Zhili Zheng; Li Jia; Jared J Gartner; Victoria K Hill; Amy R Copeland; Shirley K Nah; Robert V Masi; Billel Gasmi; Scott Kivitz; Biman C Paria; Maria Florentin; Sanghyun P Kim; Ken-Ichi Hanada; Yong F Li; Lien T Ngo; Satyajit Ray; Mackenzie L Shindorf; Shoshana T Levi; Ryan Shepherd; Chris Toy; Anup Y Parikh; Todd D Prickett; Michael C Kelly; Rachel Beyer; Stephanie L Goff; James C Yang; Paul F Robbins; Steven A Rosenberg
Journal:  Science       Date:  2022-02-03       Impact factor: 47.728

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.