Jason Anthony Vander Heiden, Susanna Marquez, Nishanth Marthandan, Syed Ahmad Chan Bukhari, Christian E Busse, Brian Corrie, Uri Hershberg, Steven H Kleinstein, Frederick A Matsen IV, Duncan K Ralph, Aaron M Rosenfeld, Chaim A Schramm, Scott Christley, Uri Laserson.
Abstract
Increased interest in the immune system's involvement in pathophysiological phenomena, coupled with decreased DNA sequencing costs, has led to an explosion of antibody and T cell receptor sequencing data collectively termed "adaptive immune receptor repertoire sequencing" (AIRR-seq or Rep-Seq). The AIRR Community has been actively working to standardize protocols, metadata, formats, APIs, and other guidelines to promote open and reproducible studies of the immune repertoire. In this paper, we describe the work of the AIRR Community's Data Representation Working Group to develop standardized data representations for storing and sharing annotated antibody and T cell receptor data. Our file format emphasizes ease-of-use, accessibility, scalability to large data sets, and a commitment to open and transparent science. It is composed of a tab-delimited format with a specific schema. Several popular repertoire analysis tools and data repositories already utilize this AIRR-seq data format. We hope that others will follow suit in the interest of promoting interoperable standards.
Keywords: AIRR-seq; B cell; Rep-Seq; T cell; antibody; immunoglobulin; immunology; repertoire
Year: 2018 PMID: 30323809 PMCID: PMC6173121 DOI: 10.3389/fimmu.2018.02206
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Figure 1. Standards proliferation. The DRWG has been actively engaging as many community members as possible to drive adoption of our new standard. https://xkcd.com/927/.
Tools and databases supporting the AIRR Rearrangement schema.
| Tool or database | Version | Supported operations |
| --- | --- | --- |
| AIRR Python Library | 1.2 | Input, output and validation |
| AIRR R Library | 1.2 | Input, output and validation |
| IgBLAST | 1.10 | Output |
| IGoR | TBD | Input and output |
| Immcantation:Change-O | 0.4.2 | Input, output and conversion |
| ImmuneDB | 0.24.0 | Output |
| iReceptor | 2.0 | Input, output and conversion |
| MiXCR | 2.2.1 | Output |
| OLGA | TBD | Input and output |
| Partis | TBD | Output |
| SONAR | 3.0 | Output |
| TRIgS | 2 | Input |
| VDJServer | 1.2.0 | Input and output |
| Vidjil-algo | 2018.10 | Output |
| Vidjil Web Platform | TBD | Input and conversion |
Figure 2. AIRR Rearrangement schema v1.2.0. Overview of the schema for representing annotated rearrangements. Fields in bold are required columns in the TSV. All fields, including those that are required columns in the TSV header, can be set to null by assigning an empty string as the value.
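The two rules in this caption (required columns must appear in the header, and an empty string stands for null) can be illustrated with a short reader. This is a minimal sketch using only the Python standard library, not the AIRR Python Library's own API. The function name `read_rearrangements` is hypothetical, and `REQUIRED_COLUMNS` is an assumed, abbreviated subset chosen for illustration; the authoritative list of required columns is the bold set in the schema itself.

```python
import csv
import io

# ASSUMPTION: an abbreviated, illustrative subset of required column names.
# The authoritative list is the set of bold fields in the schema figure.
REQUIRED_COLUMNS = ["sequence_id", "sequence", "v_call", "j_call"]


def read_rearrangements(handle, required=REQUIRED_COLUMNS):
    """Yield one dict per TSV row, mapping empty strings to None.

    Raises ValueError (on first iteration) if a required column is
    absent from the header line.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in required if name not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    for row in reader:
        # A required column must exist in the header, but its value may
        # still be null, which the TSV encodes as an empty string.
        yield {key: (value if value != "" else None) for key, value in row.items()}


# Made-up example record: j_call is present as a column but left empty.
example = "sequence_id\tsequence\tv_call\tj_call\nseq1\tACGT\tIGHV1-2*02\t\n"
rows = list(read_rearrangements(io.StringIO(example)))
```

Here `rows[0]["j_call"]` comes back as `None`, while a file whose header lacks one of the listed columns is rejected before any row is read.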
Figure 3. Interoperability example. Shown is a set of flowcharts depicting examples of the interoperability facilitated by the AIRR Rearrangement schema. (A) Starting with repertoire sequencing data in the FASTA format, either IgBLAST or IMGT/HighV-QUEST in combination with Change-O's conversion tool may be used. Once data conforms to the AIRR Rearrangement schema, Change-O can be used to generate MiAIRR-compliant GenBank/TLS submissions. AIRR-seq data from separate tools and pipelines can easily be combined for aggregate analysis. (B) Data may be exported from or imported to the iReceptor or VDJServer repositories using the TSV format. Data is returned from queries to the separate repositories using the TSV format and can be integrated into a single collection for downstream analysis.
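Combining TSV output from separate tools or repositories into one collection could look roughly like the following. This is a sketch under stated assumptions rather than the method used by Change-O, iReceptor, or VDJServer. The helper names `combine_rearrangements` and `write_rearrangements` are hypothetical, the input records are made up, and columns that one source lacks are written as empty strings, matching the schema's null convention.

```python
import csv
import io


def combine_rearrangements(handles):
    """Merge rows from several tab-delimited inputs into one table.

    Returns the union of column names (in first-seen order) and the
    concatenated list of row dicts.
    """
    columns = []
    rows = []
    for handle in handles:
        reader = csv.DictReader(handle, delimiter="\t")
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        rows.extend(reader)
    return columns, rows


def write_rearrangements(columns, rows, handle):
    """Write rows as TSV, filling columns a row lacks with empty strings."""
    writer = csv.DictWriter(
        handle, fieldnames=columns, delimiter="\t", restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Made-up outputs from two hypothetical sources with differing optional columns.
tool_a = "sequence_id\tv_call\tj_call\na1\tIGHV1-2*02\tIGHJ4*02\n"
tool_b = "sequence_id\tv_call\tjunction_aa\nb1\tIGHV3-23*01\tCARDYW\n"

columns, rows = combine_rearrangements([io.StringIO(tool_a), io.StringIO(tool_b)])
out = io.StringIO()
write_rearrangements(columns, rows, out)
combined = out.getvalue()
```

Because every source shares the same column names for the same concepts, the merge reduces to a column union plus row concatenation, with no per-tool field mapping.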