Inimary T Toby1, Mikhail K Levin2, Edward A Salinas3, Scott Christley1, Sanchita Bhattacharya4, Felix Breden5, Adam Buntzman6, Brian Corrie7, John Fonner8, Namita T Gupta9, Uri Hershberg10, Nishanth Marthandan11, Aaron Rosenfeld12, William Rounds13, Florian Rubelt14, Walter Scarborough8, Jamie K Scott15, Mohamed Uduman16, Jason A Vander Heiden9, Richard H Scheuermann17,18,19, Nancy Monson13, Steven H Kleinstein9,16,20, Lindsay G Cowell21.
Abstract
BACKGROUND: The genes that produce antibodies and the immune receptors expressed on lymphocytes are not germline encoded; rather, they are somatically generated in each developing lymphocyte by a process called V(D)J recombination, which assembles specific, independent gene segments into mature composite genes. The full set of composite genes in an individual at a single point in time is referred to as the immune repertoire. V(D)J recombination is the distinguishing feature of adaptive immunity and enables effective immune responses against an essentially infinite array of antigens. Characterization of immune repertoires is critical in both basic research and clinical contexts. Recent technological advances in repertoire profiling via high-throughput sequencing have resulted in an explosion of research activity in the field. This has been accompanied by a proliferation of software tools for analysis of repertoire sequencing data. Despite the widespread use of immune repertoire profiling and analysis software, there is currently no standardized format for output files from V(D)J analysis. Researchers utilize software such as IgBLAST and IMGT/High V-QUEST to perform V(D)J analysis and infer the structure of germline rearrangements. However, each of these software tools produces results in a different file format, and can annotate the same result using different labels. These differences make it challenging for users to perform additional downstream analyses.
Keywords: Antigen receptor repertoire; C++; Data sharing; Data standards; Immune repertoire; Python; Repertoire profiling; XML
Year: 2016 PMID: 27766961 PMCID: PMC5073965 DOI: 10.1186/s12859-016-1214-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1. A UML representation of the VDJML schema showing the current scope of VDJML and how the high-level data elements relate to each other. Each box corresponds to an element. Attributes are listed within a box. A “+” symbol beside an attribute name indicates that it is required. Labels on edges connecting an element to a child element indicate the number of instances of a child element type that can be included in a VDJML document.
Fig. 2. A VDJML file generated on VDJServer. This figure shows the two main parts of a VDJML file, the vdj:meta and vdj:read_results elements. It also shows how information about how the file was generated is recorded in the vdj:meta section. The alignment corresponding to this VDJML file was generated using a local version of IgBLAST. Six of seven vdj:segment_match elements are not shown due to space limitations. These can be seen in Fig. 4.
Fig. 4. The full vdj:alignment element from the VDJML file shown in Fig. 2. This figure illustrates how the sections of a vdj:alignment element jointly specify a full germline alignment.
Names and descriptions of optional attributes for segment match and region elements
| Attribute | Description |
|---|---|
| num_system | Designates the numbering system (e.g., Kabat, IMGT) used to number codon positions |
| identity | Percent of nucleotide sequence identity (e.g., 90 %) between aligned portions of a read sequence and a germline gene segment sequence |
| score | Alignment score, as defined by the aligner software |
| insertions | Number of nucleotide insertions in the read sequence relative to the germline sequence |
| deletions | Number of nucleotide deletions from the read sequence relative to the germline sequence |
| substitutions | Number of nucleotide substitutions in the read sequence relative to the germline sequence |
| stop_codon | True if a stop codon is present in the read sequence |
| mutated_invariant | True if a codon for a conserved amino acid is mutated in the read sequence |
| inverted | True if the read sequence is a reverse-complement to a germline gene segment |
| out_frame_indel | True if an insertion or deletion resulted in a frame shift |
| out_frame_vdj | True if the V(D)J recombination occurred out of frame |
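As a rough illustration of how a downstream program might consume the optional attributes listed above, the following Python sketch reads them from a single XML element and converts them to typed values. It is a minimal sketch under stated assumptions, not libVDJML's API: the namespace URI is a placeholder, the sample element and its attribute values are made up, and the value encodings (numeric identity, `true`/`false` flags) are assumed rather than taken from the schema.

```python
import xml.etree.ElementTree as ET

# Made-up fragment: the namespace URI is a placeholder (assumption),
# and the attribute values are illustrative only.
SAMPLE = """\
<vdj:segment_match xmlns:vdj="http://example.org/vdjml"
    identity="96.5" score="310" insertions="0" deletions="1"
    substitutions="7" stop_codon="false" inverted="true"/>
"""

# Attribute names come from the table above; the grouping by type
# is an assumption about how the values are encoded.
INT_ATTRS = ("insertions", "deletions", "substitutions")
FLOAT_ATTRS = ("identity", "score")
BOOL_ATTRS = (
    "stop_codon",
    "mutated_invariant",
    "inverted",
    "out_frame_indel",
    "out_frame_vdj",
)


def read_optional_attributes(element):
    """Return only the optional attributes that are present, as typed values."""
    values = {}
    for name in INT_ATTRS:
        if name in element.attrib:
            values[name] = int(element.attrib[name])
    for name in FLOAT_ATTRS:
        if name in element.attrib:
            values[name] = float(element.attrib[name])
    for name in BOOL_ATTRS:
        if name in element.attrib:
            values[name] = element.attrib[name].strip().lower() in ("true", "1")
    if "num_system" in element.attrib:
        values["num_system"] = element.attrib["num_system"]
    return values


match = ET.fromstring(SAMPLE)
attrs = read_optional_attributes(match)
print(attrs)
```

Because every attribute in the table is optional, the function skips absent names instead of filling in defaults, so a caller can tell "not reported by the aligner" apart from "reported as false or zero".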
Fig. 3. An IgBLAST-generated alignment of an IGH sequence. The sequence was taken from [40]. The standard IgBLAST alignment output is shown.
Fig. 5. An example workflow showing how libVDJML and VDJML are used with upstream and downstream software packages. Ovals indicate file formats, and rectangles indicate software packages. This workflow shows how raw sequence read data in FASTQ format is processed to generate FASTA-formatted data for input into germline alignment packages, such as IgBLAST and IMGT/High V-QUEST, which each output their own format. These output files can be read by libVDJML for conversion into VDJML format, which can then be taken as input by a variety of downstream programs.
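The conversion step in this workflow addresses the problem that different aligners label the same result differently. The Python sketch below illustrates that idea only; it does not use libVDJML, and the tool names `tool_a` and `tool_b`, their field labels, and the sample record values are all hypothetical and do not reflect the real output of IgBLAST or IMGT/High V-QUEST.

```python
# Hypothetical label maps: each maps one tool's field labels to a shared
# vocabulary. None of these labels is taken from a real aligner's output.
LABEL_MAPS = {
    "tool_a": {
        "top_v": "v_segment",
        "pct_id": "identity",
        "rc": "inverted",
    },
    "tool_b": {
        "V gene": "v_segment",
        "identity (%)": "identity",
        "reverse complement": "inverted",
    },
}


def normalize(tool, record):
    """Rename a tool-specific record's keys to the shared vocabulary.

    Fields without a known mapping are dropped so that downstream code
    sees the same keys regardless of which tool produced the record.
    """
    label_map = LABEL_MAPS[tool]
    return {label_map[key]: value for key, value in record.items() if key in label_map}


# Two made-up records describing the same alignment result.
record_a = {"top_v": "IGHV1-2*02", "pct_id": 97.2, "rc": False}
record_b = {
    "V gene": "IGHV1-2*02",
    "identity (%)": 97.2,
    "reverse complement": False,
    "notes": "tool-specific extra field",
}

normalized_a = normalize("tool_a", record_a)
normalized_b = normalize("tool_b", record_b)
print(normalized_a == normalized_b)
```

Once both records share one vocabulary, a downstream program needs only a single reader, which is the role the common VDJML format plays in the workflow.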