Julia Bischof, Saleh M Ibrahim.
Abstract
Immunoglobulins, like T cell receptors, play a key role in adaptive immune responses because of their ability to recognize antigens. Recent advances in next-generation sequencing (NGS) have improved both the quality and the quantity of individual B cell receptor repertoire sequencing. Unfortunately, software that can exhaustively analyze repertoire data from NGS platforms without limiting the number of sequences is lacking. Here we introduce a new R package, bcRep, which offers a platform for comprehensive analyses of B cell receptor repertoires using IMGT/HighV-QUEST formatted data. It includes methods for gene usage statistics, clonotype classification, and diversity measures, as well as functions for filtering datasets, summarizing mutations, and visualizing results. To compare samples with respect to gene usage, diversity, amino acid proportions, similar sequences, or clones, the package provides several functions, including distance measurements and multidimensional scaling methods.
Year: 2016 PMID: 27551775 PMCID: PMC4995022 DOI: 10.1371/journal.pone.0161569
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Comparison of the different B cell receptor repertoire analysis tools and bcRep.
| feature | bcRep | Change-O | iRAP | IMEX |
|---|---|---|---|---|
| base | R package | command-line, R package | online tool | GUI, command line |
| input | IMGT/HighV-QUEST | IMGT/HighV-QUEST | FASTA | FASTA, IMGT/HighV-QUEST |
| special function to read input | + | + | - | - |
| combine several files | + | - | - | + |
| sequence number limited | - | - | + | - |
| comparison of samples | + | - | - | + |
| sequence filtering | + | - | - | - |
| sequence statistics | + | - | - | + |
| general mutation statistics | + | + | - | - |
| advanced mutation statistics | + | + | - | - |
| lineage trees | - | + | + | - |
| gene usage | + | - | + | + |
| gene/gene combinations | + | - | + | - |
| assemble clonotypes | + | + | + | + |
| clone filtering | + | - | - | - |
| clone statistics | + | - | + | + |
| shared clones | + | - | - | + |
| clone tracking | - | - | + | - |
| amino acid distribution | + | + | - | - |
| diversity | + | + | + | + |
| dissimilarities/distances on gene usage data | + | - | - | - |
| dissimilarities/distances on sequence data | + | + | - | - |
| multidimensional scaling | + | - | - | - |
| several visualization routines | + | - | + | + |
| alignment of sequences | - | + | - | - |
| estimation of repertoire size | - | - | + | - |
‘+’ indicates that the feature exists; ‘-’ indicates that it does not.
Information was taken from the documentation of the tools.
Functions of the bcRep package and their description.
| Function | Description |
|---|---|
| combineIMGT() | Combines several IMGT/HighV-QUEST outputs |
| readIMGT() | Reads IMGT/HighV-QUEST outputs and optionally filters out sequences without results (see paragraph “Input data”) |
| sequences.functionality(), sequences.junctionFrame() | Gives information about functionality and junction frame usage of input data |
| sequences.getAnyFunctionality(), sequences.getProductives(), sequences.getUnproductives() | Filters datasets for productive/unproductive sequences |
| sequences.getAnyJunctionFrame(), sequences.getInFrames(), sequences.getOutOfFrames() | Filters datasets for in-frame/out-of-frame sequences |
| sequences.mutation() | Summary statistics about mutations in V-region, FR1-3 or CDR1-2 sequences, such as the number of all mutations, the number of silent/replacement mutations, or the R/S ratio |
| sequences.mutation.AA(), plotSequencesMutationAA() | Analyzes all replacement mutations and returns a matrix with proportions of mutations from (germline) amino acid to mutated amino acid + visualization method |
| sequences.mutation.base(), plotSequencesMutationBase() | Analyzes nucleotide distributions next to silent mutations (positions -3 to +3) + visualization method |
| clones() | Combines sequences into clonotypes with the same V gene, the same J gene (optional), and a variable CDR3 sequence identity |
| clones.filterSize(), clones.filterFunctionality(), clones.filterJunctionFrame() | Filters clones by size, functionality, or junction frame usage |
| clones.CDR3Length(), plotClonesCDR3Length(), plotClonesCopyNumber() | Statistics and visualizations of CDR3 length distribution and copy number of clones |
| clones.giniIndex() | Gini index of a set of clones |
| clones.shared(), clones.shared.summary() | Clones shared between at least two samples, using the same criteria as in clones() |
| geneUsage(), plotGeneUsage() | V(D)J gene usage in general or stratified by functionality or junction frame usage (for subgroups, genes, or alleles) + visualization method |
| compare.geneUsage(), plotCompareGeneUsage() | Comparison of gene usage between different samples (for subgroups, genes, or alleles) + visualization method |
| sequences.geneComb(), plotGeneComb() | Gene/gene combinations for V(D)J genes (for subgroups, genes, or alleles) + visualization method |
| aaDistribution(), plotAADistribution() | Amino acid distribution of sequences of the same length + visualization method |
| compare.aaDistribution(), plotCompareAADistribution() | Comparison of amino acid distributions of sequences of the same length between different samples + visualization method |
| trueDiversity(), plotTrueDiversity() | True diversity of sequences of the same length (richness, Shannon, Simpson) + visualization method |
| compare.trueDiversity(), plotCompareTrueDiversity() | Comparison of diversity of sequences of the same length between different samples + visualization method |
| geneUsage.distance() | Several dissimilarity and distance measurements for gene usage data |
| sequences.distance() | Several dissimilarity and distance measurements for sequence data |
| dist.PCoA(), plotDistPCoA() | Multidimensional scaling (principal coordinate analysis) of distances + visualization method |
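The clonotype criteria described for clones() (same V gene, optionally the same J gene, and a CDR3 identity threshold) can be illustrated with a small Python sketch. This is not the bcRep implementation: the record fields (`v_gene`, `j_gene`, `cdr3`), the function names, the requirement of equal CDR3 length, and the greedy comparison against the first member of each clone are all assumptions made for illustration.

```python
from collections import defaultdict


def cdr3_identity(a, b):
    """Fraction of identical positions between two equal-length CDR3 strings."""
    if not a:
        return 1.0  # two empty strings are treated as identical
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_clonotypes(records, identity=0.85, use_j_gene=True):
    """Greedy grouping of sequence records into clonotypes (hypothetical helper).

    Records fall into the same clone if they share the V gene, optionally the
    J gene, the CDR3 length (an assumption of this sketch), and if their CDR3
    identity to the clone's first member reaches the threshold.
    """
    buckets = defaultdict(list)
    for rec in records:
        j_key = rec["j_gene"] if use_j_gene else None
        buckets[(rec["v_gene"], j_key, len(rec["cdr3"]))].append(rec)

    clones = []
    for members in buckets.values():
        bucket_clones = []
        for rec in members:
            for clone in bucket_clones:
                if cdr3_identity(rec["cdr3"], clone[0]["cdr3"]) >= identity:
                    clone.append(rec)
                    break
            else:
                # No existing clone is similar enough: start a new one.
                bucket_clones.append([rec])
        clones.extend(bucket_clones)
    return clones


example_records = [
    {"v_gene": "IGHV1-2", "j_gene": "IGHJ4", "cdr3": "CARDYW"},
    {"v_gene": "IGHV1-2", "j_gene": "IGHJ4", "cdr3": "CARDFW"},
    {"v_gene": "IGHV1-2", "j_gene": "IGHJ6", "cdr3": "CARDYW"},
    {"v_gene": "IGHV3-23", "j_gene": "IGHJ4", "cdr3": "CAKGGW"},
]
clone_sizes = sorted(len(c) for c in group_clonotypes(example_records, identity=0.8))
```

Setting `use_j_gene=False` in this sketch merges records that differ only in their J gene, which mirrors the optional J gene criterion named in the table.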
Fig 1. Example of an analysis of gene/gene combinations.
A color coded heatmap represents the relative abundance of IGHV and IGHD combinations for a selected set of sequences. Bright colors represent low proportions, darker ones high proportions. Dendrograms represent hierarchical clustering of genes.
Fig 2. Example of an analysis of replacement mutations.
Percentages of replacement mutations from one amino acid to another are color coded; darker colors represent higher percentages than bright colors. The amino acids of the germline sequence are shown in rows, the mutated ones in columns. The orange dots mark amino acid changes that also result in a hydropathy change.
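A matrix of this kind can be derived by counting (germline, mutated) amino acid pairs. The following Python sketch shows the idea under stated assumptions: the input is a hypothetical list of pairs, and proportions are taken over all replacement mutations. Whether bcRep normalizes over all mutations or per germline amino acid is not stated here, so this is an illustration, not the package's method.

```python
from collections import Counter


def replacement_proportions(mutations):
    """Proportion of each (germline_aa, mutated_aa) pair among all mutations.

    `mutations` is a hypothetical list of 2-tuples such as ("S", "N").
    Normalizing over the total count is an assumption of this sketch.
    """
    counts = Counter(mutations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


example_mutations = [("S", "N"), ("S", "N"), ("S", "T"), ("A", "V")]
proportions = replacement_proportions(example_mutations)
```

Each key of `proportions` corresponds to one cell of the heatmap, with the germline amino acid as the row and the mutated amino acid as the column.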
Fig 3. Example of an analysis of CDR3 amino acid sequence length distribution.
a) Percentages (y-axis) of different CDR3 sequence lengths (x-axis) (upper figure). b) Percentages (y-axis) of productive (orange) and unproductive (blue) sequences per CDR3 sequence length (x-axis) (lower figure).
Conversion of specific diversity indices to true diversity indices [13].
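| Index $x$ | Diversity in terms of $x$ | Diversity in terms of $p_i$ |
|---|---|---|
| Species richness, $x = \sum_{i=1}^{S} p_i^0$ | $x$ | $\sum_{i=1}^{S} p_i^0$ |
| Shannon entropy, $x = -\sum_{i=1}^{S} p_i \ln p_i$ | $\exp(x)$ | $\exp\left(-\sum_{i=1}^{S} p_i \ln p_i\right)$ |
| Simpson concentration, $x = \sum_{i=1}^{S} p_i^2$ | $1/x$ | $1 / \sum_{i=1}^{S} p_i^2$ |

Here $p_i$ is the relative abundance of the $i$-th species and $S$ is the number of species.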
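The standard conversions (richness is used directly, Shannon entropy is exponentiated, Simpson concentration is inverted) can be checked numerically with a short Python sketch. It treats each distinct sequence as a species, which is an assumption of this example; the function name and input format are hypothetical, and this is not bcRep's trueDiversity() code.

```python
import math
from collections import Counter


def true_diversity(sequences, order):
    """True diversity of order 0, 1, or 2 for a list of sequences.

    Each distinct sequence is treated as one species (sketch assumption).
    order 0: species richness
    order 1: exp(Shannon entropy)
    order 2: 1 / Simpson concentration
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one sequence is required")
    props = [n / total for n in counts.values()]
    if order == 0:
        return float(len(props))
    if order == 1:
        return math.exp(-sum(p * math.log(p) for p in props))
    if order == 2:
        return 1.0 / sum(p * p for p in props)
    raise ValueError("order must be 0, 1, or 2")


even_sample = ["CARDYW", "CARDYW", "CAKGGW", "CAKGGW"]
skewed_sample = ["CARDYW", "CARDYW", "CARDYW", "CAKGGW"]
```

For a perfectly even sample all three orders return the number of distinct sequences, whereas in a skewed sample the higher orders fall below the richness because they weight abundant sequences more strongly.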
Fig 4. Example of a comparison of diversities of CDR3 sequences in two samples.
Diversity indices of order one are given on the y-axis, CDR3 lengths (amino acids) on the x-axis. Samples are color coded (blue and red). Dots represent mean diversities of all CDR3 sequences of a given length; bars represent standard deviations. Diversity is similar in both samples, except for longer sequences (21 to 26 amino acids), where the CDR3s of sample A are more diverse than those of sample B.
Fig 5. Example of a comparison of Gini indices of three samples.
Gini indices are displayed on the y-axis, samples are on the x-axis. The Gini index can lie between zero and one. An index of zero represents a clone set of equally distributed clones, all having the same size. A Gini index of one would point to a set including only one clone with many sequences.
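As a rough illustration of how such an index behaves, the Python sketch below computes the Gini index of a list of clone sizes using the common rank-based formula. Whether clones.giniIndex() uses exactly this estimator is an assumption; the function name and input are hypothetical. With this formula the value is zero for equally sized clones and approaches one as a single clone comes to hold almost all sequences among many clones.

```python
def gini_index(clone_sizes):
    """Gini index of clone sizes, using the rank-based formula
    G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n
    with sizes x sorted in ascending order and ranks starting at 1.
    """
    sizes = sorted(clone_sizes)
    n = len(sizes)
    total = sum(sizes)
    if n == 0 or total == 0:
        raise ValueError("need at least one clone with a non-zero size")
    weighted = sum(rank * size for rank, size in enumerate(sizes, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


equal_clones = [5, 5, 5, 5]        # all clones have the same size
dominated_set = [1, 1, 1, 997]     # one clone holds nearly all sequences
```

For a set of n clones, this estimator cannot exceed (n - 1) / n, so values close to one require both strong dominance by one clone and a large number of clones.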
Fig 6. Example of a principal coordinate analysis based on cosine distances on IGHV gene usage distributions of 42 samples.
The dots are color coded for two groups. First (x-axis) and second (y-axis) axes are plotted and the variances explained by these axes are given.
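The input to such an analysis is a pairwise distance matrix between samples. A minimal Python sketch of cosine distances between gene usage vectors follows; the sample names, the usage values, and both function names are made up for illustration, and the subsequent principal coordinate analysis step is not shown. It is not the code behind geneUsage.distance().

```python
import math


def cosine_distance(u, v):
    """Cosine distance, 1 - cos(angle), between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(samples):
    """Pairwise cosine distances, keyed by (sample_a, sample_b)."""
    names = list(samples)
    return {(a, b): cosine_distance(samples[a], samples[b])
            for a in names for b in names}


# Hypothetical relative usage of three IGHV genes in three samples.
gene_usage = {
    "sample1": [0.6, 0.3, 0.1],
    "sample2": [0.6, 0.3, 0.1],
    "sample3": [0.1, 0.2, 0.7],
}
distances = distance_matrix(gene_usage)
```

Samples with identical usage profiles have a distance of (numerically almost) zero, and the matrix is symmetric, which is what an ordination method such as principal coordinate analysis expects as input.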