Yuval Elhanati¹, Quentin Marcou¹, Thierry Mora², Aleksandra M Walczak¹.
¹ Laboratoire de physique théorique, CNRS, UPMC and Ecole normale supérieure, Paris, France.
² Laboratoire de physique statistique, CNRS, UPMC and Ecole normale supérieure, Paris, France.
Abstract
MOTIVATION: The diversity of the immune repertoire is initially generated by random rearrangements of the receptor gene during early T and B cell development. Rearrangement scenarios are composed of random events (the choice of gene templates, base pair deletions and insertions), each described by a probability distribution. Not all scenarios are equally likely, and the same receptor sequence may be obtained in several different ways. Quantifying the distribution of these rearrangements is an essential baseline for studying immune system diversity. Inferring the properties of these distributions from receptor sequences is computationally hard, because it requires enumerating every possible scenario for every sampled receptor sequence.

RESULTS: We present a hidden Markov model that accounts for all plausible scenarios that can generate the receptor sequences. We developed and implemented a method based on the Baum-Welch algorithm that efficiently infers the parameters of the different events of the rearrangement process. We tested our software tool on sequence data for both the alpha and beta chains of the T cell receptor. To validate the algorithm, we also generated synthetic sequences from a known model and confirmed that its parameters could be accurately inferred back from those sequences. The inferred model can be used to generate synthetic sequences, to calculate the generation probability of any receptor sequence, and to estimate the theoretical diversity of the repertoire. We estimate this diversity to be [Formula: see text] for human T cells. The model provides a baseline for investigating the selection and dynamics of immune repertoires.

AVAILABILITY AND IMPLEMENTATION: Source code and sample sequence files are available at https://bitbucket.org/yuvalel/repgenhmm/downloads

CONTACT: elhanati@lpt.ens.fr or tmora@lps.ens.fr or awalczak@lpt.ens.fr.
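To make the abstract's central point concrete (one receptor sequence, many rearrangement scenarios, and a generation probability that sums over them), here is a minimal Python sketch of a toy V-J rearrangement model. All template sequences and probabilities are hypothetical values chosen for illustration, not parameters from the paper or from repgenhmm, and the model is simplified: no D gene, and inserted nucleotides are assumed uniform and independent. It enumerates scenarios by brute force, which is feasible only because the toy model is tiny. It does not implement the authors' hidden Markov model or Baum-Welch inference; in an HMM, the forward pass computes this kind of marginal sum over hidden paths without listing them one by one. The Shannon entropy of P_gen (in bits, with 2^H read as an effective number of sequences) is used here as an assumed diversity measure.

```python
import itertools
import math
import random
from collections import defaultdict

# Hypothetical toy parameters, chosen for illustration only (not inferred
# from any data): two V and two J germline ends, small deletion ranges,
# and at most two non-templated inserted nucleotides.
P_V = {"TGTGCC": 0.6, "TGTGCA": 0.4}
P_J = {"TTCGGA": 0.5, "TACGGA": 0.5}
P_DEL_V = {0: 0.5, 1: 0.3, 2: 0.2}   # bases trimmed from the 3' end of V
P_DEL_J = {0: 0.6, 1: 0.4}           # bases trimmed from the 5' end of J
P_INS = {0: 0.3, 1: 0.4, 2: 0.3}     # number of inserted nucleotides
NUCS = "ACGT"                        # inserted bases: uniform, independent


def scenarios():
    """Yield (sequence, probability) for every rearrangement scenario."""
    for v, p_v in P_V.items():
        for j, p_j in P_J.items():
            for dv, p_dv in P_DEL_V.items():
                for dj, p_dj in P_DEL_J.items():
                    for n, p_n in P_INS.items():
                        for ins in itertools.product(NUCS, repeat=n):
                            seq = v[:len(v) - dv] + "".join(ins) + j[dj:]
                            prob = p_v * p_j * p_dv * p_dj * p_n * 0.25 ** n
                            yield seq, prob


def generation_probabilities():
    """P_gen(s): sum of the probabilities of all scenarios producing s."""
    pgen = defaultdict(float)
    for seq, prob in scenarios():
        pgen[seq] += prob
    return dict(pgen)


def entropy_bits(dist):
    """Shannon entropy (in bits) of a {item: probability} distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def draw(rng, dist):
    """Draw one key from a {value: probability} dictionary."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]


def sample_sequence(rng):
    """Generate one synthetic sequence by running the model forward."""
    v, j = draw(rng, P_V), draw(rng, P_J)
    dv, dj, n = draw(rng, P_DEL_V), draw(rng, P_DEL_J), draw(rng, P_INS)
    ins = "".join(rng.choice(NUCS) for _ in range(n))
    return v[:len(v) - dv] + ins + j[dj:]


if __name__ == "__main__":
    pgen = generation_probabilities()
    n_scen = sum(1 for _ in scenarios())
    print(f"{n_scen} scenarios -> {len(pgen)} distinct sequences")
    print(f"total probability = {sum(pgen.values()):.6f}")
    print(f"P_gen(TGTGCCTTCGGA) = {pgen['TGTGCCTTCGGA']:.5f}")
    h = entropy_bits(pgen)
    print(f"entropy = {h:.2f} bits, effective diversity 2^H = {2 ** h:.1f}")
    rng = random.Random(0)
    print([sample_sequence(rng) for _ in range(3)])
```

With these toy parameters the model has 504 scenarios but fewer distinct sequences. For example, TGTGCCTTCGGA is produced by 8 scenarios (using either V template, with different deletion and insertion combinations), and its P_gen is their sum, 0.04425. Every sequence drawn by `sample_sequence` is therefore one of the enumerated outcomes.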