
Profiling the baseline performance and limits of machine learning models for adaptive immune receptor repertoire classification.

Chakravarthi Kanduri, Milena Pavlović, Lonneke Scheffer, Keshav Motwani, Maria Chernigovskaya, Victor Greiff, Geir K Sandve.

Abstract

BACKGROUND: Development of machine learning (ML) methodology for classifying immune states in adaptive immune receptor repertoires (AIRRs) has seen a recent surge of interest. However, there has so far been no systematic evaluation of the scenarios in which classical ML methods (such as penalized logistic regression) already perform adequately for AIRR classification. This makes it difficult to redirect research effort toward the scenarios where more sophisticated ML approaches may be required.
RESULTS: To identify the scenarios in which a baseline ML method performs well for AIRR classification, we generated a collection of synthetic AIRR benchmark data sets spanning a wide range of complexity in both data set architecture and immune state-associated sequence patterns (signal). We trained ≈1,700 ML models with varying assumptions regarding immune signal on ≈1,000 data sets with a total of ≈250,000 AIRRs containing ≈46 billion TCRβ CDR3 amino acid sequences, thereby surpassing the sample sizes of current state-of-the-art AIRR-ML setups by two orders of magnitude. We found that L1-penalized logistic regression achieved high prediction accuracy even when the immune signal occurred in only 1 out of 50,000 AIR sequences.
CONCLUSIONS: We provide a reference benchmark to guide new AIRR-ML classification methodology by (i) identifying those scenarios characterized by immune signal and data set complexity, where baseline methods already achieve high prediction accuracy, and (ii) facilitating realistic expectations of the performance of AIRR-ML models given training data set properties and assumptions. Our study serves as a template for defining specialized AIRR benchmark data sets for comprehensive benchmarking of AIRR-ML methods.
© The Author(s) 2022. Published by Oxford University Press GigaScience.
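
The baseline named in the abstract, L1-penalized logistic regression on synthetic repertoires with an implanted signal, can be illustrated with a small toy sketch. The abstract does not describe the feature encoding or the software used, so everything below is an assumption made for illustration: the 2-mer enrichment encoding, the hypothetical motif `QW`, the witness rate of 0.2, the repertoire size of 300 sequences, and all function names. The setting is far easier than the 1-in-50,000 regime reported in the study, and the solver is a plain proximal gradient loop rather than whatever implementation the authors used.

```python
import math
import random
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
K = 2                                   # k-mer length (toy choice, an assumption)
KMERS = ["".join(p) for p in product(AMINO_ACIDS, repeat=K)]
MOTIF = "QW"                            # hypothetical implanted immune signal


def make_repertoire(rng, positive, n_seqs=300, seq_len=12, witness_rate=0.2):
    """Random CDR3-like strings; positives carry MOTIF in ~witness_rate of them."""
    seqs = []
    for _ in range(n_seqs):
        seq = "".join(rng.choice(AMINO_ACIDS) for _ in range(seq_len))
        if positive and rng.random() < witness_rate:
            pos = rng.randrange(seq_len - K + 1)
            seq = seq[:pos] + MOTIF + seq[pos + K:]
        seqs.append(seq)
    return seqs


def encode(seqs):
    """k-mer enrichment: observed frequency / uniform expectation, minus 1."""
    counts = Counter(s[i:i + K] for s in seqs for i in range(len(s) - K + 1))
    total = sum(counts.values())
    return [counts[kmer] / total * len(KMERS) - 1.0 for kmer in KMERS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_l1_logreg(X, y, lam=0.05, lr=0.1, n_iter=200):
    """Proximal gradient descent (ISTA) for L1-penalized logistic regression."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        # residuals p_i - y_i under the current weights
        resid = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - yi
                 for x, yi in zip(X, y)]
        b -= lr * sum(resid) / n        # the intercept is not penalized
        for j in range(d):
            g = sum(r * x[j] for r, x in zip(resid, X)) / n
            step = w[j] - lr * g
            # soft-thresholding is the proximal operator of the L1 penalty
            w[j] = math.copysign(max(abs(step) - lr * lam, 0.0), step)
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


rng = random.Random(0)
labels = [i % 2 for i in range(60)]     # balanced classes
X = [encode(make_repertoire(rng, bool(lab))) for lab in labels]
X_train, y_train, X_test, y_test = X[:40], labels[:40], X[40:], labels[40:]

w, b = fit_l1_logreg(X_train, y_train)
accuracy = sum(predict(w, b, x) == yi for x, yi in zip(X_test, y_test)) / len(y_test)
top_kmer = KMERS[max(range(len(w)), key=lambda j: abs(w[j]))]
n_nonzero = sum(1 for wj in w if wj != 0)
print(f"test accuracy={accuracy:.2f}, top k-mer={top_kmer}, nonzero weights={n_nonzero}")
```

In this sketch the L1 penalty is what drives most k-mer weights to exactly zero, so the surviving weights point at the candidate signal. Lowering `witness_rate` or `n_seqs` makes the toy problem harder and gives a rough feel for the kind of limit the benchmark probes.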

Keywords:  AIRR; ML; adaptive immune receptor repertoires; baseline performance; benchmarking; machine learning

Year: 2022; PMID: 35639633; PMCID: PMC9154052; DOI: 10.1093/gigascience/giac046

Source DB: PubMed; Journal: Gigascience; ISSN: 2047-217X; Impact factor: 7.658


