Hiroki Konishi1,2, Daisuke Komura1, Hiroto Katoh1, Shinichiro Atsumi1,3, Hirotomo Koda1,4, Asami Yamamoto1, Yasuyuki Seto3, Masashi Fukayama4, Rui Yamaguchi5, Seiya Imoto2, Shumpei Ishikawa6.
Abstract
BACKGROUND: The recent success of immunotherapy in treating tumors has increased interest in research on the adaptive immune system in the tumor microenvironment. Recent advances in next-generation sequencing technology have enabled the sequencing of whole T-cell receptors (TCRs) and B-cell receptors (BCRs)/immunoglobulins (Igs) in the tumor microenvironment. Since BCRs/Igs in tumor tissues have high affinities for tumor-specific antigens, the patterns of their amino acid sequences and other sequence-independent features such as the number of somatic hypermutations (SHMs) may differ between the normal and tumor microenvironments. However, given the high diversity of BCRs/Igs and the rarity of recurrent sequences among individuals, it is far more difficult to capture such differences in BCR/Ig sequences than in TCR sequences. The aim of this study was to explore the possibility of discriminating between BCRs/Igs in tumor and normal tissues by capturing these differences using supervised machine learning methods applied to RNA sequences of BCRs/Igs.
Keywords: B-cell receptor/immunoglobulin; Cancer; Machine learning
Year: 2019 PMID: 31138102 PMCID: PMC6537402 DOI: 10.1186/s12859-019-2853-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Pipeline for obtaining, from normal or tumor tissue, the BCR/Ig data used as the query for the classification machine
Ranges of hyperparameters searched for the CNN classifier
| Name of hyperparameter | Lower limit | Upper limit |
|---|---|---|
| Initial learning rate | 1e-7 | 1e-3 |
| Dropout rate | 0.4 | 1.0 |
| # of convolutional layers | 1 | 2 |
| # of convolutional kernels | 80 | 300 |
| The width of convolution kernels | 2 | 3 |
| The width of pooling filters | 2 | 3 |
| # of fully connected layers | 1 | 3 |
| # of units in a fully connected layer | 100 | 300 |
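The ranges in the table can be read as a search space. The sketch below draws one candidate configuration from it. It is only an illustration: the record does not state which search strategy was used, so plain random search, log-uniform sampling of the learning rate, and the names `SEARCH_SPACE` and `sample_config` are all assumptions.

```python
import math
import random

# Search ranges copied from the table above as (lower limit, upper limit).
# The dictionary keys are hypothetical names chosen for this sketch.
SEARCH_SPACE = {
    "initial_learning_rate": (1e-7, 1e-3),
    "dropout_rate": (0.4, 1.0),
    "n_conv_layers": (1, 2),
    "n_conv_kernels": (80, 300),
    "conv_kernel_width": (2, 3),
    "pool_filter_width": (2, 3),
    "n_fc_layers": (1, 3),
    "n_fc_units": (100, 300),
}


def sample_config(rng):
    """Draw one configuration by random search (assumed strategy)."""
    config = {}
    for name, (low, high) in SEARCH_SPACE.items():
        if name == "initial_learning_rate":
            # Assumption: sample on a log scale, since the range spans
            # four orders of magnitude. Clamp to guard against rounding.
            exponent = rng.uniform(math.log10(low), math.log10(high))
            config[name] = min(max(10.0 ** exponent, low), high)
        elif name == "dropout_rate":
            # Continuous value, sampled uniformly.
            config[name] = rng.uniform(low, high)
        else:
            # Integer-valued settings, both limits inclusive.
            config[name] = rng.randint(low, high)
    return config


if __name__ == "__main__":
    print(sample_config(random.Random(0)))
```

Each sampled configuration would then be used to build and train one CNN, with the best one picked on validation data; that training loop is outside the scope of this sketch.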
Fig. 2 Workflow of individual BCR/Ig classification and tissue classification between normal and tumor environments using various sequence-independent features
Fig. 3 Letter-value plots showing the distribution of the area under the receiver operating characteristic curve (AUROC) calculated on 89 held-out patients. The panels compare (a) different models using amino acid sequences, (b) different trimming/padding strategies, (c) models using various sequence-independent features as well as the ensemble model combining them, and (d) the CNN across different CDR3 lengths. The bar plots in (e) show the coefficients of the linear ensemble model
Fig. 4 Sequence motifs constructed for all CDRs. Each motif is built from sequences that have the average length of the respective region. Note that the y-axis scale of CDR3 differs from those of CDR1 and CDR2
Fig. 5 Receiver operating characteristic (ROC) curves showing tissue-level classification performance. The panels compare ROC curves of (a) models using the average score from sequence-independent features, (b) those using the median score, (c) those using the mode score, (d) the ensemble model and clonal entropy, and (e) the ensemble model and Ostmeyer's model
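For readers unfamiliar with the metric in this figure, the AUROC for one patient can be computed as the fraction of (tumor, normal) sequence pairs in which the tumor sequence receives the higher score, with ties counted as half. The sketch below implements that pairwise definition. The function name `auroc` and the label convention (1 for tumor, 0 for normal) are assumptions for illustration, not the authors' code.

```python
def auroc(labels, scores):
    """AUROC via the pairwise (Mann-Whitney) definition.

    labels: 1 for tumor-derived, 0 for normal-derived (assumed convention).
    scores: classifier outputs, higher meaning "more tumor-like".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one sequence of each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # tumor sequence ranked above normal sequence
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy scores for a single hypothetical patient.
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

Applying this function to each held-out patient separately yields one AUROC per patient, which is the kind of distribution the letter-value plots summarize.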
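The caption refers to tissue-level scores obtained as the average, median, or mode of per-sequence scores. A minimal sketch of that aggregation step is given below. The function name `tissue_score` and the binning used for the mode (scores are continuous, so they are rounded into bins of width `bin_width` first) are assumptions; the record does not say how the mode was computed.

```python
from statistics import mean, median, mode


def tissue_score(sequence_scores, how="mean", bin_width=0.1):
    """Collapse per-sequence scores from one tissue sample into one score.

    how: "mean", "median", or "mode".
    bin_width: hypothetical bin size used only for the mode, because
    continuous scores rarely repeat exactly.
    """
    if not sequence_scores:
        raise ValueError("need at least one sequence score")
    if how == "mean":
        return mean(sequence_scores)
    if how == "median":
        return median(sequence_scores)
    if how == "mode":
        # Assign each score to its nearest bin, then take the most common bin.
        bins = [round(s / bin_width) for s in sequence_scores]
        return mode(bins) * bin_width
    raise ValueError(f"unknown aggregation: {how}")


if __name__ == "__main__":
    scores = [0.1, 0.2, 0.9, 0.9]  # toy per-sequence scores
    for how in ("mean", "median", "mode"):
        print(how, tissue_score(scores, how))
```

The resulting single score per tissue can be thresholded to call a sample normal or tumor, and sweeping that threshold traces out a ROC curve.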