Maura Garofalo, Luca Piccoli, Margherita Romeo, Maria Monica Barzago, Sara Ravasio, Mathilde Foglierini, Milos Matkovic, Jacopo Sgrignani, Raoul De Gasparo, Marco Prunotto, Luca Varani, Luisa Diomede, Olivier Michielin, Antonio Lanzavecchia, Andrea Cavalli.
Abstract
In systemic light chain amyloidosis (AL), pathogenic monoclonal immunoglobulin light chains (LC) form toxic aggregates and amyloid fibrils in target organs. Prompt diagnosis is crucial to avoid permanent organ damage, but delayed diagnosis is common because symptoms usually appear only after strong organ involvement. Here we present LICTOR, a machine learning approach that predicts LC toxicity in AL based on the distribution of somatic mutations acquired during clonal selection. LICTOR achieves a specificity of 0.82 and a sensitivity of 0.76, with an area under the receiver operating characteristic curve (AUC) of 0.87. Tested on an independent set of 12 LC sequences with known clinical phenotypes, LICTOR achieves a prediction accuracy of 83%. Furthermore, we abolish the toxic phenotype of an LC by reverting in silico two germline-specific somatic mutations identified by LICTOR, and we experimentally confirm the loss of in vivo toxicity in a Caenorhabditis elegans model. Therefore, LICTOR represents a promising strategy for AL diagnosis and for reducing the high mortality rates in AL.
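For readers unfamiliar with the reported metrics, the following minimal sketch shows how sensitivity, specificity and accuracy follow from confusion-matrix counts. The counts are hypothetical and chosen only for illustration: an accuracy of 83% on 12 sequences corresponds to 10 correct predictions, but the split between toxic and non-toxic sequences used here is an assumption, not a figure from the abstract.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: toxic LCs predicted toxic / predicted non-toxic
    tn/fp: non-toxic LCs predicted non-toxic / predicted toxic
    """
    sensitivity = tp / (tp + fn)   # fraction of toxic LCs correctly flagged
    specificity = tn / (tn + fp)   # fraction of non-toxic LCs correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical split of 12 sequences (assumption, for illustration only):
# 6 toxic and 6 non-toxic, with one error in each group.
sens, spec, acc = classification_metrics(tp=5, fn=1, tn=5, fp=1)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```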
Year: 2021 PMID: 34112780 PMCID: PMC8192768 DOI: 10.1038/s41467-021-23880-9
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 The presence of SMs differentiates toxic and non-toxic LC sequences.
a Schematic representation of the generation of LC diversity through the processes of VJ recombination and somatic hypermutation. b Alignment of an LC sequence with the corresponding germline (GL) sequence according to the Kabat-Chothia scheme, using a progressive enumeration for a total of 125 positions (“Methods”). Structural elements of immunoglobulin light chains are depicted on top of the sequences (FR1 = framework 1, CDR1 = complementarity-determining region 1, FR2 = framework 2, CDR2 = complementarity-determining region 2, FR3 = framework 3, CDR3 = complementarity-determining region 3, FR4 = framework 4). Residues in red depict somatic mutations (SMs). The third line shows the encoding scheme used by the classifier, with SMs displayed in bold and unmutated positions represented by an “X”. c Odds ratios (OR) and their 95% confidence intervals (grey horizontal bars) for all 125 positions of the LC sequences according to our sequential numbering scheme (y-axis). The corresponding Kabat-Chothia enumeration is reported on the right. Structural elements of immunoglobulin light chains are shown on the left. ORs for positions with no statistically significant difference between toxic (tox) and non-toxic (nox) sequences (p ≥ 0.05) are represented as grey dots. Positions with statistically significant differences (p < 0.05) are depicted as either red (OR > 1) or blue (OR < 1) dots. Statistical significance was assessed with Fisher’s exact test.
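The encoding in panel b and the per-position statistics in panel c can be illustrated with a minimal sketch. It assumes the LC and GL sequences are already aligned to the same length; the function names, the toy sequences and the 2x2 counts are hypothetical and are not taken from the study. Fisher's exact test is implemented here directly from the hypergeometric distribution using only the standard library, which may differ in detail from the software the authors used.

```python
from math import comb


def encode_somatic_mutations(lc, gl):
    """Keep the LC residue where it differs from germline, 'X' elsewhere."""
    if len(lc) != len(gl):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(a if a != b else "X" for a, b in zip(lc, gl))


def odds_ratio(a, b, c, d):
    """OR for one position: a/b = tox mutated/unmutated, c/d = nox mutated/unmutated.

    Raises ZeroDivisionError if b * c == 0 (no correction is applied here).
    """
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x mutated sequences in the tox row
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is as probable as, or less probable than, the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Toy aligned fragments (hypothetical): a single mutation at the third position
encoded = encode_somatic_mutations("DIQMTQSP", "DIVMTQSP")
print(encoded)

# Toy counts for one position (hypothetical): 8 of 10 tox and 1 of 10 nox mutated
or_value = odds_ratio(8, 2, 1, 9)
p_value = fisher_exact_two_sided(8, 2, 1, 9)
print(or_value, p_value)
```

With these toy counts the position would be drawn as a red dot (OR > 1, p < 0.05) under the colour convention of panel c.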
Fig. 2 Machine learning predicts toxic and non-toxic sequences and identifies key features of toxicity.
a AUC of the best configuration for each of the considered machine learners (blue bars). Different combinations of three families of predictor variables were tested, with (✓) or without (✗) the SMOTE balancing technique. b The yellow bars show the best AUC value obtained by each machine learner using only the LC germline VJ rearrangements as predictor variables. c ROC curve for LICTOR (i.e. random forest using AMP + MAP + DAP) compared with a predictor (random forest) using only the LC germline VJ rearrangements as predictor variables. d Top 10 features of each family ranked by information gain. Each feature is enumerated according to our sequential numbering scheme, while the corresponding Kabat-Chothia enumeration for each feature is reported in parentheses. Kabat-Chothia insertions are reported with lowercase letters. Below each predictor variable, the occurrence in tox/nox sequences (a), the p-value (b) and the feature selection general ranking (c) are shown (red = AMP features, blue = MAP features and green = DAP features). e Mapping of the top 10 features of each family on the variable domains of an LC homodimeric structure (PDB ID: 2OLD, represented in white and grey in cartoon). AMP features are shown in red in the left image, MAP features in blue in the middle image and DAP features in green in the right image. The colour code used for the three feature families in the table (d) is maintained in the structural representation.
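The ranking by information gain in panel d can be illustrated with a minimal sketch for a single categorical feature. It uses the textbook definition (class entropy minus the feature-conditional entropy); the feature vectors and labels are hypothetical toy data, and the actual feature-selection software and settings used for LICTOR are not specified in this caption.

```python
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        h -= p * log2(p)
    return h


def information_gain(feature, labels):
    """Entropy of the labels minus the entropy remaining after splitting on the feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        gain -= len(subset) / n * entropy(subset)
    return gain


# Toy data (hypothetical): 1 = mutation present at a position, 0 = absent
labels = ["tox", "tox", "tox", "nox", "nox", "nox"]
perfect = information_gain([1, 1, 1, 0, 0, 0], labels)   # feature separates the classes
useless = information_gain([1, 0, 1, 0], ["tox", "tox", "nox", "nox"])  # feature carries no signal
print(perfect, useless)
```

Features would then be sorted by decreasing gain, so a feature that separates tox from nox sequences well ranks above one that does not.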
Fig. 3 LICTOR accurately predicts the LC toxicity of sequences absent from the training set and is able to revert the pathological phenotype of a cardiotoxic LC.
a LICTOR predictions on an independent set of LCs, i.e. not present in the training set. Toxic LCs are from patients affected by AL with cardiac involvement, while non-toxic LCs are from patients with multiple myeloma (see also Supplementary Data 2). Predictions are divided according to the clinical phenotype. White parts of the bars represent correct LICTOR predictions, while grey parts of the bars represent incorrect predictions. b Sequence of the cardiotoxic LC (tox153) whose toxic phenotype was neutralized using LICTOR and the non-toxic features identified by feature selection. Tox153 is aligned with the corresponding germline (GL) sequence, and the third line shows the difference in somatic mutations (SMs) between the LC and the GL sequence (LC/GL), with SMs displayed in bold and unmutated positions represented by an “X”. c Non-toxic features, ranked by information gain, used to revert the toxic phenotype of tox153. For each predictor variable, we also report the ranking within its feature family (a) and the feature selection general ranking (b). d Proteotoxic effect of the tox153 protein, of the two mutants designed in silico by adding non-toxic features (tox153V52L and tox153V52LA56G) and of the tox153 GL protein. The proteotoxic effects of the H18 cardiotoxic LC and of the GL proteins H6GL and H9GL were tested as well. Proteins in 10 mM PBS (100 μg/mL) were administered to C. elegans (100 μL/100 worms). Vehicle (10 mM PBS) and 1 mM H2O2 were administered as negative and positive controls, respectively. Pharyngeal activity was determined 24 h after treatment by counting the number of pharyngeal bulb contractions (pumps/min). Data are mean ± SE pumps/min (n = 30 worms/assay, two assays). ****p < 0.0001, one-way ANOVA, Dunn’s post hoc test.
e Pharyngeal bulb contraction values (pumps/min) for some of the LCs listed in Supplementary Data 2 (H6, H7, H9 and M2, M7, M8), which are from AL patients with cardiac involvement (cardiotoxic) and patients with multiple myeloma (non-toxic). These values were previously obtained under the same experimental conditions employed in this study[35,36]. Values of pharyngeal bulb contraction (pumps/min) for H18, H6GL and H9GL are reported as well (n = 30 worms/assay, two assays). Squares represent the mean values for H6, H7, H9 and H18, while dots represent the mean values for M2, M7, M8, H6GL and H9GL. Horizontal lines represent the mean of cardiotoxic and non-toxic LCs (****p < 0.0001, two-tailed unpaired t-test). The values of pumps/min obtained after the administration of tox153, tox153V52L and tox153V52LA56G are also plotted (triangles).