John-William Sidhom, Alexander S Baras.
Abstract
SARS-CoV-2 infection is characterized by a highly variable clinical course, ranging from asymptomatic infection to illness requiring critical care support. This variation has led physicians and scientists to study factors that may predispose certain individuals to more severe clinical presentations, in hopes of either identifying these individuals early in their illness or improving their medical management. We sought to understand immunogenomic differences that may result in varied clinical outcomes through analysis of T-cell receptor sequencing (TCR-Seq) data in the open access ImmuneCODE database. We identified two cohorts within the database that had clinical outcomes data reflecting severity of illness and used DeepTCR, a multiple-instance deep learning repertoire classifier, to predict which patients had severe SARS-CoV-2 infection from their repertoire sequencing. We demonstrate that patients with severe infection have repertoires with higher T-cell responses associated with SARS-CoV-2 epitopes and identify the epitopes that result in these responses. Our results provide evidence that the highly variable clinical course seen in SARS-CoV-2 infection is associated with certain antigen-specific responses.
Year: 2021 PMID: 34253751 PMCID: PMC8275616 DOI: 10.1038/s41598-021-93608-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Demographic data of the ImmuneCODE database.
| Characteristic | NIH/NIAID (N | ISB (N |
|---|---|---|
| Median age (IQR)—yr | 62 (54-75) | 61 (48-75) |
| Male sex—no. (%) | 140 (77) | 71 (46) |
| Female sex—no. (%) | 41 (23) | 83 (54) |
| Caucasian | 93 (61) | 180 (100) |
| Asian or Pacific Islander | 25 (16) | |
| Unknown racial group | 24 (16) | |
| Black or African American | 8 (5) | |
| Mixed racial group | 2 (1) | |
| Days from onset to sample (IQR)—days | 18 (13-25) | 13 (9-21) |
| Severe | 38 (22) | 38 (36) |
| Mild | 131 (78) | 68 (64) |
Demographic data collected for the ImmuneCODE database include biological sex, age, and racial group. Also shown are the time between symptom onset and sampling and the proportion of individuals with severe illness, documented as either ICU admission or a WHO ordinal scale score > 4 (corresponding to individuals requiring critical care).
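The severity definition in the table note can be written as a small labeling rule. The following is a minimal sketch only: the function name and the `icu_admitted` and `who_ordinal` arguments are hypothetical and do not correspond to actual ImmuneCODE column names, and returning `None` when neither field is recorded is an assumption about how missing data might be handled.

```python
from typing import Optional


def is_severe(icu_admitted: Optional[bool] = None,
              who_ordinal: Optional[int] = None) -> Optional[bool]:
    """Label a patient as severe (True), mild (False), or unknown (None).

    Severe illness is documented by either ICU admission or a WHO ordinal
    scale score > 4. Argument names are hypothetical placeholders.
    """
    if icu_admitted is True:
        return True
    if who_ordinal is not None and who_ordinal > 4:
        return True
    if icu_admitted is None and who_ordinal is None:
        # Neither field recorded: leave the label undefined (assumption).
        return None
    return False


# Example: one record documented by ICU status, one by WHO ordinal score.
labels = [is_severe(icu_admitted=True), is_severe(who_ordinal=3)]
```

Accepting both fields in one function reflects that a record may carry only one of the two severity indicators.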
Figure 1. Associations of TCR diversity and abundance metrics with disease severity. (a) TCR-Seq diversity and abundance metrics were collected and stratified by disease severity in both the COVID-19-ISB and COVID-19-NIH/NIAID cohorts (Mann–Whitney rank test: ***p val < 0.001, **p val < 0.01, *p val < 0.05, with Benjamini/Hochberg correction for multiple hypothesis testing at a level of 0.05). (b) Univariate logistic regression models were fit on all TCR-Seq sample-level measures, and performance was assessed via receiver operating characteristic (ROC) curves and the area under the curve (AUC), using 2-fold cross-validation with 100 iterations and averaging predictions across all iterations and folds. (c) Multivariate logistic regression models were fit on all TCR-Seq sample-level measures, and performance was assessed as described in (b).
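To make the multiple-testing step in panel (a) concrete, the sketch below applies the Benjamini/Hochberg step-up adjustment to a list of raw p-values and maps the adjusted values to the star notation used in the caption. It is a plain-Python illustration, not the authors' code: the function names are hypothetical, the input p-values are made up, and applying the stars to adjusted rather than raw p-values is an assumption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini/Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significance_stars(p):
    """Map a p-value to the star notation used in the figure caption."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Made-up raw p-values, one per TCR-Seq metric (illustrative only).
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
stars = [significance_stars(p) for p in adj]
```

A metric would then be reported as significant when its adjusted p-value falls below the chosen level (0.05 in the caption).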
Figure 2. Deep learning models identify TCR signature of severe disease. (a) DeepTCR’s multiple-instance repertoire classifier was used to fit predictive models of severe/mild illness in Monte Carlo cross-validation across both the NIH/NIAID and ISB cohorts of patients. Receiver operating characteristic (ROC) curves are shown with corresponding area under the curve (AUC) measurements. (b) DeepTCR’s repertoire classifier was also fit to identify TCR repertoire differences between samples taken from the NIH/NIAID and ISB cohorts. ROC curves are shown with corresponding AUC measurements. (c) Following model fitting, top predictive sequences for severe disease were extracted from the network, and residue sensitivity logos (RSLs) were created highlighting predictive residues. (d) Following model fitting, top predictive sequences for NIH/NIAID versus ISB cohorts were extracted from the network, and RSLs were created highlighting predictive residues. (e) All TCR sequences present in the samples were labeled as COVID(+) or COVID(−), based on empirically derived antigen-specificity data from the MIRA assay, and plotted by their corresponding prediction values for severe illness. A threshold of P = 0.90 was used to create contingency tables of TCR sequences called as carrying the severe disease signature and the COVID(+) signature, which were used to calculate enrichment scores (Fisher’s Exact Test: ***p val < 0.001). (f) COVID(+) sequences were further stratified by open reading frame (ORF) in the viral genome as well as by CD8/CD4 specific TCRs as provided by the MIRA assay. (g) SARS-CoV-2 specific TCR sequences, as determined from MIRA assay results in the ImmuneCODE database, were collected and mapped to their corresponding epitope sequences, which were used in their place to train the repertoire classifier. ROC curves are shown with corresponding AUC measurements.
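The enrichment analysis in panel (e) can be illustrated with a short sketch that thresholds per-sequence predictions at 0.90, builds a 2×2 contingency table against COVID(+)/COVID(−) labels, and computes a one-sided Fisher's exact p-value from the hypergeometric distribution. This is an illustration under stated assumptions, not the authors' implementation: the function names and toy inputs are hypothetical, the choice of a strict `>` at the threshold and of a one-sided test are assumptions, and the odds ratio is used here only as one possible enrichment score, since the caption does not define the score.

```python
from math import comb


def contingency_table(pred_probs, is_covid_specific, threshold=0.90):
    """Count sequences by (severe-call, COVID(+)) status.

    Returns (a, b, c, d):
      a = severe-call and COVID(+),   b = severe-call and COVID(-),
      c = no severe-call, COVID(+),   d = no severe-call, COVID(-).
    """
    a = b = c = d = 0
    for prob, covid in zip(pred_probs, is_covid_specific):
        severe_call = prob > threshold  # strict ">" is an assumption
        if severe_call and covid:
            a += 1
        elif severe_call:
            b += 1
        elif covid:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment (a larger than expected)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    upper = min(row1, col1)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / total


def odds_ratio(a, b, c, d):
    """Sample odds ratio; infinite when the denominator is zero."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


# Toy inputs (illustrative only): predicted P(severe) and MIRA-style labels.
probs = [0.95, 0.99, 0.50, 0.20, 0.91, 0.10]
covid = [True, True, True, False, False, False]
table = contingency_table(probs, covid)
p_value = fisher_exact_greater(*table)
```

With real data, the same table could be passed to a library routine for Fisher's exact test; the hand-rolled version is used here only to keep the example dependency-free.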