Maria Tió-Coma, Szymon M Kiełbasa, Susan J F van den Eeden, Hailiang Mei, Johan Chandra Roy, Jacco Wallinga, Marufa Khatun, Sontosh Soren, Abu Sufian Chowdhury, Khorshed Alam, Anouk van Hooij, Jan Hendrik Richardus, Annemieke Geluk.
Abstract
BACKGROUND: Leprosy, a chronic infectious disease caused by Mycobacterium leprae, is often diagnosed late or misdiagnosed, leading to irreversible disabilities. Blood transcriptomic biomarkers that prospectively predict who will progress to leprosy (progressors) would allow early diagnosis and better treatment outcomes, and would facilitate interventions aimed at stopping bacterial transmission. To identify potential risk signatures of leprosy, we collected whole blood from household contacts (HC, n=5,352) of leprosy patients, including individuals who were diagnosed with leprosy 4-61 months after sample collection.
Keywords: Biomarker; Diagnostics; Leprosy; Prediction; RNA-Seq; Transcriptomics
Year: 2021 PMID: 34090257 PMCID: PMC8182229 DOI: 10.1016/j.ebiom.2021.103379
Source DB: PubMed Journal: EBioMedicine ISSN: 2352-3964 Impact factor: 8.143
Fig. 1. Study design to identify a transcriptomic signature associated with leprosy risk. Samples used in the discovery set (RNA-Seq) are shown in blue and samples used in the validation set (reverse transcription quantitative PCR [RT-qPCR]) in green. Progressors are household contacts who developed leprosy 4-61 months (Fig. S1) after recruitment. t=1 is the timepoint before disease and t=2 is the timepoint of leprosy diagnosis. Excluded QC (quality check) RNA refers to samples that did not pass the RNA quality check for RNA-Seq (RNA integrity number [RIN] ≤ 6); these samples were also not used for RT-qPCR (validation set). Excluded QC RNA-Seq refers to samples whose RNA-Seq data did not meet the quality requirements for number and distribution of reads (Fig. S2). Excluded QC RT-qPCR refers to samples with outlier cycle threshold (Ct) values (>15) for the reference gene GAPDH (medians of the two assays: 9.6 and 7.3). Training and test subsets were used in Random Forest to predict leprosy development. *RT-qPCR data from 8 samples (4 progressors and 4 HC controls) of the discovery set (RNA-Seq) were included in the training subset of the RT-qPCR Random Forest to improve the training of the model.
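The exclusion rules in the Fig. 1 caption boil down to a per-sample filter. The sketch below is a minimal illustration, not the study's pipeline. The `Sample` record and its field names are hypothetical. The two thresholds (RIN ≤ 6 fails, GAPDH Ct > 15 is an outlier) come from the caption. It assumes, as the caption suggests, that the RIN check also gates RT-qPCR use, and it omits the RNA-Seq read-distribution check.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds as stated in the Fig. 1 caption
RIN_CUTOFF = 6.0        # samples with RIN <= 6 fail the RNA quality check
GAPDH_CT_CUTOFF = 15.0  # GAPDH Ct values > 15 are treated as outliers


@dataclass
class Sample:
    """Hypothetical per-sample QC record; field names are illustrative only."""
    sample_id: str
    rin: float                        # RNA integrity number
    gapdh_ct: Optional[float] = None  # reference-gene Ct from RT-qPCR, if run


def passes_rna_qc(s: Sample) -> bool:
    """RNA quality check: RIN must be strictly above 6."""
    return s.rin > RIN_CUTOFF


def passes_rtqpcr_qc(s: Sample) -> bool:
    """RT-qPCR check: a GAPDH Ct must exist and must not exceed 15."""
    return s.gapdh_ct is not None and s.gapdh_ct <= GAPDH_CT_CUTOFF


def rtqpcr_usable(samples: List[Sample]) -> List[Sample]:
    """Samples that pass both the RNA and the RT-qPCR checks."""
    return [s for s in samples if passes_rna_qc(s) and passes_rtqpcr_qc(s)]


samples = [Sample("a", 7.5, 9.8), Sample("b", 6.0, 9.1), Sample("c", 8.2, 16.4)]
print([s.sample_id for s in rtqpcr_usable(samples)])  # ['a']
```

Sample "b" fails on RIN (exactly 6 is excluded) and "c" fails on its GAPDH Ct, so only "a" remains.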
Cohort characterization.
| Set | Group | Subjects | Sex | Age range (n) | RJ Classification | BI | Time to diagnosis (n) |
|---|---|---|---|---|---|---|---|
| RNA-Seq (discovery) | Progressors | 40 | 26 females | 6-15 years (7) | 37 BT | 34 BI-0 | 4-12 months (6) |
| RNA-Seq (discovery) | HC | 40 | 27 females | 6-15 years (7) | - | - | - |
| RT-qPCR (validation) | Progressors | 43 | 23 females | 6-15 years (12) | 40 BT | 35 BI-0 | 4-12 months (7) |
| RT-qPCR (validation) | HC | 43 | 23 females | 6-15 years (12) | - | - | - |
Group (leprosy progressors or household contact [HC] controls), number of individuals used for analyses, number of females and males, number of individuals in a given age range (at t=1), number of leprosy progressors per Ridley-Jopling (RJ) classification (5), bacteriological index (BI) of progressors, and time to diagnosis for progressors (time between the sample taken before clinical diagnosis (t=1) and leprosy diagnosis (t=2)) are shown for the samples used in the RNA-Seq (discovery set) and RT-qPCR (validation set) analyses. RT-qPCR: reverse transcription quantitative PCR; HC: household contacts; BT: borderline tuberculoid leprosy; TT: tuberculoid leprosy; I: indeterminate leprosy; PN: pure neural leprosy; BI-und: bacteriological index undetermined because the patient refused or was too young for a skin slit smear, and paucibacillary (PB) leprosy was diagnosed according to the number of lesions.
Fig. 2. RNA-Seq differential gene expression analysis of leprosy progressors before clinical diagnosis and household contacts. RNA-Seq data of whole blood from leprosy progressors (n=39) collected 4-61 months before clinical diagnosis of leprosy (t=1, first timepoint) were compared with those of control household contacts (HC/HHC, n=39), after exclusion of one sample per group due to a low number of on-feature unique reads (Fig. S2). A two-group (unpaired samples) analysis was performed using edgeR (71) in R. a) Boxplots of Trimmed Mean of M-values (TMM)-normalized counts per million mapped reads (CPM) per group for the most significantly differentially expressed genes. The y-axis shows CPM on a log10 (left) or log2 (right) scale. Progressors at t=1 are shown in red and HC controls in blue. b) Histogram of p-values: number of genes (y-axis) with a given p-value (x-axis). c) MA plot showing the log2 fold change (FC) in gene expression (y-axis) against the log2 average CPM (x-axis) per gene. Genes significantly differentially expressed (adjusted p-value < 0.05) are shown in red and genes not differentially expressed in black. C6orf48 is also known as SNHG32 and C19orf60 as REX1BD.
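The axes of the MA plot in panel c can be illustrated with a simplified calculation: scale counts to CPM, then compute M (log2 ratio of group means) and A (log2 overall mean) per gene. This is a minimal sketch only. It uses plain library-size CPM with the TMM factor fixed at 1. It also uses a small illustrative prior to avoid log2(0). edgeR estimates log fold changes from a fitted model with its own prior counts, so its values will differ. The function names and toy counts are hypothetical.

```python
import math
from typing import List, Tuple


def cpm(counts: List[int], norm_factor: float = 1.0) -> List[float]:
    """Counts per million for one sample. norm_factor stands in for edgeR's
    TMM factor; it defaults to 1.0 (plain library-size scaling)."""
    effective_lib_size = sum(counts) * norm_factor
    return [c / effective_lib_size * 1e6 for c in counts]


def ma_points(group_a: List[List[int]], group_b: List[List[int]],
              prior: float = 0.5) -> List[Tuple[float, float]]:
    """Per-gene (M, A) for an MA plot.
    M = log2(mean CPM in group_a / mean CPM in group_b)
    A = log2(mean CPM over all samples)
    `prior` (in CPM units, an illustrative choice) avoids log2(0)."""
    cpm_a = [cpm(s) for s in group_a]
    cpm_b = [cpm(s) for s in group_b]
    everyone = cpm_a + cpm_b
    points = []
    for g in range(len(group_a[0])):
        mean_a = sum(s[g] for s in cpm_a) / len(cpm_a)
        mean_b = sum(s[g] for s in cpm_b) / len(cpm_b)
        mean_all = sum(s[g] for s in everyone) / len(everyone)
        m = math.log2((mean_a + prior) / (mean_b + prior))
        a = math.log2(mean_all + prior)
        points.append((m, a))
    return points


# Two genes; progressors vs household contacts (toy counts)
progressors = [[10, 90], [20, 80]]
contacts = [[5, 95], [5, 95]]
for m, a in ma_points(progressors, contacts):
    print(f"M={m:.2f}  A={a:.2f}")
```

In this toy example the first gene has about three times more CPM in progressors than in contacts, so its M is close to log2(3). The second gene is slightly lower in progressors and gets a negative M.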
Functional analysis of differentially expressed genes in blood of leprosy progressors.
| GO terms | adj p-value | % associated genes | GO terms | adj p-value | % associated genes |
|---|---|---|---|---|---|
| SRP-dependent cotranslational protein targeting to membrane | 1.50E-33 | 41.51 | organelle organization | 1.33E-20 | 5.92 |
| cotranslational protein targeting to membrane | 1.05E-32 | 40.00 | cellular component organization | 1.50E-17 | 5.02 |
| protein targeting to ER | 5.23E-32 | 37.50 | regulation of cellular component organization | 7.92E-15 | 6.31 |
| establishment of protein localization to endoplasmic reticulum | 2.87E-31 | 36.29 | regulation of organelle organization | 5.61E-14 | 7.49 |
| protein localization to endoplasmic reticulum | 2.22E-30 | 31.79 | positive regulation of organelle organization | 6.43E-13 | 9.28 |
| Canonical pathway | adj p-value | % associated genes | Canonical pathway | adj p-value | % associated genes |
| eIF2 signalling | 4.29E-28 | 21.90 | clathrin-mediated endocytosis signalling | 3.57E-08 | 11.40 |
| mTOR signalling | 3.04E-13 | 14.80 | 14-3-3-mediated signalling | 6.72E-07 | 12.60 |
| regulation of eIF4 and p70S6K signalling | 1.70E-12 | 16.60 | integrin signalling | 8.44E-07 | 9.90 |
| coronavirus pathogenesis pathway | 1.63E-10 | 15.30 | FAK signalling | 2.92E-06 | 13.70 |
| oxidative phosphorylation | 1.04E-06 | 13.80 | p70S6K signalling | 4.13E-06 | 11.60 |
Top Gene Ontology (GO) terms identified by ClueGO (70) and canonical pathways identified by Ingenuity Pathway Analysis (Qiagen) from 836 upregulated and 777 downregulated genes in leprosy progressors before clinical diagnosis, compared with household contacts who did not develop leprosy. P-values were adjusted for multiple testing with the Bonferroni correction (adj p-value). Percentages of associated upregulated or downregulated genes from each term or pathway are shown.
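The two quantities in the table can be sketched with their textbook formulas. The Bonferroni adjustment multiplies each p-value by the number of tests m and caps the result at 1. The percentage of associated genes is read here as the share of a term's genes that appear in the input gene list. This is an interpretation, not something the caption states. ClueGO and Ingenuity Pathway Analysis apply these steps internally with their own settings, including how m is counted, so this sketch will not reproduce the table exactly. The function names and example numbers are hypothetical.

```python
from typing import List


def bonferroni(p_values: List[float]) -> List[float]:
    """Bonferroni correction: multiply each p-value by the number of tests
    performed (m) and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def percent_associated(n_de_in_term: int, n_genes_in_term: int) -> float:
    """Share of a term's or pathway's genes that are in the DE list, in %."""
    return 100.0 * n_de_in_term / n_genes_in_term


print(bonferroni([0.01, 0.2, 0.0004]))
print(percent_associated(10, 40))  # 25.0
```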
Gene selection using a machine learning approach.
| Gene name | Ensembl ID | Type of RNA |
|---|---|---|
| | ENSG00000204387 | ncRNA, small nuclear RNA |
| | ENSG00000198886 | protein coding |
| | ENSG00000198786 | protein coding |
| | ENSG00000198763 | protein coding |
| | ENSG00000283633 | lncRNA |
| | ENSG00000198804 | protein coding |
| | ENSG00000135090 | protein coding |
| | ENSG00000135597 | protein coding |
| | ENSG00000198727 | protein coding |
| | ENSG00000141933 | protein coding |
| | ENSG00000138722 | protein coding |
| | ENSG00000150991 | protein coding |
| | ENSG00000248527 | pseudogene |
| | ENSG00000266538 | lncRNA |
| | ENSG00000006015 | protein coding |
| | ENSG00000175602 | protein coding |
| | ENSG00000225864 | pseudogene |
| | ENSG00000200183 | pseudogene |
| | ENSG00000279227 | lncRNA |
Genes identified by Random Forest to predict leprosy progression among household contacts of leprosy patients. Genes in bold were included in the final RNA-Seq signature and tested by reverse transcription quantitative PCR (RT-qPCR); underlined genes are present in the final RT-qPCR RISK4LEP signature.
Fig. 3. AUC of leprosy risk RNA-Seq and RT-qPCR signatures in blood. Area under the curve (AUC) of risk signatures in whole blood for prospectively predicting leprosy progressors among household contacts (HC). The models were built using Random Forest, trained on 80% of each sample set and evaluated on the remaining 20%. a) AUC of the RNA-Seq 19-gene signature, for which 8 to 20 features (genes) were automatically selected by the model from a total of 1,613 features. b) AUC of the RNA-Seq 13-gene signature, derived from the 19-gene signature by excluding pseudogenes and long non-coding RNAs (lncRNAs; n=6). c) AUC of the reverse transcription quantitative PCR (RT-qPCR) signature based on the 13 genes selected in the RNA-Seq signature. d) AUC of the RT-qPCR 4-gene signature RISK4LEP (final RT-qPCR signature), for which only genes significantly differentially expressed in the RT-qPCR were selected.
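The evaluation described in Fig. 3 has two steps: an 80/20 split of the samples, then the AUC of the model's scores on the held-out 20%. The sketch below shows both under stated assumptions. The split is stratified by class, which is an assumption because the caption only gives the 80/20 proportion. The AUC is computed from its rank definition: the probability that a randomly chosen progressor scores higher than a randomly chosen household contact, with ties counted as one half. This equals the Mann-Whitney U statistic divided by n_pos × n_neg. The Random Forest itself is not reproduced here. The scores would be its predicted probabilities of progression. The function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices into train/test, keeping the class ratio.
    Stratification is an assumption; the caption only states 80/20."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC = probability that a random progressor scores higher than a random
    household contact (ties count 0.5); equals Mann-Whitney U / (n_pos * n_neg)."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one sample in each class")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


labels = [1] * 40 + [0] * 40          # 1 = progressor, 0 = HC
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 64 16
print(roc_auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))  # 8/9 ≈ 0.889
```

The pairwise loop is O(n_pos × n_neg). That is negligible for test sets of a few dozen samples. For larger sets, a rank-based formula is the usual choice.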
RT-qPCR ΔCts of the 13-gene signature in leprosy progressors and household contacts.
| Gene | p-value | ΔCt progressors | ΔCt HC | ΔΔCt | FC | Log2FC |
|---|---|---|---|---|---|---|
| | 0.840901 | 12.32 | 12.48 | -0.16 | 1.12 | 0.16 |
| | 0.654911 | 7.73 | 7.88 | -0.15 | 1.11 | 0.15 |
| | 0.594361 | -2.22 | -2.31 | 0.09 | 0.94 | -0.09 |
| | 0.054951 | -1.50 | -1.90 | 0.40 | 0.76 | -0.40 |
| | 0.048303 | -1.03 | -1.26 | 0.23 | 0.85 | -0.23 |
| | 0.062337 | -1.64 | -1.90 | 0.26 | 0.84 | -0.26 |
| | 0.159386 | -0.94 | -1.17 | 0.24 | 0.85 | -0.24 |
| | 0.298712 | 4.24 | 4.40 | -0.16 | 1.12 | 0.16 |
| | 0.010086 | 2.92 | 3.18 | -0.27 | 1.20 | 0.27 |
| | 0.238032 | 1.77 | 2.03 | -0.26 | 1.20 | 0.26 |
| | 0.178822 | 4.26 | 4.30 | -0.04 | 1.03 | 0.04 |
| | 0.000448 | 5.20 | 5.62 | -0.42 | 1.34 | 0.42 |
| | 0.005958 | -1.07 | -0.89 | -0.18 | 1.13 | 0.18 |
P-values of the Mann-Whitney U test comparing reverse transcription quantitative PCR (RT-qPCR) ΔCts (cycle threshold (Ct) of target gene – Ct of reference gene) between leprosy progressors (n=47) and household contact (HC) controls (n=47). Genes in bold are significantly differentially expressed (p-value < 0.05). Also shown are the median ΔCt per group, ΔΔCt (median ΔCt progressors – median ΔCt HC), the fold change for progressors (FC = 2^−ΔΔCt) and the log2 of the fold change (log2FC).
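The summary columns of the table follow directly from the formulas in the caption: ΔCt per sample, ΔΔCt as the difference of group medians, FC = 2^−ΔΔCt, and log2FC = −ΔΔCt. The sketch below illustrates them. Only group medians are reported, so the usage line passes the medians of one table row as single-value lists to check the arithmetic against that row. The Mann-Whitney U p-values are not recomputed here. The function names are hypothetical.

```python
import math
from statistics import median
from typing import Dict, Sequence


def delta_ct(ct_target: float, ct_reference: float) -> float:
    """ΔCt of one sample: Ct of the target gene minus Ct of the reference gene (GAPDH)."""
    return ct_target - ct_reference


def summarise_gene(dct_progressors: Sequence[float],
                   dct_controls: Sequence[float]) -> Dict[str, float]:
    """Per-gene summary in the layout of the ΔCt table."""
    med_p = median(dct_progressors)
    med_c = median(dct_controls)
    ddct = med_p - med_c          # ΔΔCt = median ΔCt progressors - median ΔCt HC
    fc = 2 ** (-ddct)             # FC = 2^(-ΔΔCt)
    return {"median_dct_progressors": med_p, "median_dct_hc": med_c,
            "ddct": ddct, "fc": fc, "log2fc": math.log2(fc)}  # log2FC = -ΔΔCt


# Plugging in the medians of the row with p = 0.000448 (5.20 vs 5.62):
row = summarise_gene([5.20], [5.62])
print(round(row["ddct"], 2), round(row["fc"], 2), round(row["log2fc"], 2))  # -0.42 1.34 0.42
```

A lower ΔCt means the target is detected at an earlier cycle relative to GAPDH, that is, higher expression. This is why a negative ΔΔCt corresponds to FC > 1. It is also why Fig. 4 plots −ΔCt, so that higher values read as higher expression.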
Fig. 4. Boxplot showing -ΔCts of 13 genes. Boxplot of -ΔCts (-(cycle threshold (Ct) of target gene – Ct of reference gene, GAPDH)) obtained by reverse transcription quantitative PCR (RT-qPCR) in whole blood. Genes identified in the RNA-Seq signature (n=13) are shown. Leprosy progressors before clinical diagnosis of leprosy are shown in red (t=1, n=47) and household contact (HC) controls in blue (n=47). *Genes significantly differentially expressed between the two groups by Mann-Whitney U test: MT-ND2, REX1BD, TPGS1 and UBC.