Mary M Petzke, Konstantin Volyanskyy, Yong Mao, Byron Arevalo, Raphael Zohn, Johanna Quituisaca, Gary P Wormser, Nevenka Dimitrova, Ira Schwartz.
Abstract
A bioinformatics approach was employed to identify transcriptome alterations in the peripheral blood mononuclear cells of well-characterized human subjects who were diagnosed with early disseminated Lyme disease (LD) based on stringent microbiological and clinical criteria. Transcriptomes were assessed at the time of presentation and also at approximately 1 month (early convalescence) and 6 months (late convalescence) after initiation of an appropriate antibiotic regimen. Comparative transcriptomics identified 335 transcripts, representing 233 unique genes, with significant alterations of at least 2-fold expression in acute- or convalescent-phase blood samples from LD subjects relative to healthy donors. Acute-phase blood samples from LD subjects had the largest number of differentially expressed transcripts (187 induced, 54 repressed). This transcriptional profile, which was dominated by interferon-regulated genes, was sustained during early convalescence. Six months after antibiotic treatment, the transcriptome of LD subjects was indistinguishable from that of healthy controls based on two separate methods of analysis. Return of the LD expression profile to levels found in control subjects was concordant with disease outcome; 82% of subjects with LD experienced at least one symptom at the baseline visit compared to 43% at the early convalescence time point and only a single patient (9%) at the 6-month convalescence time point. Using the random forest machine learning algorithm, we developed an efficient computational framework to identify sets of 20 classifier genes that discriminated LD from other bacterial and viral infections.
These novel LD biomarkers not only differentiated subjects with acute disseminated LD from healthy controls with 96% accuracy but also distinguished between subjects with acute and resolved (late convalescent) disease with 97% accuracy.
IMPORTANCE Lyme disease (LD), caused by Borrelia burgdorferi, is the most common tick-borne infectious disease in the United States. We examined gene expression patterns in the blood of individuals with early disseminated LD at the time of diagnosis (acute) and also at approximately 1 month and 6 months following antibiotic treatment. A distinct acute LD profile was observed that was sustained during early convalescence (1 month) but returned to control levels 6 months after treatment. Using a computer learning algorithm, we identified sets of 20 classifier genes that discriminate LD from other bacterial and viral infections. In addition, these novel LD biomarkers are highly accurate in distinguishing patients with acute LD from healthy subjects and in discriminating between individuals with active and resolved infection. This computational approach offers the potential for more accurate diagnosis of early disseminated Lyme disease. It may also allow improved monitoring of treatment efficacy and disease resolution.
Keywords: Borrelia burgdorferi; Lyme disease; diagnostics; random forest; transcriptome
Year: 2020 PMID: 32184234 PMCID: PMC7078463 DOI: 10.1128/mBio.00047-20
Source DB: PubMed Journal: mBio Impact factor: 7.867
Clinical characteristics of human subjects
| Parameter | Lyme disease subjects | Healthy donors |
|---|---|---|
| Total no. of subjects | 39 | 21 |
| Gender, no. (%) | ||
| Male | 22 (56) | 9 (43) |
| Female | 17 (44) | 12 (57) |
| Age, no. (%) | ||
| <60 yr | 28 (68) | 16 (76) |
| ≥60 yr | 11 (28) | 3 (14) |
| EM rash | ||
| Median size, cm² (range) | 104 (11–1,440) | |
| Median duration, days (range) | 5 (1–60) | |
| MEM, no. (%) | 26 (67) | |
| No. (%) seroreactive | ||
| Initial visit | 28/38 (74) | 0/21 (0) |
| One-month return visit | 33/35 (94) | |
| Six-month return visit | 6/11 (55) | |
| Skin culture for B. burgdorferi | ||
| No. (%) positive | 22 (56) | |
| No. (%) negative | 9 (23) | |
| No. (%) contaminated | 1 (3) | |
| No. (%) not done | 6 (15) | 21 (100) |
| Blood culture for B. burgdorferi | ||
| No. (%) positive | 29 (74) | |
| No. (%) negative | 7 (18) | |
| No. (%) not done | 3 (8) | 21 (100) |
| Disseminated infection | ||
| No. (%) with MEM and/or positive blood culture | 38/39 (95) | |
That is, the number of subjects seroreactive/number of subjects examined. Whole-cell sonicate ELISA was used for Lyme disease subjects, and IgG immunoblotting was used for healthy donors.
Includes four equivocal results.
The remaining patient had facial palsy from Lyme disease.
FIG 1 Principal-component analysis distinguishes subjects by disease state. Principal-component analysis of Lyme disease patients at three time points and healthy controls based on 335 differentially expressed transcripts (DETs).
FIG 2 Venn diagram depicting common and unique patterns of differential gene expression among Lyme disease patients during acute LD and at 1 month or 6 months after the initiation of an appropriate antibiotic regimen. Venn diagrams were generated using a total of 335 DETs that had a fold change of at least 2, with a P value of <0.05, relative to healthy controls. DETs for acute, 1-, and 6-month samples are represented by colored ellipses. The sizes of the ellipses are adjusted for the number of DETs in each group.
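The selection rule stated in the caption (a fold change of at least 2 with a P value of <0.05 relative to healthy controls) and the partition shown by the Venn diagram can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The function names, transcript IDs, and values are hypothetical. It also assumes the signed fold-change convention suggested by the negative table entries (ratios below 1 reported as negative reciprocals), which the source does not state explicitly.

```python
def signed_fold_change(ratio):
    """Convert an LD/control expression ratio to a signed fold change.

    Assumed convention: ratios >= 1 are kept as-is; ratios < 1 are reported
    as the negative reciprocal (for example, 0.25 becomes -4.0).
    """
    if ratio <= 0:
        raise ValueError("expression ratio must be positive")
    return ratio if ratio >= 1 else -1.0 / ratio


def is_det(fold_change, p_value, min_fold=2.0, alpha=0.05):
    """Apply the caption's thresholds: |fold change| >= 2 and P < 0.05."""
    return abs(fold_change) >= min_fold and p_value < alpha


def venn_regions(sets):
    """Group items by the exact combination of named sets that contain them."""
    regions = {}
    for item in set().union(*sets.values()):
        key = tuple(sorted(name for name, members in sets.items() if item in members))
        regions.setdefault(key, set()).add(item)
    return regions


# Hypothetical (ratio, P value) pairs per time point; not data from the study.
measurements = {
    "acute": {"t1": (3.9, 0.001), "t2": (0.25, 0.01), "t3": (2.4, 0.20)},
    "1 mo": {"t1": (2.6, 0.004), "t2": (0.30, 0.03), "t3": (1.1, 0.60)},
    "6 mo": {"t1": (1.0, 0.90), "t2": (1.5, 0.40), "t4": (3.2, 0.02)},
}

# Differentially expressed transcripts (DETs) at each time point.
dets = {
    time_point: {
        transcript
        for transcript, (ratio, p_value) in values.items()
        if is_det(signed_fold_change(ratio), p_value)
    }
    for time_point, values in measurements.items()
}

# Each key names the time points that share the transcripts in that region.
regions = venn_regions(dets)
```

Each key of `regions` corresponds to one area of the diagram, so the size of each value gives the count that would be printed in that area.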
Top 40 genes with greatest fold changes in LD subjects relative to healthy donors
| Gene symbol(s) | Gene title(s) | Entrez gene(s) | Fold change, acute | Fold change, 1 mo | Fold change, 6 mo |
|---|---|---|---|---|---|
| DEFA1/DEF1B/DEF3A | Defensin, alpha 1/defensin, alpha | 1667/1668/728358 | 5.21 | 3.73 | 3.24 |
| LCN2 | Lipocalin 2 | 3934 | 3.95 | 2.59 | 1.00 |
| FCGR3B | Fc fragment of IgG, low-affinity IIIb, | 2215 | 3.86 | 2.67 | −1.07 |
| MYL9 | Myosin, light chain 9, regulatory | 10398 | 3.42 | 2.35 | −1.74 |
| FCGR1A | Fc fragment of IgG, high-affinity Ia, | 2209 | 3.34 | 1.38 | 1.25 |
| CLU | Clusterin | 1191 | 3.12 | 2.06 | −1.76 |
| RRM2 | Ribonucleotide reductase M2 | 6241 | 3.06 | 1.29 | −1.27 |
| GMPR | Guanosine monophosphate reductase | 2766 | 2.88 | 2.03 | −1.15 |
| IGHM | Immunoglobulin heavy constant mu | 3507 | 2.84 | 2.05 | −1.71 |
| PF4 | Platelet factor 4 | 5196 | 2.83 | 2.56 | −1.27 |
| SPARC | Secreted protein, acidic, cysteine-rich | 6678 | 2.77 | 2.09 | −1.45 |
| PPBP | Pro-platelet basic protein (chemokine | 5473 | 2.82 | 2.48 | −1.22 |
| C21orf7 | Chromosome 21 open reading frame 7 | 56911 | 2.70 | 2.41 | −1.27 |
| TNFSF10 | Tumor necrosis factor (ligand) | 8743 | 2.77 | 1.87 | 1.43 |
| HSPA6/HSPA7 | Heat shock 70-kDa protein 6/heat shock | 3310/3311 | 2.76 | 2.12 | 1.25 |
| C6orf25 | Chromosome 6 open reading frame 25 | 80739 | 2.75 | 2.06 | −1.17 |
| HIST1H2BK | Histone cluster 1, H2bk | 85236 | 2.74 | 2.11 | −1.67 |
| MYL9 | Myosin, light chain 9, regulatory | 10398 | 2.72 | 1.86 | −1.45 |
| CXCR2/CXCR2P1 | Chemokine (C-X-C motif) receptor | 3579/3580 | 2.72 | 1.87 | −1.12 |
| FCGR1B | Fc fragment of IgG, high-affinity 1b, | 2210 | 2.70 | 1.23 | −1.06 |
| SLC25A37 | Solute carrier family 25, member 37 | 51312 | 2.68 | 1.88 | −1.29 |
| GBP1 | Guanylate binding protein 1, interferon | 2633 | 2.68 | 1.80 | 1.56 |
| HP | Haptoglobin | 3240 | 2.68 | 1.32 | 1.07 |
| AIM2 | Absent in melanoma 2 | 9447 | 2.67 | 2.19 | 1.42 |
| CA2 | Carbonic anhydrase II | 760 | 2.63 | 2.41 | −1.18 |
| HIST1H2AG | Histone cluster 1, H2ag | 8969 | 2.62 | 2.16 | 1.37 |
| PTGS1 | Prostaglandin-endoperoxide synthase 1 | 5742 | 2.61 | 2.15 | −1.01 |
| THBS1 | Thrombospondin 1 | 7057 | −4.30 | −5.64 | 1.49 |
| IL8 | Interleukin 8 | 3576 | −3.44 | −3.88 | 1.79 |
| EGR1 | Early growth response 1 | 1958 | −3.40 | −2.87 | 1.14 |
| G0S2 | G0/G1 switch 2 | 50486 | −3.10 | −3.77 | 1.02 |
| PPP1CB | Protein phosphatase 1, catalytic subunit, | 5500 | −3.02 | −2.71 | −1.04 |
| NR4A2 | Nuclear receptor subfamily 4, group A, | 4926 | −2.80 | −2.85 | 1.16 |
| HBEGF | Heparin-binding EGF-like growth factor | 1839 | −2.96 | −3.46 | 1.15 |
| RGS1 | Regulator of G-protein signaling 1 | 5996 | −2.94 | −2.70 | 1.12 |
| EPPK1 | Epiplakin 1 | 83481 | −2.94 | −2.44 | −1.22 |
| TNFAIP3 | Tumor necrosis factor, alpha-induced | 7128 | −2.79 | −2.51 | 1.08 |
| NAMPT | Nicotinamide phosphoribosyltransferase | 10135 | −2.75 | −3.98 | 1.84 |
| CD69 | CD69 molecule | 969 | −2.67 | −2.50 | −1.04 |
| CD83 | CD83 molecule | 9308 | −2.67 | −2.50 | 1.29 |
FIG 3 Profile plots of temporal gene expression changes in Lyme disease patients and controls. Profile plots were generated using the normalized intensities of the 335 DETs. Lines representing transcripts are colored based on the normalized expression of each transcript (blue, low; red, high) relative to the mean expression value of all transcripts in acute LD subjects.
FIG 4 Hierarchical clustering distinguishes between disease states. Heat map with the dendrogram resulting from unsupervised hierarchical clustering performed using 335 transcripts (representing 233 genes) that were differentially expressed (at least a 2-fold change, with a P value of <0.05) relative to healthy controls. The values shown are normalized intensities relative to the mean. Red or blue indicates high or low expression, respectively, of the normalized intensities relative to the mean. The heat map displays five distinct clusters, three containing induced genes and two containing repressed genes. Boldfacing indicates genes that were later identified as classifiers for disease states (Tables 4 and 5). A list of the top 40 genes with greatest changes in LD subjects is presented in Table 2, and all dysregulated genes are provided in Table S1 in the supplemental material.
Top 20 classifier genes that discriminate subjects with acute LD from healthy controls
| Gene symbol | Gene title | RFIL (%) |
|---|---|---|
| PSMB8 | Proteasome subunit β8 | 9.14 |
| SLAMF7 | SLAM family member 7 | 7.58 |
| RAB24 | RAB24, member RAS oncogene family | 7.11 |
| FCGR1B | Fc fragment of IgG, high affinity 1b, receptor (CD64) | 6.52 |
| MPP1 | Membrane protein, palmitoylated 1, 55 kDa | 5.86 |
| CSF2RB | Colony stimulating factor 2 receptor, beta, low affinity | 5.55 |
| TNFSF10 | Tumor necrosis factor (ligand) superfamily, member 10 | 4.75 |
| BTG1 | B-cell translocation gene 1, antiproliferative | 4.72 |
| GPR183 | G protein-coupled receptor 183 | 4.54 |
| ATG16L2 | Autophagy-related 16-like 2 | 4.50 |
| ACOT7 | Acyl-CoA thioesterase 7 | 4.37 |
| TCIRG1 | T-cell, immune regulator 1, ATPase, H+ transporting V0 | 4.25 |
| CHKB_CPT1B | CHKB-CPT1B readthrough (NMD candidate) | 4.20 |
| DYNLL1 | Dynein light chain LC8-type 1 | 4.13 |
| LCN2 | Lipocalin 2 | 4.05 |
| HSPA6_HSP70B′ | Heat shock protein family A (Hsp70) member 6 | 4.02 |
| FCGR1A | Fc fragment of IgG, high-affinity 1a, receptor (CD64) | 3.85 |
| RCAN3 | RCAN family member 3 (calcipressin 3) | 3.74 |
| HK3 | Hexokinase 3 | 3.65 |
| AP1G2 | Adaptor-related protein complex 1 γ2 subunit | 3.48 |
| Total | | 100 |
RFIL, random forest importance level.
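The RFIL column sums to 100 across the 20 listed genes. One way such percentages could be produced is to keep the top-ranked genes and rescale their raw importance scores, as sketched below. The source does not describe how RFIL was computed, so this rescaling is an assumption, and the function name, gene labels, and scores are hypothetical.

```python
def importance_levels(raw_importance, top_k=20):
    """Return the top_k genes with importances rescaled to sum to 100.

    Assumption: the reported level is a gene's raw importance divided by the
    summed importance of the retained genes, expressed as a percentage.
    """
    ranked = sorted(raw_importance.items(), key=lambda kv: kv[1], reverse=True)
    ranked = ranked[:top_k]
    total = sum(score for _, score in ranked)
    if total <= 0:
        raise ValueError("summed importance must be positive")
    return [(gene, 100.0 * score / total) for gene, score in ranked]


# Hypothetical raw importance scores; not values from the study.
raw = {"geneA": 0.06, "geneB": 0.03, "geneC": 0.01, "geneD": 0.002}
levels = importance_levels(raw, top_k=3)

for gene, level in levels:
    print(f"{gene}\t{level:.2f}")
```

Because the rescaling uses only the retained genes, the percentages depend on `top_k`; a different cutoff would change every value in the column.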
Top 20 classifier genes that distinguish between acute and 6-month convalescent LD subjects
| Gene symbol | Gene name | RFIL (%) |
|---|---|---|
| TAF10 | TATA-box binding protein associated factor 10 | 9.96 |
| CTSA | Cathepsin A | 9.26 |
| EXOC3L2 | Exocyst complex component 3-like 2 | 6.77 |
| RRM2 | Ribonucleotide reductase regulatory subunit M2 | 5.99 |
| PSMA7 | Proteasome subunit alpha 7 | 5.91 |
| KCNQ1OT1 | KCNQ1 opposite strand/antisense transcript 1 (nonprotein | 5.55 |
| CKMT1B | Creatine kinase, mitochondrial 1B | 5.34 |
| ANKRD13A | Ankyrin repeat domain 13A | 4.86 |
| UBA7 | Ubiquitin-like modifier activating enzyme 7 | 4.71 |
| CDK2AP1 | Cyclin-dependent kinase 2 associated protein 1 | 4.53 |
| TYMS | Thymidylate synthetase | 4.51 |
| FSIP1 | Fibrous sheath interacting protein 1 | 3.92 |
| KIAA0754 | Microtubule-actin crosslinking factor 1 | 3.79 |
| HIST1H2BH | Histone cluster 1 H2B family member H | 3.76 |
| FCGR1B | Fc fragment of IgG, high-affinity Ib, receptor (CD64) | 3.73 |
| WAS | Wiskott-Aldrich syndrome gene | 3.71 |
| CPNE5 | Copine 5 | 3.48 |
| C21orf7 | Chromosome 21 open reading frame 7 | 3.46 |
| GMPR | Guanosine monophosphate reductase | 3.38 |
| PSMD13 | Proteasome 26S subunit, non-ATPase 13 | 3.36 |
| Total | | 100 |
RFIL, random forest importance level.
Reported symptoms of LD subjects before and after antibiotic therapy
| Symptom | Acute LD, no./total no. (%) | Convalescent LD at 1 mo, no./total no. (%) | Convalescent LD at 6 mo, no./total no. (%) |
|---|---|---|---|
| Arthralgia | 16/38 (42) | 4/35 (11) | 1/11 (9) |
| Dizziness | 7/38 (18) | 1/35 (3) | 0/11 (0) |
| Fatigue | 26/38 (68) | 8/35 (23) | 0/11 (0) |
| Headache | 18/38 (47) | 1/35 (3) | 0/11 (0) |
| Myalgia | 15/38 (40) | 4/35 (11) | 0/11 (0) |
| Stiff neck | 13/38 (34) | 4/35 (11) | 0/11 (0) |
| Any symptom present | 31/38 (82) | 15/35 (43) | 1/11 (9) |
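The "Any symptom present" row can be recomputed from its counts as a quick check on the 82%, 43%, and 9% figures quoted in the abstract. The sketch below assumes percentages are rounded to the nearest whole number; the helper names are hypothetical.

```python
def percent(count, total):
    """Whole-number percentage, rounding halves up (assumed rounding rule)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return int(count * 100 / total + 0.5)


def format_cell(count, total):
    """Render a table cell in the 'no./total no. (%)' layout."""
    return f"{count}/{total} ({percent(count, total)})"


# Counts taken from the 'Any symptom present' row of the table.
any_symptom = {"Acute LD": (31, 38), "1 mo": (15, 35), "6 mo": (1, 11)}
cells = {visit: format_cell(*counts) for visit, counts in any_symptom.items()}
```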
FIG 5 Twenty-gene classifier sets identified by random forest analysis accurately distinguish between disease states. (A) Hierarchical clustering was performed with samples from acute LD subjects (orange) and healthy donors (green) based on normalized expression intensities of 20 genes having the highest random forest importance levels for these groups (shown on the right and in Table 4). (B) A second unique set of 20 genes (shown on the right and in Table 5) having the highest random forest importance levels when comparing acute LD subjects (orange) and 6-month convalescent LD subjects (green) was used for hierarchical clustering of samples from these groups.
FIG 6 Performance of 20-gene classifier sets identified by random forest analysis. Separate leave-one-out cross-validation experiments were performed using the distinct 20-gene classifier sets shown in Tables 4 and 5, respectively, for comparison of subjects with acute LD to (A) healthy controls and (B) 6-month convalescent LD subjects. The results are presented as confusion matrices with boldfacing indicating the samples that were correctly classified.
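Leave-one-out cross-validation and the resulting confusion matrix can be sketched in a few lines. To keep the example self-contained, a simple nearest-centroid classifier stands in for the random forest used in the study, so this illustrates only the validation scheme, not the authors' model. All function names and expression values are hypothetical.

```python
from collections import Counter


def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest to `sample`.

    Stand-in classifier (NOT a random forest), used only to show the
    cross-validation loop.
    """
    sums, counts = {}, Counter(train_y)
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value

    def squared_distance(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], sample))

    return min(sorted(sums), key=squared_distance)


def leave_one_out(samples, labels, predict=nearest_centroid_predict):
    """Hold out each sample once; count (true label, predicted label) pairs."""
    matrix = Counter()
    for i, (sample, truth) in enumerate(zip(samples, labels)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        matrix[(truth, predict(train_x, train_y, sample))] += 1
    return matrix


def accuracy(matrix):
    """Fraction of held-out samples on the diagonal of the confusion matrix."""
    correct = sum(n for (truth, predicted), n in matrix.items() if truth == predicted)
    return correct / sum(matrix.values())


# Hypothetical two-gene expression profiles; not data from the study.
samples = [[5.1, 1.0], [4.8, 1.2], [5.3, 0.9], [1.0, 4.9], [1.2, 5.2], [0.9, 5.0]]
labels = ["acute LD"] * 3 + ["healthy"] * 3

matrix = leave_one_out(samples, labels)
overall_accuracy = accuracy(matrix)
```

The diagonal entries of `matrix`, where the true and predicted labels agree, correspond to the boldfaced cells described in the caption.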
Twenty-gene classifier sets distinguish B. burgdorferi infection from acute infections caused by other bacterial and viral pathogens
| Influenza A virus | ||||
|---|---|---|---|---|
| PSMB8* | IFI27* | |||
| RNASE3 | SLAMF7* | SIGLEC1* | ||
| C21orf59 | DEFA4 | RAB24* | OTOF | |
| CHIT1 | FCGR1B* | RSAD2* | ||
| ADM* | MPP1* | CD1C | ||
| AZU1 | CSF2RB* | IFI44L* | ||
| TNFSF10* | RPS4Y1 | |||
| FOSB | BPI | RNASE2* | BTG1* | AKR7A2 |
| SCN3A | FCGBP | GPR183* | IFIT3* | |
| HBG1/HBG2/ | CCDC99 | ATG16L2* | CACNA2D3 | |
| SELENBP1 | CAMP | ACOT7* | LAMP3* | |
| AKR1C3 | DUSP3 | TCIRG1 | EPHB2 | |
| CHKB_CPT1B | MCM10 | |||
| ALAS2 | PGLYRP1 | DYNLL1 | ABHD8 | |
| LMAN2L | CD14 | LCN2 | KIF23 | |
| CEACAM6 | HSPA6_HSP70B′ | HLA-DQA1/LOC100507718/LOC100509457 | ||
| RRP1 | NPL | EPHA4 | FCGR1A* | MX2 |
| CCL27 | MARCO | COL9A3 | RCAN3* | BTF3P11 |
| HBD | CHI3L1 | HK3* | AKR1B10 | |
| ZNF639 | PLBD1 | AP1G2* | PLK1S1 |
Genes are listed in order of random forest analysis importance level (highest to lowest). *, interferon-regulated gene. Genes that appear on the classifier list for more than one infectious agent are designated in boldface.