Global Transcriptome Analysis Identifies a Diagnostic Signature for Early Disseminated Lyme Disease and Its Resolution.

Mary M Petzke1, Konstantin Volyanskyy2, Yong Mao2, Byron Arevalo3, Raphael Zohn3, Johanna Quituisaca3, Gary P Wormser4, Nevenka Dimitrova2, Ira Schwartz3.   

Abstract

A bioinformatics approach was employed to identify transcriptome alterations in the peripheral blood mononuclear cells of well-characterized human subjects who were diagnosed with early disseminated Lyme disease (LD) based on stringent microbiological and clinical criteria. Transcriptomes were assessed at the time of presentation and also at approximately 1 month (early convalescence) and 6 months (late convalescence) after initiation of an appropriate antibiotic regimen. Comparative transcriptomics identified 335 transcripts, representing 233 unique genes, with significant alterations of at least 2-fold expression in acute- or convalescent-phase blood samples from LD subjects relative to healthy donors. Acute-phase blood samples from LD subjects had the largest number of differentially expressed transcripts (187 induced, 54 repressed). This transcriptional profile, which was dominated by interferon-regulated genes, was sustained during early convalescence. 6 months after antibiotic treatment the transcriptome of LD subjects was indistinguishable from that of healthy controls based on two separate methods of analysis. Return of the LD expression profile to levels found in control subjects was concordant with disease outcome; 82% of subjects with LD experienced at least one symptom at the baseline visit compared to 43% at the early convalescence time point and only a single patient (9%) at the 6-month convalescence time point. Using the random forest machine learning algorithm, we developed an efficient computational framework to identify sets of 20 classifier genes that discriminated LD from other bacterial and viral infections. 
These novel LD biomarkers not only differentiated subjects with acute disseminated LD from healthy controls with 96% accuracy but also distinguished between subjects with acute and resolved (late convalescent) disease with 97% accuracy.

IMPORTANCE Lyme disease (LD), caused by Borrelia burgdorferi, is the most common tick-borne infectious disease in the United States. We examined gene expression patterns in the blood of individuals with early disseminated LD at the time of diagnosis (acute) and also at approximately 1 month and 6 months following antibiotic treatment. A distinct acute LD profile was observed that was sustained during early convalescence (1 month) but returned to control levels 6 months after treatment. Using a computer learning algorithm, we identified sets of 20 classifier genes that discriminate LD from other bacterial and viral infections. In addition, these novel LD biomarkers are highly accurate in distinguishing patients with acute LD from healthy subjects and in discriminating between individuals with active and resolved infection. This computational approach offers the potential for more accurate diagnosis of early disseminated Lyme disease. It may also allow improved monitoring of treatment efficacy and disease resolution.
Copyright © 2020 Petzke et al.

Keywords:  Borrelia burgdorferi; Lyme disease; diagnostics; random forest; transcriptome

Year:  2020        PMID: 32184234      PMCID: PMC7078463          DOI: 10.1128/mBio.00047-20

Source DB:  PubMed          Journal:  mBio            Impact factor:   7.867


INTRODUCTION

Lyme disease (LD), a multisystem inflammatory disorder caused by Borrelia burgdorferi, is the most common tick-borne infectious disease in the United States, with an average of >25,000 reported cases per year during the past decade and an estimated annual incidence possibly as high as 300,000 cases per year (1). Diagnosis of early infection is primarily based on recognition of the characteristic skin lesion, erythema migrans (EM) (2–4). Treatment with appropriate antibiotics at this stage of infection is generally effective at preventing the development of later clinical manifestations (5–7). If left untreated, however, extracutaneous clinical manifestations may develop that can include neurologic manifestations (e.g., facial palsy), arthritis, or carditis (8–10). Currently, detection of antibodies to B. burgdorferi is the mainstay of laboratory diagnosis of LD (11–13). However, there are several limitations of serologic testing, including lack of sensitivity in patients with EM and the inability of these tests to assess treatment response or to distinguish active from resolved infection (11, 14, 15). Transcriptional profiling of an infected host holds promise as an alternative to serologic testing for rapid and accurate diagnosis of recent infection. In studies unrelated to LD, both common transcriptional activation programs and pathogen-specific alterations in gene expression have been identified (16, 17), and several studies have demonstrated that this approach can discriminate between specific microbial infections, as well as predict disease outcome (18–22). Importantly, gene expression profiles have been used to differentiate between active and resolved infection (23–26). This technology offers the promise of overcoming certain limitations of LD serologic testing. Here, we report on transcriptional profiling of patients with early LD who had objective evidence of disseminated infection and were evaluated both before and after antibiotic therapy. 
The random forest machine learning algorithm was employed to identify classifier gene sets that discriminate LD from other microbial infections. These novel gene sets differentiated subjects with acute disseminated LD from healthy controls with 96% accuracy. Notably, subjects with acute infection were also discriminated from those with resolved (late convalescent) disease with 97% accuracy.

RESULTS

Characteristics of study subjects.

The study included blood samples from 39 subjects with disseminated LD and from 21 healthy controls (Table 1). Different numbers of samples were included in the three time points used for evaluation of the LD subjects due to the following: some subjects failed to return for both of the follow-up visits, the amount and/or quality of RNA obtained from some blood samples was insufficient for analysis, and 6-month blood samples were collected only during the final 2 years of the study. Subjects who presented with physician-diagnosed EM from late May through early October were enrolled in the study, and an EM skin biopsy was performed. Confirmation of disseminated LD consisted of multiple erythema migrans (MEM) and/or isolation of B. burgdorferi from blood. The only exception was a study subject who presented with facial palsy, a sign of disseminated infection, and who was seropositive by two-tier serologic testing. Serologic testing by a first-tier whole-cell sonicate enzyme-linked immunosorbent assay (ELISA) was conducted at each sample collection time. All EM subjects except one were either seropositive by ELISA at presentation or seroconverted during the course of the study. B. burgdorferi was cultivated from the blood of 29 subjects with EM (Table 1).
TABLE 1

Clinical characteristics of human subjects

| Parameter | Lyme disease subjects | Healthy donors |
|---|---|---|
| Total no. of subjects | 39 | 21 |
| Gender, no. (%) | | |
| Male | 22 (56) | 9 (43) |
| Female | 17 (44) | 12 (57) |
| Age, no. (%) | | |
| <60 yr | 28 (68) | 16 (76) |
| ≥60 yr | 11 (28) | 3 (14) |
| EM rash | | |
| Median size, cm² (range) | 104 (11–1,440) | |
| Median duration, days (range) | 5 (1–60) | |
| MEM, no. (%) | 26 (67) | |
| No. (%) seroreactive^a for B. burgdorferi | | |
| Initial visit | 28/38 (74) | 0/21 (0) |
| One-month return visit | 33/35 (94) | |
| Six-month return visit | 6/11 (55)^b | |
| Skin culture for B. burgdorferi | | |
| No. (%) positive | 22 (56) | |
| No. (%) negative | 9 (23) | |
| No. (%) contaminated | 1 (3) | |
| No. (%) not done | 6 (15) | 21 (100) |
| Blood culture for B. burgdorferi | | |
| No. (%) positive | 29 (74) | |
| No. (%) negative | 7 (18) | |
| No. (%) not done | 3 (8) | 21 (100) |
| Disseminated infection | | |
| No. (%) with MEM and/or positive blood culture | 38/39 (95)^c | |

^a That is, the number of subjects seroreactive/number of subjects examined. Whole-cell sonicate ELISA was used for Lyme disease subjects, and IgG immunoblotting was used for healthy donors.

^b Includes four equivocal results.

^c The remaining patient had facial palsy from Lyme disease.

B. burgdorferi infection elicits a distinct gene expression signature during acute disease and early convalescence that resolves by 6 months following treatment.

To characterize the host response to B. burgdorferi infection, we compared gene expression in PBMCs from subjects with acute disseminated LD (n = 28), early convalescent LD (1 month; n = 27), and late convalescent LD (6 months; n = 10) with PBMCs from healthy donors (n = 21) using whole-genome oligonucleotide arrays. Principal-component analysis was performed using all samples. Figure 1 shows that the first principal component (x axis) accounts for 37.7% of the variability in the data and, with few exceptions, clearly separates the healthy donor and late convalescent LD blood samples from the acute and early convalescent LD blood samples. No further separation of samples within each of these groups occurs when the second (y axis) or third (z axis) principal component is applied.
FIG 1

Principal-component analysis distinguishes subjects by disease state. Principal-component analysis of Lyme disease patients at three time points and healthy controls based on 335 differentially expressed transcripts (DETs).

Significant differentially expressed transcripts (DETs) were defined as those having a P value of <0.05 and at least a 2-fold change in expression at any time point relative to the healthy donor group. A total of 335 DETs, representing 233 unique genes, were identified (see Table S1 in the supplemental material). The greatest number of DETs (241 total; 187 induced, 54 repressed) was observed in the acute-phase blood samples of the LD subjects (Fig. 2). The 1-month convalescent-phase samples contained 142 DETs (84 induced, 58 repressed); most of these (92; 65%) were also differentially expressed during acute LD. Only 56 DETs (45 induced, 11 repressed) were identified in 6-month convalescent-phase samples; of these, an overwhelming majority (51; 91%) were unique to this group. A list of the DETs with the greatest change in expression (at least 2.5-fold) is provided in Table 2, along with the corresponding fold change values for each time point.
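
The DET criteria stated above (P value of <0.05 and at least a 2-fold change relative to healthy donors) can be illustrated with a minimal sketch. It assumes the signed fold-change convention used in Table 2, where negative values denote repression, and it evaluates a single time point only. The function names and all P values are hypothetical placeholders, not study data or the authors' pipeline.

```python
# Sketch of the DET thresholds: P < 0.05 and at least a 2-fold change.
# Fold changes are signed as in Table 2 (negative = repressed).
# All P values and the HYPOTHETICAL_* entries are invented for illustration.

def is_det(fold_change, p_value, min_fold=2.0, alpha=0.05):
    """Return True if a transcript meets both DET thresholds."""
    return p_value < alpha and abs(fold_change) >= min_fold


def classify(fold_change, p_value):
    """Label a transcript as 'induced', 'repressed', or 'unchanged'."""
    if not is_det(fold_change, p_value):
        return "unchanged"
    return "induced" if fold_change > 0 else "repressed"


# (gene, acute-phase fold change, hypothetical P value)
examples = [
    ("LCN2", 3.95, 0.001),            # fold change taken from Table 2
    ("THBS1", -4.30, 0.001),          # fold change taken from Table 2
    ("HYPOTHETICAL_A", 1.40, 0.001),  # significant but below 2-fold
    ("HYPOTHETICAL_B", 2.60, 0.200),  # above 2-fold but not significant
]
labels = {gene: classify(fc, p) for gene, fc, p in examples}
print(labels)
```

Because the study counted a transcript as a DET if it met the thresholds at any time point, a fuller version would apply `is_det` to each of the three time points and combine the results with a logical OR.
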
FIG 2

Venn diagram depicting common and unique patterns of differential gene expression among Lyme disease patients during acute LD and at 1 month or 6 months after the initiation of an appropriate antibiotic regimen. Venn diagrams were generated using a total of 335 DETs that had a fold change of at least 2, with P value of <0.05, relative to healthy controls. DETs for acute, 1-, and 6-month samples are represented by colored ellipses. The sizes of the ellipses are adjusted for the number of DETs in each group.

TABLE 2

Top 40 genes with greatest fold changes in LD subjects relative to healthy donors

| Gene symbol(s) | Gene title(s) | Entrez gene(s) | Fold change, acute | Fold change, 1 mo | Fold change, 6 mo |
|---|---|---|---|---|---|
| DEFA1/DEFA1B/DEFA3 | Defensin, alpha 1/defensin, alpha 1B/defensin, alpha 3, neutrophil specific | 1667/1668/728358 | 5.21 | 3.73 | 3.24 |
| LCN2 | Lipocalin 2 | 3934 | 3.95 | 2.59 | 1.00 |
| FCGR3B | Fc fragment of IgG, low-affinity IIIb, receptor (CD16b) | 2215 | 3.86 | 2.67 | −1.07 |
| MYL9 | Myosin, light chain 9, regulatory | 10398 | 3.42 | 2.35 | −1.74 |
| FCGR1A | Fc fragment of IgG, high-affinity Ia, receptor (CD64) | 2209 | 3.34 | 1.38 | 1.25 |
| CLU | Clusterin | 1191 | 3.12 | 2.06 | −1.76 |
| RRM2 | Ribonucleotide reductase M2 | 6241 | 3.06 | 1.29 | −1.27 |
| GMPR | Guanosine monophosphate reductase | 2766 | 2.88 | 2.03 | −1.15 |
| IGHM | Immunoglobulin heavy constant mu | 3507 | 2.84 | 2.05 | −1.71 |
| PF4 | Platelet factor 4 | 5196 | 2.83 | 2.56 | −1.27 |
| SPARC | Secreted protein, acidic, cysteine-rich (osteonectin) | 6678 | 2.77 | 2.09 | −1.45 |
| PPBP | Pro-platelet basic protein (chemokine [C-X-C motif] ligand 7) | 5473 | 2.82 | 2.48 | −1.22 |
| C21orf7 | Chromosome 21 open reading frame 7 | 56911 | 2.70 | 2.41 | −1.27 |
| TNFSF10 | Tumor necrosis factor (ligand) superfamily, member 10 | 8743 | 2.77 | 1.87 | 1.43 |
| HSPA6/HSPA7 | Heat shock 70-kDa protein 6/heat shock 70-kDa protein 7 | 3310/3311 | 2.76 | 2.12 | 1.25 |
| C6orf25 | Chromosome 6 open reading frame 25 | 80739 | 2.75 | 2.06 | −1.17 |
| HIST1H2BK | Histone cluster 1, H2bk | 85236 | 2.74 | 2.11 | −1.67 |
| MYL9 | Myosin, light-chain 9, regulatory | 10398 | 2.72 | 1.86 | −1.45 |
| CXCR2/CXCR2P1 | Chemokine (C-X-C motif) receptor 2/chemokine (C-X-C motif) receptor 2 pseudogene 1 | 3579/3580 | 2.72 | 1.87 | −1.12 |
| FCGR1B | Fc fragment of IgG, high-affinity 1b, receptor (CD64) | 2210 | 2.70 | 1.23 | −1.06 |
| SLC25A37 | Solute carrier family 25, member 37 | 51312 | 2.68 | 1.88 | −1.29 |
| GBP1 | Guanylate binding protein 1, interferon inducible, 67 kDa | 2633 | 2.68 | 1.80 | 1.56 |
| HP | Haptoglobin | 3240 | 2.68 | 1.32 | 1.07 |
| AIM2 | Absent in melanoma 2 | 9447 | 2.67 | 2.19 | 1.42 |
| CA2 | Carbonic anhydrase II | 760 | 2.63 | 2.41 | −1.18 |
| HIST1H2AG | Histone cluster 1, H2ag | 8969 | 2.62 | 2.16 | 1.37 |
| PTGS1 | Prostaglandin-endoperoxide synthase 1 (prostaglandin G/H synthase and cyclooxygenase) | 5742 | 2.61 | 2.15 | −1.01 |
| THBS1 | Thrombospondin 1 | 7057 | −4.30 | −5.64 | 1.49 |
| IL8 | Interleukin 8 | 3576 | −3.44 | −3.88 | 1.79 |
| EGR1 | Early growth response 1 | 1958 | −3.40 | −2.87 | 1.14 |
| G0S2 | G0/G1 switch 2 | 50486 | −3.10 | −3.77 | 1.02 |
| PPP1CB | Protein phosphatase 1, catalytic subunit, beta isozyme | 5500 | −3.02 | −2.71 | −1.04 |
| NR4A2 | Nuclear receptor subfamily 4, group A, member 2 | 4926 | −2.80 | −2.85 | 1.16 |
| HBEGF | Heparin-binding EGF-like growth factor | 1839 | −2.96 | −3.46 | 1.15 |
| RGS1 | Regulator of G-protein signaling 1 | 5996 | −2.94 | −2.70 | 1.12 |
| EPPK1 | Epiplakin 1 | 83481 | −2.94 | −2.44 | −1.22 |
| TNFAIP3 | Tumor necrosis factor, alpha-induced protein 3 | 7128 | −2.79 | −2.51 | 1.08 |
| NAMPT | Nicotinamide phosphoribosyltransferase | 10135 | −2.75 | −3.98 | 1.84 |
| CD69 | CD69 molecule | 969 | −2.67 | −2.50 | −1.04 |
| CD83 | CD83 molecule | 9308 | −2.67 | −2.50 | 1.29 |

TABLE S1

List of transcripts differentially expressed in Lyme disease patient PBMCs during acute disease and convalescence.

In order to visualize temporal gene expression changes occurring during different disease states, a profile plot was generated using the normalized intensity values of the 335 DETs. Healthy donors displayed a relatively broad range in intensity values (Fig. 3); this likely reflects normal variation in gene expression in the population (27, 28). The range of normalized intensities appeared to be more restricted in the acute LD samples relative to samples from the healthy controls, likely reflecting a common response to B. burgdorferi infection among subjects. Consistent with the Venn diagrams, the profiles for acute LD and 1-month convalescent LD samples were found to be strikingly similar; however, the intensity of many of the transcripts was slightly reduced in the 1-month convalescent samples. Importantly, expression intensities for the 6-month convalescent LD samples showed greater variability in general, as was observed in healthy donors (Fig. 3). Interestingly, at 6 months convalescence, the expression levels of some transcripts that had been repressed during acute LD exceeded values observed in healthy controls. This may indicate a “rebound effect” as immune cells returned to homeostasis following clearance of the infection.
FIG 3

Profile plots of temporal gene expression changes in Lyme disease patients and controls. Profile plots were generated using the normalized intensities of the 335 DETs. Lines representing transcripts are colored based on the normalized expression of each transcript (blue, low; red, high) relative to the mean expression value of all transcripts in acute LD subjects.

Numerous genes involved in innate immune mechanisms are differentially expressed during acute and early convalescent LD but not during late convalescence.

To further identify transcriptional patterns characteristic of disease states, the 335 DETs were used for unsupervised hierarchical clustering. As shown in Fig. 4, samples separated into two main clusters. Consistent with the principal-component analysis, all healthy donor and late-convalescent-phase samples clustered together (group A), while the majority of the acute-phase (22 of 28) and early-convalescent-phase (20 of 27) samples from LD subjects comprised a second group (group B). Four of the remaining six acute LD samples formed a small subcluster immediately adjacent to group B. One acute LD sample was distinctly separated from the other acute LD samples; this sample had been collected from the only LD subject who did not have serologic evidence of B. burgdorferi infection at any time point during the course of the study and was culture negative from skin and blood; the diagnosis of LD was based solely on the presence of MEM.
FIG 4

Hierarchical clustering distinguishes between disease states. Heat map with the dendrogram resulting from unsupervised hierarchical clustering performed using 335 transcripts (representing 233 genes) that were differentially expressed (at least a 2-fold change, with a P value of <0.05) relative to healthy controls. The values shown are normalized intensities relative to the mean. Red or blue indicates high or low expression, respectively, of the normalized intensities relative to the mean. The heat map displays five distinct clusters, three containing induced genes and two containing repressed genes. Boldfacing indicates genes that were later identified as classifiers for disease states (Tables 4 and 5). A list of the top 40 genes with greatest changes in LD subjects is presented in Table 2, and all dysregulated genes are provided in Table S1 in the supplemental material.

TABLE 4

Top 20 classifier genes that discriminate subjects with acute LD from healthy controls

| Gene symbol | Gene title | RFIL (%)^a |
|---|---|---|
| PSMB8 | Proteasome subunit β8 | 9.14 |
| SLAMF7 | SLAM family member 7 | 7.58 |
| RAB24 | RAB24, member RAS oncogene family | 7.11 |
| FCGR1B | Fc fragment of IgG, high affinity 1b, receptor (CD64) | 6.52 |
| MPP1 | Membrane protein, palmitoylated 1, 55 kDa | 5.86 |
| CSF2RB | Colony stimulating factor 2 receptor, beta, low affinity (granulocyte-macrophage) | 5.55 |
| TNFSF10 | Tumor necrosis factor (ligand) superfamily, member 10 | 4.75 |
| BTG1 | B-cell translocation gene 1, antiproliferative | 4.72 |
| GPR183 | G protein-coupled receptor 183 | 4.54 |
| ATG16L2 | Autophagy-related 16-like 2 | 4.50 |
| ACOT7 | Acyl-CoA thioesterase 7 | 4.37 |
| TCIRG1 | T-cell, immune regulator 1, ATPase, H+ transporting V0 subunit a3 | 4.25 |
| CHKB_CPT1B | CHKB-CPT1B readthrough (NMD candidate) | 4.20 |
| DYNLL1 | Dynein light chain LC8-type 1 | 4.13 |
| LCN2 | Lipocalin 2 | 4.05 |
| HSPA6_HSP70B′ | Heat shock protein family A (Hsp70) member 6 | 4.02 |
| FCGR1A | Fc fragment of IgG, high-affinity 1a, receptor (CD64) | 3.85 |
| RCAN3 | RCAN family member 3 (calcipressin 3) | 3.74 |
| HK3 | Hexokinase 3 | 3.65 |
| AP1G2 | Adaptor-related protein complex 1 γ2 subunit | 3.48 |
| Total | | 100 |

^a RFIL, random forest importance level.

TABLE 5

Top 20 classifier genes that distinguish between acute and 6-month convalescent LD subjects

| Gene symbol | Gene name | RFIL (%)^a |
|---|---|---|
| TAF10 | TATA-box binding protein associated factor 10 | 9.96 |
| CTSA | Cathepsin A | 9.26 |
| EXOC3L2 | Exocyst complex component 3-like 2 | 6.77 |
| RRM2 | Ribonucleotide reductase regulatory subunit M2 | 5.99 |
| PSMA7 | Proteasome subunit alpha 7 | 5.91 |
| KCNQ1OT1 | KCNQ1 opposite strand/antisense transcript 1 (nonprotein coding) | 5.55 |
| CKMT1B | Creatine kinase, mitochondrial 1B | 5.34 |
| ANKRD13A | Ankyrin repeat domain 13A | 4.86 |
| UBA7 | Ubiquitin-like modifier activating enzyme 7 | 4.71 |
| CDK2AP1 | Cyclin-dependent kinase 2 associated protein 1 | 4.53 |
| TYMS | Thymidylate synthetase | 4.51 |
| FSIP1 | Fibrous sheath interacting protein 1 | 3.92 |
| KIAA0754 | Microtubule-actin crosslinking factor 1 | 3.79 |
| HIST1H2BH | Histone cluster 1 H2B family member H | 3.76 |
| FCGR1B | Fc fragment of IgG, high-affinity Ib, receptor (CD64) | 3.73 |
| WAS | Wiskott-Aldrich syndrome gene | 3.71 |
| CPNE5 | Copine 5 | 3.48 |
| C21orf7 | Chromosome 21 open reading frame 7 | 3.46 |
| GMPR | Guanosine monophosphate reductase | 3.38 |
| PSMD13 | Proteasome 26S subunit, non-ATPase 13 | 3.36 |
| Total | | 100 |

^a RFIL, random forest importance level.

DETs separated into five gene clusters (Fig. 4 and see Table S1 in the supplemental material). Cluster 1 (54 genes, 76 probe sets) and cluster 3 (38 genes, 45 probe sets) contained genes that were strongly or moderately induced in the 42 acute and early convalescent LD samples in group B relative to healthy controls. However, increased expression of these genes was not observed in the six acute LD subjects that clustered in group A. Cluster 1 was characterized by genes involved in innate immune processes (Table S1). Significant gene ontology (GO) terms associated with cluster 1 included platelet alpha granule (P = 3.15E–08), wound healing (P = 3.28E–04), blood coagulation (P = 0.001), hemostasis (P = 0.001), and response to stress (P = 0.005). Cluster 3 featured genes involved in fatty acid catabolism (Table S1). Significant GO terms included carnitine O-palmitoyltransferase activity (P = 2.72E–04), choline kinase activity (P = 2.72E–04), ethanolamine kinase activity (P = 2.72E–04), and intracellular lipid transport (P = 5.32E–04). The majority of acute LD subjects showed a significant induction of genes in cluster 4. This result contrasted with that for clusters 1 and 3, where different responses were observed for the acute LD subjects in group A and group B. Of the 69 transcripts in cluster 4, 28 (41%) are involved in innate immune cell functions, including pathogen recognition, phagocytosis, neutrophil activation, chemotaxis and cell migration, and inflammation. The most highly induced transcript encodes DEFA1/DEFA1B/DEFA3 (defensin, alpha 1/defensin, alpha 1B/defensin, alpha 3, neutrophil specific), microbicidal proteins of neutrophil granules that effectively kill B. burgdorferi in vitro (29) (Table 2). With the single exception of DEFA1/DEFA1B/DEFA3, which was upregulated at all time points, genes in cluster 4 were significantly induced only during acute and early convalescent LD and returned to levels observed in the healthy donors within 6 months (Table 2). 
Cluster 2 contained 22 genes (26 probe sets) that, with three exceptions, were not significantly changed during acute or early convalescent LD but were significantly induced in the late convalescent LD (6 months) subjects. Cluster 5 consisted of transcripts for 50 genes that were significantly repressed in the majority of acute and early convalescent LD patients relative to healthy subjects. Significant GO terms for these genes included immune system process (P = 4.98E–06), response to wounding (P = 1.21E–05), and cell migration (P = 2.92E–04).

Interferon-regulated genes characterize the response to acute disseminated B. burgdorferi infection.

Interferome (http://www.interferome.org/interferome/home.jspx), a database of interferon (IFN)-regulated genes (30), was employed to analyze the genes dysregulated during acute LD. The following parameters were applied to the analysis: human (species), hematopoietic/immune (system), and blood (organ). Totals of 106 of 131 (81%) induced genes (encoded by 187 transcripts) and 25 of 30 (83%) of the repressed genes (encoded by 54 transcripts) were identified as interferon regulated. These included 32 of the 40 genes with the greatest expression changes (Table 2).

Normalization of transcriptome following treatment is concordant with resolution of symptoms.

LD subjects were questioned regarding symptoms at each visit. Symptoms attributable to a preexisting condition were not included in the analysis. At the initial visit, 82% of subjects with acute LD reported experiencing at least one symptom (Table 3). Fatigue was the most commonly reported symptom (68%), followed by headache (47%), arthralgia (42%), myalgia (40%), and stiff neck (34%). Strikingly, only approximately one-half as many subjects (43%) reported experiencing any symptoms at the second visit. Fatigue remained the most commonly reported symptom (23%), followed by arthralgia (11%), myalgia (11%), and stiff neck (11%). Only 3% of subjects at the second visit reported headache. Of 11 evaluable subjects at 6 months after antibiotic treatment, only 1 (9%) reported experiencing any symptoms (arthralgia).
TABLE 3

Reported symptoms of LD subjects before and after antibiotic therapy

| Symptom | Acute LD, no./total no. (%) | Convalescent LD (1 mo), no./total no. (%) | Convalescent LD (6 mo), no./total no. (%) |
|---|---|---|---|
| Arthralgia | 16/38 (42) | 4/35 (11) | 1/11 (9) |
| Dizziness | 7/38 (18) | 1/35 (3) | 0/11 (0) |
| Fatigue | 26/38 (68) | 8/35 (23) | 0/11 (0) |
| Headache | 18/38 (47) | 1/35 (3) | 0/11 (0) |
| Myalgia | 15/38 (40) | 4/35 (11) | 0/11 (0) |
| Stiff neck | 13/38 (34) | 4/35 (11) | 0/11 (0) |
| Any symptom present | 31/38 (82) | 15/35 (43) | 1/11 (9) |
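
The headline percentages for symptom resolution can be recomputed directly from the counts in the "Any symptom present" row of Table 3. The short sketch below does only that arithmetic; the variable and function names are illustrative.

```python
# Recompute the "Any symptom present" percentages in Table 3 from the counts.
any_symptom = {"acute": (31, 38), "1 mo": (15, 35), "6 mo": (1, 11)}


def percent(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


summary = {visit: percent(n, total) for visit, (n, total) in any_symptom.items()}
print(summary)  # {'acute': 82, '1 mo': 43, '6 mo': 9}
```
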

Identification and validation of predictor genes.

One major limitation of serological tests is the inability to detect infection prior to the appearance of antibodies. A predictive model was developed based on application of the random forest algorithm to the 2,004 most highly variable genes in three data sets (acute LD, 6-month convalescent LD, and healthy controls). In the first comparison, the capability of this model to correctly distinguish between subjects with acute LD and healthy controls was determined and the top 20 genes with the highest random forest importance levels were identified (Table 4). Hierarchical clustering using only these 20 genes accurately separated acute LD subjects and healthy controls into two distinct clusters (Fig. 5A). Moreover, this 20-gene classifier set correctly distinguished subjects with acute LD from healthy donors with 100% sensitivity and 96% accuracy (correct predictions/test set size) (Fig. 6). In comparison, only 22/27 of these subjects tested positive by ELISA for B. burgdorferi-specific antibodies at the initial visit, resulting in 81% sensitivity for the serology-based test. Four of the five patients who were seronegative by ELISA at the initial visit seroconverted by the time of the second visit.
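
Restricting the input to the most highly variable genes, as described above, amounts to ranking genes by their spread across samples and keeping the top N. The sketch below assumes that variance is the variability measure and uses invented toy intensities; the function name `top_variable_genes` is hypothetical and is not the authors' implementation.

```python
from statistics import pvariance


def top_variable_genes(expression, n_top):
    """Return the n_top gene names with the largest variance across samples.

    expression maps a gene name to a list of normalized intensities,
    one value per sample.
    """
    ranked = sorted(expression, key=lambda g: pvariance(expression[g]), reverse=True)
    return ranked[:n_top]


# Hypothetical toy intensities, not study data.
toy = {
    "GENE_A": [1.0, 1.1, 0.9, 1.0],  # nearly constant
    "GENE_B": [0.2, 2.5, 0.1, 2.9],  # highly variable
    "GENE_C": [1.0, 1.5, 0.5, 1.0],  # moderately variable
}
selected = top_variable_genes(toy, n_top=2)
print(selected)  # ['GENE_B', 'GENE_C']
```

In the study the same idea would be applied with `n_top` set to 2,004; the selected genes then serve as candidate features for the random forest.
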
FIG 5

Twenty-gene classifier sets identified by random forest analysis accurately distinguish between disease states. (A) Hierarchical clustering was performed with samples from acute LD subjects (orange) and healthy donors (green) based on normalized expression intensities of 20 genes having the highest random forest importance levels for these groups (shown on right and in Table 4). (B) A second unique set of 20 genes (shown on the right and in Table 5) having the highest random forest importance levels when comparing acute LD subjects (orange) and 6-month convalescent LD subjects (green) was used for hierarchical clustering of samples from these groups.

FIG 6

Performance of 20-gene classifier sets identified by random forest analysis. Separate leave-one-out cross-validation experiments were performed using the distinct 20-gene classifier sets shown in Tables 4 and 5, respectively, for comparison of subjects with acute LD to (A) healthy controls and (B) 6-month convalescent LD subjects. The results are presented as confusion matrices with boldfacing indicating the samples that were correctly classified.
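
Leave-one-out cross-validation, as named in the Fig. 6 legend, holds out each sample in turn, trains on the remainder, and predicts the held-out sample. The sketch below shows only that loop. To stay self-contained it substitutes a simple nearest-centroid classifier for the random forest used in the study, and the two-gene expression values are invented.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier: assign x to the class with the closest mean vector."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out(xs, ys, predict):
    """Hold out each sample once; return a list of (true, predicted) pairs."""
    results = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        results.append((ys[i], predict(train_x, train_y, xs[i])))
    return results


# Hypothetical two-gene expression values, not study data.
xs = [[3.0, 0.5], [2.8, 0.7], [3.2, 0.4], [1.0, 2.0], [0.9, 2.2], [1.1, 1.9]]
ys = ["acute", "acute", "acute", "healthy", "healthy", "healthy"]

pairs = leave_one_out(xs, ys, nearest_centroid_predict)
accuracy = sum(t == p for t, p in pairs) / len(pairs)
print(accuracy)
```

The `predict` argument is a plain function, so any classifier with the same three-argument shape could be dropped in without changing the cross-validation loop.
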

Another major limitation of most serologic diagnostic tests is the inability to distinguish between active and prior infection as circulating antibodies are present long after the pathogen is cleared. Application of the random forest algorithm to samples from LD subjects at baseline and at 6-month convalescence resulted in a separate distinct set of 20 classifier genes (Table 5). Hierarchical clustering of samples using this unique 20-gene classifier set correctly categorized the preponderance of samples from these two groups (Fig. 5B). In addition, acute LD could be discriminated from 6-month convalescent subjects with 100% sensitivity and 97% accuracy (Fig. 6).
Validation of the specificity of the classifier gene set was performed by applying the prediction model to a published microarray data set generated using peripheral blood mononuclear cells (PBMCs) from patients with acute infections caused by common bacterial and viral pathogens: Staphylococcus aureus, Streptococcus pneumoniae, Escherichia coli, or influenza A virus (17). First, the top 10% of genes with the greatest variance were selected. Next, iterations (n = 10) of the random forest algorithm were run to identify the top 20 genes associated with each infectious agent that had the highest importance levels (Table 6). Using these 20-gene classifier sets, random forest analysis correctly identified patients with specific infections with prediction accuracies of 100% (influenza A virus), 98% (B. burgdorferi), 95% (S. pneumoniae and S. aureus), and 94% (E. coli). Comparison of the 20-gene sets revealed that acute infections due to E. coli, S. aureus, and S. pneumoniae shared multiple classifiers; the greatest number (eight) of shared classifiers was between E. coli and S. aureus infections (Table 6). The gene lists were analyzed for IFN-responsive genes using Interferome as described above. The only classifier sets that contained more than one IFN-regulated gene were those for B. burgdorferi (n = 15) and influenza A (n = 6) (Table 6). Remarkably, however, all 20 classifier genes for acute infection with B. burgdorferi were unique to that organism; none was shared with any of the other bacterial infections or with infection due to influenza A.
TABLE 6

Twenty-gene classifier sets distinguish B. burgdorferi infection from acute infections caused by other bacterial and viral pathogens

E. coli | S. aureus | S. pneumoniae | B. burgdorferi (acute LD) | Influenza A virus
ELANE | ELANE | SERPINB2 | PSMB8* | IFI27*
CEACAM8 | DEFA1/DEFA1B/DEFA3 | RNASE3 | SLAMF7* | SIGLEC1*
IL8 | C21orf59 | DEFA4 | RAB24* | OTOF
MMP8 | MGAM | CHIT1 | FCGR1B* | RSAD2*
OLFM4 | ADM* | ELANE | MPP1* | CD1C
DEFA1/DEFA1B/DEFA3 | LTF | AZU1 | CSF2RB* | IFI44L*
MGAM | MPO | CXCL2 | TNFSF10* | RPS4Y1
FOSB | BPI | RNASE2* | BTG1* | AKR7A2
AHSP | SCN3A | FCGBP | GPR183* | IFIT3*
HBG1/HBG2 | CCDC99 | CEACAM8 | ATG16L2* | CACNA2D3
SELENBP1 | AHSP | CAMP | ACOT7* | LAMP3*
AKR1C3 | DUSP3 | ANXA3 | TCIRG1 | EPHB2
CXCL2 | MMP8 | DEFA1/DEFA1B/DEFA3 | CHKB_CPT1B | MCM10
ALAS2 | CEACAM8 | PGLYRP1 | DYNLL1 | ABHD8
LMAN2L | CD14 | IL8 | LCN2 | KIF23
LTF | OLFM4 | CEACAM6 | HSPA6_HSP70B′ | HLA-DQA1/LOC100507718/LOC100509457
RRP1 | NPL | EPHA4 | FCGR1A* | MX2
CCL27 | MARCO | COL9A3 | RCAN3* | BTF3P11
HBD | ANXA3 | CHI3L1 | HK3* | AKR1B10
ZNF639 | PLBD1 | MPO | AP1G2* | PLK1S1

Genes are listed in order of random forest analysis importance level (highest to lowest). *, interferon-regulated gene. Genes that appear on the classifier list for more than one infectious agent are designated in boldface.
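The overlap statements made about Table 6 (eight classifiers shared between E. coli and S. aureus, none shared with B. burgdorferi, and the counts of interferon-regulated genes) can be checked with simple set operations. The sketch below transcribes the five gene lists from the table; a trailing asterisk marks an interferon-regulated gene, as in the table footnote. One transcription assumption is flagged in the code: the fused entry in row 10 is read as HBG1/HBG2 (E. coli) followed by CCDC99 (S. aureus). Variable and function names are hypothetical.

```python
from itertools import combinations

# Gene lists transcribed from Table 6. "*" = interferon-regulated gene.
# ASSUMPTION: row 10 is split as HBG1/HBG2 (E. coli) and CCDC99 (S. aureus).
TABLE6 = {
    "E. coli": "ELANE CEACAM8 IL8 MMP8 OLFM4 DEFA1/DEFA1B/DEFA3 MGAM FOSB AHSP "
               "HBG1/HBG2 SELENBP1 AKR1C3 CXCL2 ALAS2 LMAN2L LTF RRP1 CCL27 HBD ZNF639",
    "S. aureus": "ELANE DEFA1/DEFA1B/DEFA3 C21orf59 MGAM ADM* LTF MPO BPI SCN3A "
                 "CCDC99 AHSP DUSP3 MMP8 CEACAM8 CD14 OLFM4 NPL MARCO ANXA3 PLBD1",
    "S. pneumoniae": "SERPINB2 RNASE3 DEFA4 CHIT1 ELANE AZU1 CXCL2 RNASE2* FCGBP "
                     "CEACAM8 CAMP ANXA3 DEFA1/DEFA1B/DEFA3 PGLYRP1 IL8 CEACAM6 "
                     "EPHA4 COL9A3 CHI3L1 MPO",
    "B. burgdorferi": "PSMB8* SLAMF7* RAB24* FCGR1B* MPP1* CSF2RB* TNFSF10* BTG1* "
                      "GPR183* ATG16L2* ACOT7* TCIRG1 CHKB_CPT1B DYNLL1 LCN2 "
                      "HSPA6_HSP70B′ FCGR1A* RCAN3* HK3* AP1G2*",
    "Influenza A": "IFI27* SIGLEC1* OTOF RSAD2* CD1C IFI44L* RPS4Y1 AKR7A2 IFIT3* "
                   "CACNA2D3 LAMP3* EPHB2 MCM10 ABHD8 KIF23 "
                   "HLA-DQA1/LOC100507718/LOC100509457 MX2 BTF3P11 AKR1B10 PLK1S1",
}


def split_marks(genes):
    """Return (set of gene names without '*', number of '*'-marked genes)."""
    names = genes.split()
    return {g.rstrip("*") for g in names}, sum(g.endswith("*") for g in names)


GENES = {agent: split_marks(text)[0] for agent, text in TABLE6.items()}
IFN_COUNT = {agent: split_marks(text)[1] for agent, text in TABLE6.items()}

# Shared classifier genes for every pair of infectious agents.
SHARED = {(a, b): GENES[a] & GENES[b] for a, b in combinations(GENES, 2)}

for pair, genes in SHARED.items():
    print(pair, len(genes), sorted(genes))
print(IFN_COUNT)
```

Because the comparison is by exact gene symbol, probe sets that map to several genes (for example DEFA1/DEFA1B/DEFA3) are treated as a single identifier, which matches how they are listed in the table.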


DISCUSSION

In this study, multiple approaches were used to identify a peripheral blood signature that would enable reliable detection of early disseminated LD at a time point when standard serologic testing may be suboptimally sensitive. A 20-gene classifier set that correctly distinguished subjects with acute LD from healthy donors with 96% accuracy, 100% sensitivity, and 90% specificity was identified. A second major limitation of antibody-based tests is the inability to differentiate between acute infection and resolved infection (after antibiotic treatment) due to specific circulating antibodies that may persist for years after the microbe has been eliminated. The identified 20-gene classifier set was able to discriminate acute LD from 6-month convalescent subjects with 97% accuracy, 100% sensitivity, and 90% specificity. Notably, gene expression changes corresponded to reported symptoms. The greatest number of genes with altered expression was present in the acute LD group; symptoms were reported by 82% of all acute LD subjects in this study and by 93% of the 28 subjects whose blood was analyzed for gene expression. In contrast, return of the gene expression profile to that observed in the healthy donors corresponded with resolution of symptoms: only one 6-month LD convalescent subject (9%) reported having any symptom. Thus, the identified classifier set has the potential for serving as a test for disease resolution.

The algorithm used to generate the classifier gene set for acute B. burgdorferi infection was applied to published microarray data sets for PBMCs collected from patients with acute infections caused by three common bacterial pathogens or by influenza A virus. Importantly, all 20 classifier genes for acute B. burgdorferi infection were completely unique and were not associated with any of these four pathogens.
Therefore, the gene classifier sets described here not only demonstrated high sensitivity for acute LD relative to healthy donors and convalescent LD patients, but the 20-gene classifier set for acute LD distinguished B. burgdorferi infection from the other tested bacterial or viral infections with 100% specificity.

In sharp contrast to the gene classifiers for the other three bacterial pathogens, the classifier gene sets for B. burgdorferi and influenza A infection were both characterized by an IFN-regulated signature, although the individual genes comprising each set were unique. IFI27 (interferon alpha inducible protein 27) is the classifier gene for influenza A that has the highest random forest importance value. IFI27 has been described in a separate study as a novel single-gene biomarker in patient blood that was able to discriminate, with 88% diagnostic accuracy and 90% specificity, between influenza virus- and bacterium-associated respiratory infections (31). We have previously demonstrated that B. burgdorferi induces numerous IFN-regulated genes in skin at the site of an EM lesion (32), many of which were also dysregulated in Lyme disease patient PBMCs in the present study. Of note, the 20-gene classifier set for B. burgdorferi infection included 15 IFN-regulated genes; five were also significantly induced in EM skin biopsy specimens from patients with disseminated Lyme disease (32). Several of these genes encode proteins involved in pathogen recognition, phagocytosis, and antigen processing, including the Fc gamma receptors FCGR1A and FCGR1B (Fc fragment of IgG, high-affinity 1a and 1b, receptor [CD64]), TNFSF10 (tumor necrosis factor [ligand] superfamily, member 10), and PSMB8 (proteasome subunit beta 8). Interestingly, the classifier gene sets for infections caused by each of the other three bacterial pathogens evaluated were nearly devoid of IFN-regulated genes, with none associated with E. coli infection and one IFN-regulated gene each associated with S. aureus and S. pneumoniae infections. Collectively, these results confirm and extend our previous observation that B. burgdorferi elicits an IFN-dominated transcriptional signature during early infection, a sharp distinction from the immunological footprints generated by the other bacterial pathogens examined. In addition, the 20-gene classifier set clearly distinguishes B. burgdorferi infection from that caused by influenza A, although both pathogens potently stimulate the interferon signaling pathway (33).

Bouquet and colleagues also examined the transcriptional profile in PBMCs of LD patients with EM before antibiotic treatment, 3 weeks later, and then 6 months after the completion of antibiotic therapy (34). There is general consensus between Bouquet et al. and a major finding of the present study: acute infection with B. burgdorferi elicits a distinct gene expression profile in patient blood that persists for at least 3 weeks after infection. However, in contrast to Bouquet et al., we observed that the majority of differentially regulated genes return to healthy donor levels by 6 months posttreatment. There are several differences between the two studies that might explain the discrepancies in the findings. The most significant difference may be in the patient population under investigation. The present study was restricted to subjects with definitive early disseminated LD. A total of 95% of enrolled LD subjects had MEM (67%) and/or a positive blood culture for B. burgdorferi (74%); the remaining subject had facial palsy, a sign of disseminated LD. The inclusion criteria of Bouquet et al. were less stringent and consisted of a physician-documented EM of >5 cm with at least one concurrent nonspecific symptom (headache, fever, chills, fatigue, and/or new muscle or joint pains). Cultivation of B. burgdorferi from any clinical samples was not reported, and only 43% of LD subjects had MEM.
Of the 29 subjects with LD in the Bouquet et al. study, 8 did not seroconvert, and 1 was not tested. In addition to the enrollment criteria, the definition for altered gene expression differed between the studies; Bouquet et al. used a 1.5-fold change cutoff compared to the 2-fold change in the present study. Significantly, in the present study, random forest analysis was employed to build predictive models. Classifier gene sets were identified that could distinguish patients with acute disseminated infection from healthy controls and, separately, from patients with resolved infection.

It is important to note the limitations of the current investigation. It was not completely longitudinal and included a relatively small sample size for the 6-month visit. This was primarily because 6-month samples were not collected during the first 2 years of the study; the 6-month convalescent time point was added when it became apparent that transcript levels had not returned to normal by 1 month posttreatment. In addition, some study subjects were lost to follow-up, and some RNA samples did not meet the quality requirements for microarray hybridization. A sample size of 10, however, has proven to be sufficient for rigorous statistical comparison with earlier time points and with healthy donors in other studies (18). Since only one of the 10 subjects reported having any symptoms at 6 months, the small sample pool was insufficient for identifying potential transcriptome alterations associated with persisting symptoms. Another limitation is the specific focus on patients with definitive evidence of disseminated infection. An optimal diagnostic test for LD should be able to detect infection at its earliest stages, when B. burgdorferi is still localized to the skin. Current studies are under way to test the sensitivity of the diagnostic biomarker set using samples from subjects with EM, but without evidence of dissemination.
It should also be noted that the use of published data sets rather than prospectively collected samples (as in Table 6) could potentially lead to artifacts in the cross-comparisons.

In conclusion, we report the development, using gene expression data, of an efficient computational framework to generate a 20-gene classifier set that detects disseminated B. burgdorferi infection with high sensitivity and specificity. This unique classifier set may have a critical advantage over current serologic tests in that it accurately discriminated between active and resolved infection. This computational approach offers the potential for more accurate diagnosis of early disseminated Lyme disease. It may also allow improved monitoring of treatment efficacy and disease resolution.

MATERIALS AND METHODS

Study subjects.

All subjects were adult volunteers of at least 18 years of age and provided written informed consent prior to sample collection, in accordance with the study protocol approved by the Institutional Review Board of New York Medical College (NYMC). Healthy donors were recruited from NYMC staff, excluding members of the investigators’ laboratories, and met the following inclusion criteria: no history of LD, no receipt of a Lyme disease vaccine, no evidence of a current infectious disease, not pregnant, and no usage of an immunosuppressive medication. Patients were recruited from the Lyme Disease Diagnostic Center of NYMC during the summer seasons of 2005 to 2006 and 2010 to 2013. Blood samples were collected at the time of diagnosis (acute LD) and at approximately 1 and 6 months after the initiation of a recommended course of antibiotics (7). Serologic testing of LD subjects for antibodies to B. burgdorferi was performed by a whole-cell sonicate ELISA. Serologic testing of healthy controls for antibodies to B. burgdorferi was performed once by IgG immunoblot. Analysis was restricted to samples collected from individuals with objective evidence of dissemination, most often based on the presence of multiple erythema migrans (MEM) skin lesions and/or the cultivation of B. burgdorferi from blood, as previously described (35).

Blood collection and RNA isolation.

Venous blood was collected directly into BD-Vacutainer CPT tubes (Becton Dickinson, Franklin Lakes, NJ). PBMCs were isolated by centrifugation, according to the manufacturer’s protocol, no later than 3 h after blood collection. PBMCs were washed with Hanks’ balanced salt solution without calcium, magnesium, or phenol red (Gibco-BRL, Grand Island, NY), and RNA was isolated immediately thereafter under RNase-free conditions using the PureScript total RNA isolation kit (Gentra, Minneapolis, MN) or the Ambion ToTALLY RNA isolation kit (Life Technologies, Grand Island, NY), according to the manufacturers’ instructions. Contaminating DNA was removed using the DNA-free kit (Ambion, Austin, TX). RNA was eluted in 20 μl RNase/DNase-free water and stored at –80°C after the addition of 32 U of RNase inhibitor (Promega, Madison, WI). RNA integrity was assessed by electrophoresis using an Agilent Bioanalyzer 2100 (Agilent, Palo Alto, CA) prior to cDNA synthesis for microarray hybridization. Samples having an RNA integrity number below 6 were excluded from further analysis.

Microarray hybridization.

Between 5 and 20 ng of total RNA from each PBMC sample was used to generate high-fidelity cDNA using an Ovation RNA amplification system (NuGEN Technologies, Inc., San Carlos, CA) according to the manufacturer’s protocol. The amplified cDNA was fragmented to 50 to 100 nucleotides, labeled with biotin, and hybridized to the Affymetrix GeneChip HG-U219 high-density oligonucleotide array (Affymetrix, Santa Clara, CA). After hybridization, the arrays were stained with streptavidin-phycoerythrin and washed in an Affymetrix fluidics module using standard Affymetrix protocols. The detection and quantitation of target hybridization was performed using a GeneArray Scanner 3000 (Affymetrix). All procedures were performed at the Bionomics Research and Technology Center, Rutgers University, Piscataway, NJ.

Microarray data analysis.

Microarray data were analyzed using GeneSpring GX14.9 software (Agilent Technologies, Santa Clara, CA). Raw expression values in CEL file format were normalized by robust multiarray analysis (RMA) and quantile normalization, filtered to include only those with intensity values above the 20th percentile, and baseline transformed to the median of all samples. Statistical analysis was performed using one-way analysis of variance with Benjamini-Hochberg multiple testing correction to reduce false positives (36). Differentially expressed transcripts, defined as those having a P value of <0.05 and a fold change of at least 2 relative to the healthy donor group, were subjected to hierarchical clustering and principal-component analysis.
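The selection rule described above (multiple-testing-corrected P value below 0.05 and at least a 2-fold change relative to healthy donors) can be sketched in a few lines. This is an illustration under stated assumptions, not the GeneSpring implementation: it assumes the P values from the analysis of variance are already available, that expression values are on the log2 scale (as RMA output is), and that the P value threshold is applied after Benjamini-Hochberg adjustment. The transcript identifiers and numbers are invented for the example.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(rows, alpha=0.05, min_fold=2.0):
    """rows: (transcript_id, mean_log2_LD, mean_log2_healthy, raw_p) tuples."""
    adjusted = benjamini_hochberg([row[3] for row in rows])
    min_log2 = math.log2(min_fold)  # a 2-fold change is 1 unit on the log2 scale
    hits = []
    for (tid, ld, healthy, _), q in zip(rows, adjusted):
        log2_fc = ld - healthy
        if q < alpha and abs(log2_fc) >= min_log2:
            hits.append((tid, "induced" if log2_fc > 0 else "repressed"))
    return hits


# Hypothetical transcripts: only T1 and T2 pass both thresholds.
example = [
    ("T1", 9.5, 8.0, 0.001),   # large increase, small P value
    ("T2", 7.0, 8.2, 0.004),   # large decrease, small P value
    ("T3", 8.4, 8.0, 0.002),   # significant but below 2-fold
    ("T4", 10.0, 8.5, 0.20),   # above 2-fold but not significant
]
print(differentially_expressed(example))
```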

Predictive modeling.

A generic predictive modeling framework was developed and applied to two comparisons: acute LD (n = 28) versus healthy donors (n = 21) and acute LD versus 6-month convalescent LD (n = 10). In the first step, the distribution of the gene expression variance across all experimental groups was computed, and genes with variance at or above the 90th percentile were identified. This threshold is a parameter of the framework and can be appropriately set based on the variance distribution in a considered cohort of samples. In the second step, expression data containing the top 10% of variance in each experimental group were subjected to iterations (n = 50) of random forest analysis, a well-established machine learning algorithm (37). An importance value for each gene was generated following each iteration of random forest analysis, and a final importance value for each gene was computed by averaging the importance values across all 50 iterations. Averaged importance values were used to rank all top selected genes. Finally, for each experiment, leave-one-out predictive modeling was performed and tested using incrementally expanding sets of the most significant genes (top 20 through top 2004) to assess the changes in accuracy across different sets of predictors.
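The first two steps of the framework (variance filtering, then averaging gene importance over repeated random forest runs) can be illustrated with a small standard-library sketch. It is deliberately simplified and is not the authors' implementation: each "tree" is a single split (a stump) grown on a bootstrap sample with a random subset of candidate genes, and importance is the Gini impurity decrease credited to the chosen gene. All function names, default parameters, and the toy data are assumptions made for the example.

```python
import random
from statistics import pvariance


def top_variance_genes(expr, fraction=0.10):
    """Step 1: keep the top `fraction` of genes ranked by expression variance."""
    ranked = sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)
    return ranked[:max(1, round(len(ranked) * fraction))]


def gini(labels):
    """Gini impurity for 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_gain(values, labels):
    """Largest impurity decrease achievable by one threshold on one gene."""
    parent, n, best = gini(labels), len(labels), 0.0
    cuts = sorted(set(values))
    for lo, hi in zip(cuts, cuts[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, gain)
    return best


def stump_forest_importance(expr, labels, genes, n_trees, rng):
    """One forest run: bootstrap samples, random gene subsets, one split each."""
    n, k = len(labels), max(1, round(len(genes) ** 0.5))
    importance = dict.fromkeys(genes, 0.0)
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        y = [labels[i] for i in rows]
        gains = {g: best_gain([expr[g][i] for i in rows], y)
                 for g in rng.sample(genes, k)}        # random candidate genes
        winner = max(gains, key=gains.get)
        importance[winner] += gains[winner] / n_trees
    return importance


def averaged_ranking(expr, labels, genes, iterations=50, n_trees=25, seed=0):
    """Step 2: average importance over repeated runs, then rank the genes."""
    rng, genes = random.Random(seed), list(genes)
    totals = dict.fromkeys(genes, 0.0)
    for _ in range(iterations):
        run = stump_forest_importance(expr, labels, genes, n_trees, rng)
        for g in genes:
            totals[g] += run[g] / iterations
    return sorted(genes, key=totals.get, reverse=True), totals


# Toy data (invented): G0 separates the two groups, G1 to G3 do not.
toy_labels = [0] * 6 + [1] * 6
toy_expr = {
    "G0": [0] * 6 + [10] * 6,
    "G1": [1, 2, 3, 4, 5, 6] * 2,
    "G2": [6, 5, 4, 3, 2, 1] * 2,
    "G3": [2, 4, 6, 1, 3, 5] * 2,
}
ranking, scores = averaged_ranking(toy_expr, toy_labels, toy_expr)
print(ranking)
```

Averaging over many runs is what stabilizes the ranking: a single forest can credit an uninformative gene by chance, whereas the mean over iterations converges toward each gene's typical contribution. A full random forest with deep trees, as provided by established machine learning libraries, would be the appropriate choice for real expression data.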

Comparison of classifier genes for LD and other infectious diseases.

Microarray-based transcriptome data set GSE6269, containing gene expression profiles from PBMCs from patients with acute infections due to Escherichia coli, Staphylococcus aureus, Streptococcus pneumoniae, or influenza A virus (17), was downloaded from the GEO database and subjected to random forest analysis using the same framework and parameters that were applied to the LD data.

Data availability.

The transcriptome data obtained in this study have been submitted to the Gene Expression Omnibus (GEO) data repository under accession number GSE145974.