
A diagnostic classifier for gene expression-based identification of early Lyme disease.

Venice Servellita1, Jerome Bouquet1, Alison Rebman2, Ting Yang2, Erik Samayoa1, Steve Miller1, Mars Stone3, Marion Lanteri3, Michael Busch3, Patrick Tang4, Muhammad Morshed5, Mark J Soloski2, John Aucott2, Charles Y Chiu1,6.   

Abstract

Background: Lyme disease is a tick-borne illness that causes an estimated 476,000 infections annually in the United States. New diagnostic tests are urgently needed, as existing antibody-based assays lack sufficient sensitivity and specificity.
Methods: Here we perform transcriptome profiling by RNA sequencing (RNA-Seq), targeted RNA-Seq, and/or machine learning-based classification of 263 peripheral blood mononuclear cell samples from 218 subjects, including 94 early Lyme disease patients, 48 uninfected control subjects, and 57 patients with other infections (influenza, bacteremia, or tuberculosis). Differentially expressed genes among the 25,278 in the reference database are selected based on ≥1.5-fold change, ≤0.05 p value, and ≤0.001 false-discovery rate cutoffs. After gene selection using a k-nearest neighbor algorithm, the comparative performance of ten different classifier models is evaluated using machine learning.
Results: We identify a 31-gene Lyme disease classifier (LDC) panel that can discriminate between early Lyme patients and controls, with 23 genes (74.2%) that have previously been described in association with clinical investigations of Lyme disease patients or in vitro cell culture and rodent studies of Borrelia burgdorferi infection. Evaluation of the LDC using an independent test set of samples from 63 subjects yields an overall sensitivity of 90.0%, specificity of 100%, and accuracy of 95.2%. The LDC test is positive in 85.7% of seronegative patients and found to persist for ≥3 weeks in 9 of 12 (75%) patients.
Conclusions: These results highlight the potential clinical utility of a gene expression classifier for diagnosis of early Lyme disease, including in patients negative by conventional serologic testing.
© The Author(s) 2022.

Keywords:  Bacterial host response; Bacterial infection; Diagnostic markers; RNA sequencing; Targeted resequencing

Year:  2022        PMID: 35879995      PMCID: PMC9306241          DOI: 10.1038/s43856-022-00127-2

Source DB:  PubMed          Journal:  Commun Med (Lond)        ISSN: 2730-664X


Introduction

Lyme disease is a systemic tick-borne infection caused by Borrelia burgdorferi sensu lato and the most common vector-borne disease in the United States[1]. Lyme disease can cause arthritis, facial palsy, neuroborreliosis (neurological disease including meningitis, radiculopathy, and encephalitis), and even myocarditis resulting in sudden death[2]. Most patients treated with appropriate antibiotics recover rapidly and completely, but 5–15% of patients develop persistent or recurring symptoms. When prolonged and associated with functional disability, patients are considered to have post-treatment Lyme disease syndrome (PTLDS)[3,4]. The failure to diagnose and treat Lyme disease in a timely fashion results in higher morbidity and protracted recovery times[5]. Diagnosis of early Lyme disease is challenging[6]. Clinical manifestations can be highly variable, presenting as non-specific “flu-like” symptoms, and a characteristic bullseye erythema migrans (EM) rash is seen only 60–70% of the time[7]. Available FDA-approved serologic assays, including two-tier antibody testing recommended by the CDC for diagnosis, are negative in up to 40% of early Lyme patients[8-10]. Nucleic acid testing is hindered by low titers of B. burgdorferi in the blood during acute infection, with only 20–62% reported sensitivity of detection[11,12]. The advent of the genomics era has spurred the development of diagnostic tests based on transcriptome (“RNA-Seq”) analyses of the human host response[13]. Classification by gene expression profiling has been useful in the identification of various infections, including Staphylococcal bacteremia[14], active versus latent tuberculosis[15], influenza[16,17], and COVID-19[18,19]. Transcriptome profiling of peripheral blood mononuclear cells (PBMCs)[20] or EM skin lesions[21] from patients with early Lyme disease has demonstrated pronounced inflammatory responses predominated by interferon signaling. 
Machine learning (ML)-based analyses of RNA-Seq data have been used for cancer classification[22], but have not yet been applied to infectious disease diagnosis. Here we sought to leverage iterative ML analyses of global and targeted RNA-Seq data to define a panel of differentially expressed genes (DEGs) that distinguishes Lyme disease from non-Lyme controls. This panel, referred to as a Lyme disease classifier (LDC), consisted of 31 genes and was able to diagnose Lyme disease with >95% accuracy, including in >85% of Lyme seronegative patients.

Methods

Patient information

Patient enrollment, chart review, collection of clinical samples, and analysis of clinical samples by transcriptomic profiling or targeted RNA sequencing were done under protocols approved by the Institutional Review Boards of Johns Hopkins University (JHU) (JHU IRB # NA_00011170) and the University of California, San Francisco (UCSF IRB # 17–241124211). Written informed consent was obtained from all JHU Lyme disease and uninfected control patients for enrollment into the study. No consents were obtained from other, non-JHU patients since only remnant clinical samples from these patients were used, and the samples were analyzed under protocols approved by the UCSF IRB as part of a “no subject contact” biobanking study with waiver of consent (UCSF IRB #17–2411). All 94 Lyme disease subjects included in this study presented with a physician-documented EM of ≥5 cm and either concurrent flu-like symptoms (at least one of fever, chills, fatigue, headache, and/or new muscle or joint pains) or dissemination of the EM rash to multiple skin locations. Controls (n = 26) were enrolled from the same physician practice as cases. Two-tier serological Lyme disease testing was performed on clinical Lyme patients by a clinical reference laboratory (Quest Diagnostics) at the first visit and at 3 weeks, following a standard 3-week course of doxycycline treatment. Patients found to be Lyme seropositive at the first visit did not get repeat testing. Seropositivity was assessed according to established CDC criteria[23], including the requirement that patients have had symptoms for less than or equal to 30 days for Lyme diagnosis by positive ELISA and IgM testing. All controls were required to have a negative Lyme serologic test and no clinical history of Lyme disease to be enrolled in the study. Samples from all Lyme disease patients and controls were collected in Maryland, USA, an area highly endemic for Lyme disease.
PBMC samples from 57 patients diagnosed with other infections were collected at the UCSF, and samples from 22 controls (asymptomatic blood donors) were collected at the Blood Systems Research Institute in San Francisco, California. Patients with other infections were diagnosed with either bacteremia (n = 21), caused by Enterococcus faecium, Escherichia coli, Klebsiella pneumoniae, Staphylococcus aureus, Staphylococcus epidermidis, or Streptococcus pneumoniae by standard plate culture, or influenza (n = 36) by positive RT-PCR testing (Luminex NxTAG Respiratory Pathogen Panel). PBMC samples from 19 adults (9 patients diagnosed with tuberculosis using an interferon-gamma release assay [Oxford Immunotec T-SPOT.TB] and 10 uninfected controls) were collected at the British Columbia Centre for Disease Control in Vancouver, Canada. PBMCs were isolated from freshly collected whole blood in EDTA tubes (kept at 4 °C for <24 h) using Ficoll (Ficoll-Paque Plus, GE Healthcare) and total RNA was extracted from 10^7 PBMCs using TRIzol reagent (Life Technologies).

Transcriptome sequencing

Messenger RNA was isolated with the Oligotex mRNA mini kit (Qiagen). The Scriptseq RNA-Seq library preparation kit (Epicentre) was used to generate the RNA-Seq libraries according to the manufacturer’s protocol. Libraries were sequenced as 100 bp paired-end reads on a HiSeq 2000 instrument (Illumina). Samples were processed in two batches (Fig. 1). Set 1 corresponds to samples from 28 Lyme disease patients and 13 matched control samples as previously described[20]. Set 2 corresponds to samples from 13 new Lyme disease and 6 matched control samples prepared and sequenced alongside samples from 6 influenza and 6 bacteremia patients. One sample was not included in the pooled analysis due to insufficient read counts.
Fig. 1

Flowchart of the approach used to develop and validate a 31-gene Lyme disease classifier panel for identification of early Lyme disease.

DEGs differentially expressed genes, KNNXV k-nearest neighbor cross-validation, TREx targeted RNA expression sequencing.


Transcriptome RNA-Seq data analyses

Paired-end reads were mapped to the human genome (hg19), followed by annotation of exons and calculation of FPKM (fragments per kilobase of exon per million fragments mapped) values for all 25,278 expressed genes with version 2 of the TopHat/Cufflinks pipeline[24]. Differential expression of genes was calculated using the variance modeling at the observational level (voom) transformation[25], which applies precision weights to the count matrix, followed by linear modeling with the limma package. Genes were considered to be differentially expressed when the change was ≥1.5-fold, the p value was ≤0.05, and the adjusted p value (or false-discovery rate, FDR) was ≤0.001[26].
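
The three cutoffs above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study (which ran in R): the gene names and statistics are hypothetical, the Benjamini-Hochberg procedure is assumed as the FDR adjustment because the text does not name the method, and the fold-change cutoff is assumed to apply in both directions on the log2 scale.

```python
import math

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values (assumed FDR method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_degs(stats, min_fold=1.5, max_p=0.05, max_fdr=0.001):
    """stats maps gene -> (log2 fold change, raw p value)."""
    names = list(stats)
    fdr = bh_adjust([stats[g][1] for g in names])
    cutoff = math.log2(min_fold)  # 1.5-fold up or down
    return [
        g for g, q in zip(names, fdr)
        if abs(stats[g][0]) >= cutoff and stats[g][1] <= max_p and q <= max_fdr
    ]

# Hypothetical toy input, not data from the study.
toy = {
    "GENE_A": (1.2, 0.0001),
    "GENE_B": (2.0, 0.04),
    "GENE_C": (0.3, 0.03),
    "GENE_D": (-1.0, 0.5),
}
degs = select_degs(toy)
print(degs)
```

In this toy input only GENE_A passes all three filters; GENE_B has a large fold change and a nominal p value below 0.05 but fails the much stricter FDR cutoff.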

Targeted RNA sequencing

Quantitative analysis of a custom panel of transcripts of interest was performed using a targeted RNA enrichment sequencing approach that incorporated an anchored multiplex PCR technique. PBMC samples (~1 million cells) were extracted using the Zymo DirectZol RNA Miniprep Kit with on-column DNase following the manufacturer’s instructions. Reverse transcription was performed using the Illumina TruSeq Targeted RNA Expression Kit on 50 ng of RNA according to the manufacturer’s instructions. A custom panel of oligonucleotides representing the genes of interest was designed and ordered using the Illumina DesignStudio platform. This pool of oligonucleotides, each attached to a small RNA sequencing primer (smRNA) binding site, was used to hybridize, extend, and ligate the second strand of cDNA from targeted genes of interest. Thirty-five cycles of amplification were then performed using primers with a complementary smRNA sequence. The resulting libraries were sequenced on an Illumina MiSeq to a depth of ~2500 reads per sample per gene. Expression counts per sample per gene were calculated on the instrument using the MiSeq Reporter targeted RNA workflow software (revision C). Briefly, following demultiplexing and FASTQ file generation, reads from each sample were normalized in R and then aligned locally against references corresponding to targeted regions of interest using a banded Smith–Waterman algorithm[27].
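
For readers unfamiliar with the alignment step, the following Python sketch shows the general idea of a banded Smith–Waterman local alignment score. It is not the MiSeq Reporter implementation: the scoring values (match 2, mismatch -1, linear gap -2), the band definition (cells with |i - j| ≤ band), and the example sequences are all assumptions made for illustration.

```python
def banded_smith_waterman(a, b, band, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a vs b, filling only cells with |i - j| <= band.

    Cells outside the band stay at 0, which is equivalent to starting a
    fresh local alignment there.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        lo = max(1, i - band)
        hi = min(len(b), i + band)
        for j in range(lo, hi + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best

read = "ACGT"           # hypothetical read
reference = "TTACGTTT"  # hypothetical target region
wide = banded_smith_waterman(read, reference, band=2)
narrow = banded_smith_waterman(read, reference, band=0)
print(wide, narrow)
```

The band trades sensitivity for speed: with band=2 the read's exact match, which sits two positions off the main diagonal, is found, while band=0 restricts the search to the main diagonal and misses it.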

Machine learning

The k-nearest neighbor classification with leave-one-out cross-validation algorithm (KNNXV)[8], as implemented on GenePattern[28], was used on the set of DEGs identified by RNA-Seq-based transcriptome profiling, using a k of 3, signal-to-noise ratio feature selection, and Euclidean distance, and iteratively decreasing the number of features until maximum accuracy was reached. Class prediction performance on targeted RNA sequencing read count results was tested with the receiver-operating characteristic (ROC) metric using the glmnet[29] and caret[30] packages in R for ten different ML methods at default parameters: classification and regression trees (“rpart” method), generalized linear models (“glmnet” method), linear discriminant analysis (“lda” method), k-nearest neighbor (“knn” method), random forest (“rf” method), eXtreme Gradient Boosting (“xgbTree” method), neural networks (“nnet” method), linear and radial support vector machines (“svmLinear” and “svmRadial” methods), and nearest shrunken centroid (“pam” method). Subsequent feature selection and fitting of the glmnet or generalized linear models were performed using 10-fold cross-validation with regularization using lasso (least absolute shrinkage and selection operator) penalty and lambda (λ) parameter. The value of lambda that provided the minimum mean cross-validated error was used to determine the optimal set of genes.
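
The first step of this procedure (k = 3, Euclidean distance, signal-to-noise feature ranking, leave-one-out cross-validation) can be sketched in plain Python as below. This is an illustration rather than the GenePattern module: the signal-to-noise score is assumed to be the difference in class means divided by the sum of class standard deviations, features are re-ranked inside each fold, and the expression matrix is synthetic.

```python
import math
import random
from statistics import mean, pstdev

def s2n(a, b):
    """Signal-to-noise score: mean difference over summed standard deviations."""
    denom = pstdev(a) + pstdev(b)
    return (mean(a) - mean(b)) / denom if denom else 0.0

def top_features(X, y, n_features):
    """Indices of the n_features columns with the largest absolute score."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(s2n(pos, neg)), j))
    return [j for _, j in sorted(scores, reverse=True)[:n_features]]

def knn_predict(train_X, train_y, x, features, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    dists = sorted(
        (math.dist([row[j] for j in features], [x[j] for j in features]), label)
        for row, label in zip(train_X, train_y)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if votes * 2 > k else 0

def loocv_accuracy(X, y, n_features, k=3):
    """Hold out each sample once; rank features on the remaining samples only."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = top_features(train_X, train_y, n_features)
        correct += knn_predict(train_X, train_y, X[i], feats, k) == y[i]
    return correct / len(X)

# Synthetic data: 10 samples per class, 6 genes, only genes 0 and 1 informative.
random.seed(0)
y = [1] * 10 + [0] * 10
X = [[random.gauss(10 * label if j < 2 else 0, 0.5 if j < 2 else 1.0)
      for j in range(6)] for label in y]

# Iteratively decrease the number of features and report accuracy at each step.
for n in range(6, 0, -1):
    print(n, loocv_accuracy(X, y, n))
```

Ranking features inside each fold, rather than once on the full data, keeps the held-out sample from influencing which genes are selected and so avoids an optimistic accuracy estimate.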

Statistical methods

The performance of the classifier was evaluated with the use of ROC curves, calculation of area under the curve (AUC)[31], and estimates of sensitivity, specificity, positive predictive value, and negative predictive value. A Mann–Whitney nonparametric test was used for the analysis of continuous variables, and Fisher’s exact test was used for categorical variables. All confidence intervals were reported as two-sided binomial 95% confidence intervals. Statistical analysis was performed, and plots were generated using R software, version 4.0.3 (R Project for Statistical Computing).
Table 1

Performance characteristics of the 31-gene Lyme disease classifier.

Study subjects                                                 No. of samples tested    No. classified as Lyme (%)
Training set (a)                                               137
Serologically confirmed Lyme disease (b)                       44                       39 (89)
  Seropositive at time of presentation                         26                       23 (88)
  Seropositive at 3 weeks                                      18                       16 (89)
Controls                                                       93                       12 (13)
  Uninfected                                                   57                       9 (16)
  Bacteremia                                                   9                        2 (22)
  Influenza                                                    21                       1 (5)
  Tuberculosis                                                 6                        0 (0)
Test set (c)                                                   63
Serologically confirmed Lyme disease (b)                       16                       15 (94)
  Seropositive at time of presentation (early seroconversion)  10                       10 (100)
  Seropositive at 3 weeks (late seroconversion)                6                        5 (83)
Seronegative Lyme disease                                      14                       12 (86)
Controls                                                       37                       0 (0)
  Uninfected                                                   15                       0 (0)
  Bacteremia                                                   6                        0 (0)
  Influenza                                                    9                        0 (0)
  Tuberculosis                                                 3                        0 (0)
Longitudinally collected samples
Lyme disease 0 week                                            16                       14 (88)
Lyme disease 3 weeks post diagnosis                            17                       13 (76)
Lyme disease 6 months post diagnosis                           10                       3 (30)

(a) Sensitivity 95.5% (84.1–100%), specificity 86.0% (77.4–98.98%), accuracy 87.6% (80.9–92.6%), area under the curve (AUC) 97.2% (95.0–99.3%).

(b) Positive by two-tiered Lyme antibody testing.

(c) Sensitivity 90.0% (83.3–100%), specificity 100% (90.0–100%), accuracy 95.2% (86.7–99.0%), AUC 98.2% (95.7–100%).
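
As a worked check of how the test-set point estimates in Table 1 follow from the counts, the Python sketch below recomputes sensitivity, specificity, and accuracy, and derives a two-sided exact binomial 95% interval for accuracy. Two assumptions are made: the exact (Clopper-Pearson) method is used for the interval, since the text says only "two-sided binomial", and the control count is taken as the sum of the four control subgroups (15 + 6 + 9 + 3 = 33), which is the count consistent with the 63-sample test set.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson(x, n, alpha=0.05):
    """Two-sided exact binomial interval, found by bisection on p."""
    def bisect(func):  # func must be increasing in p
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if func(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if x == 0 else bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    upper = 1.0 if x == n else bisect(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper

# Test-set counts read from Table 1.
tp, fn = 15 + 12, (16 - 15) + (14 - 12)  # Lyme samples: seropositive + seronegative
tn, fp = 15 + 6 + 9 + 3, 0               # controls summed over subgroups (assumption)

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
total = tp + fn + tn + fp
accuracy = (tp + tn) / total
acc_lo, acc_hi = clopper_pearson(tp + tn, total)

print(f"sensitivity {sensitivity:.1%}, specificity {specificity:.1%}")
print(f"accuracy {accuracy:.1%} ({acc_lo:.1%} to {acc_hi:.1%})")
```

Under these assumptions the point estimates match the footnoted 90.0%, 100%, and 95.2%, and the accuracy interval should land close to the reported 86.7–99.0%.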

  42 in total

1.  GenePattern 2.0.

Authors:  Michael Reich; Ted Liefeld; Joshua Gould; Jim Lerner; Pablo Tamayo; Jill P Mesirov
Journal:  Nat Genet       Date:  2006-05       Impact factor: 38.330

2.  Advances in Serodiagnostic Testing for Lyme Disease Are at Hand.

Authors:  John A Branda; Barbara A Body; Jeff Boyle; Bernard M Branson; Raymond J Dattwyler; Erol Fikrig; Noel J Gerald; Maria Gomes-Solecki; Martin Kintrup; Michel Ledizet; Andrew E Levin; Michael Lewinski; Lance A Liotta; Adriana Marques; Paul S Mead; Emmanuel F Mongodin; Segaran Pillai; Prasad Rao; William H Robinson; Kristian M Roth; Martin E Schriefer; Thomas Slezak; Jessica Snyder; Allen C Steere; Jan Witkowski; Susan J Wong; Steven E Schutzer
Journal:  Clin Infect Dis       Date:  2018-03-19       Impact factor: 9.079

3.  Diagnosis of childhood tuberculosis and host RNA expression in Africa.

Authors:  Suzanne T Anderson; Myrsini Kaforou; Andrew J Brent; Victoria J Wright; Lachlan J Coin; Robert S Heyderman; Michael Levin; Brian Eley; Claire M Banwell; George Chagaluka; Amelia C Crampin; Hazel M Dockrell; Neil French; Melissa S Hamilton; Martin L Hibberd; Florian Kern; Paul R Langford; Ling Ling; Rachel Mlotha; Tom H M Ottenhoff; Sandy Pienaar; Vashini Pillay; J Anthony G Scott; Hemed Twahir; Robert J Wilkinson
Journal:  N Engl J Med       Date:  2014-05-01       Impact factor: 91.245

Review 4.  Diagnosis of lyme borreliosis.

Authors:  Maria E Aguero-Rosenfeld; Guiqing Wang; Ira Schwartz; Gary P Wormser
Journal:  Clin Microbiol Rev       Date:  2005-07       Impact factor: 26.132

5.  Regularization Paths for Generalized Linear Models via Coordinate Descent.

Authors:  Jerome Friedman; Trevor Hastie; Rob Tibshirani
Journal:  J Stat Softw       Date:  2010       Impact factor: 6.440

Review 6.  Lyme disease: diagnostic issues and controversies.

Authors:  Maria E Aguero-Rosenfeld; Gary P Wormser
Journal:  Expert Rev Mol Diagn       Date:  2014-12-08       Impact factor: 5.225

Review 7.  RNA-Seq: a revolutionary tool for transcriptomics.

Authors:  Zhong Wang; Mark Gerstein; Michael Snyder
Journal:  Nat Rev Genet       Date:  2009-01       Impact factor: 53.242

8.  Phagocytosis of Borrelia burgdorferi, the Lyme disease spirochete, potentiates innate immune activation and induces apoptosis in human monocytes.

Authors:  Adriana R Cruz; Meagan W Moore; Carson J La Vake; Christian H Eggers; Juan C Salazar; Justin D Radolf
Journal:  Infect Immun       Date:  2007-10-15       Impact factor: 3.441

9.  Direct molecular detection and genotyping of Borrelia burgdorferi from whole blood of patients with early Lyme disease.

Authors:  Mark W Eshoo; Christopher C Crowder; Alison W Rebman; Megan A Rounds; Heather E Matthews; John M Picuri; Mark J Soloski; David J Ecker; Steven E Schutzer; John N Aucott
Journal:  PLoS One       Date:  2012-05-08       Impact factor: 3.240

10.  Notes from the field: update on Lyme carditis, groups at high risk, and frequency of associated sudden cardiac death--United States.

Authors:  Joseph D Forrester; Jonathan Meiman; Jocelyn Mullins; Randall Nelson; Starr-Hope Ertel; Matt Cartter; Catherine M Brown; Virginia Lijewski; Elizabeth Schiffman; David Neitzel; Elizabeth R Daly; Abigail A Mathewson; Whitney Howe; Lindsay A Lowe; Natalie R Kratz; Shereen Semple; P Bryon Backenson; Jennifer L White; Phillip M Kurpiel; Russell Rockwell; Kirsten Waller; Diep Hoang Johnson; Christopher Steward; Brigid Batten; Dianna Blau; Marlene DeLeon-Carnes; Clifton Drew; Atis Muehlenbachs; Jana Ritter; Jeanine Sanders; Sherif R Zaki; Claudia Molins; Martin Schriefer; Anna Perea; Kiersten Kugeler; Christina Nelson; Alison Hinckley; Paul Mead
Journal:  MMWR Morb Mortal Wkly Rep       Date:  2014-10-31       Impact factor: 17.586
