Treatment of acute lymphoblastic leukemia (ALL) necessitates continuous risk assessment of leukemic disease burden and infections that arise in the setting of immunosuppression. This study was performed to assess the feasibility of a hybrid capture next-generation sequencing panel to longitudinally measure molecular leukemic disease clearance and microbial species abundance in 20 pediatric patients with ALL throughout induction chemotherapy. This proof of concept helps establish a technical and conceptual framework that we anticipate will be expanded and applied to additional patients with leukemia, as well as extended to additional cancer types. Molecular monitoring can help accelerate the attainment of insights into the temporal biology of host-microbe-leukemia interactions, including how those changes correlate with and alter anticancer therapy efficacy. We also anticipate that fewer invasive bone marrow examinations will be required, as these methods improve with standardization and are validated for clinical use.
Cell-free DNA (cfDNA) sequencing has shown promise as a noninvasive diagnostic tool for evaluating maternal and child health, detecting cancer earlier and monitoring treatment response, as well as identifying infectious microbes with less bias. Acute lymphoblastic leukemia (ALL) is the most common pediatric cancer and, despite marked improvements in outcomes over the past 50 years, remains one of the leading causes of pediatric cancer–associated morbidity and mortality. Further, because of the immunosuppressive and myelosuppressive consequences of current cytotoxic therapies, infectious complications remain the major cause of treatment-associated mortality.
The presence of cfDNA in plasma has been known for decades. Predicated upon advances in DNA sequencing technologies, circulating tumor DNA (ctDNA) is currently being evaluated as a biomarker for the presymptomatic detection of malignancies, as well as a tool to monitor response and guide treatment. Most of the work to date has been performed on solid tumors where ctDNA monitoring has increasingly demonstrated diagnostic and prognostic value that has the potential to improve the outcomes of patients with cancer. Cell-free RNA sequencing and nucleosome imprinting have revealed that hematopoietic cells are the largest contributors to circulating nucleic acid pools in healthy individuals, suggesting that ctDNA monitoring may be even more sensitive in the detection of tumors of hematopoietic origin when compared to solid tumors. Large studies of ctDNA derived from lymphomas have demonstrated improved risk assessment when used to monitor minimal residual disease (MRD). Previous groups have applied next-generation sequencing (NGS) on cellular pediatric leukemia samples to characterize mutational landscapes at diagnosis and relapse, as well as to measure immunoglobulin clonality as a sensitive biomarker of residual disease. Limited cfDNA sequencing studies have been performed on acute leukemias, and, to date, we are unaware of studies that have evaluated ctDNA in pediatric patients with leukemia.
The identification and quantification of nonhuman DNA in plasma have also shown promise as a strategy for diagnosing infectious diseases, as well as monitoring the human virome. Studies have shown that microbial cfDNA (mcfDNA) sequencing has similar sensitivity to standard methods for common causes of bloodstream infection, with an increased capacity to more rapidly detect rare species, as well as pathogens that are otherwise difficult to detect through conventional cell culture methods. For example, shotgun mcfDNA sequencing has been recently shown to predict impending infections in pediatric patients with cancer. Additional improvements in the sensitivity of mcfDNA sequencing could further establish the clinical utility of microbial sequencing for diagnosing and predicting the presence of infectious complications. One strategy for increasing the sensitivity of mcfDNA detection is to enrich for microbe-specific sequences.
In the present study, we sought to develop an approach that would simultaneously monitor the two leading causes of mortality in pediatric patients with ALL: (i) disease persistence or relapse and (ii) infectious microbes in the setting of immunosuppression. To accomplish this, we developed and evaluated a novel capture-based ultradeep sequencing strategy that enriched for leukemia-specific mutations and potential microbes in the plasma of patients with ALL as they underwent induction chemotherapy. We found that we can detect and monitor leukemia in most patients and identified specific patterns of ctDNA dynamics as patients underwent treatment. Further, by comparing cellular to ctDNA variant allelic fractions (VAFs), we found that performing invasive bone marrow examinations at diagnosis to isolate malignant cells may be unnecessary for analyzing small genomic variants, as similar variants are detected at comparable frequencies in both sample types. Last, we found that the human virome is dynamic in patients with ALL during the first 6 weeks of treatment, with evidence for widespread reactivation of herpes and polyoma viruses.
RESULTS
Study overview
A total of 168 samples, prospectively collected from 20 newly diagnosed patients with ALL over a 6-week course of induction therapy on the Total Therapy XVII clinical trial, underwent sequencing. Both cellular and plasma (cell-free) fractions were collected from the peripheral blood (PB) and bone marrow to enable comparisons of distinct potential reservoirs of leukemia-associated mutations and infectious microbial DNA (Fig. 1A). Panel coverage (Fig. 1B) was uniform over the target sequence, with 100% of the target region covered at a mean depth of 12,884 reads. Most of the reads had insert sizes consistent with cfDNA wrapped around a single nucleosome, rather than DNA that had been released from tumor cells during sample processing (fig. S9).
Fig. 1.
Specimen workflow, quality control metrics, and cfDNA yield throughout induction therapy.
(A) Workflow for contemporaneous and quantitative profiling of cancer variants and microbial species. Blood draw is followed by centrifugation for separation of cellular and acellular compartments. Probes spanning the target regions of the somatic leukemia panel and microbial species hybridize to targets and are enriched through magnetic separation, followed by NGS. (B) High depth of coverage is uniform across the panel. (C) cfDNA yield from PB uptrends a few days after starting chemotherapy during tumor lysis, whereas yield from noncellular bone marrow (NCBM) increases during later phases of induction with normal blood cell count recovery.
At diagnosis, average plasma cfDNA yield was similar to noncellular bone marrow (NCBM) cfDNA yield (mean, 85 and 75 ng, respectively). Plasma cfDNA yield increased during the first few days of induction chemotherapy, likely as a result of rapidly dying leukemic cells after beginning treatment. Similarly, NCBM cfDNA yield peaked at day 15, suggesting that the cells within the bone marrow were still dying at a higher rate and at a later time point than the leukemia cells in PB. Thereafter, cfDNA concentration remained around 10 ng/ml of blood throughout induction therapy, whereas NCBM cfDNA yield increased in the latter part of induction therapy (Fig. 1C), potentially reflecting the turnover of rejuvenating hematopoietic cells after induction therapy.
Patient dynamics in mutational VAF clearance during induction therapy
The custom panel–based ctDNA NGS identified one or more variants at diagnosis in 17 of 20 patients (85%) (Fig. 2A). Mutational clearance throughout induction chemotherapy was near universally observed. Each patient had an average of 2.3 detectable mutations (range, 0 to 5; SD, 1.3). Thirty-nine unique mutations were detected with only one instance of a shared mutation (KRAS p.G13D in cases 3 and 16). Distinct patterns of variant allele clearance were observed as patients underwent induction chemotherapy treatment. In general, VAFs across all compartments increased and then diminished throughout the course of induction therapy (Fig. 2B) coincident with pharmacokinetics of chemotherapy and rapid cell death. Coverage of those loci was maintained over this time period (Fig. 2C). Truncal mutations (defined as an allele frequency of >0.25 in both NCBM and PB) were found in 8 of 16 patients (71%) and included the genes NOTCH1, NRAS, KRAS, JAK2, NF1, and ADGRL2.
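The truncal definition used above (allele frequency of >0.25 in both NCBM and PB) can be written as a small predicate. The following is a minimal sketch for illustration only: the function names and the read counts in the usage note are hypothetical, and it is not the analysis code used in the study.

```python
# Illustrative sketch only; names are hypothetical and not from the study's pipeline.

TRUNCAL_VAF_THRESHOLD = 0.25  # from the text: allele frequency of >0.25 in both NCBM and PB


def variant_allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Return the fraction of reads at a locus that support the variant allele."""
    if total_reads <= 0:
        # No coverage at the locus: treat the variant as not detected.
        return 0.0
    return alt_reads / total_reads


def is_truncal(vaf_ncbm: float, vaf_pb: float,
               threshold: float = TRUNCAL_VAF_THRESHOLD) -> bool:
    """A variant is called truncal when its VAF strictly exceeds the threshold
    in both the noncellular bone marrow (NCBM) and peripheral blood (PB)."""
    return vaf_ncbm > threshold and vaf_pb > threshold
```

For example, a variant supported by 4,000 of 10,000 reads in NCBM (VAF 0.40) and 3,000 of 10,000 reads in PB (VAF 0.30) would be classified as truncal, whereas one at exactly 0.25 in either compartment would not, because the comparison is strict.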
Fig. 2.
Variants detected within patient cohort with longitudinal clearance over time.
(A) Number of unique variants detected per case. (B) Mutational clearance visualized as average VAF across all compartments throughout induction chemotherapy, with flow cytometry MRD overlaid quantitatively for days 8 (from PB), 15 [from bone marrow (BM)], and 42/EOI (from bone marrow). Mutational clearance was observed in most cases. In some instances (for example, case 17 on day 147), mutations were detectable in the blood that were not observed in the bone marrow, likely because of spatial heterogeneity. Note that cases 10 and 14 to 16 had MRD detectable by flow at end of induction (EOI). (C) Coverage across cellular bone marrow, cell-free bone marrow, and PB of selected variants known to be found in patients with ALL. (D) Correlations of variants within any mutated gene detected on days 1 and 15 across compartments for all patients. Cellular bone marrow was not available on day 15.
Comparison of cells, NCBM, PB cfDNA, and minimal residual disease
The concordance of variants detected across cellular, NCBM, and PB cfDNA was assessed (Fig. 2D). At diagnosis, PB VAFs correlated most strongly with VAFs detected in the cellular bone marrow (23 of 30 variants detected in both) with each compartment picking up a few unique variants (three and four for cellular bone marrow and PB cfDNA, respectively), suggesting that both compartments are capturing most of the same variant information (fig. S1). Approximately a third of variants were simultaneously detected in all three compartments. Three cases showed VAFs of >30% in the PB that were not detected in the bone marrow (case 20: FBXW7, HUWE1, and MTUS2; case 11: SMARCB4; case 4: THSD7A). On day 15, 68% of variants detected in the PB were also found in the cell-free bone marrow. Four patients (cases 10, 14, 15, and 16) had detectable MRD by flow at end of induction (EOI) (Fig. 2B). Several cases demonstrated detectable variants despite negative EOI MRD by flow. Conversely, one patient (case 10) had persistent flow MRD at EOI (day 42) without any detectable mutations thereafter, suggesting that a subclone not captured by the sequencing panel persisted through treatment.
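The compartment comparison at diagnosis amounts to set operations over the variants called in each sample type. The sketch below is a hedged illustration using synthetic variant identifiers (the labels `v0` to `v29` are placeholders, not variants from this cohort); it reproduces only the counts reported in the text (23 shared, 3 unique to cellular bone marrow, 4 unique to PB cfDNA, 30 in total).

```python
# Illustrative sketch only; variant identifiers are synthetic placeholders.

def compartment_concordance(cellular_bm: set, pb_cfdna: set) -> dict:
    """Count variants shared between two compartments and unique to each."""
    return {
        "shared": len(cellular_bm & pb_cfdna),
        "cellular_bm_only": len(cellular_bm - pb_cfdna),
        "pb_cfdna_only": len(pb_cfdna - cellular_bm),
        "total": len(cellular_bm | pb_cfdna),
    }


# Synthetic example matching the reported counts at diagnosis.
shared_ids = {f"v{i}" for i in range(23)}          # detected in both compartments
cellular_only_ids = {f"v{i}" for i in range(23, 26)}  # cellular bone marrow only
pb_only_ids = {f"v{i}" for i in range(26, 30)}        # PB cfDNA only

summary = compartment_concordance(shared_ids | cellular_only_ids,
                                  shared_ids | pb_only_ids)
print(summary)
```

In practice, a variant identifier would need to encode at least the case, genomic position, and alternate allele so that the same mutation is matched across compartments.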
Microbial invasion observed during induction
We also established technical feasibility for simultaneously detecting viral, bacterial, and fungal microbial species in both cells and cfDNA through our custom microbe panel and novel microbe detection pipeline (Fig. 3 and figs. S3 to S6). Notably, cases 1 to 3 did not undergo microbial enrichment as part of the workflow and therefore did not have any microbes detected at any time points. In addition, we evaluated mcfDNA from five healthy controls and five patients with acute myelogenous leukemia (AML) at diagnosis as controls. We found that the reads assigned to a specific microbe spanned both the probe target regions and surrounding areas of the genome (fig. S2). Fifteen of 20 patients had a fever during induction chemotherapy that triggered standard blood cultures and initiation of antibiotics (table S2). Only two patients had positive cultures (case 8, urine culture positive for Staphylococcus species; case 10, blood culture positive for Candida lusitaniae).
Fig. 3.
Microbial sequencing workflow and longitudinal detection of species across compartments.
(A) Workflow for identification of microbial species. Probes spanning the target regions of the microbial species hybridized to targets and were enriched, followed by NGS. Raw reads underwent concordance checks through Kraken and Blast identification to determine the valid number of reads per species. (B) Microbial (viral, bacterial, and fungal) abundance in cellular bone marrow, NCBM, and PB across induction days. Note that y axis for each case is fit to scale.
In general, we found the changes in mcfDNA to be dynamic and the microbes detected to be recurrent across induction therapy within a patient (Fig. 3B). When compared to healthy control samples, we found changes in the dominant microbe species detected in patients with ALL. For example, bacterial species from the Ralstonia genus were detected in all healthy samples but only one of the 126 ALL samples. Conversely, Cutibacterium was detected in most ALL samples but none of the healthy controls (fig. S4), which may represent the ability of the commensal skin bacteria to invade the human host. Bacillus species, a known cause of mortality of pediatric patients with ALL, were detected in a subset of patients. For fungal DNA, Malassezia species were detected in 60% of healthy controls and 13 of 17 (82%) of diagnostic ALL samples but none of the AML samples (fig. S5).
Complex virome dynamics in patients with ALL during induction therapy
Trending of viral species detection in PB during induction therapy demonstrated reproducibility in the species detected for each case (Fig. 4A); results were compared with healthy controls and five age-matched AML cases. All healthy control samples had a single viral species detected, human herpesvirus 6B (HHV6B), whereas only 1 of 17 (6%) ALL diagnostic samples had HHV6B detected. Nine of 17 (53%) diagnostic samples had at least one viral species detected. Nine of nine patients with a viral species detected at diagnosis had the same species detected in at least one subsequent sample, eight of nine patients had additional viral species detected during induction therapy, and six of eight patients with no viral species detected at diagnosis subsequently had a DNA virus detected (Fig. 3B). These results suggest that viral infection and reactivation are present at the time of diagnosis but that the relative species composition changes over time as patients receive chemotherapy with attendant immunosuppression (fig. S8). To look for evidence that these viral species could be contributing to leukemia pathogenesis and persistence, we used the same enrichment strategy to examine the diagnostic cellular bone marrow fractions. We found a small number of sequencing reads from cellular bone marrow samples mapping to HHV6B in three patients and human gammaherpesvirus 4 (Epstein-Barr virus) in one patient. However, both species were also found in healthy and AML samples, suggesting that they are not uniquely required for the persistence of ALL cells at the time of diagnosis.
Fig. 4.
Trending of PB viral counts with representation of microbial species across compartments.
(A) PB viral species detection during induction therapy. For graphic simplicity, only cases where same virus appears three times are plotted. See figs. S4 to S6 for all microbial species trends, without any filtering for repeat detection. Note that cases 1 to 3 are not depicted, as they did not undergo microbial enrichment as part of the workflow and therefore did not have any microbes detected at any time points. (B) Representation of microbial species detected according to compartment across all cases and time points. (C) Concordance of fungal, viral, and bacterial species detected across compartments on days 1 and 15.
The viral families detected in our ALL samples were Herpesviridae, Polyomaviridae, Parvoviridae, and Anelloviridae. Five different Herpesviridae were detected, and 16 of 17 patients had at least one viral species detected, suggesting that herpes virus reactivation is common in children undergoing induction therapy for ALL. Human alphaherpesvirus 1 (HSV1) showed the most persistent detection in positive patients. None of these patients developed clinical symptoms that motivated clinicians to test them for HSV1 (table S2). Analyses of white blood cell counts over induction found that patients with any of the same Herpesviridae or HSV1 specifically detected in multiple samples had higher absolute lymphocyte and neutrophil counts on day 8 of induction, suggesting that blood counts may be a surrogate marker of herpes virus reactivation (fig. S8). Polyomaviridae was the second most common family of viruses detected with 12 of 17 patients testing positive for at least one species, while the Parvoviridae and Anelloviridae showed more sporadic detection.
Common features of microbes in patients with ALL
We found several consistent microbe reservoir patterns within classes across patients. First, fungal and viral sequences tended to be found consistently in both the bone marrow and PB plasma, while bacteria tended to be compartment specific (Fig. 4, A and B). That pattern changed between days 1 and 15: PB identified more species on day 1, whereas the two compartments were usually concordant on day 15 (Fig. 4C). This suggests that the organisms are in locations outside of the bone marrow sampling site at diagnosis but are still detected in the PB and that mcfDNA has become more widespread by day 15 (fig. S6), coincident with increasing immunosuppression. All 17 patients except case 10 had negative viral and bacterial results when tested for clinical indications. This suggests that the microbes detected by cfDNA are below the detection threshold and/or sensitivity of current clinical microbial diagnostics.
DISCUSSION
We have shown that a custom ctDNA capture strategy can be used to simultaneously follow disease persistence and microbe dynamics in patients with ALL. We found that PB ctDNA persistence generally correlates with disease persistence detected by flow cytometry and that each compartment can detect disease when the other is negative. Flow cytometry and NGS thus capture complementary information, with ctDNA having the added benefit of potentially identifying variants that are known to have qualitative differences in outcomes, such as NT5C2 or PRPS1 mutations resulting in thiopurine resistance and disease recurrence. Our comparisons of NCBM and PB plasma with the leukemia cells at diagnosis found that PB plasma and leukemia cells most closely correlate at diagnosis but that each compartment is able to capture distinct variants. After starting therapy, the NCBM and PB ctDNA were more closely correlated. These findings suggest that performing invasive bone marrow exams may not add much additional information for genomic testing of small variants. Further work is needed to add the detection of gene fusions and copy number variants from ctDNA or circulating tumor RNA to develop global genomic testing noninvasively. Additional studies are also needed on larger populations of patients to further clarify these findings and establish clinical relevance.
We also found evidence for dynamic changes in invasive microbes in patients with ALL. We identified changes in the most common viral and bacterial species detected compared to healthy controls and found viral reactivation, especially of Herpesviridae and Polyomaviridae, as patients underwent treatment. These microbes were all present at subclinical levels and were not otherwise detected through standard methods such as blood and urine cultures, although they often demonstrated concordance across compartments (fig. S7). Further work is needed to clarify whether these microbial species are altering the clinical course of these patients, including by triggering fevers (and initiation of antibiotics), directly causing infections, or limiting the effectiveness of anticancer therapies by delaying administration of chemotherapy or altering pharmacokinetics. We did not find direct evidence for active microbe infections contributing to the transformation or persistence of the leukemia cells. However, it is possible that one or more of these microbes could be contributing to the initial leukemic transformation but become dispensable as the malignant populations gain the ability to autonomously expand.
Our approach had several important limitations. First, we used a targeted approach directed at known genes and microbes, without simultaneously assaying for copy number changes and fusions, as has been demonstrated previously. Using a broader exome panel would enable us to capture unknown mutations that enable leukemic cell persistence through cancer evolution or iatrogenic mutagenesis, likely increasing the number of mutations detected per patient and the sensitivity of our approach, albeit at higher cost if the same sensitivity is to be achieved. Similarly, our microbe approach focused on known species. Performing a less biased experimental or computational approach, such as the depletion of human DNA and/or assembly of unmapped reads, could uncover novel microbes. Moreover, microbial identification does not establish susceptibility to the chosen antimicrobial agents, which remains the most clinically helpful information. We anticipate that blood culture will remain invaluable for microbial susceptibility assays, whereas molecular methods for identification will become increasingly feasible as sequencing costs decrease and detection thresholds become more established.
Our targeted approach remains more cost effective than exome-based methods owing to biology-informed panel design. Future inclusion of RNA sequencing has the potential to determine structural variants otherwise detected by cytogenetic methods such as fluorescence in situ hybridization that inform both patient risk (and therefore treatment intensity) and, when certain fusions are present, inclusion of tyrosine kinase inhibitors such as dasatinib in the treatment regimen. Here, we acquired some information on the spatial dynamics of leukemic evolution and microbe invasion by comparing the bone marrow to the PB. Additional insights can be gained by accessing additional bone marrow sites and other tissue types such as the cerebrospinal fluid in these patients. Prospective clinical trials that include molecular monitoring methods will further establish how and when simultaneous tumor and microbial assessments inform personalized patient risk stratification, targeted therapy initiation, detection of mutations conferring innate and acquired resistance, and infectious complications.
We have demonstrated that a customized hybrid capture NGS panel can noninvasively measure the two leading causes of mortality in pediatric patients with ALL: leukemic disease burden and the invasion of infectious microbes. This proof of concept helps establish a technical and conceptual framework that we anticipate will be expanded and applied to different populations of patients with leukemias, as well as additional cancers. This will help accelerate the attainment of biological insights into the temporal biology of host-microbe-leukemia interactions, including how those changes correlate with and alter the efficacy of anticancer therapies. We also anticipate that fewer invasive bone marrow biopsies will be necessary, as these methods improve to include copy number and fusion analysis with standardization and validation for clinical use.
MATERIALS AND METHODS
Patient cohort, sample processing, and sequencing library preparation
Twenty patients (8 females and 12 males; average age, 7.9 years) enrolled on Total Therapy XVII for Newly Diagnosed Patients with Acute Lymphoblastic Leukemia and Lymphoma (NCT03117751) at St. Jude Children’s Research Hospital (table S1) had serial samples collected during routine blood draws and standard bone marrow biopsy procedures after approval of the Institutional Review Board and informed consent for research use. During induction, patients received chemotherapy consisting of prednisone, vincristine, daunorubicin, pegaspargase, and triple intrathecal therapy, followed by cyclophosphamide, cytarabine, and mercaptopurine. One patient with a tyrosine-protein kinase ABL1-class fusion received dasatinib (patient 17). The protocol was reviewed and approved by the St. Jude and Stanford Institutional Review Boards. The sample workflow (Fig. 1A) involved DNA extraction, followed by sequencing of libraries prepared through direct ligation and capture-based enrichment using a custom panel. Bone marrow samples were collected at the time of diagnosis, at day 15, at day 22 for patients with an MRD of ≥1% on day 15, and at EOI defined as either day 42 of induction or upon PB count recovery per protocol. PB samples (5 ml in EDTA) were also collected at diagnosis and days 8, 15, 22, and EOI. Samples were centrifuged at 400g for 10 min, and plasma was transferred to a separate tube. When cells were isolated, phosphate-buffered saline was then added to the remaining sample in a 1:2 ratio, followed by density centrifugation at 400g for 30 min with Ficoll-Plus (Amersham) per the manufacturer’s recommendations. Plasma and cells were then stored at −80°C until DNA was extracted. Plasma separation was done within 6 hours from blood collection to limit cellular apoptosis and necrosis. 
cfDNA was extracted from 650 μl to 1 ml (average, 950 μl) of PB plasma or NCBM plasma using Maxwell RSC ccfDNA Plasma Kit, according to the manufacturer’s protocol using 40 μl of elution buffer (QIAGEN). cfDNA was quantified with a Qubit dsDNA High Sensitivity Kit. For each sample, all extracted cfDNA (i.e., without size selection) was input into NEBNext UltraII DNA Library Preparation Kit for Illumina (New England Biolabs, #E7645) according to the manufacturer’s protocol except that the KAPA 10x primer mix was used for polymerase chain reaction enrichment of the adapter-ligated libraries and QIAGEN buffer EB was used for elution. The traditional shearing step in library preparation was skipped to minimize contamination from nonapoptotic sources and increase the ctDNA purity of the library. Adapter combinations were selected per TruSeq DNA Sample Preparation Pooling Guidelines for HiSeq (Illumina).
Sequencing
Following library preparation and quantification, sequencing was performed on the MiSeq instrument (Illumina) for assessment of the quality of the sequencing libraries before hybridization capture. For each case, libraries (equivalent to predefined time points) were pooled together in a 1:1 ratio into one capture. Whole-exome sequencing (WES) was performed with the xGen Exome Research Panel [Integrated DNA Technologies (IDT)] that consists of 429,826 probes spanning 39 Mb (19,396 genes) of the human genome covering 51 Mb of end-to-end tiled probe space. Sequencing was performed on the NextSeq instrument (Illumina).
Custom capture panel design
The design of our capture panel was informed by a priori sequencing of 600 patients with ALL, and the microbe capture panel was designed using probes specific to common human pathogens. For targeted hybridization-based capture of cfDNA, a fit-for-purpose gene panel covering 1668 exons (319.4 kb; IDT) was used to target previously defined somatic mutations in leukemia oncogene pathways, tumor suppressor genes, genes associated with relapse, genes associated with drug metabolism and glucocorticoid resistance, and genes identified from recent CRISPR-Cas9 screens. A deletion probe set was designed against regions surrounding heterozygous single-nucleotide polymorphisms (SNPs) in genes commonly deleted in ALL. Heterozygous SNPs were from dbSNP (build 150) and found in ≥1% of samples. Only SNPs in nonrepetitive regions with allele frequency of 0.3 to 0.5% were considered. Probes were 120 base pairs (bp) in size as per the manufacturer’s specifications. Genomic coordinates for gene segments were retrieved from Halper-Stromberg et al. Microbial probe sets were designed from molecular barcodes [16S/18S ribosomal RNA and fungal nuclear ribosomal internal transcribed spacer (ITS) regions] and species-specific genes from human pathogens common in patients with leukemia. Sequence data were derived from the National Center for Biotechnology Information (NCBI) gene database, OrthoDB, Virus Pathogen Database and Analysis Resource, Ribosomal Database Project, OrthoMCL Database, and EuPathDB databases.
Variant calling
For WES data analysis, FASTQ files were trimmed with Trimmomatic 0.35 to cut adapters and other manufacturer-specific sequences from reads and to remove low-quality bases. Alignment to the human reference genome GRCh38/hg38 was performed with Burrows-Wheeler Aligner (BWA) MEM 0.7.17. Duplicates were marked with Picard’s MarkDuplicates, and local realignment and base recalibration were achieved with GATK 4.1.4.1. Variant calling and filtering were performed following GATK’s best practices workflow for germline short variant discovery, using the HaplotypeCaller tool for calling and the VariantRecalibrator tool for filtering. Annotation using various databases was performed with ANNOVAR.
For targeted sequencing using the customized ALL gene panel, NGS reads were demultiplexed using Illumina’s bcl2fastq 2.20. Trimmomatic was again used to remove manufacturer-specific sequences, and alignment was performed using BWA MEM 0.7.17 against GRCh38/hg38. This deep sequencing data was analyzed using the error suppression tool CleanDeepSeq.
Microbial filtering
The targeted sequencing data contain reads from both the ALL and microbe panels. Manufacturer-specific sequences were trimmed from the targeted sequencing FASTQ files using Trimmomatic 0.35, and the reads were then sequentially aligned to the human reference genome GRCh38/hg38 and the Macaca mulatta (rhesus monkey) Mmul_10 reference genome using the stringent BWA ALN 0.7.17 algorithm. The remaining unaligned reads were collected for further microbe analysis. Microbes in the unaligned reads were first identified using the metagenomic classifier Kraken2 2.0.8, and reads supporting these microbes underwent a concordance check with the BLAST+ 2.10.0 algorithm (Fig. 3A). False-positive microbial reads were characterized by (i) fewer than 10 supporting reads for viruses or fewer than 50 supporting reads for bacteria and fungi, (ii) a BLAST+ match of less than 95% on both forward and reverse reads, (iii) forward and reverse reads that do not match the same Kraken species, or (iv) a contig length outside a 120- to 600-bp window.
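The four false-positive criteria above can be sketched as a simple rule-based filter. This is a minimal illustration and not the pipeline used in the study: the `MicrobeCall` record, its field names, and the reason labels are hypothetical, and the sketch assumes that the Kraken2 and BLAST+ results have already been summarized into one record per candidate microbe. Criterion (ii) is read here as requiring a match of at least 95% on each mate, so a call is flagged if either read falls below that value; a reading in which both mates must fall below 95% would need a different test.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from criteria (i) to (iv) in the text.
MIN_SUPPORTING_READS = {"virus": 10, "bacteria": 50, "fungi": 50}
MIN_BLAST_MATCH_PCT = 95.0
CONTIG_MIN_BP, CONTIG_MAX_BP = 120, 600


@dataclass
class MicrobeCall:
    """Hypothetical summary of one candidate microbe (field names are assumptions)."""
    kingdom: str              # "virus", "bacteria", or "fungi"
    supporting_reads: int     # reads assigned to this microbe
    blast_match_fwd: float    # BLAST+ percent match, forward read
    blast_match_rev: float    # BLAST+ percent match, reverse read
    kraken_species_fwd: str   # Kraken2 species label, forward read
    kraken_species_rev: str   # Kraken2 species label, reverse read
    contig_length: int        # contig length in bp


def false_positive_reasons(call: MicrobeCall) -> List[str]:
    """Return the criteria a call fails; an empty list means the call is kept."""
    reasons = []
    # (i) too few supporting reads for this kingdom
    if call.supporting_reads < MIN_SUPPORTING_READS[call.kingdom]:
        reasons.append("low_read_support")
    # (ii) assumed reading: flag when either mate is below the 95% match
    if min(call.blast_match_fwd, call.blast_match_rev) < MIN_BLAST_MATCH_PCT:
        reasons.append("low_blast_match")
    # (iii) mates disagree on the Kraken2 species
    if call.kraken_species_fwd != call.kraken_species_rev:
        reasons.append("mate_species_mismatch")
    # (iv) contig length outside the 120- to 600-bp window
    if not CONTIG_MIN_BP <= call.contig_length <= CONTIG_MAX_BP:
        reasons.append("contig_length_out_of_window")
    return reasons


def keep_true_positives(calls: List[MicrobeCall]) -> List[MicrobeCall]:
    """Keep only the calls that fail none of the four criteria."""
    return [c for c in calls if not false_positive_reasons(c)]
```

Returning the list of failed criteria, rather than a single boolean, makes it easy to tabulate why candidate microbes were discarded at each time point.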