Soohong Kim, Joachim De Jonghe, Anthony B Kulesa, David Feldman, Tommi Vatanen, Roby P Bhattacharyya, Brittany Berdy, James Gomez, Jill Nolan, Slava Epstein, Paul C Blainey.
Abstract
Low-cost shotgun DNA sequencing is transforming the microbial sciences. Sequencing instruments are so effective that sample preparation is now the key limiting factor. Here, we introduce a microfluidic sample preparation platform that integrates the key steps in cells-to-sequence-library sample preparation for up to 96 samples and reduces DNA input requirements 100-fold while maintaining or improving data quality. The general-purpose microarchitecture we demonstrate supports workflows with arbitrary numbers of reaction and clean-up or capture steps. By reducing the sample quantity requirements, we enabled low-input (∼10,000 cells) whole-genome shotgun (WGS) sequencing of Mycobacterium tuberculosis and soil micro-colonies with superior results. We also leveraged the enhanced throughput to sequence ∼400 clinical Pseudomonas aeruginosa libraries and demonstrate excellent single-nucleotide polymorphism detection performance that explained phenotypically observed antibiotic resistance. Fully integrated lab-on-chip sample preparation overcomes technical barriers to enable broader deployment of genomics across many basic research and translational applications.
Year: 2017 PMID: 28128213 PMCID: PMC5290157 DOI: 10.1038/ncomms13919
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Figure 1. Genomic sample preparation device operation and performance.
(a) A photograph of the 96 × 36 nl microfluidic sample preparation device filled with food colouring to highlight features. Dime indicates scale (white bar indicates 1 cm). Inset: the reactor (red), filter (yellow) and reservoir (green) units. Black and red arrows designate reagent input ports and sample input/output ports, respectively. (b) The microfluidic sample preparation workflows for biomass input (extreme left) and gDNA input (top). (c–e) Estimated sequence library complexity (units of fold-coverage) and mapping rate for (c) the clinical P. aeruginosa isolate gDNA samples mapped to PA-14, (d) low-input E. coli biomass samples mapped to BL21-DE3 and (e) low-input M. tuberculosis biomass samples mapped to OFXR-14. In (c), eight of the 384 individual replicates were lost completely during the barcoding step, leaving two replicates for each of the affected isolates; these PCR drop-outs are attributed to faulty primers, because these eight primer sets were subsequently observed to fail consistently across multiple samples. (f) Comparison of microfluidic and optimized bench-top sample preparation from low-input soil micro-colony biomass (left, library complexity; right, human contamination). The library complexities were calculated using Picard tools (http://broadinstitute.github.io/picard/) and the human DNA read fraction was determined using deconseq (http://deconseq.sourceforge.net/).
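As a rough illustration of how a library-complexity figure of this kind can be derived, the sketch below solves the Lander-Waterman relation C/X = 1 − exp(−N/X), which Picard's EstimateLibraryComplexity documents as its model (N read pairs sequenced, C distinct read pairs observed, X distinct molecules in the library). The function names, the bisection solver and the conversion to fold-coverage (library size × bases per fragment ÷ genome size) are assumptions made for illustration; they are not taken from the authors' pipeline.

```python
import math


def estimate_library_size(read_pairs, unique_read_pairs):
    """Solve C = X * (1 - exp(-N / X)) for X by bisection.

    N = read_pairs, C = unique_read_pairs, X = estimated number of
    distinct molecules in the library (Lander-Waterman model).
    """
    n, c = float(read_pairs), float(unique_read_pairs)
    if not 0 < c < n:
        raise ValueError("need 0 < unique_read_pairs < read_pairs")

    def expected_unique(x):
        # Expected number of distinct pairs seen after n draws from x molecules.
        return x * -math.expm1(-n / x)

    # expected_unique is increasing in x, so bracket the root and bisect.
    lo = hi = c
    while expected_unique(hi) < c:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if expected_unique(mid) < c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def to_fold_coverage(library_size, bases_per_fragment, genome_size):
    """Hypothetical conversion of library size to units of fold-coverage."""
    return library_size * bases_per_fragment / genome_size
```

Under this model, a library in which almost every read pair is unique yields a very large estimate of X, which is why heavily duplicated low-input libraries show low complexity.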
Figure 2. Pseudomonas aeruginosa clinical isolate sequencing and error correction.
(a) Four types of samples were collected from six different subjects: (1) bronchial alveolar lavage, (2) sputum, (3) thoracostomy and (4) urine. Samples were streaked on primary plates and on two control plates, which were expanded from single colonies. (b) Concordant SNP call sets are defined as call sets that agree among all technical replicate libraries. (c) The fraction of concordant and discordant SNPs among technical replicates (n=3) of one isolate from each subject (P01-4, P02-26, P03-58, P04-72, P05-105 and P06-115, where the notation P01-4 means patient subject 1, sample 4) is plotted with decreasing AF stringency (minimum read depth fixed at 6; minimum mapping quality fixed at 45; excluding indels; see ‘Methods’ section). Inset: thresholding at an AF value of 0.82 (circles) balances the maximization of concordant SNPs and minimization of discordant SNPs and was used for variant calling across all our samples. (d) Receiver operating characteristic (ROC) plot comparing the accuracy and sensitivity of SNP calling among low-input libraries of sample P01-4 made in the device and on the bench-top, from individual replicates and from pooled triplicate replicates, at different average coverage levels. With the AF threshold value at 0.82 (red circles), there is no meaningful difference between microfluidic and bench-top libraries, or between single replicate and pooled triplicates at equal coverage. Due to differences in gene content and genomic structure between the reference and the subject P01 strain, 7% of the reference genome had no coverage in this analysis and 1.9% of the remaining sequence was masked due to poor average read mapping quality (MQ<45; Supplementary Note 9). (e) Homology tree constructed based on the number of inter-subject SNPs. (f) Heat map of SNPs between each sample, grouped by the patient subject. Asterisks indicate control plate isolates that were expanded from single colonies. Numbers represent the median number of SNPs in each subject pair block.
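The concordance analysis in panels (b) and (c) can be sketched as follows. This is a minimal illustration assuming each replicate's calls are available as simple records; the `Call` type, the function names and the choice of inclusive thresholds are hypothetical, and only the cut-off values (AF 0.82, read depth 6, mapping quality 45, indels excluded) come from the caption.

```python
from typing import NamedTuple


class Call(NamedTuple):
    # Hypothetical per-site variant record for one replicate library.
    pos: int
    ref: str
    alt: str
    depth: int
    mapq: float
    af: float


def passing_snps(calls, min_af=0.82, min_depth=6, min_mapq=45):
    """Return the set of SNPs (indels excluded) that pass all filters."""
    return {
        (c.pos, c.ref, c.alt)
        for c in calls
        if len(c.ref) == 1 and len(c.alt) == 1  # single-base change only
        and c.depth >= min_depth
        and c.mapq >= min_mapq
        and c.af >= min_af
    }


def split_concordance(replicates, **thresholds):
    """Split filtered SNPs into (concordant, discordant) sets.

    Concordant: called in every replicate. Discordant: called in at
    least one replicate but not in all of them.
    """
    if not replicates:
        raise ValueError("need at least one replicate")
    sets = [passing_snps(r, **thresholds) for r in replicates]
    concordant = set.intersection(*sets)
    discordant = set().union(*sets) - concordant
    return concordant, discordant
```

Calling `split_concordance` repeatedly with decreasing `min_af` values and recording the sizes of the two sets gives a curve of the kind plotted in (c), from which a threshold balancing the two counts can be read off.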
Figure 3. Functional genomics from low-input microbial samples.
(a) Antibiotic susceptibility phenotyping and genotyping. Antibiotic susceptibility was tested on a randomly sampled subset of isolates from each subject by the disc diffusion susceptibility assay. The drugs tested are IPM, CIP and CAZ (10 μg imipenem, 5 μg ciprofloxacin and 30 μg ceftazidime, respectively). Raw images of one plate from each subject that was analysed (left). Plot shows inhibition zone radius for each antibiotic and known antibiotic resistance SNPs detected by WGS, grouped by subjects (right). Samples with small radii, close to the Clinical & Laboratory Standards Institute (CLSI) break point (red line), are resistant to the specific antibiotic (grey points). Radii greater than the green line indicate samples identified as susceptible by the CLSI break point (blue points). The first ten samples from subject P05 were measured on a separate day from the remaining P05 samples; the apparent difference in susceptibility between these samples and the remaining P05 samples is most likely due to a systematic difference in the assay (possibly image contrast) on the second day or to degradation of the drug sample between the two measurement sets (we classify all the P05 isolates tested as susceptible to ciprofloxacin). (b) Phylotyping of the soil micro-colonies and (c) secondary metabolite class prediction using AntiSMASH analysis of de novo assembled genomic contigs. The clustering of samples in the phylotyping results is reflected in the secondary metabolite predictions. The red numbers in (b) indicate the number of SNPs between the strains spanning each value. The heat map values in (c) represent empirical similarity scores (ref. 11).