Yan Wei Lim, Matthew Haynes, Mike Furlan, Charles E Robertson, J Kirk Harris, Forest Rohwer.
Abstract
The accessibility of high-throughput sequencing has revolutionized many fields of biology. To better understand host-associated viral and microbial communities, a comprehensive workflow for DNA and RNA extraction was developed. The workflow concurrently generates viral and microbial metagenomes, as well as metatranscriptomes, from a single sample for next-generation sequencing. Coupling these approaches provides an overview of both the taxonomic characteristics and the community-encoded functions. The presented methods use cystic fibrosis (CF) sputum, a problematic sample type because it is exceptionally viscous and contains high amounts of mucins, free neutrophil DNA, and other unknown contaminants. The protocols described here target these problems and successfully recover viral and microbial DNA with minimal human DNA contamination. To complement the metagenomics studies, a metatranscriptomics protocol was optimized to recover both microbial and host mRNA with relatively few ribosomal RNA (rRNA) sequences. An overview of the data characteristics is presented to serve as a reference for assessing the success of the methods. Additional CF sputum samples were also collected to (i) evaluate the consistency of the microbiome profiles across seven consecutive days within a single patient, and (ii) compare the metagenomic approach with 16S ribosomal RNA gene-based sequencing. The results showed that daily fluctuation of microbial profiles without antibiotic perturbation was minimal, and that the taxonomic profiles of the common CF-associated bacteria were highly similar between the 16S rDNA libraries and the metagenomes generated from hypotonic lysis (HL)-derived DNA. However, the differences between the 16S rDNA taxonomic profiles generated from total DNA and from HL-derived DNA suggest that hypotonic lysis and the washing steps remove not only human-derived DNA but also microbial-derived extracellular DNA that may misrepresent the actual microbial profiles.
Year: 2014 PMID: 25549184 PMCID: PMC4354477 DOI: 10.3791/52117
Source DB: PubMed Journal: J Vis Exp ISSN: 1940-087X Impact factor: 1.355
| Total number of reads | 224,859 | 87,891 | 106,189 | 93,301 | 140,020 | 1,558 | 272,552 | 217,438 |
| Preprocessed reads^a | 109,389 (49%) | 73,624 (84%) | 67,070 (63%) | 82,011 (88%) | 68,617 (49%) | 1,137 (73%) | 215,808 (79%) | 158,432 (73%) |
| Number of bases | 47,239,573 | 33,351,525 | 28,922,479 | 27,667,695 | 29,386,841 | 243,986 | 95,205,805 | 69,581,811 |
| Mean read length (bp) | 432 | 453 | 431 | 337 | 428 | 215 | 441 | 439 |
| Host sequences^b | 240 (0.21%) | 526 (0.71%) | 28 (0.04%) | 79,774 (97.27%) | 13 (0.02%) | 797 (70.10%) | 585 (0.27%) | 5,859 (3.70%) |
| Viral hits^c | 7,214 (6.59%) | 23,550 (31.99%) | 4,070 (6.07%) | 737 (0.90%) | 4,642 (6.77%) | 22 (1.93%) | 6,466 (3.00%) | 5,981 (3.78%) |
| Unassigned reads^d | 103,888 (94.97%) | 60,490 (82.16%) | 32,780 (48.87%) | 1,935 (2.36%) | 68,440 (99.74%) | 311 (27.35%) | 105,612 (48.94%) | 119,551 (75.46%) |
| ^a Reads after data preprocessing by PRINSEQ [29]. |
| ^b Human reads identified by DeconSeq [30] plus reads with a best BLASTn hit (NCBI nucleotide database) to the phylum Chordata. |
| ^c tBLASTx hits against an in-house viral genome database. The percentage was calculated using the total number of preprocessed reads. |
| ^d Reads with no BLASTn hit against the NCBI nucleotide database. The percentage was calculated using the total number of preprocessed reads. Some reads with no BLASTn hit against the NCBI nucleotide database were identified as viral at the protein level in the tBLASTx analysis. |
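The percentages reported in the table can be re-derived from the read counts, which is a quick way to check a new dataset against these reference values. The following is a minimal sketch in Python, not part of the published workflow: the counts are copied from the table in column order (the column labels did not survive), the helper `percent` is a hypothetical name, and the 50% cutoff used to flag host-dominated libraries is an illustrative assumption rather than a threshold from the source.

```python
# Read counts copied from the table above, in column order.
# Column (sample) labels are not available, so columns are referred to by index.
total_reads = [224859, 87891, 106189, 93301, 140020, 1558, 272552, 217438]
preprocessed = [109389, 73624, 67070, 82011, 68617, 1137, 215808, 158432]
host = [240, 526, 28, 79774, 13, 797, 585, 5859]
viral_hits = [7214, 23550, 4070, 737, 4642, 22, 6466, 5981]


def percent(part, whole, digits=2):
    """Return part/whole as a percentage rounded to `digits` decimals."""
    return round(100.0 * part / whole, digits)


# Fraction of raw reads that survive preprocessing (relative to total reads).
retained = [percent(p, t, 0) for p, t in zip(preprocessed, total_reads)]

# Host and viral fractions are relative to the preprocessed reads.
host_pct = [percent(h, p) for h, p in zip(host, preprocessed)]
viral_pct = [percent(v, p) for v, p in zip(viral_hits, preprocessed)]

# Illustrative cutoff (an assumption, not from the source): flag libraries
# in which more than half of the preprocessed reads are host-derived.
HOST_CUTOFF = 50.0
host_dominated = [i for i, h in enumerate(host_pct) if h > HOST_CUTOFF]

print(retained)
print(host_pct)
print(viral_pct)
print(host_dominated)
```

Flagging by column index keeps the sketch honest about the missing labels; with real data the indices would be replaced by sample names.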
| Sample | (ng/μl) | (ng) | (Raw^a) | | (%) |
| CF1-1A* | 2.3 | 230 | 1,098,454 | 937,688 | 691,541 (74%) |
| CF1-1 | 13 | 1,300 | 2,212,756 | 1,958,910 | 1,574,520 (80%) |
| CF1-2A* | 2.1 | 210 | 672,878 | 588,106 | 407,530 (69%) |
| CF1-2 | 5.2 | 520 | 1,944,012 | 1,697,010 | 1,455,174 (86%) |
| CF1-3 | 28.8 | 2,880 | 1,048,304 | 896,756 | 560,852 (63%) |
| CF1-4 | 24.1 | 2,410 | 1,154,922 | 984,702 | 621,098 (63%) |
| CF1-5 | 33.6 | 3,360 | 1,029,622 | 888,630 | 481,548 (54%) |
| CF1-6 | 43.2 | 4,320 | 1,434,016 | 1,256,504 | 725,858 (58%) |
| CF1-7 | 57.8 | 5,780 | 1,000,174 | 872,036 | 565,376 (65%) |
| * 1 ml of sample was subsampled from CF1-1 and CF1-2 following the first hypotonic lysis step (Step 3.1.5), before the second hypotonic lysis procedure. The cells were spun down as described in Step 3.1.7 and processed through the remaining protocol without any modification. |
| ^a Unprocessed Illumina reads from a 2 x 300 bp MiSeq sequencing run. |
| ^b Reads were assessed, trimmed, and removed based on quality and length as described in the discussion. |
| Sample | | | | | | | | |
| Treatment | None | Ribo-Zero | None | Ribo-Zero | None | Ribo-Zero | None | Ribo-Zero |
| Preprocessed reads | 2,088 | 1,991 | 40,876 | 25,238 | 19,728 | 32,737 | 31,791 | 36,172 |
| Mean read length (bp) | 275 | 245 | 262 | 270 | 233 | 259 | 240 | 267 |
| Total rRNA reads | 1,737 (83.20%) | 91 (4.60%) | 29,499 (72.20%) | 17,267 (68.40%) | 5,285 (26.80%) | 291 (0.90%) | 16,371 (51.50%) | 1,761 (4.90%) |
| Microbial rRNA | 1,414 (67.70%) | 32 (1.60%) | 19,978 (48.90%) | 12,035 (47.70%) | 23 (0.10%) | 227 (0.70%) | 6,916 (21.80%) | 1,076 (3.00%) |
| Eukaryota rRNA | 323 (15.50%) | 59 (3.00%) | 9,520 (23.30%) | 5,232 (20.70%) | 5,262 (26.70%) | 64 (0.20%) | 9,455 (29.70%) | 683 (1.90%) |
| % rRNA removed* | 0% | 95% | 0% | 5% | 0% | 97% | 0% | 91% |
| Non-rRNA reads | 351 (16.8%) | 1,900 (95.4%) | 11,377 (27.8%) | 7,971 (31.6%) | 14,443 (73.2%) | 32,446 (99.1%) | 15,420 (48.5%) | 34,411 (95.1%) |
| Total NR hits | 102 (4.9%) | 691 (34.7%) | 3,327 (8.1%) | 2,857 (11.3%) | 4,938 (25.0%) | 10,751 (32.8%) | 5,905 (18.6%) | 15,766 (43.6%) |
| Eukaryotic | 74 | 407 | 2,790 | 2,524 | 4,614 | 10,227 | 4,553 | 8,274 |
| Bacterial | 26 | 283 | 520 | 312 | 287 | 471 | 1,326 | 7,442 |
| Unassigned reads | 249 (11.9%) | 1,209 (60.7%) | 8,050 (19.7%) | 5,114 (20.3%) | 9,505 (48.2%) | 21,695 (66.3%) | 9,515 (29.9%) | 18,645 (51.5%) |
| * The amount of rRNA removed, expressed as a percentage of the amount present in the non-depleted aliquot. |
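The footnote does not spell out the formula for "% rRNA removed". One reading that reproduces the reported values is to compare the rRNA read fraction in the Ribo-Zero library with the fraction in its untreated counterpart. The sketch below assumes that reading; the function name `rrna_removed` is hypothetical, and the counts are taken from the "Total rRNA reads" and "Preprocessed reads" rows, paired by sample in column order.

```python
def rrna_removed(untreated_rrna, untreated_total, depleted_rrna, depleted_total):
    """Percent of rRNA removed, relative to the untreated (non-depleted) aliquot.

    Assumed formula: 100 * (1 - depleted_fraction / untreated_fraction),
    where each fraction is rRNA reads divided by preprocessed reads.
    """
    frac_before = untreated_rrna / untreated_total
    frac_after = depleted_rrna / depleted_total
    return 100.0 * (1.0 - frac_after / frac_before)


# (untreated rRNA, untreated total, Ribo-Zero rRNA, Ribo-Zero total)
# for the four samples, in the column order of the table above.
pairs = [
    (1737, 2088, 91, 1991),
    (29499, 40876, 17267, 25238),
    (5285, 19728, 291, 32737),
    (16371, 31791, 1761, 36172),
]

removed = [round(rrna_removed(*p)) for p in pairs]
print(removed)
```

Working from fractions rather than raw counts matters here because the treated and untreated libraries differ in sequencing depth, so raw rRNA read counts are not directly comparable.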