Christine M Malboeuf, Xiao Yang, Patrick Charlebois, James Qu, Aaron M Berlin, Monica Casali, Kendra N Pesko, Christian L Boutwell, John P DeVincenzo, Gregory D Ebel, Todd M Allen, Michael C Zody, Matthew R Henn, Joshua Z Levin.
Abstract
RNA viruses are the causative agents for AIDS, influenza, SARS, and other serious health threats. Development of rapid and broadly applicable methods for complete viral genome sequencing is highly desirable to fully understand all aspects of these infectious agents as well as for surveillance of viral pandemic threats and emerging pathogens. However, traditional viral detection methods rely on prior sequence or antigen knowledge. In this study, we describe sequence-independent amplification for samples containing ultra-low amounts of viral RNA coupled with Illumina sequencing and de novo assembly optimized for viral genomes. With 5 million reads, we capture 96 to 100% of the viral protein coding region of HIV, respiratory syncytial and West Nile viral samples from as little as 100 copies of viral RNA. The methods presented here are scalable to large numbers of samples and capable of generating full or near full length viral genomes from clone and clinical samples with low amounts of viral RNA, without prior sequence information and in the presence of substantial host contamination.
Year: 2012 PMID: 22962364 PMCID: PMC3592391 DOI: 10.1093/nar/gks794
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Composition of sequence data and de novo assembly statistics
| Sample | Sample ID | Virus | Copies viral RNA used | Version<sup>a</sup> | Reads aligning to viral reference<sup>b</sup> (%) | rRNA<sup>c</sup> (%) | Host<sup>d</sup> (%) | CDS covered by all contigs<sup>e</sup> (%) | Average coverage in target region | Genes intact<sup>f</sup> |
|---|---|---|---|---|---|---|---|---|---|---|
| NL4-3 | D615 | HIV | 10 000 | 1 | 67.1 | 0.3 | 3.5 | 100 | 36 021 | 9 |
| Clinical sample A | D614 | HIV | 10 000 | 1 | 7.1 | 32.3 | 32.5 | 100 | 3869 | 9 |
| Clinical sample A | D613 | HIV | 1 000 | 1 | 6.5 | 25.9 | 28.8 | 96 | 3489 | 7 |
| Clinical sample B | D616 | HIV | 800 | 1 | 5.9 | 18.6 | 18.9 | 100 | 3109 | 9 |
| Clinical sample C | D617 | HIV | 200 | 1 | 2.2 | 17.5 | 12.8 | 100 | 965 | 9 |
| NL4-3 | D619 | HIV | 10 000 | 2 | 68.7 | 0.6 | 4.6 | 100 | 38 725 | 9 |
| Clinical sample B | D620 | HIV | 800 | 2 | 1.1 | 16.8 | 18.2 | 100 | 661 | 9 |
| Clinical sample C | D621 | HIV | 200 | 2 | 0.4 | 14.8 | 8.8 | 97 | 233 | 8 |
| Clinical sample B | G15482 | HIV | 200 | 2 | 1.3 | 18.2 | 27.9 | 100 | 647 | 9 |
| Clinical sample B | G15480 | HIV | 100 | 2 | 1.7 | 17.5 | 28.0 | 99 | 385 | 8 |
| WNV clone | G15493 | WNV | 10 000 | 2 | 31.1 | 0.11 | 47.6 | 100 | 14 822 | 10 |
| WNV clone | G15494 | WNV | 1500 | 2 | 14.5 | 0.09 | 59.1 | 100 | 6925 | 10 |
| WNV clone | G15495 | WNV | 1000 | 2 | 14.3 | 0.09 | 59.1 | 99 | 6800 | 9 |
| WNV clone | G15496 | WNV | 750 | 2 | 13.7 | 0.08 | 59.1 | 100 | 6594 | 10 |
| WNV clone | G15497 | WNV | 500 | 2 | 13.9 | 0.10 | 59.8 | 100 | 6681 | 10 |
| WNV clone | G15498 | WNV | 250 | 2 | 14.3 | 0.08 | 59.0 | 100 | 6786 | 10 |
| WNV clone | G15499 | WNV | 150 | 2 | 15.1 | 0.09 | 58.7 | 100 | 7253 | 10 |
| WNV clone | G15500 | WNV | 100 | 2 | 13.8 | 0.09 | 59.4 | 100 | 6576 | 10 |
| Clinical sample 1 | V6100 | RSV | 30 470 | 2 | 16.6 | 31.8 | 37.0 | 100 | 5599 | 10 |
| Clinical sample 2 | V6103 | RSV | 1795 | 2 | 10.1 | 36.7 | 26.5 | 100 | 3386 | 10 |
<sup>a</sup> NuGEN’s Ovation RNA-Seq version 1 or 2 system.

<sup>b</sup> For HIV, the viral reference genome used was HXB2. For WNV, the viral reference genome used was NY99. For RSV, the viral reference genome used was RSV A2.

<sup>c</sup> Percent of reads aligning to both cytoplasmic and mitochondrial rRNA. For HIV and RSV, human rRNA sequences were used. For WNV, hamster rRNA sequences were used.

<sup>d</sup> For HIV and RSV, the host used was human. For WNV, the host used was hamster.

<sup>e</sup> Those samples with 100% genome covered were in a single contig except V6103, which was covered in two contigs. Those with less than 100% were covered in two contigs except D613, which was covered in three contigs.

<sup>f</sup> For HIV, the total number of genes is 9. For WNV and RSV, the total number of genes is 10.
Figure 1. Complete sequence coverage of viral coding region. (A) Reproducibility of read coverage for technical replicates for HIV clone and clinical and WNV clone samples. (B) Comparison of read coverage for HIV clone and clinical samples between Ovation RNA-Seq version 1 (red) and version 2 (blue) systems. Reads were aligned to the CDS of the relevant viral reference using Mosaik. Coverage was computed as the number of reads covering a given residue and was normalized by dividing by the total coverage summed across all residues, so that the normalized coverage sums to one.
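The normalization described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `normalize_coverage` and the toy depth values are assumptions introduced here, and in practice the per-residue read depths would be derived from the Mosaik alignments.

```python
def normalize_coverage(depth):
    """Divide each per-residue read depth by the total depth.

    `depth` is a sequence of read counts, one per residue of the CDS
    (hypothetical input; how the counts are extracted is not shown).
    The returned values sum to one, as stated in the figure caption.
    """
    total = sum(depth)
    if total == 0:
        raise ValueError("total coverage is zero; nothing to normalize")
    return [d / total for d in depth]


# Toy example with three residues (illustrative numbers only).
normalized = normalize_coverage([10, 30, 60])
print(normalized)
print(sum(normalized))
```

Because every profile is scaled to sum to one, samples sequenced to very different depths can be overlaid on the same axis, which is what makes the replicate and version comparisons in panels A and B possible.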
Comparison of HIV and WNV technical replicate assemblies
| Samples | Aligned bases in CDS | Assembly identity<sup>a</sup> (%) | Composition mismatches<sup>b</sup> | Indel events | Indel bases (composition)<sup>c</sup> | Composition identity (%) |
|---|---|---|---|---|---|---|
| HIV clinical sample B (100 copies) | 8 379 | 98.90 | 18 | 2 | 18 (12–6) | 99.76 |
| HIV clinical sample B (200 copies) | 8 523 | 99.26 | 15 | 3 | 9 (3–3–3) | 99.79 |
| HIV clone (NL4-3) | 8 478 | 99.94 | 0 | 2 | 5 (4–1) | 99.98 |
| WNV clone (100 copies) | 10 303 | 100.00 | 0 | 0 | 0 | 100.00 |
| WNV clone (250 copies) | 10 303 | 99.98 | 0 | 0 | 0 | 100.00 |
<sup>a</sup> Nucleotide % identity between assemblies of replicates.

<sup>b</sup> Number of mismatches between assemblies that were not supported by read data (see Materials and Methods).

<sup>c</sup> Number outside the parentheses is the total number of indel bases. Numbers within the parentheses are the sizes of the individual indels, separated by hyphens.
Comparison between Ovation RNA-Seq-Illumina and RT-PCR-454 assemblies
| Samples | Sample ID | Aligned bases | Assembly identity<sup>a</sup> (%) | Composition mismatches<sup>b</sup> | Indel events | Indel bases (composition)<sup>c</sup> | Composition identity (%) |
|---|---|---|---|---|---|---|---|
| Clinical sample A | D614 | 8 642 | 98.25 | 45 | 6 | 24 (3–3–3–3–6–6) | 99.41 |
| HIV clone (NL4-3) | D618 | 8 627 | 100.00 | 0 | 0 | 0 | 100.00 |
<sup>a</sup> Nucleotide % identity between Ovation RNA-Seq-Illumina and RT-PCR-454 assemblies.

<sup>b</sup> Number of mismatches between assemblies that were not supported by read data (see Materials and Methods).

<sup>c</sup> Number outside the parentheses is the total number of indel bases. Numbers within the parentheses are the sizes of the individual indels, separated by hyphens.
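The composition identity values in the two comparison tables are numerically consistent with counting each composition mismatch and each indel event as one difference and dividing by the number of aligned bases. This relationship is inferred here from the tabulated numbers; it is not a formula stated in this record, and the authoritative definition is in the paper's Materials and Methods. The function name `composition_identity` is hypothetical. A short sketch that checks the inference against the non-trivial rows:

```python
def composition_identity(aligned_bases, mismatches, indel_events):
    """Percent identity, treating each mismatch and each indel event
    as a single difference (an assumption inferred from the tables)."""
    differences = mismatches + indel_events
    return round(100.0 * (1 - differences / aligned_bases), 2)


# (label, aligned bases, composition mismatches, indel events, reported %)
rows = [
    ("HIV clinical sample B (100 copies)", 8379, 18, 2, 99.76),
    ("HIV clinical sample B (200 copies)", 8523, 15, 3, 99.79),
    ("HIV clone (NL4-3), replicates", 8478, 0, 2, 99.98),
    ("Clinical sample A (D614), Illumina vs 454", 8642, 45, 6, 99.41),
]

for label, bases, mismatches, events, reported in rows:
    computed = composition_identity(bases, mismatches, events)
    print(f"{label}: computed {computed:.2f}, reported {reported:.2f}")
```

Counting indel bases instead of indel events does not reproduce the reported values (for example, 18 mismatches plus 18 indel bases over 8 379 aligned bases would give about 99.57%, not 99.76%), which is why the sketch uses events.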