Melanie Hiltbrunner, Gerald Heckel.
Abstract
Research on the ecology and evolution of viruses is often hampered by the limitation of sequence information to short parts of the genomes or single genomes derived from cultures. In this study, we use hybrid sequence capture enrichment in combination with high-throughput sequencing to provide efficient access to full genomes of European hantaviruses from rodent samples obtained in the field. We applied this methodology to Tula (TULV) and Puumala (PUUV) orthohantaviruses for which analyses from natural host samples are typically restricted to partial sequences of their tri-segmented RNA genome. We assembled a total of ten novel hantavirus genomes de novo with very high coverage (on average >99%) and sequencing depth (average >247×). A comparison with partial Sanger sequences indicated an accuracy of >99.9% for the assemblies. An analysis of two common vole (Microtus arvalis) samples infected with two TULV strains each allowed for the de novo assembly of all four TULV genomes. Combining the novel sequences with all available TULV and PUUV genomes revealed very similar patterns of sequence diversity along the genomes, except for remarkably higher diversity in the non-coding region of the S-segment in PUUV. The genomic distribution of polymorphisms in the coding sequence was similar between the species, but differed between the segments with the highest sequence divergence of 0.274 for the M-segment, 0.265 for the S-segment, and 0.248 for the L-segment (overall 0.258). Phylogenetic analyses showed the clustering of genome sequences consistent with their geographic distribution within each species. Genome-wide data yielded extremely high node support values, despite the impact of strong mutational saturation that is expected for hantavirus sequences obtained over large spatial distances. 
We conclude that genome sequencing based on capture enrichment protocols provides an efficient means for ecological and evolutionary investigations of hantaviruses at an unprecedented completeness and depth.
Keywords: de novo assembly; evolutionary history; hantavirus phylogeny; high-throughput deep sequencing; hybrid sequence capture; rodent-borne viruses; targeted enrichment; virus genomes
Year: 2020 PMID: 32664593 PMCID: PMC7412162 DOI: 10.3390/v12070749
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Figure 1. (A) Map of Europe showing the origin of the novel and previously published Tula (TULV) (T#) and Puumala (PUUV) (P#) genome sequences analyzed in this study (for details see Table S1). The inset zooms in on the contact region between two phylogenetic clades in TULV, from which two double-infected individuals (T1/11 and T2/12) were sequenced (see [26]). (B) Phylogenetic tree based on the concatenated complete coding nucleotide sequence (CDS) of TULV and PUUV. Posterior probabilities of Bayesian analyses are given for major nodes. Sequence labels follow Table S1.
List of TULV and PUUV high-throughput sequencing-based genome assemblies analyzed in this study. The number of capture rounds (zero: shotgun; one or two enrichment steps), the sequencing instrument used, number of total sequence reads, unique sequence reads (number of reads after filtering duplicates), mapped virus reads (deduplicated reads post-mapping) and the average genome sequence depth are shown. Shotgun sequence read data are from [26].
| Sample | Capture Rounds | Sequencer | Total Reads | Unique Reads | Virus Reads | Mean Sequence Depth (×) |
|---|---|---|---|---|---|---|
| **TULV** | | | | | | |
| T10 MarCzKa04 | 0 | HiSeq | 30,240,364 | 30,239,770 | 1737 | 9 |
| T9 MarDGb22 | 0 | HiSeq | 43,470,283 | 43,469,619 | 1781 | 8 |
| T18 MarDKbB31 | 0 | HiSeq | 92,841,279 | 92,824,579 | 26,721 | 48 |
| T19 MarDPf01 | 0 | HiSeq | 65,793,875 | 65,787,860 | 12,006 | 34 |
| T8 MarDRb01 | 0 | HiSeq | 57,580,789 | 57,571,558 | 16,077 | 38 |
| T13 MarDHg01 | 0 | MiSeq | 11,563,256 | 11,563,254 | 1539 | 30 |
| T3 MarDDh05 | 0 | MiSeq | 12,157,472 | 12,157,461 | 1712 | 34 |
| T4 MarCzHo09 | 0 | MiSeq | 10,363,464 | 10,363,453 | 1850 | 38 |
| T5 MarCzJe04 | 0 | MiSeq | 12,634,308 | 12,634,294 | 1974 | 38 |
| T15 MarDSp01 | 0 | MiSeq | 2,286,266 | 2,286,252 | 4766 | 97 |
| T14 MarDOt03 | 0 | MiSeq | 11,907,702 | 2,286,176 | 4682 | 91 |
| T17 MarDSq15_1 ¹ | 0 | HiSeq | 53,177,359 | 53,177,238 | 583 | 4 |
| T17 MarDSq15_2 ¹ | 1 | HiSeq | 25,564,658 | 22,901,877 | 515,194 | 6238 |
| T6 MarCzGr07 | 1 | HiSeq | 13,928,878 | 10,770,028 | 62,939 | 771 |
| T2/T12 MarDSu08_1 ¹ | 1 | HiSeq | 16,711,898 | 12,660,735 | 184,258 | 2249 |
| T2/T12 MarDSu08_2 ¹ | 2 | HiSeq | 28,180,378 | 1,069,556 | 334,893 | 4068 |
| T2/T12 MarDSu08_3 ¹ | 1 | MiSeq | 1,812,673 | 1,812,673 | 19,035 | 378 |
| T1/T11 MagDEf02_1 ¹ | 2 | HiSeq | 30,333,274 | 1,168,089 | 734,578 | 8780 |
| T1/T11 MagDEf02_2 ¹ | 2 | HiSeq | 31,420,140 | 1,099,805 | 436,943 | 5267 |
| T16 MarCHEl42 | 2 | HiSeq | 19,306,058 | 1,666,013 | 1,066,770 | 12,896 |
| **PUUV** | | | | | | |
| P9 MglDCr02 | 1 | MiSeq | 2,914,826 | 1,779,417 | 57,070 | 1692 |
| P11 MglDKe04 | 1 | MiSeq | 2,243,640 | 1,160,159 | 19,312 | 413 |
| P12 MglDKe05 | 1 | MiSeq | 2,702,868 | 1,374,479 | 12,927 | 247 |
| P7 MglLTU01 | 1 | MiSeq | 2,676,336 | 1,719,083 | 152,509 | 3004 |
¹ Technical replicates.
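
One way to read the table is to compare, for the same sample, the share of deduplicated reads that map to the virus with and without capture. The following is a minimal sketch using the two T17 technical replicates, with values copied from the table. The function name `virus_fraction` and the fold-enrichment ratio are illustrative assumptions, not metrics reported by the authors.

```python
def virus_fraction(virus_reads: int, unique_reads: int) -> float:
    """Share of deduplicated reads that mapped to the virus genome."""
    if unique_reads <= 0:
        raise ValueError("unique_reads must be positive")
    return virus_reads / unique_reads


# Values copied from the table rows for sample T17 (MarDSq15).
# Replicate _1 is the shotgun library (0 capture rounds),
# replicate _2 went through one round of hybrid capture.
shotgun = virus_fraction(virus_reads=583, unique_reads=53_177_238)
captured = virus_fraction(virus_reads=515_194, unique_reads=22_901_877)

# Hypothetical summary statistic: how much larger the on-target share
# is after capture than in the shotgun library.
fold_enrichment = captured / shotgun

print(f"shotgun on-target share: {shotgun:.2e}")
print(f"capture on-target share: {captured:.2e}")
print(f"fold enrichment:         {fold_enrichment:.0f}")
```

For this pair of replicates the ratio comes out at roughly two thousand-fold. Other samples in the table would give different values, and the two libraries were not sequenced to the same total read count.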
Figure 2. The average genome sequence depth significantly increases with the number of total sequence reads for libraries enriched using hybrid capture. Shotgun data are shown for comparison in black.
Figure 3. Sliding window analyses of nucleotide diversity (Pi) and the average number of nucleotide substitutions per site (DXY, blue line) between 20 TULV and 12 PUUV hantavirus genomes (window size 100 nt, step size 25 nt). The coding region/open reading frame (ORF) is indicated in grey. Genomic landscapes were similar between TULV and PUUV for the M- and L-segment but differed in the non-coding region of the S-segment. The nucleotide diversity within both TULV (black line) and PUUV (green line) was lower in the region encoding the non-structural (NS) protein (blue area) compared to the rest of the genome.
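
The sliding-window statistics in Figure 3 can be illustrated with a short sketch. It assumes that Pi is the mean proportion of differing sites over all sequence pairs within a group, and that DXY is the same mean taken over all between-group pairs, both uncorrected. The software and any substitution correction used in the study are not stated here, so the function names and the toy alignment below are hypothetical.

```python
from itertools import combinations, product


def pairwise_diff(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, uppercase
    sequences, skipping positions where either has a gap or N."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def pi(seqs):
    """Mean pairwise difference within one group of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def dxy(group_x, group_y):
    """Mean pairwise difference between two groups of sequences."""
    pairs = list(product(group_x, group_y))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def scan(group_x, group_y, window=100, step=25):
    """Return (window start, Pi of X, Pi of Y, DXY) for each window.

    Defaults mirror the window and step sizes given in the caption.
    """
    length = len(group_x[0])
    rows = []
    for start in range(0, length - window + 1, step):
        wx = [s[start:start + window] for s in group_x]
        wy = [s[start:start + window] for s in group_y]
        rows.append((start, pi(wx), pi(wy), dxy(wx, wy)))
    return rows


# Toy alignment (not real hantavirus data), scanned with a small window.
group_a = ["ACGTACGT", "ACGTACGA"]
group_b = ["ACGAACGT", "ACGAACGT"]
rows = scan(group_a, group_b, window=4, step=2)
for start, pi_a, pi_b, d in rows:
    print(start, pi_a, pi_b, d)
```

With real data, `group_x` and `group_y` would hold the aligned TULV and PUUV sequences of one genome segment, and the per-window values would be plotted against the window position.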