David J King, Graham L Freimanis, Richard J Orton, Ryan A Waters, Daniel T Haydon, Donald P King.
Abstract
Due to the poor fidelity of the enzymes involved in RNA genome replication, foot-and-mouth disease (FMD) virus samples comprise unique polymorphic populations. In this study, deep sequencing was utilised to characterise the diversity of FMD virus (FMDV) populations in 6 infected cattle present on a single farm during the series of outbreaks in the UK in 2007. A novel RT-PCR method was developed to amplify a 7.6 kb nucleotide fragment encompassing the polyprotein coding region of the FMDV genome. Illumina sequencing of each sample identified the fine polymorphic structures at each nucleotide position, from consensus level changes to variants present at a 0.24% frequency. These data were used to investigate population dynamics of FMDV at both herd and host levels, to evaluate the impact of host on the viral swarm structure, and to identify transmission links with viruses recovered from other farms in the same series of outbreaks. In 7 samples, from 6 different animals, a total of 5 consensus level variants were identified, in addition to 104 sub-consensus variants, of which 22 were shared between 2 or more animals. Further analysis revealed differences in swarm structures between samples derived from the same animal, suggesting the presence of distinct viral populations evolving independently at different lesion sites within the same infected animal.
Keywords: Foot-and-mouth disease; Next generation sequencing; Phylogenetics; Variant analysis; Viral diversity
Year: 2016 PMID: 27421209 PMCID: PMC5036933 DOI: 10.1016/j.meegid.2016.07.010
Source DB: PubMed Journal: Infect Genet Evol ISSN: 1567-1348 Impact factor: 3.342
Details of estimated lesion age for each of the infected animals, with information regarding the RNA extraction and MiSeq sequencing. The sequence data presented represent an average for each of the duplicate samples. The table includes changes in consensus sequence compared to the reference sequence (EU448371), with synonymous changes indicated in yellow and non-synonymous changes indicated in red. The table also includes the number of sub-consensus changes present in each sample.
a: Mean of two replicates. b: After sickle trimming.
Fig. 1. The average coverage distribution for the sequenced samples. For both PCR duplicate sets (with filtered, trimmed reads), average coverage values were between 5.79 × 10² and 1.00 × 10⁴ nucleotides/site, with the mean of all samples being 6.44 × 10³ nucleotides/site. Each sample is indicated by a different colour. Sample 147 – red; sample 161A – orange; sample 004 – yellow; sample 241 – green; sample 161B – dark blue; sample 341 – black, and sample 238 – grey.
Fig. 2. Statistical parsimony analysis of consensus sequences generated in relation to the estimated lesion age of each sample. Grey triangles represent samples that were sequenced as part of this study; white triangles (i.e. UKG/95 and UKG/93) indicate samples from IP2b sequenced previously by Sanger methods (Valdazo-González et al., 2015). Black lines represent single nucleotide substitutions. Closest consensus sequences for viruses recovered from other farms infected in sequence (IP1b and IP5) are indicated.
A list of total synonymous and non-synonymous changes present only at sub-consensus (below 50% frequency) level for each coding and UTR region for all samples sequenced.
| Genome region | 5′UTR | Leader | VP4 | VP2 | VP3 | VP1 | 2A | 2B | 2C | 3A | 3B | 3C | 3D | 3′UTR | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Length of region | 559 | 603 | 255 | 654 | 660 | 633 | 54 | 462 | 969 | 444 | 213 | 639 | 1410 | 86 | |
| Synonymous | 15 | 6 | 1 | 2 | 9 | 12 | 2 | 1 | 1 | 4 | 1 | 2 | 4 | 1 | 45 |
| Non-synonymous | | 5 | 3 | 1 | 5 | 4 | 1 | 3 | 1 | 3 | 0 | 2 | 15 | | 43 |
| Total number of variants | 15 | 11 | 4 | 3 | 14 | 16 | 3 | 4 | 2 | 7 | 1 | 4 | 19 | 1 | |
| % (unique sites/region length) | 2.68% | 1.82% | 1.57% | 0.46% | 2.12% | 2.53% | 5.55% | 0.87% | 0.21% | 1.58% | 0.47% | 0.63% | 1.35% | 1.16% | |
Total number of synonymous or non-synonymous changes in the protein coding regions of the genome (excluding UTRs).
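The last row of the table can be reproduced by dividing the total number of variants in each region by the region length. The snippet below is a minimal sketch of that arithmetic using the values transcribed from the table; the names `REGIONS` and `variant_density` are illustrative and do not come from the study.

```python
# Values transcribed from the table above: region -> (length in nt, total variants).
REGIONS = {
    "5'UTR": (559, 15),
    "Leader": (603, 11),
    "VP4": (255, 4),
    "VP2": (654, 3),
    "VP3": (660, 14),
    "VP1": (633, 16),
    "2A": (54, 3),
    "2B": (462, 4),
    "2C": (969, 2),
    "3A": (444, 7),
    "3B": (213, 1),
    "3C": (639, 4),
    "3D": (1410, 19),
    "3'UTR": (86, 1),
}


def variant_density(length: int, variants: int) -> float:
    """Return the number of variant sites as a percentage of the region length."""
    return 100.0 * variants / length


# Percentage of variant sites per region (hypothetical helper names).
densities = {
    name: variant_density(length, variants)
    for name, (length, variants) in REGIONS.items()
}

# Sum of the per-region variant counts.
total_variants = sum(variants for _, variants in REGIONS.values())

for name, pct in densities.items():
    print(f"{name:>6}: {pct:.2f}%")
```

Rounded to two decimals, these values agree with the table except for 2A, where 3/54 rounds to 5.56% and the table lists 5.55%, which presumably reflects truncation. The per-region counts sum to 104, the number of sub-consensus variants reported in the abstract.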
Fig. 3. a: The distribution of variants across the FMDV genome. The variants belonging to each sample are indicated by a difference in colour. Sample 147 – red; sample 161A – orange; sample 004 – yellow; sample 241 – green; sample 161B – dark blue; sample 341 – black, and sample 238 – grey. b: A Circos histogram showing the relationship between FMDV consensus and sub-consensus variants. Only variants that were called by Lofreq in both duplicates and shared between 2 or more samples are represented here. The histogram is split into 7 sections, each representing a different sample. Each genome region is represented by a different colour (light grey – 5′UTR, dark grey – leader, dark blue – VP4, dark green – VP2, red – VP3, dark purple – VP1, dark orange – 2A, light blue – 2B, light green – 2C, light orange – 3A, light purple – 3B, yellow – 3C, black – 3D, grey – 3′UTR). Variants which are shared between 2 or more samples are indicated with links, with the colour corresponding to the genome region in which the variant is present. The inner circles represent the variant frequencies across the genome. In pseudo log scale, the first gridline represents variants from 0% to 1%, the second gridline represents variants from 1% to 10%, the third gridline represents variants from 10% to 50% and the fourth gridline represents variants from 50% to 100%. For visual purposes, variants were scaled up to the top of their band: 0% to 1% was scaled up to 1%, 1% to 10% to 10%, 10% to 50% to 50%, and 50% to 100% to 100%.
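The filtering and scaling described in this caption can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' pipeline: variants are represented as hypothetical `(position, base)` tuples, the upper edge of each band is treated as inclusive (the caption does not say which band a value of exactly 1%, 10% or 50% falls into), and all function names and toy data are invented for the example.

```python
from collections import Counter

# Upper bounds (in %) of the four display bands described in the caption.
BAND_UPPER_BOUNDS = (1.0, 10.0, 50.0, 100.0)


def scale_frequency(freq_pct: float) -> float:
    """Map a variant frequency (in %) to the upper bound of its display band.

    Assumption: band edges are inclusive at the top (e.g. exactly 1% maps to 1%).
    """
    if not 0.0 <= freq_pct <= 100.0:
        raise ValueError(f"frequency out of range: {freq_pct}")
    for upper in BAND_UPPER_BOUNDS:
        if freq_pct <= upper:
            return upper
    return BAND_UPPER_BOUNDS[-1]  # not reached for valid input


def reproducible_shared_variants(calls: dict) -> dict:
    """Keep variants called in both duplicates and present in 2 or more samples.

    `calls` maps a sample name to a pair of sets, one per PCR duplicate.
    """
    # Step 1: keep only variants called in both duplicates of a sample.
    confirmed = {sample: rep1 & rep2 for sample, (rep1, rep2) in calls.items()}
    # Step 2: count in how many samples each confirmed variant occurs.
    counts = Counter(v for variants in confirmed.values() for v in variants)
    # Step 3: keep variants seen in at least two samples.
    return {
        sample: {v for v in variants if counts[v] >= 2}
        for sample, variants in confirmed.items()
    }


# Toy data (invented): variants as (position, base) tuples per duplicate.
toy_calls = {
    "sample_1": ({(120, "T"), (450, "G"), (900, "A")}, {(120, "T"), (450, "G")}),
    "sample_2": ({(120, "T"), (700, "C")}, {(120, "T"), (700, "C")}),
    "sample_3": ({(450, "G")}, {(450, "G"), (900, "A")}),
}
kept = reproducible_shared_variants(toy_calls)
```

With this banding, a variant at the 0.24% detection limit mentioned in the abstract would be drawn at the 1% gridline.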
Fig. 4. A pairwise comparison of the relationships between consensus sequences (genomic distance) and shared variants in the viral swarm populations. Each of the samples was compared to the other samples in turn, giving a total of 42 data points. Firstly, the genome distance (number of nucleotide substitutions in the consensus sequences) between the samples was calculated. Secondly, the observed number of shared variants between the samples was divided by the total number of observed variants in one of the samples (as a consequence, a different value was generated when comparing samples X and Y than when comparing sample Y to X, to address differences in depth of coverage between the samples).
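The comparison described in this caption is directional, which is why 7 samples yield 7 × 6 = 42 data points rather than 21. The following is a minimal sketch of one way to compute such a table. It assumes aligned consensus sequences of equal length, variants stored as hypothetical `(position, base)` tuples, and division by the variant count of the first sample in each ordered pair (the caption does not state which of the two is used). All names and the toy data are invented.

```python
from itertools import permutations


def genome_distance(seq_x: str, seq_y: str) -> int:
    """Count substitutions between two aligned, equal-length consensus sequences."""
    if len(seq_x) != len(seq_y):
        raise ValueError("consensus sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_x, seq_y))


def shared_fraction(variants_x: set, variants_y: set) -> float:
    """Fraction of the variants observed in X that are also observed in Y."""
    if not variants_x:
        return 0.0
    return len(variants_x & variants_y) / len(variants_x)


def pairwise_table(consensus: dict, variants: dict) -> list:
    """Return (x, y, distance, shared fraction) for every ordered pair of samples."""
    rows = []
    for x, y in permutations(sorted(consensus), 2):
        rows.append(
            (
                x,
                y,
                genome_distance(consensus[x], consensus[y]),
                shared_fraction(variants[x], variants[y]),
            )
        )
    return rows


# Toy data (invented), three samples only to keep the example short.
toy_consensus = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACCTACGA"}
toy_variants = {
    "A": {(10, "T"), (25, "G"), (40, "C"), (77, "A")},
    "B": {(10, "T"), (25, "G")},
    "C": {(10, "T")},
}
rows = pairwise_table(toy_consensus, toy_variants)
lookup = {(x, y): (dist, frac) for x, y, dist, frac in rows}

# Seven samples give 42 ordered pairs, matching the caption.
n_points_for_seven = len(list(permutations(range(7), 2)))
```

In the toy data, A compared with B gives a shared fraction of 0.5 while B compared with A gives 1.0, showing how the asymmetry arises when one sample has more observed variants than the other.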