Linda Verhoef, Kelly P Williams, Annelies Kroneman, Bruno Sobral, Wilfrid van Pelt, Marion Koopmans.
Abstract
The recognition of a common source norovirus outbreak is supported by finding identical norovirus sequences in patients. Norovirus sequencing has been established in many (national) public health laboratories and academic centers, but often partial and different genome sequences are used. Therefore, agreement on a target sequence of sufficient diversity to resolve links between outbreaks is crucial. Although harmonization of laboratory methods is one of the keystone activities of networks that aim to identify common source norovirus outbreaks, this has proven difficult to accomplish, particularly in the international context. Here, we aimed to provide a method that uses bioinformatic tools to identify the genomic region informative of a common source norovirus outbreak. A data set of 502 unique full-length capsid gene sequences available from the public domain, combined with epidemiological data including linkage information, was used to build over 3,000 maximum likelihood (ML) trees for different sequence lengths and regions. All ML trees were evaluated for robustness and specificity of clustering of known linked norovirus outbreaks against the background diversity of strains. Great differences were seen in the robustness of commonly used PCR targets for cluster detection. The capsid gene region spanning nucleotides 900-1,400 was identified as the region optimally substituting for the full-length capsid region. Reliability of this approach depends on the quality of the background data set, and we recommend periodic reassessment of this growing data set. The approach may be applicable to multiple sequence-based data sets of other pathogens.
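The windowing idea in the abstract can be illustrated with a minimal sketch that lists candidate sub-alignment coordinates along the capsid gene. The six window lengths and the 1,620 nt gene length are taken from the table in this record; the function names and the `step` between window centers are assumptions made for illustration, since the step used in the study is not stated here. Windows are labelled as in the table, that is, center minus and plus half the window length.

```python
FULL_LENGTH = 1620                       # capsid gene length given in the table
WINDOW_LENGTHS = [100, 200, 250, 300, 400, 500]

def window_around(center, length):
    """Return (start, end) labelled as in the table: center -/+ length/2."""
    half = length // 2
    return center - half, center + half

def windows_along_gene(length, step, full_length=FULL_LENGTH):
    """List all windows of one length that fit inside the gene.

    `step` (distance between successive window centers) is a hypothetical
    parameter; the source does not state the value that was used.
    """
    half = length // 2
    first_center = 1 + half              # window starts at position 1
    last_center = full_length - half     # window ends at the last position
    return [window_around(c, length)
            for c in range(first_center, last_center + 1, step)]

# Windows of every length around the center position reported in the table.
for length in WINDOW_LENGTHS:
    print(length, window_around(1150, length))
```

Each coordinate pair would then be used to cut a sub-alignment from which one tree is built; the tree building itself is outside the scope of this sketch.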
Year: 2011 PMID: 21960432 PMCID: PMC3293504 DOI: 10.1007/s11262-011-0673-x
Source DB: PubMed Journal: Virus Genes ISSN: 0920-8569 Impact factor: 2.332
Bootstrap values, derived from 100 runs of ML trees built in RAxML 7.0.4 [25], for clades without invading sequences or with at most one invading sequence, at different levels of resolution (genogroups, genotypes, variants, and outbreak events). Bootstrap values were calculated for different fragment lengths within their optimal genomic region, and for several target regions used in commonly applied genotyping protocols (color online)
Fig. 1 Maximum likelihood tree for 502 unique full capsid gene sequences (nt positions 5,085–6,702 on the basis of X86557) from the FoodBorne Viruses in Europe database (http://www.rivm.nl/pubmpf/norovirus/database#/outbreaks/list) and GenBank. Clades are condensed (triangles) to the genotype level, assigned according to the publicly available typing tool http://www.rivm.nl/mpf/norovirus/typingtool. A nexus file is provided in electronic supplementary material 2, which allows the tree to be viewed in detail
Fig. 2 Summary of performance of phylogeny-based typing of norovirus capsid gene sequences. Clade impurity scores were calculated for each of 3,075 ML trees built in RAxML 7.0.4 [25] and are presented per center position of the window along all nt positions of the full capsid gene of the reference sequence AB220921. A score of 0 is optimal and indicates that no clade of a specific level contains invading sequences within this sub-alignment tree; for example, all genotypes are correctly positioned together while separate from others. Scores >0 indicate that some of the minimal differentiating clades within levels in the sub-alignment tree contain invading sequences. Scores were calculated for six fragment lengths, indicated as window-100 to window-500, each represented by a different color, and were calculated separately for genotypes (upper panel), variants (mid panel), and outbreak events (lower panel). Scores for the full capsid alignment were 0 for genotypes, 0.000759 for variants, and 0 for outbreak events. The different domains in ORF2 are depicted: the N-terminal domain, the Shell domain, and the Protruding domain split up into P1 and P2. The norovirus particle is built from 180 copies of the capsid protein (90 dimers)
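The notion of an "invading sequence" in the caption above can be made concrete with a small sketch. It assumes a rooted tree written as nested tuples of leaf names, and it finds the smallest clade containing all members of a group, then reports the leaves in that clade that do not belong to the group. The tree, the leaf names, and the function names are hypothetical, and the final fraction is only a simple proxy: the exact formula of the clade impurity score is not given in this record.

```python
def leaves(node):
    """Return the set of leaf names under a node (a str or a nested tuple)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result

def smallest_clade(node, members):
    """Return the leaf set of the smallest subtree containing all members."""
    children = () if isinstance(node, str) else node
    for child in children:
        if members <= leaves(child):
            return smallest_clade(child, members)
    return leaves(node)

def invading(tree, members):
    """Leaves inside the group's minimal clade that are not group members."""
    members = set(members)
    return smallest_clade(tree, members) - members

# Hypothetical toy tree: group A is pure, group B is invaded by sequence X.
tree = ((("A1", "A2"), "A3"), (("B1", "X"), "B2"))
groups = {"A": ["A1", "A2", "A3"], "B": ["B1", "B2"]}

invaders = {name: invading(tree, members) for name, members in groups.items()}
impure_fraction = sum(1 for inv in invaders.values() if inv) / len(groups)
print(invaders, impure_fraction)
```

In this toy case group A has no invaders and group B has one, so half of the groups are impure. ML trees are unrooted in general, so a real analysis would need to fix a rooting or work with bipartitions instead.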
Specificity (%) of different genomic regions in clustering genogroups, genotypes, variants, and outbreak events as a group separated from others, as derived from bootstrapped ML trees
| Group | n | Full capsid: 1–1,620 (1,620 nt) | Window-500 (center 1,150): 900–1,400 (500 nt) | Window-400 (center 1,150): 950–1,350 (400 nt) | Window-300 (center 1,150): 1,000–1,300 (300 nt) | Window-250 (center 1,150): 1,025–1,275 (250 nt) | Window-200 (center 1,150): 1,050–1,250 (200 nt) | Window-100 (center 1,150): 1,100–1,200 (100 nt) | PCR region G2SK, region C: 1–282 (282 nt) | PCR region G1SK, region C: 5–284 (280 nt) | PCR region E: 301–577 (277 nt) | PCR region P2 domain: 795–1,253 (459 nt) | PCR region CapC/D1/D3, region D: 1,372–1,585 (214 nt) | PCR region CapA/B1/B2, region D: 1,439–1,581 (143 nt) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| On the basis of criterion 1: pure branches with support values >70 | ||||||||||||||
| Genogroups | 4 | 100 | 75 | 75 | 50 | 50 | 25 | 25 | 25 | 50 | 50 | 75 | 75 | 75 |
| GI genotypes | 8 | 100 | 100 | 88 | 63 | 63 | 63 | 38 | 75 | 75 | 75 | 100 | 63 | 63 |
| GII genotypes | 20 | 100 | 95 | 90 | 70 | 80 | 65 | 45 | 100 | 100 | 80 | 95 | 63 | 75 |
| GIII genotypes | 2 | 100 | 100 | 100 | 50 | 50 | 50 | 100 | 50 | 50 | 100 | 100 | 50 | 0 |
| Variants | 12 | 83 | 67 | 58 | 50 | 42 | 50 | 25 | 33 | 33 | 17 | 58 | 42 | 17 |
| Outbreak events | 11 | 100 | 64 | 64 | 55 | 45 | 45 | 27 | 0 | 0 | 55 | 64 | 27 | 9 |
| On the basis of criteria 1 and 2: pure branches irrespective of support values | ||||||||||||||
| Genogroups | 4 | 100 | 75 | 75 | 75 | 75 | 50 | 75 | 50 | 75 | 75 | 75 | 75 | 75 |
| GI genotypes | 8 | 100 | 100 | 100 | 100 | 88 | 88 | 75 | 75 | 75 | 100 | 100 | 75 | 63 |
| GII genotypes | 20 | 100 | 100 | 95 | 85 | 80 | 75 | 70 | 100 | 100 | 95 | 95 | 100 | 95 |
| GIII genotypes | 2 | 100 | 100 | 100 | 50 | 50 | 100 | 100 | 50 | 50 | 100 | 100 | 50 | 50 |
| Variants | 12 | 83 | 75 | 83 | 75 | 58 | 67 | 67 | 67 | 33 | 42 | 75 | 58 | 50 |
| Outbreak events | 11 | 100 | 91 | 82 | 73 | 64 | 64 | 64 | 36 | 36 | 8 | 73 | 45 | 36 |
Nucleotide positions are numbered according to the reference strain AB220921; nt position 1 of the capsid gene corresponds to nt position 5,085 of the GII-4 strain Lordsdale/1995/UK (GenBank X86557)
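The percentages in the table above are consistent with a simple proportion: the number of groups recovered as pure clades divided by the number of groups at that level, rounded to a whole percent. The sketch below shows that arithmetic. The function name and the counts passed to it are assumptions chosen for illustration, not values reported in the source.

```python
def specificity_percent(pure_groups, total_groups):
    """Percentage of groups recovered as pure clades, rounded half up.

    Integer arithmetic avoids Python's round(), which rounds halves to
    the nearest even number (round(62.5) gives 62).
    """
    if total_groups <= 0:
        raise ValueError("total_groups must be positive")
    return (200 * pure_groups + total_groups) // (2 * total_groups)

# Hypothetical counts for illustration.
print(specificity_percent(3, 4))    # 3 of 4 genogroups
print(specificity_percent(7, 8))    # 7 of 8 GI genotypes
print(specificity_percent(7, 11))   # 7 of 11 outbreak events
```

Under criterion 1 a group would be counted only if its clade is pure and has a support value >70, whereas the second block of the table counts pure clades irrespective of support.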