
Identifying copy number variation of the dominant virulence factors msa and p22 within genomes of the fish pathogen Renibacterium salmoninarum.

Ola Brynildsrud, Snorre Gulla, Edward J Feil, Simen Foyn Nørstebø, Linda D Rhodes.

Abstract

Renibacterium salmoninarum is the causative agent of bacterial kidney disease, an important disease of farmed and wild salmonid fish worldwide. Despite the wide spatiotemporal distribution of this disease and habitat pressures ranging from the natural environment to aquaculture and rivers to marine environments, little variation has been observed in the R. salmoninarum genome. Here we use the coverage depth from genomic sequencing corroborated by real-time quantitative PCR to detect copy number variation (CNV) among the genes of R. salmoninarum. CNV was primarily limited to the known dominant virulence factors msa and p22. Among 68 isolates representing the UK, Norway and North America, the msa gene ranged from two to five identical copies and the p22 gene ranged from one to five copies. CNV for these two genes co-occurred, suggesting they may be functionally linked. Isolates carrying CNV were phylogenetically restricted and originated predominantly from sites in North America, rather than the UK or Norway. Although both phylogenetic relationship and geographical origin were found to correlate with CNV status, geographical origin was a much stronger predictor than phylogeny, suggesting a role for local selection pressures in the repeated emergence and maintenance of this trait.

Keywords:  copy number variation; gene duplication-amplification; major soluble antigen; p22; Renibacterium salmoninarum

Year:  2016        PMID: 28348850      PMCID: PMC5320689          DOI: 10.1099/mgen.0.000055

Source DB:  PubMed          Journal:  Microb Genom        ISSN: 2057-5858


Data Summary

1. The sequence data for all isolates used in this study is available for download from http://trace.ncbi.nlm.nih.gov/Traces/sra/?study=ERP003780

Impact Statement

This article identifies expansive duplication of the genes encoding the dominant virulence factors msa and p22 in the fish pathogen Renibacterium salmoninarum, the organism responsible for bacterial kidney disease of salmonid fish. R. salmoninarum is a highly clonal bacterium with a very limited accessory genome, and although duplication of msa is already known as a concept, this study extends the finding to p22, the other major surface protein. The number of identical gene copies may in some cases be as high as five. The data suggest multiple independent duplication events that appear to be much more common in strains circulating in the Pacific Northwest region of North America, pointing to local selection pressures as important for the repeated emergence. Gene copy number variation in bacteria is probably severely underreported, and there are very few reports on the regional distribution of the phenomenon. It is hoped that the findings and methodology presented in this article may serve to fuel the interest in performing gene copy number variation studies, as the mechanism is increasingly being seen as more frequent and phenotypically important than previously believed.

Introduction

Renibacterium salmoninarum is the causative agent of bacterial kidney disease (BKD) in cultured and wild salmonid fish. BKD can result in acute morbidity or mortality, or it can be a slowly progressive disease causing an often dramatic decline in growth. BKD is economically important in aquaculture, where it can spread horizontally throughout sea pens of juvenile and subadult Atlantic salmon (Salmo salar) (Murray ) or vertically through transferred broodstock or eggs (Evelyn ). It is also a concern for conservation and restoration efforts for endangered fish stocks because infections are prevalent among more susceptible free-ranging Pacific salmon in river and marine systems (Pascho ; Rhodes ; Sandell ). Although the pathogenicity of R. salmoninarum is incompletely understood, several antigenic determinants have been described, including the dominant immunogenic protein major soluble antigen (MSA) (Turaga ; Wiens and Kaattari, 1991), an abundant heat-stable 57 kDa extracellular protein that makes up 60–70 % of all surface proteins in R. salmoninarum (Fredriksen ; Wood and Kaattari, 1996), and is involved in immunosuppression (Brown ; Fredriksen ; Turaga ), agglutination (Senson & Stevenson, 1999; Wiens ; Wiens and Kaattari, 1991) and virulence (Coady ; O’Farrell & Strom, 1999; Senson & Stevenson, 1999). Other antigenic determinants include capsular synthesis, heme acquisition operons, haemolysins and an immunosuppressive 22 kDa surface protein provisionally named p22 (Fredriksen ). The p22 gene encodes a poorly described loosely associated surface protein (Fredriksen & Bakken, 1994) that has been implicated in suppression of antibody production and a stronger agglutination of leucocytes than that which is seen for the MSA protein (Fredriksen ). The genome of the type strain of R. salmoninarum, ATCC 33209T, contains two identical transcriptionally active copies of the MSA-encoding gene: msa1 and msa2 (O’Farrell & Strom, 1999; Rhodes ). 
Both genes are essential for the development of clinical disease and mortality (Coady ). Whilst it seems certain that a single copy was originally acquired through horizontal gene transfer and subsequently duplicated within the bacterial genome (Wiens ), the origin of this gene is unclear, as no homologue to the msa gene has ever been found in any other sequenced genome. Both msa loci are flanked by insertion sequences and transposases, and msa2 is additionally flanked by several degraded genes related to conjugation (including traA relaxase, type IV secretion protein and site-specific recombinase resolvase). Because multiple copies of identical genes are unusual in bacterial genomes, O’Farrell & Strom (1999) suggested that multiple msa copies might confer a selective advantage. Subsequently, Rhodes et al. (2004) demonstrated the presence of a third copy in some isolates, and provided clear evidence for a positive correlation between msa copy number and mortality at lower infection doses. The gene content variation of this species appears to be exceptionally low, with core- and pan-genomes reported to be very similar even for strains sampled over 50 years from a wide range of habitats (Brynildsrud ). However, this does not include paralogues, and the findings of Rhodes et al. (2004) suggest that copy number variation (CNV) in the msa genes of R. salmoninarum has phenotypic relevance. Gene duplication has been shown to be adaptive in bacteria (Riehle ), and CNV is known to be an important mechanism for dose variation of specific proteins under appropriate environmental conditions (Stranger ). As an example, a recent study demonstrated that some strains of Mycobacterium tuberculosis harboured a large, tandem gene duplication and noted greater expression of an anaerobic survival regulon that is contained within the duplication (Domenech ). The aim of the present study was to screen a diverse collection of R. salmoninarum isolates for evidence of CNV in any of the core genes and, if found, to investigate phylogenetic and spatial patterns of the distribution of genetic variants. This work can provide a better understanding of Renibacterium microevolution and may shed light on the mechanisms of differential disease manifestation in different populations.

Methods

Computational analyses

Sixty-eight isolates whose spatial and temporal origins varied widely were sequenced on an Illumina GAII platform at The Genome Analysis Centre (TGAC), Norwich, UK, as part of a previous effort by the authors, and the sequence data are available at the Sequence Read Archive of the National Center for Biotechnology Information (NCBI) under the accession numbers listed in Table 1. Non-pairing reads, reads containing ambiguous characters and reads with an average PHRED score of <20 were discarded before alignment to reference genome ATCC 33209 (available from NCBI GenBank under accession number NC_010168) with Geneious v7.1 (Biomatters), using the option to randomly map reads with multiple best hits.
Table 1.

R. salmoninarum isolates screened for copy number variation

Sample ID | Host | sw/fw | f/w | Origin | Year | Alternative ID | EBI accession no.
MT1351 | S. salar | sw | f | Scottish Highlands, UK | 1993 |  | ERR327904
Carson 5b† | O. tshawytscha | fw | f | Tyee Creek/Wind River, USA | 1994 |  | ERR327905
05372K* | O. tshawytscha | sw | f | Grande Ronde Basin, USA | 2005 |  | ERR327906
NCIMB 1116 | S. salar | fw | w | River Dee, UK | 1962 | 96056 | ERR327907
NCIMB 1114 | S. salar | fw | w | River Dee, UK | 1962 | 5005 | ERR327908
MT1880 | S. salar | sw | f | Strathclyde, UK | 1996 |  | ERR327909
MT1470 | O. mykiss | fw | f | Tayside, UK | 1994 |  | ERR327910
NCIMB 2235 | O. tshawytscha | sw | f | Oregon, USA | 1974 | ATCC 33209 | ERR327911
9025 | O. mykiss | fw | f | Yorkshire, UK | 2009 | 16251-1 | ERR327912
MT239 | S. salar |  |  | Scotland, UK | 1988 |  | ERR327913
MT1511 | O. mykiss | fw | f | Strathclyde, UK | 1994 |  | ERR327914
Cow-chs-94* | O. tshawytscha | fw |  | Cowlitz River, USA | 1994 | GR 16 | ERR327915
MT444 | S. salar | sw | f | Western Isles, UK | 1988 |  | ERR327916
MT839 | S. salar | sw | f | Scottish Highlands, UK | 1990 |  | ERR327917
MT452 | O. mykiss | fw | f | Dumfries and Galloway, UK | 1988 |  | ERR327918
MT861 | S. salar | sw | f | Scotland, UK | 1990 |  | ERR327919
MT1363 | O. mykiss | sw | f | Strathclyde, UK | 1993 |  | ERR327920
99333 | O. mykiss | fw | f | Wales, UK | 1998 | 980036-102 | ERR327921
MT1262 | S. salar | fw | f | Scottish Highlands, UK | 1992 |  | ERR327922
5007 | O. mykiss |  |  | Scotland, UK | 2005 | 0180-18 | ERR327923
MT3313 | O. mykiss | fw | f | Central Scotland, UK | 2008 |  | ERR327925
MT3277 | O. mykiss | fw | f | Dumfries and Galloway, UK | 2008 |  | ERR327926
96071 | O. mykiss | fw | f | Hampshire, UK | 1996 | TEST VALLEY FDL | ERR327927
MT3315 | O. mykiss | fw | f | Strathclyde, UK | 2008 |  | ERR327928
MT2622 | O. mykiss | sw | f | Strathclyde, UK | 2002 |  | ERR327929
1205 | O. mykiss |  | f | UK | 2001 | 3104-67 | ERR327930
99327 | O. mykiss | fw | f | UK | 1997 | 970313-2 | ERR327931
7105 | O. mykiss |  | f | UK | 2007 | P0416 T83 10-3 2 | ERR327932
MT3479 | S. salar | sw | f | Orkney, UK | 2008 |  | ERR327933
MT3482 | S. salar | sw | f | Strathclyde, UK | 2009 |  | ERR327934
MT2979 | O. mykiss | fw | f | Scottish Highlands, UK | 2005 |  | ERR327935
MT2943 | S. salar | sw | f | Scottish Highlands, UK | 2005 |  | ERR327936
99329 | O. mykiss | fw | f | Wales, UK | 1998 | 980036-125 | ERR327937
99326 | O. mykiss | fw | f | Wales, UK | 1999 | 2119-8 | ERR327938
MT3106 | O. mykiss | fw | f | Strathclyde, UK | 2006 |  | ERR327939
99344 | O. mykiss | fw | f | Hampshire, UK | 1998 | 980106-1.1.5 | ERR327940
MT3483 | S. salar | sw | f | Strathclyde, UK | 2009 |  | ERR327941
5006† | O. kisutch | sw | f | Bella Bella, Canada | 1996 | 960046 | ERR327942
99332 | O. mykiss | fw | f | Wales, UK | 1999 | 2119-3 | ERR327943
Rs 8 | S. salar | sw | f | New Brunswick, Canada | 2008 |  | ERR327944
Rs 10* | S. salar | sw | f | New Brunswick, Canada | 2009 |  | ERR327945
Rs 4 | S. salar | sw | f | New Brunswick, Canada | 2006 |  | ERR327946
Rs 3 | S. salar | fw | f | New Brunswick, Canada | 2005 |  | ERR327947
99345 | O. mykiss | fw | f | Wales, UK | 1998 | 980070-18 | ERR327948
99341 | O. mykiss | fw | f | Hampshire, UK | 1998 | 980109-20 | ERR327949
Rs 5 | S. salar | sw | f | New Brunswick, Canada | 2007 |  | ERR327950
Rs 2* | S. salar | sw | f | New Brunswick, Canada | 2005 |  | ERR327951
BPS 91* | O. gorbuscha |  |  | Nanaimo, Canada | 1991 |  | ERR327952
Rs 6* | S. salar | sw | f | New Brunswick, Canada | 2007 |  | ERR327953
DR143 | S. fontinalis | fw | w | Alberta, Canada | 1972 | GR 17 | ERR327954
6553 | S. salar | sw | f | Hemne, Norway | 2008 | 2008-09-495 | ERR327955
6642 | S. salar |  | f | Hemne, Norway | 2008 | 2008-06-633 | ERR327956
Car 96 | O. tshawytscha |  |  | Washington, USA | 1996 |  | ERR327957
684 | S. trutta | fw | f | Aurland, Norway | 1987 |  | ERR327958
GR5* | T. thymallus | fw | w | Montana, USA | 1997 | 980036-87 | ERR327959
WR99 c2 | O. kisutch |  |  | Washington, USA | 1999 |  | ERR327960
D6 | O. tshawytscha |  |  | Oregon, USA | 1982 |  | ERR327961
6694 | O. mykiss | sw | f | Hemne, Norway | 2008 |  | ERR327962
BQ96 91-1* | O. kisutch |  |  | Nanaimo, Canada | 1996 |  | ERR327963
5223* | S. salar | sw | f | Kvinnherad, Norway | 2005 | 2005-50-579 | ERR327964
6863 | O. mykiss | sw | f | Osterøy, Norway | 2009 |  | ERR327965
7441 | S. salar |  | f | Storfjord, Norway | 1985 | 1985-09-667 | ERR327966
7450 | S. salar |  | f | Askøy, Norway | 1987 | 1987-09-1185 | ERR327967
6695 | O. mykiss | sw | f | Hemne, Norway | 2008 | 2008-06-631 | ERR327968
7449 | S. salar |  | f | Skjervøy, Norway | 1987 | 1987-09-932 | ERR327969
7448 | S. salar |  | f | Stranda, Norway | 1986 | 1986-09-4366 | ERR327970
7439 | S. salar |  | f | Sognefjorden, Norway | 1984 | 1984-40.992 | ERR327971
5004 | Unknown |  |  | USA | 1960s | NCIMB 1111 | ERR327924
ATCC 33209‡ | O. tshawytscha | sw | f | Oregon, USA | 1974 |  | NC_010168.1

sw/fw saltwater/freshwater habitat; f/w farmed/wild fish origin.

*Duplication in msa-p22k.

†Other gene duplication.

‡Type strain. Sequence data downloaded from Genbank.

CNVs were discovered using the R (R Development Core Team, 2012) package CNOGpro (Brynildsrud ) with the following parameters: coverage was counted in sliding windows of length 50 bp, the prior probability of changing states (for each read count observation) was set to p=1.0×10⁻¹⁰ and the error-rate parameter was set to 0.01. The runHMM method was used to call CNV regions, and copy numbers were considered correct if they agreed with credible intervals (percentiles 1–99) from the runBootstrap method. When evaluating results we discarded IS994 tallies, as 69 copies (69 orfA and 67 orfB) of this element are known to exist in the reference genome (Wiens ), making it impossible to evaluate copy number variation with our method. We also considered standalone CNV calls in segments shorter than 300 bp as unreliable, as such calls could arise by chance alone (Brynildsrud ). When quantifying total msa enrichment, the signals from msa1 and msa2 were added together, and the relative frequencies were inferred by inspecting the signal from the hypothetical protein-encoding gene p12 (Fig. S1, available in the online Supplementary Material). These results were corroborated using real-time quantitative PCR (qPCR) on selected isolates with different copy number multiplicities (detailed in the Supplementary Material).
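The idea of reading copy number from coverage depth can be illustrated with a deliberately naive sketch. It is not CNOGpro: there is no hidden Markov model, no bootstrap and no GC correction. It only counts read starts in 50 bp windows (non-overlapping here, for simplicity) and divides the mean count in a region by the genome-wide median. The function names and the toy genome are hypothetical and chosen for illustration only.

```python
from statistics import median

WINDOW_BP = 50  # window length reported in the text


def window_counts(read_starts, genome_length, window=WINDOW_BP):
    """Count read start positions in consecutive, non-overlapping windows."""
    n_windows = (genome_length + window - 1) // window
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < genome_length:
            counts[pos // window] += 1
    return counts


def naive_copy_ratio(counts, start, end, window=WINDOW_BP):
    """Mean count of the windows covering [start, end) over the genome-wide median."""
    baseline = median(counts)
    if baseline == 0:
        raise ValueError("median window count is zero; cannot normalise")
    region = counts[start // window:(end - 1) // window + 1]
    return (sum(region) / len(region)) / baseline


# Hypothetical toy genome (not study data): one read start every 10 bp,
# with twice as many reads starting in positions 200..399.
toy_reads = list(range(0, 1000, 10)) + list(range(200, 400, 10))
toy_counts = window_counts(toy_reads, genome_length=1000)
toy_ratio = naive_copy_ratio(toy_counts, 200, 400)
print(toy_ratio)
```

A ratio near 2 in this toy case corresponds to a region present at twice the baseline copy number. For a gene that already exists in two identical copies, such as msa, reads are split between the copies, which is why the text sums the msa1 and msa2 signals.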

Regression analysis

The presence or absence of msa CNV in an isolate was considered a binary trait, and associations between this trait and year of isolation, host species and saltwater/freshwater habitat were investigated by logistic regression, both bivariable and multivariable with interaction terms using R.
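As a rough illustration of the bivariable case, the sketch below fits a logistic model with one binary predictor by plain gradient ascent. The study used R, so this Python version is only a stand-in; it omits the multivariable models and interaction terms, and the data are invented toy values that do not reflect the study's results.

```python
import math


def fit_logistic(x, y, lr=0.5, n_iter=5000):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by gradient ascent."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p          # log-likelihood gradient w.r.t. b0
            g1 += (yi - p) * xi   # log-likelihood gradient w.r.t. b1
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Invented toy data (NOT from the study): habitat = 1 for saltwater,
# 0 for freshwater; cnv = 1 if the isolate carries msa CNV.
habitat = [0, 0, 0, 0, 1, 1, 1, 1]
cnv = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(habitat, cnv)
odds_ratio = math.exp(b1)  # odds of CNV in saltwater relative to freshwater
print(round(odds_ratio, 3))
```

With a single binary predictor the fitted odds ratio equals the cross-product ratio of the 2×2 table, which gives a simple check on the fit.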

Cluster analysis through matrix correlation

Phylogenetic trees were created from single nucleotide polymorphism alignments with the program MrBayes (Ronquist & Huelsenbeck, 2003) (see Supplementary Material). Pairwise patristic distances between isolates were calculated as the sum of branch lengths between leaf pairs of the consensus tree. Pairwise geodesic distances between isolates' geographical origins were calculated by solving for central angle in the spherical law of cosines and multiplying by the radius of the Earth. The latitude–longitude coordinates were rounded to the nearest degree. In some cases the exact sample origin was not known, so the coordinate pair was set to represent geographical midpoints for the sub-national region. To test for phylogenetic and spatial clustering of CNV presence/absence, we created a binary matrix where equal CNV statuses of isolate pairs were coded as 1 and unequal as 0. In this analysis we regarded isolates with asterisks listed in Table 1 as positive for the duplication in question and the remaining isolates as negative. We then adopted a Mantel test-like approach by performing the Mann–Whitney U test of equal distributions between groups defined by CNV status on patristic/geodesic distance data. This test estimator was subsequently compared with those obtained from 10 000 random permutations of the CNV status matrix. The trait was considered to be phylogenetically or spatially clustered if the test estimator fell below the lower 1-percentile limit in the distribution of permuted data set estimators.
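The geodesic distance step described above can be sketched as follows. The spherical law of cosines gives the central angle between two latitude–longitude points, and multiplying by the Earth's radius gives the distance. The radius of 6371 km and the function names are assumptions; the text does not state which radius was used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; not stated in the text


def geodesic_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance from the spherical law of cosines."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    # Clamp against floating-point drift just outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return radius * math.acos(cos_angle)


def pairwise_geodesic(coords):
    """Symmetric distance matrix for a list of (lat, lon) pairs."""
    return [[geodesic_km(a[0], a[1], b[0], b[1]) for b in coords] for a in coords]


# Generic check points (whole degrees, as in the text's rounding), not isolate origins.
quarter_circle = geodesic_km(0, 0, 0, 90)
matrix = pairwise_geodesic([(0, 0), (0, 90), (57, -4)])
print(round(quarter_circle, 1))
```

The law of cosines loses precision for points that are very close together, but with coordinates rounded to the nearest degree this is unlikely to matter.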

Results

Overall, very little CNV was seen in our isolates. In fact, the coverage data of most isolates (57/68) indicated no variation at all. This finding is consistent with previous reports of a high degree of sequence conservation in the R. salmoninarum genome. Nevertheless, CNV was found in 11 isolates, shown in Fig. 1.
Fig. 1.

CNV distribution by phylogeny and geography. Phylogenetic tree revealing patterns of CNV distribution. Horizontal branches represent patristic distances, and isolates are coloured according to their origin. Purple stars indicate CNV in the msa and p22 genes, while an olive star represents CNV in other genes. The most probable copy number of each of the msa (types I and II), p12 and p22 genes is shown on the extreme right. ATCC 33209 is marked with a double dagger symbol because it is a duplicate of NCIMB 2235 and thus not counted separately when tallying CNV frequencies. The inter-lineage distance has been truncated and represents one third of the actual distance. ‘NB’ represents isolates from New Brunswick, Canada, and ‘PNW’ represents isolates from British Columbia, Canada, as well as Washington, Oregon and Montana, all USA. The isolate with unknown North American origin and the one from Alberta, Canada, have been labelled ‘Other’. Adapted from Brynildsrud .

A complete list of all CNVs discovered in this study can be found in Table 2. In total, there were nine distinct CNV regions. Four of these were unique to the Carson5b isolate and two to isolate 5006. The remaining CNVs were non-unique and occurred jointly (i.e. the presence of one CNV type also implied the presence of the others) in all 11 CNV isolates. Among these were duplications of the genes encoding the primary surface proteins of R. salmoninarum: the msa gene and a 22 kDa hypothetical protein (hereafter referred to as p22).
Table 2.

Copy number estimates in CNV isolates. Duplicated genes with the copy number and 95% confidence intervals (95% CI). The most probable copy number is based on the most common local copy number state from the Hidden Markov Model method. The individual msa gene copy numbers could not be differentiated and have been merged. All results are from CNOGpro (Brynildsrud et al., 2015).

Isolate | Gene | Copy number | 95% CI | Most probable
5006 | msa1 | 1.1 | 1.0–1.2 | 2
 | msa2 | 1.2 | 1.1–1.3 | 
 | p12 | 0.8 | 0.6–1.0 | 1
 | p22 | 1.2 | 1.0–1.4 | 1
 | 2 974 628 to 3 084 569 (segmental duplication) | 1.4 | 1.4–1.4 | 2
 | 3 088 016 to 3 100 482 (segmental duplication) | 1.6 | 1.5–1.6 | 2
5223 | msa1 | 2.3 | 2.1–2.4 | 4
 | msa2 | 2.1 | 2.0–2.3 | 
 | p12 | 2.8 | 2.4–3.1 | 3
 | p22 | 3.8 | 3.3–4.3 | 4
05372K | msa1 | 2.5 | 2.3–2.6 | 5
 | msa2 | 2.4 | 2.2–2.7 | 
 | p12 | 2.9 | 2.5–3.3 | 3
 | p22 | 4.3 | 3.9–4.7 | 4
BQ96_91 | msa1 | 1.8 | 1.7–1.9 | 4
 | msa2 | 1.8 | 1.6–1.9 | 
 | p12 | 2.9 | 2.5–3.3 | 3
 | p22 | 1.8 | 1.4–2.1 | 2
BPS91 | msa1 | 2.0 | 1.9–2.1 | 4
 | msa2 | 2.1 | 1.9–2.3 | 
 | p12 | 2.5 | 2.4–2.8 | 2
 | p22 | 2.6 | 2.4–2.8 | 3
Carson5b | msa1 | 2.1 | 1.9–2.4 | 4
 | msa2 | 2.1 | 1.9–2.4 | 
 | p12 | 2.1 | 1.7–2.7 | 2
 | p22 | 5.3 | 4.2–6.4 | 5
 | Rsal33209_0109 (lacI family trans. reg.) | 1.6 | 1.2–2.1 | 2
 | Rsal33209_1458 (NADH-dep. flav. oxidored.) | 1.4 | 1.0–2.0 | 2
 | Rsal33209_2607 (ferredox. NADH red.) | 1.6 | 1.3–2.0 | 2
 | Rsal33209_3193 (hypothetical protein) | 1.9 | 1.4–2.4 | 2
Cow-Chs-94 | msa1 | 1.7 | 1.5–1.8 | 3
 | msa2 | 1.6 | 1.5–1.8 | 
 | p12 | 1.6 | 1.3–2.0 | 2
 | p22 | 2.2 | 1.9–2.5 | 2
GR5 | msa1 | 1.4 | 1.2–1.5 | 3
 | msa2 | 1.4 | 1.3–1.5 | 
 | p12 | 2.4 | 2.3–2.6 | 2
 | p22 | 1.9 | 1.7–2.0 | 2
RS2 | msa1 | 1.5 | 1.4–1.7 | 3
 | msa2 | 1.5 | 1.4–1.7 | 
 | p12 | 2.2 | 2.0–2.3 | 2
 | p22 | 1.7 | 1.3–2.1 | 2
RS6 | msa1 | 2.4 | 2.2–2.7 | 5
 | msa2 | 2.5 | 2.3–2.7 | 
 | p12 | 2.6 | 2.0–3.1 | 3
 | p22 | 3.7 | 3.4–4.0 | 4
RS10 | msa1 | 1.6 | 1.5–1.7 | 3
 | msa2 | 1.5 | 1.4–1.7 | 
 | p12 | 1.9 | 1.7–2.2 | 2
 | p22 | 1.9 | 1.7–2.2 | 2
The total number of msa copies in CNV-positive isolates ranged from two to five. This confirms the supposition that the minimum copy number of msa genes is two, as no isolate presented a read coverage that was suggestive of only a single copy. There were two different msa duplication types, for which we provisionally introduce the nomenclature ‘type I’ and ‘type II’. Type II comprised a subset of the type I duplication unit, but the two can be differentiated by type II’s lack of a marker gene, p12 (a predicted gene annotated Rsal33209_1032). As this gene is only part of type I duplications, the relative frequency of the two types can be found by inspecting the coverage of the p12 gene (Fig. S1). Type I msa duplication included the msa gene, the p12 marker gene, the transposase-encoding Rsal33209_0133 and the inactivated insertion sequence (IS) ISRs3, including all intergenic segments and flanking inverted IS994 sequences. Type I msa duplication thus very closely resembles the genomic region roughly between coordinates 110 000 and 115 000 in ATCC 33209, and is surely a duplication of the msa1 gene. Type II msa duplication included the msa gene with the intergenic sequence from the terminus of the gene and roughly 800 bp downstream, which resembles two different regions of ATCC 33209: coordinates 110 400 to 112 901 or 945 077 to 947 575. We could therefore not determine whether type II duplications represent duplications of msa1 or msa2, and unfortunately read mapping proved unhelpful in resolving this. Although the msa1 and msa2 genes differ very slightly at upstream and downstream sites, the ORFs themselves are identical, and there are several large (130–180 bp) inverted and direct repeats plus one 91 bp perfect palindrome associated with the gene, confounding read mapping (Fig. 2). However, previous experiments have only found duplications of msa1 (Rhodes ).
An msa1 origin must also be suspected for our data because the traA.2 gene neighbouring msa2 was not duplicated in any isolates.
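The way p12 coverage separates the two duplication types can be written out as simple arithmetic. The sketch assumes a baseline of two msa copies and one p12 copy, which matches isolate 5006 in Table 2 but is an interpretation of the text rather than a stated formula. Each p12 copy beyond the baseline is counted as one type I duplication, and any remaining extra msa copies are counted as type II. The function name is hypothetical.

```python
BASE_MSA = 2  # msa1 + msa2, the minimum copy number reported in the text
BASE_P12 = 1  # assumption: one p12 copy when no duplication is present


def split_duplication_types(msa_total, p12_total):
    """Return (type I, type II) duplication counts from most probable copy numbers."""
    extra_msa = msa_total - BASE_MSA
    type_i = p12_total - BASE_P12   # p12 is only carried by type I units
    type_ii = extra_msa - type_i    # extra msa copies without p12
    if extra_msa < 0 or type_i < 0 or type_ii < 0:
        raise ValueError("copy numbers inconsistent with the assumed baseline")
    return type_i, type_ii


# Most probable copy numbers (msa, p12) taken from Table 2.
examples = {"05372K": (5, 3), "BPS91": (4, 2), "5223": (4, 3), "5006": (2, 1)}
for isolate, (msa, p12) in examples.items():
    print(isolate, split_duplication_types(msa, p12))
```

Under these assumptions 05372K would carry two type I and one type II duplication, while 5223 would carry two type I duplications only.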
Fig. 2.

Duplication maps with gene dotplot. (a) Schematic view of the three major CNV regions discovered in the current study. (b) Genome dot plot of the major (type I) and minor (type II) msa duplication units to itself, showing repeat regions and palindromic sequence. Solid lines represent a minimum of 85% sequence identity. DR, direct repeat; IR, inverted repeat. The 91 bp palindrome encodes a predicted rho-independent terminator with a central loop polymorphism between msa1 and msa2. The polymorphism is located 37 bp downstream of the msa ORF. We could not resolve the correct orientation of this segment in duplications, and the polymorphism is therefore labelled by the ambiguity character S (C/G).

It remains unknown whether msa loci are differentially regulated. Using the terminator prediction tool ARNold (Naville ) and the RibEx riboswitch explorer (Abreu-Goodger & Merino, 2005), we discovered that the palindrome at the 3' end of the msa ORF contained a predicted rho-independent terminator/riboswitch-like element at both msa loci, although with ‘G’ as the central loop nucleotide for msa1 and ‘C’ for msa2, opening the possibility of riboswitch-mediated regulation (Fig. 2). The third non-unique CNV region matched the region between coordinates 2 965 759 and 2 967 751 in ATCC 33209. This region is flanked by inverted IS elements and contains a single ORF, encoding the p22 protein (a 22 kDa hypothetical protein labelled Rsal33209_3334). The intergenic segments on both sides of this ORF were also part of the duplication unit. The total number of p22 copies in msa-duplicated isolates was estimated as ranging from two to five.

Trait clustering

The msa–p22 duplication trait did not correlate with year of isolation, host species or saltwater/freshwater habitat. However, a strong geographical pattern was seen in the presence of gene duplication. CNV was absent in the exclusively European lineage 2 (lineage notation from Brynildsrud ), and limited to defined clusters within the widely distributed lineage 1A and the Pacific Northwest-associated lineage 1B. Among the 10 isolates containing additional copies of msa and p22 genes, six are from the Pacific Northwest, three are from Eastern North America (New Brunswick, Canada) and one is from Norway, corresponding to 55, 43 and 8% of the total investigated isolates from each respective region. Notably, of the 36 UK isolates, not one displayed CNV (Fig. 3).
Fig. 3.

Geographical distribution of CNV isolates. Relative frequencies of CNV-positive isolates from each major sample region. Isolate origins have been grouped as Norway, UK, New Brunswick or Pacific Northwest (including the Canadian province of British Columbia, as well as the US states of Washington, Oregon and Montana), except for a single isolate from Alberta, Canada. At each location, the size of the pie chart represents the number of isolates. The red sectors and green sectors indicate the fraction of CNV-negative and CNV-positive isolates, respectively.

To test whether CNV was clustered within different phylogenetically and spatially defined groups, we used Mantel correlation analyses (Fig. 1). For geodesic data, we found the Mann–Whitney U estimator to be 255 344, compared with the full range 432 002–515 531 from the permuted dataset, which translates to a p-value of <1.0×10⁻⁴ when calculated conservatively as in Diniz-Filho ). However, because the distribution of U values follows a near-perfect normal distribution (as assessed by the Anderson–Darling test of normality), a parametric p-value estimation of p<1.0×10⁻⁵⁰ can be used (Fig. 4). In other words, CNV was strongly clustered into geographically defined groups. This can also happen because phylogenetically related isolates tend to be spatially clustered, so we also investigated whether the pairwise patristic distances between isolates impacted the CNV. For these data, Mann–Whitney's U was computed as 419 090, which is also lower than the full range of all permuted-matrix values (424 489–525 215) (non-parametric p<1.0×10⁻⁴; Gaussian parameterization p=7.4×10⁻⁵). Although this implies association between patristic distance and CNV as well, the pairwise geodesic distance is a much stronger predictor of CNV status, implying that these gene duplications are primarily the result of local selection pressures.
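A minimal sketch of this Mantel-like permutation procedure is given below, under several assumptions that the text does not spell out: U is computed for the distances of equal-status pairs against unequal-status pairs, ties receive midranks, and permutation is done by shuffling CNV labels across isolates. The empirical p-value adds one to numerator and denominator, and the Gaussian tail uses `math.erfc` so that very small probabilities do not underflow to zero. All names and the six-point toy dataset are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import mean, stdev


def mann_whitney_u(a, b):
    """U statistic of sample a versus sample b, using midranks for ties."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a, i = 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    return rank_sum_a - len(a) * (len(a) + 1) / 2.0


def pair_statistic(status, dist):
    """U for distances of equal-status pairs versus unequal-status pairs."""
    same, diff = [], []
    for i, j in combinations(range(len(status)), 2):
        (same if status[i] == status[j] else diff).append(dist[i][j])
    return mann_whitney_u(same, diff)


def permuted_statistics(status, dist, n_perm, seed=0):
    """Recompute the statistic after shuffling CNV labels across isolates."""
    rng = random.Random(seed)
    labels = list(status)
    out = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        out.append(pair_statistic(labels, dist))
    return out


def empirical_p(observed, permuted):
    """Conservative one-sided p-value: share of permuted values <= observed."""
    hits = sum(1 for u in permuted if u <= observed)
    return (hits + 1) / (len(permuted) + 1)


def gaussian_lower_tail(observed, permuted):
    """Lower-tail probability under a normal fit to the permuted values."""
    z = (observed - mean(permuted)) / stdev(permuted)
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


# Hypothetical toy data: two tight clusters on a line, CNV status per cluster.
positions = [0, 1, 2, 10, 11, 12]
toy_status = [1, 1, 1, 0, 0, 0]
toy_dist = [[abs(p - q) for q in positions] for p in positions]
toy_u = pair_statistic(toy_status, toy_dist)
toy_perm = permuted_statistics(toy_status, toy_dist, n_perm=2000)
toy_p = empirical_p(toy_u, toy_perm)
print(toy_u, round(toy_p, 3), gaussian_lower_tail(toy_u, toy_perm))
```

A low U means that isolates sharing a CNV status tend to be closer together than isolates that differ. With 10 000 permutations and no permuted value at or below the observed one, the conservative estimate bottoms out near 1.0×10⁻⁴, which is why the text turns to the Gaussian approximation for a sharper figure.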
Fig. 4.

Mantel correlation between CNV and phylogeny/geography. Mann–Whitney U test statistic distribution in the Mantel correlation analysis. Correlation is measured between pairwise patristic (a) and geodesic (b) distances to identical CNV status, measured as a binary trait. The vertical red line represents our observed statistic and the white boxes represent the histogram of the 10 000 permuted matrix-statistics. Note the Gaussian distribution of U values for both the patristic (a) and the geodesic (b) distance analyses. The increased distance between our observed U and the permuted matrix-U values in (b) indicates a more extreme correlation.


Discussion

Although duplicates of genes encoding ribosomal and transfer RNA subunits are well known in bacteria, other gene duplication–amplification events are only now gaining attention and have probably been underreported in the literature (Andersson & Hughes, 2009; Elliott ). CNV in msa has been documented previously (Rhodes ), and the current study extends CNV to p12 and p22. No isolates were found to have CNV in one gene but not the others, suggesting either that these genes interact functionally, such that increased copies of one are not valuable without concomitant copy number increases of the other, or that the genes are somehow duplicated together due to linkage. The latter possibility is made less likely by the known genomic distance between msa and p22, which in ATCC 33209 is around 300 000 bp between msa1 and p22, going through the origin. However, it is possible that these genes are more closely located in strains other than ATCC 33209. Although we have detected several large gene duplications, we have not been able to predict their relative orientation and distance to each other or to the rest of the chromosome. Wiens & Dale (2008) suggest a plasmid context of msa3, based on variable hybridization intensity in Southern blots, and another possible scenario could be the association of msa3 with a phage, as an unconfirmed observation of an R. salmoninarum phage was previously reported (Fryer & Lannan, 1993).
Both msa1 and msa2 are flanked by inverted IS sequences, notably IS994, and IS3-like insertion sequences as well as other ORFs with high homology to transposable elements and transposases, suggesting that they could be transferred and integrated through recombination or transposition mechanisms, although duplications have only been documented in msa1 (Rhodes ). Ten of the 68 isolates screened in this study (∼15%) displayed an increased copy number msa, p12 and p22 genotype, in stark contrast to the 19 of 26 isolates (∼73%) that Rhodes et al. (2004) found to be msa3-positive. In their paper, every isolate except two (MT239 and GL64, which are msa3-negative) was from the Pacific Northwest region of the USA, suggesting that the strains circulating in that particular region have a higher frequency of multiple-copy msa genotypes. A predominantly North American CNV distribution is also consistent with the findings of Wiens & Dale (2008), who observed the msa3 gene in North American but not in European isolates. In this study, the geographical origin was a much stronger predictor of CNV status than the inferred patristic distance from the phylogenetic tree. Isolates 05372K and Cow-Chs-94, for example, are of lineage 1B origin, and thus thought to have diverged from lineage 1A isolates such as Carson5b between 100 and 700 years ago (Brynildsrud ). In spite of this, these isolates all have duplications of the msa, p12 and p22 genes, a trait not shared by phylogenetic neighbours of these isolates. Note, however, that they are all sampled from fish originating from the Columbia River main basin, where multiple fish stocks co-occur.
The fact that we observe this pattern of low intra-cluster but high inter-cluster patristic distances and that isolates originate from multiple geographical locations across North America (and, in a single case, Norway), sampled over a 19-year period from five different species of salmon from both freshwater and saltwater habitats, strongly suggests multiple independent introductions of the trait rather than simple inheritance. Importantly, gene copy numbers varied widely across isolates displaying CNV. This has a number of apparent implications. Firstly, it seems that two is the basic copy number of the msa gene, as this genotype was by far the most common across lineages and ecosystems, was the genotype of the oldest isolates, and no isolates contained fewer than two msa copies. The diverse duplication pattern thus points to a base number of two msa genes with subsequent copy number expansions as a more parsimonious explanation than higher-value msa copy number and subsequent gene loss. Secondly, this heterogeneous duplication pattern indicates locally restricted gene duplication–amplification events rather than prevailing ecotypes as an explanation for the geographical clustering of CNV. It is not clear to what extent the duplications we have found in the present work impact overall pathogen fitness. One possibility is that the observed duplications in fact represent selfish mobile genetic elements. However, this possibility contradicts the current understanding of the msa gene, as two copies have been proposed to confer selective advantage (O’Farrell & Strom, 1999). The immediate benefit of duplications could be through modulation of protein dosage under variable environmental conditions, while the long-term advantage is that the extra copies can, over time, accumulate mutations and evolve new functions (Conant & Wolfe, 2008; Kondrashov, 2012; Kondrashov ).
In favour of a selectionist explanation is the observation that these duplications are seemingly not immediately removed from the population, but are instead shared by related isolates and thus perhaps maintained in local populations. (Isolates 05372K and Cow-chs-94, for example, are closely related despite being from separate river systems and isolated 11 years apart, and they both have multiple duplications of the msa, p12 and p22 genes, although the exact numbers of each gene appear to vary.) Rhodes ) found that the presence of a third msa copy was clearly associated with increased mortality at lower, environmentally relevant doses. It is therefore tempting to speculate that the additional copies that we have found are increasingly beneficial to the bacterium. Such duplication–amplification events of immunomodulatory genes are now thought to be common under adaptation to new, extreme and variable environments (Elliott ), and these results point to a higher extent of such selection pressures in the Pacific Northwest than elsewhere. Our findings suggest that extra msa copies interact with the relatively unknown p22 protein, as the two were always duplicated together. The nature of this interaction remains unknown, and more research is needed to conclusively determine the relative fitness and virulence relationships between R. salmoninarum isolates with different copy numbers.
  25 in total

1.  MrBayes 3: Bayesian phylogenetic inference under mixed models.

Authors:  Fredrik Ronquist; John P Huelsenbeck
Journal:  Bioinformatics       Date:  2003-08-12       Impact factor: 6.937

2.  CNOGpro: detection and quantification of CNVs in prokaryotic whole-genome sequencing data.

Authors:  Ola Brynildsrud; Lars-Gustav Snipen; Jon Bohlin
Journal:  Bioinformatics       Date:  2015-02-01       Impact factor: 6.937

3.  ARNold: a web tool for the prediction of Rho-independent transcription terminators.

Authors:  Magali Naville; Adrien Ghuillot-Gaudeffroy; Antonin Marchais; Daniel Gautheret
Journal:  RNA Biol       Date:  2011-01-01       Impact factor: 4.652

4.  Identification of Renibacterium salmoninarum surface proteins by radioiodination.

Authors:  A Fredriksen; V Bakken
Journal:  FEMS Microbiol Lett       Date:  1994-09-01       Impact factor: 2.742

5.  Both msa genes in Renibacterium salmoninarum are needed for full virulence in bacterial kidney disease.

Authors:  Alison M Coady; Anthony L Murray; Diane G Elliott; Linda D Rhodes
Journal:  Appl Environ Microbiol       Date:  2006-04       Impact factor: 4.792

6.  Relative impact of nucleotide and copy number variation on gene expression phenotypes.

Authors:  Barbara E Stranger; Matthew S Forrest; Mark Dunning; Catherine E Ingle; Claude Beazley; Natalie Thorne; Richard Redon; Christine P Bird; Anna de Grassi; Charles Lee; Chris Tyler-Smith; Nigel Carter; Stephen W Scherer; Simon Tavaré; Panagiotis Deloukas; Matthew E Hurles; Emmanouil T Dermitzakis
Journal:  Science       Date:  2007-02-09       Impact factor: 47.728

7.  Renibacterium salmoninarum p57 antigenic variation is restricted in geographic distribution and correlated with genomic markers.

Authors:  Gregory D Wiens; Ole Bendik Dale
Journal:  Dis Aquat Organ       Date:  2009-02-12       Impact factor: 1.802

8.  Genetic architecture of thermal adaptation in Escherichia coli.

Authors:  M M Riehle; A F Bennett; A D Long
Journal:  Proc Natl Acad Sci U S A       Date:  2001-01-09       Impact factor: 11.205

9.  Gene duplication as a mechanism of genomic adaptation to a changing environment. (Review)

Authors:  Fyodor A Kondrashov
Journal:  Proc Biol Sci       Date:  2012-09-12       Impact factor: 5.349

10.  RibEx: a web server for locating riboswitches and other conserved bacterial regulatory elements.

Authors:  Cei Abreu-Goodger; Enrique Merino
Journal:  Nucleic Acids Res       Date:  2005-07-01       Impact factor: 16.971

