Josephine M Bryant, Virginie C Thibault, David G E Smith, Joyce McLuckie, Ian Heron, Iker A Sevilla, Franck Biet, Simon R Harris, Duncan J Maskell, Stephen D Bentley, Julian Parkhill, Karen Stevenson.
Abstract
BACKGROUND: Mycobacterium avium subspecies paratuberculosis (Map) is an infectious enteric pathogen that causes Johne's disease in livestock. Determining genetic diversity is a prerequisite to understanding the epidemiology and biology of Map. We performed the first whole genome sequencing (WGS) of 141 global Map isolates that encompass the main molecular strain types currently reported. We investigated the phylogeny of the Map strains, the diversity of the genome and the limitations of commonly used genotyping methods.
Year: 2016 PMID: 26813574 PMCID: PMC4729121 DOI: 10.1186/s12864-015-2234-5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1. Whole genome SNP-based phylogenetic tree of strains included in this study. (a) Phylogenetic tree of Map and non-Map strains built using FastTree [52]. Map strains form a single clade and are separated from other mycobacterial species by at least 40,000 SNPs (branches shortened for illustrative purposes). (b) Maximum likelihood phylogenetic tree of Map strains sequenced as part of this study. The tree was created using RAxML [30] and is based on SNPs identified through mapping to Map-K10 as described in the text. Branches are annotated with the host, country of origin and isolate MAPMRI numbers. Numbers in black to the far right represent the INMV profiles. Previously described lineages are labeled. The dashed box represents strains designated as the Indian bison type. Bootstrap values are shown in Additional file 2: Figure S1.
Distance matrix plot showing the number of SNPs present between selected strain groups
| | Type S | Type S(I) | Type S(III) | Type C |
|---|---|---|---|---|
| Type S | | | | 2360 |
| Type S(I) | | | 1051 | 2684 |
| Type S(III) | | 1051 | | 3087 |
| Type C | 2360 | 2684 | 3087 | |
| Type B | 2565 | 2889 | 3292 | 264 |
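A pairwise SNP distance of the kind tabulated above can be illustrated with a minimal sketch. It assumes the input is a set of equal-length aligned sequences and that ambiguous or gap characters are skipped; the record does not state how the between-group values were derived, so the function names and the handling of `N` and `-` are illustrative assumptions rather than the authors' method.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count sites that differ between two aligned sequences.

    Sites where either sequence carries a character listed in `ignore`
    (assumed here to be ambiguous bases and gaps) are not counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def distance_matrix(alignment):
    """Build a symmetric name -> name -> SNP count mapping."""
    names = list(alignment)
    matrix = {name: {name: 0} for name in names}
    for a, b in combinations(names, 2):
        d = snp_distance(alignment[a], alignment[b])
        matrix[a][b] = d
        matrix[b][a] = d
    return matrix
```

Because the matrix is symmetric, only one triangle needs to be reported, which is why each value in the table appears in at most two mirrored cells.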
Fig. 2. Type and genomic position of IS1311 element found in the Map sequencing data. The proportion of reads mapping to the IS1311 element with a T or C at position 223 is indicated by the colour gradient of circles displayed on the phylogenetic tree. Asterisks indicate samples where the “TG” deletion was detected [22]. The position of the IS1311 element in the genome was predicted by identifying paired reads where one read maps to the IS1311 element and the other does not, and using the mapped position of the latter. This is displayed horizontally with the genome co-ordinates displayed along the top. The colour intensity of the predicted IS1311 element position represents the relative number of supporting reads detected within the sample, and thus is normalised for differences in depth of coverage between samples.
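The read-pair approach described in this caption can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes each read pair has already been reduced to two `(reference_name, position)` tuples, whereas a real workflow would read these from an alignment file. The function name, the `bin_size` parameter and the binning of anchor positions are assumptions introduced here.

```python
from collections import Counter


def predict_insertion_sites(read_pairs, element_name="IS1311", bin_size=1000):
    """Estimate element positions from pairs that straddle element and genome.

    read_pairs: iterable of ((ref1, pos1), (ref2, pos2)); an unmapped read
    is represented as (None, None).
    Returns {bin_start: fraction of supporting pairs in this sample}.
    """
    counts = Counter()
    for (ref1, pos1), (ref2, pos2) in read_pairs:
        on_element_1 = ref1 == element_name
        on_element_2 = ref2 == element_name
        if on_element_1 == on_element_2:
            continue  # both reads on the element, or neither: uninformative
        # The read that is NOT on the element anchors the genomic position.
        anchor_pos = pos2 if on_element_1 else pos1
        if anchor_pos is None:
            continue  # mate is unmapped, so it gives no genomic position
        counts[(anchor_pos // bin_size) * bin_size] += 1

    total = sum(counts.values())
    if total == 0:
        return {}
    # Dividing by the sample's own total makes samples with different
    # sequencing depth comparable.
    return {start: n / total for start, n in sorted(counts.items())}
```

Reporting each bin as a fraction of the sample's supporting pairs mirrors the within-sample normalisation the caption describes for colour intensity.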
Fig. 3. Investigating the presence of a clock-like signal in the data. The presence of a molecular clock in the dataset was assessed by plotting the root-to-tip distance of isolates in the phylogeny against isolation date [53] for both the dataset as a whole (a, b) and only Type II isolates (c, d). Both showed evidence of a very weak positive signal, as indicated by a low linear regression correlation coefficient (a, c). To test the significance of these observations, 99 comparison datasets were produced in which the isolation dates were permuted on the phylogeny. Panels b and d show histograms of correlation coefficient values from the permuted datasets, with the red line indicating the value from the real data. In both cases the correlation coefficient of the real data is significantly greater than those of the permuted datasets at the 0.05 level. Passing this test is a minimal requirement for the application of BEAST analysis (Fig. 4), which assumes the presence of a molecular clock.
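The date-permutation test described in this caption can be sketched as below. This is an illustration under stated assumptions: it uses the Pearson correlation between isolation date and root-to-tip distance (the caption does not say whether R or R² was used), takes the distances as already extracted from the tree, and uses a one-sided empirical p-value. The function names and the `seed` parameter are introduced here for illustration.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def date_permutation_test(dates, distances, n_permutations=99, seed=0):
    """Compare the observed correlation with date-shuffled datasets.

    Returns (observed_r, permuted_rs, p_value), where p_value is the
    one-sided fraction of datasets (observed included) whose correlation
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = pearson_r(dates, distances)
    shuffled = list(dates)
    permuted = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # reassign isolation dates to tips at random
        permuted.append(pearson_r(shuffled, distances))
    n_at_least = sum(1 for r in permuted if r >= observed)
    p_value = (n_at_least + 1) / (n_permutations + 1)
    return observed, permuted, p_value
```

With 99 permutations the smallest attainable p-value is 1/100, so the real data pass at the 0.05 level only if at most four permuted datasets reach the observed correlation.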
Fig. 4. Estimates of substitution rate. Coalescent analyses were implemented using the BEAST package (v1.7.5) [31] as described in the text. This was carried out both on the entire dataset and on the Type C isolates alone, using a variety of population models as shown. The mean value represents the mean estimated substitution rate from three independent runs. The confidence intervals represent the maximum and minimum highest posterior density values obtained. As a comparison, the estimated rates are shown alongside a predicted substitution rate of M. tuberculosis in the context of transmission [49].