Wen Zou, Hailin Tang, Weizhong Zhao, Joe Meehan, Steven L Foley, Wei-Jiun Lin, Hung-Chia Chen, Hong Fang, Rajesh Nayak, James J Chen.
Abstract
BACKGROUND: Pulsed-field gel electrophoresis (PFGE) is currently the method most widely and routinely used by the Centers for Disease Control and Prevention (CDC) and state health labs in the United States for Salmonella surveillance and outbreak tracking. The major drawbacks of commercially available PFGE analysis programs have been their difficulty in handling large datasets and their limited range of analysis tools. New analytical tools for PFGE data mining are needed to make full use of the valuable data in large surveillance databases.
Year: 2013 PMID: 24267777 PMCID: PMC3851133 DOI: 10.1186/1471-2105-14-S14-S15
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The data composition in BACPAK and Salmonella PFGE fingerprints database.
| BACPAK | |
|---|---|
Figure 1. The flow chart of the tool for . (The image was taken from BACPAK.)
Five selected test Salmonella isolates, the prediction results and the distinguished band markers by the two-way hierarchical cluster analysis tool for five serotype identification ("X" stands for band presence).
| Test | Predicted serotypes | Real serotypes | Distinguished band markers (Kb) | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
Figure 2. Hierarchical cluster analysis of PFGE . The dendrogram shows a simplified tree structure of 10,198 isolates of five serotypes: S. I 4,[5],12:i:- (4,5), S. Hadar (H), S. Oranienburg (O), S. Thompson (T), and S. Typhi (Ty). Five of the 10,198 isolates are test isolates, labeled T1 to T5 in various colors; the rest were retrieved from the PFGE database bundled with the tools. The number in parentheses indicates the number of isolates in each branch square. The hierarchical cluster analysis tool grouped the isolates into nine major clusters (C1 to C9) and 17 sub-clusters (c1 to c17).
Figure 3. Distance matrix of five selected serotypes. The heatmap shows the distance matrix of dissimilarities between every pair of patterns in the selected dataset of five serotypes. The dissimilarity of PFGE patterns, both between and within serotypes, was calculated as the Jaccard distance, with values ranging from 0 (blue) to 1 (red), as shown in the index.
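To make the dissimilarity measure in Figure 3 concrete, here is a minimal sketch of a pairwise Jaccard distance matrix over band-presence patterns. It assumes each PFGE pattern is encoded as a 0/1 list over the same set of band locations; the function names and the three example patterns are hypothetical and are not taken from the tool or its database.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two binary band-presence patterns."""
    if len(a) != len(b):
        raise ValueError("patterns must cover the same band locations")
    # Bands present in both patterns, and bands present in at least one.
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption: two patterns with no bands at all count as identical.
    return 0.0 if either == 0 else 1.0 - both / either


def distance_matrix(patterns):
    """Symmetric matrix of pairwise Jaccard distances (0 on the diagonal)."""
    n = len(patterns)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = jaccard_distance(patterns[i], patterns[j])
    return d


# Hypothetical patterns: 1 means a band is present at that location.
patterns = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1],
]
matrix = distance_matrix(patterns)
```

A value of 0 means two patterns share every band, and 1 means they share none, which matches the blue-to-red scale described in the caption.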
Figure 4. Two-way hierarchical clustering analysis of the five selected serotypes. The color histogram shows the proportion of isolates with a band present at each designated band location, with values ranging from 0 to 1. Hierarchical cluster analysis was applied to the dissimilarity between every pair of serotypes, calculated as the Euclidean distance between their characteristic parameters. Both serotypes and band locations were clustered according to these dissimilarity measures. Red asterisks indicate the distinguishing band markers.
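The two-way clustering in Figure 4 can be sketched as follows: compute the proportion of isolates per serotype that show a band at each location, then cluster both the serotype profiles (rows) and the band locations (columns) by Euclidean distance. This is only an illustration under stated assumptions: the source does not name the linkage method, so average linkage is assumed here, and the serotype labels "A", "B", "C" and their patterns are hypothetical.

```python
import math


def band_proportions(patterns_by_serotype):
    """Fraction of isolates in each serotype with a band at each location."""
    return {
        serotype: [sum(col) / len(pats) for col in zip(*pats)]
        for serotype, pats in patterns_by_serotype.items()
    }


def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def cluster_order(vectors):
    """Agglomerative clustering (average linkage, an assumption);
    returns the indices in the order they appear in the merged tree."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(euclidean(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


# Hypothetical data: two isolates per serotype, four band locations.
data = {
    "A": [[1, 1, 0, 0], [1, 1, 0, 0]],
    "B": [[1, 1, 0, 0], [1, 0, 0, 0]],
    "C": [[0, 0, 1, 1], [0, 0, 1, 0]],
}
props = band_proportions(data)
names = list(props)
rows = [props[name] for name in names]
row_order = cluster_order(rows)                # serotypes
col_order = cluster_order(list(zip(*rows)))    # band locations
```

Reordering the proportion matrix by `row_order` and `col_order` places similar serotypes and co-occurring band locations next to each other, which is what makes distinguishing bands stand out in a heatmap of this kind.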