Xiaoru Wang¹, Zongbao Liu², Xiaoying Li¹, Danwei Li¹, Jiayu Cai¹, He Yan¹.
Abstract
The rapid and accurate diagnosis of swine diseases is indispensable for reducing their negative impact on the pork industry. Next-generation sequencing (NGS) is a promising diagnostic tool for swine diseases. To support the application of NGS in swine disease diagnosis, we established the Swine Pathogen Database (SPDB). The SPDB is the first comprehensive and highly specialized database and analysis platform for swine pathogens. The current version features an online genome search tool that contains 26 148 genomes of swine, swine pathogens and phylogenetically related species. The database also offers a comprehensive bioinformatics analysis pipeline that identifies 4403 swine pathogens and related species in clinical samples from targeted 16S rRNA gene sequencing and metagenomic NGS data. The SPDB provides a powerful and user-friendly service that helps veterinarians and researchers apply NGS in swine disease research. Database URL: http://spdatabase.com:2080/.
Year: 2020 PMID: 32761141 PMCID: PMC7409514 DOI: 10.1093/database/baaa063
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Workflow of the data integration process. Genetic sequences and pathogen information are imported from two main sources: public resources and users.
Figure 2. Database schema of the data search and download system, illustrating the main data structures and the relationships between tables.
Figure 3. Interfaces of the data search and download system. (A) Search page. The search box at the top supports quick, focused keyword searches; clicking Search with the box left empty opens the precise, customized search. Below it, the top 20 hot words for each category are listed, each linking directly to the corresponding data. (B) Search result page. The top of the page offers an advanced search that lets users query 17 fields to locate information more precisely. The bottom section shows the search result list, which has 30 fields in total.
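The two search modes in panels (A) and (B) differ mainly in how a query is matched. The sketch below contrasts a free-text keyword search over all fields with a field-restricted advanced search, and ranks hot words by query frequency. It is an illustration under assumptions: `GenomeRecord`, its four fields, the example accessions and the frequency-based hot-word ranking are all hypothetical, and SPDB's real 17 advanced-search fields and ranking method are not described in this record.

```python
"""Sketch of keyword search, field-restricted search and hot-word ranking.

All field names and records are hypothetical; SPDB's real schema is not shown here.
"""
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass
class GenomeRecord:
    # Hypothetical subset of searchable fields (SPDB offers 17 in advanced search).
    accession: str
    species: str
    category: str  # e.g. "bacteria", "virus", "parasite", "phage"
    host: str


def keyword_search(records, keyword):
    """Quick search: case-insensitive substring match against every field."""
    kw = keyword.lower()
    return [r for r in records if any(kw in str(v).lower() for v in asdict(r).values())]


def advanced_search(records, **criteria):
    """Advanced search: every named field must equal its value (case-insensitive)."""
    def matches(record):
        fields = asdict(record)
        return all(str(fields[name]).lower() == str(value).lower()
                   for name, value in criteria.items())
    return [r for r in records if matches(r)]


def hot_words(query_log, n=20):
    """Top-n query terms per category, from (category, term) pairs."""
    per_category = {}
    for category, term in query_log:
        per_category.setdefault(category, Counter())[term.lower()] += 1
    return {cat: counts.most_common(n) for cat, counts in per_category.items()}


if __name__ == "__main__":
    records = [
        GenomeRecord("EX0001", "Streptococcus suis", "bacteria", "Sus scrofa"),
        GenomeRecord("EX0002", "Porcine circovirus 2", "virus", "Sus scrofa"),
    ]
    print([r.accession for r in keyword_search(records, "suis")])  # ['EX0001']
    print([r.accession for r in advanced_search(records, category="virus")])  # ['EX0002']
```

Exact matching in `advanced_search` is a design choice for the sketch; a real advanced search might instead allow ranges or partial matches per field.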
Figure 4. Integrated workflows of the pathogen screening tool. Workflows are color-coded by input data type: reads (blue), contigs/scaffolds/genes (yellow), proteins (green) and 16S tags (purple).
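The per-input routing in Figure 4 can be sketched as a dispatch on data type. The tool and database pairs follow the performance table (Bowtie 2, BWA or SOAP for reads; blastn for contigs, genes and 16S tags; tblastn for proteins; "Host and …" databases for read sets that contain swine sequences). Routing scaffolds to blastn, and the names `choose_workflow`, `InputType` and `include_host`, are assumptions for illustration, not SPDB code.

```python
"""Sketch: choose an analysis tool and reference database by input type.

The mapping mirrors the tool/database pairs in the performance table;
it is an illustration, not SPDB's actual source code.
"""
from enum import Enum


class InputType(Enum):
    READS = "reads"
    CONTIG = "contig"
    SCAFFOLD = "scaffold"
    GENE = "gene"
    PROTEIN = "protein"
    TAG_16S = "16S tag"


READ_ALIGNERS = ("Bowtie 2", "BWA", "SOAP")
NUCLEOTIDE_SEQUENCES = {InputType.CONTIG, InputType.SCAFFOLD, InputType.GENE}


def choose_workflow(input_type, target="Bacteria", aligner="Bowtie 2", include_host=False):
    """Return (tool, reference database) for the given input type."""
    if input_type is InputType.READS:
        if aligner not in READ_ALIGNERS:
            raise ValueError(f"unsupported aligner: {aligner}")
        # Read sets containing swine sequences are mapped to "Host and <target>".
        database = f"Host and {target.lower()}" if include_host else target
        return aligner, database
    if input_type in NUCLEOTIDE_SEQUENCES:
        return "blastn", target  # scaffold -> blastn is an assumption
    if input_type is InputType.PROTEIN:
        return "tblastn", target  # protein query vs. translated nucleotide database
    if input_type is InputType.TAG_16S:
        return "blastn", "16S"
    raise ValueError(f"unknown input type: {input_type}")
```

For example, `choose_workflow(InputType.READS, target="Virus", aligner="BWA")` returns `("BWA", "Virus")`, matching the BWA row of dataset 4.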
Figure 5. Screenshots of the pathogen screening tool. (A) File upload module. (B) Setting the required arguments. (C–D) Overview of results; the top 10 species are shown as both a table and a histogram. (E) Result files, all of which can be downloaded. Species_result.xls gives the species-level results. Taxa_summary_result.xls gives the results at the genus through phylum levels. Taxonomy.mid.xls contains all analysis records. Taxonomy.mid.filter.xls holds the records retained after filtering with the user-set parameters and is the source file for Species_result.xls and Taxa_summary_result.xls.
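The relationship between the files in panel (E), where one full record table is filtered and then summarized at species and higher ranks, can be illustrated with a short sketch. The record layout (a phylum-to-species lineage plus an alignment identity), the 97% identity threshold and the choice to count abundance by records are assumptions made for illustration; the actual SPDB filtering parameters may differ.

```python
"""Sketch: filter classification records, summarize by rank, report top species.

Record layout, the identity threshold and record-count abundance are assumptions.
"""
from collections import Counter

RANKS = ("phylum", "class", "order", "family", "genus", "species")


def filter_records(records, min_identity=97.0):
    """Keep records passing the user threshold (cf. Taxonomy.mid.filter.xls)."""
    return [r for r in records if r["identity"] >= min_identity]


def summarize(filtered, rank):
    """Relative abundance at one rank, as fractions of all filtered records."""
    idx = RANKS.index(rank)
    counts = Counter(r["lineage"][idx] for r in filtered)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()} if total else {}


def top_species(filtered, k=10):
    """Top-k species by relative abundance (cf. the overview table and histogram)."""
    abundance = summarize(filtered, "species")
    return sorted(abundance.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

In this sketch, `summarize(filtered, "species")` plays the role of Species_result.xls and calling it with "genus" through "phylum" plays the role of Taxa_summary_result.xls.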
Performance of the pathogen screening tool
| Dataset number | Data type | Content | Data size | Analysis tool | Database | CPU | Running time (min) | Correlation | P-value |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Paired-end reads | Bacteria | 5.43 × 10⁹ | Bowtie 2 | Bacteria | 16 | 327 | 9.91 × 10⁻¹ | 2.75 × 10⁻⁸ |
| | | | | BWA | | 8 | 435 | 9.91 × 10⁻¹ | 2.56 × 10⁻⁸ |
| | | | | SOAP | | 8 | 225 | 9.94 × 10⁻¹ | 6.29 × 10⁻⁹ |
| 2 | Paired-end reads | Parasite | 5.59 × 10⁹ | Bowtie 2 | Parasite | 16 | 192 | 9.94 × 10⁻¹ | 6.29 × 10⁻⁹ |
| | | | | BWA | | 8 | 205 | 9.64 × 10⁻¹ | 7.26 × 10⁻⁶ |
| | | | | SOAP | | 8 | 233 | 9.82 × 10⁻¹ | 4.53 × 10⁻⁷ |
| 3 | Paired-end reads | Phage | 5.67 × 10⁹ | Bowtie 2 | Phage | 16 | 100 | 9.71 × 10⁻¹ | 2.88 × 10⁻⁶ |
| | | | | BWA | | 8 | 89 | 9.71 × 10⁻¹ | 2.93 × 10⁻⁶ |
| | | | | SOAP | | 8 | 57 | 9.75 × 10⁻¹ | 1.77 × 10⁻⁶ |
| 4 | Paired-end reads | Virus | 4.26 × 10⁹ | Bowtie 2 | Virus | 16 | 89 | 9.46 × 10⁻¹ | 3.46 × 10⁻⁵ |
| | | | | BWA | | 8 | 119 | 9.41 × 10⁻¹ | 4.89 × 10⁻⁵ |
| | | | | SOAP | | 8 | 59 | 9.60 × 10⁻¹ | 1.07 × 10⁻⁵ |
| 5 | Paired-end reads | Combine* | 7.20 × 10⁹ | Bowtie 2 | Host and bacteria | 26 | 1135 | 8.76 × 10⁻¹ | 1.85 × 10⁻⁴ |
| | | | | | Host and parasite | | 1068 | | |
| | | | | | Host and phage | | 1077 | | |
| | | | | | Host and virus | | 1076 | | |
| | | | | BWA | Host and bacteria | 20 | 1485 | 8.76 × 10⁻¹ | 1.86 × 10⁻⁴ |
| | | | | | Host and parasite | | 1251 | | |
| | | | | | Host and phage | | 1269 | | |
| | | | | | Host and virus | | 1247 | | |
| | | | | SOAP | Host and bacteria | 20 | 631 | 8.13 × 10⁻¹ | 1.31 × 10⁻³ |
| | | | | | Host and parasite | | 528 | | |
| | | | | | Host and phage | | 563 | | |
| | | | | | Host and virus | | 510 | | |
| 6 | Contig | Bacteria | 1.67 × 10⁸ | blastn | Bacteria | 20 | 49 | 1.00 | 6.65 × 10⁻⁶⁴ |
| 7 | 16S tag | Bacteria | 5.18 × 10⁵ | blastn | 16S | 20 | 16 | 1.00 | 6.65 × 10⁻⁶⁴ |
| 8 | Gene | Bacteria | 6.16 × 10⁴ | blastn | Bacteria | 20 | 14 | 9.82 × 10⁻¹ | 4.19 × 10⁻⁷ |
| 9 | Protein | Bacteria | 2.03 × 10⁴ | tblastn | Bacteria | 20 | 21 | 9.08 × 10⁻¹ | 2.85 × 10⁻⁴ |
| 10 | Contig | Parasite | 1.26 × 10⁷ | blastn | Parasite | 20 | 26 | 1.00 | 6.65 × 10⁻⁶⁴ |
| 11 | Gene | Parasite | 6.00 × 10⁴ | blastn | Parasite | 20 | 14 | 9.95 × 10⁻¹ | 3.50 × 10⁻⁹ |
| 12 | Protein | Parasite | 1.90 × 10⁴ | tblastn | Parasite | 20 | 16 | 9.87 × 10⁻¹ | 1.16 × 10⁻⁷ |
| 13 | Contig | Phage | 2.62 × 10⁶ | blastn | Phage | 20 | 14 | 9.33 × 10⁻¹ | 7.92 × 10⁻⁵ |
| 14 | Gene | Phage | 3.42 × 10⁴ | blastn | Phage | 20 | 16 | 9.31 × 10⁻¹ | 9.05 × 10⁻⁵ |
| 15 | Protein | Phage | 1.14 × 10⁴ | tblastn | Phage | 20 | 17 | 8.58 × 10⁻¹ | 1.50 × 10⁻³ |
| 16 | Contig | Virus | 2.12 × 10⁶ | blastn | Virus | 20 | 14 | 1.00 | 6.65 × 10⁻⁶⁴ |
| 17 | Gene | Virus | 5.37 × 10⁴ | blastn | Virus | 20 | 13 | 9.33 × 10⁻¹ | 7.92 × 10⁻⁵ |
| 18 | Protein | Virus | 1.79 × 10⁴ | tblastn | Virus | 20 | 14 | 9.57 × 10⁻¹ | 1.36 × 10⁻⁵ |
| 19 | Contig | Combine | 1.02 × 10⁸ | blastn | Bacteria | 20 | 37 | 1.00 | 0.00 |
| | | | | | Parasite | | 17 | | |
| | | | | | Phage | | 31 | | |
| | | | | | Virus | | 17 | | |
| 20 | Gene | Combine | 9.50 × 10⁴ | blastn | Bacteria | 20 | 19 | 9.32 × 10⁻¹ | 9.90 × 10⁻⁶ |
| | | | | | Parasite | | 19 | | |
| | | | | | Phage | | 19 | | |
| | | | | | Virus | | 19 | | |
| 21 | Protein | Combine | 3.04 × 10⁴ | tblastn | Bacteria | 20 | 27 | 8.16 × 10⁻¹ | 1.22 × 10⁻³ |
| | | | | | Parasite | | 19 | | |
| | | | | | Phage | | 21 | | |
| | | | | | Virus | | 17 | | |
Ten columns are displayed. ‘Dataset number’ is the serial number of the dataset. ‘Data type’ is the type of input data. ‘Content’ gives the species composition of the dataset: ‘Combine’ indicates that the dataset contains sequences from bacteria, viruses, parasites and phages, and an asterisk (*) indicates that it also contains swine sequences. ‘Data size’ is the number of bases or amino acids in the data. ‘Analysis tool’ is the tool used for the analysis. ‘Database’ is the reference database used. ‘CPU’ is the number of CPUs used for the job. ‘Running time’ is the total time taken for the job, in minutes. ‘Correlation’ is the average Pearson correlation coefficient between the predicted and known relative proportions of species in the dataset. ‘P-value’ gives the observed P-values from analyzing differences between the predicted and known relative proportions of species. Values under ‘Data size’, ‘Correlation’ and ‘P-value’ are given in scientific notation.
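The ‘Correlation’ column compares predicted with known species proportions. A minimal sketch of that metric follows. Averaging the per-dataset coefficients with a plain arithmetic mean is an assumption, since the table does not state how the average was taken, and `mean_correlation` is a hypothetical helper name. If a P-value is also needed, `scipy.stats.pearsonr` returns both r and a two-sided P-value, although the table does not say which test produced its P-values.

```python
"""Sketch: Pearson r between known and predicted relative proportions."""
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def mean_correlation(pairs):
    """Arithmetic mean of r over (known, predicted) proportion vectors (assumed)."""
    rs = [pearson(known, predicted) for known, predicted in pairs]
    return sum(rs) / len(rs)
```

For instance, known proportions `[0.5, 0.3, 0.2]` against predicted `[0.48, 0.33, 0.19]` give r of roughly 0.986.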