Lusine Khachatryan, Margriet E M Kraakman, Alexandra T Bernards, Jeroen F J Laros.
Abstract
BACKGROUND: Bacteria carry a wide array of genes, some of which have multiple alleles. These different alleles are often responsible for distinct types of virulence and can determine classification at the subspecies level (e.g., housekeeping genes for Multi-Locus Sequence Typing, MLST). Therefore, it is important to rapidly detect not only the gene of interest, but also the relevant allele. Current sequencing-based methods are limited to mapping reads to each of the known allele references, which is a time-consuming procedure.
Keywords: Allele typing; Database preprocessing; Multi-locus sequence typing; Next-generation sequencing
Year: 2019 PMID: 31060512 PMCID: PMC6501397 DOI: 10.1186/s12864-019-5723-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Schematic representation of the database preprocessing. All of the processes are illustrated for one gene. Calculations for several genes are done independently in parallel
Fig. 2 Schematic representation of the analysis part of the BacTag pipeline. All of the processes are illustrated for one gene. Calculations for multiple genes are done independently in parallel. The analysis of the low similarity group of sequences is highlighted by the dashed box and can be manually turned off by the user to save time
Preprocessed MLST databases
| MLST database | Genes including number of alleles per gene | Number of alleles (per gene) in the low similarity group | Strain and reference sequence used for flanking region construction |
|---|---|---|---|
| | | | UMN026, NC_011751.1 |
| | | | Kp52.145, FO834906.1 |
| | | – | ED99, NC_017568.1 |
| | | – | PG45, NC_014760.1 |
| | | – | ATCC 33277, NC_010729.1 |
Testing the pipeline on artificial WGS data
| Species and strain | GenBank accession number | Identified alleles |
|---|---|---|
| | FN554766.1 | |
| | FM180568.1 | |
| | CP000800.1 | |
| | NC_017628.1 | |
| | CP005930.1 | |
| | NZ_CP007149.1 | |
| | NC_011751.1 | |
| | NZ_CP016072.1 | |
| | NC_017568.1 | |
| | NC_014925.1 | |
| | NZ_CP023663.1 | |
| | NC_018077.1 | |
| | NZ_CP011348.1 | |
| | NC_015725.1 | |
| | NC_014760.1 | |
| | NZ_CP019639.1 | |
| | NC_010729.1 | |
| | NZ_CP011996.1 | |
| | CP013131.1 | |
| | NC_010673.1 | |
| | NC_008710.1 | |
| | CP005829 | |
| | NC_011244 | |
| | CP005851 | |
| | CP005745 | |
| | CP003426 | |
| | NC_003888.3 | |
| | CP005080.1 | |
| | NC_010572.1 | |
| | NC_020990.1 | |
Fig. 3 Dependence of database preprocessing time on the number of sequences in the database
Results of pipeline testing on real E. coli and K. pneumoniae data. Only samples with results different from expected are shown
| SRA Run AC | Reported ST | Expected ST | Genes with multiple variants at the same position |
|---|---|---|---|
| ERR966604 | 95 | 73 | – |
| SRR2767732 | 16 | 16 | |
| SRR2767734 | 21 | 21 | |
| SRR2970643 | 131 | 131 | |
| SRR2970737 | 131 | 131 | |
| SRR2970742 | 131 | 131 | |
| SRR2970753 | 131 | 131 | |
| SRR2970774 | 131 | 131 | |
| SRR2970775 | 131 | 131 | |
| SRR5973405 | 1164 | 1164 | |
| SRR5973308 | 1180 | 1180 | |
| SRR5973303 | 13 | 13 | |
| SRR5973253 | 133 | 133 | |
| SRR5973334 | 133 | 133 | |
| SRR5973324 | 1373 | 1373 | |
| SRR5973251 | 1426 | 1426 | |
| SRR5973269 | 147 | 147 | |
| SRR5973320 | 1876 | 1876 | |
| SRR5973351 | 188 | 188 | |
| SRR5973329 | 20 | 20 | |
| SRR5973408 | 2267 | 2276 | |
| SRR5973397 | 25 | 25 | |
| SRR5973248 | 258 | 258 | |
| SRR5973283 | 258 | 258 | |
| SRR5973279 | 258 | 258 | |
| SRR5973271 | 258 | 258 | |
| SRR5973336 | 258 | 258 | |
| SRR5973319 | 258 | 258 | |
| SRR5973317 | 258 | 258 | |
| SRR5973294 | 258 | 258 | |
| SRR5973291 | 258 | 258 | |
| SRR5973289 | 258 | 258 | |
| SRR5973400 | 258 | 258 | |
| SRR5973382 | 258 | 258 | |
| SRR5973381 | 258 | 258 | |
| SRR5973287 | 258 | 258 | |
| SRR5973240 | 307 | 307 | |
| SRR597324 | 307 | 307 | |
| SRR5973282 | 307 | 307 | |
| SRR5973280 | 307 | 307 | |
| SRR5973339 | 307 | 307 | |
| SRR5973322 | 307 | 307 | |
| SRR5973288 | 307 | 307 | |
| SRR5973396 | 307 | 307 | |
| SRR5973380 | 307 | 307 | |
| SRR5973379 | 307 | 307 | |
| SRR5973376 | 307 | 307 | |
| SRR5973373 | 307 | 307 | |
| SRR5973361 | 307 | 307 | |
| SRR5973355 | 307 | 307 | |
| SRR5973284 | 23 | 23 | |
| SRR5973332 | 35 | 35 | |
| SRR5973389 | 35 | 35 | |
| SRR5973368 | 35 | 35 | |
| SRR5973393 | 405 | 405 | |
| SRR5973311 | 412 | 412 | |
| SRR5973371 | 429 | 429 | |
| SRR5973327 | 466 | 466 | |
| SRR5973407 | 466 | 466 | |
| SRR5973239 | 492 | 492 | |
| SRR5973301 | 502 | 502 | |
| SRR5973348 | 753 | 753 | |
| SRR5973362 | 8 | 8 | |
Fig. 4 Time required for the analysis of 30 samples belonging to ST131 by two modes of BacTag (a and b), MLST 1.8 (c) and Enterobase (d)
Fig. 5 Comparison of the processing time required for the Achtman seven-gene MLST analysis of 30 WGS E. coli samples