Xiao Li1,2, Kaichen Zhou1, Jingyu Wang1, Jiahe Guo1, Yang Cao3, Jie Ren1, Tao Guan4, Wenchao Sheng1, Mingyao Zhang1, Zhi Yao1,5, Quan Wang1.
Abstract
Urinary tract infections (UTIs) are among the most common infectious diseases. UTIs are mainly caused by uropathogenic Escherichia coli (UPEC), and are classified as upper or lower according to the infection site. Fimbriae are necessary for UPEC to adhere to the host uroepithelium, and are abundant and diverse in UPEC strains. Although great progress has been made in determining the roles of different types of fimbriae in UPEC colonization, the contributions of multiple fimbriae to site-specific attachment also need to be considered. Therefore, the distribution patterns of 22 fimbrial genes in 90 UPEC strains from patients diagnosed with upper or lower UTIs were analyzed using PCR. The distribution patterns correlated with the infection sites. An XGBoost model with a mean accuracy of 83.33% and a mean area under the receiver operating characteristic (ROC) curve (AUC) of 0.92 demonstrated that fimbrial gene distribution patterns could predict the localization of upper and lower UTIs.
Keywords: UPEC; XGBoost; fimbriae; lower urinary tract infections; machine learning; upper urinary tract infections
Year: 2021 PMID: 34222269 PMCID: PMC8249706 DOI: 10.3389/fmed.2021.602691
Source DB: PubMed Journal: Front Med (Lausanne) ISSN: 2296-858X
Primers for PCR.
| Fimbriae | Primer | Sequence | Temperature (°C) | Product size (bp) | Reference |
|---|---|---|---|---|---|
| CS1-like | Forward | GCTTGTACAACCGACAACA | 51 | 755 | a |
| | Reverse | CTCTGTTCATCCTGTTCAGA | | | |
| Mat | Forward | ATGGACAGTTACGCATCC | 50 | 745 | a |
| | Reverse | TCCACATCGTAAATACCGTA | | | |
| Type1 | Forward | ATGCCGCAGGTAATAGTG | 50 | 680 | a |
| | Reverse | GAATTGCTCATCGACATTAC | | | |
| F1C/S | Forward | CGATTGTACCTGACCGTTCCT | 59 | 654 | This study |
| | Reverse | CAGATGCCCTTCACGTTGC | | | |
| F9 | Forward | CGACACTTGCAGATGACAC | 51 | 536 | a |
| | Reverse | TGACATACTGTAACTGGCGT | | | |
| Ycb | Forward | GTTGAGATAACGCCAGAGA | 51 | 727 | a |
| | Reverse | CACTCGACGACGTAGAGTAG | | | |
| Auf | Forward | CTTTCGGTAACTACGGGTCT | 54 | 838 | This study |
| | Reverse | CTGGCTGTAGCACCGAAT | | | |
| Sfm | Forward | ATTAGAGAATGGCACATCC | 54 | 862 | a |
| | Reverse | ATCGCCATTTGAAGATGT | | | |
| LPF | Forward | AATAGTTACGCCACCTATTC | 49 | 550 | a |
| | Reverse | TGAAGAGTACGCGATAGC | | | |
| ECSF-0165 | Forward | CTCCGTGAGTTCGGTCTT | 52 | 813 | a |
| | Reverse | AACAGGTGTCTCAGCATGAT | | | |
| ECSF-4008 | Forward | CTGATGGTGATAATGCCA | 53 | 1,008 | a |
| | Reverse | ACTGAGGCTCAGACACACTA | | | |
| CS12 | Forward | ATGTCTCGCGTCAATGTC | 54 | 730 | a |
| | Reverse | CAGCATCGTAATAGTGTTCA | | | |
| AFA | Forward | GTACCTGAAGTACAACGTCAC | 53 | 543 | a |
| | Reverse | CAGGACGTACTGTATGACG | | | |
| Yeh | Forward | CAGGTCGTAGCCATATTGA | 52 | 607 | a |
| | Reverse | TGATTCTCGTCATAAGCATG | | | |
| Yeh-like | Forward | CTGCCTAAGGTGCTACTAAC | 55 | 688 | a |
| | Reverse | TGCTGACATCGAGATCAGA | | | |
| F17-like | Forward | GTCATGGTAACCCTGTGC | 51 | 529 | a |
| | Reverse | GCAAGGTCATGCATTATACT | | | |
| Yfc | Forward | TCGCAACATGAGCATCTC | 53 | 667 | a |
| | Reverse | GTAGCTACCGTCACGCAA | | | |
| P | Forward | CCACCCAGACTGCGAGGCTAT | 64 | 546 | This study |
| | Reverse | GTCGGCATCCGCATTATCAAA | | | |
| Pix | Forward | GCTGTACACCGTCACACTC | 53 | 812 | a |
| | Reverse | TATCAGACATCCGCAACA | | | |
| Yad | Forward | AGCCATGCTTTCCTACAACC | 56 | 564 | This study |
| | Reverse | ATATCCCAGCGACCAACG | | | |
| Yqi | Forward | CCGCAACATCTCCTACAG | 52 | 757 | a |
| | Reverse | CGCGCTTTCACTAATGTT | | | |
| Ybg | Forward | ACCAAATCAGTAACGGACA | 51 | 451 | a |
| | Reverse | CCTGACTGTTCATGGTTATC | | | |
The PCR primers are based on the sequences of usher protein-encoding genes.
Figure 1. Workflow diagram.
Figure 2. Cluster dendrogram of the distribution patterns of 22 fimbrial genes in 90 strains of uropathogenic Escherichia coli (UPEC). Strains from the upper urinary tract infection (UTI) group are shown in red.
Figure 3. Receiver operating characteristic (ROC) curves produced from 22 fimbrial genes using an XGBoost classifier. (A) Five-fold cross-validation used to evaluate classifier performance. The ROC curve and area under the curve (AUC) for each fold are shown in different colors, and the mean ROC curve and AUC are shown in black. (B) Permutation test (n = 10,000) used to assess the statistical significance of the model; the orange dotted line marks the final mean accuracy of the model.
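The evaluation scheme named in the Figure 3 caption (five-fold cross-validation followed by a label-permutation test) can be sketched in plain Python. This is only an illustration under stated assumptions: a simple nearest-centroid rule stands in for the XGBoost classifier, the function names are hypothetical, and the default of 200 permutations is chosen for speed rather than the 10,000 reported in the caption.

```python
import random

def k_fold_indices(n, k, rng):
    """Shuffle sample indices and deal them into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (NOT XGBoost): pick the class with the closest mean pattern."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def cv_accuracy(X, y, folds):
    """Mean accuracy over the folds, each fold held out once."""
    fold_acc = []
    for test_idx in folds:
        held = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(y)) if i not in held]
        correct = sum(
            nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
            for i in test_idx
        )
        fold_acc.append(correct / len(test_idx))
    return sum(fold_acc) / len(fold_acc)

def permutation_p_value(X, y, k=5, n_perm=200, seed=0):
    """Compare the observed CV accuracy with accuracies obtained on shuffled labels."""
    rng = random.Random(seed)
    folds = k_fold_indices(len(X), k, rng)
    observed = cv_accuracy(X, y, folds)
    hits = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)  # break any link between patterns and labels
        if cv_accuracy(X, shuffled, folds) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)
```

Here `X` would be a list of 0/1 gene-presence rows (one per strain) and `y` the matching list of site labels. A small p-value means that shuffled labels rarely reach the observed accuracy, which is the comparison the orange dotted line in panel B visualizes.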
Number of strains positive or negative for each fimbrial gene in the two UTI groups, with the P-value for the association between the gene and the infection site.
| Fimbriae | PCR result | Strains (n = 31) | Strains (n = 59) | P-value |
|---|---|---|---|---|
| P | Positive | 12 | 12 | 0.061 |
| | Negative | 19 | 47 | |
| Auf | Positive | 8 | 3 | 0.007 |
| | Negative | 23 | 56 | |
| F1C/S | Positive | 3 | 2 | 0.335 |
| | Negative | 28 | 57 | |
| Yad | Positive | 8 | 3 | 0.007 |
| | Negative | 23 | 56 | |
| CS1-like | Positive | 3 | 21 | 0.008 |
| | Negative | 28 | 38 | |
| Mat | Positive | 31 | 55 | 0.294 |
| | Negative | 0 | 4 | |
| Type1 | Positive | 30 | 49 | 0.089 |
| | Negative | 1 | 10 | |
| F9 | Positive | 30 | 37 | <0.001 |
| | Negative | 1 | 22 | |
| Ycb | Positive | 5 | 40 | <0.001 |
| | Negative | 26 | 19 | |
| Sfm | Positive | 6 | 40 | <0.001 |
| | Negative | 25 | 19 | |
| LPF | Positive | 4 | 13 | 0.153 |
| | Negative | 27 | 46 | |
| ECSF-0165 | Positive | 17 | 12 | 0.001 |
| | Negative | 14 | 47 | |
| ECSF-4008 | Positive | 11 | 7 | 0.008 |
| | Negative | 20 | 52 | |
| CS12 | Positive | 0 | 0 | |
| | Negative | 31 | 59 | |
| AFA | Positive | 3 | 13 | 0.145 |
| | Negative | 28 | 46 | |
| Yeh | Positive | 28 | 57 | 0.335 |
| | Negative | 3 | 2 | |
| Yeh-like | Positive | 5 | 30 | 0.001 |
| | Negative | 26 | 29 | |
| F17-like | Positive | 9 | 2 | 0.001 |
| | Negative | 22 | 57 | |
| Yfc | Positive | 24 | 21 | <0.001 |
| | Negative | 7 | 38 | |
| Pix | Positive | 2 | 3 | 1.000 |
| | Negative | 29 | 56 | |
| Yqi | Positive | 17 | 12 | 0.001 |
| | Negative | 14 | 47 | |
| Ybg | Positive | 4 | 40 | <0.001 |
| | Negative | 27 | 19 | |
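Each gene in the table above contributes one 2×2 table of positive and negative counts in the two groups. The record does not state which test produced the P-values, so the sketch below uses a two-sided Fisher's exact test purely as an assumption; it is a common choice when some cell counts are small. The function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, b: positive strains in the first and second group
    c, d: negative strains in the first and second group
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(k):
        # Hypergeometric probability of k positives falling in the first group
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of every table no more likely than the observed one
    p = sum(
        prob(k)
        for k in range(k_min, k_max + 1)
        if prob(k) <= p_obs * (1 + 1e-9)
    )
    return min(p, 1.0)

# Counts taken from the Auf and Pix rows of the table
auf_p = fisher_exact_two_sided(8, 3, 23, 56)
pix_p = fisher_exact_two_sided(2, 3, 29, 56)
print(f"Auf: p = {auf_p:.3f}, Pix: p = {pix_p:.3f}")
```

For the Auf and Pix counts this calculation lands close to the tabulated values. Rows with larger counts may have been computed with a different test, so agreement should not be expected for every gene.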
Figure 4. Black-and-white grid of the distribution patterns of 22 fimbrial genes in 90 strains. Black cells mark genes that tested positive, and white cells mark genes that tested negative. The 22 fimbrial genes are in columns, and the 90 strains are in rows. The tree on the right shows the clustering of the 90 strains; strains from the upper urinary tract infection group fall mostly within the red box.
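The grid in Figure 4 is a binary presence/absence matrix, and the tree groups strains with similar rows. The record does not say which distance or linkage was used, so the following sketch assumes Hamming distance and average linkage; the function names and the four-strain toy matrix are hypothetical.

```python
def hamming(a, b):
    """Number of genes whose presence/absence differs between two strains."""
    return sum(x != y for x, y in zip(a, b))

def average_linkage(patterns, n_clusters=2):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(patterns))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters
                total = sum(
                    hamming(patterns[a], patterns[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Toy matrix: rows are strains, columns are genes (1 = PCR positive)
toy = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
print(average_linkage(toy, n_clusters=2))
```

On the toy matrix the first two strains end up in one cluster and the last two in the other, which mirrors how strains with similar fimbrial gene patterns would sit on neighboring branches of the tree.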
Figure 5. Receiver operating characteristic (ROC) curves produced from nine fimbrial genes after feature selection using XGBoost. (A) Five-fold cross-validation used to evaluate classifier performance. The ROC curve and area under the curve (AUC) for each fold are shown in different colors, and the mean ROC curve and AUC are shown in black. (B) Permutation test (n = 10,000); the orange dotted line marks the final mean accuracy of the model.
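The AUC reported for each fold can be read as the probability that a randomly chosen positive strain receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows, with a mean taken across folds; the function names and the example scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_auc(folds):
    """Average the per-fold AUCs; each fold is a (labels, scores) pair."""
    values = [roc_auc(labels, scores) for labels, scores in folds]
    return sum(values) / len(values)

# Hypothetical held-out predictions for two folds
fold_a = ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
fold_b = ([1, 0, 1, 0], [0.8, 0.3, 0.7, 0.2])
print(roc_auc(*fold_a), mean_auc([fold_a, fold_b]))
```

In `fold_a`, three of the four positive/negative pairs are ordered correctly, giving 0.75, while `fold_b` is perfectly ordered, so the mean is 0.875. Averaging per-fold AUCs is one of several conventions; the black curve in panel A may have been derived differently.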