Chase M Clark, Linh Nguyen, Van Cuong Pham, Laura M Sanchez, Brian T Murphy.
Abstract
Libraries of microorganisms have served as a cornerstone of therapeutic drug discovery, though the continued re-isolation of known natural product chemical entities has remained a significant obstacle to discovery efforts. A major contributing factor to this redundancy is the duplication of bacterial taxa in a library, which can be mitigated through the use of a variety of DNA sequencing strategies and/or mass spectrometry-informed bioinformatics platforms, so that the library is created with minimal phylogenetic, and thus minimal natural product, overlap. IDBac is a MALDI-TOF mass spectrometry-based bioinformatics platform used to assess overlap within collections of environmental bacterial isolates. It allows environmental isolate redundancy to be reduced while considering both phylogeny and natural product production. However, manually selecting isolates for addition to a library during this process was time-intensive and left to the researcher's discretion. Here, we developed an algorithm that automates the prioritization of hundreds to thousands of environmental microorganisms in IDBac. The algorithm iteratively reduces natural product mass feature overlap within groups of isolates that share high homology of protein mass features. Employing this automation minimizes human bias and greatly increases efficiency in the microbial strain prioritization process.
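The iterative overlap reduction described in the abstract can be illustrated with a small sketch. This is not IDBac's implementation: it assumes a greedy set-cover heuristic applied to one pseudo-phylogenetic group at a time, with each isolate represented by a set of binned m/z values. The function name `prioritize_group`, the `target_coverage` parameter, and the toy data are all hypothetical.

```python
from typing import Dict, List, Set


def prioritize_group(
    features_by_isolate: Dict[str, Set[int]],
    target_coverage: float = 1.0,
) -> List[str]:
    """Greedily pick isolates from ONE group until the chosen isolates
    capture the requested fraction of the group's mass features.

    Assumption: a greedy set-cover heuristic stands in for the paper's
    algorithm, whose exact selection rule is not given in the abstract.
    """
    # Every distinct mass feature observed anywhere in the group.
    all_features: Set[int] = set().union(*features_by_isolate.values())
    needed = target_coverage * len(all_features)

    covered: Set[int] = set()
    selected: List[str] = []
    remaining = dict(features_by_isolate)

    while remaining and len(covered) < needed:
        # Pick the isolate that adds the most uncaptured features.
        # Iterating in sorted order makes ties resolve by isolate name.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # no remaining isolate contributes anything new
        covered |= gain
        selected.append(best)

    return selected


# Toy group: four isolates, four binned mass features (hypothetical data).
group = {"A": {1, 2, 3}, "B": {2, 3}, "C": {4}, "D": {1, 4}}
chosen = prioritize_group(group)
print(chosen)
```

In this toy group, isolate B adds nothing once A is selected, so it is left out of the library. Running the function once per group and pooling the results would give a library-wide selection under the same assumptions.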
Keywords: IDBac; MALDI; bioinformatics; drug discovery; microorganisms; natural products
Year: 2022 PMID: 35408437 PMCID: PMC9000433 DOI: 10.3390/molecules27072038
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. Dendrogram (a) created from 819 bacterial isolates (2000–15,000 Da; protein spectra run in triplicate for each isolate). An example Metabolite Association Network (MAN) is displayed in (b), where large circles represent bacterial isolates that are connected to smaller circles representing m/z features. Colored circles represent isolates and mass features chosen by the algorithm, discussed further in Figure 2. The pseudo-phylogenetic grouping of the isolates in (b) is depicted by the blue box on the dendrogram.
Figure 2. MAN of pseudo-phylogenetic Group 8, depicting the NP coverage resulting from manual and automatic prioritization processes. Large nodes represent bacterial isolates and small nodes represent mass features. Large blue nodes are isolates selected to be added to the library, while small green circles represent mass features “captured” by the selected isolates. Large grey nodes are isolates not selected, while small orange triangles are mass features “missed” by the selected isolates. Links to the other 82 groups may be found in the Data Availability Statement section.
Figure 3. Venn diagram showing the number of mass features “captured” by the two researchers and the IDBac algorithm; 82% were captured by all three, though the algorithm did so with 47 and 104 fewer isolates than Researchers 1 and 2, respectively. The total isolates recovered from each group: Researcher 1: 236; Researcher 2: 293; the IDBac algorithm: 189.