SUMMARY: Information regarding pathways through voids in biomolecules and their roles in ligand transport is critical to our understanding of the function of many biomolecules. Recently, the advent of high-throughput molecular dynamics simulations has enabled the study of these pathways and of rare transport events. However, the scale and intricacy of the data produced require dedicated tools to conduct analyses efficiently and without excessive demand on users. To fill this gap, we developed TransportTools, which allows the investigation of pathways and their utilization across large simulated datasets. TransportTools also facilitates the development of custom-made analyses. AVAILABILITY AND IMPLEMENTATION: TransportTools is implemented in Python3 and distributed as pip and conda packages. The source code is available at https://github.com/labbit-eu/transport_tools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
At any moment, living systems contain thousands of small organic molecules that need to arrive at their sites of action to exert their function. The transport of these molecules around the cell (and beyond) is governed primarily by channels and tunnels (henceforth referred to as ‘pathways’) formed from the internal voids of biomolecules (Kingsley and Lill, 2015). These pathways enable the transport of ions and small molecules between different regions, connecting inner cavities with a surface, two different cavities with each other, or different cellular environments via transmembrane proteins. Because of this role, the investigation of these pathways is critical to drug discovery (Marques et al.) and protein engineering initiatives (Kokkonen et al.). Since pathways are often equipped with dynamic gates (Gora et al.), they are mostly transient and challenging to study.
One of the most common approaches used to characterize these rare events of ligand transmission via transiently open pathways is to run molecular dynamics (MD) simulations (Decherchi and Cavalli, 2020), analyzing the pathway dynamics using tools like CAVER (Jurcik et al.) or tracking ligand migration through the biomolecules with AQUA-DUCT (Magdziarz et al.); see Supplementary File S1 for an overview of the state-of-the-art tools to study ligand transport pathways. The intensive development seen in computing hardware and sampling algorithms over recent years has led to considerable growth in the size and complexity of datasets typically generated for a single protein system. It is not uncommon for such datasets to consist of thousands of simulations. Such high-throughput approaches, however, impose a substantial burden on researchers, who must establish the identity of the pathways observed across all simulations, determine which pathways are used by particular ligands, and develop means of specific quantitative analyses.
To this end, we present TransportTools: a library designed to alleviate these difficulties by providing easy, efficient access to comprehensive details on transport processes—even for large-scale simulation sets—and offering an environment for the development of novel analyses and tools.
2 Features
TransportTools is a Python3 module distributed under the GNU General Public License v3.0 and available via the pip and conda package managers as the transport_tools package. In its standard workflow (Fig. 1), TransportTools utilizes outputs from CAVER and AQUA-DUCT analyses of MD simulations, integrating their complementary insights to investigate transport pathways and corresponding ligand migration events in soluble and membrane-embedded proteins. To achieve efficiency in such a high-throughput regimen, raw data on pathway ensembles and ligand-transport events are first coarse-grained and positioned on a spherical grid. Next, TransportTools identifies relationships between pathway ensembles from individual simulations and joins them into superclusters, to which ligand-transport events are then assigned (see Supplementary File S2 for method details). Critical analysis parameters can be controlled via a configuration file. These parameters are thoroughly explained in the user guide, which also includes a detailed walk-through tutorial (Supplementary File S3). Aside from the ready-made workflow, the library offers many classes to process, manipulate and analyze pathways and events, simplifying the production of custom-made analyses and, hopefully, stimulating further development of new packages (Supplementary File S4).
Fig. 1. Schematic of a standard TransportTools analysis workflow
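The idea behind the workflow (coarse-graining onto a spherical grid, joining pathways into superclusters and assigning events) can be illustrated with a minimal, self-contained sketch. It is not the TransportTools implementation or API: the function names, the grid resolution, the Jaccard-overlap criterion with single-linkage merging, and the toy coordinates are all assumptions chosen for illustration, and the actual method is described in Supplementary File S2.

```python
import math
from itertools import combinations


def to_cell(point, dr=1.0, n_theta=6, n_phi=12):
    """Map a Cartesian point (grid origin at 0, 0, 0) to a spherical grid cell."""
    x, y, z = point
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(max(-1.0, min(1.0, z / r))) if r > 0 else 0.0  # polar angle
    phi = math.atan2(y, x) % (2 * math.pi)                           # azimuth
    return (
        int(r // dr),                                                # radial shell
        min(int(theta / math.pi * n_theta), n_theta - 1),
        min(int(phi / (2 * math.pi) * n_phi), n_phi - 1),
    )


def coarse_grain(points):
    """Represent a pathway or an event as the set of grid cells it visits."""
    return frozenset(to_cell(p) for p in points)


def jaccard(a, b):
    """Overlap of two cell sets (hypothetical similarity criterion)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_superclusters(pathways, threshold=0.5):
    """Single-linkage merging of pathways whose overlap reaches the threshold."""
    parent = {name: name for name in pathways}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(pathways, 2):
        if jaccard(pathways[a], pathways[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in pathways:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


def assign_event(event_cells, pathways, superclusters):
    """Return the index of the supercluster covering most of the event's cells."""
    if not event_cells:
        return None
    best, best_cover = None, 0.0
    for idx, members in enumerate(superclusters):
        cells = set().union(*(pathways[m] for m in members))
        cover = len(event_cells & cells) / len(event_cells)
        if cover > best_cover:
            best, best_cover = idx, cover
    return best


def ray(direction, radii):
    """Toy pathway: points along one direction at the given distances."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [tuple(r * d / norm for d in direction) for r in radii]


# Toy pathways from three hypothetical simulations A, B and C.
pathways = {
    "A": coarse_grain(ray((1.0, 0.5, 0.5), [0.5, 1.5, 2.5])),
    "B": coarse_grain(ray((1.0, 0.4, 0.5), [0.5, 1.5, 2.5, 3.5])),
    "C": coarse_grain(ray((-1.0, 0.5, -0.5), [0.5, 1.5, 2.5])),
}
superclusters = build_superclusters(pathways)
event = coarse_grain(ray((1.0, 0.45, 0.5), [1.5, 2.5]))
assigned = assign_event(event, pathways, superclusters)
```

In this toy setup the pathways from A and B fall into the same grid cells and are joined, while C remains separate; the event is assigned to the joined supercluster. Working on sets of grid cells rather than raw coordinates is what keeps such comparisons cheap when many simulations are involved.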
Outputs
The main results generated by TransportTools are presented as a set of tables stored in text files. These contain data on the composition of pathway superclusters, on their geometrical properties and utilization by transport events, and on critical protein residues. Using generated scripts, the spatial representation of superclusters and assigned events can be visualized in PyMOL (PyMOL, Schrödinger, 2017). All results can be refined using various filters and split by individual simulation or by user-defined groups to facilitate their convenient comparison.
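As an illustration of splitting results by user-defined groups, the sketch below sums transport events per supercluster for each group and applies a simple filter. The record layout, the simulation names, the group labels and the `min_events` filter are hypothetical and do not reflect the actual TransportTools output format or its filtering options, which are documented in the user guide.

```python
from collections import defaultdict

# Hypothetical records: (simulation, supercluster ID, number of transport events).
records = [
    ("wt_run1", 1, 12), ("wt_run1", 2, 3),
    ("wt_run2", 1, 9),
    ("mut_run1", 1, 2), ("mut_run1", 2, 14),
    ("mut_run2", 2, 11), ("mut_run2", 3, 1),
]

# Hypothetical user-defined groups of simulations.
groups = {
    "wild-type": {"wt_run1", "wt_run2"},
    "mutant": {"mut_run1", "mut_run2"},
}


def events_by_group(records, groups, min_events=0):
    """Sum events per supercluster within each group, dropping rarely used ones."""
    sim_to_group = {sim: name for name, sims in groups.items() for sim in sims}
    totals = defaultdict(lambda: defaultdict(int))
    for sim, supercluster, n_events in records:
        group = sim_to_group.get(sim)
        if group is None:          # simulation not assigned to any group
            continue
        totals[group][supercluster] += n_events
    return {
        group: {sc: n for sc, n in sorted(per_sc.items()) if n >= min_events}
        for group, per_sc in totals.items()
    }


all_events = events_by_group(records, groups)
frequent = events_by_group(records, groups, min_events=3)
```

With these toy numbers, supercluster 1 dominates in the wild-type group and supercluster 2 in the mutant group, which is the kind of contrast that grouped tables are meant to expose.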
Performance and limitations
The performance of TransportTools was analyzed on three datasets of 50 simulations (each sampling 100 ns and consisting of 10 000 frames) of enzymes up to 500 residues long with different accessibilities of their active sites, resulting in the detection of up to 5 000 000 transport pathways and 50 000 water-transport events, which were processed within 2–21 h on a standard workstation (Supplementary File S5). TransportTools inherits the limitations of the CAVER and AQUA-DUCT packages, namely those of their descriptions of pathway geometries and their definitions of clusters (see Supplementary Section S2.2 of Supplementary File S3 for best practice guidelines). When MD trajectories are utilized directly, usage is restricted to file formats supported by either the MDTraj or pytraj packages (McGibbon et al.; Roe and Cheatham, 2013).
Use cases
To illustrate the applicability of TransportTools, we applied it to three representative examples of biological problems connected with ligand transport using an established model system: the enzymes DhaA and LinB from the haloalkane dehalogenase family (Brezovsky et al.; Pavlova et al.). First, we analyzed 10 simulations of DhaA in an effort to discover rare transient tunnels and their usage by water molecules (Supplementary File S6). Next, we examined the effect of mutations on the system by contrasting simulations of LinB wild-type, the LinB32 mutant with a closed primary tunnel, and the LinB86 mutant with a de novo created tunnel (Supplementary File S7). Finally, we studied the substrate molecule selectivity of the pathways leading to the active site of LinB86 in almost 600 simulations (Supplementary File S8).
3 Conclusions
The TransportTools library provides users with access to (i) efficient analyses of transport pathways across extensive MD simulations, including those originating from massively parallel calculations or very long simulations; (ii) integrated data regarding transport pathways and their actual utilization by small molecules; and (iii) rigorous comparisons of transport processes under different settings, e.g. by contrasting transport in an original system against the same system perturbed by mutations, different solvents or bound ligands.
Authors: Robert T McGibbon; Kyle A Beauchamp; Matthew P Harrigan; Christoph Klein; Jason M Swails; Carlos X Hernández; Christian R Schwantes; Lee-Ping Wang; Thomas J Lane; Vijay S Pande Journal: Biophys J Date: 2015-10-20 Impact factor: 4.033
Authors: Sergio M Marques; Lukas Daniel; Tomas Buryska; Zbynek Prokop; Jan Brezovsky; Jiri Damborsky Journal: Med Res Rev Date: 2016-12-13 Impact factor: 12.944
Authors: Tomasz Magdziarz; Karolina Mitusińska; Maria Bzówka; Agata Raczyńska; Agnieszka Stańczak; Michał Banas; Weronika Bagrowska; Artur Góra Journal: Bioinformatics Date: 2020-04-15 Impact factor: 6.937
Authors: Adam Jurcik; David Bednar; Jan Byska; Sergio M Marques; Katarina Furmanova; Lukas Daniel; Piia Kokkonen; Jan Brezovsky; Ondrej Strnad; Jan Stourac; Antonin Pavelka; Martin Manak; Jiri Damborsky; Barbora Kozlikova Journal: Bioinformatics Date: 2018-10-15 Impact factor: 6.937