Literature DB >> 35242287

Microtubule assembly and disassembly dynamics model: Exploring dynamic instability and identifying features of Microtubules' Growth, Catastrophe, Shortening, and Rescue.

Evgenii Kliuchnikov1, Eugene Klyshko1, Maria S Kelly2, Artem Zhmurov3, Ruxandra I Dima2, Kenneth A Marx1, Valeri Barsegov1.   

Abstract

Microtubules (MTs), a cellular structure element, exhibit dynamic instability and can switch stochastically from growth to shortening; but the factors that trigger these processes at the molecular level are not understood. We developed a 3D Microtubule Assembly and Disassembly DYnamics (MADDY) model, based upon a bead-per-monomer representation of the αβ-tubulin dimers forming an MT lattice, stabilized by the lateral and longitudinal interactions between tubulin subunits. The model was parameterized against the experimental rates of MT growth and shortening, and pushing forces on the Dam1 protein complex due to protofilaments splaying out. Using the MADDY model, we carried out GPU-accelerated Langevin simulations to access dynamic instability behavior. By applying Machine Learning techniques, we identified the MT characteristics that distinguish simultaneously all four kinetic states: growth, catastrophe, shortening, and rescue. At the cellular 25 μM tubulin concentration, the most important quantities are the MT length L , average longitudinal curvature κ long , MT tip width w , total energy of longitudinal interactions in MT lattice U long , and the energies of longitudinal and lateral interactions required to complete MT to full cylinder U long add and U lat add . At high 250 μM tubulin concentration, the most important characteristics are L , κ long , number of hydrolyzed αβ-tubulin dimers n hyd and number of lateral interactions per helical pitch n lat in MT lattice, energy of lateral interactions in MT lattice U lat , and energy of longitudinal interactions in MT tip u long . These results allow greater insights into what brings about kinetic state stability and the transitions between states involved in MT dynamic instability behavior.
© 2022 The Author(s).

Entities:  

Year:  2022        PMID: 35242287      PMCID: PMC8861655          DOI: 10.1016/j.csbj.2022.01.028

Source DB:  PubMed          Journal:  Comput Struct Biotechnol J        ISSN: 2001-0370            Impact factor:   7.271


Introduction

Microtubules (MTs) are essential for many fundamental processes involved in the organization and dynamics of substructures necessary for the health and viability of different eukaryotic cell types [1]. MTs are long, micrometer sized, hollow cylinders consisting of a lattice that can be formed from 9 to 16 laterally aggregating protofilaments [2]. Each protofilament strand is comprised of linearly arrayed αβ-tubulin dimers, interacting noncovalently via so-called longitudinal interactions. Within non-dividing cells, MTs help form the cytoskeleton, a multi-protein poly-filament interlinked superstructure comprised of MTs, intermediate filaments and actin fibers, that helps to organize and dynamically alter cells in response to their environment [3]. The MTs interact with many types of accessory proteins and microtubule associated proteins (MAPs), that can altering the structural and dynamic properties of the MT [4]. MTs possessing a different number of laterally aggregated protofilaments have been observed in different cell types and in vitro [5]. However, the 13 protofilament MTs appear to be dominant, representing the presumptive major canonical biological structure [2], [6], [7]. Another important MT function within most cell types involves the transport of molecular cargo along tracks on its surface by a variety of ATP dependent motor proteins [8]. For example, in neural origin cells, the prominent tau protein binds multiple αβ-tubulin dimers along a protofilament [9] that helps: to greatly stabilize MTs, to the point of promoting and stabilizing lattice defects, to promote MTs’ lateral aggregation behavior [10], and to provide long neural processes, traversed by such bundled MTs in association with specialized proteins ability to transport molecular cargo over considerable distances (tens of micrometers) [11]. The ability of MTs to undergo stochastic cycles of growth and shortening, a property called dynamic instability, is critically important in many cellular processes [12], [13], [14], [15], [16]. A prime example is cell mitosis, when dividing cells must accurately segregate their duplicated chromosomes into two identical daughter cells prior to the cell dividing. MTs are principal components of the spindle apparatus that organize this process. When the chromosomes are correctly organized and positioned, depolymerizing MTs exert pulling forces on chromosomes [17] to bring about accurate, error free, segregation of the identical duplicated sister chromatids to the daughter cells [18]. Depolymerizing MTs act as a biological motor creating poleward chromosome motion from the hydrolysis of tubulin-bound GTP to GDP that drives destabilization and depolymerization of the MT filaments [19], [20]. GTP hydrolysis results in formation of the GDP-bound bent metastable state of the αβ dimer, as opposed to the GTP-bound straight stable conformation [18]. Not surprisingly then, MT polymerization is stimulated by the stretching of cells [21]. The αβ-tubulin dimer subunit structure has been the target of drugs (colchicine, vinblastine, taxanes, epothilones) designed to bind specific sites on these subunits, leading to arrest of rapid cell growth in cancer cells by abolishing the MTs’ ability to segregate chromosomes [22]. During MT assembly, tubulin dimers possessing the GTP-bound β-tubulin subunits are added to protofilaments at the tip of a growing MT with a slight bend at the inter-dimer region. Due to GTP hydrolysis, protofilament bound dimers can transform into the more bent GDP bound dimer conformation. In the disassembly or shortening phase, weaker lateral tubulin bonds, responsible for protofilaments’ lateral aggregation, dissociate prior to the stronger longitudinal bonds, thereby causing a splaying out of individual protofilaments at the MT end away from the cylindrical axis direction. This creates the so-called “rams’ horns” structures, which have been observed in many high resolution electron microscopy and cryotomography studies [23], [24], [25], [26]. A number of investigators have measured the kinetics of MT assembly and attempted to discover the rationale for the dynamic instability behavior of MTs (continual transitions from assembly to disassembly and back to assembly), one extreme aspect of which is the so-called catastrophe behavior of MTs, namely the sudden transition from assembly to a significant disassembly event [27], [28], [29]. It has been hypothesized that the presence of the GTP cap at the growing MT tip stabilizes the MT structure [13], [30]. Due to the stochastic nature of the assembly process, occasionally the tip is depleted of its GTP cap via hydrolysis to GDP, and then the catastrophe event occurs via a mechanism that is unclear [31], [32]. MTs’ tip conformation and its role in the catastrophe phenomenon remains unclear; thus, they represent important targets for the present and future studies. One view is that the MT grows as an opened cylinder of PFs, which subsequently closes into a tube [33], [34]. However, most EM structure studies of the growing MT tip suggest that it is only represented by a closed tapered structure comprised of varying length protofilament extensions. MTs are critically important structures within cells, in which the interplay between the biomechanics of MT protofilaments and biochemistry of the αβ-tubulin heterodimers defines the MT properties; yet, the mechano-chemical aspects of MT dynamics are not well understood. While considerable experimental efforts have been expended to elucidate the mechanisms of MT catastrophe [29], almost nothing is known about the specific factors that trigger the onset of MT rescue, in part because a high-resolution dynamic view of MTs undergoing growth and shortening is not available experimentally. This, in addition to the complex multi-protofilament structure of MTs, makes it difficult to establish a direct correspondence between the microscopic properties of MTs and MT properties observed experimentally. These experimental limitations necessitate the use of high-resolution computer-based modeling to provide mechanistic insights. Several mechano-chemical properties of the 13 protofilament MT have been explored in a recent in silico study of MT indentation [35]. This has provided a high resolution structural and energetics view of the deformation process and has determined the tubulin-tubulin lateral and longitudinal bond strengths between adjacent protofilaments and within a single MT protofilament, respectively. Due to the complexity of MTs, their large system size (∼1,000 tubulin monomers for the purpose of modeling) and the long timescales involved (from seconds to minutes), in silico modeling of MT dynamics is challenging. Different kinetic and mechanical models of MT assembly and disassembly have been reported in the literature [25], [28], [36], [37], [38], [39]. These models vary in the types of representation of molecular processes, and the extent to which they agree with experiments [14]. Here, we briefly describe several of the most elaborate models [28], [40], [41]. Michaels et al. have developed a model of MT dynamics combining the MT lattice deformation mechanics with the kinetic description of MT growth and tubulin hydrolysis. The authors designed a simulation pipeline based on 3D geometry of MTs, which combines the MT wall deformation mechanics, the kinetics of MT growth and tubulin hydrolysis, and quenched disorder in the longitudinal and lateral mechanical parameters. The results of these studies suggest a possible rescue mechanism via random GTP-islands, i.e. patches of remaining GTP-bound tubulins within the MT wall [40]. Gudimchuk et al. employed a Monte Carlo algorithm to model the kinetics of MT growth, GTP hydrolysis and Brownian dynamics of MT depolymerization, where the motion of each of 13 MT protofilament is constrained to a 2D plane. Gudimchuk et al. explored the effect of activation barriers on MT dynamics, the MT growth dependence on assisting force, and studied mechanisms of generation of pushing force by growing MTs and pulling by shortening MTs [41]. Despite these recent advances in theoretical modeling of MTs, a molecularly detailed, physically realistic, thermodynamically accurate, and complete 3D molecular representation of MT dynamics has not yet been explored. There is a critical need for physically rigorous computational models capable of describing both MT polymerization and depolymerization, as well as transitions from growth to shortening (catastrophe), and from shortening to growth (rescue). Here, we developed the Microtubule Assembly and Disassembly DYnamics (MADDY) model for Langevin dynamics simulations of the canonical 13 PF MT’s 3D assembly and disassembly in biologically relevant times (seconds). The simulated kinetics of these processes compare well with the experimental values for MT rates of assembly and disassembly [33], [42], [43]. Stalling forces simulated on the Dam 1 ring agree with the experimental force distributions [44], [45]. The extent of curving of protofilaments at the + end of MTs to form “rams’ horns” structures correlate with experimental data [46], [47], and MT tip shape and length agree with experimental EM images [47], [48]. To help understand the nature and origins of dynamic instability exhibited by MTs at the αβ-tubulin-dimer level of molecular detail, we used the MADDY model to carry out very long simulations (4–5 s) accelerated on Graphics Processing Units (GPU). We observed different kinetic states in the dynamic instability profiles of the MTs, i.e. MT growth, shortening, catastrophe and rescue. Next, we employed a range of Machine Learning techniques, including the Support Vector Machines, Logistic Regression, Random Forest and XGBoost, to identify the molecular parameters that can effectively discriminate between these different kinetic states based upon 14 different MT characteristics (see Results) quantitated for structures within the individual kinetic states. For the experimentally relevant and intracellular concentration of soluble -tubulin heterodimers, the most important features are: the MT length, MT tip width, the total energies of lateral and longitudinal interactions in MT lattice, and the energies of lateral and longitudinal interactions required to complete the MT to a full cylinder. Further refining the importance of these identified MT parameters to characterize the specific kinetic states occurring in dynamic instability of MTs can help guide new experiments to prove these parameters’ importance in defining the molecular causes of state stability and of transitions between specific states.

Results

Microtubule assembly and disassembly DYnamics (MADDY) model

In the molecularly detailed MADDY model, an MT structure is represented by a cylindrical array of tubulin heterodimers arranged into a 13_3 lattice. Each α- and β-tubulin monomer is modeled by a spherical bead with radius 2 nm. Position and orientation of each monomer in the MT lattice is fully defined by six degrees of freedom: three translational for the coordinates of the subunit’s center of mass {, , } and three rotational for the orientation angles {, , }. These angles correspond to rotations around the MT cylinder axis , and the radial axes and , respectively (Fig. 1A). Each α- and β-tubulin monomer has four non-covalent interaction sites, enabling them to form bonding interactions with their next-neighbor subunits, which are described by the bond potential (first term in Eq. (1) in Materials and methods). Each monomer can form two longitudinal (intra-protofilament) non-covalent bonds and two lateral (inter-protofilament) non-covalent bonds (Fig. 1A). These are described by the longitudinal and lateral Morse potentials and , respectively (Fig. 1B), expressed in terms of the binding strength and and interaction range and (Eq. (2)). In addition, there is a non-covalent bond between the α- and β-tubulin monomers within the αβ-tubulin dimer described by the harmonic potential (the inset in Fig. 1B). The total potential energy for non-covalent bonding interactions between the - and -tubulin monomers within the protofilament and between the next-neighbor protofilaments, and within the -tubulin dimer is, therefore, the sum of three potentials , and (Eq. (2)). The MADDY model also takes into account protofilament bending described by the potential energy (second term in Eq. (1); see also Eq. (3)), which includes the bending potential for monomer rotation around the longitudinal -axis, and bending potentials and for monomer rotation around the - and -axes, respectively (Fig. 1A). describes protofilaments’ splaying outward resulting in formation of “ram’s horns” structures (Fig. 1D). and account for monomers’ rotation around the other orthogonal directions (the inset to Fig. 1C). All three potentials are expressed in terms of the trigonometric cosine function with the bending rigidities , and and equilibrium bending angles , , and (Eq. (3)), which for small angle differences, , , and reduce to the harmonic potentials , , and (Fig. 1C, D). To reflect the difference in the equilibrium bending angles for the GTP-bound vs GDP-bound αβ-tubulin dimer, we set  = 0.1 rad and 0.2 rad for the respective GTP-bound and GDP-bound tubulins (Fig. 1D). The MADDY model also accounts for the excluded volume interactions between the tubulin monomers (Fig. 1C) using the repulsive Lennard-Jones potential, , which depends on the strength of repulsion and the range of repulsion (third term in Eq. (1); see also Eq. (4)). The dynamics of MT assembly-disassembly is generated using the Langevin dynamics in the overdamped (high-friction) limit. This amounts to solving numerically 6 Langevin equations of motion for the coordinates {, , } (see Eq. (5)) and angles {, , } (see Eq. (6)) for all particle 1, 2, …, (α- or β-tubulin monomers). To speed up the simulations, the Langevin dynamics are implemented on a Graphics Processor Unit (GPU) using the numerical algorithms developed in our previous studies [49], [50], [51].
Fig. 1

MADDY model: A) The 13 protofilament helical microtubule with splayed protofilament ends forming “ram’s horns” structures is shown along with a representation of the α- and β-tubulin monomers displayed as rigid spheres with 3 coordinates, , , , and 3 angles, , , . The α- and β-tubulin monomers possess 4 interaction sites: 2 lateral (shown in green) and 2 longitudinal (yellow). The potential energy terms are described in Materials and methods. B) Longitudinal and lateral energies and vs distance between adjacent interaction sites: (intra-protofilament bond) and (inter-protofilament bond), respectively (Eq. (2)). and are described by the bond-energy scale and and interaction-length scale and to account for the interactions between monomers from different tubulin dimers (Eq. (2)). The inset shows the potential vs distance between longitudinal interaction sites in the same dimer (Eq. (2)). C) Repulsive Lennard-Jones potential vs distance between monomers’ centers of mass (see Eq. (4)). The inset shows bending potentials and vs angles and (Eq. (3)). D) Potential energy functions for protofilament outward bending for the GTP- and GDP-bound tubulin dimers vs bending angle between tubulin dimers (Eq. (3)). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

MADDY model: A) The 13 protofilament helical microtubule with splayed protofilament ends forming “ram’s horns” structures is shown along with a representation of the α- and β-tubulin monomers displayed as rigid spheres with 3 coordinates, , , , and 3 angles, , , . The α- and β-tubulin monomers possess 4 interaction sites: 2 lateral (shown in green) and 2 longitudinal (yellow). The potential energy terms are described in Materials and methods. B) Longitudinal and lateral energies and vs distance between adjacent interaction sites: (intra-protofilament bond) and (inter-protofilament bond), respectively (Eq. (2)). and are described by the bond-energy scale and and interaction-length scale and to account for the interactions between monomers from different tubulin dimers (Eq. (2)). The inset shows the potential vs distance between longitudinal interaction sites in the same dimer (Eq. (2)). C) Repulsive Lennard-Jones potential vs distance between monomers’ centers of mass (see Eq. (4)). The inset shows bending potentials and vs angles and (Eq. (3)). D) Potential energy functions for protofilament outward bending for the GTP- and GDP-bound tubulin dimers vs bending angle between tubulin dimers (Eq. (3)). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

MADDY force field parameterization:

The MADDY force-field has 11 parameters described in Materials and methods, which must first be determined before running Langevin simulations. To achieve this goal, we varied the parameters of the MADDY force-field, using the procedure described below, until we found the best match between the experimental and simulation results. We assumed that the equilibrium conformations of the GTP- and GDP-bound tubulin dimer is nearly straight and slightly bent with the equilibrium bending angle 0.1 rad and 0.2 rad, respectively (Table 1, Fig. 1D). The choice of 0.2 rad is consistent with the configuration of the “ram’s horns” structures at the end of a depolymerizing MT [46], [48]. The value of 0.1 rad for the GTP-bound tubulin comes from the analysis of experimental GMPCPP-containing tubulin structures [48], [52]. Rotations of monomers around - and -angles are of small amplitude, and so we set 0 and 0 [28] which agrees with electron crystallography [53] and Cryo-EM data [54]. The initial estimate for bending rigidity range  = 1.2103–1.7103 pNnm/rad2 was based upon our previous work on MT indentation in silico [35]. Both for GTP- and GDP- bound tubulin dimers, we searched for the best value of in the 1.4103–2.1103 pNnm/rad2 range using a ∼140-pNnm/rad2 step. Because the other bending rigidities , are not known, we varied the values of these parameters in the 7.0102–7.0105-pNnm/rad2 range. In our in silico indentation studies [35], the longitudinal bonds were found to be stronger than the lateral bonds, and the binding energies come to 14.9 ± 1.5 kcal/mol (longitudinal bond) versus 6.9 ± 0.4 kcal/mol (lateral bond). On the other hand, the tubulin-tubulin bond energies are unlikely to exceed the 20-kcal/mol value known for the biotin-streptavidin bond, which is considered to be the strongest non-covalent bond occurring in biological systems [55]. Therefore, we varied and in the 10–17-kcal/mol range with a 1 kcal/mole step and in the 5–9-kcal/mol range with a 0.5-kcal/mol step, respectively. These parameters define the depth of potential energy wells for the longitudinal and lateral binding energies. The ranges of longitudinal and lateral interactions are given by and . The typical protein–protein interaction range, over which the strength of interaction becomes negligible, does not exceed 2 nm [26], [56]. Therefore, we spanned the 0.5–2.0-nm interaction range using a 0.5-nm step. We set 2.1103 pN/nm for interaction between the α- and β-tubulin monomers in the same heterodimer [28].
Table 1

Optimized parameters of MADDY force field: the strength of longitudinal and lateral non-covalent bonds between the tubulin monomers and ; characteristic length scale for longitudinal and lateral interactions and ; flexural rigidities of protofilament bending around the -axis (rotation angle ), -axis (angle ), and -axis (angle ); equilibrium angles for rotation around the -axis for GTP- and GDP-bound tubulin dimers and ; equilibrium angles for rotation around the - and -axes, respectively, and ; and spring constant for non-covalent bond between the - and -tubulins in the -tubulin heterodimer (see Materials and methods). The values of and used in the simulations of MT shortening and MT growth are separated by a slash.

Dlong, kcal/molDlat, kcal/molalong, nm−1alat, nm−1Bθ, pN·nm/rad2Bψ, pN·nm/rad2Bφ, pN·nm/rad2θ0GTP, radθ0GDP, radψ0, radφ0, radK, pN/nm
12.0/16.56.7/8.53.04.01.7×1035.6×1055.6×1050.10.2002.1×103
Optimized parameters of MADDY force field: the strength of longitudinal and lateral non-covalent bonds between the tubulin monomers and ; characteristic length scale for longitudinal and lateral interactions and ; flexural rigidities of protofilament bending around the -axis (rotation angle ), -axis (angle ), and -axis (angle ); equilibrium angles for rotation around the -axis for GTP- and GDP-bound tubulin dimers and ; equilibrium angles for rotation around the - and -axes, respectively, and ; and spring constant for non-covalent bond between the - and -tubulins in the -tubulin heterodimer (see Materials and methods). The values of and used in the simulations of MT shortening and MT growth are separated by a slash. The entire parameter space consists of 6×10×10×8×9×4×4 = 691,200 combinations of all possible values of all 11 parameters of the MADDY force field. However, after running test simulations of MT disassembly for the first few hundred sets, we gained intuition on how to correctly tune these parameters, which helped us reduce the number of test simulations from a few hundred thousand to a few thousand tests. For each set, we carried out 10 simulation runs of MT disassembly to attain an acceptable level of statistical significance (Materials and methods). Our first goal was to obtain a reasonable parameterization of the MADDY model that would result in agreement with the experimental 417 nm/s (25 μm/min) rate of MT disassembly [14]. The length–time profiles for a shortening MT fragment are displayed in Fig. 2A (see also Movie S1). The best agreement with the experimental MT disassembly rate was obtained for the longitudinal and lateral bond energies 12.0 kcal/mol and 6.7 kcal/mol, respectively (Table 1). Curling protofilaments almost always form singlets [33], [46], [57], [58]. We used this fact to set the lower and upper bounds for and : 4.9105-6.3105 pNnm/rad2. Our next goal was to further improve the MADDY model parameterization to generate the dynamics of MT assembly (Materials and methods) with the experimental 17–67 nm/s (1–4 μm/min) rate for the 7–15.5 μM tubulin concentration range (the inset in Fig. 2B) [14]. The length–time profiles for a growing MT are displayed in Fig. 2B (see also Movies S2 and S3). The best agreement with the experimental MT growth rate was obtained for the longitudinal and lateral bond energies, 16.5 kcal/mol and 8.5 kcal/mol, respectively (Table 1). The optimized parameters of MADDY force field are displayed in Table 1, which lists the final values of , , , , , , , , , , and .
Fig. 2

Dynamics of MT assembly and disassembly: A) MT length vs time plots for MT shortening (see Table 1 for force-field parameters). Each curve represents the results from a MADDY based simulation run. Snapshots of an MT fragment for the black curve are from four different frames, corresponding to the time points indicated by the circled numbers 1–4. The average disassembly rate is ∼410 nm/s (∼24.6 μm/min; 10 runs; see Movie S1). B) MT length vs time plots for MT growth for 15, 20 and 25 μM concentration of soluble tubulin ( 3 simulation runs for each tubulin concentration). The top inset shows a comparison of the experimental and theoretical rates of MT growth (black squared) for the 15–30-μM range of soluble tubulin concentration. The average growth rate for 15 μM is 20 nm/s (1.2 μm/min); for 20 μM is 30 nm/s (1.8 μm/min); for 25 μM is 35 nm/s (2.1 μm/min); and for 30 μM is 40 nm/s (2.4 μm/min); 10 runs for each concentration; see Movies S2 and S3). The experimental data points are taken from Refs. [33], [42], [43]. The bottom inset shows ∼1-s long dynamics of MT self-assembly from 3 simulation runs for 25-μM tubulin concentration with transient stoppage and acceleration of growth.

Dynamics of MT assembly and disassembly: A) MT length vs time plots for MT shortening (see Table 1 for force-field parameters). Each curve represents the results from a MADDY based simulation run. Snapshots of an MT fragment for the black curve are from four different frames, corresponding to the time points indicated by the circled numbers 1–4. The average disassembly rate is ∼410 nm/s (∼24.6 μm/min; 10 runs; see Movie S1). B) MT length vs time plots for MT growth for 15, 20 and 25 μM concentration of soluble tubulin ( 3 simulation runs for each tubulin concentration). The top inset shows a comparison of the experimental and theoretical rates of MT growth (black squared) for the 15–30-μM range of soluble tubulin concentration. The average growth rate for 15 μM is 20 nm/s (1.2 μm/min); for 20 μM is 30 nm/s (1.8 μm/min); for 25 μM is 35 nm/s (2.1 μm/min); and for 30 μM is 40 nm/s (2.4 μm/min); 10 runs for each concentration; see Movies S2 and S3). The experimental data points are taken from Refs. [33], [42], [43]. The bottom inset shows ∼1-s long dynamics of MT self-assembly from 3 simulation runs for 25-μM tubulin concentration with transient stoppage and acceleration of growth.

MADDY model validation:

We explored the statistics of MT protofilaments’ length and curvature by analyzing the structure of an MT tip from 20 simulation runs (a total of 20 s of biological time) of disassembling MT fragments (Materials and methods). First, we analyzed the statistics of MT protofilaments’ length. The protofilaments were aligned along the MT axis (-axis; Fig. 1A) such that they bend outward starting from the same point on the -axis (∼30 nm on the -axis). The histogram of lengths of splayed MT protofilaments is directly compared with the experimental histogram of the same quantity from in vitro data reported in Refs. [46], [47] and is presented in Fig. 3A. We see that the experimental data and theoretical results for MT protofilament lengths agree very well. The average protofilament length in a disassembling MT come to: 39 ± 17 nm for the MADDY based simulations ( 1280 protofilaments used in analysis), versus 40 ± 15 nm from experiments ( 72 protofilaments), which is a very good agreement. The two broad histograms also agree in that both are skewed to the right. Next, we analyzed the statistics of MT protofilaments’ curvature. The scatterplots of the protofilaments’ profiles, i.e. the length values measured along the longitudinal -axis as a function of the distance from the MT end, from the MADDY based simulations are compared to those from experimental measurements [43], [44] in Fig. 3A, which shows very good agreement. The MT protofilaments are capable of bending outward as single structures (singlets), as dimeric structures (doublets), and occasionally as trimeric structure bundles (triplets), where the lateral interactions maintain the nearest neighbor associations along the doublet and triplet protofilaments (Fig. 3B). This agrees with experiment [33], [46], [57], [58], [59].
Fig. 3

Length statistics of curled protofilaments: A) Experimental (dashed bars) vs theoretical (solid bars) histogram-based estimates of the distribution of lengths of the curled protofilaments as observed in ‘ram’s horn’ structures. The inset shows a comparison of the colored dot tracings from the curled protofilaments observed in vitro and in silico (experimental data were taken from Ref. [47] based on the experiment reported in Ref. [46]). B) Structural snapshots showing transient conformations of shortening MT tip. Most of the protofilaments are singlets (single MT protofilaments), albeit doublets (i.e. two laterally associated protofilaments) were also detected in half of the simulation runs ( 50 runs). C) Schematic of the simulation setup used the simulations of depolymerizing MTs (colored cylinder) coupled to the Dam1 ring (white ring) for the calculation of stalling forces. In these simulations, is the force generated by depolymerizing MT when PFs push on the Dam1 ring and is the opposing stalling force due to the cantilever tip displacement. The equilibrium is reached when the MT disassembly comes to a stall, i.e., . D) Histogram based estimate (bars) of the distribution of stalling forces , generated by a depolymerizing MT on the Dam1 ring, from 80 simulation runs. Freedman-Diaconis rule was used to select the bin size in panels A and D[94].

Length statistics of curled protofilaments: A) Experimental (dashed bars) vs theoretical (solid bars) histogram-based estimates of the distribution of lengths of the curled protofilaments as observed in ‘ram’s horn’ structures. The inset shows a comparison of the colored dot tracings from the curled protofilaments observed in vitro and in silico (experimental data were taken from Ref. [47] based on the experiment reported in Ref. [46]). B) Structural snapshots showing transient conformations of shortening MT tip. Most of the protofilaments are singlets (single MT protofilaments), albeit doublets (i.e. two laterally associated protofilaments) were also detected in half of the simulation runs ( 50 runs). C) Schematic of the simulation setup used the simulations of depolymerizing MTs (colored cylinder) coupled to the Dam1 ring (white ring) for the calculation of stalling forces. In these simulations, is the force generated by depolymerizing MT when PFs push on the Dam1 ring and is the opposing stalling force due to the cantilever tip displacement. The equilibrium is reached when the MT disassembly comes to a stall, i.e., . D) Histogram based estimate (bars) of the distribution of stalling forces , generated by a depolymerizing MT on the Dam1 ring, from 80 simulation runs. Freedman-Diaconis rule was used to select the bin size in panels A and D[94]. We also analyzed the force-generating properties of depolymerizing MTs. A disassembling MT must generate sufficiently strong forces to pull apart chromosomes via kinetochore-binding protein complexes such as the Dam1 in yeast. These forces are estimated in vivo to be as high as ∼20 pN with 50% of experiments showing forces <10 pN [19]. We carried out 80 simulation runs to explore the statistics of stalling forces of disassembling MTs upon the Dam1 ring (Materials and methods). The simulation setup is displayed in Fig. 3C and the histogram of equilibrium stalling forces from a single depolymerizing MT is shown in Fig. 3D. With the flexural rigidity 1.7103 pNnm/rad2 used, a disassembling MT polymer produces an average stalling force of 8.0 0.6 pN ( 80) with the forces reaching as high as 10 pN (Fig. 3D). Thus, the MADDY model describes the force-generating properties of MTs coupled to the Dam1 complex reasonably well. These multiple points of agreement between the results of MADDY based simulations and experimental data, namely the rates of MT growth and shortening, the structure of MT tip, statistics of MT protofilaments’ length and curvature, and the force-generating properties of depolymerizing MTs, validate the MADDY model.

Dynamics of MT disassembly and assembly in silico:

Next, we explored the dynamics of MT growth and shortening at the tubulin-monomer level of detail. We first focused on the dynamics of MT disassembly, carrying out a total of 10 simulation runs of MT shortening using the longitudinal bond with the strength 12.0 kcal/mol and the lateral bond with the strength 6.7 kcal/mol (Table 1). Distance vs time curves for MT disassembly are displayed in Fig. 2A. Overall, MT disassembly is non-monotonic and highly variable with phases of shortening followed by episodes of transient arrest of MT shortening. The duration of stoppage times and transient shortening rates are positively correlated. The longer duration MT shortening-arrested episodes (regions with a near-zero slope in Fig. 2A) are followed by the more rapid disassembly periods (regions with a large negative slope), whereas the shorter length arrested fragments correspond to slower disassembly periods. Hence, these results imply that MT disassembly is a cooperative process. We also observed outward curled “rams’ horns” in agreement with the prior studies [46], [47] (Fig. 2A, see also Movie S1). The presence of “rams’ horns” structures at the end of shortening MT polymers is evident in the structure snapshots 2–4 shown in Fig. 2A and Fig. 3B. This fraying behavior of mostly individual protofilaments results from the lateral bonds being weaker than the longitudinal bonds (Table 1), and also agrees with the previous studies [28], [41]. When the MT protofilaments are detached from each other over longer lengths (long “rams’ horns”) – the most metastable configuration of an MT tip taking a long time to form, the disassembly rate and extent are the largest. From model parameterization, flexural rigidities and affect the frequency of observing doublets and triplets. The higher the values of these quantities, the higher the propensity for singlets’ formation in the simulations. Next, we explored the dynamics of MT growth. We performed a total of 10 simulations of MT assembly in the 15–30 μM range of soluble tubulin concentration, where the binding energies for longitudinal and lateral bonds were set to 16.5 kcal/mol and 8.5 kcal/mol, respectively (Table 1). MT length vs time profiles are displayed in Fig. 2B. Although the in silico rate of MT growth as a function of tubulin concentration has a lower slope than the experimental dependence (top inset in Fig. 2B), given the bead-per-monomer nature of the MADDY model the agreement is very good, especially at lower tubulin concentrations. Similar to MT disassembly behavior, the MT growth is non-monotonic and it proceeds with a variable growth rate, with phases of growth followed by frequent stoppage times (bottom inset in Fig. 2B). To summarize, achieving good agreement with the experimental rates of MT assembly and disassembly (Fig. 2) led us to next explore additional aspects of MT structure and dynamics described below.

MT lattice defects, lattice vacancies, and MT tip structure

Next, we examined dynamic remodeling of the tip of a growing MT. Snapshots of an MT tip from transient MT structures are displayed in Fig. 4 (see also Movie S4). A tip of growing MT is formed by outwardly curved single protofilaments, which gradually straighten when the number of laterally coupled protofilaments increases. We observed formation of “lattice defects”, i.e. erroneous attachments formed by tubulin dimers on the MT lattice, and “lattice vacancies”, i.e. empty lattice sites lacking tubulin dimers (Fig. 4 and Movie S4). These lattice defects and lattice vacancies occur dynamically; they form and disappear over time when locally some protofilaments elongate at a more rapid rate than others. These dynamic error features are more frequently observed at higher tubulin concentrations. The lattice defects and vacancies are transient features of MT lattice structure that accumulate within the MT wall and subsequently self-correct over time. A large number of accumulated improper attachments and empty lattice sites results in a transient slowdown or a complete stoppage of MT growth, but when these lattice defects and vacancies are corrected dynamically, through small magnitude, short term fluctuations between assembly and disassembly, normal MT growth resumes.
Fig. 4

MT assembly and growing MT tip structures: Snapshots of growing MTs from 8-s simulation runs for 15 μM (panel A) and 30 μM (panel B) concentration of soluble tubulin, taken at the times points: 0.3, 1.7, 3.1, 4.2, 6.0, and 7.7 s. The insets show detailed views of transient structure defects: from the inter-protofilament closure error in the MT lattice and resulting in site vacancy and oligomer erroneously attaching to the MT lattice at a site of vacancy (panel A), and from the assembly error resulting in formation of large vacant sites lacking tubulin dimers before and the same site after dynamic error correction occurs (panel B).

MT assembly and growing MT tip structures: Snapshots of growing MTs from 8-s simulation runs for 15 μM (panel A) and 30 μM (panel B) concentration of soluble tubulin, taken at the times points: 0.3, 1.7, 3.1, 4.2, 6.0, and 7.7 s. The insets show detailed views of transient structure defects: from the inter-protofilament closure error in the MT lattice and resulting in site vacancy and oligomer erroneously attaching to the MT lattice at a site of vacancy (panel A), and from the assembly error resulting in formation of large vacant sites lacking tubulin dimers before and the same site after dynamic error correction occurs (panel B). Next, we examined the MT tip structure, since ideas regarding MT dynamic instability behavior are often thought to originate in properties of the tip or tip-proximal region of MT filament. The uneven length of an MT tip, owing to disparity in the rates of growth of individual protofilaments, are reflected in the probability distribution of extensions of the tip of a growing MT. We defined a tip extension as a length-difference between the shortest and the longest protofilaments at the MT tip. The histogram-based estimates of the probability distributions of MT tip extensions are compared in Fig. 5A for 15 μM and 30 μM tubulin concentration. At higher 30 μM tubulin concentration, the MT-tip extension is more variable because the distribution has a longer tail compared to that for the lower 15 μM tubulin concentration. The average length of an MT tip extension is ∼65 nm (65.6 ± 30.3 nm) for 15 μM tubulin solution vs ∼80 nm (80.5 ± 49.6 nm) for 30 μM solution (Fig. 5A). Representative trajectories of tip extension for 15 μM and 30 μM tubulin solution are compared in the inset to Fig. 5A. In the simulations, we observed the curved tips reaching ∼160-nm (∼20 tubulin dimers) for 15 μM tubulin solution and to ∼245-nm in length (∼30 tubulin dimers) for 30 μM tubulin solution. These maximal extensions agree with experimental findings [59], which show that the average tip extension is 180 nm long and that extensions shorter than 75 nm represent ∼66% of the whole population. We analyzed transient conformations of the MT tip. The most typical MT tip structures are compared for 15 μM and 30 μM tubulin solutions in Fig. 5B. At the lower 15 μM concentration of soluble tubulin, the MT tip is more blunt, whereas at the higher 30 μM tubulin concentration the MT is more sharp. Hence, the shape of the MT tip is tubulin concentration dependent.
Fig. 5

Evolution of protofilament extensions at the growing MT tip: A) Histogram-based estimates of the distributions of tip extensions for 15 μM (solid bars) and 30 μM (dashed bars) concentration of soluble tubulin defined as the length-difference between the shortest and the longest protofilaments at the end of the growing MT. The Freedman-Diaconis rule was used to select the bin size [94]. The average extension (mean ± standard deviations) comes to 65.6 ± 30.3 nm for 15 μM tubulin concentration ( 970 data points) and to 80.5 ± 49.6 nm for 30 μM tubulin concentration ( 780). The inset shows the temporal evolution of MT tip extension averaged over 10 simulation runs for 15 μM and 30 μM tubulin concentrations. B) Conformations of a growing MT tip with long sharp extensions curving outward. In the simulations, the curved tips were observed to grow up to 50 nm in length (∼6–7 dimers).

Evolution of protofilament extensions at the growing MT tip: A) Histogram-based estimates of the distributions of tip extensions for 15 μM (solid bars) and 30 μM (dashed bars) concentration of soluble tubulin defined as the length-difference between the shortest and the longest protofilaments at the end of the growing MT. The Freedman-Diaconis rule was used to select the bin size [94]. The average extension (mean ± standard deviations) comes to 65.6 ± 30.3 nm for 15 μM tubulin concentration ( 970 data points) and to 80.5 ± 49.6 nm for 30 μM tubulin concentration ( 780). The inset shows the temporal evolution of MT tip extension averaged over 10 simulation runs for 15 μM and 30 μM tubulin concentrations. B) Conformations of a growing MT tip with long sharp extensions curving outward. In the simulations, the curved tips were observed to grow up to 50 nm in length (∼6–7 dimers).

Dynamics of MT assembly-disassembly in silico

Next, we explored the dynamics of MT polymerization-depolymerization (Materials and methods). We carried out 6 sets of simulations – Case Studies 1–6, in which we varied the concentration of soluble tubulin dimers and reaction rate constant for GTP hydrolysis (5 simulation runs for each Case Study; see Table 2). For 25 μM tubulin concentration, we set the GTP hydrolysis rate constant to 2 s−1 (Case Study 1), 4 s−1 (Case Study 2), and 6 s−1 (Case Study 3). For 250 μM tubulin concentration, we used 5 s−1 (Case Study 4), 8 s−1 (Case Study 5), and 10 s−1 (Case Study 6). The experimental value is ∼0.5–1.0 s−1 [28], [60], but in our experiments in silico we increased  ∼4–40 fold. Experimental concentration of soluble tubulin is 10–30 μM, but in our simulations we increased tubulin concentration 1–10-fold (Table 2). Carrying out Case Studies 1–6 at higher tubulin concentration and larger GTP hydrolysis rate constant enabled us to access many transitions between different kinetic states, i.e., from MT growth to MT shortening and from MT shortening back to MT growth, and to observe the MT catastrophe and rescue.
Table 2

MT Feature analysis and best prediction models: Feature analysis performed on the numerical output from 5 simulation runs for each Case Study 1–6. Case Studies 1–3 correspond to 25 μM concentration of soluble tubulin dimers and the GTP hydrolysis rate constants of 2, 4, and 6 s−1. Case Studies 4–6 correspond to 250 μM concentration of soluble tubulin dimers and the hydrolysis rate constants of 5, 8, and 10 s−1. Reported for each Case Study are the total number of data points , the number of catastrophes and rescues observed (separated by a slash). Summarized are Machine Learning models that have resulted in the highest accuracy compared to all other classification and ensemble methods used.

Case studyNtrajNdpNcat/Nresαβ-tubulin concentration, μMGTP hydrolysis rate constant khyd, s−1Best modelAccuracy, %
15324916/16252RF74.0
2532779/7254RF/XGB87.1
3532469/6256XGB76.1
45171811/102505BAG87.3
55231911/82508XGB77.2
65278119/1725010XGB76.9
MT Feature analysis and best prediction models: Feature analysis performed on the numerical output from 5 simulation runs for each Case Study 1–6. Case Studies 1–3 correspond to 25 μM concentration of soluble tubulin dimers and the GTP hydrolysis rate constants of 2, 4, and 6 s−1. Case Studies 4–6 correspond to 250 μM concentration of soluble tubulin dimers and the hydrolysis rate constants of 5, 8, and 10 s−1. Reported for each Case Study are the total number of data points , the number of catastrophes and rescues observed (separated by a slash). Summarized are Machine Learning models that have resulted in the highest accuracy compared to all other classification and ensemble methods used. For Case Studies 1 and 6, the MT length vs time trajectories from 4 runs of the MADDY based simulations of MT assembly-disassembly are displayed in Fig. 6, which shows the results of long (∼1–4 s) simulations of the MT dynamics in all four kinetics states – assembly, catastrophe, shortening, and rescue. We detected a total of 34 MT catastrophes and 21 MT rescues for 25 μM tubulin solution, and 41 MT catastrophes and 35 MT rescues for 250 μM tubulin solution (Table 2). We posed a question: what structural and energy properties of MT filament in general and MT tip in particular best describe the MT kinetic states: assembly, catastrophe, shortening, and rescue? We surveyed the following 14 quantitative features: MT length (feature 1), number of hydrolyzed αβ-tubulin dimers in MT lattice (feature 2), total energy of lateral interactions in MT lattice (feature 3), total energy of longitudinal interactions in MT lattice (feature 4), energy of lateral interactions in MT tip (feature 5), energy of longitudinal interactions in MT tip (feature 6), average longitudinal curvature (feature 7), average lateral curvature (feature 8), MT tip length (feature 9), MT tip width (feature 10), average number of longitudinal interactions per protofilament in MT lattice (feature 11), average number of lateral interactions per helical pitch in MT lattice (feature 12), energy of lateral interactions required to complete MT to full cylinder (feature 13), and energy of longitudinal interactions required to complete MT to full cylinder (feature 14). These 14 features were chosen based upon current ideas of dynamic instability concerning the control of MT growth and shortening. The time-dependent profiles for all 14 features listed above for Case Studies 1 and 6 are presented in Figs. S1 and S2.
Fig. 6

Dynamics of MT assembly-disassembly: MT length vs time profiles for Case Study 1 for soluble tubulin concentration 25 μM and GTP hydrolysis rate constant 2 s−1 (panel A), Case Study 6 for tubulin concentration 250 μM and 10 s−1 (panel B), Case Study 3 for tubulin concentration 25 μM and 6 s−1 (panel C), and Case Study 5 for tubulin concentration 250 μM and 8 s−1 (panel D). The profiles are shown for 4 independent runs of MT assembly-disassembly for each Case Study. Each profile was divided into portions that correspond to the following kinetic states (classes): MT growth (shown in blue color), MT catastrophe (red), MT shortening (yellow), and MT rescue (cyan); see Materials and methods. In panels A and B, snapshots numbered 1–5 display the time moments of MT for growth (1), MT catastrophe (2), MT shortening (3), MT rescue (4) and MT growth after the rescue (5), as observed in reaction volume. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Dynamics of MT assembly-disassembly: MT length vs time profiles for Case Study 1 for soluble tubulin concentration 25 μM and GTP hydrolysis rate constant 2 s−1 (panel A), Case Study 6 for tubulin concentration 250 μM and 10 s−1 (panel B), Case Study 3 for tubulin concentration 25 μM and 6 s−1 (panel C), and Case Study 5 for tubulin concentration 250 μM and 8 s−1 (panel D). The profiles are shown for 4 independent runs of MT assembly-disassembly for each Case Study. Each profile was divided into portions that correspond to the following kinetic states (classes): MT growth (shown in blue color), MT catastrophe (red), MT shortening (yellow), and MT rescue (cyan); see Materials and methods. In panels A and B, snapshots numbered 1–5 display the time moments of MT for growth (1), MT catastrophe (2), MT shortening (3), MT rescue (4) and MT growth after the rescue (5), as observed in reaction volume. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Machine Learning: Identifying features that distinguish microtubules’ growth, catastrophe, shortening, and rescue in MTs

Next, we employed statistical modeling to identify structural and energetics features of MT filaments described above capable of distinguishing between the four kinetic states of MT dynamics (assembly, catastrophe, shortening, and rescue); see Materials and methods. Classifier methods such as Random Forests (RF), Gradient Boosting (XGB), and Support Vector Machines (SVMs) have been used in the following biological applications i) to learn the features that distinguish protein sub-cellular localization [61], [62], ii) to characterize biological activities of small molecules [63], iii) to predict molecular properties and determine binding drug partners [63], [64], and iv) to predict hot spots on protein interfaces that control the binding free energy [65]. For each Case Study 1–6, we used the distributions of the 14 features from the output of MADDY based simulations (coordinate and energy files) and the information about the four kinetic states. The distributions of these features are displayed in Fig. 7 for Case Study 2. The first step was to determine the prediction accuracy of several classification and ensemble methods in order to find the best method for distinguishing the four kinetic states of MT dynamics. The baseline accuracy, which we evaluated using the Dummy Classifier method, ranged from 33 to 59% for all Case Studies 1–6 based on the size of the largest class, which was typically the state of growth or the state of catastrophe. Since these two classes represented a large percentage of the data compared to the other kinetic states (shortening and rescue), we applied the SMOTE technique to account for the uneven spread of data by replicating samples from the less populated classes [62]. The classifier methods (LDA, SVM, LR, KNN), with and without SMOTE, gave a range of prediction successes. Compared to the baseline accuracy for each Case Study, LR without SMOTE performed the worst (43.1–65.1% accuracy), while SVM and KNN with SMOTE gave the highest accuracy (64.2–84.1%). Next, we used ensemble methods (BAG, RF, XGB), which incorporate the grouping of multiple decision trees. Application of these methods resulted in the highest accuracies out of all methods. The results obtained for Case Study 2 are displayed in Fig. 8 for LDA, SVM, BAG, RF, ET, and XGB (for Case Studies 1, and 3–6 see Figs S3-S7). The accuracy level attained with RF, BAG, and XGB ranged between 74.0 and 87.3% (Table 2). The best models with their associated accuracy level for each Case Study 1–6 are accumulated in Table 2.
Fig. 7

Distributions of numerical values of features 1–14 for Case Study 2: Plots are for each of the following 14 features in statistical inference: (1) MT length , (2) number of hydrolyzed αβ-tubulin dimers in MT lattice , (3) total energy of lateral interactions in MT lattice , (4) total energy of longitudinal interactions in MT lattice , (5) energy of lateral interactions in MT tip , (6) energy of longitudinal interactions in MT tip , (7) average longitudinal curvature , (8) average lateral curvature , (9) MT tip length , (10) MT tip width , (11) average number of longitudinal interactions per protofilament in MT lattice , (12) average number of lateral interactions per helical pitch in MT lattice , (13) energy of lateral interactions required to complete MT to full cylinder , and (14) energy of longitudinal interactions required to complete MT to full cylinder . These 14 features were chosen based upon current ideas of dynamic instability concerning the control of MT growth and shortening.

Fig. 8

Machine learning based classification of MT features for Case Study 2: A) Box and whisker plots for comparison of the prediction accuracy for various classification and ensemble methods (LDA, SVM, BAG, RF, ET, and XGB). The minimum value at the bottom of each box represents the first quartile. The third quartile is at the top of the box, which ends at the maximum value. Green lines indicate the median for each method, while red arrowheads indicate the mean values. B) Feature importance plots obtained with the XGB method ranking the features that were the most and the least important at predicting the kinetic states using the mean absolute SHAP values. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Distributions of numerical values of features 1–14 for Case Study 2: Plots are for each of the following 14 features in statistical inference: (1) MT length , (2) number of hydrolyzed αβ-tubulin dimers in MT lattice , (3) total energy of lateral interactions in MT lattice , (4) total energy of longitudinal interactions in MT lattice , (5) energy of lateral interactions in MT tip , (6) energy of longitudinal interactions in MT tip , (7) average longitudinal curvature , (8) average lateral curvature , (9) MT tip length , (10) MT tip width , (11) average number of longitudinal interactions per protofilament in MT lattice , (12) average number of lateral interactions per helical pitch in MT lattice , (13) energy of lateral interactions required to complete MT to full cylinder , and (14) energy of longitudinal interactions required to complete MT to full cylinder . These 14 features were chosen based upon current ideas of dynamic instability concerning the control of MT growth and shortening. Machine learning based classification of MT features for Case Study 2: A) Box and whisker plots for comparison of the prediction accuracy for various classification and ensemble methods (LDA, SVM, BAG, RF, ET, and XGB). The minimum value at the bottom of each box represents the first quartile. The third quartile is at the top of the box, which ends at the maximum value. Green lines indicate the median for each method, while red arrowheads indicate the mean values. B) Feature importance plots obtained with the XGB method ranking the features that were the most and the least important at predicting the kinetic states using the mean absolute SHAP values. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) Because the application of RF and XGB methods resulted in the best or next to the best prediction accuracy for the majority of Case Studies, we next analyzed the quality of RF and XGB based predictions. To determine the Case Studies, for which the classification of the kinetic states is the most robust, we extracted the confusion matrices, which label the class chosen for each prediction against the true (known) class (Materials and methods). Fig. 9 shows the confusion matrices for XGB for all Case Studies 1–6 (confusion matrices for RF are displayed in Fig. S8). Each matrix compares the predictions to their correct classes, and, for each matrix entry, we report the number of predictions, the total number of samples per class, and the percentage of predictions per total number of samples. Overall, we found that the population of a state for any Case Study correlates with the goodness of prediction from the classification methods. For example, we find a high precision in predicting the MT growth or MT catastrophe, depending on which class had the largest population in the respective case. The MT rescue and MT shortening were repeatedly found to be difficult to predict. Case Studies 2 and 4 received the best comprehensive classification when both XGB or RF were used. Next, we calculated the Receiver Operating Characteristic (ROC) curves and the Precision-Recall (PR) curves as performance measures (Materials and methods). Fig. 10 and Fig. S9 display the PR curves for Case Study 2 and Case Study 4, respectively. The ROC curves for Case Study 2 and Case Study 4 are shown in Figs. S10 and S11, respectively. The corresponding values of area under curve (AUC values) are accumulated in Table 3 for the PR curves and in Table S1 for the ROC curves. We constructed the PR and ROC curves for two datasets (Case Study 2 and Case Study 4) to make a comparison between RF, XGB, and SVM methods. Because of the imbalance among the populations of the four states, the results obtained from the PR curves were made a priority, since the PR curves focus on the smaller sets of true positives. The AUC values from the PR plots, which we used to quantify the overall prediction accuracy level, showed that RF and XGB methods classified the kinetic states better than SVM (Table 3). For example, RF and XGB produced excellent classification for the MT growth and MT catastrophe (>0.95) for several Case Studies, with XGB having slightly higher AUC values overall. Also, from the confusion matrices XGB predicted all classes (i.e. kinetic states) more consistently compared to RF. Therefore, we selected XGB for identification of the most important (critical) features for class (kinetic state) discrimination.
Fig. 9

Confusion Matrices obtained using XGBoost for all Case Studies 1–6: Confusion matrices describing the number of predictions made for each of the 5 k-folds (repeated 3 times) using XGB compared to every predictions’ true class. Values in the center of each box indicate the number of predictions. In parenthesis, the first value refers to the sum of the number of samples for a class found in each k-fold test set. The second value is the percent of predictions made for a certain class compared to the overall number of true class samples. The heatmap color range is based on this relative percentage for Case Study 1 (A), Case Study 2 (B), Case Study 3 (C), Case Study 4 (D), Case Study 5 (E), and Case Study 6 (F).

Fig. 10

Classification quality analysis for Case Study 2 with Precision-Recall curves: PR curves comparing the overall quality value of classification attained using Support Vector Machines with RBF kernel (red curves), Random Forest (blue curves), and XGBoost (green curves) for MT catastrophe (panel A), MT growth (panel B), MT rescue (panel C), and MT shortening (panel D). The closer the curve is to the upper right-hand corner, the better is the model prediction. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Table 3

Area under the curve (AUC) for Precision-Recall (PR) curves for Case Studies 2 and 4: Shown are the AUC values obtained using the SVM (with RBF kernel), RF, and XGB methods. The corresponding PR curves are displayed in Fig. 10 for Case Study 2 and in Fig. S9 for Case Study 4 (AUC value of 1/0 represents a perfect/poor classification).


Case Study 2
Case Study 4
Kinetic StateSVM (RBF)RFXGBSVM (RBF)RFXGB
Catastrophe0.910.950.950.430.720.70
Growth0.860.960.960.910.960.96
Rescue0.640.900.910.340.640.63
Shortening0.600.840.860.560.730.74
Confusion Matrices obtained using XGBoost for all Case Studies 1–6: Confusion matrices describing the number of predictions made for each of the 5 k-folds (repeated 3 times) using XGB compared to every predictions’ true class. Values in the center of each box indicate the number of predictions. In parenthesis, the first value refers to the sum of the number of samples for a class found in each k-fold test set. The second value is the percent of predictions made for a certain class compared to the overall number of true class samples. The heatmap color range is based on this relative percentage for Case Study 1 (A), Case Study 2 (B), Case Study 3 (C), Case Study 4 (D), Case Study 5 (E), and Case Study 6 (F). Classification quality analysis for Case Study 2 with Precision-Recall curves: PR curves comparing the overall quality value of classification attained using Support Vector Machines with RBF kernel (red curves), Random Forest (blue curves), and XGBoost (green curves) for MT catastrophe (panel A), MT growth (panel B), MT rescue (panel C), and MT shortening (panel D). The closer the curve is to the upper right-hand corner, the better is the model prediction. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) Area under the curve (AUC) for Precision-Recall (PR) curves for Case Studies 2 and 4: Shown are the AUC values obtained using the SVM (with RBF kernel), RF, and XGB methods. The corresponding PR curves are displayed in Fig. 10 for Case Study 2 and in Fig. S9 for Case Study 4 (AUC value of 1/0 represents a perfect/poor classification). Next, when using XGB for Case Studies 2 and 4 we applied the SHAP analysis [66] (Materials and methods) to determine which factors drive the prediction. Namely, SHAP enables an in-depth view of how the XGB prediction was made by calculating the Shapley values for each feature, which can take both positive and negative values. SHAP determines whether a feature helps with the prediction of a class based on the range and identity of the feature measurements that have positive SHAP values. Using the mean absolute value of all calculated Shapley values, we determined the importance of each feature for MT growth, catastrophe, shortening, and rescue. The plots of feature importance are displayed in Fig. 8 for Case Study 2, and in Figs. S3-S7 for the other Case Studies 1, and 3–6. This method of identifying the most important features was chosen to allow for a consistent and accurate assessment of feature contribution [67]. The most important parts of the analysis are summarized in Table 4, which lists the top most and least important predictors for Case Studies 2 and 4. For both low and high tubulin concentrations, the MT length (feature 1) and average longitudinal curvature (feature 7) were among the top important predictors, while the MT tip length (feature 9) and average number of longitudinal interactions per protofilament in MT lattice (feature 11) were among the least important features. Other features were also found to be important for Case Study 2, which include the MT tip width (feature 10), total energy of longitudinal interactions in MT lattice (feature 4), the energy of longitudinal interactions required to complete MT to full cylinder (feature 14) and the energy of lateral interactions required to complete MT to full cylinder (feature 13). For Case Study 4, we found that the number of hydrolyzed αβ-tubulin dimers in MT lattice (feature 2), average number of lateral interactions per helical pitch in MT lattice (feature 12), total energy of lateral interactions in MT lattice (feature 3), and energy of longitudinal interactions in MT tip (feature 6) are the features that had the largest impact in distinguishing between the four kinetic states.
Table 4

Feature importance comparison: Feature importance was calculated based on the results obtained with XGBoost (XGB) methods. The top six most and least important features for each of the Case Studies 2 and 4 are listed with bolded high importance features and italicized low importance features to indicate when the feature appeared in XGB method.

Case StudyFeatures of High ImportanceFeatures of Low Importance
2L(1), κlong(7), w (10), Ulong (4), Ulatadd (13), Ulongadd (14)ulat (5), ulong (6), l(9), nlong(11), nlat (12), Ulat (3)
4nhyd (2), L(1), nlat (12), κlong(7), Ulat (3), ulong (6)l(9), nlong(11), Ulongadd (14), Ulatadd(13), ulat (5), Ulong (4)
Feature importance comparison: Feature importance was calculated based on the results obtained with XGBoost (XGB) methods. The top six most and least important features for each of the Case Studies 2 and 4 are listed with bolded high importance features and italicized low importance features to indicate when the feature appeared in XGB method. A highly informative output of the SHAP analysis is represented by beeswarm plots, which are depicted for each kinetic state for Case Studies 2 and 4 in Figs. S12 and S13, respectively. Each plot is a dense summary of each feature’s contribution in classification of a state by visualizing the distribution and sign of Shapley values for all observations, while also indicating the magnitude of the feature measurements for the predicted class (see Materials and methods). The order of features agrees with the overall ranking seen in Fig. 8 and Fig. S5 described above. By focusing on the magnitude (color) of the data distribution for a feature in the region corresponding to positive Shapley values in a beeswarm plot of a given class, one can assign the importance of either high or low values of a feature in the prediction of the kinetic class. For example, from the beeswarm plots we found that a longer MT length in Case 2 is predictive for catastrophe, while shorter lengths would suggest either growth or rescue. To summarize the main trends found in the beeswarm plots that characterize the kinetic states at low and high soluble tubulin concentrations, we calculated the relative averages of the top six important features using the subset of data found to have positive SHAP values (Materials and methods) for MT growth, catastrophe, shortening, and rescue for Case Studies 2 and 4 (Fig. 11). We defined the relative average for a feature to be the ratio between the mean value and the maximum mean value for that feature for all four kinetic states, with the maximum value shown in Table S2.
Fig. 11

Summary of average values for top important features most capable of distinguishing between kinetic states of MT dynamics: Profiled are the relative averages (mean value for a given state divided by the maximal mean value among data that have positive SHAP values, i.e., −0.1) of the most important features identified using the XGBoost method for the MT catastrophe (red data points and lines), MT growth (blue), MT rescue (cyan), and MT shortening (yellow) for Case Study 2 (25 μM tubulin concentration; panel A), and for Case Study 4 (250 μM tubulin concentration; panel B). The mean values and standard deviations are listed in Table S2. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Summary of average values for top important features most capable of distinguishing between kinetic states of MT dynamics: Profiled are the relative averages (mean value for a given state divided by the maximal mean value among data that have positive SHAP values, i.e., −0.1) of the most important features identified using the XGBoost method for the MT catastrophe (red data points and lines), MT growth (blue), MT rescue (cyan), and MT shortening (yellow) for Case Study 2 (25 μM tubulin concentration; panel A), and for Case Study 4 (250 μM tubulin concentration; panel B). The mean values and standard deviations are listed in Table S2. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) For 25 μM tubulin concentration (Case Study 2), the MT catastrophe corresponds to the longest MT length (), largest average longitudinal curvature (), largest energy of lateral interactions required to complete MT to full cylinder () and smallest energy of longitudinal interactions required to complete MT to full cylinder (). MT shortening was characterized by having the largest MT tip width () and lowest total energy of longitudinal interactions in MT lattice () compared to the other kinetic states. MT growth had exactly the opposite trends compared to MT catastrophe, i.e. the shortest MT length (), smallest average longitudinal curvature (), and smallest energy of lateral interactions required to complete MT to full cylinder (). Finally, MT rescue had the smallest MT width () and highest total energy of longitudinal interactions in MT lattice (). For 250 μM soluble tubulin concentration (Case Study 4), MT catastrophe is characterized by the largest number of hydrolyzed αβ-tubulin dimers in MT lattice (), longest MT length (), highest average number of lateral interactions per helical pitch in MT lattice (), and most stable total energy of lateral interactions in MT lattice (), making it clearly distinguishable from the other kinetic states. MT growth is described by the smallest number of hydrolyzed αβ-tubulin dimers in MT lattice (), lowest average number of lateral interactions per helical pitch in MT lattice (), lowest average longitudinal curvature (), most stable total energy of lateral interactions in MT lattice (), and most stable energy of longitudinal interactions in MT tip (). We note that the ability of the top features in Case Study 4 to classify MT growth better than in Case Study 2 also agrees with the results from confusion matrices (Fig. 9B and 9D). MT rescue is distinguished from the other states by having the shortest MT length () and least stable energy of longitudinal interactions in MT tip (). MT shortening is characterized by the largest average longitudinal curvature ().

Discussion and conclusion

The various important functional roles played by MTs in eukaryotic cells necessitate rigorous quantitative analysis of their physico-chemical, dynamic, and energetic properties. Detailed understanding of the mechanisms of catastrophe and rescue, two of the four kinetic states observed during the dynamic instability behavior exhibited by MTs, is impeded by a lack of quantitative knowledge about the thermodynamic, biomechanical, and chemical factors which control the onset of transitions between these kinetic states associated with MT assembly and disassembly. Here, we developed the molecularly detailed, realistic, computational 3D model of Microtubule Assembly and Disassembly DYnamics (MADDY; Fig. 1). The atomic-level details of the α- and β-tubulins and of the larger αβ-tubulin complexes they form were coarse-grained into the α- and β-tubulin based MADDY model, which was then used to explore the long timescale dynamics of MT growth and shortening. We utilized this coarse-grained modeling approach because the physico-chemical properties and thermodynamic characteristics of the αβ-tubulin dimers, symmetries of the αβ-tubulins packing into an MT lattice, as well as the geometry and native topology of MT filaments, rather than the atomic-level detail of the α- and β-tubulin structure, are what drives the dynamics of cycles of MT growth and shortening. Similar multiscale modeling approaches have been used by researchers to explore the dynamics of complex biomolecular assemblies such as MT filaments and virus shells on very long timescales spanning several decades of biological time [35], [68], [69], [70]. In addition, when the coarse-grained modeling is implemented on Graphics Processing Units (GPUs), the long second timescales become accessible in computer simulations [49]. By taking advantage of the computational acceleration on a GPU, we were able to carry out detailed exploration of MT dynamics on a long time scale spanning tens of seconds of biological time. The MADDY model uniquely combines the implicit description of chemical kinetics (i.e. GTP-to-GDP hydrolysis) described using the Monte Carlo (MC) algorithm, and the explicit description of the dynamics of MT self-assembly and disassembly propagated forward in time using the Langevin Dynamics (LD) in the overdamped limit (Brownian diffusion). The model was mapped into a standard CUDA code, fully implemented on a GPU. Numerical routines for the generation of (pseudo)random numbers (Hybrid Taus method) for MC and LD are described in the previous publications [50], [71]. We implemented the first order integration scheme (McCammon algorithm [72]) to propagate forward in time the Langevin equations of motion for all particles (αβ-tubulin heterodimers). In Langevin Dynamics, the particle–particle interactions e.g. excluded volume interactions, stretching and bending of αβ-tubulin dimers forming the MT lattice, and formation and dissociation of the lateral and longitudinal bonds, are the computational bottleneck. These interactions are described by the same empirical potential energy function (force field; see Eqs. (1), (2), (3), (4), and the MT dynamics is obtained by solving numerically the same Langevin equations of motion for all αβ-tubulin dimers. Therefore, when running Langevin Dynamics on a GPU, the MADDY program executes the same operation (e.g. generation of random forces, evaluation of molecular forces, integration of equations of motion) for all particles at the same time. By the same token, the Monte Carlo algorithm estimates the probability of GTP hydrolysis for all αβ-tubulin dimers in the MT lattice and rejects or accepts the MC move for all αβ-tubulin dimers at the same time. Therefore, when mapping the MADDY model into the CUDA code, we took advantage of the GPU architecture to accelerate both the LD and MC components of numerical modeling. The contemporary GPUs helps speed up MADDY based simulations ∼50-fold compared to the CPU based implementation. For example, it takes ∼7 days of wall-clock time to generate a single ∼1-s trajectory of the MT disassembly of a 400-nm long fragment of MT on a GPU, which corresponds to the experimental 417 nm/s (25 μm/min) disassembly rate. It takes ∼14 days of computational time to generate a single ∼3-s trajectory of the MT growth by 30-nm on a GPU, which corresponds to the experimental 42 nm/s (2.5 μm/min) rate of MT assembly. To achieve top performance on a GPU, the LD and MC algorithms were recast into a data-parallel form so that the computational threads run the same instruction stream, but on different data sets (i.e. particles). The tasks were made compute-intensive so that the GPU performs computations rather than reading and writing data. These efforts enabled us to reach the biologically important seconds timescale. The simulation output is organized into the coordinate file for 3D-coordinates (, and ) and the angle file for 3D-angles (, and ) of all αβ-tubulin dimers in the MT lattice and free tubulin solution, and the energy file for the entire reaction volume (MT lattice plus solution) recorded for each time step. In addition to storing numerical values of various energy terms, , , , , , and (see Eqs. (1), (2), (3), (4), the energy file also provides information about the identity of GTP-to-GDP-hydrolyzed tubulin dimers and moments of time when these reactions occurred. The coordinate, angle, and energy files can then be used in data visualization and processing, and in analyses of the results of simulations. To optimize numerical values of parameters defining the potential energy (force field) , (Eqs. (1), (2), (3), (4)), we used results from the prior molecular-mechanical studies of MT disassembly [28], our study of MT indentation in silico [35], experimental kinetic data on the rates of MT growth and shortening in vitro [14], experimental electron crystallography data [53], cryo-EM data [52], as well as our own intuition built step-by-step to span and explore the full parameter space for MT assembly-disassembly. It took ∼2 years of research to obtain an accurate parameterization of the MADDY model. At this stage, our goals were to obtain the MADDY model parameterization that would result in agreement i) with the experimental 417 nm/s (25 μm/min) rate of MT disassembly [14] and ii) with the experimental 17–67 nm/s (1–4 μm/min) rate of MT assembly which corresponds to the 7–15.5 μM concentration range of soluble tubulin [14] (Fig. 2). These efforts have resulted in the optimized parameters of the MADDY force-field (Table 1). There is a small discrepancy between the experimental rates of MT growth and MT shortening and rates of these processes simulated with the MADDY model (Fig. 2). This might be due to the fact that in vitro experiments might involve accessory proteins and might involve tubulin sequence variants or post translational modifications of tubulin, whereas in the in silico experiments we used the drug-stabilized MT structure solved based on X-ray crystallography data. In addition, the MADDY model employs a simplified bead per tubulin monomer-based representation of the MT lattice. Notwithstanding these model limitations, subsequent analysis of the statistics of length of MT protofilaments and curvature of depolymerizing MTs revealed iii) that the experimental histograms of lengths of MT protofilaments [46], [47] and simulated distributions of this quantity agree very well, and iv) that the simulated and experimental scatterplots of the protofilaments’ profiles show very good agreement (Fig. 3). Moreover, we analyzed the force-generating properties of depolymerizing MTs. The multi-subunit Dam1 protein complex, bound to the MT, is essential for chromosome segregation and it plays a major role in coupling yeast MTs to kinetochores. The Dam1 ring-like complex was demonstrated to transduce force from depolymerizing MTs and was shown to withstand force loads up to 30 pN [19]. The statistics of stalling forces of disassembling MTs upon the Dam1 ring from the MADDY based simulations and experiment [19] agree (Fig. 3). These multiple points of agreement validate the MADDY model. While exploring the full parameter space, we found out that the rate of MT disassembly is primarily defined by the interplay between the strength of lateral interaction and flexural rigidity (Table 1). Indeed, in the depolymerizing MT (with primarily GDP-bound tubulin dimers), the lateral interactions between neighboring PFs are disrupted as the PFs bend outwards from the MT axis, and so higher flexural rigidity is necessary to disrupt stronger lateral non-covalent bonds. On the other hand, disassembling MTs are known to be efficient force transducers, and so flexural rigidity directly affects pushing forces produced by splaying PFs. Additionally, the strength of longitudinal interactions affects the length statistics of curled “rams’ horns” at the end of shortening MT polymers. Interestingly, flexural rigidities and define the statistics of singlets and doublets. It is worth mentioning that in a previous modeling study [41], the splaying outward PFs from a depolymerizing MD were restricted to bend in a plane, not in full 3D-space. In the MADDY model, the PFs are capable of bending around all three , - and -axes (Fig. 1), which corresponds to the bending angles , and (Fig. 1) and bending rigidities , and (Table 1). This enables the PFs to bend out of MTs as monomeric structures (singlets), as dimeric structures (doublets), or even occasionally as trimeric structure bundles (triplets). In the doublets and triplets, the lateral interactions maintain the nearest neighbor associations along the PFs (Fig. 3B). In the polymerizing MT (with the GTP-bound dimers), the strength of longitudinal interactions primarily affects the rate of MT assembly. We found that the average length of curled protofilaments is determined by the strength of longitudinal interactions , and that good statistics of curled PF lengths (Fig. 3A) is observed for the 12.0–16.5 kcal/mol range . The rate of disassembly depends on the ratio between the strength of the lateral potential and the value of protofilament stiffness : higher stiffness makes a PF bend outward and disrupt lateral bonds more rapidly. When the protofilament stiffness is in the 1.6103-1.9103 pNnm/rad2 interval, the best match in disassembly rate is found when is in the 6.5–8.5 kcal/mol range. Performing simulations in the narrower parameter space allowed us to obtain higher resolution values of parameters for GDP-bound tubulin depolymerization (Table 1). The reasons why the lateral and longitudinal bond energies are different for MT growth and MT shortening are the following: i) we used the drug-stabilized MT structure solved based on X-ray crystallography data, which might create an energy bias; ii) the MADDY model uses a bead per tubulin monomer-based description of the MT lattice and assumes that the α- and β-tubulin monomers are rigid and, hence, the model ignores conformational fluctuations within α- and β-tubulin monomers; in addition, iii) the MADDY model ignores the hydrodynamic interactions, which destabilize the MT lattice when the MT shortening sets in (unpublished data). Using the MADDY model, we carried out in silico experiments of MT growth. One of the highlights from this study (Fig. 4) is the formation of erroneous attachments formed by αβ-tubulin dimers on the MT lattice (lattice defects) and empty lattice sites lacking tubulin dimers (lattice vacancies; see Movie S4). These MT lattice defects and lattice vacancies are more manifest at higher concentration of free tubulin, which corresponds to more far-from-equilibrium conditions of the MT growth process. An immediate consequence of these effects is that a large density of accumulated improper attachments and empty lattice sites results in a slowdown or even a complete stoppage of MT growth (the inset in Fig. 5A). Once lattice defects and vacancies around the MT tip area are corrected dynamically, normal MT growth resumes. This accumulation of improper incorporations and empty lattice sites followed by their dynamic repair may be a significant contributor to the non-monotonic, variable growth rate behavior observed in the MT assembly kinetics. The driving force for correction of lattice defects and lattice vacancies is the propensity of a growing MT to attain its free-energy minimum state with correct attachments formed by all incorporated αβ-tubulin dimers and all MT lattice sites fully occupied, which is a restatement of the Second Law of Thermodynamics. Defects in the microtubule lattice such as voids mean that there are tubulin dimers that neighbor the void that have missing lateral or/and longitudinal bonds, so these dimers are searching for a free dimer to form these missing bonds. We cannot say with certainty that all lattice defects in the growing MT are corrected dynamically. Some defects may be corrected by alternative mechanisms that involve proofreading proteins or enzymes. Permanent MT errors would have deleterious consequences for MT function. Another highlight from in silico experiments on MT growth is the large variability of MT tip extensions, which is positively correlated with the variable (i.e. non-monotonic) rate of MT growth (Fig. 5). Another finding from this study is that at lower (higher) tubulin concentration, the tip of a growing MT is more blunt (more sharp), which implies that the MT tip shape is tubulin concentration dependent. The stronger longitudinal bonds form more rapidly than the weaker lateral bonds (Table 1), and so at higher tubulin concentration this strength difference is manifest at the MT tip. Next, we performed long MADDY based simulations of the dynamics of MT polymerization-depolymerization, in which we observed and analyzed the kinetic states of MT growth, MT shortening, MT catastrophe, and MT rescue (Fig. 6). In these simulations, we set 14.2 kcal/mol and 7.6 kcal/mol. These values of the strength of longitudinal and lateral non-covalent bonds are arithmetic averages of the values of and used in the simulations of MT growth and MT shortening (see Table 1). To simulate and study multiple transitions between these kinetic states, i.e., from MT growth to MT shortening (through MT catastrophe) and from MT shortening back to MT growth (through MT rescue), in Case Studies 1–6 (Table 2) we varied the concentration of soluble tubulin between the experimentally relevant 25-μM soluble tubulin concentration and 10-fold higher 250-μM tubulin concentration, and increased ∼4–40 fold the kinetic rate constant for GTP hydrolysis in the 2–20-s−1 range. Tubulin concentrations in different cells vary considerably and have been measured numerous times; they range from about 5 to 40 μM [73], but it might vary depending on a location in the cell. For example, in C. elegans cells, the overall tubulin concentration is about 47 μM, and the soluble tubulin within centrosomes could be as high as 470 μM [74], and so both 25 and 250 μM soluble tubulin concentrations are biologically relevant. While a higher tubulin concentration is expected to speed up MT growth and a larger GTP hydrolysis rate is expected to promote more rapid MT shortening, accelerating the kinetics in this way does not change the fundamental mechanisms of MT assembly and disassembly, as well as any of the features we have identified through Machine Learning as significant for the different kinetic states, including the MT catastrophe and rescue. The results of MADDY based experiments in silico revealed a complex physical picture underlying the dynamics of MT assembly-disassembly, and so in order to clarify these processes we turned to statistical modeling. Using the simulation output, we surveyed 14 different quantitative characteristics (features) of structure and thermodynamics of the MT lattice and MT tip (Fig. 7). These features were chosen based upon the current understanding of MT dynamic instability, since they involve metrics of the MT structure and energetic state and its propensity for a change of state [74]. Extensive Machine Learning analysis of the space of predictors for various kinetic states of MT dynamic instability (classes) revealed, quite surprisingly, that the MT structural and energetic characteristics that discriminate between the classes (kinetic states) depend on the experimental conditions, i.e. sets of top important MT features for lower and higher tubulin concentrations overlap, but only partially (Table 5). For example, the MT length L and average longitudinal curvature are important for both higher and lower tubulin concentration. However, the total energy of longitudinal interactions on MT lattice , energy of longitudinal interactions required to complete MT to full cylinder , energy of lateral interactions required to complete MT to full cylinder , and MT tip width w are important at low tubulin concentration, whereas the number of hydrolyzed αβ-tubulin dimers in MT lattice , average number of lateral interactions per helical pitch in MT lattice , total energy of lateral interactions on MT lattice , and energy of longitudinal interactions in MT tip are important at high tubulin concentration. Our finding that the energy of longitudinal interactions in the MT tip is crucial at high concentration is not surprising, since stronger longitudinal bonds form more rapidly than weaker lateral bonds, and at high tubulin concentration this strength difference plays a decisive role. Interestingly, some important features identified for low tubulin concentration, that drop from the list of top important features found for high concentration, move to the list of the least important features at high tubulin concentration. To better understand this trend in predictors’ importance, we compared the distributions of these features (w, , , and ) for Case Studies 2 and 4. We found that the distributions of and are broader at low tubulin concentration (Fig. S14), compared to the distributions of these features for high tubulin concentration (Fig. S16 A-C). The overall distributions of w and are more similar for the two states, however the distributions based on the kinetic states (Fig. S14) showed that the mean feature values for each class differs more for Case Study 2 than for Case Study 4, which explains why these features are important only for Case Study 2. Conversely, the distributions of MT features that are important for high, but not for low tubulin concentration (, , and ; Fig. S15) are much broader for high tubulin concentration as compared to low tubulin concentration (Fig. S16 E-G). At the same time, the distributions of the same top important features identified for the lower and higher tubulin concentrations are similarly broad in Case Studies 2 and 4.
Table 5

Summary of the most important MT features: Listed are the top important features which distinguish between phases of MT growth, MT shortening, MT catastrophe and MT rescue for Case Studies 2 and 4 for low concentration (25 μM) and high concentrations (250 μM) of soluble tubulin, respectively. These case studies have produced the best overall predictions. The relative importance values ranging from 0 (least important feature) and 1 (most important feature) are calculated using bar plot in Fig. 8B for each feature importance and bar plot in Fig. S5C for each feature importance with respect to the top feature.

ConcentrationFeatureFeature DescriptionImportance
25 μM1MT length L1
7Average longitudinal curvature κlong0.76
10MT tip width w0.67
4Total energy of longitudinal interactions in MT lattice Ulong0.46
13Energy of lateral interactions required to complete MT to full cylinder Ulatadd0.43
14Energy of longitudinal interactions required to complete MT to full cylinder Ulongadd0.43



250 μM2Number of hydrolyzed αβ-tubulin dimers in MT lattice nhyd1
1MT length L0.73
12Average number of lateral interactions per helical pitch in MT lattice nlat0.29
7Average longitudinal curvature κlong0.27
3Total energy of lateral interactions in MT lattice Ulat0.26
6energy of longitudinal interactions in MT tip ulong0.21
Summary of the most important MT features: Listed are the top important features which distinguish between phases of MT growth, MT shortening, MT catastrophe and MT rescue for Case Studies 2 and 4 for low concentration (25 μM) and high concentrations (250 μM) of soluble tubulin, respectively. These case studies have produced the best overall predictions. The relative importance values ranging from 0 (least important feature) and 1 (most important feature) are calculated using bar plot in Fig. 8B for each feature importance and bar plot in Fig. S5C for each feature importance with respect to the top feature. Furthermore, analysis of the mean values for each of the features that alter their importance from the lower tubulin concentration case to the higher tubulin concentration case and from the higher tubulin concentration case to the lower tubulin concentration case, in each of the four kinetic states (Tables S2 and S3), revealed that the top important features take on distinctly different mean values in all four kinetic states (Table S2). By contrast, the MT features are no longer important when they reach comparable mean value in all four states (Table S3). As an example, consider the tip width , which is found to be important for lower tubulin concentration, but not for higher concentration of free tubulin. As discussed above, this is because for the MT growth to occur, the MT tip should become sharper and, hence, the MT tip extension should be more variable at higher tubulin concentration, where the tip width is expected to play an important role in determining the MT kinetic state. The average width of MT tip is small (∼3–4 dimers) and changes between all four states at lower tubulin concentration; by contrast, at higher tubulin concentration is large (∼6–7 dimers) and it takes on similar values in all four kinetic states. Thus, while the tip width experiences a substantial increase at higher concentration versus lower tubulin concentration, which is expected based on the broadening of the distribution of MT tip extensions, the fact that there is no variation in values of between the kinetic states for higher tubulin concentration renders this feature not important for higher tubulin concentration. We found that both for higher and lower soluble tubulin concentrations, the MT catastrophe is the most distinguishable MT dynamic state from the other kinetic states based on the top important features, while the MT shortening is the least distinguishable kinetic state. We found for MT catastrophe that the majority of top important features are characterized by the maximum mean value among the four kinetic states. The ability of the top important features to distinguish the other two kinetic states, i.e. MT rescue and MT growth, depends on the concentration of soluble tubulin. The MT rescue is easily distinguishable at the lower tubulin concentration, while the MT growth is easily distinguishable at the higher tubulin concentration. Using the mean values of the top important features predicted by the ML analysis for each case (Fig. 11B vs Fig. 11A), we found that, overall, the distinction between the four kinetic states is more pronounced for low rather than high tubulin concentration. This finding is important since 25 μM is the concentration of soluble tubulin in cytoplasm [75], [76]. These ML analysis results, revealing the most important features distinguishing the 4 kinetic states of MTs during dynamic instability, represent significant findings. However, they are limited by the fact that the MADDY model is not a complete representation of the MT’s dynamic instability behavior in its cellular environment. As such, the most important features we identified may be thought to represent the features intrinsic to the MT structure itself that can affect kinetic state stability and propensity for change. Thinking more broadly, in the cellular environment the 4 kinetic states of MTs are expected to be affected by additional factors not yet incorporated into this first version of the MADDY model. In fact, experimental results have revealed a number of such cellular factors interacting with MTs that are also thought to affect kinetic state stability and transitions. A variety of MAP proteins have been shown to affect kinetic state behavior through their interaction with MTs. For example, End Binding (EB) proteins were shown to bind in vivo to the MT + end GTP cap structure and affect stability [29], [77], [78]. Furthermore, a number of molecular motors affect kinetic states. For example, two microtubule catastrophe factors have different effects on in vitro MT length distributions. Kip3, accumulates at MT + ends [79] and acts as a depolymerase, destabilizing MTs and shortening them more rapidly during catastrophe [80]. Another motor protein, Kif18A, appears to be involved in the in vivo shaping of microtubule length distributions [81] and chromosome oscillations, correlating with K-fiber catastrophes [82]. The MCAK motor protein, a Kinesin-13 family member, affects microtubule lengths by a destabilizing interaction, increasing the catastrophe rate [83]. Conserved members of the CLASP protein family have been shown to affect MT tip structure, helping to modulate catastrophe and rescue via the influence of EB proteins [84]. Interestingly, CLASP proteins may serve a nucleating function during rescue [85], potentially facilitating a nucleation event at the MT’s remnant GTP islands to initiate rescue. Furthermore, recent studies showed that MT severing proteins, such as spastin and katanin, can amplify MT arrays in vivo by catalyzing exchange of GTP-tubulin subunits along a MT filament [86]. Finally, a different type of cellular environment factor affecting MT properties are the varying states of tension that MTs are exposed to during its dynamic instability trajectory. For example, MT – end-directed molecular motors can create mechanical tension acting to promote rescue [87], [88]. From the few examples we have just discussed, it is clear that there are many cellular factors that can affect MT kinetic state occupancy observed within the dynamic instability framework. Our point is to emphasize the limitations of using the current MADDY model to describe a complete picture of the most important features differentiating the four kinetic states in vivo. Perhaps in future developments of the MADDY model, these types of additional cellular environmental factors can be included as model features. This would allow a more accurate description of in vivo MT behavior, thereby allowing a better statistical modeling-based description of the most important features distinguishing the 4 kinetic states in dynamic instability. To conclude, here we show that experiments in silico are capable of providing a complete and high-resolution simulation view of the entire process of MT growth and shortening. We have presented the MADDY model to carry out such high-resolution simulations and using MADDY we observed, for the first time, all four of MT’s kinetic states: growth, catastrophe, shortening and rescue. The most important physical characteristics for low tubulin dimer concentration, according to the highest scoring statistical model, are: the MT length, average longitudinal curvature, MT tip width, total energy of longitudinal interaction in MT lattices, and energies of lateral and longitudinal interactions required to complete MT to full cylinder. The number of hydrolyzed αβ-tubulin dimers in MT lattice, MT length, average number of lateral interactions per helical pitch in MT lattice, average longitudinal curvature, total energy of lateral interactions in MT lattice, and energy of longitudinal interactions in MT tip were found to be important in identifying the kinetic states of dynamic instability for MTs under high tubulin dimer concentration conditions. The current version of the MADDY model does not take into account the wide range of possible variations in tubulin sequence and post-translational modifications. Since the majority of published in vitro assays are performed with tubulin purified from brain tissue, we have parameterized the model using experimental data from the literature sources of published assembly and disassembly kinetics for brain origin tubulin. The structure of a finite-length fragment of the MT lattice was obtained from the 13-subunit ring structure of αβ-tubulin dimers for bovine brain tubulin. If we consider another source of tubulin differing significantly in properties, we will have to parameterize the MADDY model again using experimental data for that type of tubulin we want to model. The MADDY model can also be extended to incorporate any and all microtubule associated proteins (MAPs) in order to explore the role of MAPs in the dynamics of MT growth and shortening. One thing that would have to be taken into account in such studies is that MT surface bound MAPs (i.e., motor proteins) could affect the magnitude of the lateral/longitudinal tubulin interactions.

Materials and methods

MADDY force field

In the MADDY model, each -th tubulin monomer is described by a spherical bead (of radius 2 nm; see Fig. 1A) using three translational coordinates , , , and three rotational coordinates , , , 1, 2, …, , where is the total number of monomer subunits (Fig. 1A). The total potential energy function for a (dis)assembling MT, , depends on the conformations of tubulin monomers forming the MT lattice. involves contributions from the bonding energy, bending energy and repulsive energy, In Eq. (1), the potential energy describing formation and dissociation of tubulin-tubulin bonds, , is the sum of three energy terms, each depending on the distance between the longitudinal or lateral interaction sites and in the α- and β-tubulins in different heterodimers, or in the same dimer (Fig. 1B), In Eq. (2), the longitudinal potential describes the non-covalent head-to-tail attachments of tubulin dimers in an MT protofilament, whereas the lateral potential determines the non-covalent interactions between the protofilaments. For longitudinal and lateral bonds, we used the Morse potential with parameters and , which determine the strength of interaction, and and which define the range of interaction. In and , denotes the distance between adjacent longitudinal or lateral interaction sites and that belong to different αβ-tubulin dimers (Fig. 1B). The harmonic potential takes into account the non-covalent interactions between the - and -tubulin subunits within the tubulin dimer with the spring constant (Fig. 1B). In Eq. (1), the angle-dependent potential energy for protofilament bending is given by Protofilament bending occurs through the α- and β-subunits’ rotations around the -, -, and -axes at the longitudinal interaction sites in the same protofilament (Fig. 1A). In Eq. (3), the bending potential is expressed for each orientation angle in terms of the difference between the angles for the next-neighbor monomers , , and and the equilibrium angles , , and . We used 0.1 rad and 0.2 rad for the GTP-bound and GDP-bound conformations of β-tubulins, respectively, depending on their nucleotide state. , , and are the flexural rigidities for the protofilament bending around the angles , , and , respectively (Fig. 1D). In Eq. (1), the repulsive energy term is given by the Lennard-Jones potential: which takes into account the excluded-volume interactions; is the distance between the centers of any pair of tubulin monomers, and 0.5 kcal/mol and 3.5 nm determine the strength and the range of repulsion (Fig. 1C). Values of all constant parameter of the MADDY model are accumulated in Table 1.

Langevin dynamics of MT growth and shortening

The dynamics of tubulin subunits are propagated forward in time using the Langevin equations of motion in the overdamped limit (Brownian dynamics). The coordinates {, , } (translational degrees of freedom) and angles {, , } (rotational degrees of freedom) for each particle (α- or β-tubulin monomer) at the simulation step are calculated using the coordinates computed at the previous step [72], i.e. In Eqs. (5) and (6), and are the translational and rotational friction coefficients, respectively, is the solvent viscosity, is the integration time step, and is absolute temperature ( is Boltzmann’s constant). Also, is the zero-average random number sampled from the normal distribution describing random forces for translations and rotations. The derivatives and define, respectively, the translational and rotational components of the molecular forces acting on the -th particle. Eqs. (5), (6) were integrated numerically with the time step 200 ps [89] at 300 K temperature using the solvent viscosity (cell cytoplasm) 0.2 Pas [90].

Modeling GTP-to-GDP hydrolysis

The GTP hydrolysis is modeled using Monte-Carlo algorithm implemented on a CPU. The probability that a hydrolysis reaction in an αβ-tubulin dimer occurs at time is , where is the GTP hydrolysis rate constant, and the probability that a reaction occurs in the time interval (Monte Carlo timestep) is given by . For each value of (Table 2), we selected the timestep so that the probability 0.02 (i.e. 2% probability of transition): 0.01 s, 0.005 s, 0.004 s, 0.003 s, 0.0025 s, 0.002 s and 0.001 s for 2 s−1, 4 s−1, 5 s−1, 6 s−1, 8 s−1, 10 s−1 and 20 s−1, respectively.

Building MT structures:

The structures of MT polymers used in the simulations of MT growth and shortening were obtained from the 13-subunit ring structure of αβ-tubulin dimers modeling a 12-nm (3 monomers) helical pitch [35]. The ring structure uses atomic coordinates of the αβ-tubulin dimer (PDB entry: 1JFF; [91]) with the E-site in β-tubulin monomer occupied by GDP. The short and long finite-length fragments of MT polymers with GDP-bound tubulin dimers were constructed by replicating the ring structure 3 and 50 times using the shift distance of 8 nm (length of a tubulin dimer) to obtain an MT polymer of 3 and 50 tubulin dimers in length (MT3/13 and MT50/13). The bead-per-monomer structural models of MT polymers MT3/13 and MT50/13 were obtained by placing soft bead particles of radius 2 nm at the centers-of-mass of tubulin monomers. The bead-per-monomer structure of MT3/13 is shown in Fig. 1A. The short 12-nm MT fragment MT3/13 was used in the in silico experiments of MT growth, whereas the 400-nm long MT fragment MT50/13 was used in the simulations of MT shortening. The GTP- and GDP-bound tubulin subunits were modeled similarly, except for the difference in the equilibrium bending angle (Table 1; see also Fig. 1D).

Performing in silico experiments

In the simulations of MT disassembly, all the β-tubulin monomers in the MT lattice were in the GDP-bound form and αβ-tubulin dimers tended to attain the “bent conformation” ( 0.2 rad). In these simulations, we set the concentration of soluble tubulin to zero. The first helical pitch formed by the αβ-dimers at the MT minus end was constrained to prevent rotations of an MT polymer. This mimics connection of an MT cylinder with the centrosome pole. We completed a total of 10 independent simulation runs of 1 s biological time (Fig. 2A, see also Movie S1). In the simulations of MT growth, the β-tubulin monomers in free tubulin dimers were in the GTP-bound form and αβ-tubulin dimers formed the “straight conformation”. In these simulations, we used the short MT fragment MT3/13 with the tubulins in the GMPCPP-bound nucleotide state. In silico tubulin polymerization was carried out in a cylindrical box of 160-nm base diameter and 400-nm length containing free soluble tubulin dimers. The 15, 20, 25 and 30 μM concentrations of free tubulin were maintained constant during each simulation run. In the structure of the short MT fragment MT3/13, we constrained the αβ-tubulin dimers in the first three helical pitches’ portion of the MT lattice at the minus end. This mimicked clamping of the MT at the mitotic pole and prevented MT3/13 from rotations around the longitudinal -axis. A total of 10 simulation runs of 8-s duration were completed for 15, 20, 25 and 30 μM soluble tubulin concentration (see Fig. 2B, see also Movies S2 and S3). In the simulations of MT assembly-disassembly, stochastic switches between the GTP-bound form of β-tubulin monomers (straight conformation with equilibrium bending 0.1 rad) and the GDP-bound form (bent conformation with equilibrium bending angle 0.2 rad) were modeled using information about the hydrolysis state of β-tubulin monomers. We used the short MT fragment MT3/13 with the tubulins in the GMPCPP-bound nucleotide state. Because parameterization of the MADDY force field for MT growth and MT shortening resulted in slightly different values of and , in the simulations of MT assembly-disassembly we set the values of and to be equal to the arithmetic average of the values used in the simulations of MT growth and MT shortening. For each Case Study 1–6 (Table 2), we performed 5 independent runs (2–4 s duration for Case Studies 1–3 and 1 s duration for Case Studies 4–6; see Fig. 6).

Force generating properties of MT

Dam1 was modeled as a ring encircling an MT cylinder, with the outer diameter of 48 nm composed by 13 subunits (spherical beads) of 4.5 nm radius (Fig. 3C). Dam1 subunits were described by translational coordinates , , , and rotational coordinates , , ( 1, 2, …, 13). The total potential energy for Dam1, , is the sum of the bonding energy due to interactions between a pair of the nearest neighbor Dam1 subunits and between the Dam1 subunits and optical trap (virtual bead), the bending energy due to bending in any triplet of the Dam1 subunits, and the repulsive energy due to interaction between Dam1 subunits and tubulin monomers forming the MT lattice. The bonding energy is described by the harmonic potentials for coupled Dam1 subunits , for the optical trap , and for the interaction between the optical trap and subunits of Dam1 ring (Fig. 3). In , the spring constants are 2.0103 pN/nm for and , and 0.1 pN/nm for . Equilibrium distances were 9 nm for and 150 nm for . The bending potential is same as Eq. (3) with the flexural rigidities 5.5105 pNnm/rad2 and equilibrium angles 0. The repulsive potential is the same as in Eq. (4) with being the distance between any pair of beads; 0.5 kcal/mol is the repulsion strength, and 8 nm and 6 nm are, respectively, the ranges of repulsion between Dam1 subunits and between Dam1 subunits and tubulin monomers (Fig. S1).

Simulations of MT disassembly stalled by Dam1 ring

The optical trap was represented by a spherical bead of radius 80 nm (Fig. 3C). A harmonic potential along the MT axis (-axis) was applied to the center-of-mass of the optical trap. The initial MT length was 240 nm. Using the output from the simulations of MT disassembly in the presence of the Dam1 ring, we measured the equilibrium displacements of the Dam1 ring from its initial position due to splaying outward of the protofilaments from the depolymerizing MT until the Dam1 ring became stalled. At the stall point, the values of were translated to the values of equilibrium force (stalling force) using the Hook’s law, (Fig. 3). A total of 80 independent 0.2-s trajectories were carried out to generate the distribution of stalling forces (Fig. 3D).

Analysis of output from Langevin simulations of MT assembly-disassembly

We used the energy and coordinate files from MADDY model-based simulations to analyze the time dependence of 14 features which characterize the MT structure and energetics, and the MT tip shape and energetics. The numerical output (i.e., coordinate and energy files from 5 independent runs for each Case Study 1–6) was divided into the data sets for each class – kinetic states associated with the MT growth, MT catastrophe, MT shortening, and MT rescue. Ascending and descending portions of the MT length versus time profiles were assigned to the true classes associated with the MT growth and MT shortening, respectively. The portions of the MT length vs time profiles between the ascending and descending parts and between the descending and ascending parts of the profiles were assigned to the true classes associated with the MT catastrophe and MT rescue, respectively. The corresponding numerical values of the other 13 features were calculated for each data point (value of ) and for each class using the coordinate and energy files. The MT length (feature 1) was calculated as , where 8 nm is the size of a single αβ-dimer, is the total number of αβ-dimers in the MT lattice and 13 is the number of protofilaments (PFs). The number of hydrolyzed αβ-tubulin dimers in MT lattice (feature 2), total energy of lateral interactions in MT lattice (feature 3) and total energy of longitudinal interactions in MT lattice (feature 4) are directly obtained from the output files. The energies of lateral interactions in MT tip (feature 5) and longitudinal interactions in MT tip (feature 6) are estimated as and , respectively, where and are the numbers of lateral and longitudinal bonds in the MT tip, respectively. Average longitudinal curvature (feature 7) is calculated as , where is a vector that connects the α- and β-monomers in the last αβ-tubulin dimer in the -th PF, and is a vector that connects the α- and β-monomers in the last αβ-tubulin dimer in the -th PF. The average lateral curvature (feature 8) is calculated using the same formula, , where is a vector that connect the α- and β-monomers in the first dimer in the -th helical pitch in the MT tip, and is now a vector that connect the α- and β-monomers in the second αβ-tubulin dimer in the -th helical pitch in the MT tip. The MT tip length (feature 9) and width (feature 10) are calculated, respectively, as (or if is given in units of tubulin dimers) and (or if is in units of tubulin monomers), where is the total number of αβ-dimers in the MT tip, 4 nm is the size of a tubulin monomer, and is the number of dimers in the longest PF in the MT tip. The average number of longitudinal interactions per protofilament in MT lattice (feature 11) is calculated as , where is the total number of longitudinal bonds in the MT lattice. The average number of lateral interactions per helical pitch in MT lattice (feature 12) is calculated as , where is the total number or lateral bonds in the MT lattice and is the number of dimers in the longest PF. Finally, the energies of lateral interactions required to complete the MT to a full cylinder (feature 13) and longitudinal interactions required to complete the MT to a full cylinder (feature 14) are estimated as and .

Machine learning methods

To characterize the four kinetic states of MT dynamic instability (growth, rescue, shortening, and catastrophe), we assessed the ability of 14 features, extracted from the numerical output of the MADDY model-based simulations, to account for factors responsible for state stability and the distinction between states. These features were measured over the course of several simulations for 6 distinct Case Studies 1–6, corresponding to variations in the tubulin concentration and the rate of GTP hydrolysis (see Table 2). The change in length of the MT lattice over the course of each trajectory was used to manually assign measurements of the 14 features to each particular kinetic state depending on whether the lattice was growing, shortening, or representing a sudden shift from growth to shortening (catastrophe) or from shortening to growth (rescue). We used the supervised classification and ensemble methods briefly described below to assess trends between features and states by comparing each algorithm’s prediction accuracy, since each kinetic state was treated as equally important for classification [92]. We used a k-fold cross-validation approach by splitting the data into 5 k-folds and this was repeated 3 times for each method to avoid overfitting and to provide a more generalized prediction model. The implementation of the various Machine Learning methods was based on the Python scikit-learn package. The python script for the ML analysis is freely available (https://github.com/DimaUClab/MADDY.git). Dummy Classifier: A baseline accuracy was needed in order to compare the rest of the methods’ ability to predict the most accurate kinetic state. The “most frequent” strategy was set to only predict the class that had the largest volume of data for that particular dataset. Linear Discriminant Analysis (LDA): Singular value decomposition was used due to the large number of features being studied. No shrinkage was applied because of the solver chosen. The remaining parameters were kept at the default setting. Support Vector Machines (SVM): Support vector classifier (SVC) was the algorithm chosen so that the kernel type could be adjusted. The data was normalized before the classifier was applied. Linear, polynomial (degree of 3), sigmoid, and radial basis function (RBF) kernels were tested to better fit the projected data in higher dimensions. Logistic Regression (LR): The solver chosen was ‘LBFGS’ (Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm) since each of the 6 datasets were relatively small (∼3000 data points; see Table 2). The multi class parameter was set to ‘multinomial’ given that there were 4 prediction classes (i.e. growth, shortening, catastrophe and rescue). K-Neighbors Classifier (KNN): The number of neighbors set was 3, with all points having uniform weights. The remaining parameters were set to the default. Bagging Classifier (BAG): Various ensemble methods were tested in hopes that combining multiple techniques would provide higher prediction accuracies. BAG groups together multiple decision trees when training models on datasets. The number of base estimators was set to 1000, while the remaining parameters were kept at the default. Random Forest (RF) Classifier: RF Classifier is a bagging ensemble method that randomly assigns, or bootstraps, observations while training models on datasets. The method included 1000 estimators. Extra Trees (ET) Classifier: This is an ensemble method similar to RF, with the difference being that measurements cannot be replaced during training. 1000 estimators were also applied. XGB Classifier (XGB): Boosting ensemble method that corrects mislabeled observations throughout the training process with the parameters kept at default. The Synthetic Minority Oversampling Technique (SMOTE) [62] was applied to each classification model to increase the prediction accuracy by creating synthetic data points for the minority classes to compensate for possible skews found in some datasets. Feature importance was plotted for the RF and XGB models by ranking prediction coefficient values. The features found to be the least important for these two models were removed to test whether fewer features could result in increased accuracy. Imbalanced datasets might result in classification accuracies that represent the most populated class instead of 4 classes (for 4 kinetic states), and so multiclass confusion matrices were constructed for each dataset using either RF or XGB classifier. Confusion matrices visualize the number of predictions made per class while comparing them to their correct class in order to identify trends in mislabeled predictions. Diagonals in each matrix indicate true positives (TP) for a class, column values beside the unit found in the diagonal were false positives (FP), row values except the diagonal unit were false negatives (FN), and the remaining matrix included all true negatives (TN). Each unit was labeled with the number of predictions, the total number of samples for each class, and the relative percentage of predictions made for a particular case. Along with the confusion matrices, receiver operating characteristic (ROC) and precision-recall (PR) curves were plotted to further quantitate the value of the classification done by applying each method. Here, we compared SVM using the RBF kernel, RF, and XGB for each of the four kinetic states in Case Studies 2 and 4. Both ROC and PR curves highlight the ability of classification methods to make correct predictions. The ROC curves use the true positive rate (TPR) against the false positive rate (FPR) to show the ratio of correct prediction versus the rate of predicting any of the negative classes as positive. The PR curve plots the recall, which is equal to TPR, and precision focusing on the comparison of methods based on correct predictions. For the imbalanced data, the PR curve and area under the curve (AUC) values associated with each plot were prioritized over the ROC results since the PR plots take into account mainly the minority, or true positive, classes. AUC values increase with classification ability and range from 0 to 1. The best classification algorithms, along with confusion matrices and PR curves, address how well the four kinetic states can be correctly identified, but how each decision is made and the combination of features that drive the algorithm to make a certain decision are not easy to decipher. To shed light on this “black box” issue, we used SHAP (SHapley Additive exPlanations) as an explainable approach to understand the type and magnitude of MT parameters needed to distinguish among the kinetic states. This concept is used in cooperative game theory, which provides the contribution of each feature towards the classification of each localized observation [93]. Calculating the Shapley value by comparing subsets of the model with and without each feature and summing up the contributions is what drives the prediction of a particular class, and this was done for each observation separately. Feature importance plots were produced using the mean absolute Shapley value of all data entries (observations) for each feature and for each class. The top features that give the highest collective importance for the kinetic states were chosen as features of interest for low and high tubulin concentrations. As customarily done in SHAP analysis, we obtained beeswarm plots for each class per Case Study using the test set of the final k-fold, which depicts the ranking of features in the order of importance for a given class, the distribution of Shapley values on the -axis for all observations for each feature, and the magnitude of the feature measurements by coloring values either pink or blue depending if the data point represents a high or low feature value, respectively. The color of the distribution corresponding to positive SHAP values indicates the magnitude of the feature values that benefits the classification towards predicting the respective class. Relative averages of feature values within the distribution corresponding to positive SHAP values (defined as values −0.1) for each kinetic state were calculated and plotted for the selected features of interest.

CRediT authorship contribution statement

Evgenii Kliuchnikov: Formal analysis, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing. Eugene Klyshko: Formal analysis, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing. Maria S. Kelly: Formal analysis, Visualization, Writing – original draft, Writing – review & editing. Artem Zhmurov: Methodology. Ruxandra I. Dima: Methodology, Supervision, Writing – original draft, Writing – review & editing. Kenneth A. Marx: Conceptualization, Formal analysis, Supervision, Writing – original draft, Writing – review & editing. Valeri Barsegov: Conceptualization, Methodology, Supervision, Writing – original draft, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
  87 in total

1.  Refined structure of alpha beta-tubulin at 3.5 A resolution.

Authors:  J Löwe; H Li; K H Downing; E Nogales
Journal:  J Mol Biol       Date:  2001-11-09       Impact factor: 5.469

Review 2.  Microtubule dynamic instability and GTP hydrolysis.

Authors:  H P Erickson; E T O'Brien
Journal:  Annu Rev Biophys Biomol Struct       Date:  1992

3.  Generation of random numbers on graphics processors: forced indentation in silico of the bacteriophage HK97.

Authors:  A Zhmurov; K Rybnikov; Y Kholodov; V Barsegov
Journal:  J Phys Chem B       Date:  2010-12-31       Impact factor: 2.991

4.  Kif18A and chromokinesins confine centromere movements via microtubule growth suppression and spatial control of kinetochore tension.

Authors:  Jason Stumpff; Michael Wagenbach; Andrew Franck; Charles L Asbury; Linda Wordeman
Journal:  Dev Cell       Date:  2012-05-15       Impact factor: 12.270

5.  All tubulins are not alike: Heterodimer dissociation differs among different biological sources.

Authors:  Felipe Montecinos-Franjola; Sumit K Chaturvedi; Peter Schuck; Dan L Sackett
Journal:  J Biol Chem       Date:  2019-05-20       Impact factor: 5.157

6.  GTPgammaS microtubules mimic the growing microtubule end structure recognized by end-binding proteins (EBs).

Authors:  Sebastian P Maurer; Peter Bieling; Julia Cope; Andreas Hoenger; Thomas Surrey
Journal:  Proc Natl Acad Sci U S A       Date:  2011-02-22       Impact factor: 11.205

7.  Cortical dynein controls microtubule dynamics to generate pulling forces that position microtubule asters.

Authors:  Liedewij Laan; Nenad Pavin; Julien Husson; Guillaume Romet-Lemonne; Martijn van Duijn; Magdalena Preciado López; Ronald D Vale; Frank Jülicher; Samara L Reck-Peterson; Marileen Dogterom
Journal:  Cell       Date:  2012-02-03       Impact factor: 41.582

8.  Microtubule assembly in cytoplasmic extracts of Xenopus oocytes and eggs.

Authors:  D L Gard; M W Kirschner
Journal:  J Cell Biol       Date:  1987-11       Impact factor: 10.539

9.  Structure of growing microtubule ends: two-dimensional sheets close into tubes at variable rates.

Authors:  D Chrétien; S D Fuller; E Karsenti
Journal:  J Cell Biol       Date:  1995-06       Impact factor: 10.539

10.  Mechanisms of microtubule dynamics and force generation examined with computational modeling and electron cryotomography.

Authors:  Nikita B Gudimchuk; Evgeni V Ulyanov; Eileen O'Toole; Cynthia L Page; Dmitrii S Vinogradov; Garry Morgan; Gabriella Li; Jeffrey K Moore; Ewa Szczesna; Antonina Roll-Mecak; Fazoil I Ataullakhanov; J Richard McIntosh
Journal:  Nat Commun       Date:  2020-07-28       Impact factor: 14.919

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.