Bart M H Bruininks1, Albert S Thie2, Paulo C T Souza1,3, Tsjerk A Wassenaar1, Shirin Faraji2, Siewert J Marrink1. 1. Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, 9712 CP Groningen, The Netherlands. 2. Zernike Institute for Advanced Materials, University of Groningen, 9712 CP Groningen, The Netherlands. 3. Molecular Microbiology and Structural Biochemistry (MMSB, UMR 5086), CNRS, University of Lyon, 69367 Lyon Cedex 07, France.
Abstract
As molecular dynamics simulations increase in complexity, new analysis tools are necessary to facilitate interpreting the results. Lipids, for instance, are known to form many complicated morphologies, because of their amphipathic nature, becoming more intricate as the particle count increases. A few lipids might form a micelle, where aggregation of tens of thousands could lead to vesicle formation. Millions of lipids comprise a cell and its organelle membranes, and are involved in processes such as neurotransmission and transfection. To study such phenomena, it is useful to have analysis tools that understand what is meant by emerging entities such as micelles and vesicles. Studying such systems at the particle level only becomes extremely tedious, counterintuitive, and computationally expensive. To address this issue, we developed a method to track all the individual lipid leaflets, allowing for easy and quick detection of topological changes at the mesoscale. By using a voxel-based approach and focusing on locality, we forego costly geometrical operations without losing important details and chronologically identify the lipid segments using the Jaccard index. Thus, we achieve a consistent sequential segmentation on a wide variety of (lipid) systems, including monolayers, bilayers, vesicles, inverted hexagonal phases, up to the membranes of a full mitochondrion. It also discriminates between adhesion and fusion of leaflets. We show that our method produces consistent results without the need for prefitting parameters, and segmentation of millions of particles can be achieved on a desktop machine.
As molecular dynamics simulations increase in complexity, new analysis tools are necessary to facilitate interpreting the results. Lipids, for instance, are known to form many complicated morphologies, because of their amphipathic nature, becoming more intricate as the particle count increases. A few lipids might form a micelle, where aggregation of tens of thousands could lead to vesicle formation. Millions of lipids comprise a cell and its organelle membranes, and are involved in processes such as neurotransmission and transfection. To study such phenomena, it is useful to have analysis tools that understand what is meant by emerging entities such as micelles and vesicles. Studying such systems at the particle level only becomes extremely tedious, counterintuitive, and computationally expensive. To address this issue, we developed a method to track all the individual lipid leaflets, allowing for easy and quick detection of topological changes at the mesoscale. By using a voxel-based approach and focusing on locality, we forego costly geometrical operations without losing important details and chronologically identify the lipid segments using the Jaccard index. Thus, we achieve a consistent sequential segmentation on a wide variety of (lipid) systems, including monolayers, bilayers, vesicles, inverted hexagonal phases, up to the membranes of a full mitochondrion. It also discriminates between adhesion and fusion of leaflets. We show that our method produces consistent results without the need for prefitting parameters, and segmentation of millions of particles can be achieved on a desktop machine.
The increase in complexity
of molecular dynamics (MD) simulations
poses challenges for current analysis software. This is well illustrated
in the field of biological applications. Such simulations changed
from a single protein in vacuum[1] in the
late 1970s to complex membrane-based systems containing millions of
molecules such as (phospho)lipids, cholesterol, proteins, DNA, drugs,
water, etc.[2−7] The number of molecular aggregates in simulations has also increased,
and nowadays it is not rare to see a mixture of micelles, bilayers,
and vesicles in a single simulation.[4,8−11] This increase in complexity of both composition and aggregation
state requires a higher level of abstraction in analysis than typically
offered. During analysis, we would like to work with concepts such
as bilayers, vesicles, leaflets, protein aggregates, phases, etc.
on top of simple identifiers such as molecule names and indices, often
used in current tools. This would dramatically reduce the complexity
of coding the analysis software and would allow for a more user-friendly
interface with data. Current MD analysis software, such as MDAnalysis,[12] Pytim,[13] FATSLiM,[14] and VMD,[15] come with
built-in functions or have packages to detect, e.g., bilayer leaflets
and aggregates, and offer some of the desired level of abstraction.
However, these tools were developed to deal with relatively simple
compositions and topologies, and they may fail as systems contain
more than one or two segments (see Figure S1 in the Supporting Information). To avoid this problem, we developed
a robust method for consistent sequential segmentation with a voxel-oriented
analysis paradigm. No assumptions are required regarding the number
of segments or molecules, and it natively supports all-atom or coarse-grained
(CG) simulation data. Furthermore, voxel-based methods offer very
good scaling with the increase of particles and are excellent for
locality queries.[16−19]To guide the reader through this work, we will first present
pseudo
code to illustrate the general implementation of our leaflet segmentation
algorithm, which is implemented in a tool coined MDVoxelSegmentation.
We then show a wide range of examples, and we end with a discussion
of the limitations and further prospects of the tool. The SI files contain the video results of the algorithm.
Design
and Implementation
In the following sections, we describe
the complete algorithm in
two sections of pseudo code. We start with considering a single frame
and perform our multistep connective components analysis for lipid-based
systems. We then describe how we add the extra layer of chronological
consistency on top.
Pseudo Code for Spatial Segmentation of Leaflets
The
aim is to segment a lipid density into leaflets (i.e., a single membrane
leaflet is a segment). To successfully segment lipid densities into
leaflets, we require three things: a reference file, a trajectory
file, and a selection file. The reference file should contain information
regarding atoms and their naming (e.g., GRO, PDB). The trajectory
file should contain coordinates for each point in the reference file
in every frame (e.g., GRO, XTC, TRR). Finally, the selection file
should contain the selections (lipid heads, lipid tails, lipid linkers,
and exclusion beads). By default, most CG Martini lipids are supported,[20] meaning no selection is required to be specified
for the leaflet segmentation of those systems. However, if you are
working on a custom lipid, it can easily be added to the default selections
file (an example for CHARMM lipids is given in the methods for the
leaflet segmentation). The completed segmentation will assign each
lipid (residue) a segment identity. A schematic representation of
leaflet segmentation is shown in Figure .
Figure 1
Spatial leaflet segmentation. A 2D schematic
representation of
the leaflet segmentation of two stacked bilayers in a periodic box.
The upper bilayer contains a protein channel and the lower bilayer
contains a diving lipid. The protein is used as exclusion selection.
Spatial leaflet segmentation. A 2D schematic
representation of
the leaflet segmentation of two stacked bilayers in a periodic box.
The upper bilayer contains a protein channel and the lower bilayer
contains a diving lipid. The protein is used as exclusion selection.Leaflet segmentation starts with reading the input
files (Figure A) and
mapping all
the lipid tails, lipid heads, and exclusion beads to boolean voxel
matrices (0.5 nm binning for Martini lipid systems; see Figure B). This results in the following
three 3D boolean matrices: tails, heads, and exclusions. The exclusions
are always grown once, meaning all neighboring voxels (26) of a True
voxel become True. The exclusions are used as local stops during segmentation,
acting as boundaries for the segmentation (i.e., exclusions can be
used to confine the segmentation). We found that using exclusions
can be beneficial when working with proteins, since the protein boundaries
are prone to be edge cases for lipid segmentation and are better handled
later.The tails matrix is segmented using the adapted connected
components
algorithm (Figure C). Before segmentation starts, the heads matrix is subtracted from
the tails matrix. The exclusions act as boundaries during segmentation.
The result of the tails segmentation is depicted in Figure D.Roughly the same procedure
is followed for segmenting the heads
as was done for the tails. However, instead of using the complete
heads matrix, it is first deconstructed by the tails segments. Each
tails segment is individually processed (see Figures E and 1G). This is
achieved by using only those headgroups which have a lipid residue
index that is also in the active tails segment. Before heads segmentation
starts, the tails matrix is subtracted and the exclusions act as boundaries
during segmentation. Segmenting the heads in this manner results in
the leaflet segmentation per tails segment (Figure F and 1H).Although
the result at this point is good (Figures I and 1J), there might
be some lipids that still have no segment assigned. Force segmentation
can solve this problem, because it can be used to map lipids to a
segment based on a distance criterion. For every lipid that does not
have a segment assigned, all neighboring lipids are checked within
a certain cutoff distance (default is 2 nm). If the dominant segment
in that search is not 0 (unsegmented), the lipid is assigned to that
dominant segment. Our implementation performs this operation in an
iterative manner by trying to classify with small cutoff distances
first (default is 1 nm). The cutoff is slightly increased (default
is 0.1 nm) if no extra lipids were segmented. If an iteration does
result in new lipids being segmented, the cutoff is set back to the
minimum value. The iteration halts if the force segmentation returns
a stable output (i.e., no further changes occur), if the cutoff becomes
larger than the maximum value, or if all selected lipids have been
segmented. The final result after force segmentation is depicted in Figure . Force segmentation
can be turned off.Splitting the segmentation into two steps,
using force segmentation
as a post-processing step, means that simple cases are handled in
a relatively low-cost manner, whereas hard cases might use a different
approach altogether (in this case, our force-segmentation protocol).
This mixed use of a low-cost algorithm with more precise and expensive
algorithms reduces the total complexity of the calculation, while,
at the same time, hardly affecting the segmentation quality.The result after spatial leaflet segmentation of a trajectory is
a new trajectory that contains a segmentation entry for each index
in each frame. These indexes correspond to the atom indices in the
original structure file. Atoms that have no segmentation are assigned
to 0. However, because of the arbitrary picking of our initial voxels
for segmentation, the segmentation labels are missing sensible consistency
over time for all other labels than 0. To solve this lack of chronological
consistency, we must perform temporal segmentation.
Pseudo Code
for Temporal Segmentation
All that is required
as input for the temporal segmentation is the spatial segmentation
output. To perform temporal segmentation, we must find the identity
of segments for each frame sequentially. When comparing a new frame
with a current frame, the algorithm first considers the identities
of segments of the current frame. The identified segments of the current
frame are then matched to the segments in the new frame (Figure A). The atoms within
a segment are used as the basis for the identity of the segment. Segments
are treated as sets, for which the unique elements are the atom indices.
The identity of a segment in the new frame is the segment in the current
frame with which it shares the most atoms. In this way, segments can
be matched across frames, even if, because of shrinkage or growth,
their atom contents are not identical. Consider frame N which contains two segments N1 = {a1, a2, a3, a4}
and N2 = {a5, a6, a7, a8} and frame M, which contains the segments M1 = {a1, a2, a3, a8}
and M2 = {a5, a6, a7, a4, a9}.
Each segment contains unique elements, which have a lowercase unique
code. When we compare frame N with frame M, we see that the elements a8 and a4 have swapped places and a9 is introduced
to M2. However, we still want to assign N1 to M1 and N2 to M2, to allow for the growing and shrinking of segments. Therefore,
we need a way to identify the amount of shared elements between two
segments.
Figure 2
Finding the identity
of the segments in the new frame. (A) The
ordered 1 segment has a best match with the unordered 2 segment. (B)
The ordered 1 and 2 segments are merged into one segment, with the
ordered 2 segment providing the identity for the unordered 1 segment.
(C) The ordered 2 segment is split into two new labels: unordered
1 and 2. Ordered 0 is always matched as the new segment zero, since
this is the segment that contains all nonsegmented particles. N indicates the total number of segments in each frame.
Finding the identity
of the segments in the new frame. (A) The
ordered 1 segment has a best match with the unordered 2 segment. (B)
The ordered 1 and 2 segments are merged into one segment, with the
ordered 2 segment providing the identity for the unordered 1 segment.
(C) The ordered 2 segment is split into two new labels: unordered
1 and 2. Ordered 0 is always matched as the new segment zero, since
this is the segment that contains all nonsegmented particles. N indicates the total number of segments in each frame.To identify how many atoms are shared between two
segments across
frames the Jaccard index is used (see eq ). This index is defined as the intersection of two
sets divided by the union of two sets. This provides a ratio between
0 and 1, where 1 is complete similarity and 0 means no shared atoms.
The parameter Jaccard threshold defines the minimum score on the Jaccard
index that is required for two sets to have the same identity (see Figure S2 in the Supporting Information). For
example, a Jaccard threshold of 0.618 means that only if two segments
have a score higher than 0.618 together, they have the same identity.
This allows for consistency while also identifying large shifts between
frames in the simulation. For example, a leaflet in a bilayer system
that has lost lipids because of flip-flopping can still be identified
in the next frame. Because of the possibility of these large shifts,
the algorithm does not assume that the amount of segments between
frames remains the same. Therefore, it allows for the disappearance
of identified segments and the appearance of new identities. Identified
segments can disappear when no match in the new frame can be made
that exceeds the Jaccard threshold. An identified segment disappearing
is considered to have merged with another segment. A new identity
can be created when, in the new frame, a segment exists to which no
old identified segment has been matched. This new segment then receives
a new identity. A newly created identified segment is considered to
have split from a previously existing identified segment.In
the case of a merge (Figure B), two or more identified segments in the current
frame are merged into an unidentified segment in the new frame. Two
distinct cases are possible in the new frame. Either one of the identities
of the merging segments is inherited by the new segment or a new identity
is created. If one of the identities is maintained, the Jaccard score
between one of the identified segments and the unidentified segment
has exceeded the Jaccard threshold. This indicates that a large segment
is merging with one or more smaller segments. For example, this could
be the result of a transient fusion between two leaflets. If a new
identity is created, this means none of the previous segments contained
enough atoms to attain a Jaccard score higher than the Jaccard threshold,
when compared to the unidentified segment. In this case, all previous
identified segments are merged into a new segment with a new identity.To allow for the reappearance of identities, the disappeared identities
are stored in a database. This database contains the identity and
the atoms present in the identified segment at the current frame.
It also logs into which segment the disappeared segment has merged.
The segment into which the disappeared segment has merged is the segment
for which it had the highest Jaccard score. This score will always
be below the Jaccard threshold, since, otherwise, the identity would
be inherited. This information can be used to restore identities when
splitting events occur.In a split, two or more unidentified
segments in the new frame
have the highest Jaccard score with the same identified segment in
the current frame (Figure C). In such a split, there are also two distinct cases. In
one case, a small part splits off from an existing identified segment.
However, one of the unidentified segments still has a high enough
Jaccard score to match with the identified segment. This means that
one of the unidentified segments inherits the identity of the identified
segment. This could indicate a leaflet losing a small group of lipids,
which form their own segment. The other unidentified segments will
then either gain a new identity or take the identity of an old segment
from the database. The new segment takes the identity of the segment
with the highest Jaccard score in the database, but only if the score
exceeds the Jaccard threshold. If the threshold is not exceeded, the
segment is assigned a new identity.In the other case, the identified
segment could not match with
one of the unidentified segments. This will lead to the disappearance
of the identified segment. This could be caused by a leaflet splitting
into two roughly equal parts. For example, in the case of a bilayer
with a transient pore, closing of the pore would result in a split
of the bilayer in its previous two leaflet segments. The disappearing
segment will be stored in the database, while the new segments will
each be compared to the database. Each segment that has a match with
the database will continue with the identity of this match, while
the unmatched segments continue with new unique identities.The result after temporal segmentation is a consistent sequential
segmentation over the trajectory, both spatially and over time. Identities
between time frames are consistent, based on the atoms within each
segment. Merging segments are detected and identified in a size-dependent
way. Splitting segments are detected as well. A database of previous
segments allows identities to persist over a longer time frame.
Results
To validate the leaflet segmentation algorithm,
we prepared a wide
range of lipid systems using the Martini CG force field.[21] The same default input was used for all segmentation,
unless specifically stated otherwise. We start with simple lipid bilayer
systems in varying topologies (Figure ). We then add proteins and a high variety of compositional
complexity (Figure ). This is followed by a complex phase transition and fusion of DNA-lipid
complexes (lipoplexes) with endosome membrane models to illustrate
segmentation quality in the presence of tight, curved and complex
dynamic geometry (Figure ). We end with some more general real-world examples, including
an example at atomistic resolution (Figure ).
Figure 3
Leaflet segmentation and analysis for pure bilayer systems. (A,
B) Snapshots of a DOPC/CHOL(4:1) bilayer with and without force segmentation.
Some lipids were unassigned without force segmentation, because of
diving/occlusion (panel (A), yellow). Using force segmentation, nonsegmented
lipids are assigned to a segment as a postprocessing step (panel (B)).
(C) The flip-flop traces of a single cholesterol in the DOPC/CHOL
bilayer, comparing the z-height and segmentation
flip-flop approaches. Whenever the orange line is not visible, it
lies perfectly behind the green line. (D) Comparison of the upward
and downward cholesterol flip-flop events between the z-height and segmentation-based flip-flop analyses. A more intense
single dot represents more than one cholesterol flip-flopping at that
given instant. (E) Snapshot of closely stacked and intercalating bilayers
of DLPC. (F) Leaflet segmentation of the intercalating bilayers over
10 μs; the width of a segment represents its relative size.
(G) Snapshot of a DLPC bilayer with a toroidal pore (frame 750 of
panel (H)). (H) Leaflet segmentation of pore formation; the pore-forming
biasing potential was turned on between frame 500 and 1000, and the
width of a segment represents its relative size.
Figure 4
Leaflet segmentation in the presence of protein. (A) Adding protein
(lime) to a bilayer does not interfere with the leaflet assignment
of a CG POPC bilayer (red/blue). A video of the bilayer protein system
can be found in the SI. (B) The bottom and
top leaflet of the plasma membrane. Even extremely complex bilayers
such as the plasma membrane including many different proteins (transparent/lime)
and lipids (red/blue) do not hinder leaflet assignment. However, there
is a small cluster of nonassigned lipids (yellow) around one of the
proteins (highlighted box).
Figure 5
Leaflet segmentation and analysis for both tight and curved
complex
geometry. (A) The phase transition from a multilamellar bilayer system
to an H phase. Snapshots of the first,
intermediate and last segmented frame of a 1 μs trajectory.
The accessory video is available in the SI as Video 3. The corresponding graph shows all segments formed
in chronological order with the snapshots indicated with the dotted
lines. The width of a segment indicates its relative size. (B) Snapshots
of a lipoplex transfecting dsDNA over a bilayer (first, intermediate
and last segmented frame). The inner core (orange, yellow, green,
gray) is an intact H phase containing
the dsDNA (not visible), whereas the upper bilayer leaflet (red) has
already fused to the outer leaflet of the lipoplex by means of a fusion
stalk (I). During transfection the inner core segments fuse with the
bottom leaflet of the bilayer (II), resulting in a single flat bilayer
with all the dsDNA on the other side (III). The transfection can be
tracked in detail using the segmentation graph. The width of a segment
indicates its relative size.
Figure 6
Additional systems. (A) Snapshot of a fatty acid vesicle
(red/gray),
surrounded by individual fatty acids in the medium (green); the amount
of fatty acids in the inner and outer leaflet of the vesicle over
time is shown as a graph, and a video of the fatty acid dimer is available
in the SI Video 5. (B) Snapshot of the
segmented mitochondrion (∼83 million beads). Four distinct
leaflets can be traced (green, yellow, purple, red) if the intersection
zones are not included in segmentation (white). If the white zones
are included in the segmentation, the pink and red leaflets are fused
together. (C) Snapshots of the atomistic POPC system in the H phase, as previously published.[8] Segmentation analysis reveals previously unnoticed
connections between the initial four channels at the end of the simulation.
The width of a segment in the graph indicates its relative size.
Pure Bilayer Systems
Leaflet assignment
for a simple
bilayer containing 270 DOPC and 66 cholesterol (4:1) without force
segmentation resulted in good segmentation, but some lipids—in
particular, cholesterol—are unlabeled (see orange components
in Figures A and 3C). By turning on force
segmentation, all lipids including the cholesterol could be assigned
successfully to a leaflet (see green components in Figures B and 3C).Leaflet segmentation and analysis for pure bilayer systems. (A,
B) Snapshots of a DOPC/CHOL(4:1) bilayer with and without force segmentation.
Some lipids were unassigned without force segmentation, because of
diving/occlusion (panel (A), yellow). Using force segmentation, nonsegmented
lipids are assigned to a segment as a postprocessing step (panel (B)).
(C) The flip-flop traces of a single cholesterol in the DOPC/CHOL
bilayer, comparing the z-height and segmentation
flip-flop approaches. Whenever the orange line is not visible, it
lies perfectly behind the green line. (D) Comparison of the upward
and downward cholesterol flip-flop events between the z-height and segmentation-based flip-flop analyses. A more intense
single dot represents more than one cholesterol flip-flopping at that
given instant. (E) Snapshot of closely stacked and intercalating bilayers
of DLPC. (F) Leaflet segmentation of the intercalating bilayers over
10 μs; the width of a segment represents its relative size.
(G) Snapshot of a DLPC bilayer with a toroidal pore (frame 750 of
panel (H)). (H) Leaflet segmentation of pore formation; the pore-forming
biasing potential was turned on between frame 500 and 1000, and the
width of a segment represents its relative size.A common lipid property to investigate is the flip-flop rate. We
compared the assigned flip-flopping of our leaflet segmentation to
a common procedure. This common analysis would assign a lipid flip-flop
based on the z-coordinate of a lipid headgroup, with
respect to the average z-position of the bilayer.
To prevent high-frequency noise due to headgroups at or close to the
membrane center, a deadzone is used. Lipids within the deadzone region
around the center are not assigned to a leaflet. The downside of the z-height approach is that it can only be used for small
patches of a relatively flat bilayer with its normal pointing along
the z-dimension. Our method takes a rather different
approach. We check for changes in leaflet membership for each lipid.
To reduce noise, a lipid that is not segmented is assigned to its
previous valid leaflet. The advantage of our segmentation-based method
over the conventional approach is that it still works if the membranes/leaflets
are curved, or if they are not a well-defined bilayer at all. In our
comparison of cholesterol flip-flopping, our algorithm performs well
which is indicated by the close relation between the segmentation
labels and the discretization of the lipid z-height
(Figure C). Segmentation
without force segmentation is good enough for lipid flip-flop calculations.
The segmentation based flip-flop analysis compares well with the z-height approach if a deadzone of ±0.6 nm is used
in the latter (Figure D). Although the flip-flop patterns for both methods are highly similar,
they are not the same; therefore, direct quantitative comparison between
both methods should be performed with care. However, if a z-height deadzone of ±0.3 nm is used, almost all flip-flops
indicated by the segmentation analysis are in the set of the z-height approach (see Figure S3 in the Supporting Information). In other words, it appears that
the flip-flops indicated by the segmentation are an almost-perfect
subset of the z-height approach with a relatively
small deadzone.Since stacked bilayers are notoriously difficult
to label correctly
in other methods, we designed a test system containing two closely
stacked and intercalating bilayers of POPC (on average two hydration
layers of CG water beads, Figure E). The close proximity and or intercalation of the
bilayers did not hinder correct leaflet segmentation at all over a
trajectory spanning 10 μs (Figure F).The final pure bilayer system
tested is a bilayer in which we artificially
opened and closed a pore using a biasing potential. We used the leaflet
segmentation to detect the instances of pore opening and closing (see Figures G and 3H, as well as Video 1 in the Supporting
Information).Consistent leaflet segmentation was successfully
obtained using
the default settings for all pure bilayer systems, even in the presence
of cholesterol or intercalating headgroups. The leaflet segmentation
can also be used for flip-flop analysis and the detection of a toroidal
pore with high accuracy.
Complex Bilayers with Proteins
After
confirming that
our leaflet segmentation performs well on simple lipid-only bilayer
systems, we went one step further and considered systems of higher
compositional complexity. As for the lipid only segmentation, segmentation
of a simple bilayer containing POPC lipids and a single protein is
perfect using the default settings (see Figure A, as well as Video 2 in the SI). Next, we tested a crowded
plasma membrane model. This model contains a high variety of lipids
(∼60) and a wide range of proteins in a planar membrane.[6] Segmentation resulted in an almost-perfect assignment
(Figure B). However,
there are 12 lipids that remain unlabeled (see the highlighted box
in Figure B), caused
by an exclusion region. Setting the force-segmentation cutoff radius
slightly higher resolved the issue, but the current result is shown
to demonstrate the quality of the default settings. Missing 12 lipids
out of tens of thousands of lipids would probably not affect the analysis
to any serious degree. However, if perfection is required, some tweaking
might be needed. For example, the force-segmentation range or the
binning distance could be altered, as well as the minimum cluster
size.Leaflet segmentation in the presence of protein. (A) Adding protein
(lime) to a bilayer does not interfere with the leaflet assignment
of a CG POPC bilayer (red/blue). A video of the bilayer protein system
can be found in the SI. (B) The bottom and
top leaflet of the plasma membrane. Even extremely complex bilayers
such as the plasma membrane including many different proteins (transparent/lime)
and lipids (red/blue) do not hinder leaflet assignment. However, there
is a small cluster of nonassigned lipids (yellow) around one of the
proteins (highlighted box).Based on these results, we conclude that MDVoxelSegmentation works
well on systems with a complex composition, even in the presence of
protein(s).
Curved and Tight Geometry
Until
now, all our test systems
were relatively flat; therefore, as our final test set, we introduce
heavy curvature as well as closely packed segmentation. In our previous
work, we showed the phase transition of a stacked bilayer system to
an inverted hexagonal phase (H),[4,22] which was used as a system that should push the time consistency
part of our algorithm, as well as the agonistic behavior, with respect
to the amount of segments. A snapshot of the first intermediate and
last frame of the trajectory is shown in Figure A. The corresponding graph shows that the segmentation is
extremely stable over time. A link to the video of the full phase
transition is given in the SI as Video 3. During transition, the following happens. First, the leaflets that
share a common water reservoir (Figure A, panel I) fuse together by multiple stalk formation
(Figure A, panel II).
The leaflets are then split into smaller segments, more closely resembling
the classic view of the H phase. However,
because of the high concentration of water in this system, not all
aquatic channels are singular and some remain connected through water
pores perpendicular to the view normal (Figure A, panel III). The final result is a stable
H phase of channels with a varying degree
of interconnection. Note that the visualization of the segments (which
is part of the output, as can be seen in the Methods section) immediately makes it clear which channels are connected
and which are not.Leaflet segmentation and analysis for both tight and curved
complex
geometry. (A) The phase transition from a multilamellar bilayer system
to an H phase. Snapshots of the first,
intermediate and last segmented frame of a 1 μs trajectory.
The accessory video is available in the SI as Video 3. The corresponding graph shows all segments formed
in chronological order with the snapshots indicated with the dotted
lines. The width of a segment indicates its relative size. (B) Snapshots
of a lipoplex transfecting dsDNA over a bilayer (first, intermediate
and last segmented frame). The inner core (orange, yellow, green,
gray) is an intact H phase containing
the dsDNA (not visible), whereas the upper bilayer leaflet (red) has
already fused to the outer leaflet of the lipoplex by means of a fusion
stalk (I). During transfection the inner core segments fuse with the
bottom leaflet of the bilayer (II), resulting in a single flat bilayer
with all the dsDNA on the other side (III). The transfection can be
tracked in detail using the segmentation graph. The width of a segment
indicates its relative size.In our previous work, we showed how transfection with lipoplexes
can be simulated using the Martini force field.[4] Such lipoplexes are complexes of lipids and dsDNA. The
dsDNA resides inside the inverted hexagonal phase of the lipoplex
lipids. Upon fusion with a membrane, the dsDNA is released into the
cytosol. The tight packing of the lipids with the dsDNA, in combination
with the change in topology during transfection, makes this another
challenging system to analyze. A rendering of the leaflet segmentation
is shown in Figure B. A link to the video of the complete transfection process is available
in the SI (Video 4). Although the segmentation
is pretty good, it does contain some noise, illustrated by minor flickering
in the video. This noise could be removed with a noise filter, but
we left it in for fair comparison and to create a realistic reference
for the quality of the output under comparable conditions. As for
pores in the bilayer, we can use the amount of leaflet segments identified
in the simulation as a reliable metric for fusion events (Figure B, graph). We did
find that decreasing the resolution of the leaflet segmentation for
complex geometry resulted in a steep decrease in the quality of segmentation.
For more simple geometry, 1 nm voxels without hyper-resolution resulted
in acceptable segmentation. However, for these systems, 0.5 nm voxels
with hyper-resolution are required.The presence of (high) curvature
does not hinder correct leaflet
segmentation, even if the lipid arrangement does not follow a strict
bilayer definition or if the amount of segments is highly dynamic.
Further Tests with Complex and Real-World Systems
The
leaflet segmentation was tested on three additional complex systems
to obtain familiarity with the quality. The first system contains
a small fatty acid vesicle (see Figure , as well as Video 5 in the SI). Around the vesicle, fatty
acids are suspended in a water buffer. The goal was to measure the
imbalance of molecules in the inner and outer leaflet of the vesicle
as it absorbs the surrounding fatty acids in solution. As the outer
leaflet of the vesicle merges with more and more small micelles in
solution, the imbalance increases, potentially resulting in division.
Some flip-flopping occurs to release the tension, because of the imbalance
between the inner and outer leaflet. However, there are two moments
at which the inner and outer leaflet of the vesicle temporarily fuse
by means of a toroidal pore. These pores quickly cause a significant
amount of acyl chains to migrate from the outer to the inner leaflet,
indicated by a jump in the inner leaflet’s size (Figure A). Passive flip-flop occurs
all the time and is indicated by the more continuous increase in size
of the inner leaflet. This example furthermore illustrates the use
of our segmentation tool to track flip-flop and pore formation events
with high resolution.Additional systems. (A) Snapshot of a fatty acid vesicle
(red/gray),
surrounded by individual fatty acids in the medium (green); the amount
of fatty acids in the inner and outer leaflet of the vesicle over
time is shown as a graph, and a video of the fatty acid dimer is available
in the SI Video 5. (B) Snapshot of the
segmented mitochondrion (∼83 million beads). Four distinct
leaflets can be traced (green, yellow, purple, red) if the intersection
zones are not included in segmentation (white). If the white zones
are included in the segmentation, the pink and red leaflets are fused
together. (C) Snapshots of the atomistic POPC system in the H phase, as previously published.[8] Segmentation analysis reveals previously unnoticed
connections between the initial four channels at the end of the simulation.
The width of a segment in the graph indicates its relative size.The second test was to assign the leaflets in the
showcase mitochondrion
model presented in the work by Pezeshkian et al.[10] For the mitochondrion, we expected to observe four separated
leaflets, namely the separate leaflets of both the inner and outer
membrane. However, after leaflet segmentation of the complete mitochondrion,
we only observed three segments. To further investigate this issue,
we cut the mitochondria in 20 slices. Each piece was segmented individually
(Figure B). This revealed
that there are issues in two of the 20 slices. When these two problematic
slices are excluded, the expected four leaflets were found by segmentation
analysis. This indicates that unexpected overlap is occurring in those
two slices, likely as a result of stalk formation between closely
apposed membrane segments. Further subdivision could be utilized to
pinpoint the exact location, although doing so by hand becomes rather
cumbersome. Nevertheless, this example shows that large structures
can be successfully handled by our algorithm and the results can be
used for quality control of systems too large for visual inspection
by traditional means.Finally, in our previous publication on
lipoplexes, we described
the existence of connective channels in an overly hydrated inverted
hexagonal lipid phase, shown in the example above on formation of
an H phase (see Figure A).[4] These channels
probably have huge effects on the fusion kinetics of such phases.
Related work was performed on an atomistic scale by Ramezanpour et
al., in which they did not describe any of such channels.[8] After contacting the group and presenting this
paradox to them, we were kindly allowed access to their data. We used
MDVoxelSegmentation to find any possible unexpected and undescribed
connective channels. The channels were confirmed to also exist in
the atomistic simulations, mostly forming at the middle to end of
the trajectory of their lipid system with the highest hydration (Figure C). The channel was
stable, but difficult to spot using the conventional tools available.
This due to the fact that it spanned over the PBC, a common problem
in periodic systems. MDVoxelSegmentation was able to classify the
channel, since it supports all PBC used for molecular simulations.
Checking the marked frames by eye confirmed that the results of MDVoxelSegmentation
were correct; the authors agreed after re-evaluating their data. The
formation of these pores was not observed at lower temperatures, suggesting
that there is a kinetic factor and that their simulations might benefit
from extension.Together, these examples show that MDVoxelSegmentation
is a fast
and useful tool for spotting complex segmentation properties that
are otherwise easily missed or extremely cumbersome to analyze.
Performance
To get some insight in the performance of the
current MDVoxelSegmentation
tool, we timed the segmentation for some of the test systems, using
a varying amount of threads. The amount of threads were varied to
test the parallel processing of MDVoxelSegmentation. In all these
tests, we turned off force segmentation, since it can have a large
impact on performance based on the exact system (especially at higher
maximum recursion depth). The results are summarized in Figure . From the single thread values, we can deduce that our algorithm
indeed scales pretty much linearly with some variance, because of
packing and complexity (see Table ). Systems containing thousands to millions of particles
of interest will roughly take between less than seconds to minutes
per frame, respectively. The biggest system we timed was the 83 million
particle model of mitochondrion,[10] which
took over 4 h for a single frame. The system of the mitochondrion
was the only system that could not be handled with 16GB of RAM and
needed a specialized node to run. Its memory consumption peaked at
150GB.
Figure 7
Single frame scalability. The time required to segment a single
frame, relative to increasing particle count. Exact data is represented
in Table . Some variation
from linearity is expected due to the fact that the specific geometry
of a system can have an impact on the complexity.
Table 1
Performance of MDVoxelSegmentationa
system
number of
particles
number of
frames
number of
threads
real time
user time
sec/frame
bilayer pore
10 436
1500
12
2m, 25s
21m, 41s
0.096
fatty acid growth
66 278
1000
12
3m, 35s
27m, 37s
0.215
bilayer pore
10 436
1500
1
13m, 13s
15m, 23s
0.529
fatty acid growth
66 278
1000
1
16m, 20s
18m, 30s
0.980
plasma membrane
1 343 450
1
1
4m, 35s
4m, 36s
275
mitochondrion
83 288 300
1
1
260m, 18s
–
15 618
All systems were run on the same
Ubuntu desktop, except for the mitochondrion. The test machine contained
a Ryzen 1600 CPU (6 cores, hyper threading, 3.8 GHz) and 16GB of DDR4
memory (2× 8GB, 2.133 MHz). No GPU acceleration is currently
implemented. The MDVoxelSegmentation parameters were the same as stated
in the examples above, except for force segmentation, which was turned
off (−fs 0). Timing was performed using the default Ubuntu
time functionality. The CPU for the mitochondrion system operated
at 2.4 GHz for a single thread.
Single frame scalability. The time required to segment a single
frame, relative to increasing particle count. Exact data is represented
in Table . Some variation
from linearity is expected due to the fact that the specific geometry
of a system can have an impact on the complexity.All systems were run on the same
Ubuntu desktop, except for the mitochondrion. The test machine contained
a Ryzen 1600 CPU (6 cores, hyper threading, 3.8 GHz) and 16GB of DDR4
memory (2× 8GB, 2.133 MHz). No GPU acceleration is currently
implemented. The MDVoxelSegmentation parameters were the same as stated
in the examples above, except for force segmentation, which was turned
off (−fs 0). Timing was performed using the default Ubuntu
time functionality. The CPU for the mitochondrion system operated
at 2.4 GHz for a single thread.
Discussion
Future
Development of MDVoxelSegmentation
In this manuscript,
we show the use of voxel-based approaches for the analysis of lipid-
and surfactant-based systems. However, our implementation is not optimal
yet. What we hope to achieve is to stimulate developers of MD software
who have a large user base (MDAnalysis,[12] VMD,[15] Pymol,[23] UnityMol[24]) to embrace voxels and add
user intuitive voxel APIs with common components analysis to their
packages. VMD already has a well-optimized grid density function,
which could be made more accessible for users, or even contain some
GUI. The VMD code could probably perform all operations in this manuscript
at runtime for systems of considerable size due to their efficient
usage of GPU(s). How the segmentation API could best be merged into
the current selection syntax is not trivial and will require careful
consideration.
Potential Extra Features
As shown
here, leaflet segmentation
can be solved with high fidelity, using only local checks and chronological
segmentation. We showed accurate segmentation of a wide range of lipid
systems with or without the presence of cholesterol, protein, and
or (extreme) curvature. Although the focus of this manuscript was
lipid leaflet detection, we would like to highlight some possible
additional features that further broaden the scope of the applications.
One of those features is the tracking of membrane pores and stalks.
This should use the same basic voxel operations such as erosion, growth,
and common neighbor segmentation, but results in tracking abstract
segments such as pores. This would allow us to treat pores as segments,
propagating them through the trajectory using the same selection syntax
and data structures. Current attempts show to be promising, although
this pore segmentation appears to rely on a good understanding of
subdividing the segmentation results. Some preliminary attempts of
the porefinding can be found in the MDVoxelSegmentation github pore
branches.Another possible application is in the analysis of
interfaces and the manner in which they percolate between set targets.
In this case, the interphase itself might be the entity tracked over
time using a distance query on multiple segments.Detecting
lipid lateral phase separation is another problem that
we think could be potentially addressed with our tool. In this case,
we should not map lipid densities to voxels, but rather lipid compositions.
In the voxeled compositions, a three-dimensional (3D) edge detection
algorithm could be used to find transitions in local composition.
Since we know that, for a phase, its composition should be constant
in the bulk region, we can start with finding volumes that are constant
in composition by taking the derivative. This would work similar to
the work reported by Sodt et al., where they demonstrated such an
approach on a two-dimensional (2D) grid.[25] Probably some form of compositional smoothing or dimensionality
reduction would be required to reduce noise in the spatial composition
of complex systems.Finally, the analysis is generic enough
to be used practically
without adaptations on lipid-related systems such as amphipathic proteins
or fatty acids, as seen in the SI (Videos 5 and 6), thus showing promise to be used
as a more general procedure for amphipathic systems.
Conclusions
We showed that consistent and accurate
leaflet segmentation can be obtained with our sequential segmentation
algorithm. The algorithm proved to be successful on a wide range of
lipid- and lipid-related systems. These systems contained both high
compositional and topological complexity, which were not segmentable
with any of the currently available methods known to us. To achieve
this, we used a voxel data structure, which allows for fast local
queries, segmenting each frame individually. To achieve consistent
segmentation over multiple frames, this was followed by set theory,
making use of the Jaccard index. In our tests, we show many examples
of CG Martini lipid systems, mostly performed with the same parameter
settings. These settings show very few artifacts, indicating that,
in many cases, no time is required to be spent on parameter tweaking.
Besides CG systems, we also examined an atomistic system for which
segmentation showed to be of high quality with only minor changes
needed to the default settings.Our algorithm practically scales
linearly with the increase in particles and is able to handle millions
of particles within minutes on a desktop machine. However, performance
gains of orders of magnitude could probably be achieved by proper
SIMD/GPU implementation[26] and careful memory
management using more-advanced voxel data structures.[16−18] The resulting segmentation is easy to work with in the python or
VMD programs and can directly be used for visualization, because of
its supported segmentation data format.The MDVoxelSegmentation
package will be expanded by our own attempts
to address problems within this framework. We do invite interested
researchers to contribute to the further development of the tool.
To add your own segmentation routines to the package, simply open
a ticket at our github and indicate your interest (https://github.com/marrink-lab/MDVoxelSegmentation).
Methods
Molecular Dynamics Simulations
To
simulate the lipid-only
systems, we made use of GROMACS 2019.1.[27] Martini 2 was used as the force field.[21] All lipid parameters were taken from cgmartini.nl and initial configurations
were generated using insane.[20] Each simulation
contained 0.15 M NaCl and 10% antifreeze (WF) to prevent unnatural
freezing of the water, as described in the original Martini paper.[21] Rectangular periodic boundary conditions were
employed to prevent artificial wall effects (xy 10
nm). The steepest descent algorithm was used for energy minimization
(1000 steps) and equilibration was performed using the default Martini
settings with a time step of 2 fs for 25 000 steps.[21,28] The Verlet cutoff scheme was used with a 1.1 cutoff for both the
Coulombic (reaction-field) and van der Waals interactions. The v-rescale
scheme (tau-p = 1 ps) was used for the thermostat at 310 K, coupling
the lipids and solvent/ions in two separate groups. Pressure coupling
during equilibration was performed using the Berendsen barostat for
semianisotropic systems (tau-p = 3 ps, compressibility = 3e-4 bar-1,
ref-p = 1.0 bar in each dimension).[29] The
production run made use of a 20 fs time step, and the pressure coupling
was switched to Parrinello–Rahman (tau-p = 12).[30]To simulate pores in the DLPC bilayer,
positional restraints were used in the form of a cylindrical biasing
potential acting on the C1A and C1B tail beads (GROMACS potential
function 2, shape 8, hole radius −1.2 nm, force constant 1000
kJ mol–1). The negative sign in front of the hole
radius indicates an inversion of the biasing potential in GROMACS.
A reference file was created with all beads at the same position,
acting as the pore center reference position (−r flag in GROMACS). The pore formation was simulated by running a
production run for 10 ns without the biasing potential, then the potential
was turned on for another 10 ns and turned off again for the final
10 ns.
Leaflet Segmentation
We used the same default settings
for all the presented segmentation, unless specified differently.
These settings used a binning resolution of 5 Å and made use
of hyper-resolution of 0.5 (adding points half the resolution away
from the source point). The headgroups, linkers, tails, and exclusions
were as specified in the default selection input. The minimum size
of a segment was 50 particles, and iterative force segmentation was
turned on with a maximum distance of 20 Å. An identity threshold
of 0.618 was used for the Jaccard threshold. All keyword arguments
represent default values that do not need to be specified, except
for the reference frame and trajectory file (any MDAnalysis-supported
MD format).For the segmentation of the atomistic H system, hyper-resolution was turned off (−force_segmentation
0). Generally, hyper-resolution is not required for atomistic systems.
The respective headgroups, tails, and linkers selections were specified
in the selections input file:The AA terminal input was:
Lipid Flip-Flop
The average z-height
of the cholesterol headgroups was subtracted from the headgroup z-position. This resulted in headgroups lying either above
or below zero. Headgroups with a normalized z-position
less than 0.6 nm away from 0 were labeled either up or down (discretization
of the z-height). Differentiating the individual
lipid traces results in peaks at flip-flop events. These peaks were
counted, registering if the flip occurred upward or downward. This
calculation was repeated for all cholesterols in all frames, and a
graph was made using python3 matplotlib.[31] The z-height and flip-flop analysis is available
at the MDVoxelSegmentation github.To show the flip-flopping
of cholesterol using our leaflet segmentation, we used the default
settings, except for the force segmentation, which was turned off.
Lipids that were not assigned to segment 0 were assigned to their
previous segment. A graph was made using matplotlib and the plotting
script included with mdvseg. The leaflet flip-flop analysis is available
at the MDVoxelSegmentation github.
Visualization
All figures and videos were rendered
using VMD and a little TCL script of MDVoxelSegmentation to load the
segmentation array. Examples of TCL scripts are provided together
with the code at https://github.com/marrink-lab/MDVoxelSegmentation.
Authors: M Ramezanpour; M L Schmidt; B Y M Bashe; J R Pruim; M L Link; P R Cullis; P E Harper; J L Thewalt; D P Tieleman Journal: Langmuir Date: 2020-06-12 Impact factor: 3.882
Authors: Alexander J Sodt; Michael Logan Sandar; Klaus Gawrisch; Richard W Pastor; Edward Lyman Journal: J Am Chem Soc Date: 2014-01-03 Impact factor: 15.419
Authors: Martin Vögele; Ramachandra M Bhaskara; Estefania Mulvihill; Katharina van Pee; Özkan Yildiz; Werner Kühlbrandt; Daniel J Müller; Gerhard Hummer Journal: Proc Natl Acad Sci U S A Date: 2019-06-17 Impact factor: 11.205