Li Huang, Hongkyun Kim, Jacob Furst, Daniela Raicu.
Abstract
The nematode Caenorhabditis elegans explores the environment using a combination of different movement patterns, which include straight movement, reversals, and turns. We propose to quantify C. elegans movement behavior using a computer vision approach based on run-length encoding of step-length data. In this approach, the path of C. elegans is encoded as a string of characters, where each character represents a path segment of a specific type of movement. With these encoded string data, we perform k-means cluster analysis to distinguish movement behaviors resulting from different genotypes and food availability. We found that shallow and sharp turns are the most critical factors in distinguishing among the movement behaviors. To validate our approach, we examined the movement behavior of tph-1 mutants, which lack an enzyme responsible for serotonin biosynthesis. A k-means cluster analysis of the string-encoded path data showed that tph-1 movement behavior on food is similar to that of wild-type animals off food. We suggest that this run-length encoding approach is also applicable to trajectory data from animal or human mobility studies.
Year: 2016 PMID: 27462364 PMCID: PMC4944090 DOI: 10.1155/2016/3516089
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Figure 1. Run-length encoding descriptor extraction: (a) video data for one worm tracking recording; (b) segmented images with worm body pixels shown in white; (c) worm body centroid extraction; (d) step-length, angle, and speed feature extraction; (e) movement pattern discovery (the symbols 1 and 2 represent the encoding of the path using clustering); (f) extraction of run-length encoding descriptors.
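Steps (c) and (d) of the pipeline can be illustrated with a minimal sketch, assuming the centroids are already available as one (x, y) position per frame. The function name `step_features`, the `fps` parameter, and the choice of the turning angle between consecutive step vectors (0 to 180 degrees) are assumptions made for illustration; the angle convention actually used by the authors is not stated in this record.

```python
import math

def step_features(centroids, fps=1.0):
    """Return one (step_length, turning_angle_deg, speed) tuple per step.

    centroids: list of (x, y) positions, one per frame (units assumed mm).
    fps: frames per second (hypothetical parameter, not from the source).
    The angle is None when it is undefined (first step, or a zero-length step).
    """
    features = []
    prev_vec = None
    for (x0, y0), (x1, y1) in zip(centroids, centroids[1:]):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        speed = length * fps  # distance per frame times frames per second
        angle = None
        if prev_vec is not None:
            pdx, pdy = prev_vec
            prev_len = math.hypot(pdx, pdy)
            if prev_len > 0 and length > 0:
                cos_a = (dx * pdx + dy * pdy) / (length * prev_len)
                cos_a = max(-1.0, min(1.0, cos_a))  # guard against rounding
                angle = math.degrees(math.acos(cos_a))
        features.append((length, angle, speed))
        prev_vec = (dx, dy)
    return features
```

For a worm that moves one unit right, one unit up, and one unit left, `step_features([(0, 0), (1, 0), (1, 1), (0, 1)], fps=2.0)` yields three steps of length 1 and speed 2, with turning angles of 90 degrees for the second and third steps.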
Figure 2. Path similarity: clustering is applied to the worm data, in which each video is encoded as a set of run-length encoding descriptors. The result consists of groups of worms that follow similar paths.
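The grouping step can be sketched with a small k-means routine, assuming each video has already been reduced to a numeric descriptor vector. This is a plain Lloyd's-algorithm illustration, not the authors' implementation: the function name `kmeans`, the deterministic initialization from the first k points, and the toy vectors in the usage note are all hypothetical.

```python
import math

def kmeans(points, k, max_iter=100):
    """Cluster descriptor vectors with Lloyd's algorithm.

    points: list of equal-length numeric tuples (one per video).
    Returns (labels, centers). Initialization uses the first k points,
    a simplification chosen here so the result is reproducible.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

With the toy input `[(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]` and `k=2`, the points near the origin receive one label and the points near (10, 10) the other. In practice a library implementation with multiple random restarts would be a more robust choice.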
String symbol definitions resulting from k-means clustering (k = 2).
| Cluster symbol | Cluster name | Significant feature | Movement pattern path encoding |
|---|---|---|---|
| 1 | Sharp turn | | See |
| 2 | Shallow turn | | |
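Once every step carries a cluster symbol, the path string can be collapsed into runs. The following is a minimal sketch of generic run-length encoding; the function name `run_length_encode` and the example string are illustrative assumptions, not data from the paper.

```python
from itertools import groupby

def run_length_encode(symbols):
    """Collapse a path string into (symbol, run_length) pairs.

    symbols: a string such as "1122212", one character per step.
    """
    return [(sym, sum(1 for _ in grp)) for sym, grp in groupby(symbols)]

# Two sharp-turn steps, three shallow-turn steps, then one of each.
print(run_length_encode("1122212"))
# [('1', 2), ('2', 3), ('1', 1), ('2', 1)]
```

Each pair records which movement pattern occurred and for how many consecutive steps it persisted, which is the information the run-length descriptors summarize.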
Four run-length encoding matrices.
| Matrix | Example (using N2_f1) |
|---|---|
| IMF | |
| AMF | |
| BAMF | |
| BAAMF | |
Run-length encoding descriptors.
| Formula | Description |
|---|---|
| | Short run encoding (SRE) measures the distribution of short runs |
| | Long run encoding (LRE) measures the distribution of long runs |
| | Low angle run encoding (LARE) measures the distribution of the small angles |
| | High angle run encoding (HARE) measures the distribution of the large angles |
| | Short run low angle encoding (SRLAE) measures the joint distribution of short runs and sharp turns |
| | Short run high angle encoding (SRHAE) measures the joint distribution of short runs and shallow turns |
| | Large run low angle encoding (LRLAE) measures the joint distribution of large runs and sharp turns |
| | Large run high angle encoding (LRHAE) measures the joint distribution of large runs and shallow turns |
| | Angle level nonuniformity (ALN) measures the similarity of angle level distributions |
| | Run length nonuniformity (RLN) measures the similarity of run length distributions |
| | Run Percentage (RP) measures the homogeneity of the distribution of runs |
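The descriptor names mirror the classical gray-level run-length texture features, so one plausible reading is sketched below, assuming the angle level (cluster symbol 1, 2, ...) takes the place of the gray level. These definitions are an assumption borrowed from the standard texture-analysis formulas, not the paper's confirmed formulas, and the function name `rle_descriptors` is hypothetical.

```python
from collections import Counter
from itertools import groupby

def rle_descriptors(symbols):
    """Compute GLRLM-style descriptors from a path string such as "1122212".

    Symbols must be digits >= 1; each digit is treated as an angle level i.
    """
    if not symbols:
        raise ValueError("path string must not be empty")
    # p[(i, j)] = number of runs with angle level i and run length j.
    p = Counter((int(s), sum(1 for _ in grp)) for s, grp in groupby(symbols))
    n_runs = sum(p.values())
    per_level, per_length = Counter(), Counter()
    for (i, j), count in p.items():
        per_level[i] += count
        per_length[j] += count

    def mean(weight):
        # Weighted average over all runs (assumed normalization by run count).
        return sum(c * weight(i, j) for (i, j), c in p.items()) / n_runs

    return {
        "SRE": mean(lambda i, j: 1 / j**2),
        "LRE": mean(lambda i, j: j**2),
        "LARE": mean(lambda i, j: 1 / i**2),
        "HARE": mean(lambda i, j: i**2),
        "SRLAE": mean(lambda i, j: 1 / (i**2 * j**2)),
        "SRHAE": mean(lambda i, j: i**2 / j**2),
        "LRLAE": mean(lambda i, j: j**2 / i**2),
        "LRHAE": mean(lambda i, j: i**2 * j**2),
        "ALN": sum(v**2 for v in per_level.values()) / n_runs,
        "RLN": sum(v**2 for v in per_length.values()) / n_runs,
        "RP": n_runs / len(symbols),  # runs per step
    }
```

For the illustrative string `"1122212"` (four runs over seven steps), this gives LRE = 3.75, ALN = 2.0, RLN = 1.5, and RP = 4/7. A path dominated by long runs drives LRE up and RP down, while a path that switches symbol at nearly every step does the opposite.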
Path sequences that illustrate different values for the RLE descriptors.
| Example of paths | Description |
|---|---|
| | Small Short Run Emphasis (SRE) |
| | Large Short Run Emphasis (SRE) |
| | Small Short Run High Angle-Level Emphasis (SRHAE) |
Step data ranges for clustering results when the number of clusters k is varied from 2 to 5.
| Number of clusters | Cluster ID | Angle (degrees) | Step length (mm) | Speed (mm/s) |
|---|---|---|---|---|
| 2 | Cluster 1 | 0~84.32 | 0~23.84 | 0~8.49 |
| | Cluster 2 | 82.64~180 | 0~32.21 | 0~1.94 |
| 3 | Cluster 1 | 0~53.08 | 0~23.84 | 0~8.49 |
| | Cluster 2 | 52.16~116.87 | 0~32.21 | 0~6.56 |
| | Cluster 3 | 116.2~180 | 0~16.84 | 0~1.34 |
| 4 | Cluster 1 | 0~35.91 | 0~23.84 | 0~1.48 |
| | Cluster 2 | 126~180 | 0~16.84 | 0~1.32 |
| | Cluster 3 | 35.34~77.21 | 0~17.87 | 0~8.49 |
| | Cluster 4 | 77.21~126.43 | 0~32.21 | 0~1.94 |
| 5 | Cluster 1 | 0~31.29 | 0~23.84 | 0~1.48 |
| | Cluster 2 | 146.1~180 | 0~16.84 | 0~1.32 |
| | Cluster 3 | 106.3~146.1 | 0~15.65 | 0~1.34 |
| | Cluster 4 | 68.02~107.15 | 0~32.21 | 0~1.94 |
| | Cluster 5 | 30.46~68.04 | 0~17.87 | 0~8.49 |
Figure 3. Step data distribution for k = 2: (a, b) histograms of step length and speed for cluster 1; (c, d) histograms of the same features for cluster 2. The mean value for each cluster is shown as a thin blue vertical line.
Selected descriptors for each type of run-length encoding matrix.
| Type of RLE matrix | Selected descriptors |
|---|---|
| IMF | RP, LRHAE, ALN, SRHAE |
| AMF | RP, LRHAE, ALN |
| BAMF | RP, ALN |
| BAAMF | RP, LRHAE, SRHAE |
| IMFL | RP, SRE, ALN |
| AMFL | RP, ALN |
Cluster makeup based on RLE descriptors. Each cell value is the number of worms of a given type that fall in that cluster divided by the total number of worms in that cluster; the percentage in parentheses is the same count divided by the total number of worms of that type.
| | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Total # of videos |
|---|---|---|---|---|---|---|
| N2_f | 0 | 0 | 0 | 0.12 (25%) | | 4 |
| N2_nf | | | | | 0 | 33 |
| N2_nnf | 0.08 (50%) | 0 | 0.03 (50%) | 0 | 0 | 2 |
| | 0 | 0 | | | 0 | 13 |
| | | | | 0 | 0 | 5 |
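The two ratios reported in each cell of the cluster makeup table can be computed as in the sketch below, assuming one worm-type label and one cluster label per video. The function name `cluster_makeup` and the toy labels in the usage note are hypothetical and do not reproduce the paper's data.

```python
from collections import Counter

def cluster_makeup(types, clusters):
    """Map (worm_type, cluster) -> (share_of_cluster, share_of_type).

    types: worm-type label per video; clusters: cluster label per video.
    share_of_cluster corresponds to the cell value; share_of_type
    corresponds to the percentage in parentheses (as a fraction).
    Pairs with zero videos are simply absent from the result.
    """
    if len(types) != len(clusters):
        raise ValueError("types and clusters must have the same length")
    pair_counts = Counter(zip(types, clusters))
    per_type = Counter(types)
    per_cluster = Counter(clusters)
    return {
        (t, c): (n / per_cluster[c], n / per_type[t])
        for (t, c), n in pair_counts.items()
    }
```

For example, with types `["A", "A", "B", "B", "B"]` and clusters `[1, 2, 1, 1, 2]`, the entry for `("B", 1)` is `(2/3, 2/3)`: type B makes up two thirds of cluster 1, and two thirds of all type B videos fall in cluster 1.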