Anastassia Angelopoulou1, Jose Garcia Rodriguez2, Sergio Orts-Escolano2, Gaurav Gupta3, Alexandra Psarrou1.
Abstract
This work presents the design of a real-time system to model visual objects with the use of self-organising networks. The architecture of the system addresses multiple computer vision tasks such as image segmentation, optimal parameter estimation and object representation. We first develop a framework for building non-rigid shapes using the growth mechanism of the self-organising maps, and then we define an optimal number of nodes without overfitting or underfitting the network based on the knowledge obtained from information-theoretic considerations. We present experimental results for hands and faces, and we quantitatively evaluate the matching capabilities of the proposed method with the topographic product. The proposed method is easily extensible to 3D objects, as it offers similar features for efficient mesh reconstruction.
Keywords: Clustering; Minimum description length; Self-organising networks; Shape modelling
Year: 2016 PMID: 29628624 PMCID: PMC5878838 DOI: 10.1007/s00521-016-2579-y
Source DB: PubMed Journal: Neural Comput Appl ISSN: 0941-0643 Impact factor: 5.606
Fig. 1 The first row shows the original GNG while the second row shows the modified GNG. With the modified GNG, any wrong corrections to corners and curvatures have been eliminated
Topology Preservation measures of the original vs. modified GNG with respect to frames per second (fps)
| Shape | Nodes | Original GNG Fps | Original GNG QE | Original GNG TE | Modified GNG Fps | Modified GNG QE | Modified GNG TE |
|---|---|---|---|---|---|---|---|
| Star-4 | 71 | 1.16 | 2.7551 | 0 | 7.30 | 2.6375 | 0 |
| Star-6 | 74 | 1.11 | 2.9564 | 0 | 6.06 | 2.9073 | 0.0014 |
| Cloud | 97 | 0.61 | 2.7275 | 0 | 5.26 | 2.6561 | 0 |
| Heart | 70 | 1.38 | 2.9337 | 0 | 5.24 | 2.9347 | 0 |
| Lightning | 71 | 1.04 | 2.9391 | 0 | 6.99 | 2.8138 | 0 |
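The QE column is a quantisation error. The record does not give the formula, so the following is only a minimal sketch of one common definition (the mean Euclidean distance from each input sample to its nearest node); the function name `quantisation_error` and the toy data are assumptions made for illustration, not values from the paper.

```python
import math

def quantisation_error(samples, nodes):
    """Mean Euclidean distance from each sample to its nearest node.

    This is a common textbook definition, assumed here; the paper may
    normalise or square the distances differently.
    """
    if not samples or not nodes:
        raise ValueError("samples and nodes must be non-empty")
    total = 0.0
    for s in samples:
        # Distance from this sample to the closest node of the network.
        total += min(math.dist(s, n) for n in nodes)
    return total / len(samples)

# Toy 2-D data, made up for illustration only.
samples = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
nodes = [(0.0, 1.0), (4.0, 1.0)]
qe = quantisation_error(samples, nodes)
print(qe)
```

Under this definition a lower value means the nodes sit closer to the data, which is how the QE columns above would be read.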
Fig. 2 First shape of each of the first 10 objects in coil-100, showing the original image, the thresholded region, and the modified GNG contour representation
Fig. 3 Modification of the GNG network to eliminate multiple connections and to attempt to reduce the network to a single series of sequentially linked nodes. Model A is the original network with the wrong connections (circled corners), while model B is our modified network
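A contour made of a single series of sequentially linked nodes has at most two neighbours per node. The sketch below is a simple diagnostic built on that observation: it lists nodes with more than two neighbours, which is where extra connections would show up. It is not the authors' pruning rule; the function name `branching_nodes` and the edge lists are hypothetical.

```python
from collections import defaultdict

def branching_nodes(edges):
    """Return the sorted nodes that have more than two neighbours."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return sorted(n for n, nb in neighbours.items() if len(nb) > 2)

# Hypothetical 5-node contour: a closed ring, then the same ring
# with one extra shortcut edge between nodes 0 and 2.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
with_shortcut = ring + [(0, 2)]

print(branching_nodes(ring))           # no node exceeds two neighbours
print(branching_nodes(with_shortcut))  # both ends of the shortcut are flagged
```

A check like this only locates candidate edges; deciding which edge to remove still needs a criterion such as the one illustrated by models A and B.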
Fig. 4 Likelihood node ratios for images with the same resolution but different skin-to-background ratios. a Network adaptation to images of 46,332 pixels with maps of 102 and 162 nodes. b Network adaptation to images of 21,903 pixels with maps of 46 and 132 nodes
Fig. 5 a Plot of hand distributions. b Plot of the MDL values versus the number of cluster centres. The Minimum Description Length MDL(K) is calculated for every tested number of clusters K, and a global minimum is found at K = 9 (circled point)
Fig. 6 Network convergence for two sets of images after a sequence of k frames. The network is defined by the shape, and the movement of the nodes depends on the posterior probability P(g(x, y)). The higher the probability that a node belongs to the skin class, the faster the node re-adjusts its position to the new input distribution (black dot)
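The behaviour described for Fig. 6 can be sketched with the winner update of the standard GNG algorithm, scaled by a per-sample weight. The learning rates `eps_b` and `eps_n` follow the usual GNG naming; the `weight` factor standing in for the skin probability is an assumption for illustration, and the paper's actual update rule may differ.

```python
import math

def adapt_winner(nodes, neighbours, sample, eps_b=0.1, eps_n=0.01, weight=1.0):
    """Move the nearest node and its topological neighbours towards `sample`.

    `weight` in [0, 1] scales both learning rates. It stands in for a
    skin-probability term and is an assumption, not the paper's formula.
    Returns the index of the winning node.
    """
    winner = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))

    def move(i, eps):
        # Shift node i a fraction eps * weight of the way towards the sample.
        nodes[i] = tuple(w + eps * weight * (x - w)
                         for w, x in zip(nodes[i], sample))

    move(winner, eps_b)
    for j in neighbours.get(winner, ()):
        move(j, eps_n)
    return winner

# Toy network: nodes 0 and 1 are linked, node 2 is isolated.
nodes = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
neighbours = {0: [1], 1: [0]}
winner = adapt_winner(nodes, neighbours, (0.0, 1.0), weight=0.5)
print(winner, nodes)
```

With a larger `weight` the winner takes a larger step per frame, which matches the caption's statement that high-probability nodes re-adjust faster.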
Fig. 7 Some common gestures used in sign language
Fig. 8 Examples of gestures in three different cluttered backgrounds
Fig. 9 Example of correctly detected hands and face based on the golden ratio regardless of the scale and the position of the hands and the face. a Original image, b after applying EM to segment skin region, and c hand and face detector taking into account the connected nodes in the networks as well as the percentage of skin in the rectangular area
Topology preservation and processing time using the quantisation error and the topology preservation error for different variants
| Variant | Number of nodes | Time (s) | QE | TE |
|---|---|---|---|---|
| | 23 | 0.22 | 8.932453 | 0.4349 |
| | 122 | 0.50 | 5.393949 | −0.3502 |
| | 168 | 0.84 | 5.916987 | −0.0303 |
| | 23 | 0.90 | 8.024549 | 0.5402 |
| | 122 | 2.16 | 5.398938 | 0.1493 |
| | 168 | 4.25 | 4.610572 | 0.1940 |
| | 23 | 1.13 | 0.182912 | −0.0022 |
| | 122 | 2.22 | 0.172442 | 0.3031 |
| | 168 | 8.30 | 0.169140 | −0.0007 |
| | 23 | 1.00 | 0.188439 | 0.0750 |
| | 122 | 12.02 | 0.155153 | 0.0319 |
| | 168 | 40.98 | 0.161717 | 0.0111 |
Fig. 10 a, b Distribution of two different hand shapes with MDL(K) values plotted over the tested range of K and a global minimum at K = 9 (circled point). a, b also show the likelihood term, which measures the model fit, and the penalty term; both grow with the number of clusters used
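The record does not reproduce the MDL expression itself. For orientation, a generic two-part form that is consistent with the likelihood and penalty terms mentioned in the caption is given below; the symbols $X$ (the $N$ input samples), $\Theta_K$ (the parameters of a model with $K$ clusters) and $M_K$ (its number of free parameters) are assumptions, and the paper's exact expression may differ.

```latex
\mathrm{MDL}(K) = -\ln L\!\left(X \mid \Theta_K\right) + \frac{M_K}{2}\,\ln N,
\qquad
K^{*} = \arg\min_{K}\ \mathrm{MDL}(K)
```

In this form the log-likelihood $\ln L$ and the penalty $\tfrac{M_K}{2}\ln N$ both increase with $K$, so their difference can reach a minimum at an intermediate value such as the $K = 9$ reported in the figures.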
The topology preservation error for gestures (a–d)
| Image (a) Nodes | Image (a) TE | Image (b) Nodes | Image (b) TE | Image (c) Nodes | Image (c) TE | Image (d) Nodes | Image (d) TE |
|---|---|---|---|---|---|---|---|
| 26 | −0.0301623 | 26 | −0.021127 | 24 | −0.017626 | 19 | −0.006573 |
| 51 | −0.030553 | 51 | −0.021127 | 47 | −0.047098 | 37 | −0.007731 |
| 77 | 0.04862 | 77 | 0.044698 | 71 | 0.046636 | 56 | 0.027792 |
| 102 | 0.048256 | 102 | 0.021688 | 95 | 0.017768 | 75 | 0.017573 |
| 128 | 0.031592 | 128 | 0.011657 | 119 | 0.014589 | 94 | 0.018789 |
| 153 | 0.038033 | 153 | 0.021783 | 142 | 0.018929 | 112 | 0.016604 |
| 179 | 0.047636 | 179 | 0.017223 | 166 | 0.017465 | 131 | 0.017755 |
| 205 | 0.038104 | 205 | −0.013525 | 190 | 0.017718 | 150 | 0.007332 |
| 230 | 0.037321 | 230 | 0.017496 | 214 | −0.007543 | 168 | 0.007575 |
Fig. 11 Time taken to insert the maximum number of nodes per dataset
Fig. 12 Gestures with different levels of Gaussian noise. From left to right: mean = 0, sigma = 0; mean = 0, sigma = 0.25
Error measurements for modified GNG, Kohonen and GCS
| Gestures | Method | Nodes | RMS | TE |
|---|---|---|---|---|
| Gesture-three fingers (sigma = 0) | Modified GNG | 21 | | |
| Gesture-three fingers (sigma = 0) | Kohonen | 25 | 1.6410 | 0.172629 |
| Gesture-three fingers (sigma = 0) | GCS | 30 | 0.5494 | 0.159913 |
| Gesture-three fingers (sigma = 0.25) | Modified GNG | 21 | | |
| Gesture-three fingers (sigma = 0.25) | Kohonen | 25 | 2.6578 | 0.237586 |
| Gesture-three fingers (sigma = 0.25) | GCS | 30 | 1.6134 | 0.241429 |
| Gesture-thumb (sigma = 0) | Modified GNG | 25 | | |
| Gesture-thumb (sigma = 0) | Kohonen | 30 | 0.5376 | 0.194685 |
| Gesture-thumb (sigma = 0) | GCS | 31 | 0.3144 | 0.176336 |
| Gesture-thumb (sigma = 0.25) | Modified GNG | 25 | | |
| Gesture-thumb (sigma = 0.25) | Kohonen | 30 | 0.6956 | 0.242131 |
| Gesture-thumb (sigma = 0.25) | GCS | 31 | 0.3956 | 0.239292 |
| Gesture-open hand (sigma = 0) | Modified GNG | 23 | | |
| Gesture-open hand (sigma = 0) | Kohonen | 25 | 3.4727 | 0.146884 |
| Gesture-open hand (sigma = 0) | GCS | 27 | 2.3790 | 0.150354 |
| Gesture-open hand (sigma = 0.25) | Modified GNG | 23 | | |
| Gesture-open hand (sigma = 0.25) | Kohonen | 25 | 3.5340 | 0.240014 |
| Gesture-open hand (sigma = 0.25) | GCS | 27 | 2.4599 | 0.112732 |
Bold numbers demonstrate the lowest errors for our modified GNG network
Fig. 13 Tracking a gesture. The images correspond, from left to right and from top to bottom, to every 10th frame of a 190-frame sequence. In each image the red points indicate the nodes and their adaptation after 4 iterations (colour figure online)
Fig. 14 a Manual initialisation of the snake. b–d Adaptation of the snake after a number of iterations
Parameters and performance for snake
| Hand | Constants | Iterations | Time (s) |
|---|---|---|---|
| Sequence (a) | | 40 | 15.29 |
| Sequence (b) | | 50 | 15.20 |
| Sequence (c) | | 40 | 12.01 |
| Sequence (d) | | 20 | 5.60 |
Convergence and execution time results of modified GNG and snake
| Method | Convergence (iteration times) | Time (s) |
|---|---|---|
| Snake | 20 | 5.60 |
| Snake | 40 | 12.01 |
| Snake | 50 | 15.20 |
| Snake | 40 | 15.29 |
| Modified GNG | 2 | 0.73 |
| Modified GNG | 2 | 1.22 |
| Modified GNG | 3 | 2.17 |
| Modified GNG | 5 | 4.88 |
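The gap between the two methods in the table above can be summarised with a short calculation. The sketch below averages the listed iteration counts and times per method; the numbers are copied from the table, while the helper name `summarise` and the choice of a plain arithmetic mean are assumptions, since the record does not say how rows of the two methods pair up.

```python
from statistics import mean

# (iterations, seconds) pairs copied from the table above.
snake = [(20, 5.60), (40, 12.01), (50, 15.20), (40, 15.29)]
modified_gng = [(2, 0.73), (2, 1.22), (3, 2.17), (5, 4.88)]

def summarise(rows):
    """Return (mean iterations, mean seconds) for one method."""
    iterations, seconds = zip(*rows)
    return mean(iterations), mean(seconds)

snake_iters, snake_time = summarise(snake)
gng_iters, gng_time = summarise(modified_gng)

print(f"snake:        {snake_iters:.1f} iterations, {snake_time:.3f} s on average")
print(f"modified GNG: {gng_iters:.1f} iterations, {gng_time:.3f} s on average")
print(f"time ratio (snake / modified GNG): {snake_time / gng_time:.2f}")
```

On these figures the snake needs about 37.5 iterations and 12.0 s on average, against 3 iterations and 2.25 s for the modified GNG, a time ratio of roughly 5.3.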
Fig. 15 The two images on the left represent the raw data obtained from the low-cost Kinect sensor. The wire-frame representation generated by the original GNG is shown on the right
Fig. 16 GNG 3D surface reconstructions. 3D reconstruction of different hand poses obtained using the Kinect sensor
Fig. 17 GNG 3D surface reconstructions. 3D reconstruction of two 3D hand models generated using CAD software
Fig. 18 Mean error based on the number of neurons that compose the network
Fig. 19 GNG 3D reconstructions. Top: 3D face reconstruction from data obtained using the Kinect sensor. Bottom: 3D face reconstruction from data synthetically generated using the Blensor software