Hu Wang, Jianpeng Hu.
Abstract
Automatic lecture recording is an appealing alternative to manual recording when producing online courses because it substantially reduces labor costs. The key component of an automatic recording system is lecturer tracking, and existing automatic tracking methods tend to lose the target when the lecturer moves rapidly. This article proposes a lecturer tracking system based on MobileNet-SSD face detection and Pedestrian Dead Reckoning (PDR) to solve this problem. First, a particle filter fuses the PDR information with the rotation angle of the Pan-Tilt (PT) camera, which improves detection accuracy during tracking. In addition, to improve face detection performance on the edge device, we use the OpenVINO toolkit to optimize the inference speed of the Convolutional Neural Network (CNN) before deploying the model. Further, when the lecturer moves beyond the camera's field of view, the PDR auxiliary module is enabled to recapture the target automatically. We built the entire lecture recording system from scratch and ran the experiments in real lectures. The experimental results show that our system outperforms systems without a PDR module in both accuracy and robustness.
Keywords: Face detection; Particle filter; Pedestrian dead reckoning (PDR); Wireless communication
Year: 2022 PMID: 35634127 PMCID: PMC9137921 DOI: 10.7717/peerj-cs.971
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1. Positioning method based on particle filter.
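As a rough illustration of the fusion described in the abstract, the sketch below runs one predict/update/resample cycle of a particle filter over the lecturer's lateral position. It is a minimal sketch and not the authors' implementation: the 1-D state, the measurement model `camera_dist_m * tan(pan_angle_rad)`, the noise parameters, and all function names are assumptions made for illustration.

```python
import math
import random

def gaussian_likelihood(error, sigma):
    """Unnormalised Gaussian likelihood of a measurement error."""
    return math.exp(-0.5 * (error / sigma) ** 2)

def particle_filter_step(particles, pdr_dx, pan_angle_rad, camera_dist_m,
                         motion_sigma=0.05, meas_sigma=0.10, rng=random):
    """One predict/update/resample cycle for a 1-D lateral position.

    particles     : list of candidate positions (m) along the lecturer's path
    pdr_dx        : displacement reported by PDR since the last cycle (m)
    pan_angle_rad : camera pan angle when the face is centred (rad)
    camera_dist_m : perpendicular distance from camera to the path (m)
    All parameter names and default noise values are hypothetical.
    """
    # Predict: move every particle by the PDR displacement plus noise.
    predicted = [p + pdr_dx + rng.gauss(0.0, motion_sigma) for p in particles]

    # Update: weight by agreement with the position implied by the pan angle.
    measured = camera_dist_m * math.tan(pan_angle_rad)
    weights = [gaussian_likelihood(p - measured, meas_sigma) for p in predicted]
    if sum(weights) == 0.0:
        # Every particle is far from the measurement; fall back to uniform.
        weights = [1.0] * len(predicted)

    # Resample in proportion to the weights and report the mean as estimate.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    estimate = sum(resampled) / len(resampled)
    return resampled, estimate
```

Calling `particle_filter_step` repeatedly with the latest PDR displacement and pan angle keeps the particle cloud concentrated around positions that both sources agree on.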
Figure 2. PDR principle block diagram.
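The block diagram itself did not survive extraction, so the following is only the textbook form of a PDR position update: each detected step advances the position by an estimated step length along the current heading. The function names, the heading convention (clockwise from north), and the 0.7 m step length in the usage lines are assumptions, not values from the paper.

```python
import math

def pdr_step(x, y, step_length_m, heading_rad):
    """Advance a 2-D position by one detected step.

    heading_rad is measured clockwise from north (the +y axis),
    so east corresponds to the +x axis.
    """
    x_new = x + step_length_m * math.sin(heading_rad)
    y_new = y + step_length_m * math.cos(heading_rad)
    return x_new, y_new

def pdr_track(start, steps):
    """Accumulate a path from (step_length_m, heading_rad) pairs."""
    path = [start]
    x, y = start
    for length, heading in steps:
        x, y = pdr_step(x, y, length, heading)
        path.append((x, y))
    return path

# Two steps east followed by one step north, each 0.7 m (hypothetical).
path = pdr_track((0.0, 0.0), [(0.7, math.pi / 2), (0.7, math.pi / 2), (0.7, 0.0)])
print(path[-1])
```

Because every step adds its own length and heading error, the estimate drifts over time, which is why dead reckoning is usually paired with a second source of position information.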
Figure 3. Schematic diagram of the coordinate systems (left: mobile phone coordinate system; right: Earth coordinate system).
Sensor parameters.
| Sensor type | Output data and illustration |
|---|---|
| Rotation-vector sensor | |
Figure 4. Heading angle of different carrying modes.
Figure 5. The framework of the proposed system.
Figure 6. Network implementation of the MobileNet.
Figure 7. OpenVINO optimization process.
Comparison of detection accuracy.
| Model | Shape for input | Backbone | AP (%) |
|---|---|---|---|
| Face-detection-adas-0001 | | MobileNet | 94.1 |
| Face-detection-adas-binary-0001 | | MobileNet | 91.9 |
| Face-detection-retail-0001 | | SqueezeNet-light+SSD | 83 |
Figure 8. Camera's field of view.
Figure 9. Control strategy of the PT camera.
Figure 10. Experimental environment.
Figure 11. Schematic diagram of the experimental scene.
The important distance.
| Scenario | The distance between the PT camera and the lecturer (m) | The range of the lecturer’s movement (m) |
|---|---|---|
| Conference room | 1.2 | 3.0 |
| Laboratory | 1.5 | 4.2 |
| Classroom | 1.8 | 4.8 |
Figure 12. Devices used in experiments.
(A) Hardware connection diagram. (B) App interface.
Hardware and software platforms.
| Hardware | Software |
|---|---|
| Arduino Uno R3 | Arduino IDE 1.8.13 |
| Servo motor HWZ020 | |
| Philips web camera | |
| MI 11 phone | |
| Adjustable voltage regulator chip | |
| Power supply battery | |
Results in three different scenarios.
| | | | | | |
|---|---|---|---|---|---|
| Video | Recorded in the conference room | ||||
| Video1 | 1,355 | 920 | 1,250 | 67.89 | 92.25 |
| Video2 | 1,840 | 1,160 | 1,625 | 63.03 | 88.31 |
| Video3 | 2,320 | 1,525 | 2,105 | 65.72 | 90.73 |
| Video4 | 1,785 | 1,150 | 1,595 | 64.43 | 89.36 |
| Video5 | 2,585 | 1,710 | 2,320 | 66.15 | 89.75 |
| Video6 | 2,465 | 1,595 | 2,230 | 64.71 | 90.47 |
| Average | 2,058 | 1,343 | 1,854 | 65.32 | 90.15 |
| Video | Recorded in the laboratory | ||||
| Video7 | 1,300 | 810 | 1,150 | 62.30 | 88.46 |
| Video8 | 2,390 | 1,525 | 2,035 | 63.81 | 85.15 |
| Video9 | 2,030 | 1,245 | 1,775 | 61.33 | 87.44 |
| Video10 | 1,985 | 1,090 | 1,670 | 54.91 | 84.13 |
| Video11 | 2,230 | 1,255 | 1,905 | 56.28 | 85.43 |
| Video12 | 1,920 | 1,135 | 1,685 | 59.11 | 87.76 |
| Average | 1,976 | 1,176 | 1,703 | 59.62 | 86.40 |
| Video | Recorded in the classroom | ||||
| Video13 | 1,200 | 785 | 1,095 | 65.42 | 91.25 |
| Video14 | 1,345 | 870 | 1,235 | 64.68 | 91.82 |
| Video15 | 2,175 | 1,285 | 1,875 | 59.08 | 86.21 |
| Video16 | 2,670 | 1,665 | 2,435 | 62.36 | 91.20 |
| Video17 | 2,370 | 1,435 | 2,075 | 60.55 | 87.55 |
| Video18 | 1,875 | 1,130 | 1,660 | 60.27 | 88.53 |
| Average | 1,939 | 1,195 | 1,729 | 62.06 | 89.42 |
Figure 13. Distance analysis.
The horizontal field of view range.
| The field of view | ||||
|---|---|---|---|---|
| Conference room | 1.2 | 1.330 | 29 | 3.0 |
| Laboratory | 1.5 | 1.662 | 29 | 4.2 |
| Classroom | 1.8 | 1.996 | 29 | 4.8 |
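The column headers of this table were lost, but the numbers are consistent with a simple geometric relation: the third column agrees, to within about 0.001 m, with `2 * d * tan(29°)`, where `d` is the camera-to-lecturer distance in the second column. Reading 29 as a half-angle in degrees is an inference from the values, not something the surviving text states, and the function name below is hypothetical.

```python
import math

def horizontal_view_width(distance_m, half_angle_deg=29.0):
    """Width (m) of the strip visible at distance_m, assuming a symmetric
    horizontal field of view with the given half-angle (an assumption)."""
    return 2.0 * distance_m * math.tan(math.radians(half_angle_deg))

# Camera-to-lecturer distances taken from the table above.
for name, d in [("Conference room", 1.2), ("Laboratory", 1.5), ("Classroom", 1.8)]:
    print(f"{name}: {horizontal_view_width(d):.3f} m")
```

Under this reading, the visible width in every scenario is well below the lecturer's movement range in the last column (3.0 m, 4.2 m, 4.8 m), so a fixed camera could not cover the whole range without panning.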
Error in three scenarios.
| Scenario | Runtime (min) | Slight error count | Severe error count | Duration (s) |
|---|---|---|---|---|
| Conference room | 10 | 4 | 2 | 42 |
| Classroom | 10 | 4 | 2 | 50 |
| Laboratory | 10 | 5 | 3 | 62 |
Impact of carrying mode on the system.
| | | | | | |
|---|---|---|---|---|---|
| Video | Recorded in the coat pocket mode | ||||
| Video1 | 1,420 | 850 | 1,210 | 59.86 | 85.21 |
| Video2 | 1,990 | 1,195 | 1,675 | 60.05 | 84.17 |
| Video3 | 2,050 | 1,205 | 1,725 | 58.78 | 84.15 |
| Video4 | 1,750 | 1,075 | 1,515 | 61.43 | 86.57 |
| Video5 | 2,155 | 1,275 | 1,855 | 59.16 | 86.07 |
| Video6 | 2,415 | 1,455 | 2,100 | 60.25 | 86.96 |
| Average | 1,963 | 1,175 | 1,680 | 59.92 | 85.52 |
| Video | Recorded in the pant pocket mode | ||||
| Video7 | 1,300 | 765 | 1,095 | 58.85 | 84.23 |
| Video8 | 1,645 | 970 | 1,405 | 58.97 | 85.41 |
| Video9 | 2,055 | 1,195 | 1,760 | 58.15 | 85.64 |
| Video10 | 2,470 | 1,405 | 2,035 | 56.88 | 82.39 |
| Video11 | 2,320 | 1,335 | 1,975 | 57.30 | 84.76 |
| Video12 | 1,875 | 1,060 | 1,570 | 56.53 | 83.73 |
| Average | 1,944 | 1,121 | 1,640 | 57.78 | 84.36 |
Videos without PDR auxiliary module.
| Video | System without PDR auxiliary system in classroom | ||||
|---|---|---|---|---|---|
| | | | | | |
| Video1 | 1,300 | 635 | 785 | 48.89 | 60.39 |
| Video2 | 1,890 | 860 | 1,100 | 45.87 | 58.20 |
| Video3 | 2,080 | 990 | 1,240 | 49.09 | 59.62 |
| Video4 | 2,445 | 1,155 | 1,425 | 45.67 | 58.28 |
| Video5 | 1,875 | 875 | 1,130 | 47.08 | 60.27 |
| Video6 | 2,075 | 945 | 1,235 | 45.48 | 59.52 |
| Average | 1,944 | 910 | 1,152 | 47.01 | 59.38 |
The comparison of error count.
| System | Runtime (min) | Slight error count | Severe error count | Duration (s) |
|---|---|---|---|---|
| Proposed system | 10 | 4 | 2 | 50 |
| System without PDR module | 10 | 8 | 6 | 142 |
Figure 14. Positioning error analysis of the particle filter algorithm.
Comparative analysis of existing lecturer tracking approaches.
| Technology | Advantages | Disadvantages |
|---|---|---|
| Multi-camera | Easy to deploy for indoor and outdoor localization | Multi-camera fusion requires post-processing |
| Panoramic camera | Simple to set up and low cost | Image distortion needs to be corrected |
| WiFi-PDR integrated indoor positioning | Low cost, high positioning accuracy | Requires offline WiFi data collection; not real-time |
| Camera and IR thermal sensors | Low cost, real-time with good performance | Difficult to deploy; thermal sensors are sensitive to temperature |
| Siamese Fully Convolutional Classification and Regression | Fast and accurate | Demands high-performance equipment |
| PT camera and smartphone (the proposed system) | Portable, low cost, robust, and easy to deploy | Single-face detection only, which may lead to temporary tracking failure |