Zobeida Jezabel Guzman-Zavaleta, Claudia Feregrino-Uribe.
Abstract
Passive content fingerprinting is widely used for video content identification and monitoring. However, many challenges remain unsolved, especially for partial-copy detection. The main challenge is to balance the computational cost of fingerprint extraction against fingerprint dimension without compromising detection performance under various attacks (robustness). Fast video detection is desirable in several modern applications, for instance those that search large video databases or that require real-time detection of partial copies, a process whose difficulty increases when videos suffer severe transformations. In this context, conventional fingerprinting methods are not fully suited to cope with these attacks and transformations, either because they are not robust enough or because their execution time is very high, with the time bottleneck commonly found in the fingerprint extraction and matching operations. Motivated by these issues, in this work we propose a content fingerprinting method based on the extraction of a set of independent binary global and local fingerprints. Although each of these features is robust against common video transformations, their combination is more discriminant against severe video transformations such as signal processing attacks, geometric transformations, and temporal and spatial desynchronization. Additionally, we use an efficient multilevel filtering system that accelerates fingerprint extraction and matching. This system rapidly identifies potentially similar video copies, and the full fingerprint process is carried out only on those candidates, thus saving computational time. We tested with datasets of real copied videos, and the results show that our method outperforms state-of-the-art methods in detection scores.
Furthermore, the granularity of our method makes it suitable for partial-copy detection: it processes only short segments of 1 second in length.
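The multilevel filtering idea described in the abstract can be illustrated with a minimal sketch. It assumes that each video segment has one short global binary fingerprint and one longer local binary fingerprint, both stored as integers and compared by Hamming distance. The function names, the two-level layout, and the distance thresholds are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Fingerprints = Tuple[int, int]  # (global_fp, local_fp), both binary codes stored as ints


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two binary fingerprints."""
    return bin(a ^ b).count("1")


def cascade_match(
    query: Fingerprints,
    database: Dict[str, Fingerprints],
    global_max_dist: int = 4,   # assumed threshold, for illustration only
    local_max_dist: int = 8,    # assumed threshold, for illustration only
) -> List[str]:
    """Return ids of database entries that pass both filtering levels."""
    q_global, q_local = query
    # Level 1: cheap global comparison discards most non-matching entries.
    candidates = [
        vid for vid, (g, _) in database.items()
        if hamming(q_global, g) <= global_max_dist
    ]
    # Level 2: the costlier local comparison runs only on the survivors.
    return [
        vid for vid in candidates
        if hamming(q_local, database[vid][1]) <= local_max_dist
    ]


db = {
    "copy": (0b11110000, 0b1010101010101010),
    "near_global_only": (0b11110001, 0b0101010101010101),
    "unrelated": (0b00001111, 0b1010101010101010),
}
query = (0b11110000, 0b1010101010101011)
matches = cascade_match(query, db)
```

In this toy database only `"copy"` passes both levels: `"unrelated"` is rejected by the global check and never reaches the local comparison, which is where the time saving comes from.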
Year: 2016 PMID: 27861492 PMCID: PMC5115698 DOI: 10.1371/journal.pone.0166047
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Proposed visual fingerprint extraction.
Fingerprint weights.
| Th match | w_Th | CC match | w_CC | ORB match | w_ORB |
|---|---|---|---|---|---|
| 1 | 0.25 | 1 | 0.25 | 1 | 0.50 |
| 1 | 0.40 | 1 | 0.50 | −1 | - |
| 1 | 0.40 | 1 | 0.60 | 0 | - |
| 1 | 0.40 | 0 | - | 1 | 0.60 |

This table presents the selected weights w_Th, w_CC and w_ORB for each fingerprint set according to the matching level; each row corresponds to one matching level. The match columns give the match result, where 1 means the fingerprint matched, 0 means the fingerprint was not found, and −1 means the fingerprint did not match.
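As an illustration of how such weights could be combined, the sketch below sums the weights of the fingerprints that matched in each row of the table. The summation rule is an assumption made for this example; the fusion formula actually used by the method is not given in this excerpt. Missing weights (shown as "-" in the table) are represented as `None`.

```python
MATCHED, NOT_FOUND, NOT_MATCHED = 1, 0, -1

# One dict per table row: fingerprint name -> (match result, weight or None).
ROWS = [
    {"Th": (1, 0.25), "CC": (1, 0.25), "ORB": (1, 0.50)},
    {"Th": (1, 0.40), "CC": (1, 0.50), "ORB": (-1, None)},
    {"Th": (1, 0.40), "CC": (1, 0.60), "ORB": (0, None)},
    {"Th": (1, 0.40), "CC": (0, None), "ORB": (1, 0.60)},
]


def fused_score(row):
    """Assumed rule: add up the weights of the fingerprints that matched."""
    return sum(
        weight for match, weight in row.values()
        if match == MATCHED and weight is not None
    )


scores = [round(fused_score(row), 2) for row in ROWS]
```

Under this assumed rule, the weights in three of the four rows sum to 1, while the row in which ORB is explicitly not matched sums to 0.9.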
Attack parameters in the ReTRiEVED dataset.
| Attack ID | Attack | Parameters | ||||||
|---|---|---|---|---|---|---|---|---|
| D | Delay (ms) | 100 | 300 | 500 | 800 | 1000 | - | - |
| J | Jitter (ms) | 1 | 2 | 3 | 4 | 5 | - | - |
| PLR | Packet Loss Rate (%) | 0.1 | 0.4 | 1 | 3 | 5 | 8 | 10 |
| R | Throughput (Mbps) | 0.5 | 1 | 2 | 3 | 5 | - | - |
This table lists the parameters of the video attacks in the ReTRiEVED dataset, which are considered acceptable Quality of Service (QoS) parameters based on ITU recommendations, opinion models for video telephony applications, and the ETSI recommendation on speech and multimedia transmission quality [48].
Additional attacks on ReTRiEVED video dataset.
| Attack ID | Attack | Parameters |
|---|---|---|
| FLIP | Flip | Horizontal flip |
| DQ1 | Frame rate change | 20 fps |
| | Contrast change | +10% |
| | Noise addition | Gaussian: mean = 0; variance = 0.01 |
| DQ2 | Frame rate change | 20 fps |
| | Brightness change | +10% |
| | Cropping | 10% of the frame border |
| DQ3 | Frame dropping | 10% dropped |
| | Rotation | +5 |
| | Brightness change | +10% |
| FLIP + DQ2 | Flip | Horizontal flip |
| | Frame rate change | 20 fps |
| | Brightness change | +10% |
| | Cropping | 10% of the frame border |
| PROJ | Projection | |
| PROJ + DQ2 | Projection | |
| | Frame rate change | 20 fps |
| | Brightness change | +10% |
| | Cropping | 10% of the frame border |
| POST1 | Insertion of patterns | Standard image baboon.jpg 30x30 pixels |
| POST2 | Subtitles | 35 characters are inserted |
| POST3 | PiP | Original video resized to 90% at front |
| POST4 | PiP | Original video resized to 90% at front |
| | Insertion of patterns | Standard image baboon.jpg 30x30 pixels |
| | Subtitles | 35 characters are inserted |
Common simulated visual transformations used in state-of-the-art experimental results [14] and combinations of them.
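To make the spatial transformations in the table concrete, here is a minimal sketch that applies a horizontal flip, a brightness change, and a border crop to a grayscale frame stored as nested lists. It reads "+10%" as multiplying pixel values by 1.1 and "10% of the frame border" as removing 10% of the height and width from each side; both readings are assumptions, as are all function names. Frame rate change, which is temporal, is left out.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame: rows of pixel values in 0..255


def hflip(frame: Frame) -> Frame:
    """Mirror each row (horizontal flip)."""
    return [row[::-1] for row in frame]


def brighten(frame: Frame, pct: float = 10) -> Frame:
    """Scale pixel values by (1 + pct/100), clipped to 255 (assumed reading of '+10%')."""
    factor = 1 + pct / 100
    return [[min(255, round(p * factor)) for p in row] for row in frame]


def crop_border(frame: Frame, pct: float = 10) -> Frame:
    """Remove pct% of the height/width from every side (assumed reading of the table)."""
    h, w = len(frame), len(frame[0])
    dy, dx = int(h * pct / 100), int(w * pct / 100)
    return [row[dx:w - dx] for row in frame[dy:h - dy]]


def dq2_like(frame: Frame) -> Frame:
    """Spatial part of a DQ2-style attack: brightness change, then cropping."""
    return crop_border(brighten(frame))


frame = [[100] * 10 for _ in range(10)]
attacked = dq2_like(frame)  # 8x8 frame of brightened pixels
```

A combined attack such as FLIP + DQ2 would then be `dq2_like(hflip(frame))` under the same assumptions.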
F1-scores for delay attacks.
| Method | 100 ms | 300 ms | 500 ms | 800 ms | 1000 ms |
|---|---|---|---|---|---|
| CST-SURF | 0.5263 | 0.3283 | 0.3425 | 0.3384 | 0.3333 |
| ST&V Hash | 0.9818 | 0.9629 | 0.9642 | 0.9306 | 0.9622 |
| CC | 1 | 0.9818 | 0.9734 | 0.9615 | 0.9719 |
| Th + CC + ORB (our proposed method) | 0.9931 | 0.9933 | 0.9930 | 0.9929 | 0.9931 |
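For reference, the F1-score reported in these tables is the harmonic mean of precision and recall. The sketch below computes it from detection counts; the counts in the example are hypothetical and are not results from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall) = 2*TP / (2*TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        return 0.0  # convention chosen here: no detections and no ground truth gives 0
    return 2 * tp / denominator


# Hypothetical counts: 9 correct detections, 1 false alarm, 1 missed copy.
example = f1_score(tp=9, fp=1, fn=1)
```

A score of 1 therefore requires that there be neither false alarms nor missed copies.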
F1 detection scores for different fingerprints under common simulated attacks.
| Attack ID | CST-SURF | ST&V Hash | CC | Th + CC + ORB (our proposed method) |
|---|---|---|---|---|
| None | 1 | 1 | 1 | 1 |
| FLIP | 0.2727 | 0.3714 | 1 | 1 |
| DQ1 | 0.0740 | 0.9052 | 0.9278 | 1 |
| DQ2 | 0.2909 | 0.8536 | 0.9069 | 0.9911 |
| DQ3 | 0.2173 | 0.7941 | 0.9350 | 0.9916 |
| FLIP + DQ2 | 0.0377 | 0.2666 | 0.9494 | 1 |
| PROJ | 0 | 0.5714 | 0.9655 | 0.7407 |
| PROJ + DQ2 | 0.1290 | 0.5853 | 0.7659 | 0.8367 |
| POST1 | 1 | 0.9914 | 1 | 1 |
| POST2 | 0.9142 | 0.9821 | 1 | 1 |
| POST3 | 0.1714 | 0.6086 | 0.7450 | 0.7890 |
| POST4 | 0.1379 | 0.7142 | 0.6153 | 0.7865 |
F1-scores under throughput attack.
| Method | 0.5 Mbps | 1 Mbps | 2 Mbps | 3 Mbps | 5 Mbps |
|---|---|---|---|---|---|
| CST-SURF | 0.0895 | 0.3142 | 0.3188 | 0.3380 | 0 |
| ST&V Hash | 0.8965 | 0.9557 | 0.9642 | 0.9557 | 0.5915 |
| CC | 0.9593 | 0.9914 | 0.9913 | 0.9914 | 0.7341 |
| Th + CC + ORB (our proposed method) | 1 | 1 | 1 | 1 | 0.7475 |
Fig 2. Example of false positive alarm.
The figures correspond to matched keyframes of different videos in VCDB benchmark [10]: (a) Query video segment (baggio_penalty_1994), (b) (the_last_samurai_last_battle) correlation = 0.889; (c) (endless_love) correlation = 0.776; (d) corresponding keyframe of original video (baggio_penalty_1994) correlation = 0.674.
Execution times for different fingerprint extraction types.
| Method | Execution time (seconds) |
|---|---|
| CC | 0.0204 |
| ST&V Hash | 0.4373 |
| CST-SURF | 0.4898 |
| Th + CC + ORB (our proposed method) | 0.0440 |
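The relative cost of each method can be read off the table with a short calculation. The sketch below divides each execution time by that of the proposed method; the dictionary keys are shortened labels chosen for this example.

```python
# Execution times in seconds, copied from the table above.
times = {
    "CC": 0.0204,
    "ST&V Hash": 0.4373,
    "CST-SURF": 0.4898,
    "Th + CC + ORB": 0.0440,
}

proposed = times["Th + CC + ORB"]
# Ratio > 1 means the method is slower than the proposed one.
ratios = {name: round(t / proposed, 2) for name, t in times.items()}
```

By these figures, the proposed extraction is roughly 11 times faster than CST-SURF and roughly 10 times faster than ST&V Hash, while taking a little more than twice as long as CC alone.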
F1-scores for PLR attacks.
| Method | 0.1% | 0.4% | 1% | 3% | 5% | 8% | 10% |
|---|---|---|---|---|---|---|---|
| CST-SURF | 0.4383 | 0.1587 | 0.3283 | 0.4657 | 0.3188 | 0.1311 | 0.1818 |
| ST&V Hash | 0.9345 | 0.9259 | 0.9230 | 0.9818 | 0.9158 | 0.9038 | 0.8785 |
| CC | 0.9821 | 0.9824 | 0.9909 | 0.9724 | 0.9824 | 0.9541 | 0.9473 |
| Th + CC + ORB (our proposed method) | 0.9929 | 0.9861 | 0.9930 | 1 | 1 | 0.9931 | 0.9932 |
F1-scores for jitter attack.
| Method | 1 ms | 2 ms | 3 ms | 4 ms | 5 ms |
|---|---|---|---|---|---|
| CST-SURF | 0.5066 | 0.1791 | 0.0563 | 0.0563 | 0.0540 |
| ST&V Hash | 0.9818 | 0.9391 | 0.8688 | 0.8205 | 0.8000 |
| CC | 0.9532 | 0.9009 | 0.8960 | 0.8870 | 0.8292 |
| Th + CC + ORB (our proposed method) | 0.9929 | 0.9932 | 0.9801 | 0.9664 | 0.9655 |