Abstract
This paper analyzes the construction of a cross-media collaborative filtering neural network and uses it to design a deep model for fast video click-through rate prediction. The image, behavioral, and audio features of short videos are extracted directly and used as the video feature representation, so the model considers more video information than other models. Experimental results show that the model incorporating multimodal features achieves a higher AUC than models without them. Because recurrent neural networks are well suited to processing sequence information, we incorporate them into the deep-width model to compensate for the original model's limited ability to learn dependencies in user sequence data. We further propose a deep-width model based on an attention mechanism that models users' historical behaviors and explores how different historical behaviors influence current behavior. Data augmentation is used to handle user behavior sequences that are too short. Historical behavior sequences are introduced through both the input layer and the top connection. Compared with models commonly used in recent years, the proposed model improves AUC, accuracy, and log loss.
Year: 2022 PMID: 35685157 PMCID: PMC9173947 DOI: 10.1155/2022/4951912
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. Flowchart of extensive data collection, processing, and storage of transmedia technology.
Figure 2. Collaborative filtering model of convolutional neural network with outer vector product.
Audio feature extraction results.
| Feature number | Feature name | Description |
|---|---|---|
| 1 | Chroma deviation | Standard deviation of the 12 chroma coefficients |
| 2 | Chroma vector | A 12-element representation of the spectral energy, where the elements correspond to the 12 equal-tempered pitch classes (semitone spacing) of Western music |
| 3 | Mel-frequency cepstral coefficients (MFCCs) | Coefficients forming a cepstral representation in which the frequency bands are not linear but distributed according to the Mel scale |
| 4 | Spectral roll-off point | The frequency below which 90% of the spectrum's amplitude distribution is concentrated |
| 5 | Spectral flux | The squared difference between the normalized amplitudes of the spectra of two consecutive frames |
| 6 | Spectral entropy | The entropy of the normalized spectral energies of a set of subframes |
| 7 | Spectral spread | The second central moment of the spectrum |
| 8 | Spectral centroid | The center of gravity of the spectrum |
| 9 | Energy entropy | The entropy of the normalized energies of the subframes, which can be interpreted as a measure of abrupt changes |
| 10 | Energy | The sum of squares of the signal values, normalized by the corresponding frame length |
| 11 | Zero-crossing rate | The rate of sign changes of the signal during a given frame |
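
Several of the frame-level descriptions in the table translate directly into a few lines of array code. The following is a minimal NumPy sketch, not the authors' extraction pipeline: it computes the rate of sign change, the energy, and the spectrum's center of gravity, second central moment, and 90% roll-off point for a single frame. The function names, the use of the FFT magnitude as the "spectrum", and the choice of Hz as the frequency unit are assumptions made for illustration.

```python
import numpy as np


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    negative = np.signbit(frame)
    return np.count_nonzero(negative[1:] != negative[:-1]) / (len(frame) - 1)


def energy(frame):
    """Sum of squared signal values, normalized by the frame length."""
    return np.sum(frame ** 2) / len(frame)


def spectral_features(frame, sample_rate, rolloff=0.90):
    """Return (centroid_hz, second_central_moment_hz2, rolloff_hz).

    Assumption: the magnitude of the real FFT serves as the spectrum,
    normalized so that it behaves like a distribution over frequency.
    """
    magnitude = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    cumulative = np.cumsum(magnitude)
    total = cumulative[-1]
    if total == 0.0:  # silent frame: nothing to measure
        return 0.0, 0.0, 0.0

    weights = magnitude / total
    # Center of gravity of the spectrum.
    centroid = np.sum(freqs * weights)
    # Second central moment, as in the table's definition; some toolkits
    # report its square root instead.
    second_moment = np.sum(((freqs - centroid) ** 2) * weights)
    # First frequency bin at which the cumulative magnitude reaches the
    # requested share (90% by default) of the total.
    rolloff_index = np.searchsorted(cumulative, rolloff * total)
    return centroid, second_moment, freqs[rolloff_index]
```

For a frame holding a single steady tone, the centroid and roll-off point should both sit at the tone's frequency and the second central moment should be close to zero, which makes a pure sine wave a convenient sanity check before applying the functions to real audio.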
Figure 3. Network structure of the click-through rate prediction model.
Figure 4. Comparison of disambiguation effects of different classification models with separate feature extraction.
Figure 5. NCF performance on the MovieLens-100k dataset.
Figure 6. Experimental results of MMIE and SDIN at different sequence lengths.
Figure 7. Comparison of test results of different components.