| Literature DB >> 35909490 |
Yousef A Alohali, Mahmoud S Fayed, Tamer Mesallam, Yassin Abdelsamad, Fida Almuhawas, Abdulrahman Hagr.
Abstract
One of the most widely used measures of scientific impact is the number of citations. However, because of their heavy-tailed distribution, citation counts are fundamentally difficult to predict. This study aimed to investigate the factors influencing the citation count of scientific papers in the otology field. To that end, this work proposes a solution that uses machine learning and natural language processing to process English text and predict a paper's citation count. Several algorithms are implemented in this solution, including linear regression, boosted decision tree, decision forest, and neural networks. Neural network regression revealed that a paper's abstract has the greatest influence on the citation count of otological articles. The solution was developed with visual programming, using Microsoft Azure Machine Learning on the back end and Programming Without Coding Technology on the front end. We recommend using machine learning models to improve the abstracts of research articles in order to attract more citations.
Year: 2022 PMID: 35909490 PMCID: PMC9329008 DOI: 10.1155/2022/2239152
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.246
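The described pipeline (n-gram features extracted from paper text, fed to a regression model that predicts citation counts) can be sketched as follows. This is a minimal stand-in using scikit-learn; the paper itself used Microsoft Azure Machine Learning, and the titles and citation counts below are illustrative, not real data.

```python
# Hypothetical sketch: n-gram features from paper titles feed a
# regression model that predicts citation counts. RandomForestRegressor
# stands in for the paper's decision forest regression module.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestRegressor

titles = [
    "Cochlear implantation outcomes in adults",
    "Spiral ganglion neuron survival after deafness",
    "Language development in children with cochlear implants",
    "Acoustic stimulation and auditory nerve response",
]
citations = [120, 15, 240, 60]  # illustrative counts, not real data

# Unigrams and bigrams, mirroring the n-gram features in the tables below.
vectorizer = CountVectorizer(ngram_range=(1, 2), lowercase=True)
X = vectorizer.fit_transform(titles)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, citations)

pred = model.predict(
    vectorizer.transform(["Cochlear implants and language development"])
)
```

A forest's prediction is an average of training targets, so it always falls within the observed citation range; this bounded behavior is one reason tree ensembles handle heavy-tailed counts more gracefully than linear models.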
Some of the title n-grams with positive weights.
| Feature | Weight |
|---|---|
| Preprocessed TI.[ganglion] | 915.36 |
| Preprocessed TI.[speak_language] | 188.11 |
| Preprocessed TI.[chronic] | 180.89 |
| Preprocessed TI.[ear] | 175.04 |
| Preprocessed TI.[implantation] | 141.14 |
| Preprocessed TI.[acoustic_stimulation] | 138.86 |
| Preprocessed TI.[implication_cochlear] | 110.82 |
| Preprocessed TI.[perception_cochlear] | 94.83 |
| Preprocessed TI.[adult_use] | 86.15 |
| Preprocessed TI.[affect] | 78.60 |
| Preprocessed TI.[auditory_nerve] | 73.73 |
| Preprocessed TI.[development] | 72.84 |
| Preprocessed TI.[language_development] | 70.51 |
| Preprocessed TI.[use_cochlear] | 66.22 |
| Preprocessed TI.[deafness] | 55.75 |
| Preprocessed TI.[skill] | 53.03 |
| Preprocessed TI.[electrical] | 47.46 |
| Preprocessed TI.[depth] | 45.88 |
| Preprocessed TI.[speak] | 42.75 |
Some of the title n-grams with negative weights.
| Feature | Weight |
|---|---|
| Preprocessed TI.[electrode_insertion] | -90.47 |
| Preprocessed TI.[assessment] | -90.87 |
| Preprocessed TI.[profound] | -92.07 |
| Preprocessed TI.[stimulation_auditory] | -93.67 |
| Preprocessed TI.[study] | -94.31 |
| Preprocessed TI.[ganglion_neuron] | -96.39 |
| Preprocessed TI.[implant_patient] | -99.88 |
| Preprocessed TI.[nerve] | -103.80 |
| Preprocessed TI.[congenital] | -106.46 |
| Preprocessed TI.[early_cochlear] | -161.79 |
| Preprocessed TI.[cochlear_implantation] | -164.26 |
| Preprocessed TI.[child_use] | -172.33 |
| Preprocessed TI.[implant_user] | -172.34 |
| Preprocessed TI.[spiral] | -453.88 |
| Preprocessed TI.[spiral_ganglion] | -453.88 |
Figure 1: Title n-gram word art.
Performance of different models predicting total citations from the paper title.
| Algorithm | Mean absolute error | Root mean squared error | Relative absolute error | Relative squared error | Coefficient of determination |
|---|---|---|---|---|---|
| Linear regression | 58.73 | 80.43 | 1.21 | 1.41 | -0.41 |
| Boosted decision tree regression | 48.67 | 70.15 | 1.00 | 1.07 | -0.07 |
| Decision forest regression | 46.12 | 69.45 | 0.95 | 1.05 | -0.05 |
| Neural network | 60.23 | 87.51 | 1.24 | 1.67 | -0.67 |
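The five metrics in the table above are tightly related: relative absolute error (RAE) and relative squared error (RSE) compare the model's errors against a baseline that always predicts the mean, and the coefficient of determination is exactly R² = 1 − RSE (e.g. RSE 1.41 gives R² −0.41, as in the linear regression row; a negative R² means the model is worse than simply predicting the mean). A sketch of how these are computed:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute the five metrics reported in the evaluation tables."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    baseline = y_true - y_true.mean()   # errors of a mean-only predictor
    mae = np.abs(err).mean()
    rmse = np.sqrt((err ** 2).mean())
    rae = np.abs(err).sum() / np.abs(baseline).sum()
    rse = (err ** 2).sum() / (baseline ** 2).sum()
    r2 = 1.0 - rse                      # coefficient of determination
    return mae, rmse, rae, rse, r2

# Toy example with three papers (illustrative values only).
mae, rmse, rae, rse, r2 = regression_metrics([10, 50, 200], [30, 40, 150])
```

Since RSE > 1 exactly when the model's squared error exceeds the mean predictor's, every RSE above 1.0 in these tables forces a negative R².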
Error in citation count.
| Error | Percentage of citation range (523 citations) | Paper count | Percentage of testing papers (150 papers) |
|---|---|---|---|
| ≤10 citations | 1.9% | 29 papers | 19.33% |
| ≤40 citations | 7.6% | 98 papers | 65.33% |
| ≤80 citations | 15.29% | 125 papers | 83.33% |
| ≤100 citations | 19.12% | 131 papers | 87.33% |
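Each row of the error table counts the test papers whose absolute prediction error falls within a threshold, and expresses that threshold as a share of the overall citation range (523 citations). A sketch of the tabulation, with made-up predictions:

```python
import numpy as np

def error_coverage(y_true, y_pred, thresholds=(10, 40, 80, 100),
                   citation_range=523):
    """For each error threshold, report how many predictions fall
    within that many citations of the truth, as in the error tables."""
    abs_err = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    rows = []
    for t in thresholds:
        count = int((abs_err <= t).sum())
        rows.append({
            "threshold": t,
            "pct_of_range": round(100 * t / citation_range, 2),
            "papers": count,
            "pct_of_papers": round(100 * count / len(abs_err), 2),
        })
    return rows

# Four hypothetical test papers (true vs. predicted citations).
rows = error_coverage([100, 50, 10, 300], [95, 90, 25, 180])
```

Note that a 10-citation error is only 1.9% of the range but still misses most papers, because most citation counts cluster far below the maximum.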
Figure 2: RMSE for the different models using title n-grams.
Some of the abstract n-grams with positive weights.
| Feature | Weight |
|---|---|
| Preprocessed AB.[specimen] | 215.01 |
| Preprocessed AB.[chronic] | 201.60 |
| Preprocessed AB.[expression] | 188.20 |
| Preprocessed AB.[individual] | 180.01 |
| Preprocessed AB.[largely] | 150.27 |
| Preprocessed AB.[direct] | 137.41 |
| Preprocessed AB.[age] | 135.98 |
| Preprocessed AB.[refer] | 130.23 |
| Preprocessed AB.[language_development] | 128.61 |
| Preprocessed AB.[hear_aid] | 128.15 |
| Preprocessed AB.[detection] | 123.85 |
| Preprocessed AB.[place] | 119.44 |
| Preprocessed AB.[base] | 117.42 |
| Preprocessed AB.[point] | 116.75 |
| Preprocessed AB.[listen] | 115.41 |
| Preprocessed AB.[excellent] | 110.98 |
| Preprocessed AB.[widely] | 110.75 |
| Preprocessed AB.[English] | 110.69 |
| Preprocessed AB.[psychological] | 109.72 |
Some of the abstract n-grams with negative weights.
| Feature | Weight |
|---|---|
| Preprocessed AB.[amplitude] | -77.38 |
| Preprocessed AB.[regard] | -77.63 |
| Preprocessed AB.[world] | -77.78 |
| Preprocessed AB.[occur] | -78.17 |
| Preprocessed AB.[normal] | -81.00 |
| Preprocessed AB.[aid_condition] | -82.22 |
| Preprocessed AB.[profound_deafness] | -82.57 |
| Preprocessed AB.[child_use] | -82.95 |
| Preprocessed AB.[potential_record] | -85.43 |
| Preprocessed AB.[overall] | -87.01 |
| Preprocessed AB.[child_implant] | -90.88 |
| Preprocessed AB.[outcome] | -93.81 |
| Preprocessed AB.[month_implantation] | -95.26 |
| Preprocessed AB.[receptive] | -95.41 |
| Preprocessed AB.[frequency_information] | -96.82 |
| Preprocessed AB.[treat] | -96.91 |
| Preprocessed AB.[distort] | -102.61 |
| Preprocessed AB.[achieve] | -107.08 |
| Preprocessed AB.[implant_year] | -111.90 |
| Preprocessed AB.[post] | -117.16 |
| Preprocessed AB.[old] | -117.89 |
| Preprocessed AB.[site] | -124.55 |
Figure 3: Abstract n-gram word art.
Performance of different models predicting total citations from the paper abstract.
| Algorithm | Mean absolute error | Root mean squared error | Relative absolute error | Relative squared error | Coefficient of determination |
|---|---|---|---|---|---|
| Linear regression | 51.49 | 68.56 | 1.25 | 1.30 | -0.30 |
| Boosted decision tree regression | 47.87 | 66.00 | 1.16 | 1.21 | -0.21 |
| Decision forest regression | 42.75 | 63.53 | 1.04 | 1.12 | -0.12 |
| Neural network | 40.48 | 62.76 | 0.98 | 1.09 | -0.09 |
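For abstracts, the neural network achieves the lowest RMSE of any feature/model pair. A minimal stand-in using scikit-learn's MLPRegressor over n-gram counts follows; the architecture of the paper's Azure ML network is not specified, so the layer size here is an assumption, and the abstracts and counts are illustrative.

```python
# Hypothetical sketch: neural network regression over abstract n-grams.
# MLPRegressor stands in for the paper's Azure ML neural network module.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPRegressor

abstracts = [
    "Language development after cochlear implantation in children",
    "Spiral ganglion specimen expression analysis in chronic cases",
    "Hearing aid detection and listening outcomes in adults",
    "Auditory nerve amplitude recordings post implantation",
]
citations = [180, 40, 95, 20]  # illustrative counts, not real data

vec = CountVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(abstracts).toarray()

# Single hidden layer of 32 units -- an assumed architecture.
net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
net.fit(X, citations)

pred = net.predict(vec.transform(["language development in children"]).toarray())
```

Unlike a forest, an MLP can extrapolate outside the observed citation range, which is one reason its errors tend to be larger on small or noisy text datasets unless regularized.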
Error in citation count.
| Error | Percentage of citation range (523 citations) | Paper count | Percentage of testing papers (145 papers) |
|---|---|---|---|
| ≤10 citations | 1.9% | 46 papers | 31.72% |
| ≤40 citations | 7.6% | 93 papers | 64.13% |
| ≤80 citations | 15.29% | 127 papers | 87.58% |
| ≤100 citations | 19.12% | 135 papers | 93.1% |
Figure 4: RMSE for the different models using abstract n-grams.
Performance of different models predicting total citations from the paper authors.
| Algorithm | Mean absolute error | Root mean squared error | Relative absolute error | Relative squared error | Coefficient of determination |
|---|---|---|---|---|---|
| Linear regression | 50.12 | 69.58 | 1.049 | 1.12 | -0.12 |
| Boosted decision tree regression | 45.34 | 65.79 | 0.949 | 1.00 | -0.00 |
| Decision forest regression | 46.94 | 67.36 | 0.98 | 1.05 | -0.05 |
| Neural network | 49.84 | 70.19 | 1.04 | 1.14 | -0.14 |
Error in citation count.
| Error | Percentage of citation range (523 citations) | Paper count | Percentage of testing papers (150 papers) |
|---|---|---|---|
| ≤10 citations | 1.9% | 23 papers | 15.33% |
| ≤40 citations | 7.6% | 90 papers | 60% |
| ≤80 citations | 15.29% | 130 papers | 86.66% |
| ≤100 citations | 19.12% | 136 papers | 90.66% |
Figure 5: RMSE for the different models using author n-grams.
Figure 6: Citation prediction application: main window.
Figure 7: Citation prediction application: dataset window.
Figure 8: Inserting data from the dataset window to the main window.
The best algorithms and the corresponding RMSE.
| Feature | Best algorithm | Root mean squared error |
|---|---|---|
| Title | Decision forest regression | 69.45 |
| Abstract | Neural network | 62.76 |
| Authors | Boosted decision tree regression | 65.79 |
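The table above pairs each feature with the algorithm that achieved the lowest RMSE on it. That selection can be sketched directly from the values reported earlier:

```python
# Selecting the best algorithm per feature by minimum RMSE,
# using the values reported in the tables above.
results = {
    "Title": {"Linear regression": 80.43, "Boosted decision tree": 70.15,
              "Decision forest": 69.45, "Neural network": 87.51},
    "Abstract": {"Linear regression": 68.56, "Boosted decision tree": 66.00,
                 "Decision forest": 63.53, "Neural network": 62.76},
    "Authors": {"Linear regression": 69.58, "Boosted decision tree": 65.79,
                "Decision forest": 67.36, "Neural network": 70.19},
}

best = {feat: min(scores, key=scores.get) for feat, scores in results.items()}
```

The abstract paired with a neural network yields the lowest RMSE overall (62.76), which is what motivates the paper's recommendation to focus on abstracts.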
Figure 9: RMSE achieved by each algorithm.
Number of n-grams for each feature.
| Feature used in prediction | Number of n-grams |
|---|---|
| Title | 164 |
| Abstract | 1714 |
| Authors | 94 |
Figure 10: Number of n-grams for each feature.