Random number datasets generated from statistical analysis of randomly sampled GSM recharge cards.

Hilary I Okagbue, Abiodun A Opanuga, Pelumi E Oguntunde, Paulinus O Ugwoke.

Abstract

In this article, random number datasets were generated from random samples of used GSM (Global System for Mobile Communications) recharge cards. Statistical analyses were performed to refine the raw data into random number datasets arranged in a table. A detailed description of the method and the relevant tests of randomness are also provided.

Keywords:  Chi-square tests; GSM recharge cards; Random number tables; Randomness

Year:  2016        PMID: 28018947      PMCID: PMC5167235          DOI: 10.1016/j.dib.2016.12.003

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the data

- The data can be used for educational purposes.
- The method of generating the data, and the data itself, will be very helpful in low and middle-income countries where there is little or no access to computational random number generators for scientific purposes. In those countries access to recharge cards is more likely than access to the internet, because erratic power supply or inadequate internet infrastructure makes it difficult to download large datasets. Even when mobile phones are available, they may not be able to handle the volume of the datasets. See [1].
- In low and middle-income countries, used recharge cards can be obtained at little or no cost.
- This research offers a cheap way of leveraging the strong computational algorithms that GSM companies use to generate random datasets. These algorithms are assets through which revenue is accrued from the sale of recharge cards.

Data

The datasets are the table of random numbers in the raw Excel file and the same data grouped in four digits in the PDF file. The statistical tests for randomness indicate how much confidence can be placed in the reliability of the data for any given purpose.

Experimental design, materials and methods

Every attempt to construct a random number table must ensure that the entries are independent across rows and columns. Furthermore, the data must not follow any discernible pattern. See [2], [3], [4], [5], [6], [7], [8], [9], [10] for details on other methodologies and results. Used recharge cards of a GSM network operator were chosen because recharge cards are produced by strong computational algorithms that are programmed to generate random digits and numbers. The steps undertaken to obtain the table of random number datasets are listed below in detail.

Step 1: A random sample of used recharge cards from a particular GSM network was taken. 380 cards were obtained, each carrying a 16-digit number. The number of occurrences of each digit (0–9) at each of the 16 positions (a–p) is tabulated to show the frequency distribution (Table 1).
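Table 1. The frequency distribution of the digits of the raw datasets.

| Digit | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 47 | 47 | 42 | 38 | 30 | 42 | 35 | 31 | 34 | 37 | 35 | 42 | 35 | 37 | 37 | 48 | 617 |
| 1 | 35 | 37 | 45 | 34 | 46 | 41 | 37 | 34 | 40 | 41 | 47 | 38 | 29 | 43 | 47 | 40 | 634 |
| 2 | 51 | 41 | 31 | 38 | 35 | 38 | 41 | 41 | 41 | 33 | 42 | 42 | 46 | 48 | 42 | 39 | 649 |
| 3 | 40 | 34 | 40 | 47 | 35 | 36 | 42 | 40 | 39 | 34 | 41 | 30 | 30 | 40 | 37 | 32 | 597 |
| 4 | 36 | 41 | 31 | 36 | 42 | 28 | 46 | 42 | 43 | 41 | 37 | 40 | 42 | 33 | 35 | 40 | 613 |
| 5 | 38 | 35 | 41 | 40 | 39 | 35 | 28 | 51 | 34 | 42 | 44 | 33 | 47 | 32 | 32 | 29 | 600 |
| 6 | 39 | 36 | 29 | 31 | 34 | 47 | 36 | 30 | 39 | 46 | 40 | 35 | 42 | 37 | 39 | 36 | 596 |
| 7 | 39 | 39 | 42 | 33 | 43 | 41 | 42 | 46 | 34 | 38 | 37 | 36 | 40 | 42 | 50 | 41 | 643 |
| 8 | 54 | 38 | 37 | 44 | 36 | 42 | 38 | 34 | 35 | 36 | 36 | 40 | 35 | 37 | 31 | 38 | 611 |
| 9 | 1 | 32 | 42 | 39 | 40 | 30 | 35 | 31 | 41 | 32 | 21 | 44 | 34 | 31 | 30 | 37 | 520 |
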
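
The tabulation in Step 1 can be illustrated with a minimal Python sketch. The function name `digit_frequencies` and the three 4-digit strings are hypothetical and only stand in for the 380 sixteen-digit numbers; the article does not state which software was used.

```python
def digit_frequencies(pins):
    """Return a 10 x width table: row d, column pos counts digit d at position pos."""
    width = len(pins[0])
    if any(len(p) != width or not p.isdigit() for p in pins):
        raise ValueError("all numbers must be digit strings of equal length")
    table = [[0] * width for _ in range(10)]
    for pin in pins:
        for pos, ch in enumerate(pin):
            table[int(ch)][pos] += 1
    return table


# Hypothetical 4-digit strings, for illustration only.
sample_pins = ["0412", "0931", "7412"]
freq = digit_frequencies(sample_pins)
for digit, row in enumerate(freq):
    print(digit, row, sum(row))  # digit, per-position counts, row total
```

Each column of the resulting table sums to the number of cards, which is a quick consistency check on the tabulation (380 per column in Table 1).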
Step 2: Exploratory data analysis was carried out to obtain the measures of central tendency and dispersion for each column of the data (Table 2).
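Table 2. The descriptive statistics of the raw datasets.

| Column | Mean | Median | Mode | Standard Deviation | Skewness |
|---|---|---|---|---|---|
| a | 4.05 | 4 | 8 | 2.691 | 0.023 |
| b | 4.32 | 4 | 0 | 2.898 | 0.042 |
| c | 4.47 | 5 | 1 | 2.966 | 0.011 |
| d | 4.51 | 4 | 3 | 2.884 | 0.029 |
| e | 4.57 | 5 | 1 | 2.852 | −0.006 |
| f | 4.44 | 5 | 6 | 2.892 | −0.043 |
| g | 4.47 | 4 | 4 | 2.836 | 0.038 |
| h | 4.51 | 5 | 5 | 2.724 | 0.000 |
| i | 4.48 | 4 | 4 | 2.860 | 0.058 |
| j | 4.48 | 5 | 6 | 2.806 | −0.041 |
| k | 4.21 | 4 | 1 | 2.725 | 0.088 |
| l | 4.51 | 4 | 9 | 2.973 | 0.006 |
| m | 4.56 | 5 | 5 | 2.767 | −0.049 |
| n | 4.32 | 4 | 2 | 2.859 | 0.086 |
| o | 4.33 | 4 | 7 | 2.843 | 0.047 |
| p | 4.37 | 4 | 0 | 2.969 | 0.031 |
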
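
As a check on Step 2, the mean, median, mode and standard deviation of a column can be recovered from its frequency distribution alone. The sketch below uses the column a counts from Table 1; the helper name `describe` is an assumption, the sample (n − 1) standard deviation is assumed because it reproduces the tabulated 2.691, and skewness is left out because the article does not say which estimator was used.

```python
import math


def describe(freq):
    """freq[d] is the number of times digit d occurs in one column."""
    n = sum(freq)
    mean = sum(d * f for d, f in enumerate(freq)) / n
    # Expand the counts into a sorted sample to read off the median.
    values = [d for d, f in enumerate(freq) for _ in range(f)]
    mid = n // 2
    median = values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2
    # If several digits tie for the highest count, the smallest is returned.
    mode = max(range(len(freq)), key=lambda d: freq[d])
    variance = sum(f * (d - mean) ** 2 for d, f in enumerate(freq)) / (n - 1)
    return mean, median, mode, math.sqrt(variance)


column_a = [47, 35, 51, 40, 36, 38, 39, 39, 54, 1]  # column a of Table 1
mean, median, mode, sd = describe(column_a)
print(round(mean, 2), median, mode, round(sd, 3))
```

For column a this agrees with the first row of Table 2 (mean 4.05, median 4, mode 8, standard deviation 2.691).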
Step 3: The data were checked for balanced frequencies when the digits (0–9) in the 16 columns (a–p) are grouped into mutually exclusive classes: even and odd numbers (Table 3), 0 to 4 and 5 to 9 (Table 4), and 0 and primes against composite numbers (Table 5). The counts in Table 5 imply that the digit 1, which is neither prime nor composite, is counted with the composite numbers.
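Table 3. The even and odd number distribution of the raw datasets.

| | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Even | 227 | 203 | 170 | 187 | 177 | 197 | 196 | 178 | 192 | 193 | 190 | 199 | 200 | 192 | 184 | 201 | 3086 |
| Odd | 153 | 177 | 210 | 193 | 203 | 183 | 184 | 202 | 188 | 187 | 190 | 181 | 180 | 188 | 196 | 179 | 2994 |

Table 4. The 0–4 and 5–9 distribution of the raw datasets.

| | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0–4 | 209 | 200 | 189 | 193 | 188 | 185 | 201 | 188 | 197 | 186 | 202 | 192 | 182 | 201 | 198 | 199 | 3110 |
| 5–9 | 171 | 180 | 191 | 187 | 192 | 195 | 179 | 192 | 183 | 194 | 178 | 188 | 198 | 179 | 182 | 181 | 2970 |

Table 5. The zero, prime, and composite numbers distribution of the raw datasets.

| | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 and primes | 215 | 196 | 196 | 196 | 182 | 192 | 188 | 209 | 182 | 184 | 199 | 183 | 198 | 199 | 198 | 189 | 3106 |
| Composite numbers | 165 | 184 | 184 | 184 | 198 | 188 | 192 | 171 | 198 | 196 | 181 | 197 | 182 | 181 | 182 | 191 | 2974 |
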
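
The groupings of Step 3 are sums over subsets of a column's digit counts. A minimal sketch for column a follows; the helper `group_counts` and the label `"others"` are illustrative names, not taken from the article.

```python
def group_counts(freq, groups):
    """Sum the digit counts freq[d] over each named group of digits."""
    return {name: sum(freq[d] for d in digits) for name, digits in groups.items()}


column_a = [47, 35, 51, 40, 36, 38, 39, 39, 54, 1]  # column a of Table 1

even_odd = group_counts(column_a, {"even": (0, 2, 4, 6, 8), "odd": (1, 3, 5, 7, 9)})
low_high = group_counts(column_a, {"0-4": range(0, 5), "5-9": range(5, 10)})
# "others" holds 1, 4, 6, 8, 9; this split reproduces the column a entries of Table 5.
prime_split = group_counts(
    column_a, {"0 and primes": (0, 2, 3, 5, 7), "others": (1, 4, 6, 8, 9)}
)
print(even_odd, low_high, prime_split)
```

The three results match the column a entries of Tables 3, 4 and 5 (227/153, 209/171 and 215/165).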
Step 4: Analysis of variance (ANOVA) is performed to show the variation between and within the columns (Table 6).
Table 6. The Analysis of Variance of the raw datasets.

| Source of variation | SS | df | MS | F | F critical | P value |
|---|---|---|---|---|---|---|
| Between Groups | 110.8710526 | 15 | 7.391403509 | 0.911391394 | 1.668034532 | 0.550681054 |
| Within Groups | 49179.16842 | 6064 | 8.110021178 | | | |
| Total | 49290.03947 | 6079 | | | | |

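
The one-way ANOVA of Step 4 treats the 16 columns as groups and the digits as observations, so its sums of squares can be computed directly from the frequencies in Table 1. The following standard-library sketch does this; the function name `one_way_anova` is an assumption, and the p-value and critical F are not computed because they need an F distribution routine.

```python
# Table 1: rows are digits 0-9, columns are positions a-p.
FREQ = [
    [47, 47, 42, 38, 30, 42, 35, 31, 34, 37, 35, 42, 35, 37, 37, 48],
    [35, 37, 45, 34, 46, 41, 37, 34, 40, 41, 47, 38, 29, 43, 47, 40],
    [51, 41, 31, 38, 35, 38, 41, 41, 41, 33, 42, 42, 46, 48, 42, 39],
    [40, 34, 40, 47, 35, 36, 42, 40, 39, 34, 41, 30, 30, 40, 37, 32],
    [36, 41, 31, 36, 42, 28, 46, 42, 43, 41, 37, 40, 42, 33, 35, 40],
    [38, 35, 41, 40, 39, 35, 28, 51, 34, 42, 44, 33, 47, 32, 32, 29],
    [39, 36, 29, 31, 34, 47, 36, 30, 39, 46, 40, 35, 42, 37, 39, 36],
    [39, 39, 42, 33, 43, 41, 42, 46, 34, 38, 37, 36, 40, 42, 50, 41],
    [54, 38, 37, 44, 36, 42, 38, 34, 35, 36, 36, 40, 35, 37, 31, 38],
    [1, 32, 42, 39, 40, 30, 35, 31, 41, 32, 21, 44, 34, 31, 30, 37],
]


def one_way_anova(freq):
    """One-way ANOVA with columns as groups, computed from digit counts."""
    digits = range(len(freq))
    k = len(freq[0])
    col_n = [sum(freq[d][j] for d in digits) for j in range(k)]
    col_sum = [sum(d * freq[d][j] for d in digits) for j in range(k)]
    total_n = sum(col_n)
    grand_mean = sum(col_sum) / total_n
    ss_between = sum(n * (s / n - grand_mean) ** 2 for n, s in zip(col_n, col_sum))
    ss_within = sum(
        freq[d][j] * (d - col_sum[j] / col_n[j]) ** 2
        for d in digits
        for j in range(k)
    )
    df_between, df_within = k - 1, total_n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(FREQ)
print(round(ss_b, 4), round(ss_w, 4), df_b, df_w, round(f_stat, 4))
```

The output agrees with Table 6 to the printed precision; an F statistic below the tabulated critical value of about 1.668 is consistent with no detectable difference between the column means.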
Step 5: Correlation among the columns was investigated by computing the Spearman rank correlation coefficient for each pair of columns. Randomness is supported by zero or near-zero correlations, as shown in Table 7. The Chi-square test of independence was then conducted on each pair of columns to investigate whether there is association between them; the results are reported as p-values. Small p-values in Table 8 (set in bold in the published table) indicate association between the columns, and hence a small probability of randomness.
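Table 7. Correlation coefficients for the raw data sample.

| | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| a | −0.011 | 0.048 | −0.023 | 0.040 | 0.076 | 0.056 | −0.057 | −0.006 | −0.017 | −0.028 | −0.016 | 0.015 | −0.014 | 0.009 | −0.030 |
| b | | 0.142 | −0.070 | −0.012 | −0.019 | 0.050 | −0.009 | −0.063 | −0.014 | −0.046 | 0.087 | −0.002 | −0.014 | 0.025 | 0.001 |
| c | | | 0.015 | −0.042 | 0.029 | 0.049 | 0.030 | −0.074 | 0.054 | 0.038 | −0.063 | 0.047 | 0.015 | 0.062 | −0.016 |
| d | | | | 0.099 | −0.004 | 0.048 | 0.002 | 0.036 | −0.029 | −0.036 | 0.014 | −0.046 | −0.030 | 0.102 | −0.054 |
| e | | | | | 0.045 | −0.022 | 0.127 | −0.114 | 0.024 | −0.013 | 0.052 | −0.008 | 0.045 | −0.038 | 0.055 |
| f | | | | | | 0.083 | −0.063 | −0.027 | −0.028 | −0.018 | −0.007 | 0.060 | 0.060 | 0.056 | 0.001 |
| g | | | | | | | −0.027 | −0.023 | −0.073 | −0.016 | 0.116 | −0.041 | −0.048 | 0.099 | −0.059 |
| h | | | | | | | | −0.087 | 0.019 | −0.035 | −0.061 | −0.039 | 0.023 | −0.069 | 0.032 |
| i | | | | | | | | | 0.018 | 0.010 | 0.019 | 0.007 | −0.053 | 0.051 | 0.067 |
| j | | | | | | | | | | 0.032 | −0.003 | 0.015 | −0.004 | 0.004 | −0.007 |
| k | | | | | | | | | | | 0.099 | −0.037 | −0.090 | −0.079 | −0.007 |
| l | | | | | | | | | | | | 0.045 | 0.005 | 0.043 | −0.048 |
| m | | | | | | | | | | | | | −0.097 | 0.034 | −0.076 |
| n | | | | | | | | | | | | | | −0.015 | −0.086 |
| o | | | | | | | | | | | | | | | −0.018 |
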
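
For readers who want to recompute entries of Table 7 from the raw columns, a Spearman coefficient can be sketched as the Pearson correlation of average ranks, which handles the many ties that occur when values are single digits. The function names and the short input lists are illustrative assumptions; the raw columns themselves are not reproduced here.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical digit columns, for illustration only.
rho_monotone = spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
rho_unrelated = spearman([0, 0, 1, 1], [0, 1, 0, 1])
print(rho_monotone, rho_unrelated)
```

A perfectly monotone pair gives a coefficient of 1, while the second pair gives 0, the value that Step 5 treats as ideal for a random table.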
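Table 8. P value of chi-square test of independence for the raw datasets.

| | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| a | 0.838 | 0.808 | 0.393 | 0.078 | 0.111 | 0.927 | 0.845 | 0.776 | 0.974 | 0.261 | 0.483 | 0.749 | 0.016 | 0.762 | 0.918 |
| b | | 0.411 | 0.289 | 0.317 | 0.327 | 0.460 | 0.703 | 0.412 | 0.993 | 0.528 | 0.304 | 0.769 | 0.414 | 0.761 | 0.934 |
| c | | | 0.672 | 0.296 | 0.925 | 0.717 | 0.854 | 0.849 | 0.283 | 0.076 | 0.953 | 0.302 | 0.371 | 0.740 | 0.876 |
| d | | | | 0.058 | 0.934 | 0.091 | 0.874 | 0.961 | 0.409 | 0.006 | 0.873 | 0.580 | 0.948 | 0.495 | 0.502 |
| e | | | | | 0.546 | 0.763 | 0.469 | 0.662 | 0.865 | 0.411 | 0.573 | 0.493 | 0.685 | 0.193 | 0.145 |
| f | | | | | | 0.967 | 0.382 | 0.03 | 0.695 | 0.313 | 0.599 | 0.735 | 0.446 | 0.614 | 0.157 |
| g | | | | | | | 0.799 | 0.310 | 0.262 | 0.126 | 0.045 | 0.229 | 0.224 | 0.405 | 0.486 |
| h | | | | | | | | 0.448 | 0.417 | 0.096 | 0.178 | 0.965 | 0.165 | 0.086 | 0.600 |
| i | | | | | | | | | 0.945 | 0.856 | 0.235 | 0.918 | 0.246 | 0.336 | 0.077 |
| j | | | | | | | | | | 0.164 | 0.287 | 0.553 | 0.757 | 0.527 | 0.635 |
| k | | | | | | | | | | | 0.279 | 0.966 | 0.398 | 0.936 | 0.585 |
| l | | | | | | | | | | | | 0.094 | 0.950 | 0.034 | 0.772 |
| m | | | | | | | | | | | | | 0.011 | 0.601 | 0.260 |
| n | | | | | | | | | | | | | | 0.230 | 0.830 |
| o | | | | | | | | | | | | | | | 0.740 |
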
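
The statistic behind each p-value in Table 8 comes from a contingency table of two columns. The sketch below computes the Pearson chi-square statistic and its degrees of freedom with the standard library; the function name and the tiny inputs are hypothetical. Converting the statistic to a p-value needs a chi-square distribution function (for example `scipy.stats.chi2.sf`, which is assumed to be available and is not used here).

```python
from collections import Counter


def chi_square_independence(x, y):
    """Pearson chi-square statistic and degrees of freedom for two categorical columns."""
    n = len(x)
    rows, cols = sorted(set(x)), sorted(set(y))
    observed = Counter(zip(x, y))
    row_tot, col_tot = Counter(x), Counter(y)
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df


# Hypothetical two-valued columns, for illustration only.
stat_dep, df_dep = chi_square_independence([0, 0, 1, 1], [0, 0, 1, 1])
stat_ind, df_ind = chi_square_independence([0, 0, 1, 1], [0, 1, 0, 1])
print(stat_dep, df_dep, stat_ind, df_ind)
```

With 380 rows and a 10 by 10 table of digit pairs the expected count per cell is about 3.8, so the chi-square approximation behind the p-values is likely to be rough.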
Step 6: The Chi-square goodness of fit test was conducted to investigate whether the digits (0–9) are uniformly distributed over the whole dataset (Table 9a, Table 9b). P-values near zero imply a lower probability of randomness.
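Table 9a. Observed and expected digit frequencies for the Chi-square goodness of fit test of the raw dataset.

| Digit | Observed frequency | Expected frequency | Residual |
|---|---|---|---|
| 0 | 617 | 608 | 9 |
| 1 | 634 | 608 | 26 |
| 2 | 649 | 608 | 41 |
| 3 | 597 | 608 | −11 |
| 4 | 613 | 608 | 5 |
| 5 | 600 | 608 | −8 |
| 6 | 596 | 608 | −12 |
| 7 | 643 | 608 | 35 |
| 8 | 611 | 608 | 3 |
| 9 | 520 | 608 | −88 |

Table 9b. Goodness of Fit Test Summary of the raw dataset.

| Test statistics | Value |
|---|---|
| Chi-square | 19.359 |
| df | 9 |
| P-value | 0.022 |
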
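
The statistic in Table 9b follows directly from the observed totals in Table 9a. A minimal sketch, assuming a uniform expected frequency; the name `chi_square_gof` is illustrative, and 16.919 is the standard 5% critical value of the chi-square distribution with 9 degrees of freedom.

```python
def chi_square_gof(observed):
    """Chi-square goodness of fit statistic against a uniform distribution."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


totals = [617, 634, 649, 597, 613, 600, 596, 643, 611, 520]  # Table 9a, digits 0-9
stat = chi_square_gof(totals)
CRITICAL_5PCT_DF9 = 16.919  # chi-square critical value, df = 9, alpha = 0.05

print(round(stat, 3), stat > CRITICAL_5PCT_DF9)
```

The statistic exceeds the critical value, in line with the reported p-value of 0.022; most of it comes from the shortfall of the digit 9.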
Step 7: The Chi-square goodness of fit test was then conducted separately for each column (a–p) to check whether the digits (0–9) are uniformly distributed within it (Table 10). Higher p-values are desirable for randomness, irrespective of the values of the Chi-square statistics.
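Table 10. The Goodness of Fit test for all the columns of the raw datasets.

| Column | Chi-Square | P-value |
|---|---|---|
| a | 49.842 | 0.000 |
| b | 4.368 | 0.886 |
| c | 7.632 | 0.572 |
| d | 5.684 | 0.771 |
| e | 5.579 | 0.781 |
| f | 8.105 | 0.524 |
| g | 6.000 | 0.740 |
| h | 12.000 | 0.213 |
| i | 2.789 | 0.972 |
| j | 4.737 | 0.857 |
| k | 11.842 | 0.222 |
| l | 4.684 | 0.861 |
| m | 9.474 | 0.395 |
| n | 6.789 | 0.659 |
| o | 10.579 | 0.306 |
| p | 6.316 | 0.708 |
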
Step 8: The residuals obtained in Step 7 (observed minus expected frequency, where the expected frequency is 380/10 = 38 per digit in each column) are tabulated for all the columns against the digits (Table 11).
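Table 11. The residuals of the Goodness of Fit test of the columns of the raw datasets.

| Digit | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9 | 9 | 4 | 0 | −8 | 4 | −3 | −7 | −4 | −1 | −3 | 4 | −3 | −1 | −1 | 10 | 9 |
| 1 | −3 | −1 | 7 | −4 | 8 | 3 | −1 | −4 | 2 | 3 | 9 | 0 | −9 | 5 | 9 | 2 | 26 |
| 2 | 13 | 3 | −7 | 0 | −3 | 0 | 3 | 3 | 3 | −5 | 4 | 4 | 8 | 10 | 4 | 1 | 41 |
| 3 | 2 | −4 | 2 | 9 | −3 | −2 | 4 | 2 | 1 | −4 | 3 | −8 | −8 | 2 | −1 | −6 | −11 |
| 4 | −2 | 3 | −7 | −2 | 4 | −10 | 8 | 4 | 5 | 3 | −1 | 2 | 4 | −5 | −3 | 2 | 5 |
| 5 | 0 | −3 | 3 | 2 | 1 | −3 | −10 | 13 | −4 | 4 | 6 | −5 | 9 | −6 | −6 | −9 | −8 |
| 6 | 1 | −2 | −9 | −7 | −4 | 9 | −2 | −8 | 1 | 8 | 2 | −3 | 4 | −1 | 1 | −2 | −12 |
| 7 | 1 | 1 | 4 | −5 | 5 | 3 | 4 | 8 | −4 | 0 | −1 | −2 | 2 | 4 | 12 | 3 | 35 |
| 8 | 16 | 0 | −1 | 6 | −2 | 4 | 0 | −4 | −3 | −2 | −2 | 2 | −3 | −1 | −7 | 0 | 3 |
| 9 | −37 | −6 | 4 | 1 | 2 | −8 | −3 | −7 | 3 | −6 | −17 | 6 | −4 | −7 | −8 | −1 | −88 |
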
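
Tables 10 and 11 are linked: each column's Chi-square statistic is the sum of its squared residuals divided by the expected frequency of 38. A short sketch for column a, using the counts from Table 1 (the helper name `residuals` is an assumption):

```python
def residuals(freq):
    """Observed minus expected count for each digit, assuming a uniform expectation."""
    expected = sum(freq) / len(freq)
    return [f - expected for f in freq]


column_a = [47, 35, 51, 40, 36, 38, 39, 39, 54, 1]  # column a of Table 1
res_a = residuals(column_a)
expected_a = sum(column_a) / len(column_a)  # 38 for a column of 380 digits
chi_sq_a = sum(r ** 2 for r in res_a) / expected_a

print([round(r) for r in res_a], round(chi_sq_a, 3))
```

The residuals reproduce column a of Table 11 and the statistic reproduces the 49.842 reported for column a in Table 10; the near absence of the digit 9 accounts for most of it.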
Step 9: Randomness is improved when the digits (0–9) are equally distributed in each of the columns (a–p). This step involves random manipulation of the numbers in each column, using the table of residuals as a guide. It can be done manually or computationally. For example, in column a: randomly remove 0 in 9 places, introduce 1 in 3 places, remove 2 in 13 places, and so on. This is repeated for the other 15 columns. The rationale is to achieve equal representation of the digits in random sampling, irrespective of the column.

Step 10: Analysis of variance (ANOVA) is performed again to show the variation between and within the columns. If Step 9 is done correctly, the variation between the groups is expected to be zero, as shown in Table 12.
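Table 12. Analysis of Variance of the Final datasets.

| Source of variation | SS | df | MS | F | F critical | P value |
|---|---|---|---|---|---|---|
| Between Groups | 0 | 15 | 0 | 0 | 1.668034532 | 1.00 |
| Within Groups | 50160 | 6064 | 8.27176781 | | | |
| Total | 50160 | 6079 | | | | |
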
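
The article does not give an implementation of Step 9, so the following is only one possible computational reading of it. For every digit that occurs too often, randomly chosen occurrences are overwritten with digits that occur too rarely, until each digit appears exactly n/10 times. The function name `balance_column`, the seeded generator and the simulated input column are all assumptions made for illustration.

```python
import random
from collections import Counter


def balance_column(column, rng):
    """Return a copy of column in which every digit 0-9 occurs equally often."""
    n = len(column)
    if n % 10:
        raise ValueError("column length must be a multiple of 10")
    target = n // 10
    counts = Counter(column)
    # Digits to introduce, each repeated as many times as it is short.
    fill = [d for d in range(10) for _ in range(max(0, target - counts[d]))]
    rng.shuffle(fill)
    balanced = list(column)
    for d in range(10):
        excess = counts[d] - target
        if excess > 0:
            positions = [i for i, v in enumerate(column) if v == d]
            # Overwrite randomly chosen occurrences of the over-represented digit.
            for i in rng.sample(positions, excess):
                balanced[i] = fill.pop()
    return balanced


rng = random.Random(0)  # fixed seed so the illustration is repeatable
raw = [rng.randrange(10) for _ in range(380)]  # simulated column, not the article's data
balanced = balance_column(raw, rng)
changed = sum(a != b for a, b in zip(raw, balanced))
print(sorted(Counter(balanced).items()), changed)
```

After balancing, every column has the same digit counts and therefore the same mean, which is why the between-groups sum of squares in Table 12 is exactly zero. The number of overwritten positions equals the sum of the positive residuals of the column.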
Step 11: The Chi-square goodness of fit test is performed to show that the digits occur with equal frequency in every column of the final dataset (Table 13).
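Table 13. The Goodness of Fit test for all the columns of the final datasets.

| Column | Chi-Square | P-value |
|---|---|---|
| a | 0.000 | 1.000 |
| b | 0.000 | 1.000 |
| c | 0.000 | 1.000 |
| d | 0.000 | 1.000 |
| e | 0.000 | 1.000 |
| f | 0.000 | 1.000 |
| g | 0.000 | 1.000 |
| h | 0.000 | 1.000 |
| i | 0.000 | 1.000 |
| j | 0.000 | 1.000 |
| k | 0.000 | 1.000 |
| l | 0.000 | 1.000 |
| m | 0.000 | 1.000 |
| n | 0.000 | 1.000 |
| o | 0.000 | 1.000 |
| p | 0.000 | 1.000 |
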
Step 12: The correlations among the columns are computed again to verify that the associations among them are weak (Table 14). To show the degree of randomness, the Chi-square test of independence is performed on each pair of columns. Higher p-values are desirable for randomness. It can be seen from Table 15 that all the p-values are greater than 0.05.
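Table 14. The correlation coefficient for the columns of the final datasets.

| | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| a | −0.024 | 0.009 | −0.018 | 0.070 | 0.086 | 0.017 | −0.036 | 0.055 | −0.037 | −0.023 | −0.099 | −0.024 | 0.026 | 0.000 | −0.074 |
| b | | 0.143 | −0.057 | −0.001 | 0.005 | 0.028 | 0.018 | −0.058 | 0.011 | −0.033 | 0.061 | 0.014 | −0.072 | 0.063 | 0.109 |
| c | | | 0.010 | −0.046 | 0.001 | 0.005 | 0.065 | −0.060 | 0.096 | −0.010 | −0.039 | 0.071 | 0.029 | 0.066 | −0.017 |
| d | | | | 0.063 | −0.036 | −0.057 | −0.004 | 0.044 | −0.031 | −0.028 | 0.085 | 0.010 | 0.000 | 0.070 | −0.037 |
| e | | | | | 0.004 | 0.026 | 0.096 | −0.053 | 0.055 | −0.008 | 0.006 | 0.042 | −0.041 | −0.049 | −0.010 |
| f | | | | | | 0.022 | −0.043 | −0.025 | 0.020 | 0.056 | −0.044 | 0.074 | 0.062 | 0.059 | −0.013 |
| g | | | | | | | 0.012 | −0.052 | −0.100 | 0.039 | 0.028 | −0.110 | −0.031 | 0.034 | −0.022 |
| h | | | | | | | | −0.070 | −0.013 | 0.018 | −0.002 | −0.049 | −0.005 | −0.049 | −0.052 |
| i | | | | | | | | | 0.005 | −0.006 | 0.061 | −0.042 | −0.082 | −0.001 | 0.039 |
| j | | | | | | | | | | 0.022 | 0.021 | 0.046 | 0.056 | 0.047 | −0.032 |
| k | | | | | | | | | | | 0.063 | −0.023 | 0.004 | −0.069 | −0.104 |
| l | | | | | | | | | | | | −0.069 | 0.019 | 0.067 | −0.003 |
| m | | | | | | | | | | | | | 0.020 | 0.033 | −0.029 |
| n | | | | | | | | | | | | | | −0.026 | 0.002 |
| o | | | | | | | | | | | | | | | −0.015 |
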
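Table 15. The P-value for the columns of the final datasets.

| | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| a | 0.337 | 0.527 | 0.610 | 0.104 | 0.803 | 0.098 | 0.996 | 0.978 | 0.899 | 0.494 | 0.721 | 0.445 | 0.366 | 0.791 | 0.511 |
| b | | 0.366 | 0.281 | 0.351 | 0.337 | 0.220 | 0.777 | 0.429 | 0.922 | 0.735 | 0.256 | 0.382 | 0.721 | 0.881 | 0.577 |
| c | | | 0.527 | 0.511 | 0.976 | 0.269 | 0.957 | 0.511 | 0.159 | 0.861 | 0.705 | 0.150 | 0.577 | 0.861 | 0.721 |
| d | | | | 0.764 | 0.735 | 0.365 | 0.965 | 0.957 | 0.366 | 0.177 | 0.097 | 0.065 | 0.915 | 0.477 | 0.969 |
| e | | | | | 0.351 | 0.436 | 0.118 | 0.413 | 0.659 | 0.750 | 0.527 | 0.950 | 0.561 | 0.337 | 0.111 |
| f | | | | | | 0.497 | 0.544 | 0.111 | 0.839 | 0.308 | 0.133 | 0.935 | 0.643 | 0.281 | 0.198 |
| g | | | | | | | 0.752 | 0.285 | 0.362 | 0.363 | 0.215 | 0.132 | 0.825 | 0.952 | 0.189 |
| h | | | | | | | | 0.527 | 0.477 | 0.413 | 0.828 | 0.735 | 0.413 | 0.198 | 0.978 |
| i | | | | | | | | | 0.750 | 0.544 | 0.168 | 0.839 | 0.627 | 0.750 | 0.659 |
| j | | | | | | | | | | 0.594 | 0.850 | 0.610 | 0.494 | 0.544 | 0.941 |
| k | | | | | | | | | | | 0.295 | 0.337 | 0.231 | 0.992 | 0.922 |
| l | | | | | | | | | | | | 0.125 | 0.735 | 0.675 | 0.231 |
| m | | | | | | | | | | | | | 0.104 | 0.168 | 0.198 |
| n | | | | | | | | | | | | | | 0.198 | 0.881 |
| o | | | | | | | | | | | | | | | 0.690 |
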
Step 13: The final dataset is a 380 by 16 table of random numbers.

Specifications Table

| Item | Description |
|---|---|
| Subject area | Statistics |
| More specific subject area | Random Sampling |
| Type of data | Table |
| How data was acquired | Collected at random from particular used GSM recharge cards |
| Data format | Raw |
| Experimental factors | Test for randomness |
| Experimental features | Analysis of Variance (ANOVA), Chi-square test of goodness of fit, Chi-square test of independence |
| Data source location | Covenant University Mathematics Laboratory, Ota, Nigeria. |
| Data accessibility | All the data are in this data article. |