Rafael Prieto Curiel¹, Carmen Cabrera Arnau², Mara Torres Pinedo³, Humberto González Ramírez⁴, Steven Richard Bishop²
Abstract
Data obtained from sparse and yet concentrated events, such as road accidents or murders, are commonly recorded as discrete observations. Traditional methods for computing summary statistics often place the data in discrete bins, but for this type of data the approach typically produces many empty bins, for which no function or summary statistic can be computed. Here, a method for dealing with sparse and concentrated observations is constructed. It is based on a sequence of non-overlapping bins of varying size and gives a continuous interpolation of the data for computing summary statistics of their values, such as the mean. The method overcomes the problem that sparsity and concentration pose when computing functions to represent the data. The code is openly available to facilitate implementation of the method.
• A new method for computing functions over sparse and concentrated data is constructed.
• The method allows simple functions, such as the mean, to be computed over partitions of the data, as well as more complex functions, such as coefficients, ratios, correlations and regressions, among others.
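One way to write the construction down, as a hedged formalization rather than the authors' own notation: assume there are $P$ partitions, that each partition $p$ is a regular grid of non-overlapping bins with its own initial point $a_p$ and bin width $w_p$, and that estimates from different partitions are averaged with equal weight. The symbols $P$, $a_p$, $w_p$, $b_p(t)$ and $\mathcal{P}(t)$ are introduced here for illustration only.

```latex
\begin{aligned}
k_p(t) &= \left\lfloor \frac{t - a_p}{w_p} \right\rfloor,
\qquad
b_p(t) = \big[\, a_p + k_p(t)\, w_p,\; a_p + (k_p(t) + 1)\, w_p \,\big), \\[4pt]
\mathcal{P}(t) &= \{\, p \in \{1, \dots, P\} : t_i \in b_p(t) \text{ for some } i \,\}, \\[4pt]
\hat f(t) &= \frac{1}{|\mathcal{P}(t)|} \sum_{p \in \mathcal{P}(t)} f\big(\{\, x_i : t_i \in b_p(t) \,\}\big).
\end{aligned}
```

Under these assumptions empty bins never enter the sum, so $\hat f(t)$ is defined whenever at least one partition places an observation in the bin around $t$, and $f$ can be any function of a set of values (a mean, a ratio, a correlation coefficient).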
Keywords: Continuous binning; Discrete data; Smooth functions evaluated in concentrated observations; Sparse data
Year: 2019 PMID: 32021812 PMCID: PMC6994295 DOI: 10.1016/j.mex.2019.10.020
Source DB: PubMed Journal: MethodsX ISSN: 2215-0161
Fig. 1. Observations are depicted as arrows: the position of each arrow represents the (sparse and concentrated) value of t_i, and its colour represents the value of some variable x_i, for instance the total value. The function f is the “average colour” of the x_i, so for empty bins it is impossible to assign a value to f(b_k).
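The empty-bin problem illustrated in Fig. 1 can be reproduced with a minimal Python sketch, assuming fixed-width bins and the mean as f. The function name `fixed_bin_means` and the sample data are hypothetical and chosen only to mimic two tight clusters of observations.

```python
import math
from statistics import mean


def fixed_bin_means(t, x, start, width, n_bins):
    """Group the values x_i into fixed bins of t and return the mean per bin.

    Bins are [start + k*width, start + (k+1)*width) for k = 0..n_bins-1.
    Empty bins get None, because the mean of no values is undefined.
    """
    bins = [[] for _ in range(n_bins)]
    for ti, xi in zip(t, x):
        k = math.floor((ti - start) / width)
        if 0 <= k < n_bins:  # observations outside the grid are dropped
            bins[k].append(xi)
    return [mean(b) if b else None for b in bins]


# Hypothetical sparse, concentrated observations clustered near t = 1 and t = 8.
t = [0.9, 1.0, 1.1, 1.2, 7.9, 8.1]
x = [2.0, 4.0, 3.0, 5.0, 10.0, 12.0]
print(fixed_bin_means(t, x, start=0.0, width=1.0, n_bins=10))
# → [2.0, 4.0, None, None, None, None, None, 10.0, 12.0, None]
```

Six of the ten bins are empty, so f has no value there; shrinking the bins to gain resolution only makes this worse for concentrated data.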
Fig. 2. Sparse and concentrated observations are depicted as arrows: the position of each arrow represents the value of t_i, and its colour represents the value of some variable x_i. Although some partitions produce empty bins, these are ignored when computing the values of f(t_k). The random initial point of each partition and the varying bin width give a smooth description of the function f.
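The procedure described in Fig. 2 can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' published code: the name `continuous_binning`, the uniform distributions for the bin width and initial point, the width range and the equal-weight average over partitions are all assumptions.

```python
import math
import random
from statistics import mean


def continuous_binning(t, x, points, f=mean, n_partitions=500,
                       min_width=0.5, max_width=2.0, seed=0):
    """Estimate f at each evaluation point by averaging over random partitions.

    Assumptions (illustrative, not the paper's published code):
    - each partition is a regular grid of non-overlapping bins whose width
      is drawn uniformly from [min_width, max_width];
    - the grid's initial point (offset) is drawn uniformly from [0, width);
    - for each point t_k, f is evaluated on the x_i whose t_i fall in the
      same bin as t_k, and partitions where that bin is empty are skipped.
    """
    rng = random.Random(seed)
    sums = [0.0] * len(points)
    counts = [0] * len(points)
    for _ in range(n_partitions):
        width = rng.uniform(min_width, max_width)
        offset = rng.uniform(0.0, width)
        # Group the observations by bin index under this partition.
        bins = {}
        for ti, xi in zip(t, x):
            bins.setdefault(math.floor((ti - offset) / width), []).append(xi)
        for j, tk in enumerate(points):
            values = bins.get(math.floor((tk - offset) / width))
            if values:  # an empty bin contributes nothing for this t_k
                sums[j] += f(values)
                counts[j] += 1
    # None where every partition left the bin around t_k empty.
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical data: two tight clusters of observations near t = 1 and t = 8.
t = [0.9, 1.0, 1.1, 1.2, 7.9, 8.1]
x = [2.0, 4.0, 3.0, 5.0, 10.0, 12.0]
est = continuous_binning(t, x, points=[1.0, 4.5, 8.0])
print(est)
```

Near the clusters every partition yields a non-empty bin, so the estimate is defined; at t = 4.5, which is farther than `max_width` from any observation, all bins are empty and the sketch returns None. Raising `max_width` trades resolution for coverage, and any function of a list of values (for example `statistics.median`) can be passed as `f`.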