Abstract
We investigate the computational structure of a paradigmatic example of distributed social interaction: that of the open-source Wikipedia community. We examine the statistical properties of its cooperative behavior, and perform model selection to determine whether this aspect of the system can be described by a finite-state process, or whether reference to an effectively unbounded resource allows for a more parsimonious description. We find strong evidence, in a majority of the most-edited pages, in favor of a collective-state model, where the probability of a "revert" action declines as the square root of the number of non-revert actions seen since the last revert. We provide evidence that the emergence of this social counter is driven by collective interaction effects, rather than properties of individual users.
Year: 2013 PMID: 24130745 PMCID: PMC3794014 DOI: 10.1371/journal.pone.0075818
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
A day of edits on the George_W._Bush page, starting at midnight UTC, 21 March 2006.
| time (UTC) | user | SHA1 (partial) | code |
| 02:08 | Sarah | 4abc4aef1ea5 | C |
| 05:02 | Alexh25 | 1e3a2a4656d8 | C |
| 05:04 | Mhking | 4abc4aef1ea5 | R |
| 11:39 | Trezatium | 3b03700b0d9c | C |
| 12:15 | Brazilfantoo | 94a5c05ba10e | C |
| 12:31 | Brandon39 | 3b03700b0d9c | R |
| 23:28 | Titoxd | 109986b8f390 | C |
| 23:31 | Titoxd | 334a315944ce | C |
| 23:38 | Titoxd | 739c15e5bc6a | C |
| 23:40 | Titoxd | 3063a0289680 | C |
| 23:42 | Titoxd | 7aafc8f3f762 | C |
As can be seen by comparing SHA1 hashes of the page content, user Mhking reverted an edit by user Alexh25 to the previous version by user Sarah. Later in the day, user Brandon39 reverted user Brazilfantoo. In between, one can see “cooperative” stretches involving both single and multiple users. This sequence of events is coarse-grained into the substring “CCRCCRCCCCC.” The full string of (in this case) 45,220 action symbols forms the basis of the finite-state analysis. As with all data used in this study, this sequence is publicly available, in this case at http://en.wikipedia.org/w/index.php?title=George_W._Bush&offset=200603218&action=history [last accessed 15 August 2013].
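The coarse-graining described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes that an edit counts as a revert ("R") whenever its content hash matches that of any earlier revision in the list, and as cooperative ("C") otherwise. The function name `coarse_grain` and the list `day_hashes` are hypothetical names introduced here; the hashes are the partial SHA1 values from the table.

```python
def coarse_grain(hashes):
    """Map a chronological list of content hashes to a string of 'C'/'R' symbols.

    Assumption: an edit is a revert ('R') if it restores content whose hash
    has already appeared earlier in the list; otherwise it is cooperative ('C').
    """
    seen = set()
    symbols = []
    for h in hashes:
        symbols.append("R" if h in seen else "C")
        seen.add(h)
    return "".join(symbols)


# Partial SHA1 hashes for 21 March 2006, in chronological order.
day_hashes = [
    "4abc4aef1ea5",  # Sarah
    "1e3a2a4656d8",  # Alexh25
    "4abc4aef1ea5",  # Mhking (same content as Sarah's version)
    "3b03700b0d9c",  # Trezatium
    "94a5c05ba10e",  # Brazilfantoo
    "3b03700b0d9c",  # Brandon39 (same content as Trezatium's version)
    "109986b8f390",  # Titoxd
    "334a315944ce",  # Titoxd
    "739c15e5bc6a",  # Titoxd
    "3063a0289680",  # Titoxd
    "7aafc8f3f762",  # Titoxd
]

print(coarse_grain(day_hashes))  # CCRCCRCCCCC
```

On a full page history, `seen` would need to be seeded with the hashes of all revisions preceding the window of interest, since a revert may restore content from before the first listed edit.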
Figure 1. Top.
Distribution of consecutive C (“cooperative”) events in the edit history of the most-edited article on the English-language Wikipedia, George_W._Bush. Solid histogram: actual data. Red/solid line: maximum-likelihood fit for the three-parameter collective state (CS) model of Eq. 6, which is preferred over the sum-of-exponentials model (nEXP) of Eq. 2. The blue/dashed and green/dotted lines show the one- and two-component finite-state approximations to the collective state model. For this data, the finite-state model approximates the collective state model only at four components (eight parameters), at which point it is strongly disfavored as non-parsimonious by Bayesian model selection. Bottom. Contributions to the log-likelihood, relative to the collective state model, for the one-, two-, and three-component fits (blue/dashed, green/dotted and yellow/solid, respectively).
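The histogram in Figure 1 is built from run lengths of consecutive C events. The sketch below shows one way such run lengths could be extracted from a C/R string. It rests on assumptions not stated in the caption: runs of length zero (two reverts in a row) are skipped, and the final, unterminated run is counted. The function name `cooperative_run_lengths` is hypothetical.

```python
from collections import Counter


def cooperative_run_lengths(symbols):
    """Return the lengths of maximal runs of 'C' in a string of 'C'/'R' symbols.

    Assumptions: zero-length runs (consecutive reverts) are skipped, and a
    trailing run not yet ended by a revert is still counted.
    """
    runs = []
    current = 0
    for s in symbols:
        if s == "C":
            current += 1
        elif s == "R":
            if current > 0:
                runs.append(current)
            current = 0
        else:
            raise ValueError(f"unexpected symbol: {s!r}")
    if current > 0:
        runs.append(current)
    return runs


runs = cooperative_run_lengths("CCRCCRCCCCC")
histogram = Counter(runs)  # run length -> number of occurrences
print(runs)       # [2, 2, 5]
print(histogram)  # Counter({2: 2, 5: 1})
```

Whether the trailing run should be kept or treated as censored depends on the fitting procedure; the choice here is only for illustration.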
Log-evidence ratios, for the collective state versus the finite-state case, for the ten most-edited pages on Wikipedia.
| sig. | page name | history length | log-evidence ratio (CS vs. finite-state) | collective state index |
| <10^-8 | George_W._Bush | 45,220 | 18.5 | 0.576±0.005 |
| <10^-6 | Islam | 18,054 | 14.9 | 0.592±0.007 |
| <10^-5 | United_States | 31,919 | 12.3 | 0.545±0.006 |
| <10^-5 | Global_warming | 19,541 | 12.1 | 0.602±0.008 |
| <10^-4 | Wikipedia | 31,927 | 11.3 | 0.638±0.006 |
| <10^-4 | Michael_Jackson | 26,977 | 10.4 | 0.572±0.007 |
| <10^-3 | 2006_Lebanon_War | 19,656 | 9.1 | 0.49±0.01 |
| <10^-3 | Deaths_in_2009 | 20,902 | 7.7 | 0.42±0.01 |
| >10^4 | Deaths_in_2007 | 18,215 | −11.5 | – |
| >10^7 | Deaths_in_2008 | 19,072 | −17.5 | – |
In cases where the collective state model is strongly favored (large, positive log-evidence ratio), we show the best-fit value of the collective state index (see Eq. 6). Eight pages show strong evidence (sig. values below 10^-3) for the collective state (CS) model of Eq. 6 over and above that for the sum of exponentials (nEXP). The strongest evidence in favor of finite-state computation is found for two of the three “death list” pages, which collate otherwise unrelated information from other parts of the encyclopedia. Appendix S4 in File S1 gives details on the use and computation of the log-evidence for model selection.
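One way to read the sig. column against the log-evidence ratios is sketched below. This is an interpretation, not something the caption states: it assumes the ratios are natural logarithms and that sig. is the corresponding odds in favor of the finite-state model, exp(−ratio). Under that assumption the tabulated bounds are reproduced for all ten pages. The names `LOG_EVIDENCE_RATIO` and `odds_for_finite_state` are hypothetical.

```python
import math

# Log-evidence ratios (collective state vs. finite-state) copied from the table.
LOG_EVIDENCE_RATIO = {
    "George_W._Bush": 18.5,
    "Islam": 14.9,
    "United_States": 12.3,
    "Global_warming": 12.1,
    "Wikipedia": 11.3,
    "Michael_Jackson": 10.4,
    "2006_Lebanon_War": 9.1,
    "Deaths_in_2009": 7.7,
    "Deaths_in_2007": -11.5,
    "Deaths_in_2008": -17.5,
}


def odds_for_finite_state(log_ratio):
    """Odds in favor of the finite-state model.

    Assumption: log_ratio is a natural-log evidence ratio of the collective
    state model over the finite-state model.
    """
    return math.exp(-log_ratio)


for page, ratio in LOG_EVIDENCE_RATIO.items():
    print(f"{page:18s} {odds_for_finite_state(ratio):.1e}")
```

For example, a ratio of 18.5 gives odds of roughly 9e-9, below the tabulated bound of 10^-8, while −17.5 gives roughly 4e7, above the tabulated 10^7.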
Figure 2. Solid line: distribution of consecutive single-user C (“cooperative”) events in George_W._Bush.
The contrast to the multi-user case is clear, showing that long periods of cooperative editing cannot be accounted for by unbroken single-user patterns. The distribution is well-modeled by the collective state model, Eq. 8, with distinct functional form and parameter values from the fit for the multi-user case. The fit is preferred to the finite-state nEXP model.
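The single-user distribution can be illustrated with a small variation on run-length counting. The sketch below assumes that a single-user run is a maximal stretch of consecutive C events by the same user, ended either by a revert or by a C event from a different user; the paper may define it differently. The function name `single_user_run_lengths` and the list `day_events` are hypothetical, and the events are those of 21 March 2006 on George_W._Bush.

```python
def single_user_run_lengths(events):
    """Return lengths of maximal runs of 'C' events made by one user.

    `events` is a chronological list of (user, symbol) pairs with symbol in
    {'C', 'R'}. Assumption: a run ends at a revert or at a change of user.
    """
    runs = []
    current_user = None
    current = 0
    for user, symbol in events:
        if symbol == "C" and user == current_user:
            current += 1
            continue
        if current > 0:
            runs.append(current)
        if symbol == "C":
            current_user, current = user, 1
        else:
            current_user, current = None, 0
    if current > 0:
        runs.append(current)
    return runs


day_events = [
    ("Sarah", "C"), ("Alexh25", "C"), ("Mhking", "R"),
    ("Trezatium", "C"), ("Brazilfantoo", "C"), ("Brandon39", "R"),
    ("Titoxd", "C"), ("Titoxd", "C"), ("Titoxd", "C"),
    ("Titoxd", "C"), ("Titoxd", "C"),
]

print(single_user_run_lengths(day_events))  # [1, 1, 1, 1, 5]
```

For the same day, the multi-user runs of the string CCRCCRCCCCC have lengths 2, 2, and 5, so the two distributions already differ on this small sample.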