Ahmed Samir Imam Mahmoud1, Tapajit Dey2, Alexander Nolte1,3, Audris Mockus4, James D Herbsleb3.
Abstract
Context: Hackathons have become popular events for teams to collaborate on projects and develop software prototypes. Most existing research focuses on activities during an event, with limited attention to the evolution of the hackathon code. Objective: We aim to understand the evolution of code used in and created during hackathon events, with a particular focus on code blobs: specifically, how frequently hackathon teams reuse pre-existing code, how much new code they develop, whether that code gets reused afterwards, and what factors affect reuse. Method: We collected information about 22,183 hackathon projects from Devpost and obtained related code blobs, authors, project characteristics, original author, code creation time, language, and size information from World of Code. We tracked the reuse of code blobs by identifying all commits containing blobs created during hackathons and identifying all projects that contain those commits. To gain a deeper understanding of hackathon code evolution, we also conducted a series of surveys sent to hackathon participants whose code was reused, participants whose code was not reused, and developers who reused some hackathon code. Result: 9.14% of the code blobs in hackathon repositories and 8% of the lines of code (LOC) are created during hackathons, and around a third of the hackathon code, by both blob count and LOC, gets reused in other projects. The number of associated technologies and the number of participants in hackathons increase reuse probability.
Keywords: Code reuse; Empirical study; Hackathon; Mining software repositories; Survey; World of code
Year: 2022 PMID: 36159898 PMCID: PMC9489595 DOI: 10.1007/s10664-022-10201-x
Source DB: PubMed Journal: Empir Softw Eng ISSN: 1382-3256 Impact factor: 3.762
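The reuse-tracking step described in the abstract (find all commits that contain a hackathon blob, then all projects that contain those commits) can be sketched as below. This is only an illustration under stated assumptions: the in-memory dictionaries `blob_to_commits`, `commit_to_projects`, and `origin_project`, the function name, and the toy identifiers are hypothetical and are not the World of Code interface or the authors' actual pipeline.

```python
def find_reusing_projects(hackathon_blobs, blob_to_commits, commit_to_projects, origin_project):
    """For each hackathon blob, collect the projects (other than the
    project it originated in) that contain a commit with that blob."""
    reuse = {}
    for blob in hackathon_blobs:
        projects = set()
        # Step 1: every commit that contains this blob.
        for commit in blob_to_commits.get(blob, ()):
            # Step 2: every project that contains that commit.
            projects.update(commit_to_projects.get(commit, ()))
        # The originating hackathon project itself does not count as reuse.
        projects.discard(origin_project[blob])
        reuse[blob] = projects
    return reuse


# Hypothetical toy data: blob b1 also appears in another project, b2 does not.
blob_to_commits = {"b1": ["c1", "c2"], "b2": ["c3"]}
commit_to_projects = {"c1": ["hack/app"], "c2": ["other/lib"], "c3": ["hack/app"]}
origin_project = {"b1": "hack/app", "b2": "hack/app"}

reuse = find_reusing_projects(["b1", "b2"], blob_to_commits, commit_to_projects, origin_project)
reused_blobs = [blob for blob, projects in reuse.items() if projects]
```

In this toy data only `b1` counts as reused, because one of its commits also appears in a project other than its origin.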
Fig. 1 Data collection workflow: highlighting the different data sources used and the process of gathering the required information from them, and the data used in answering our research questions
Fig. 2 Common technologies associated with the chosen hackathon projects
Fig. 3 Distribution of team sizes for the hackathon projects
Fig. 4 Distribution of when the hackathons under consideration took place
Confusion matrix showing the effectiveness of our heuristics in identifying template code
| | Actual positive | Actual negative |
|---|---|---|
| Predicted positive | 96 | 4 |
| Predicted negative | 26 | 74 |
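The usual summary metrics can be derived from these four counts. The sketch below assumes that the rows give the heuristic's prediction and the columns give the manually verified class; the variable names are illustrative and the derived metrics are not reported in this record.

```python
# Counts taken from the confusion matrix above
# (assumption: rows = predicted class, columns = actual class).
tp, fp = 96, 4    # predicted template: actually template / actually not
fn, tn = 26, 74   # predicted non-template: actually template / actually not

precision = tp / (tp + fp)                   # share of predicted templates that are templates
recall = tp / (tp + fn)                      # share of actual templates that were found
accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all samples classified correctly

print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
```

Under that reading the heuristics are precise (0.96) but miss roughly one in five template blobs (recall of about 0.79), with an overall accuracy of 0.85.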
Description of variables used for addressing RQ3. For the binary variable, the number of TRUE/FALSE cases is shown
Survey 1 sent to individuals who had created a blob during a hackathon that was reused after the event and to individuals who had created a blob during a hackathon that was not reused
| Perception about the code contained in the blob |
| The commit was made by GITHUB_HANDLE. Is this your handle? (yes, no, not sure) |
| Where to the best of your knowledge did the code contained in this file originate from? Please select all options that apply |
| (1) I wrote it during the hackathon |
| (2) I reused code that I had written before the hackathon |
| (3) My team members wrote it during the hackathon |
| (4) My team members reused code they had written before the hackathon |
| (5) From friends or colleagues |
| (6) From other GitHub repositories |
| (7) From the web (e.g. on Stackoverflow or forums) |
| (8) It was generated by a tool |
| (9) I am not sure |
| (10) Other |
| Where, if at all, did you REUSE the code contained in this file after the hackathon? Please select all options that apply |
| (1) At another hackathon |
| (2) In School as part of a class, project or thesis |
| (3) At work |
| (4) In another open source project in my free time |
| (5) I did not reuse it after the hackathon |
| (6) I do not recall reusing it after the hackathon |
| (7) Other |
| What if anything did you do to share the code contained in this file after the hackathon? Please select all options that apply |
| (1) I sent it to friends or colleagues |
| (2) I shared it online (e.g. via GitHub, Stackoverflow, Social media) |
| (3) I did not do anything to share the code after the hackathon |
| (4) Other |
| For whom do you think this file might be useful? Please select all options that apply |
| (1) Only for me |
| (2) For our hackathon project or team |
| (3) For a small group of people |
| (4) For many people |
| Are you aware of anyone else REUSING the code contained in this file after the hackathon? Please select all options that apply |
| (1) Yes one of my hackathon team members |
| (2) Yes one of my friends or colleagues |
| (3) Yes someone else used it in an open source project |
| (4) I am not aware of anyone reusing this code after the hackathon |
| (5) Other |
| Perception about the hackathon project |
| Can you recall any instance where you reused ideas that arose from this hackathon after it had ended? (yes, no, not sure) |
| Perceived usefulness of the hackathon project (based on Reinig (
| I am satisfied with the work completed in this team |
| I am satisfied with the quality of my team’s output |
| My ideal outcome coming into my team was achieved |
| My expectations towards my team were met |
| We lacked important skills to complete our project |
| Intentions to continue working on the hackathon project (based on Bhattacherjee ( |
| I intend to continue working on this project rather than not continue working on it |
| My intentions are to continue working on this project rather than any other project |
| If I could, I would like to continue working on this project as much as possible |
| Demographics |
| How old are you currently? (18 to 24, 25 to 34, 35 to 44, 45 to 54, 55 to 64, 65 to 74, 75 or older, Prefer not to say) |
| Are you...? (Female, Male, Non-binary, Prefer not to say) |
| Do you consider yourself a minority? (For example in terms of race, gender, expertise or in another way)? (Yes, No, Prefer not to say) |
| About how many years of experience do you have as a contributor to open source projects in general? (1, 2, 3, 4, 5 + ) |
Survey 2 sent to individuals who had reused a blob that was created during a hackathon
| Perception about the code contained in the blob |
| The commit was made by GITHUB_HANDLE. Is this your handle? (yes, no, not sure) |
| Where to the best of your knowledge did the code contained in this file originate from? Please select all options that apply |
| (1) I wrote it |
| (2) From friends or colleagues |
| (3) From other GitHub repositories |
| (4) From the web (e.g. on Stackoverflow or forums) |
| (5) It was generated by a tool |
| (6) I am not sure |
| (7) Other |
| Perceived ease of use (based on Davis ( |
| Learning to use the code in this file was easy for me |
| I found it easy to get the code in this file to do what I want it to do |
| It was easy for me to become skillful at using the code in this file |
| I found the code in this file easy to use |
| Demographics |
| How old are you currently? (18 to 24, 25 to 34, 35 to 44, 45 to 54, 55 to 64, 65 to 74, 75 or older, Prefer not to say) |
| Are you...? (Female, Male, Non-binary, Prefer not to say) |
| Do you consider yourself a minority? (For example in terms of race, gender, expertise or in another way)? (Yes, No, Prefer not to say) |
| About how many years of experience do you have as a contributor to open source projects in general? (1, 2, 3, 4, 5 + ) |
Surveys demographics summary
| Category | Group | Survey 1 (%) | Survey 2 (%) | Survey 3 (%) | Combined (%) |
|---|---|---|---|---|---|
| Gender | Male | 83.5 | 79.3 | 85.3 | 82.8 |
| | Female | 11.9 | 19.0 | 10.3 | 13.5 |
| | Prefer not to say | 4 | 1.7 | 4.3 | 3.4 |
| | Non-binary | 0.6 | 0 | 0 | 0.2 |
| Age group | 18 to 24 | 49.7 | 61.0 | 41.4 | 50.6 |
| | 25 to 34 | 40.7 | 35.6 | 42.2 | 39.7 |
| | 35 to 44 | 6.8 | 2.5 | 6.9 | 5.6 |
| | 45 to 54 | 0 | 0 | 5.2 | 1.5 |
| | 55 to 64 | 0 | 0.8 | 0 | 0.2 |
| | Prefer not to say | 2.8 | 0 | 4.3 | 2.4 |
| Experience (years) | 1 | 18.4 | 25.0 | 14.1 | 19.0 |
| | 2 | 23.4 | 22.7 | 21.7 | 22.7 |
| | 3 | 12.8 | 17.0 | 20.7 | 16.2 |
| | 4 | 14.2 | 11.4 | 15.2 | 13.7 |
| | 5+ | 31.2 | 23.9 | 28.3 | 28.3 |
| Minority | Yes | 30.2 | 41.1 | 25.5 | 32.0 |
| | No | 64.5 | 56.2 | 70.9 | 64.0 |
| | Rather not say | 5.2 | 2.7 | 3.6 | 4.1 |
Fig. 5 Plot of who created how much of the hackathon code and when (RQ 1.a and RQ 1.b)
Code Origin Response from surveys of hackathon participants whose code was reused (Survey 1) and whose code wasn’t reused (Survey 2)
Fig. 6 Top 5 languages for blobs created before, during, and after hackathons
Fig. 7 Top 5 languages for blobs created by project members, co-contributors, and others
Fig. 8 Distribution of Lines of Code for code blobs created at different times (a) and by different authors (b)
Descriptive statistics of code sizes (LOC) for blobs created at different times and by different authors
| Type | Origin | Minimum LOC | Mean LOC | Median LOC | Maximum LOC |
|---|---|---|---|---|---|
| Timing | Before | 0 | 241.59 | 32 | 297,827 |
| | During | 0 | 208.67 | 43 | 3,665,363 |
| | After | 0 | 772.62 | 159 | 359,982 |
| Author | Project member | 0 | 565.97 | 95 | 3,665,363 |
| | Co-contributor | 0 | 362.35 | 75 | 67,233 |
| | Other author | 0 | 142.34 | 22 | 297,827 |
Fig. 9 Plot of the distribution of template code depending on who created the code and when
Logistic Regression model for statistically explaining code blob reuse by whether a blob is template or not
| | Estimate | Std. Error | p-Value |
|---|---|---|---|
| (Intercept) | −0.9355 | 0.0030 | < 2 |
| Template- | 1.0155 | 0.0159 | < 2 |
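To make the coefficients easier to read, they can be converted into an odds ratio and predicted reuse probabilities. The sketch below assumes that the truncated "Template-" row is the indicator for a blob being template code and that the model uses the standard logit link; both are assumptions, since the record does not state them, and the resulting numbers are illustrative rather than results reported by the authors.

```python
import math

# Estimates copied from the table above.
intercept = -0.9355
template_coef = 1.0155  # assumption: effect of "blob is template code"


def inv_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


p_non_template = inv_logit(intercept)                # predicted reuse probability, non-template blob
p_template = inv_logit(intercept + template_coef)    # predicted reuse probability, template blob
odds_ratio = math.exp(template_coef)                 # multiplicative change in the odds of reuse

print(f"non-template: {p_non_template:.3f}, template: {p_template:.3f}, odds ratio: {odds_ratio:.2f}")
```

Under these assumptions the model predicts a reuse probability of about 0.28 for non-template blobs and about 0.52 for template blobs, that is, roughly 2.8 times higher odds of reuse for template code.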
Fig. 10 Size distribution of reused and not reused code blobs
Descriptive Statistics for Code sizes (LOC) for Hackathon blobs that were later reused and blobs that were not reused
| Type | Minimum LOC | Mean LOC | Median LOC | Maximum LOC |
|---|---|---|---|---|
| Reused | 0 | 185.92 | 33 | 771,207 |
| Not-reused | 0 | 187.55 | 40 | 3,665,363 |
Fig. 11 Boxplot for the satisfaction and intention scale questions
Fig. 12 Ease of use of hackathon code that was reused (N = 118)
Fig. 13 Top 5 languages for the reused code blobs in different projects
Fig. 14 Weekly hackathon code blob reuse in projects of different categories over a period of 2 years
Fig. 15 Whether the code creators reused the code contained in this file after the hackathon
Fig. 16 Whether and where the code creators shared the code contained in this file after the hackathon
Logistic Regression model for statistically explaining code blob reuse by whether the authors foster code blob reuse by reusing or sharing the code
Fig. 17 Awareness of others reusing the code
Fig. 18 For whom would the code be useful?
Logistic Regression model for statistically explaining code blob reuse by the authors’ awareness of reuse and feeling about the usefulness of the code
Survey 1 and 2 responses on whether the participants reused the ideas that arose from the hackathon
| Response | Survey 1 count | Survey 1 percentage | Survey 2 count | Survey 2 percentage |
|---|---|---|---|---|
| Yes | 57 | 32.2% | 34 | 28.6% |
| No | 80 | 45.2% | 60 | 50.4% |
| Not sure | 40 | 22.6% | 25 | 21% |
Effect of project characteristics on hackathon code reuse - results from the generalized additive model
Part A shows the results for the linear terms, with the associated Estimate, Standard Error, and p-Values
Part B. shows the results for the non-linear terms, with the Effective Degrees of Freedom – “edf” – a measure of the degree of non-linearity, the p-Values, and the partial effects of each variable on the response (0: No Effect, Positive Values: Positive effects, Negative Values: Negative Effects)
The “pctData” variable, found to be “not significant”, is shown in RED, and the corresponding effect plot is omitted