Abstract
BACKGROUND: Crowdsourcing is a nascent phenomenon that has grown exponentially since the term was coined in 2006. It involves a large group of people solving a problem or completing a task for an individual or, more commonly, for an organisation. While the field of crowdsourcing has developed more quickly in information technology, it has great promise in health applications. This review examines uses of crowdsourcing in global health and in health more broadly.
Year: 2018 PMID: 29564087 PMCID: PMC5840433 DOI: 10.7189/jogh.08.010502
Source DB: PubMed Journal: J Glob Health ISSN: 2047-2978 Impact factor: 4.413
Description of studies included in overview
| Category | Topic | How crowdsourcing is used | Results | Reference |
|---|---|---|---|---|
| Diagnosis | Malaria diagnosis | Uses gaming (BioGames) to diagnose malaria parasites. Gamers are given a tutorial and must achieve an accuracy of >99% in a training game before playing the real game. Gamers are asked to label each cell as infected or healthy. | Gamer diagnoses had an accuracy of 99%, sensitivity of 95.1% and specificity of 99.4%. The authors suggest that gaming could be a viable option for telepathology. | |
| Diagnosis | Malaria – education through diagnosis | Based off BioGames app [ | BioGames has achieved diagnostic accuracy comparable to that of experts when scores from individual non-experts are aggregated and the crowd size is large. | |
| Diagnosis | Malaria diagnosis | A crowdsourcing game (MalariaSpot) was designed using malaria-positive blood films. Players were asked to tag as many malarial parasites as possible in 1 min and given continuous feedback. | Combination of games or more resulted in extremely accurate identification of malarial parasites (99% accuracy). | |
| Diagnosis | Detecting glaucomatous optic neuropathy | Uses Amazon Mechanical Turk to study the viability of crowdsourcing to detect glaucomatous optic neuropathy. Turkers were asked to classify images as normal or abnormal, with each image being classified 20 times. | The authors compared two groups: one unrestricted and one restricted to high-performing Turkers. Sensitivity was high in both, ranging from 83% to 88%, but specificity was poor, ranging from 35% to 43%. | |
| Diagnosis | Grading of diabetic retinopathy | Uses Amazon Mechanical Turk for classifying fundus photos of diabetic retinopathy. | 81.3% of images were correctly classified in an average time of 25 s per image. However, Turkers struggled to specify the level of severity. | |
| Diagnosis | Large scale molecular pathology studies in cancer | 98 293 citizen scientists accessed the Cell Slider web page to score tumor markers. Specifically, citizen scientists scored sub-images of tissue microarray cores labelled for estrogen receptor prognosis. | The citizen scientists performed well in identifying cancer (area under ROC curve 0.95, 95% CI 0.94 to 0.96) and estrogen receptor status (0.97, 95% CI 0.96 to 0.97), and performed similarly to trained pathologists. | |
| Diagnosis | Skin self-examination for melanoma | Conducted a physical crowdsourcing exercise in a mall, recruiting 500 participants and teaching basic skin self-examination techniques. Implemented various thresholds to improve crowd results. | Using a 19% threshold, 90% of melanomas and 72% of non-melanomas were identified; with a 65% threshold, 67% of melanomas and 100% of non-melanomas were identified. Authors recommend the 19% threshold. | |
| Diagnosis | Diagnosing medical images | Because there is a lack of high-level experts in rural China, the authors investigated whether crowdsourcing could be used to diagnose medical images. 2nd- or 3rd-year graduate students with a medical imaging major participated. | The average accuracy was 39.54%, with the best student only making the correct diagnosis 50% of the time. Using a machine learning algorithm with majority voting, combined with crowdsourcing, which learns the students’ mistakes, accuracy can increase to 80%. | |
| Diagnosis | Diagnosing medical illnesses | Investigated the feasibility of using crowdsourcing for diagnosing medical conditions with case descriptions of varying difficulty, posted on Amazon Mechanical Turk, O’Desk, and web forums. | Web forums were ineffective. Turkers diagnosed the easy-to-diagnose cases. O’Desk workers were able to diagnose easy cases, but were more likely to express caution when providing diagnoses for any complicated cases, and some refused to provide diagnoses. | |
| Diagnosis | Point-of-care problem solving for clinicians | Authors report clinicians’ experiences using a crowdsourcing application for point-of-care problem solving, where clinicians post problems via an app and these are answered by verified users and viewable by users in that user’s provider group. | Over 80% of respondents felt that the app could have a positive impact on patient care, medical education, referrals, and difficult diagnoses. Both non-users and users were surveyed, and non-users were more concerned about the potential to disrupt workflow. | |
| Diagnosis | Increasing diagnostic accuracy among junior physicians | Developed and piloted a web-based crowdsourcing software to enable junior doctors to upload cases and receive feedback from expert physicians, with an element of gamification using reward points. | The web interface improved diagnostic ability of junior clinicians, but senior clinicians were less actively involved, due to workload, time, availability and reluctance to embrace the new technology. | |
| Surveillance | Review of participatory epidemiology | The author provides a review of participatory epidemiology, including FrontlineSMS, Ushahidi, GeoChat, Asthmapolis, and Outbreaks Near Me. | Although participatory epidemiology platforms were relatively new at the time of this review, there were already palpable benefits. | |
| Surveillance | Online self-reported influenza | Volunteer users fill in a short survey regarding flu symptoms and can enroll family members. Volunteers enter information weekly, and a map of influenza is available to them. | 9300 users in August 2012 throughout the US. | |
| Surveillance | Global disease surveillance | Describes a smartphone app, Click Clinica, to increase the identification of infectious diseases globally. App contains clinical guidelines, and questions to confirm diagnosis and resistance information. | When ‘live’ for one month, app had already been downloaded over 1000 times and 600 disease notifications had been added. Data was treated as most trusted depending on information provided by submitter (ie, if email, contact details submitted). | |
| Surveillance | Disease outbreak monitoring | Uses Lady Health Workers in rural areas of Pakistan to report via SMS health information to an electronic disease monitoring system (Jaroka TeleHealthcare System), which provides geospatial location of patients for doctors, medical experts and health officials. | The program was able to display regional patterns for diseases, as well as a disease outbreak that was due to a mass migration of internally displaced persons. The authors reported that the program helps identify whether an epidemic is imminent. | |
| Surveillance | Dengue surveillance | Reports on an app, “Mo-Buzz,” which contains predictive surveillance, civic engagement, and health communication. Citizens use the app or social media to report breeding sites, symptoms and mosquito bites. Using this information, tailored health messages are delivered to individuals living in hot spots. Predictive surveillance predicts outbreaks using this information, combined with weather and other data. | The paper discusses some difficulties with the app, including verifying images due to clarity and receiving multiple images/submissions of the same breeding site. It does not report on the impact of the app on dengue outbreaks. | |
| Surveillance | Malaria surveillance in India | Amazon Mechanical Turk was used to solicit self-reports about malaria diagnosis and related information. | Authors gained information of distribution of malaria species in India, and estimated burden, which coincided with official public health reports. | |
| Surveillance | Identifying erroneous global burden of disease estimates | A crowdsourcing platform was designed, comparing the effect of gamification, to identify erroneous estimates in the global burden of disease database. | Overall, the classifications were matched to a GBD expert 86% of the time. Adding gamification increased accuracy significantly, with gamified users identifying 1.7 times more trends than those using a standard (non-gamified) interface. | |
| Nutrition | Restaurant reviews to identify foodborne illness | Used poor Yelp reviews, specifically those using the words sick, vomit, diarrhea, or food poisoning, to identify food poisoning in New York City restaurants. | Three previously unreported outbreaks met the Department of Health and Mental Hygiene criteria for a food outbreak. | |
| Nutrition | Correlating Yelp reviews with failed hygiene inspections | Authors review Yelp, examining whether reviews are correlated with failing hygiene inspections. | The authors find that poor Yelp reviews are correlated with having failed hygiene inspections in Seattle. | |
| Nutrition | Healthier food choices using an app | An app, FoodSwitch, uses crowdsourced submissions of food products in Australia. Crowdsourced submissions are scanned by SKU and then labelled red, green, or yellow to make it easier for consumers to identify healthy foods. | FoodSwitch has been downloaded by 400 000 users and more than 30 000 crowdsourced products have been added to the app. | |
| Nutrition | Nutritional analyses using photos of food – “PlateMate” | Uses Amazon Mechanical Turk, and a step-by-step process to estimate calories, fat, carbohydrates, and protein. First, every food item in a photo is tagged, then identified, then measured, each in a separate HIT. | The application’s error rate was not significantly different from MealSnap (another application) or dieticians. Challenges identified include tagging the entire food item (otherwise it may only be partially measured at a later stage) and correctly identifying the food item. | |
| Nutrition | Nutritional analysis of photos of food – “Eatery App” | Users in the Eatery App post a photo of food, asking other users how healthy it is, and receive crowdsourced ratings. The goal is to modify diets based on the feedback. | Overall, peer and expert ratings were highly correlated across the US and Europe. Several food categories led to higher healthiness scores among peers (fruit, vegetables, whole grains, legumes, nuts and seeds), and others to lower healthiness scores among peers (fast food, refined grains, red meat, cheese, savory snacks, desserts, and sugar-sweetened beverages). | |
| Nutrition | Identifying calories in meals using a smartphone | A mobile application was used to pilot the feasibility of a smartphone app for crowdsourcing with non-experts to identify calories. Training was provided to non-experts, who were asked a month later to estimate calories of food using a photo. A crowd of experts and non-experts was investigated. | Both the crowd of experts and the crowd of non-experts outperformed individual experts or individual non-experts. The expert group estimated the calories significantly more accurately than the non-expert group. | |
| Nutrition | Predictors of obesity | Participants recruited via reddit, and asked to pose and answer questions regarding childhood predictors of adult BMI for the purpose of generating predictors for a statistical model. Users answered previous questions and posed new ones. | Final sample of 532, 56 new questions identified, 16 of which were highly correlated with adult obesity. Exploratory factor analysis identified 4 factors (home environment, psychosocial well-being, healthy lifestyle, and family history and biological factors). Study identified well-known predictors, but also predictors that had not been well-studied previously. Data collection was rapid. | |
| Nutrition | Predictors of behavioural outcomes | Participants recruited via reddit were asked to pose and answer questions regarding predictors of adult obesity and energy consumption. The questions and their answers were used to develop predictors in a statistical model. | Despite having a low number of participants in the energy sample, authors were able to develop a predictive model showing that the number of adults in the home, and having hot water and an electric heater in the home, were predictive. The predictive model for BMI showed demographic, social, economic, genetic, psychological, dietary, and physical activity-related factors. | |
| Public Health & Environment | Measuring second-hand smoking in vehicles | Developed an app to be used while driving (by passengers) to measure smoking in other vehicles. The smartphone collects data on the number of cars passing (denominator), and the user records when he/she sees a person smoking in a car and, if so, whether there are other occupants and whether the occupants are children. | A smoking prevalence of 2.9% in New Zealand was measured from data collected by 66 registered users. These results were similar to a study in 2011. | |
| Public Health & Environment | Point-of-Sale Tobacco | Uses Amazon Mechanical Turk and image annotation, with micro-tasks (using a zoom feature) to identify point-of-sale tobacco advertising, and compared to field-raters. | Found excellent inter-rater agreement, with AUC averaging over 0.95 (with sensitivity analyses). Author recommends further testing of photograph annotation tools in future work. | |
| Public Health & Environment | Point-of-Sale Tobacco | Authors used Gigwalk, which is a mobile crowdsourcing application, to request workers to physically conduct point-of-sale tobacco monitoring. Workers were provided with a manual, but no training, and their work was compared to trained data collectors. | There was extremely high agreement between the crowd and trained data collectors on most measures, so much so that kappa couldn’t be computed in some instances as agreement was perfect. | |
| Public Health & Environment | Built environment surveillance | Used crowdsourcing to annotate and evaluate captured scenes from 23 000 webcams. | Once annotated, study found changes in behaviour after changes in built environment. | |
| Public Health & Environment | Air pollution sensors | Authors describe two projects where sensors are provided to citizens and linked to smartphones, measuring air pollution. The hope is that the sensors will change behaviour patterns, causing citizens to avoid polluted areas, while also producing a map of pollution for cities in which sensors are in use. | The results of these projects were not described. | |
| Public Health & Environment | Testing multilingual health promotion | Using Amazon Mechanical Turk, the authors tested promotional health materials with both native English and native Spanish speakers. | Authors were able to reach a more diverse population than with traditional data collection methods, more quickly, and for less cost. They were able to gain more nuanced suggestions to tailor their materials to different populations. | |
| Public Health & Environment | Including youth perspectives in HIV/AIDS messaging | A website, CrowdOutAIDS.org, was created to enable youth to be involved in shaping policies, to set priorities and influence actions at UNAIDS. The website intended to connect a community of young people and collect their experiences and ideas and to provide a means to synthesize the information collected from youth globally. Questions were asked in community and online forums in order to ensure youth without internet access were able to participate. | UNAIDS was able to collect information across the globe, highlight similarities and differences from youth both globally and within regions, and to enable youth to influence their policy. | |
| Public Health & Environment | Mapping Automated External Defibrillators (AEDs) | Developed a crowdsourcing challenge to map AEDs in Philadelphia, which was advertised on TV and radio. Contestants registered via Web or an app, and photographed AED locations (along with AED information) around the city for a chance to win a US$ 10 000 prize. | Study lasted 8 weeks. 313 teams and individuals participated, and participants aged >40 submitted more entries than younger participants. 1429 submissions were received. | |
| Public Health & Environment | Contest for promotional videos for HIV testing programs | Authors launched a contest for promotional videos to encourage HIV testing in China. | Seven eligible videos were received in 2 mo. Videos were judged on reaching untested individuals, engaging the community, and creating excitement around HIV testing. | |
| Education | Study materials for medical students | The authors used Google Drive and Java to enable students in the preclinical medicine program to continuously submit and collaboratively edit study questions throughout the course. Prior to the exam, Java turned the study questions into flashcards. | 16 150 study questions were created, and the students in that year outperformed students of the previous year in all exams. | |
| Genetics | Genetic prediction of Rheumatoid Arthritis | Authors describe a challenge by DREAM and SAGE, which is a crowdsourcing competition, to develop genetic predictors of response to immunosuppressive therapy in rheumatoid arthritis. | N/A (the paper describes the challenge, but not the results) | |
| Genetics | Detecting somatic mutations from cancer genomes | Describes a DREAM challenge, which is a crowdsourcing competition, to detect somatic mutations from cancer genomes. Employed Google Cloud, and had a public, real-time leaderboard. | Received 248 submissions from 21 teams over approximately 6 mo. The leaderboard enabled teams to improve submissions once they had an initial performance estimate. Finally, authors aggregated submissions of best-performing teams. | |
| Genetics | Human gene annotation | Dizeez is a crowdsourcing game where players match genes with a clue of the disease; players receive points for selecting the correct disease-gene match. Players can select a specific disease area or protein family. Annotations that are reported across multiple players receive the highest confidence scores. | In 9 mo, 6941 unique gene-disease assertions were generated from Dizeez; 2137 were not found in any gene-disease databases (OMIM, PharmGKB, or Gene Wiki). 17 of these associations occurred more than 7 times; these were statistically significant. Authors examined these through a manual literature search and found evidence for 14. | |
| Genetics | Gene mutation relations | Authors used Amazon Mechanical Turk to judge associations between genes and mutations. Genes were taken from the GenNorm system, and mutations from the Extractor of Mutations system. Genes and mutations mentioned in Pubmed were included, and Turkers were provided with the abstract(s) mentioning the gene and mutations and had to judge if they were related. | The authors explored quality control methods, including repeating experiments on the same HIT and aggregating those results, and eliminating any Turker whose performance fell below 50% accuracy on control items. When these were implemented, accuracy of 89.9% was achieved for a cost of US$ 0.76 | |
| Genetics | Exploring relationship between genes and social intelligence | Using results from personal genomics (My Quantified Self), and through tracking personal behavior, the authors explore the relation between the OXTR gene and social intelligence using personality testing. | The authors’ results were not statistically significant and need more power; however, the authors’ initial results were in a different direction than hypothesized. Individuals with the AG genotype have lower EQ/IRI values than those with AA, and an increase in the A allele’s frequency corresponds to decreased optimism. | |
| Genetics | Incidental findings in GWAS studies | Authors propose using crowdsourcing to solve the problem of reporting incidental findings to populations who have participated in GWAS studies, given that new knowledge of genetic diseases is being discovered. | Proposed system: authors propose a binning system, where crowd sorts findings into clinically actionable, clinically valid but not actionable, or no known clinical significance. | |
| Psychology | Investigating whether Amazon Mechanical Turk is applicable for mental health | Amazon Mechanical Turk was used, restricted to US residents, to explore whether an AMT population would be a viable research tool for mental health studies. Participants were followed up one week later. Fabrication of mental health symptoms was investigated. | The authors found that Turkers were younger, more educated, white, and more likely to be middle-class compared to the general population. The frequency of trauma exposure and depression. A high proportion of Turkers had clinically relevant anxiety symptoms, but this mirrors previous studies of active internet users. The data were deemed reliable, but authors recommend similar studies in other countries. | |
| Psychology | Measuring depression in populations via social media | Uses Amazon Mechanical Turk to obtain a survey on depression, and a self-report of history of depression. The Turkers could opt-in to share their social media handles. Handles that were shared were data mined within a three-month period. | The authors characterize differences between depressed and non-depressed individuals, including time of posting, emotion, linguistic style, engagement and ego-network. These are used to create a social media depression index that could be used to predict risk of depression based on social media posts for other users. | |
| Psychology | Advice for people living with autism | Authors wanted to explore using crowdsourcing for advice with people with autism, compared to in-group advice for the same. Questions were selected from online help forums, and authors uploaded those questions to Amazon Mechanical Turk. | Authors received responses within hours, and paid US$ 90 for 400 responses. Out-group responders (those without autism) were more direct in advice, provided superior informational value, and more helpful answers than the in-group responses. | |
| Psychology | Social media surveillance | Used Amazon Mechanical Turk workers to write 20 alternative sentences for each life satisfaction statement. Statements were then used to data mine Twitter. | 1000 statements were collected in 5 d for less than US $10. | |
| General medicine/ Other | Collateral damage of breast cancer treatment | A webpage was designed to collect information regarding collateral damage of breast cancer treatment from survivors. | 1191 responses were collected. While many issues reported were known side effects, some issues were not commonly reported. | |
| General medicine/ Other | Ovarian cancer awareness | Using Amazon Mechanical Turk, the authors conduct a survey to explore awareness of ovarian cancer, using breast cancer as a control. | Knowledge of ovarian cancer was low among the population studied (which was a US population). | |
| General medicine/ Other | Gene selection for breast cancer survival | Created a game called “the Cure” which trained players prior to entering the main gaming area. Once in this area, gamers play against an automated opponent, selecting genes in a decision tree classifier, with the aim of surviving. | The authors divided players into ‘experts’ and ‘inexperienced’ and found that the expert group, and both groups considered together, significantly enriched knowledge for cancer-related diseases, while the inexperienced group’s results did not. | |
| General medicine/ Other | Evaluation of medical pictograms | Amazon Mechanical Turk was used to obtain judgements of the meaning behind medical pictograms. | Comprehensibility scores were calculated, which ranged from 45% to 98%, and correlated strongly with those in another study that used oral responses with the same pictograms. Misinterpretations were judged to be based on errors within the pictograms themselves, not on the Turkers’ abilities. | |
| General medicine/ Other | Fact extraction from scientific literature | Authors present a conceptual framework for scientific fact extraction from literature in different disciplines, to assist researchers who are conducting cross-disciplinary research. | N/A | |
| General medicine/ Other | Extracting annotation from medical text | “Dr. Detective” is a game that uses medical experts as a crowd, and is designed to extract annotation and solve disagreements in medical text. | The results from the crowd were comparable to those of a natural language processing parser. | |
| General medicine/ Other | Semantic tagging of medical documents | Used CrowdFlower, and uploaded SNOMED CT relationships and a definition; the crowd was asked whether this was true or false. Experts were also asked to evaluate relationships. | 200 SNOMED CT relationships were evaluated (each by 25 workers). The experts and crowd responses were nearly indistinguishable. Errors were identified, which is concerning regarding the biomedical ontologies within SNOMED CT. | |
| General medicine/ Other | Natural language processing | Using CrowdFlower, crowdsourcing medication names, types and linked attributes of clinical trials that were randomly selected from ClinicalTrials.gov | High agreement between crowd’s annotations for medication names and types, correction of previous annotations and linking medications with their attributes. The authors found that simple voting provided the best form of aggregation. | |
| General medicine/ Other | Adverse drug reactions | Used Amazon Mechanical Turk to rank severity of adverse drug reactions (ADRs), which were retrieved from the SIDER2 database. Turkers were provided with 10 pairwise comparisons of ADRs and were asked to select which is worse. | ADRs ranked as more serious by Turkers were also associated with more deaths in the FDA adverse events reporting system. | |
| General medicine/ Other | Drug indication curation | Used Amazon Mechanical Turk to curate drug indications. HITs were simplified by asking Turkers to make a judgement of whether a drug label is indicated for a disease, which is highlighted. | 3000 HITs were posted from 706 drug labels in 8 h. The aggregated accuracy was 96%, and the total cost was US$ 1.75 per drug label, which is substantially less expensive than traditional alternatives. | |
| General medicine/ Other | Black market prices for prescription opioids | Uses StreetRx (a crowdsourcing website) to obtain prices of prescription opioids. Visitors to the website anonymously post the price they paid for prescription opioids and where they were purchased (and are able to see similar purchases and prices). | 954 reports were obtained through the website. These were compared to prices provided through law enforcement and through the dark web. The prices were highly correlated between the three. | |
| General medicine/ Other | Physical crowdsourcing of mosquito samples | Authors collected physical samples of mosquitoes (Diptera: Culicidae) through crowdsourcing methods. | Authors received 110 shipped samples of mosquitoes, 60% of which came from individuals unknown to laboratory members. Mosquitoes came from areas that were difficult to reach. | |
| General medicine/ Other | Logistical deliveries via crowdsourcing | Authors propose a distribution method using the mobility of the local population, and using information gained by cell towers. Participants would exchange packages at a point they normally visit, at a time they normally visit it. | Authors piloted their method, but did not describe it well. | |
| General medicine/ Other | Reference correspondence in endoscopic images during minimally invasive surgery | Used Amazon Mechanical Turk to find sets of corresponding points in endoscopic images, the results of which were compared to medical students and experts. | The experiment took 77 ± 16 min for 100 HITs. The authors note that 10 000 annotations could be generated in 24 h. Using a clustered analysis, the authors obtained an accuracy that outperformed 4 of 5 experts. | |
| General medicine/ Other | Viability of crowdsourcing for organizational survey research | Used Amazon Mechanical Turk to collect basic demographic information and information on internet knowledge, computer attitudes and knowledge, goal orientation, and personality. The results were compared to a control, which was a traditional psychology participant pool. | Both samples were similar in demographic characteristics; however, the crowdsourcing sample was more diverse in education, employment status and profession. There was slightly better social desirability and reliability in the crowdsourced data. The authors conclude that crowdsourcing is a good data pool for organizational research. | |
| General medicine/ Other | Clinical trial protocols | A clinical trial protocol was crowdsourced for input from physicians and patients. | 43 physicians and 33 patients took part in the crowdsourcing process to inform development of the clinical trial’s protocol. | |
| General medicine/ Other | Health care priority setting | Uses Amazon Mechanical Turk to identify health care priorities, asking ‘what should be your priority when treating disease.’ Turkers were asked to distribute 100 points among 5/8 questions (which were randomly assigned). | Dimensions identified include: scale of disease, household financial effects, social equity, cost-effectiveness, spillover effects. It is unclear from the manuscript which was rated most important. | |
| General medicine/ Other | Healthcare costs | Suggests crowdsourcing health care costs, given that in the US out-of-pocket health care consumers face higher costs than insured consumers. Specifically, suggests hosting a website where users can gain access by posting their (de-identified) medical bills. | N/A | |
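
Several studies in the table aggregate many non-expert judgements per item (for example, each image classified 20 times, or simple voting across annotators) and then report sensitivity and specificity, sometimes at different vote thresholds as in the melanoma study. The following is a minimal sketch of that general idea, not the method of any reviewed study: the function names, the toy votes, and the ground-truth labels are all hypothetical and invented for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label among one item's crowd labels.

    Ties are resolved in favour of the label seen first (Counter behaviour).
    """
    return Counter(labels).most_common(1)[0][0]


def threshold_vote(votes, threshold):
    """Flag an item as positive when the share of positive votes reaches threshold.

    `votes` is a list of booleans, one per crowd worker.
    """
    return sum(votes) / len(votes) >= threshold


def sensitivity_specificity(predicted, truth):
    """Compute (sensitivity, specificity) from parallel lists of booleans.

    Assumes the truth list contains at least one positive and one negative.
    """
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    return tp / (tp + fn), tn / (tn + fp)


def evaluate(votes_by_item, truth_by_item, threshold):
    """Aggregate every item's votes at one threshold and score against truth."""
    items = sorted(votes_by_item)
    predicted = [threshold_vote(votes_by_item[i], threshold) for i in items]
    truth = [truth_by_item[i] for i in items]
    return sensitivity_specificity(predicted, truth)


# Hypothetical toy data: five crowd votes per image (True = "abnormal").
TOY_VOTES = {
    "img1": [True, True, False, True, True],
    "img2": [False, False, True, False, False],
    "img3": [True, False, False, False, True],
    "img4": [False, True, False, False, False],
}
# Hypothetical expert ground truth for the same images.
TOY_TRUTH = {"img1": True, "img2": False, "img3": True, "img4": False}

if __name__ == "__main__":
    for threshold in (0.19, 0.30, 0.50):
        sens, spec = evaluate(TOY_VOTES, TOY_TRUTH, threshold)
        print(f"threshold={threshold:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

On this toy data, lowering the threshold raises sensitivity at the expense of specificity, which is the same trade-off the melanoma study describes when comparing its 19% and 65% thresholds.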