Timothy C Guetterman, Tammy Chang, Melissa DeJonckheere, Tanmay Basu, Elizabeth Scruggs, V G Vinod Vydiswaran.
Abstract
BACKGROUND: Qualitative research methods are increasingly being used across disciplines because of their ability to help investigators understand the perspectives of participants in their own words. However, qualitative analysis is a laborious and resource-intensive process. To achieve depth, researchers are limited to smaller sample sizes when analyzing text data. One potential method to address this concern is natural language processing (NLP). Qualitative text analysis involves researchers reading data, assigning code labels, and iteratively developing findings; NLP has the potential to automate part of this process. Unfortunately, little methodological research has been done to compare automatic coding using NLP techniques and qualitative coding, which is critical to establish the viability of NLP as a useful, rigorous analysis procedure.
Keywords: coding; methodology; natural language processing; qualitative research; text data
Year: 2018 PMID: 29959110 PMCID: PMC6045788 DOI: 10.2196/jmir.9702
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Participant demographic information.
| Variable | Drug response (n=48) | Police response (n=59) | Drug or police response (n=66) |
| Age, mean (SD) | 18.5 (2.2) | 18.3 (2.5) | 18.3 (2.4) |
| Female | 28 (58.3) | 33 (55.9) | 37 (56.1) |
| Male | 18 (37.5) | 25 (42.4) | 27 (40.9) |
| Other | 2 (4.2) | 1 (1.7) | 2 (3.0) |
| White | 26 (54.2) | 36 (61.0) | 38 (57.6) |
| Black | 8 (16.7) | 9 (15.3) | 11 (16.7) |
| Asian | 7 (14.6) | 7 (11.9) | 8 (12.1) |
| Other (including multiracial) | 7 (14.6) | 7 (11.9) | 9 (13.6) |
| Hispanic^a | 0 (0.0) | 3 (6.7) | 3 (6.4) |
| <High school | 19 (39.6) | 28 (47.5) | 31 (47.0) |
| High school graduate | 7 (14.6) | 5 (8.5) | 7 (10.6) |
| Some college | 17 (35.4) | 19 (32.2) | 20 (30.3) |
| College graduate (BA+) | 5 (10.4) | 7 (11.9) | 8 (12.1) |
| High school or less | 1 (3.5) | 2 (4.4) | 2 (4.3) |
| Some college or 2-year degree | 4 (13.8) | 5 (11.1) | 5 (10.6) |
| BA but less than Masters | 7 (24.1) | 8 (17.8) | 9 (19.2) |
| Masters but less than PhD | 9 (31.0) | 18 (40.0) | 19 (40.4) |
| PhD | 8 (27.6) | 12 (26.7) | 12 (25.5) |
| Parents | 22 (75.9) | 33 (73.3) | 35 (74.5) |
| Dorm | 0 (0.0) | 1 (2.2) | 1 (2.1) |
| Sharing an apartment with other people | 7 (24.1) | 8 (17.8) | 8 (17.0) |
| Other | 0 (0.0) | 3 (6.7) | 3 (6.4) |
| 1-3 | 6 (20.7) | 9 (20.0) | 10 (21.3) |
| 4-6 | 17 (58.6) | 29 (64.4) | 29 (61.7) |
| 7-10 | 5 (17.2) | 7 (15.6) | 7 (14.9) |
| 11+ | 1 (3.5) | 0 (0.0) | 1 (2.1) |
| Married or together | 21 (72.4) | 36 (80.0) | 36 (76.6) |
| Divorced or separated | 7 (24.1) | 7 (15.6) | 9 (19.2) |
| Other (widowed, unsure) | 1 (3.5) | 2 (4.4) | 2 (4.3) |
^a Sample sizes are as follows: drug response (n=29), police response (n=45), and drug or police response (n=47). Participants were not required to provide demographic information, so the n for each demographic question in this table is lower than the total number of participants. Because some participants responded to both questions, the table has 3 columns of demographic information. There are fewer responses for ethnicity, parent’s education, primary living situation, family size, and parent’s marital status because those questions were not asked of the subset of individuals who had demographics requested twice. The third column displays data for those who responded to at least one question.
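The percentages in the demographic table use two different sets of denominators: the column totals in the header row, and the smaller sample sizes given in the footnote for the questions that were asked of fewer participants. The following is a minimal sketch of that arithmetic, not code from the study; the helper name `pct` and the dictionary names are hypothetical, and the counts are copied from the table.

```python
def pct(count, n):
    """Return count as a percentage of n, formatted to one decimal place."""
    return f"{100 * count / n:.1f}"

# Denominators from the table header row (all respondents per column).
full_n = {"drug": 48, "police": 59, "either": 66}
# Smaller denominators from the table footnote (questions asked of fewer people).
reduced_n = {"drug": 29, "police": 45, "either": 47}

# Gender was reported against the full column totals.
print(pct(28, full_n["drug"]))      # Female, drug response   -> 58.3
# Ethnicity was reported against the reduced sample sizes.
print(pct(3, reduced_n["police"]))  # Hispanic, police response -> 6.7
print(pct(3, reduced_n["either"]))  # Hispanic, drug or police  -> 6.4
```

Dividing the Hispanic count by the full column total instead (3 of 59) would give 5.1, which does not match the table, so the footnote's denominators are the ones to use for those rows.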
Comparison of findings derived from qualitative-only and qualitative followed by natural language processing-augmented approaches to coding for the drug question (n=58). Key aspects of each finding are italicized.
| Theme | Qualitative only^a | Qualitative (natural language processing augmented)^b |
| Prescription drugs and illegal drugs | Prescription drugs and illegal drugs are | Prescription drugs and illegal drugs are |
| Danger | | |
| | Either Rx^c drugs or illegal drugs could be more dangerous based on addictiveness, accessibility, prevalence, overdose, or danger. Side effects: known or unknown. Stigma. | Either Rx drugs or illegal drugs could be more dangerous based on addictiveness, accessibility, prevalence, overdose, or danger. Side effects: known or unknown. Stigma of getting drugs off the street versus discreetness of “popping” Rx pills. |
| | Is the drug safe for everyone or unsafe for some people depending on whether prescription was prescribed to you. | Is the drug safe for everyone or unsafe for some people depending on whether prescription was prescribed to you. |
| | — | What the drugs consisted of. Mixing Rx versus unknown contents of street drugs versus taking Rx you do not know what they are. |
| Legality | — | Legal more prominent; the |
| | — | |
| | — | Mortality was often mentioned (overdose, “something that could kill you”). |
| “Weed, heroin, cocaine, meth, alcohol, smoking” and using them for comparisons for safety and side effects or addictiveness. |
^a Time required (person min): 270.
^b Time required (person min): 180.
^c Rx: prescription medication.
Comparison of findings derived from natural language processing–only and natural language processing followed by qualitative-augmented approaches to coding for the drug question (n=58). Key aspects of each finding are italicized.
| Theme | Natural language processing only^a | Natural language processing (qualitative augmented)^a |
| Prescription drugs and illegal drugs | Respondents seemed | Of the 58 respondents, 24 noted that illegal |
| Danger | 11 respondents noted that it | For some, the question of danger |
| | It seemed that more wrote that | |
| | — | |
| | — | Several respondents noted that taking a prescription drug that is not yours is |
| | Respondents wrote about | — |
^a Time required (person min): 120.
Comparison of findings derived from qualitative-only and qualitative followed by natural language processing-augmented approaches to coding for the police question (n=68). Key aspects of each finding are italicized.
| Theme | Qualitative only^a | Qualitative (natural language processing augmented)^b |
| Interactions | For those who had interaction, the majority of participants described | For those who had interaction, the majority of participants described |
| Racism/gender differences | Some individuals wrote about major concerns with | Some individuals wrote about major concerns with |
| Public safety | A small group described police as | A small group described police as |
^a Time required (person min): 270.
^b Time required (person min): 120.
Comparison of findings derived from different approaches to coding for the police question (n=68). Key aspects of each finding are italicized.
| Theme | Natural language processing only^a | Natural language processing (qualitative augmented)^b |
| Number of experiences | Some youth reported | Some youth reported |
| Interaction | Many youths reported good or | Many youths reported good or |
| Situations | Youth can point to | Youth can point to |
| Avoidance | — | |
| Individual characteristics |
^a Time required (person min): 40.
^b Time required (person min): 180.
Example natural language processing output from the drug dataset.
| Code word | Frequency | Similar words | Data segments |
| Medicines | 98 | Medication, medicine, medicate, prescription, drug | “an illegal drug” |
| Illegal | 48 | — | “is the illegal drug yours” |
| Prescription | 26 | Prescriptions | “addictive than prescription medication” |
| Dangerous | 14 | — | “are equally dangerous” |
| Depends | 11 | — | “it depends on which” |
| More | 9 | — | “can be more addictive than” |
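The table above has the shape of a keyword-in-context summary: a code word, how often it or a similar word occurs, and short surrounding data segments. The record does not describe the NLP pipeline the authors used, so the following is only a minimal sketch of how output of this shape could be produced with the Python standard library. The function name `build_code_table`, the `similar_words` mapping, the window size, and the two sample responses are all assumptions made for illustration; the responses are made up and are not study data.

```python
import re
from collections import Counter, defaultdict

def build_code_table(responses, similar_words, window=2):
    """Count code words (including similar words) and collect context segments."""
    # Reverse lookup: every surface form points to its code word.
    lookup = {}
    for code, variants in similar_words.items():
        lookup[code] = code
        for variant in variants:
            lookup[variant] = code

    frequency = Counter()
    segments = defaultdict(list)
    for response in responses:
        tokens = re.findall(r"[a-z']+", response.lower())
        for i, token in enumerate(tokens):
            code = lookup.get(token)
            if code is None:
                continue  # token is not a code word or a similar word
            frequency[code] += 1
            # Keep up to `window` tokens on each side as the data segment.
            start = max(0, i - window)
            segments[code].append(" ".join(tokens[start:i + window + 1]))
    return frequency, dict(segments)

# Made-up responses for illustration only; these are not study data.
responses = [
    "An illegal drug can be more addictive than prescription medication.",
    "It depends on which medicine you take.",
]
# Hypothetical grouping of similar words under each code word.
similar_words = {
    "medicines": ["medication", "medicine", "drug"],
    "illegal": [],
    "prescription": ["prescriptions"],
    "depends": [],
}

frequency, segments = build_code_table(responses, similar_words)
for code, count in frequency.most_common():
    print(code, count, segments[code][0])
```

A hand-written synonym map keeps the sketch self-contained; a real pipeline might instead derive similar words through stemming or lemmatization, which would change the counts.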
Figure 1. Summary of methodological findings. NLP: natural language processing.