Tim Weißer, Till Saßmannshausen, Dennis Ohrndorf, Peter Burggräf, Johannes Wagner.
Abstract
In a systematic literature review (SLR), researchers are confronted with large numbers of articles from scientific databases, which have to be evaluated manually for their relevance to a given field of observation. The evaluation and filtering phase of prevalent SLR methodologies is therefore time-consuming and difficult to communicate to the intended audience. The proposed method applies natural language processing (NLP) to article metadata and a k-means clustering algorithm to automatically convert large article corpora into a distribution of focal topics. This allows efficient filtering and makes the process more objective, because the clustering results can be presented and discussed. Beyond that, it makes it possible to quickly identify scientific communities and therefore provides an iterative perspective on the so far linear SLR methodology.
• NLP and k-means clustering to filter large article corpora during systematic literature reviews.
• Automated clustering filters more efficiently and effectively than manual selection.
• Presenting and discussing the clustering results helps to objectify the otherwise nontransparent filtering step in systematic literature reviews.
Keywords: Clustering; Literature filtering; Systematic literature review
Year: 2020 PMID: 32195145 PMCID: PMC7078380 DOI: 10.1016/j.mex.2020.100831
Source DB: PubMed Journal: MethodsX ISSN: 2215-0161
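The abstract describes the pipeline only at a high level. As a rough illustration, and not the authors' implementation, the following Python sketch clusters article titles under several assumptions: TF-IDF weighting of whitespace-tokenised titles, a deterministic farthest-first seeding in place of the usual random initialisation, and six made-up titles. All function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for raw text docs."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({term for doc in tokenised for term in doc})
    n_docs = len(tokenised)
    doc_freq = Counter(term for doc in tokenised for term in set(doc))
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    vectors = []
    for doc in tokenised:
        counts = Counter(doc)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(vectors, k):
    """Deterministic seeding: start at the first vector, then repeatedly
    add the vector farthest from all centroids chosen so far."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        nxt = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(vectors, k, max_iter=50):
    """Plain Lloyd-style k-means; returns (labels, centroids)."""
    centroids = farthest_first(vectors, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster runs empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def top_terms(vocab, centroid, n=3):
    """Terms with the largest centroid weights, as a rough topic label."""
    ranked = sorted(zip(centroid, vocab), reverse=True)
    return [term for weight, term in ranked[:n] if weight > 0]

# Hypothetical titles, invented purely for illustration.
titles = [
    "deep learning image classification",
    "image classification deep network",
    "deep learning network training",
    "supply chain risk management",
    "supply chain management strategy",
    "risk management strategy planning",
]

vocab, vectors = tfidf_vectors(titles)
labels, centroids = kmeans(vectors, k=2)
for cluster_id, centroid in enumerate(centroids):
    print(cluster_id, labels.count(cluster_id), top_terms(vocab, centroid))
```

Reading off the top terms per centroid is what would let a reviewer keep or discard a whole cluster at once instead of screening each article individually. The deterministic seeding is chosen here only to keep the toy example reproducible.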
Fig. 1. Finding communities via iterative search and clustering.
The proposed clustering algorithm used for validation (according to [2]).
Fig. 2. SSE for title with search string (left) and SSE for title without search string (right).
Fig. 3. SSC for title with search string (left) and SSC for title without search string (right).
Fig. 4. SSC for keywords without search string (left) and SSC for abstract without search string (right).
Fig. 5. Two-dimensional representation of generated clusters and their top words based on document titles for k = 13 with search string included in vocabulary.
Fig. 6. Two-dimensional representation of generated clusters and their top words based on document titles for k = 13 with search string excluded from vocabulary.
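The captions of Figs. 2 to 4 refer to SSE and SSC curves without expanding the abbreviations. Assuming SSE means the within-cluster sum of squared errors and SSC a silhouette-style score (both are assumptions, not confirmed by this record), the two quantities could be computed for a given clustering roughly as follows. The points and labels are invented toy data, and the function names are hypothetical.

```python
import math

def sse(points, labels):
    """Sum of squared distances from each point to its cluster mean."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(
            sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members
        )
    return total

def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean); needs at least two clusters."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Distances to the other members of the point's own cluster.
        same = [
            math.dist(p, q)
            for j, (q, other) in enumerate(zip(points, labels))
            if other == lab and j != i
        ]
        if not same:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(same) / len(same)
        # Smallest mean distance to any other cluster.
        b = min(
            sum(math.dist(p, q) for q, m in zip(points, labels) if m == other)
            / labels.count(other)
            for other in set(labels)
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = [0, 0, 1, 1]  # groups nearby points together
bad = [0, 1, 0, 1]   # splits each natural pair

print(sse(points, good), silhouette(points, good))
print(sse(points, bad), silhouette(points, bad))
```

Under these assumptions, a candidate number of clusters k would be judged by how low the SSE falls and how high the silhouette score rises; the good labelling above scores better on both than the bad one.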
Most relevant cluster with top terms, cluster size and average TFIDF score.
| Cluster No. | Top terms | Cluster size | Avg. TFIDF score |
|---|---|---|---|
| 9 | | 10 | 1.1 |
| 8 | | 20 | 0.9 |
| 7 | | 16 | 0.8 |
| 0 | | 16 | 0.9 |
Specification Table
| Subject Area: | |
| More specific subject area: | |
| Method name: | |
| Name and reference of original method: | • Brocke et al. (2009) - Reconstructing the giant: on the rigor in documenting the literature search process |
| Resource availability: | |