Jefferson Seide Molléri, Kai Petersen, Emilia Mendes.
Abstract
The importance of achieving high quality in research practice has been highlighted in different disciplines. At the same time, citations are used to measure the impact of academic researchers and institutions. One open question is whether the quality of the reporting of research is related to scientific impact, as would be desirable. In this exploratory study we aim to: (1) investigate how consistently a scoring rubric for rigor and relevance has been used to assess the research quality of software engineering studies; (2) explore the relationship between rigor, relevance, and citation count. Through backward snowball sampling we identified 718 primary studies assessed with the scoring rubric. We used cluster analysis and a conditional inference tree to explore the relationship between quality in the reporting of research (represented by rigor and relevance) and scientometrics (represented by normalized citations). The results show that only rigor is related to studies' normalized citations. In addition, confounding factors are likely to influence the number of citations. The results also suggest that the scoring rubric is not applied in the same way by all studies; one likely reason is that it was found to be too abstract and in need of further refinement. Our findings could be used as a basis to further understand the relation between the quality in the reporting of research and scientific impact, and to foster new discussions on how to fairly acknowledge studies that perform well with respect to the emphasized research quality. Furthermore, we highlight the need to further improve the scoring rubric.
Keywords: Conditional inference tree; Empirical software engineering; Exploratory study; Reporting of research; Research practice; Scientific impact
Year: 2018 PMID: 30546170 PMCID: PMC6267265 DOI: 10.1007/s11192-018-2907-3
Source DB: PubMed Journal: Scientometrics ISSN: 0138-9130 Impact factor: 3.238
Scoring rubric for evaluating rigor (Ivarsson and Gorschek 2011)
| Aspect | Strong description (1) | Medium description (0.5) | Weak description (0) |
|---|---|---|---|
| Context (C) | The context is described to the degree where a reader can understand and compare it to another context. This involves description of development mode, e.g., contract driven, market driven etc., development speed, e.g., short time to market, company maturity, e.g., start-up, market leader etc. | The context in which the study is performed is mentioned or presented in brief but not described to the degree to which a reader can understand and compare it to another context | There appears to be no description of the context in which the evaluation is performed |
| Study design (SD) | The study design is described to the degree where a reader can understand, e.g., the variables measured, the control used, the treatments, the selection/ sampling used etc. | The study design is briefly described, e.g. “ten students did step 1, step 2 and step 3” | There appears to be no description of the design of the presented evaluation |
| Validity threats (V) | The validity of the evaluation is discussed in detail where threats are described and measures to limit them are detailed. This also includes presenting different types of threats to validity, e.g., conclusion, internal, external and construct | The validity of the study is mentioned but not described in detail | There appears to be no description of any threats to validity of the evaluation |
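Reading the rubric as a sum of the three aspect scores (context, study design, validity threats), each in {0, 0.5, 1}, gives a total rigor score on a 0–3 ordinal scale. A minimal sketch of that summation in Python; the function name and validation are illustrative, not from the paper:

```python
# Minimal sketch of rigor scoring: each aspect receives 0 (weak),
# 0.5 (medium), or 1 (strong); total rigor is their sum (0-3).
ALLOWED = {0.0, 0.5, 1.0}

def rigor_score(context: float, study_design: float, validity: float) -> float:
    aspects = (context, study_design, validity)
    if any(a not in ALLOWED for a in aspects):
        raise ValueError("each aspect must be scored 0, 0.5, or 1")
    return sum(aspects)
```

For example, a paper with a strong context description (1), a brief study design (0.5), and no validity discussion (0) totals 1.5.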
Scoring rubric for evaluating relevance (Ivarsson and Gorschek 2011)
| Aspect | Contribute to relevance (1) | Do not contribute to relevance (0) |
|---|---|---|
| Users/subjects (U) | The subjects used in the evaluation are representative of the intended users of the technology, i.e., industry professionals | The subjects used in the evaluation are not representative of the envisioned users of the technology (practitioners) |
| Context (C) | The evaluation is performed in a setting representative of the intended usage setting, i.e., industrial setting | The evaluation is performed in a laboratory situation or other setting not representative of a real usage situation |
| Scale (S) | The scale of the applications used in the evaluation is of realistic size, i.e., the applications are of industrial scale | The evaluation is performed using applications of unrealistic size |
| Research method (RM) | The research method mentioned to be used in the evaluation is one that facilitates investigating real situations and that is relevant for practitioners. Research methods that are classified as contributing to relevance are: (i) Action research, (ii) Lessons learned, (iii) Case study, (iv) Field study, (v) Interview, and (vi) Descriptive/exploratory survey | The research method mentioned to be used in the evaluation does not lend itself to investigate real situations. Research methods classified as not contributing to relevance are: (i) Conceptual analysis, (ii) Conceptual analysis/mathematical, (iii) Laboratory experiment (human subject), (iv) Laboratory experiment (software), (v) Other, and (vi) N/A |
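Analogously, the four relevance aspects (users/subjects, context, scale, research method) each contribute 1 or 0, so total relevance spans 0–4. A minimal sketch, again with illustrative names:

```python
# Minimal sketch of relevance scoring: each aspect (users/subjects, context,
# scale, research method) contributes 1 or 0; total relevance is the sum (0-4).
def relevance_score(users: int, context: int, scale: int, method: int) -> int:
    aspects = (users, context, scale, method)
    if any(a not in (0, 1) for a in aspects):
        raise ValueError("each aspect must be scored 0 or 1")
    return sum(aspects)
```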
Candidate measures for the investigated criteria
| Feature | Options | Consequences |
|---|---|---|
| Rigor and relevance | CASP Qualitative Checklist (CASP) | Addresses the rigor, credibility, and relevance issues through ten questions. The questions are not mapped to the quality dimensions. It was developed for Evidence-Based Medicine and is broadly applied |
| | Dybå and Dingsøyr (2008) | Addresses context, rigor, credibility, and relevance criteria. There is only one relevance question, addressing the value provided for research or practice |
| | Ivarsson and Gorschek (2011) | Addresses rigor and relevance through 3 and 4 questions, respectively. Results are computed on an ordinal scale |
| Impact | Absolute number of citations (Adler et al.) | Not appropriate for comparing papers of distinct ages (i.e., published in different years) |
| | Average number of citations (Adler et al.) | Citations are not equally distributed over the years |
| | Impact factor (Adler et al.) | Journal-level metric; provides no information on a specific paper |
Candidate papers for the exploratory study
| ID | Paper type | Assessment scores | Primary studies | Data origin | Missing scores |
|---|---|---|---|---|---|
| S1 | Journal (JSS) | Detailed | 87 | Collected from the paper | Relevance: context and research method |
| S2 | Journal (IST) | Detailed | 43 | Collected from the paper | |
| S3 | Journal (JSS) | Detailed | 58 | Asked by mail | Values reported as N/A instead of 0 for Relevance: Users/Subjects and Scale |
| S4 | Journal (IST) | No | – | | |
| S5 | Journal (IST) | No | – | | |
| S6 | Journal (TSE) | Detailed | 196 | Asked by mail | All (different methods mapped to rigor and relevance scores) |
| S7 | Master thesis | No | – | | |
| S8 | Journal (IST) | Detailed | 46 | Collected from the paper | All (different methods mapped to rigor and relevance scores) |
| S9 | Journal (IST) | Detailed | 41 | Collected from the paper | |
| S10 | Journal (IST) | Detailed | 43 | Collected from the paper | |
| S11 | Journal (JSS) | Detailed | 38 | Asked by mail | |
| S12 | Master thesis | No | – | | |
| S13 | Master thesis | Overall scores only | 41 | Collected from the thesis | |
| S14 | Master thesis | Detailed | 89 | Collected from the thesis | |
| S15 | Journal (CLEIej) | Detailed | 18 | Collected from the paper | |
| S16 | Ph.D. thesis | Detailed | 18 | Collected from the thesis | |
Fig. 1 Process for identification and selection of candidate studies
Fig. 2 Distribution of the dataset according to the number of primary studies published in each year. The shaded segments of the columns represent the normalized citation counts (cit/year), i.e., a stronger shade means a higher number of citations per year, whereas lighter shades show the less cited papers. A legend on the right side shows a sample of the shades within the range of minimum and maximum citation counts, zero (0) and 108, respectively
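The cit/year normalization divides a paper's raw citation count by its age. One plausible reading of that computation, sketched below; the choice of reference year and the minimum-age guard are assumptions, not taken from the paper:

```python
def citations_per_year(total_citations: int, publication_year: int,
                       reference_year: int) -> float:
    """Normalize a raw citation count by the paper's age in years."""
    age = max(reference_year - publication_year, 1)  # guard: at least one year
    return total_citations / age
```

For instance, a 2010 paper with 20 citations counted in 2018 yields 2.5 cit/year.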
Overview of how Ivarsson and Gorschek's (2011) scoring rubrics have been used
| Question | Aspect | Description |
|---|---|---|
| RQ1. How was it applied? | As in the rubrics | Refers to Ivarsson and Gorschek (2011) |
| | | Details the scoring rules [S3, S11] |
| | | Discusses application issues [S2] |
| | Interpretation of the scores | Builds objective rules to assess each aspect [S1, S9] |
| | | Maps a checklist from Dybå and Dingsøyr (2008) |
| | | Uses two independent reviewers [S1] |
| RQ2. Used as/for… | Quality assessment | Detailed assessment [S1, S3, S9, S10, S11, S13, S14, S15, S16] |
| | | Not explicit [S2] |
| | Objective or results | Research question [S3, S9, S11, S16] |
| | | Discussion of results [S1, S2, S10, S13, S14] |
| | | Implication of findings [S6, S8] |
| | | Study limitation [S15] |
Dybå and Dingsøyr’s (2008) checklist alignment to Ivarsson and Gorschek’s (2011) scoring rubrics
| Rigor aspects (Ivarsson and Gorschek 2011) | Checklist items (Dybå and Dingsøyr 2008) |
|---|---|
| | Q1: Is there a rationale for why the study was undertaken? |
| | Q2: Is there an adequate description of the context (industry, laboratory setting, products used, etc.) in which the research was carried out? |
| | Q3: Is there a justification and description for the research design? |
| | Q4: Does the study provide description and justification of the data analysis approaches? |
| | Q5: Is there a clear statement of findings and has sufficient data been presented to support them? |
| | Q6: Did the authors critically examine their own role, potential bias and influence during the formulation of research questions and evaluation? |
| | Q7: Do the authors discuss the credibility and limitations of their findings explicitly? |
Fig. 3 a Boxplot distribution of the normalized citations per year. The dots at the upper end of the plot denote outliers that are distant from the rest of the observations. b Cluster dendrogram of variables in the dataset. The gray line enclosing the variables represents suitable dimensions
Fig. 4 Conditional inference tree describing the relationship between the rigor and relevance criteria and the normalized citations
Description of the papers contained in the three terminal nodes
| Nodes | Splitting criteria | Impact (cit./year) | Length (pages) | Journal (%) | Conf. (%) | Others (%) |
|---|---|---|---|---|---|---|
| 3 | ri ≤ 0 | 1.79 | 9 | 34.8 | 63.9 | 1.3 |
| 4 | 0 < ri ≤ 2 | 2.79 | 10 | 39.3 | 57.0 | 3.7 |
| 5 | ri > 2 | 5.83 | 11 | 44.0 | 52.6 | 3.4 |
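The fitted tree's splits on total rigor (ri) can be replayed directly from the table. The study fits the tree statistically (conditional inference trees are typically built with R's `party`/`ctree`); the Python fragment below only encodes the resulting terminal nodes and the reported per-node impact, as a sketch:

```python
# Replays the fitted tree's splits on total rigor (ri) from the table above.
def terminal_node(ri: float) -> int:
    """Map a total rigor score (0-3) to its terminal node in the tree."""
    if ri <= 0:
        return 3
    if ri <= 2:
        return 4
    return 5

# Reported impact (normalized citations per year) for each terminal node.
NODE_IMPACT = {3: 1.79, 4: 2.79, 5: 5.83}
```

So a study with total rigor 2.5 falls in node 5, the group with the highest reported impact (5.83 cit./year).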
Fig. 5 Boxplot and density plot for the impact (i.e., normalized citations per year) according to the splitting nodes, omitting outliers
Fig. 6 CIT excluding factors from S6, S11, and S14