John M Felt, Ruben Castaneda, Jitske Tiemensma, Sarah Depaoli.
Abstract
Context: When working with health-related questionnaires, outlier detection is important. However, traditional methods of outlier detection (e.g., boxplots) can miss participants with "atypical" responses to the questions that otherwise have similar total (subscale) scores. In addition to detecting outliers, it can be of clinical importance to determine the reason for the outlier status or "atypical" response. Objective: The aim of the current study was to illustrate how to derive person fit statistics for outlier detection through a statistical method examining person fit with a health-based questionnaire. Design and Participants: Patients treated for Cushing's syndrome (n = 394) were recruited from the Cushing's Support and Research Foundation's (CSRF) listserv and Facebook page. Main Outcome Measure: Patients were directed to an online survey containing the CushingQoL (English version). A two-dimensional graded response model was estimated, and person fit statistics were generated using the Zh statistic.
Keywords: Cushing's syndrome; CushingQoL; item response theory; person fit; quality of life
Year: 2017 PMID: 28603512 PMCID: PMC5445123 DOI: 10.3389/fpsyg.2017.00863
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. Distribution of Zh values. This is not a strict cutoff; it is suggested that researchers choose a cutoff that best highlights important features of the distribution. For example, if the interest is in assessing over-fit, the cutoff would be placed at the higher end of the distribution (e.g., around 2).
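Item parameter estimates.
| Item | a1 (Physical) | a2 (Psychosocial) | d1 | d2 | d3 | d4 | Loading (Physical) | Loading (Psychosocial) |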
| Cush1 | – | 1.034 | 1.748 | 0.146 | −1.314 | −3.380 | – | 0.719 |
| Cush2 | 1.809 | – | 2.285 | 1.017 | −0.396 | −2.174 | 0.875 | – |
| Cush3 | – | 3.819 | 5.261 | 2.564 | −0.772 | −3.875 | – | 0.967 |
| Cush4 | – | 2.059 | 2.858 | 1.143 | −0.806 | −2.842 | – | 0.900 |
| Cush5 | 1.403 | – | 2.981 | 1.173 | −0.640 | −3.407 | 0.814 | – |
| Cush6 | 2.248 | – | 2.321 | 0.480 | −1.537 | −3.685 | 0.914 | – |
| Cush7 | 2.067 | – | 1.239 | −0.193 | −1.702 | −3.673 | 0.900 | – |
| Cush8 | 2.634 | – | 3.052 | 0.417 | −1.331 | −3.739 | 0.935 | – |
| Cush9 | 3.233 | – | 2.850 | 0.847 | −1.374 | −3.645 | 0.955 | – |
| Cush10 | 3.268 | – | 2.358 | 0.676 | −1.427 | −3.931 | 0.956 | – |
| Cush11 | 1.593 | – | 1.812 | −0.359 | −2.677 | −4.598 | 0.847 | – |
| Cush12 | 1.697 | – | 0.561 | −0.995 | −3.490 | −5.433 | 0.862 | – |
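As a reading aid, the sketch below shows how estimates like those in the Cush1 row could translate into response-category probabilities under a graded response model. It assumes the slope-intercept form used by the mirt package, in which the probability of responding in category k or higher is the logistic function of a·θ + d_k. It also assumes that the first non-empty value in the row is the slope, the next four values are the ordered intercepts, and the final value is a standardized loading. These column roles are inferred from the numbers rather than stated in the table, and the function name `grm_category_probs` is hypothetical.

```r
# Assumed roles for the Cush1 row: slope a, four ordered intercepts d, loading.
a <- 1.034
d <- c(1.748, 0.146, -1.314, -3.380)
reported_loading <- 0.719

# Hypothetical helper: category probabilities for one item at latent score theta.
grm_category_probs <- function(theta, a, d) {
  # P(response in category k or higher) for the four upper boundaries
  p_at_or_above <- plogis(a * theta + d)
  # Pad with 1 (lowest boundary) and 0 (beyond the highest), then difference
  cumulative <- c(1, p_at_or_above, 0)
  -diff(cumulative)
}

# Probabilities of the five response categories for a respondent at theta = 0
probs <- grm_category_probs(theta = 0, a = a, d = d)
names(probs) <- paste0("cat", 1:5)
print(round(probs, 3))

# The reported loading is consistent with a / sqrt(1 + a^2)
implied_loading <- a / sqrt(1 + a^2)
print(round(implied_loading, 3))  # 0.719
```

The same relation between slope and loading holds for the other rows (for example, 1.809 gives about 0.875 for Cush2), which supports this reading of the columns without confirming it.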
Person fit values and their corresponding response patterns for most misfitting, well fitting, and overfitting persons.
| −3.92791 | 2 | 4 | 5 | 4 | 3 | 3 | 2 | 5 | 5 | ||||
| −3.79973 | 5 | 2 | 2 | 1 | 1 | 5 | 5 | 3 | 1 | ||||
| −3.52911 | 3 | 4 | 5 | 3 | 1 | 5 | 3 | 5 | 4 | ||||
| −3.47521 | 5 | 1 | 1 | 2 | 1 | 5 | 2 | 1 | 1 | ||||
| −2.94507 | 1 | 1 | 5 | 4 | 4 | 2 | 2 | 1 | 2 | ||||
| −2.79318 | 1 | 3 | 4 | 4 | 5 | 1 | 2 | 4 | 1 | ||||
| −2.54869 | 1 | 4 | 3 | 5 | 2 | 2 | 1 | 2 | 4 | ||||
| −2.39281 | 2 | 1 | 1 | 3 | 5 | 1 | 1 | 1 | 1 | ||||
| −2.24675 | 3 | 3 | 3 | 4 | 3 | 3 | 1 | 3 | 4 | ||||
| −2.21036 | 3 | 4 | 3 | 2 | 4 | 3 | 1 | 5 | 4 | ||||
| −0.03243 | 4 | 5 | 4 | 4 | 5 | 3 | 5 | 4 | 3 | ||||
| −0.02635 | 1 | 3 | 4 | 3 | 2 | 1 | 1 | 1 | 1 | ||||
| −0.00813 | 4 | 3 | 4 | 2 | 3 | 2 | 4 | 2 | 1 | ||||
| −0.0019 | 1 | 3 | 1 | 1 | 1 | 1 | 1 | 3 | 1 | ||||
| 0.007965 | 5 | 4 | 2 | 5 | 5 | 5 | 5 | 3 | 3 | ||||
| 0.02727 | 5 | 4 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | ||||
| 0.02917 | 5 | 3 | 1 | 1 | 3 | 2 | 3 | 1 | 1 | ||||
| 0.033304 | 3 | 4 | 4 | 4 | 4 | 3 | 2 | 1 | 1 | ||||
| 0.036985 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ||||
| 0.041895 | 4 | 5 | 3 | 5 | 5 | 3 | 4 | 3 | 3 | ||||
| 1.944592 | 4 | 4 | 3 | 3 | 2 | 3 | 3 | 3 | 3 | ||||
| 1.984945 | 3 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | ||||
| 1.98928 | 3 | 3 | 3 | 4 | 3 | 3 | 3 | 2 | 2 | ||||
| 1.99864 | 4 | 4 | 3 | 2 | 4 | 4 | 4 | 3 | 3 | ||||
| 2.070151 | 4 | 3 | 3 | 3 | 4 | 4 | 3 | 3 | 3 | ||||
| 2.098399 | 4 | 3 | 3 | 2 | 3 | 3 | 3 | 3 | 2 | ||||
| 2.155638 | 4 | 4 | 4 | 3 | 3 | 4 | 4 | 3 | 3 | ||||
| 2.196376 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 2 | 1 | ||||
| 2.202284 | 5 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 3 | ||||
| 2.221422 | 4 | 4 | 4 | 3 | 4 | 4 | 4 | 3 | 3 | ||||
Bold font represents items relating to the second factor. Zh values can be interpreted on the standard normal (z) scale. For example, a respondent with a Zh value of −1 has a response pattern that deviates from the average response pattern by one standard deviation, and a respondent with a Zh value of −2 has a response pattern that deviates by two standard deviations.
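To make the z-scale reading concrete, the sketch below takes a handful of Zh values copied from the table above, converts each to a lower-tail probability under a standard normal distribution, and labels it. The cutoffs of −2 and 2 are assumptions chosen for demonstration only, since no strict cutoff is prescribed, and the helper name `label_fit` is hypothetical.

```r
# A few Zh values copied from the person fit table
zh <- c(-3.92791, -2.21036, -0.03243, 0.041895, 1.944592, 2.221422)

# Hypothetical helper: label each value using illustrative cutoffs
label_fit <- function(zh, lower = -2, upper = 2) {
  ifelse(zh < lower, "misfit",
         ifelse(zh > upper, "overfit", "typical"))
}

result <- data.frame(
  Zh = zh,
  # Share of a standard normal distribution lying below each value
  lower_tail = round(pnorm(zh), 4),
  label = label_fit(zh)
)
print(result)
```

Moving `lower` or `upper` changes how many respondents are flagged, so the cutoffs are best chosen after inspecting the distribution of Zh in the sample at hand.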
Statistical Code for outlier detection using the Person Fit method.
| | First, you have to download and install the free R program. Copy and paste the URL in the left column into your web browser. Once R is installed, all of the following code (in the left column) can be copied and pasted into an R script. |
| install.packages("mirt"); library(mirt) | Once you have downloaded R, you have to install and load the package used to estimate item response theory models. The install.packages("mirt") command tells R that you want to install the multidimensional item response theory (mirt) package. After doing this, R will ask you to select a CRAN mirror (i.e., a location to download the package from). Just select the location closest to where you are running the analysis. After the package has installed, you can load it into R using library(mirt). |
| data <- read.csv("C:/Users/name/desktop/Dataset.csv", header = TRUE) | Next, you will have to load your data into R. Data can be read from a variety of file types, including .txt, .csv, SPSS, SAS, or Stata data files (see
| cfa2 <- mirt.model("P = 2, 5-12\nS = 1, 3, 4\nCOV = P*S") | This line of code is how you specify which items belong to which subscale. If your questionnaire is just a single total score, skip this part. Here, we saved our subscales to the object we called "cfa2". We specified the Physical subscale (P) as containing items 2 and 5–12, and the Psychosocial subscale (S) as containing items 1, 3, and 4. We also indicated that both subscales are correlated with the COV = P*S command. Each definition sits on its own line of the model string, written here with \n. |
| mod1 <- mirt(data, cfa2, itemtype = "graded") | This line of code is how you estimate the item response theory model. We have saved the model to an object called mod1. Within the mirt command, the first thing you do is specify your data set. Here, we put "data" because that is what we named our dataset earlier. Next, we specify any subscales (cfa2). If your questionnaire has no subscales, you would put the number 1 instead, indicating a single score. itemtype = "graded" indicates that the items are ordered categorical (i.e., Likert-type scales). If the items on your questionnaire only contain two responses, you would specify itemtype = "2PL". |
| pfit <- personfit(mod1) | The personfit function generates person fit scores for each subject. Here, we saved this information in an object we called pfit. |
| PFdata <- cbind(pfit, data) | Next, we wanted to add the person fit scores for each subject to the data set. To do this, we used the cbind function in R to add the column of person fit scores to the original data set. We then saved this combined dataset to the object PFdata. |
| sorted.dat <- PFdata[order(PFdata[,1]), ] | After combining the person fit scores with the data file, you will want to inspect the data. To make this easier, you can sort the data by the person fit scores (
| head(sorted.dat, 20); tail(sorted.dat, 20) | You can choose to look at the most misfitting responses and the most "overfitting" responses using the head or tail commands. Here, we specified the first 20 responses (head) and the last 20 responses (tail). |
| hist(sorted.dat[,1]) | You can look at a histogram of your person fit scores ( |
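Because the table splits the workflow across cells, the steps can be gathered into a single function, sketched below. This is a minimal sketch rather than the authors' script: the function name `run_person_fit`, its `cutoff` argument, and the `flag` column are hypothetical additions. It assumes the mirt package is installed and that `data` holds the 12 CushingQoL items in columns 1 to 12. It selects the `Zh` column by name instead of by position.

```r
# Hypothetical wrapper around the steps in the table above.
run_person_fit <- function(data, cutoff = -2) {
  if (!requireNamespace("mirt", quietly = TRUE)) {
    stop("Install the mirt package first: install.packages(\"mirt\")")
  }
  # Two correlated subscales; each definition goes on its own line
  spec <- mirt::mirt.model(paste("P = 2, 5-12",
                                 "S = 1, 3, 4",
                                 "COV = P*S",
                                 sep = "\n"))
  # Two-dimensional graded response model for ordered categorical items
  mod <- mirt::mirt(data, spec, itemtype = "graded")
  # Person fit statistics, one row per respondent
  pfit <- mirt::personfit(mod)
  # Attach Zh to the responses and flag values below the assumed cutoff
  combined <- cbind(Zh = pfit$Zh, data)
  combined$flag <- combined$Zh < cutoff
  # Sort so that the most misfitting respondents come first
  combined[order(combined$Zh), ]
}

# Example usage (the path is the placeholder from the table):
# sorted.dat <- run_person_fit(read.csv("C:/Users/name/desktop/Dataset.csv", header = TRUE))
# head(sorted.dat, 20)
# hist(sorted.dat$Zh); abline(v = -2, lty = 2)
```

Wrapping the steps in a function keeps the intermediate objects out of the workspace and makes it easy to rerun the analysis with a different cutoff.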