
Improving the Delivery of Function-Directed Care During Acute Hospitalizations: Methods to Develop and Validate the Functional Assessment in Acute Care Multidimensional Computerized Adaptive Test (FAMCAT).

Andrea L Cheville¹, Chun Wang², Kathleen J Yost³, Jeanne A Teresi⁴,⁵, Mildred Ramirez⁴, Katja Ocepek-Welikson⁴, Pengsheng Ni⁶, Elizabeth Marfeo⁷, Tamra Keeney⁸, Jeffrey R Basford¹, David J Weiss⁹.

Abstract

OBJECTIVE: To (1) develop a patient-reported, multidomain functional assessment tool focused on medically ill patients in acute care settings; (2) characterize the measure's psychometric performance; and (3) establish clinically actionable score strata that link to easily implemented mobility preservation plans.
DESIGN: This article describes the approach that our team pursued to develop and characterize this tool, the Functional Assessment in Acute Care Multidimensional Computer Adaptive Test (FAMCAT). Development involved a multistep process that included (1) expanding and refining existing item banks to optimize their salience for hospitalized patients; (2) administering candidate items to a calibration cohort; (3) estimating multidimensional item response theory models; (4) calibrating the item banks; (5) evaluating potential multidimensional computerized adaptive testing (MCAT) enhancements; (6) parameterizing the MCAT; (7) administering it to patients in a validation cohort; and (8) estimating its predictive and psychometric characteristics.
SETTING: A large (2000-bed) Midwestern Medical Center. PARTICIPANTS: The overall sample included 4495 adults (2341 in a calibration cohort, 2154 in a validation cohort) who were admitted either to medical services with at least 1 chronic condition or to surgical/medical services if they required readmission after a hospitalization for surgery (N=4495). INTERVENTION: Not applicable. MAIN OUTCOME MEASURES: Not applicable.
RESULTS: The FAMCAT is an instrument designed to permit the efficient, precise, low-burden, multidomain functional assessment of hospitalized patients. We tried to optimize the FAMCAT's efficiency and precision, as well as its ability to perform multiple assessments during a hospital stay, by applying cutting edge methods such as the adaptive measure of change (AMC), differential item functioning computerized adaptive testing, and integration of collateral test-taking information, particularly item response times. Evaluation of these candidate methods suggested that all may enhance MCAT performance, but none were integrated into initial MCAT parameterization.
CONCLUSIONS: The FAMCAT has the potential to address a longstanding need for structured, frequent, and accurate functional assessment among patients hospitalized with medical diagnoses and complications of surgery.


Keywords: AM-PAC, Activity Measure of Post-Acute Care; AMC, Adaptive Measurement of Change; Activities of daily living; CAT, computerized adaptive testing; Cognition; DIF, differential item functioning; EHR, electronic health record; FAM, Functional Assessment for Acute Care Multidimensional; FAMCAT, Functional Assessment in Acute Care Multidimensional Computer Adaptive Test; HIPAA, Health Insurance Portability and Accountability Act of 1996; IRT, item response theory; MCAT, multidimensional computerized adaptive testing; MGRM, multidimensional graded response model; MIRT, multidimensional item response theory; PAC, postacute care; PH, physical function; PROM, patient-reported outcome measure; PROMIS, Patient-Reported Outcomes Measurement Information System; Rehabilitation; SF, short form

Year: 2021 PMID: 34179750 PMCID: PMC8212002 DOI: 10.1016/j.arrct.2021.100112

Source DB: PubMed Journal: Arch Rehabil Res Clin Transl ISSN: 2590-1095

Aging, frailty, and chronic disease account for more than 80% of United States health care spending, with the cost of care doubling for people with impaired mobility. Increasing attention is being devoted to an important aspect of this serious problem: hospitalization rarely addresses and often accelerates the progressive functional losses of these groups.2, 3, 4, 5, 6 Most importantly, a majority recover slowly, if at all, from hospital-acquired functional losses and are consequently placed at a markedly increased risk of falls, institutionalization, rehospitalization, and even death.7, 8, 9, 10, 11 Tellingly, these losses have contributed to a more than doubling of postacute care (PAC) spending in the past decade. Hospital-based rehabilitation has been proven to slow or prevent these losses, but its provision has been limited by human resource constraints and challenges in providing the right services to the right patient. In fact, only a minority of patients who could benefit from rehabilitation services actually receive them. For example, many patients referred for physical therapy are never seen, and with the exception of specialized populations (eg, stroke, spinal cord injury, hip fractures), extended delays in treatment are common, during which patients often remain bed-based. Nurses are generally expected to mobilize patients who are not seen by therapists; however, nurses confront formidable competing demands, and even ambulatory older patients spend the majority of their time in bed. These delays and omissions can be catastrophic. Up to 63% of older patients rapidly lose muscle mass and decline in mobility and capacity for self-care during even brief hospitalizations. The expectation that they will regain this lost function is frequently not met, leading to institutionalizations and increased caregiver demands.
Such outcomes are often avoidable because rehabilitation has been clearly shown to reduce care utilization, hospital lengths of stay, and PAC use in chronic diseases ranging from heart failure to cancer.20, 21, 22 An absence of a data-driven, standardized means to determine patients’ rehabilitative needs is a critical barrier to preserving their function. A new, more effective model is needed; however, increasing the demands on oversubscribed nurses and/or boosting therapist staffing are unlikely to be effective or scalable solutions. The use of mobility technicians or personal care assistants to implement simple but effective mobility preservation care plans may offer promise. However, this approach requires a systematic and accurate means of matching of patients’ needs with ability-matched care plans. Historically, we have relied on clinician assessment as the sole basis for such matching, even though patient-reported information has shown value as a means of distinguishing inpatient care needs. Human resource-intensive triage has proven a damaging bottleneck to timely service provision. This limitation has proven particularly pernicious for patients admitted with medical diagnoses because they are often frail with multiple comorbid conditions and are uniquely vulnerable to the disabling effects of even brief immobility. The use of routinized functional measurements has been shown to be a feasible and scalable means of identifying patients’ rehabilitation service needs. Specifically, the 6-clicks short forms (SFs) have shown promise as a timely means of identifying hospitalized patients who require therapy. Although originally developed as a patient-reported outcome measure (PROM) for use in PAC, the 6-clicks is principally administered by nurses or therapists as a clinician-rated measure in acute care settings. 
When used in this manner among populations with orthopedic and neurologic conditions, 6-clicks scores associate with hospital discharge location and have proven useful for allocating therapy resources and planning discharges.26, 27, 28 The 6-clicks has been less studied among patients hospitalized with medical diagnoses. In contrast, functional items from the Braden Scale for predicting pressure ulcer risk have been shown to be strongly associated with discharge destination in medical populations. However, similar to the 6-clicks, the Braden Scale is provider administered with associated burdens and barriers. Moreover, neither tool has been scrutinized as a means of monitoring functional change over time. The R01-funded Computerized Adaptive Testing to Improve Delivery of Function-Directed Care project was designed to address the need for an easy-to-use, low-burden functional assessment tool with high discrimination applicable for patients hospitalized with medical diagnoses. In brief, the project sought to (1) develop a patient-reported, multidomain functional assessment tool focused on medically ill patients in acute care settings; (2) rigorously characterize the measure's psychometric performance; and (3) establish clinically actionable score strata for functional domains that would link directly to easily implemented mobility preservation plans irrespective of a patient's status. This article provides a high-level overview of the multistep process that our team pursued to realize these goals. Described below is the approach we used to develop the Functional Assessment in Acute Care Multidimensional Computer Adaptive Test (FAMCAT), a multidimensional item response theory (MIRT)-based measure of key functional domains among hospitalized patients admitted to medical services.

Approach

Overview, setting, and population

The FAMCAT was conceptualized as a means to guide patients, their caregivers, and inpatient and primary care providers in a continuing program of needs-matched function-directed activities during and after a hospital stay (fig 1). Development of the FAMCAT was conceived as a multistep process including the steps illustrated in figure 2: (1) expand and refine existing item banks to optimize salience for hospitalized patients; (2) administer candidate items to patients in the calibration cohort; (3) estimate MIRT models, calibrate item banks, and evaluate potential multidimensional computerized adaptive testing (MCAT) enhancements; (4) parameterize FAMCAT; (5) administer the FAMCAT to patients in validation cohort; and (6) estimate FAMCAT predictive and psychometric characteristics. Because the provision of rehabilitation services is more frequently inconsistent, delayed, and/or absent among patients on medical services or readmitted to surgical services, these subgroups comprised the FAMCAT target population. All research activities were conducted within the Mayo Clinic hospitals, Rochester, Minnesota, and were approved by the Mayo Clinic Institutional Review Board.

Fig 1

Anticipated integration of FAMCAT testing during and following a typical hospital stay.

Fig 2

Sequential steps in FAMCAT development and testing.


Justification for defining project characteristics

Rationale for using the extant Activity Measure of Post-Acute Care banks

Rather than develop all items de novo, we elected to use the Activity Measure of Post-Acute Care (AM-PAC) item banks as a starting point. The item response theory (IRT)-modeled AM-PAC was the first multidomain functional PROM with the capability to direct care. Its 3 domains, Mobility (131 items), Daily Activities (88 items), and Applied Cognitive (50 items), were established through factor, modified parallel, and Rasch analysis using data collected from patients in PAC settings, and they encompass the dimensions of function essential for independence. One-third of the 1041-patient cohort used to initially calibrate the AM-PAC item banks had complex or chronic medical conditions. Additionally, the AM-PAC banks were developed to align with the domains of the World Health Organization's International Classification of Functioning, Disability, and Health and therefore conform to a widely accepted conceptual framework. Moreover, extensive work has established the enhanced precision, reduced ceiling/floor effects, and lessened respondent burden achieved when the AM-PAC domains were administered using a computerized adaptive testing (CAT) platform. Importantly, the AM-PAC CAT's responsiveness in longitudinal monitoring of symptomatic and chronically ill patients has already been explored by members of our team.

Rationale for MCAT

A key project goal was to render repeated comprehensive yet precise functional assessments feasible within busy hospital settings where only a limited number of items can be administered. MIRT and MCAT allow for the simultaneous, and hence more efficient, estimation of correlated traits.37, 38, 39 Because the 3 AM-PAC domains are moderately correlated, MCAT administration offered a potential means of reducing the number of items required to achieve sufficient precision to inform clinical care. MIRT models can be used to specify MCAT algorithms for item selection, although there is limited precedent for this approach in the medical field. Although MIRT concepts have been available for many years, only recently have computing capacity and estimation algorithms reached a level that permits realistic implementation.43, 44, 45 The administration of MIRT-modeled item banks with an MCAT platform offers an opportunity to further enhance measurement efficiency because the MCAT, rather than selecting an item for a single scale at each stage of a CAT, selects an item that simultaneously provides the most information about the examinee's levels on all functional ability domains being assessed. As such, an MCAT rapidly yields more precise score estimates with less respondent burden than would a series of unidimensional CATs.
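The idea of selecting the item that is most informative about all domains at once can be illustrated with a minimal sketch. The article does not state which selection criterion the FAMCAT uses; the sketch below assumes D-optimality (maximizing the determinant of the accumulated Fisher information matrix), a criterion commonly described for MCAT, and a graded response model in slope-intercept form. The item bank, parameter values, identity prior, and all function names are hypothetical.

```python
import math

def cumulative_probs(a, d, theta):
    """P(Y >= k), k = 1..K-1, for a multidimensional graded item (assumed form)."""
    z = sum(a_i * t_i for a_i, t_i in zip(a, theta))
    return [1.0 / (1.0 + math.exp(-(z + d_k))) for d_k in d]

def item_information(a, d, theta):
    """Fisher information matrix of one item at theta: a scalar times a a^T."""
    star = [1.0] + cumulative_probs(a, d, theta) + [0.0]
    w = [p * (1.0 - p) for p in star]          # derivative of each boundary curve
    scalar = 0.0
    for k in range(len(star) - 1):
        p_cat = star[k] - star[k + 1]          # probability of response category k
        if p_cat > 1e-12:
            scalar += (w[k] - w[k + 1]) ** 2 / p_cat
    return [[scalar * a_i * a_j for a_j in a] for a_i in a]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def add(m1, m2):
    """Element-wise sum of two matrices stored as nested lists."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def select_item(bank, administered, theta, info_so_far):
    """Return the id of the unused item that maximizes det(total information)."""
    best_id, best_det = None, float("-inf")
    for item_id, (a, d) in bank.items():
        if item_id in administered:
            continue
        value = det3(add(info_so_far, item_information(a, d, theta)))
        if value > best_det:
            best_id, best_det = item_id, value
    return best_id

# Hypothetical bank: one item per domain (Mobility, Daily Activities, Applied Cognitive).
bank = {
    "mob1": ([2.0, 0.0, 0.0], [1.5, 0.0, -1.5]),
    "da1": ([0.0, 1.5, 0.0], [1.5, 0.0, -1.5]),
    "cog1": ([0.0, 0.0, 1.0], [1.5, 0.0, -1.5]),
}
prior = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # assumed prior information
theta = [0.0, 0.0, 0.0]                                      # provisional trait estimates

first = select_item(bank, set(), theta, prior)
info = add(prior, item_information(*bank[first], theta))
second = select_item(bank, {first}, theta, info)
```

In this toy bank every item loads on a single domain, so the criterion simply moves to the domain with the least accumulated information. With correlated traits (off-diagonal terms in the prior) and items that load on several domains, one response also sharpens the estimates for the other domains, which is the source of the efficiency gain described above.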

Rationale for enhancements to the MCAT

Implementation of MCAT alone was expected to result in significant gains in measurement precision and efficiency; however, additional enhancements to the MCAT algorithm were adopted as a means of achieving even greater improvement. The project considered 3 enhancements for which there was a strong theoretical and anecdotal foundation. First, we proposed a novel strategy to address differential item functioning (DIF) that can occur when the probability of item responses varies across groups defined by age, education, ethnicity, and so on. This means that on average, individuals from different subgroups but with the same level of functional ability may answer the item differently. Put another way, reporting difficulty with walking should depend only on the level of ability or disability in mobility and not on membership in a group, for example, male or female. Identifying the presence and magnitude of DIF in clinically integrated PROMs is essential to eliminating bias and addressing health care disparities. A customary approach is to eliminate items that display DIF. However, highly discriminating items may be lost in this way and, potentially, a different bias introduced: an inability to estimate traits with equal precision across subgroups. CAT offers an alternate approach that we termed DIF-CAT. In DIF-CAT, DIF information is incorporated into the MCAT item selection algorithm such that subgroup-specific item parameters can be used for items that display DIF, provided that the MCAT is informed of a patient’s subgroup membership before starting the test. DIF information was also used to lower the exposure rate (the frequency with which items are administered in a CAT) of items that displayed DIF. Second, we proposed to leverage collateral test-taking information to enhance MCAT efficiency. More specifically, the amount of time that test takers require to respond to an item may provide information that can accelerate trait estimation.
We hypothesized that longer response times may correlate with lower Applied Cognitive function estimates and that these data could be included in MIRT models to enhance precision. Hierarchical approaches model participants’ responses and response times simultaneously. These models have been used in academic assessments to identify cheating behaviors and to reduce the number of items required for trait estimation. Because AM-PAC response times vary substantially, it is reasonable to test these models as a means to enhance the efficiency of MIRT trait estimation. Last, we proposed to determine whether an Adaptive Measurement of Change (AMC) approach could reduce the number of items administered on repeat assessments. In AMC, a CAT is administered at 2 (or more) time points. In practice, (1) the examinee's trait theta (θ) level from time 1 is used to begin the time 2 CAT, and (2) termination of the time 2 CAT occurs when sufficient evidence has been obtained to determine whether a statistically significant change has occurred. Thus, the AMC limits the second CAT to the minimal number of items needed to determine whether a respondent has changed from the time of the previous CAT session. This approach may substantially reduce respondent burden during repeat assessments, a highly desirable attribute in clinical assessment. The project proposed to extend the AMC procedure to polytomously scored items based on the IRT models used in MCAT and to extend the methodologies of AMC to multiple occasions of measurement to detect transitions between MCAT-defined mobility strata. This ability to detect transitions was thought to be clinically desirable because the functional status of medically ill patients may be highly dynamic, particularly after transitions to and from intensive care units, with important management implications.
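The AMC termination logic described above can be sketched for a single trait. The article does not specify the significance test; this sketch assumes a simple z test on the difference between the time 1 and time 2 estimates, which is one of several tests described for AMC. The function names, the critical value of 1.96, and the 12-item cap are illustrative assumptions.

```python
import math

def change_z(theta1, se1, theta2, se2):
    """z statistic for the change between two trait estimates (assumed test)."""
    return (theta2 - theta1) / math.sqrt(se1 ** 2 + se2 ** 2)

def amc_should_stop(theta1, se1, theta2, se2, n_items, z_crit=1.96, max_items=12):
    """Decide whether the time 2 CAT can end.

    theta1, se1: final estimate and standard error from the time 1 CAT.
    theta2, se2: provisional estimate and standard error after n_items at time 2.
    Returns a (stop, reason) pair.
    """
    if abs(change_z(theta1, se1, theta2, se2)) >= z_crit:
        return True, "significant change"
    if n_items >= max_items:
        return True, "item limit reached"
    return False, "continue"

# Hypothetical patient: time 1 estimate 0.0 (SE 0.3); after 3 items at time 2
# the provisional estimate is 1.0 (SE 0.3).
decision = amc_should_stop(0.0, 0.3, 1.0, 0.3, n_items=3)
```

Under these assumptions the time 2 test ends as soon as the evidence for change is sufficient, and the item cap bounds respondent burden when no change is detectable.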

Item bank enrichment

A total of 44 AM-PAC items were deleted from the AM-PAC banks for lack of relevance to hospital settings, and 101 new items were added, yielding a total of 326 items across 3 domains: Basic Mobility (111 items), Daily Activities (108 items), and Applied Cognitive (107 items). Table 1 summarizes the enrichment of the AM-PAC candidate items in an effort to enhance their salience to hospitalized patients.

Table 1

FAMCAT item bank expansion summary

Domain	Subdomain	No. of Original AM-PAC Items Retained (n)	No. of Items Added/ Modified From Extant Sources (n)	No. of Items Written De Novo By Study Team (n)	Total No. of Items In Initial Calibration Cohort (n)	No. of Linking Items (n)
Mobility	Ambulation	15	6	3	24	4
	Carrying/reaching	11	0	8	19	0
	Changing body position	9	0	2	11	0
	Maintaining body position	7	2	2	11	0
	Stair climbing	15	0	0	15	4
	Transfers	19	0	0	19	0
	Other	12	0	0	12	0
	Total for Mobility domain	88	8	15	111	8
Daily Activities	ADL	26	3	3	32	2
	Appendicular strength	14	3	5	22	1
	Dexterity	25	1	3	29	4
	IADL	13	0	2	15	1
	Reaching	8	0	2	10	0
	Total for Daily Activities domain	86	7	15	108	8
Applied Cognitive	Communication: verbal	13	8	1	22	3
	Communication: written	7	3	0	10	1
	Decision making	1	3	0	4	0
	Environmental awareness	1	0	1	2	0
	Problem solving/executive functioning	14	11	1	26	3
	Procedural memory	2	3	0	5	0
	Processing speed	1	4	1	6	0
	Social awareness	3	0	0	3	0
	Understanding instructions	4	1	6	11	0
	Working memory	5	10	3	18	1
	Total for Applied Cognitive domain	51	43	13	107	8

Abbreviations: ADL, activities of daily living; IADL, instrumental activities of daily living.


Item bank culling

To adapt the item banks’ coverage and content for hospitalized patients, panels of 8-9 clinical content experts were assembled for each AM-PAC domain. Because the AM-PAC banks were initially developed to assess patients in PAC settings, multiple items queried respondents about the degree of difficulty they experienced when performing activities with the gait aids and wheelchairs commonly used in those settings. Consensus was reached among the expert panel to remove these items because fewer patients use gait aids in the hospital, patients may not have their aids in the hospital, and inquiring about gait aids would increase the response burden.

Expansion

Subdomains were identified within each domain, and the experts assigned the retained AM-PAC items to the domain and subdomains. Some items were reassigned from their original AM-PAC domain to a different domain by the experts; for example, “How much difficulty do you currently have operating an ATM to get cash or make deposits?” was moved from Daily Activities to Applied Cognitive. The experts identified content gaps in subdomain coverage across the entire range of each trait and provided potential sources of extant items to fill the deficits. In addition to legacy instruments suggested by the expert panels, the IRT-modeled Patient-Reported Outcomes Measurement Information System (PROMIS) and Quality of Life in Neurological Disorders banks were reviewed. Items selected from these sources were edited to conform to the stem structure and response options of the AM-PAC items. Most items began with “How much DIFFICULTY do you currently have…” and presented response options “unable,” “a lot,” “a little,” and “none”; a small percentage of items began with “How much HELP from another person do you currently need...” and used response options “total,” “a lot,” “a little,” and “none.” Persistent coverage deficits were addressed by writing new items related to the limited activities that can be performed in a standard hospital room. A total of 43 de novo items were generated and tested with inpatients to confirm understanding. A series of 6 teleconferences were held throughout the item bank expansion process to allow the expert panel to reach consensus on recommendations and finalization of the item bank.

Calibration cohort enrollment

Participants (n=2341) were recruited from the Mayo Clinic Hospital and identified through a well-established electronic search tool. Minority recruitment was enhanced using the search tool to optimize demographic representation. During the 13-month initial data collection interval (May 2016-June 2017) the tool was used to identify patients admitted to inpatient medical services over the preceding 24 hours with at least 1 chronic condition. Although the study's primary focus was patients admitted to medical services, patients with complicated postoperative courses were also considered appropriate and approached if they required readmission after a hospitalization for surgery. Figure 3 outlines the flow of participants and their data through the calibration and validation cohort studies.

Fig 3

Participant flow diagram for calibration and validation cohorts. *An initial batch 1 data export was performed after 500 participants had been assessed to identify linking items. The identification of linking items prior to completing batch 1 data collection allowed a seamless transition from batch 1 to batch 2 collection because batch 2 included the linking items. Data from this initial pull were used for the MIRT models. †Responses were retained from calibration cohort members who answered at least 90% of the administered items. ‡The complete calibration cohort data set was used for the DIF analyses. These data differed in that they included the batch 1 data collected following the initial export.

Patients’ electronic medical records were reviewed to determine eligibility: no requirement for ventilatory support other than continuous positive airway pressure or intermittent bilevel positive airway pressure; no use of cognitive depressant medications apart from soporifics, antipsychotics, anxiolytics, analgesics, or antidepressants; ability to respond to orally administered questions; and fluency in English adequate to respond to the items. The Mini-Cog was collected for use as a covariate in the analysis of the Applied Cognitive domain. Patients were interviewed on a single occasion immediately after providing written informed consent and Health Insurance Portability and Accountability Act of 1996 (HIPAA) authorization. Purposive sampling was used to ensure adequate representation of demographically and clinically defined subgroups spanning the entire trait range. Table 2 lists the demographic and clinical characteristics of the calibration and validation cohorts.
Once the pool of potentially eligible patients was established on a given day, targeted recruitment was used to ensure that the sample maintained adequate subgroup representation for DIF analyses with the following characteristics: (1) roughly equal numbers in each age stratum (<60, 60-75, and >75 years); (2) ≥15% high school noncompletion (comparable with national levels); and (3) ≥15% with moderate to severe pain.59, 60, 61 Additionally, recruitment efforts were coordinated across hospital services (ie, cardiac, gastrointestinal, organ transplant, pulmonary, medical oncology, general internal medicine, etc) and hospital floors/buildings to ensure a clinically diverse sample. Patients admitted to neurology services or readmitted to neurosurgical services were not recruited because therapy is routinely provided to these patients, and they are consequently at a lesser risk of preventable hospital acquired disablement.

Table 2

Demographic and clinical characteristics of the FAMCAT validation and calibration cohorts

Characteristics	Validation Cohort, n=2050	Calibration Cohort, n=2024
Age (y)
mean ± SD	61.4±16.0	63.6±16.0
median (IQR)	63.0 (52.0-72.0)	66.0 (55.0-75.0)
Sex, n (%)
Female	952 (46.4)	933 (46.1)
Male	1098 (53.6)	1091 (53.9)
Charlson Comorbidity Index
Charlson
mean ± SD	1.3±1.4	1.2±1.4
median (IQR)	1.0 (0-2.0)	1.0 (0-2.0)
Charlson Severity
mean ± SD	2.3±2.6	1.8±2.4
median (IQR)	2.0 (0-3.0)	1.0 (0-3.0)
Charlson Severity and Age
mean ± SD	4.1±3.1	3.8±2.9
median (IQR)	4.0 (2.0-6.0)	3.0 (2.0-5.0)
Hospital length of stay (d)
mean ± SD	7.1±8.2	4.4±5.3
median (IQR)	5.0 (3.0-8.0)	3.0 (2.0-5.0)
Discharge location, n (%)
Home with/without home care	1822 (89.2)	1868 (93.0)
Intensive inpatient rehabilitation or skilled Nursing facility	221 (10.8)	140 (7.0)
Missing	7	16
PT consultation, n (%)
	300 (14.6)	111 (5.5)
OT consultation, n (%)
	236 (11.5)	81 (4.0)
30-d Hospital readmission, n (%)
	103 (5.3)	80 (4.5)
Missing	118	252
Admission diagnosis, CCS category, n (%)
Diseases of the blood and blood-forming organs and immune system disorders	41 (2.0)	31 (1.5)
Diseases of the circulatory system	268 (13.2)	684 (33.9)
Diseases of the digestive system	369 (18.2)	296 (14.6)
Endocrine, nutritional, and metabolic disease	64 (3.1)	86 (4.3)
Diseases of the genitourinary system	134 (6.6)	92 (4.5)
Infectious and parasitic diseases	164 (8.1)	109 (5.4)
Injury, poisoning, and certain other consequences of external causes	137 (6.7)	113 (5.6)
Mental, behavioral, and neurodevelopmental disorders	10 (0.5)	10 (0.5)
Diseases of the musculoskeletal system and connective tissue	69 (3.4)	51 (2.5)
Neoplasms	492 (24.2)	198 (9.8)
Diseases of the nervous system	26 (1.3)	22 (1.1)
Diseases of the respiratory system	148 (7.3)	160 (7.9)
Symptoms, signs, and abnormal clinical/laboratory findings	56 (2.8)	88 (4.4)
Other*	52 (2.56)	80 (4.0)

Abbreviations: CCS, chronic condition software; OT, occupational therapy; PT, physical therapy.

*Other includes 5 CCS categories: diseases of the ear and mastoid process; diseases of the eye and adnexa; congenital malformations, deformations, and chromosomal abnormalities; pregnancy, childbirth, and the puerperium; and diseases of the skin and subcutaneous tissue.

Those who died or were transitioned to hospice were excluded; these statistics are calculated using the cohort data from the prediction article.


Calibration data collection

Item batching

A key goal of the FAMCAT project was to longitudinally assess patients’ risk for hospital-acquired disability due to immobility. This subpopulation was thought to be best represented among patients admitted to medical services and those readmitted for complications of surgical procedures. Given the frequently stressed, symptomatic, and ill status of these patients, answering all 326 candidate items was deemed neither humane nor practical. The items were therefore separated into 4 batches of roughly equal size. To create batches with equal domain, subdomain, and trait level representation, the IRT item information characteristics were obtained, when available, and items were positioned along each trait continuum. Four representative batches were manually created with checks to assure subdomain representation. Because it was critical that high-quality, DIF-free linking items be selected from the first batch, this batch was slightly larger, n=110. Twenty-four linking items (8 per domain) were identified in the first batch based on maximizing the information coverage along the wide range of the trait levels (ie, standardized trait levels from −3 to 3) for each domain and were included in batches 2-4. Minus the 24 linking items, which were common to all batches, the batch sizes were 86, 72, 73, and 71 items. The 4 batches were programmed into the Qualtrics survey administration and storage platform.a

Item administration

Research assistants read items from each batch to participants from the Qualtrics interface and were instructed not to interpret items or offer other guidance. Items within batches were organized into blocks according to domain. The order of blocks within batches and the order of items within blocks were randomized. Participants had the option to change their answers until the research assistant advanced to the next question. Once >500 participants had responded to the items in the first batch, the 24 high-performing, DIF-free linking items noted above were identified. The linking items were added to batches 2-4. A sample of n>500 was targeted for each batch, with an anticipated incompletion rate of 10%. The final numbers of respondents for batches 1, 2, 3, and 4 were 701, 542, 555, and 543, respectively, as outlined in Table 3. Participants in the calibration data collection had a mean age of 61.8 years, 54% were male, 96% were non-Hispanic white, and 78% had 2 or more comorbidities.

Table 3

Item bank completion by batch

Batch	No. of Items in Batch*	Patients Accrued (n)	Patients Completed All Items in the Batch, n (%)	Patients Completed at Least 1 Item, n (%)
1	110	701	481 (68.6)	698 (99.6)
2	96	542	261 (48.2)	536 (98.9)
3	96	555	291 (52.4)	547 (98.6)
4	96	543	351 (64.6)	541 (99.6)
Total		2341	1384 (59.1)	2322 (99.2)

*Includes 24 linking items that are common to all batches.


Electronic health record abstraction

Participants’ demographic and clinical information was electronically abstracted from the Mayo Clinic Unified Data Platform, which stores aggregated clinical and administrative data. In addition to comorbidities assigned in the 12 months prior to discharge, discharge location, 30-day hospital readmission status, admission/discharge diagnosis, functional items from the Braden Scale recorded by nurses, and FIM items recorded by therapists were abstracted. The 2 functional components of the Braden Scale for predicting pressure ulcer risk (ordinal assessments of the degree of physical activity and the ability to change/control body position) are charted by nurses on every patient at least twice daily. Several items from the FIM, including those related to supine-to-sit and sit-to-stand transfers, ambulation, dressing, and toileting, were abstracted for all participants who underwent therapy evaluations during their hospitalizations.63, 64, 65, 66

MIRT modeling

We conducted an exploratory item factor analysis on each batch separately and, subsequently, on the combined data. Models with 1-, 2-, 3-, and 4-factor structures were compared. Relative model fit indices, Akaike's information criterion and the Bayesian information criterion, revealed that a 3-factor model consistently outperformed the 1- and 2-factor models, but a 4-factor model seemed to fit the data best. The 4-factor structure suggested dividing the Applied Cognitive factor into 2 factors. However, because probing the underlying factor structure of Applied Cognitive was beyond the focus of the current study, we decided to use a 3-factor IRT model. We then used the expectation-maximization algorithm implemented in flexMIRT for 3-factor multidimensional graded response model (MGRM) calibration. All item parameters were properly recovered, and their standard errors ranged from 0.06 to 0.39, with an average of 0.24.
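
For readers less familiar with the MGRM, the following sketch shows how category probabilities for a single item follow from a latent trait vector. It assumes the common slope-intercept parameterization, in which the probability of responding in category k or higher is a logistic function of a'θ + d_k; the parameter values are invented for illustration and are not FAMCAT calibration estimates.

```python
import math


def mgrm_category_probs(theta, a, d):
    """Category probabilities for one item under a multidimensional
    graded response model (slope-intercept form, assumed here).

    theta: latent trait vector, one value per dimension
    a:     item slope (discrimination) vector, same length as theta
    d:     intercepts for the K-1 category boundaries, strictly decreasing
    Returns K probabilities for response categories 0..K-1.
    """
    z = sum(ai * ti for ai, ti in zip(a, theta))
    # Cumulative probabilities P(X >= k) for k = 1..K-1
    cumulative = [1.0 / (1.0 + math.exp(-(z + dk))) for dk in d]
    bounds = [1.0] + cumulative + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(len(d) + 1)]


# Hypothetical 4-category item loading only on the third dimension
probs = mgrm_category_probs(theta=[0.0, 0.0, 0.0],
                            a=[0.0, 0.0, 1.5],
                            d=[2.0, 0.5, -1.0])
```

Because each category probability is a difference of adjacent cumulative probabilities, the intercepts must be ordered for all probabilities to be nonnegative.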

Unidimensional and multidimensional DIF assessment

Hypotheses were generated, as per recommended best practices for DIF analyses,70, 71, 72 on the basis of expert qualitative review regarding the likely presence and direction of DIF for all items with respect to age, race, sex, and duration of time in the hospital. In parallel with hypothesis generation, we examined dimensionality across groups as the first step in a hierarchy of invariance tests, per the National Institutes of Health PROMIS guidelines as recommended by Reise et al. Initial DIF estimates were obtained by treating each item as a “studied” item, while using the remainder as “anchor” (DIF-free) items. We used a modified “all-other” approach that included “iterative purification.” We then used a unidimensional DIF test, the IRT-Wald statistic contained in Item Response Theory for Patient Reported Outcomes, to assess DIF in each of the item banks. Items showing DIF were excluded from the DIF-free anchor set at each iteration until no items showed DIF, and this set was used for final determination of DIF. A model was constructed with all parameters constrained to be equal across comparison groups for the anchor items and item parameters for all studied items freed to be estimated distinctly. An overall simultaneous joint test of differences in the discrimination (“a”) or severity (“b”) parameters was performed, followed by step-down tests for group differences in the a parameters and then by conditional tests of the b parameters. Uniform DIF was detected when the b parameters differed and nonuniform DIF when the a parameters differed. DIF magnitude was assessed using noncompensatory DIF, which reflects group differences in expected item scores; such effect size estimation has been recommended to identify salient DIF.80, 81, 82, 83, 84, 85 Summing the expected item scores provided differences in “test” response functions, an index of scale-level effect.
Initially, we planned to perform multidimensional DIF analyses. Unidimensional IRT DIF tests were used instead because recent simulation studies by members of our team demonstrated that the sample size required for accurate estimation of item parameters for the multidimensional model was at least 500. The sample sizes of subgroups defined by race, sex, age, and duration of hospital stay were not large enough to perform multidimensional IRT DIF testing. A recent simulation study demonstrated that modeling DIF as unidimensional may be as accurate as multidimensional models for determining effect sizes for binary data. Moreover, the initial dimensionality analyses for each domain examined supported a unidimensional approach to DIF detection within domains.
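
The magnitude measure described above can be illustrated with a short sketch. It assumes a unidimensional graded response model and the usual definition of noncompensatory DIF as the mean squared difference between the expected item scores implied by the two groups' parameters, evaluated at the focal group's trait estimates. All parameter values and function names are hypothetical; this is not the software used in the study.

```python
import math


def expected_item_score(theta, a, b):
    """Expected score (categories 0..K-1) under a unidimensional graded
    response model: the sum of the cumulative boundary probabilities."""
    return sum(1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b)


def ncdif(focal_thetas, reference_params, focal_params):
    """Mean squared difference in expected item scores between the
    parameter sets of the two groups, over the focal group's thetas."""
    squared = [
        (expected_item_score(t, *focal_params)
         - expected_item_score(t, *reference_params)) ** 2
        for t in focal_thetas
    ]
    return sum(squared) / len(squared)


# Hypothetical parameters: same discrimination, shifted severities (uniform DIF)
reference = (1.2, [-1.0, 0.0, 1.0])
focal = (1.2, [-0.6, 0.4, 1.4])
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
magnitude = ncdif(thetas, reference, focal)
```

Summing `expected_item_score` over all items in a bank gives the expected "test" score, so the same difference taken at the scale level indexes the aggregate effect of DIF.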

Evaluation of collateral test-taking information

We assessed the utility of using participants’ response times to items using van der Linden's hierarchical modeling framework. At the measurement model level, the MGRM was used for modeling item responses, whereas a lognormal model was used to model item response times. At the higher-order model level, patients’ 3-dimensional latent traits and the unidimensional latent speed were correlated. Moreover, because during the field testing of the items an interviewer read each item to a patient and recorded their responses and response times, the interviewer was also included as a nominal covariate in the hierarchical model. The hierarchical model was fitted to all batches of data via a concurrent calibration. Results showed that adding response time information did not significantly affect the item parameter estimates or their standard errors. However, adding response time information helped reduce the standard error of patients’ multidimensional latent trait estimates, whereas adding interviewer as a covariate did not result in further improvement. Hence, the MGRM alone is sufficient for item parameter calibration, and using response time as collateral information would help improve FAMCAT efficiency; nevertheless, we ultimately chose not to incorporate response times in the MCAT algorithm.
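
The two levels of this framework can be written compactly. The notation below follows the usual presentation of van der Linden's hierarchical model and is illustrative rather than the exact parameterization fitted here: t_ij is the response time of patient i on item j, τ_i is latent speed, β_j and α_j are the time intensity and time discrimination of item j, and θ_i is the 3-dimensional latent trait vector. The interviewer covariate is omitted for simplicity.

```latex
% Level 1: measurement models for responses and response times
\begin{align}
P(X_{ij} \ge k \mid \boldsymbol{\theta}_i)
  &= \frac{1}{1 + \exp\{-(\mathbf{a}_j^{\top}\boldsymbol{\theta}_i + d_{jk})\}}
  && \text{(MGRM)} \\
\ln t_{ij} &\sim N\!\left(\beta_j - \tau_i,\ \alpha_j^{-2}\right)
  && \text{(lognormal response time model)} \\
% Level 2: joint distribution of the person parameters
(\boldsymbol{\theta}_i^{\top}, \tau_i)^{\top}
  &\sim \mathrm{MVN}(\boldsymbol{\mu}, \boldsymbol{\Sigma})
  && \text{(traits and speed correlated through } \boldsymbol{\Sigma}\text{)}
\end{align}
```

Response times inform trait estimation only through the off-diagonal elements of Σ that link speed to the 3 traits, which is why they can reduce the standard errors of the trait estimates without changing the item parameters.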

Defining clinically actionable FAMCAT score strata

There is ample precedent for using score strata from IRT-modeled PROMs to bin patients into clinically relevant and actionable categories. Figure 4 illustrates the 4 clinically actionable levels that were hypothesized for each domain as well as their definitions. Three parallel strategies were used to identify candidate cut scores to delineate clinically relevant score strata in each domain using estimates derived from the Functional Assessment for Acute Care Multidimensional (FAM) IRT models.

Fig 4

Four hypothesized levels for each FAMCAT domain that inform individualized mobility preservation plans.

Although 4-level stratification, as depicted in figure 4, was initially anticipated for all domains, a single cut score for the Applied Cognitive domain was eventually adopted as being more clinically actionable. This cut score was conceptualized as a means of distinguishing patients with potentially severe enough cognitive impairment that their Basic Mobility and Daily Activity scores should be acted on cautiously because of a need for greater supervision or assistance than might be suggested by their mobility and activity scores alone.

Graphic and statistical approaches to identify candidate cut scores

Ordered categorical ratings representing constructs similar to those estimated by the FAM IRT Daily Activity and Basic Mobility models were available in electronic health record (EHR) data for the 2060 calibration cohort participants. These ratings were provided by nurses as well as physical and occupational therapists. However, physical therapy and occupational therapy assessments occurred for only 15% and 12% of participants, respectively. We considered these therapist assessment data insufficient to serve as the basis for establishing cut scores and therefore relied on Braden Activity Scores (BAS) entered by nurses. The Braden Activity Score is 1 of 6 subscales that comprise the Braden Scale, which is used to predict pressure ulcers. The Braden Activity Score assesses mobility with a 4-point ordinal scale: 1 (“patient is confined to bed”); 2 (“severely limited or nonexistent ability to walk; patient cannot bear his own weight and/or must be assisted into chair or wheelchair”); 3 (“patient walks occasionally during the day but for very short distances, with or without assistance; spends majority of each shift in bed or chair”); or 4 (“patient walks outside the room at least twice a day and inside the room at least once every 2 hours during waking hours”). No participants were rated as “bed-based,” hence this category was “missing” from nurse ratings. To find cut scores along the Basic Mobility latent trait that maximized the consistency of classification decisions between FAM IRT estimates and nurse ratings, we plotted the smoothed frequency distribution of the FAM IRT Basic Mobility estimates for subgroups classified by nurse Braden ratings as depicted in figure 5. The 3 intersection points from the 3 distribution curves were considered as cut scores. On average, classifications based on the FAM IRT Basic Mobility predictions and Braden activity item agreed 69% of the time. 
We therefore plotted smoothed frequency distributions to establish candidate cut scores for the Daily Activity FAM IRT estimates as well, even though the Braden activity and the FAM Daily Activity items evaluate overlapping but distinct constructs. FAM IRT and Braden activity item classifications agreed 59.6% of the time for the Daily Activity domain. Mini-Cog scores were used to determine a single candidate cut score for the FAM IRT Applied Cognitive estimates. Mini-Cog and FAM IRT Applied Cognitive classifications agreed 71.8% of the time.
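
The graphic approach can be sketched in a few lines. The example below smooths the trait estimates of two adjacent rating groups with a Gaussian kernel, takes the point where the two curves cross as the candidate cut score, and then computes classification agreement. The kernel, the bandwidth, and the simulated data are all assumptions made for illustration; the smoothing method actually used is not specified here.

```python
import math
import random
from bisect import bisect_right


def smoothed_frequency(sample, bandwidth):
    """Return a Gaussian-kernel smoothed frequency curve (area = len(sample))."""
    scale = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))

    def curve(x):
        return scale * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return curve


def crossing_point(lower_group, upper_group, bandwidth=0.4, steps=400):
    """Locate where the two smoothed curves cross between the two group means."""
    f_low = smoothed_frequency(lower_group, bandwidth)
    f_up = smoothed_frequency(upper_group, bandwidth)
    start = sum(lower_group) / len(lower_group)
    stop = sum(upper_group) / len(upper_group)
    prev_x, prev_diff = start, f_low(start) - f_up(start)
    for i in range(1, steps + 1):
        x = start + (stop - start) * i / steps
        diff = f_low(x) - f_up(x)
        if prev_diff > 0 >= diff:
            # Linear interpolation between the grid points that bracket the crossing
            return prev_x + (x - prev_x) * prev_diff / (prev_diff - diff)
        prev_x, prev_diff = x, diff
    return None  # the curves do not cross on this interval


def agreement(thetas, ratings, cuts):
    """Share of patients whose theta-based stratum equals their rated stratum."""
    hits = sum(1 for t, r in zip(thetas, ratings) if bisect_right(cuts, t) == r)
    return hits / len(thetas)


# Simulated data only: two rating groups with hypothetical trait distributions
rng = random.Random(0)
low = [rng.gauss(-1.0, 0.5) for _ in range(200)]
high = [rng.gauss(1.0, 0.5) for _ in range(200)]
cut = crossing_point(low, high)
share = agreement(low + high, [0] * 200 + [1] * 200, [cut])
```

Because the curves are frequencies rather than densities, unequal group sizes shift the crossing point toward the smaller group, which is the behavior that maximizes agreement in the observed sample.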

Fig 5

Smoothed frequency distributions of the basic mobility MIRT model estimates for subgroups classified by nurse mobility ratings.

Bookmark approach to identify candidate cut scores

We used a “bookmark” approach derived from educational settings whereby experts use a data-driven consensus process for setting standards for academic performance.94, 95, 96 We convened a panel of experts composed of 3 rehabilitation physicians, 3 occupational therapists, and 3 physical therapists, all specialized in the care of medically ill hospitalized patients. The panel followed a modified Delphi technique involving 3 rounds: independent cut score designation, feedback and summary of the independent cut scores, and then finalization of the cut scores by consensus.

Subgroup analyses to identify candidate cut scores

Among the subgroup of participants dismissed from the hospital within 48 hours of testing, mean FAM IRT domain score differences were compared between patients who went to inpatient facilities, home with rehabilitation services, or home without services.

Consensus

Final FAM IRT domain cut scores were established by a second expert consensus process. A panel that was distinct from participants in the bookmark approach, described above, and composed of 3 occupational therapists, 3 physical therapists, 3 physicians, and 3 nurses reviewed item maps with each of the candidate cut points established using the methods outlined above. The final consensus process considered the “bookmark”-derived cut points, those established through the graphic/statistical approach, as well as the effect of collapsing and/or subdividing stages. A modified Delphi process was used to determine the final cut points.

FAMCAT algorithm development and testing

We developed MCAT algorithms for item selection during FAMCAT test sessions. Different methods for selecting the next item to be administered were compared in a series of Monte Carlo simulations to determine the most effective method for use in the FAMCAT. Java programmers implemented the final FAMCAT algorithms. Proper functioning of the FAMCAT algorithm, user interface, and storage and reporting aspects of the software was validated, followed by FAMCAT beta testing and final software adjustments prior to FAMCAT release for data collection from the validation cohort.
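
As an illustration of what an MCAT item selection rule looks like, the sketch below implements D-optimality, one commonly studied criterion: the next item is the one that maximizes the determinant of the accumulated Fisher information matrix at the current trait estimate. This is not necessarily the method adopted for the FAMCAT. To keep the example short it uses dichotomous multidimensional 2PL items rather than graded items, and all item names and parameters are hypothetical.

```python
import math


def item_information(theta, a, d):
    """Fisher information matrix of a dichotomous multidimensional 2PL item."""
    z = sum(ai * ti for ai, ti in zip(a, theta)) + d
    p = 1.0 / (1.0 + math.exp(-z))
    w = p * (1.0 - p)
    return [[w * ai * aj for aj in a] for ai in a]


def determinant(m):
    """Determinant by cofactor expansion (adequate for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for c, value in enumerate(m[0]):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * value * determinant(minor)
    return total


def select_next_item(theta, bank, administered, accumulated):
    """Pick the unused item that maximizes det(accumulated + item information)."""
    best_id, best_det = None, float("-inf")
    size = len(accumulated)
    for item_id, (a, d) in bank.items():
        if item_id in administered:
            continue
        cand = item_information(theta, a, d)
        total = [[accumulated[r][c] + cand[r][c] for c in range(size)]
                 for r in range(size)]
        value = determinant(total)
        if value > best_det:
            best_id, best_det = item_id, value
    return best_id


# Hypothetical bank: (slopes, intercept) per item
bank = {
    "mob_1": ([1.5, 0.0, 0.0], 0.0),
    "cog_1": ([0.0, 1.5, 0.0], 0.0),
    "act_1": ([0.0, 0.0, 0.8], 0.0),
}
# Information so far: first dimension already well measured;
# the identity part stands in for a prior precision (an assumption).
accumulated = [[5.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
next_item = select_next_item([0.0, 0.0, 0.0], bank, set(), accumulated)
```

In this example the rule favors an item on a poorly measured dimension over a further item on the well-measured one, which is the behavior that makes multidimensional selection more efficient than running separate unidimensional tests.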

FAMCAT validation and psychometric assessment

The final FAMCAT algorithm was specified, tested, and programmed into the FastTest administration platform.^b Convergent and predictive validity as well as the presence and magnitude of proxy and mode effects were comprehensively assessed using data collected from a validation cohort composed of 2154 hospitalized patients who contributed a total of 2887 assessments, as outlined in figure 3.

Study designs

The predictive validity study used a prospective design to estimate correlations between participants’ FAMCAT scores and downstream events: (1) discharge to home or PAC and (2) 30-day readmission, which were electronically abstracted from the EHR. The convergent validity study used a cross-sectional design to estimate correlations between FAMCAT scores and patient- and clinician-rated functional outcomes collected concurrently with the FAMCAT. A mode study used a randomized cross-sectional design to assess the presence and magnitude of mode effects between interview and tablet-based FAMCAT administration. Last, a proxy study used a cross-sectional design to estimate the presence and magnitude of proxy effects when the FAMCAT was administered using tablets to 295 patient-proxy dyads.

Participants

Patients

The recruitment strategies used for the FAMCAT validation and psychometric assessments of the validation cohort (n=2154) were identical for all studies and were similar to those used for the calibration phase. However, the Mayo Clinic transitioned to the Epic EHR in the interval between the enrollment of the calibration and validation cohorts. Therefore, for the validation cohort an Epic Report,^c rather than the search of the administrative Mayo Clinic Unified Data Platform, was run daily to identify potentially eligible patients. EHR problem lists were reviewed to remove patients with combative behavior, active drug and/or alcohol withdrawal, and advanced dementia. Potential participants’ nurses were queried regarding additional eligibility criteria: English fluency, sufficient auditory acuity to respond to the items, and no receipt of sedation within the past 6 hours. Once a patient's nurse cleared them for participation, the study was described to the patient. Receptive patients provided informed consent and signed a HIPAA authorization form. The majority of participants provided data at only 1 time point; however, given our aim of estimating the AMC, patients were administered the MCAT up to 4 times during their hospital admission. If patients were readmitted to the hospital, they were eligible to participate in additional FAMCAT sessions. Of 2887 FAMCAT patient assessments, 885 were follow-ups.

Clinicians

We collected data to assess the FAMCAT's convergent validity from nurses and physical therapists. Nurses caring for participants provided informed consent on 1 occasion. Because no personal health information was collected from nurses, they were not required to sign HIPAA authorizations. Because therapists provided data in the EHR in the course of delivering routine clinical care, they did not provide informed consent.

Proxies

To be eligible to participate in the proxy study, proxies were required to have resided with the patient for a minimum of 1 week and to have last resided with them no more than 2 days prior to admission. Proxies provided oral informed consent; however, because personal health information was not collected from them, they were not required to provide HIPAA authorization.

Data collection procedures

Data were collected by research coordinators in the participants’ hospital rooms between 7 am and 5 pm on weekdays. Two approaches were used. For FAMCAT sessions administered by interview, items were read to participants who communicated their responses orally. Alternatively, for sessions administered via tablet, iPads were used for the FAMCAT items.^d PROMIS Physical Function (PF) items were administered orally to all participants. Participants were given as much time as they needed for tablet sessions. Irrespective of administration mode and similar to data collection during the calibration study, research coordinators were instructed not to interpret or explain the items during testing sessions. For all sessions, the FAMCAT was first administered to participants followed by the PROMIS PF SF items. Data were collected from patients’ nurses for use in the convergent validity study either immediately prior to or after participants’ FAMCAT sessions. Nurses were not present in participants’ rooms during FAMCAT administration. Person-reported outcomes, in addition to the FAMCAT, and data automatically abstracted from the EHR for each psychometric study were as follows:

Measures

6-clicks

This AM-PAC SF instrument has 6 questions evaluating a person's need for assistance in completing distinct functional mobility activities. Based on clinician judgment, each question is scored on a 4-point ordinal scale, where a score of 1 indicates that the person is unable to complete the task and 4 indicates that the person is independent in completing that activity.

Johns Hopkins–Highest Level of Mobility

The Johns Hopkins–Highest Level of Mobility evaluates general mobility over a fixed observation period. Scoring is based on a person's observed activity as a 1-item scale with 8 ordinal response options: 1=only lying, 2=bed activities, 3=sitting at edge of bed, 4=transferring to chair, 5=standing for ≥1 minute, 6=walking ≥10 steps, 7=walking approximately ≥7.5 m (≥25 ft), and 8=walking approximately ≥75 m (≥250 ft).

Braden Activity Score

The Braden Activity Score was described previously.

Eight-item PROMIS PF SF

The PROMIS PF validated 8-item SF assesses mobility and daily living activities.101, 102, 103 Because PROMIS items are not scored as sums but rather on a standardized T score metric using IRT, scores obtained from different item subsets are readily comparable.
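
The T score metric is a linear rescaling of the IRT latent trait so that the reference population has a mean of 50 and a standard deviation of 10. A minimal sketch of the conversion follows; the function names are hypothetical.

```python
def theta_to_t_score(theta):
    """Convert an IRT trait estimate (mean 0, SD 1 in the reference
    population) to the T score metric (mean 50, SD 10)."""
    return 50.0 + 10.0 * theta


def se_to_t_units(standard_error):
    """Rescale the standard error of a theta estimate to T score units."""
    return 10.0 * standard_error


# A patient 1.5 SD below the reference mean
example_t = theta_to_t_score(-1.5)
```

Because the rescaling is applied to the trait estimate rather than to a sum of item scores, any subset of calibrated items yields scores on the same metric.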

Analyses

Predictive validity study

To assess the FAMCAT's predictive validity, associations between FAMCAT scores and participants’ discharge locations were estimated: home, home with rehabilitation services, skilled nursing facility, inpatient rehabilitation facility, and long-term acute care hospital. In addition, 30-day hospital readmissions were ascertained. We compared the FAMCAT's capacity to predict discharge location with the 6-clicks and PROMIS PF SF.
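
One common way to compare how well competing scores predict a binary outcome, such as discharge to an institutional setting versus home, is the c-statistic (area under the receiver operating characteristic curve). The sketch below is offered as an assumption about how such a comparison might be made, not as the analysis performed in the predictive validity study; the data are invented.

```python
def c_statistic(risk_scores, outcomes):
    """Probability that a randomly chosen case (outcome 1) has a higher
    risk score than a randomly chosen non-case (outcome 0); ties count 0.5."""
    cases = [s for s, y in zip(risk_scores, outcomes) if y == 1]
    controls = [s for s, y in zip(risk_scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Invented example: lower mobility scores imply higher risk of institutional
# discharge, so the sign is flipped to obtain a risk score.
mobility_theta = [-1.2, -0.5, 0.3, 1.0]
institutional = [1, 1, 0, 0]
auc = c_statistic([-t for t in mobility_theta], institutional)
```

Computing the same statistic for each instrument on the same patients allows their discrimination to be compared directly.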

Convergent validity study

We characterized the FAMCAT's convergent validity by estimating correlations of FAMCAT scores with clinician-rated Johns Hopkins–Highest Level of Mobility, 6-clicks, and Braden Activity Score and self-rated PROMIS PF SF functional outcomes.
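
Because the clinician-rated measures are ordinal, a rank-based coefficient is a natural choice for such correlations. The sketch below computes Spearman's rank correlation with average ranks for ties; the choice of Spearman's coefficient is an assumption made for illustration, and the data are invented.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: trait estimates against an 8-level ordinal mobility rating
theta_estimates = [-1.4, -0.6, -0.1, 0.5, 1.2]
mobility_levels = [2, 4, 4, 6, 8]
rho = spearman(theta_estimates, mobility_levels)
```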

Mode study

A 3-way multivariate analysis of variance was performed to determine whether test mode as well as the patients’ sex and age were associated with at least 1 of the 3 latent traits: Applied Cognitive, Daily Activity, and Mobility.

Proxy study

To determine if FAMCAT scores (ie, θ estimates) from the proxies were significantly different from those obtained from the patients, a repeated measures multivariate analysis of variance was conducted with the independent variables being the sex and the age of the patient, as well as the patient vs proxy variable. Additional analyses directly compared each patient's ratings with those of their proxies.

Discussion

The FAMCAT was developed to permit the efficient, low-burden, and precise functional assessment of patients admitted to medical services or readmitted to surgical services for postoperative complications. FAMCAT development was guided by the need to balance 3 key requirements: (1) efficiency, to permit integration into busy clinical work flows; (2) absence of clinical burden because oversubscribed clinicians have proven to be limited in their ability to consistently record high-quality function data; and (3) precision, for the timely, accurate individualization of patients’ mobility preservation plans. We endeavored to further optimize the FAMCAT's efficiency by applying cutting edge methods that have gained traction in academic assessment but have yet to be used in clinical contexts, namely the use of collateral test-taking information (response times) and the AMC. Greater reliance on PROM-based functional assessment among hospitalized patients offers several significant advantages. Principal among these is the capability of performing frequent reassessments without burdening clinicians. Such frequency is critical to detect the rapid changes that often mark the functional status of patients in acute care. These individuals frequently transfer in and out of intensive care units; experience abrupt restoration of homeostasis; and/or respond to treatments that eliminate ischemia, infection, and inflammation. Moribund patients incapable of independent mobility may be transformed in a matter of days. Such changes have clear and immediate implications for patients’ mobility requirements and precautions as well as their PAC needs. The means to detect clinically actionable changes in a precise and timely manner currently eludes the capabilities of most health care systems. Without the development and implementation of better inpatient assessment systems, function-directed care will remain a haphazard iteration of a more effective, needs-matched future state.

Conclusions

The effort to develop the FAMCAT used both novel and established methods to address the long-standing need for a way to obtain frequent, structured, sensitive, and accurate functional assessments of hospitalized patients without increasing clinician workloads. Whether or not this instrument can achieve its goal is currently under assessment with the first assessments of these efforts scheduled to appear in a 2021 supplement of the Archives of Physical Medicine and Rehabilitation.

Suppliers

a. Qualtrics; Qualtrics International.
b. FastTest; ASC.
c. Epic Report; Epic Systems Corporation.
d. iPad; Apple Inc.
