
Comparison and optimization of various moving patient-based real-time quality control procedures for serum sodium.

Yuanyuan Li^1,2,3, Qian Yu^1,2,3, Xiaoyan Zhang^1,2,3, Xiaoling Chen^1,2,3.

Abstract

BACKGROUND: Patient-based real-time quality control (PBRTQC) is a valuable tool for monitoring the performance of testing processes. We aimed to compare and optimize various PBRTQC procedures for serum sodium.
METHODS: In a computer simulation, artificial errors were added to 680,000 real patient results. Four algorithms (moving average, moving median, moving SD and moving proportion of normal results), each with different control limits (CLs), were assessed on their ability to detect critical errors early.
RESULTS: The moving average and moving median were sensitive to system error, and the moving SD tended to detect random error. P3SD (moving proportion of normal results, CLs based on mean and SD of proportion of normal results) demonstrated excellent performance for both system error and random error. Increasing the block size (N) delayed error detection and decreased false rejection, except for QC procedures with the minimum and maximum as CLs. CLs calculated for a 0.1% false alarm rate performed better than CLs that set the false alarm rate to zero (minimum and maximum as CLs). The impact of truncation on QC performance depended on truncation limits, algorithms and the types of error. A significant improvement in QC performance due to truncation was found only for the moving SD.
CONCLUSION: "P3SD, N = 50, without truncation" and "moving SD, N = 25, set 0.1% false alarm as CLs and set 1% outliers exclusion as truncation limits" are recommended as the optimized procedures for serum sodium to monitor system error and random error, respectively.

Entities: Chemical

Keywords: moving average; moving median; moving proportion of normal results; moving standard deviation; patient-based real-time quality control

Substances:
Sodium

Year: 2021 PMID: 34520584 PMCID: PMC8529142 DOI: 10.1002/jcla.23985

Source DB: PubMed Journal: J Clin Lab Anal ISSN: 0887-8013 Impact factor: 2.352

INTRODUCTION

Patient‐based real‐time quality control (PBRTQC) is a useful tool for monitoring analytical performance in clinical laboratories. It is also an important application of “big data” in laboratory quality management. Compared with traditional methods of internal quality control (QC), PBRTQC has several advantages, e.g., low cost, no matrix effect, continuous monitoring and pre‐analytical monitoring. The concept of moving average QC was first published by Hoffmann and Waid in 1965. Since then, benefiting from the development of laboratory information systems, improved statistical methodology and increased awareness of the limitations of current QC, PBRTQC has attracted substantial attention and developed quickly. Recently, novel algorithms have been described, such as moving median, moving standard deviation (SD), moving average of delta and moving sum of outliers. Various parameters and charts have been developed to quantify their ability to detect error. However, implementing PBRTQC remains a challenge for the majority of routine clinical laboratories because of the complexity of obtaining optimal PBRTQC settings. It is well known that the probability of “error” detection for traditional individualized QC should not be less than 90%. The “error” here refers to “critical error”. The critical error is the minimum error that a QC procedure should detect; left undetected, it will affect clinical practice. Similarly, optimized PBRTQC procedures should have the best ability to detect critical error. Nevertheless, few articles have examined the relationship between critical error and the characteristics of error detection. Furthermore, novel algorithms, such as the moving median and moving standard deviation, have been shown to be superior for error detection under certain conditions.
However, most of these algorithms have been studied independently by different research groups, and scarcely any articles have compared them using the same database. In practice, it seems more feasible for PBRTQC to start with several typical tests and then be extended to all tests. Serum sodium is probably the most suitable chemistry test for PBRTQC because of its small biological variation and high requirement for analytical performance. Therefore, we aimed to investigate and compare the characteristics of error detection of various algorithms, including their different definitions of control limits, for serum sodium. Both system and random error were examined, and the relationships between critical error and characteristics of error detection were described in detail to optimize PBRTQC settings.

MATERIALS AND METHODS

Patients’ data collection and errors simulation

A total of 680,000 serum sodium results were anonymized and exported from the laboratory information system of the First Affiliated Hospital, College of Medicine, Zhejiang University, covering inpatient, outpatient and physical examination populations. All results were sorted by detection time and divided into 400 virtual days with 1,700 measurements each. The last 200 days served as the training dataset and the first 200 days as the testing dataset. All optimization of the procedures was conducted on the training dataset, and all verification of procedure performance was conducted on the testing dataset. The robust normalized spread (RNS) was calculated on all unaltered patient measurements: RNS = interquartile range/median. RNS represents the degree of dispersion of the original data distribution. Westgard JO et al. assessed average of normals (AON) patient data algorithms to maximize run lengths for automatic process control, and we used a similar method to simulate errors. The CVa (analytical CV), which represents inherent analytical precision, was defined as 1/3 of the allowable total error (TEa); 1/3 TEa is the minimum requirement for analytical imprecision. The TEa of sodium, obtained from the Analytical Quality Specification for Routine Analytics in Clinical Chemistry (WS/T 403–2012), was 4%. The system error (SE) was simulated as multiples of CVa (CVa = 1/3 TEa) by changing the mean of the patients’ data. The SE ranged from 0 to 4.0 CVa (0 ~ 4/3 TEa), and both positive and negative errors were added. The random error (RE) was simulated as multiples of CVa by changing the SD of the patients’ data from 1.0 to 5.0 CVa. The artificial error for each day was introduced from the 201st result onwards and sustained for the remaining results.
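The error simulation described above can be sketched roughly as follows. This is a minimal illustration, not the authors' Excel implementation: the function names are hypothetical, the systematic error is assumed to be a relative shift of k × CVa, and the random error is assumed to be added as extra Gaussian noise on top of a baseline imprecision of 1.0 CVa.

```python
import numpy as np

TEA = 0.04          # allowable total error for sodium (4%), from the text
CVA = TEA / 3       # analytical CV, defined in the text as 1/3 TEa
ERROR_START = 200   # zero-based index of the 201st result of a virtual day


def robust_normalized_spread(values):
    """RNS = interquartile range / median, as defined in the text."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    return (q3 - q1) / med


def add_systematic_error(day, k):
    """Shift results from the 201st onwards by k * CVa.

    Assumption: the shift is applied as a relative (percentage) change.
    """
    out = np.asarray(day, dtype=float).copy()
    out[ERROR_START:] *= 1.0 + k * CVA
    return out


def add_random_error(day, k, rng):
    """Raise imprecision to k * CVa from the 201st result onwards.

    Assumption: the data already carry 1.0 CVa of imprecision, so extra
    Gaussian noise with CV = sqrt(k^2 - 1) * CVa is added.
    """
    out = np.asarray(day, dtype=float).copy()
    extra_cv = CVA * np.sqrt(max(k * k - 1.0, 0.0))
    tail = out[ERROR_START:]
    out[ERROR_START:] = tail + rng.normal(0.0, 1.0, tail.size) * extra_cv * tail
    return out
```

With k = 1.0 the random-error function leaves the data unchanged, which matches the text's range starting at 1.0 CVa.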

Parameters of QC procedures

A whole QC procedure consists of four parts: algorithms, quality control limits (CLs), truncation limits (TLs) and block size (N). Table 1 lists the QC procedures investigated in this article.

TABLE 1

Quality control (QC) procedures investigated in this article

QC procedures	Algorithms and QC limits (CLs)		Block size (N)	Truncation (T_n)
A_mm,N = i, T_n	Moving average ( A )	CLs were the minimum and maximum values observed after running a calculation algorithm on the dataset without extra error. ( mm )	Every N consecutive patients’ results were included to calculate a QC data. N=25, 50, 75, 100, 125 and 150.	T₀, T_1% and T_5% were set to exclude the outer 0, 1 and 5% of all results, respectively.
M_mm,N = i, T_n	Moving median ( M )
S_mm,N = i, T_n	Moving SD ( S )
P_mm,N = i, T_n	Moving proportion of normal results ( P )
A_0.1%,N = i, T_n	Moving average ( A )	The 99.95^th percentile of QC data without extra error was defined as the upper CLs and 0.05^th as the lower CLs. ( 0.1% )
M_0.1%,N = i, T_n	Moving median ( M )
S_0.1%,N = i, T_n	Moving SD ( S )
P_0.1%,N = i, T_n	Moving proportion of normal results ( P )
A_RCV,N = i, T_n	Moving average ( A )	CLs = mean ± RCV, where $\mathrm{RCV} = \sqrt{2} \times Z \times \sqrt{CV_a^2 + CV_i^2}$ and Z = 1.96. CV_a represents analytical precision. CV_i denotes the biological variation within subjects. ( RCV )
A_3.09,N = i, T_n	Moving average ( A )	CLs = $\mathrm{mean} \pm 3.09 \times SD_p / \sqrt{N}$. SD_p is the SD of the patients’ data. ( 3.09 )
S_C4,N = i, T_n	Moving SD ( S )	CLs = $\overline{SD} \pm 3 \times \frac{\overline{SD}}{C_4} \sqrt{1 - C_4^2}$. $\overline{SD}$ is the mean of the moving SD for an in‐control period. C₄ is an unbiasing constant related to block size. ( C4 )
P_3SD,N = i, T_n	Moving proportion of normal results ( P )	CLs = mean_proportion ± 3 × SD_proportion. The mean_proportion is the average proportion of normal results. SD_proportion is the square root of the variance of the proportion of normal results. ( 3SD )

A whole quality control procedure (first column) consists of four parts: algorithms, quality control limits (CLs), block size and truncation limits. The italics and boldface in parentheses in the middle columns represent abbreviations.
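The four algorithms in Table 1 can be illustrated with a short sketch. It assumes that the window of N results advances by one result at a time (the text only says that every N consecutive results yield one QC value) and uses the 137 ~ 147 mmol/L reference range quoted in the Results to define a "normal" result; the function name is hypothetical.

```python
import numpy as np

REF_LOW, REF_HIGH = 137.0, 147.0  # sodium reference range quoted in the Results (mmol/L)


def moving_statistics(results, n):
    """Moving average (A), median (M), SD (S) and proportion of normal
    results (P) for every window of n consecutive results.

    Assumption: the window slides forward by one result at a time.
    """
    x = np.asarray(results, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(x, n)
    normal = (windows >= REF_LOW) & (windows <= REF_HIGH)
    return {
        "A": windows.mean(axis=1),
        "M": np.median(windows, axis=1),
        "S": windows.std(axis=1, ddof=1),   # sample SD of each window
        "P": normal.mean(axis=1),           # fraction inside the reference range
    }
```

Each returned array has len(results) − n + 1 entries, one per window.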

For algorithms, the moving average (A), moving median (M), moving SD (S) and the moving proportion of normal results (P) were calculated as the QC data for different error conditions and block sizes. Several methods of defining CLs were also investigated. Two were universal. The first used the minimum and maximum values (mm) observed after running a calculation algorithm on the dataset without extra SE or RE, so false alarms were set to zero for these procedures. The QC procedures for this method were expressed as Amm, Mmm, Smm and Pmm for moving average, moving median, moving SD and the moving proportion of normal results, respectively. For the second method, the 99.95th percentile of QC data without extra error was defined as the upper CL and the 0.05th percentile as the lower CL, so false alarms were set to 0.1%. These procedures were expressed as A0.1%, M0.1%, S0.1% and P0.1%. Two further methods of defining CLs were investigated for the moving average (A).
One was determined by calculating the reference change value (RCV) using the formula $\mathrm{RCV} = \sqrt{2} \times Z \times \sqrt{CV_a^2 + CV_i^2}$ and CLs = mean ± RCV, where Z = 1.96 for the 2SD change in a 2‐tailed distribution; CVa (analytical CV) represents inherent analytical precision; and CVi (intraindividual CV) denotes the biological variation within subjects. This was expressed as ARCV. The other, which was related to block size (N), was CLs = $\mathrm{mean} \pm 3.09 \times SD_p / \sqrt{N}$, where SDp is the SD of patients’ data. This was expressed as A3.09. The CLs for moving SD (S) were also defined with the following formula: CLs = $\overline{SD} \pm 3 \times \frac{\overline{SD}}{C_4} \sqrt{1 - C_4^2}$, where $\overline{SD}$ is the mean of the moving SD for an in‐control period and C4 is an unbiasing constant related to block size, which can be obtained from the “GAMMA” function in Microsoft Excel. This was expressed as SC4. The CLs for the moving proportion of normal results (P) were also defined with the following formula: CLs = meanproportion ± 3 × SDproportion. Here, the meanproportion is the average proportion of normal results, and the SDproportion is the square root of the variance of the proportion of normal results. This was expressed as P3SD.

To minimize the influence of outlying values in the data, truncation is usually implemented in PBRTQC protocols. Following the conclusion of Bietenbeck et al., we selected the “Winsorization” method, which replaces an outlying value with the lower or upper truncation limit that it exceeded. For example, if the truncation limits were 134 ~ 148 mmol/L, results greater than 148 mmol/L were replaced with 148 mmol/L instead of being eliminated. Three types of truncation limits (TLs), T0, T1% and T5%, were investigated. T0 meant that all the data were included in the QC procedures, with no TLs for serum sodium. The TLs of T1% and T5% were based on the mean and SD of patients’ data (SDp): for T1%, TLs = mean ± 3 × SDp, and for T5%, TLs = mean ± 2 × SDp. T0, T1% and T5% were set to exclude the outer 0, 1 and 5% of all measurements, respectively.
We investigated QC procedures using block sizes (N) of 25, 50, 75, 100, 125 and 150 consecutive test results.
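The control-limit definitions above can be sketched as small helper functions. The names are hypothetical; applying the RCV as a fraction of the mean, using the empirical SD of the proportions for P3SD, and capping the upper P3SD limit at 100% (as the limits in Table 3 suggest) are assumptions rather than details confirmed by the text.

```python
import math
import numpy as np


def limits_min_max(qc_values):
    """'mm' limits: minimum and maximum of error-free QC values."""
    return float(np.min(qc_values)), float(np.max(qc_values))


def limits_percentile(qc_values, false_alarm=0.001):
    """'0.1%' limits: 0.05th and 99.95th percentiles of error-free QC values."""
    half = 100.0 * false_alarm / 2.0
    lo, hi = np.percentile(qc_values, [half, 100.0 - half])
    return float(lo), float(hi)


def limits_rcv(mean, cv_a, cv_i, z=1.96):
    """RCV limits; CVs are fractions, applied relative to the mean (assumption)."""
    rcv = math.sqrt(2.0) * z * math.sqrt(cv_a ** 2 + cv_i ** 2)
    return mean * (1.0 - rcv), mean * (1.0 + rcv)


def limits_3_09(mean, sd_p, n):
    """Limits of mean +/- 3.09 * SDp / sqrt(N)."""
    half = 3.09 * sd_p / math.sqrt(n)
    return mean - half, mean + half


def c4(n):
    """Unbiasing constant C4 for the SD of n normally distributed values."""
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def limits_c4(mean_sd, n):
    """Limits of mean SD +/- 3 * (mean SD / C4) * sqrt(1 - C4^2)."""
    half = 3.0 * mean_sd / c4(n) * math.sqrt(1.0 - c4(n) ** 2)
    return mean_sd - half, mean_sd + half


def limits_3sd(proportions):
    """P3SD limits from the empirical mean and SD of the proportions (assumption)."""
    m = float(np.mean(proportions))
    s = float(np.std(proportions, ddof=1))
    return m - 3.0 * s, min(m + 3.0 * s, 1.0)
```

The gamma-function form of C4 is the standard control-chart constant; using the log-gamma function avoids overflow for larger block sizes.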

Performance of QC procedures

The number of patient samples needed for error detection (NPed) was counted after introducing extra error. The median number of patient results affected before error detection (MNPed) was then calculated for an increased analytical imprecision or bias. The MNPed reflects the median number of patient samples processed from the inception of an out‐of‐control error condition until it was detected. In addition, the median number of patient samples between QC rejections when the process was in control (MNPfr), i.e., the median number of patient samples between two false rejections, was calculated. An ideal QC procedure detects error quickly and rarely produces false rejections. Thus, MNPed should be as small as possible, while MNPfr should be as large as possible. Infinite MNPeds (when the error was not detected) and MNPfrs (when false alarms were set to zero) were imputed as 1,650 (110% of the maximum value).
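The two performance measures can be sketched as follows. The helper names are hypothetical, and the code assumes that the QC value at index i summarises results i to i + n − 1 and becomes available when the last of them is reported; undetected errors and absent false rejections are imputed as 1,650, as in the text.

```python
import numpy as np

IMPUTED = 1650  # value imputed in the text when no alarm occurs


def results_until_detection(qc_values, lower, upper, n, error_start=200):
    """Count patient results from error onset to the first out-of-limit QC value.

    Assumption: qc_values[i] summarises results i .. i+n-1 (zero-based) and is
    available once result i+n-1 has been reported. Returns None if no alarm.
    """
    qc = np.asarray(qc_values, dtype=float)
    alarms = np.flatnonzero((qc < lower) | (qc > upper))
    available_at = alarms + n - 1  # index of the last result in each alarming window
    after = available_at[available_at >= error_start]
    if after.size == 0:
        return None
    return int(after[0] - error_start + 1)


def mnped(per_day_counts):
    """Median over days, with undetected days imputed as 1,650."""
    filled = [IMPUTED if c is None else c for c in per_day_counts]
    return float(np.median(filled))


def mnpfr(alarm_positions):
    """Median number of results between consecutive false rejections.

    Assumption: with fewer than two false rejections the value is imputed.
    """
    if len(alarm_positions) < 2:
        return float(IMPUTED)
    return float(np.median(np.diff(np.sort(alarm_positions))))
```

For example, with n = 3 and an error starting at index 5, an alarm on the window covering results 4 to 6 counts as detection after 2 affected results.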

Optimization of QC procedures

Power function graphs were generated to compare the QC procedures by plotting errors (SE and RE as multiples of CVa) on the x‐axis and MNPed on the y‐axis. To optimize the QC procedure, procedures with MNPfr <1,500, which would lead to a high false rejection rate and increase QC cost, were excluded. If each false rejection is counted as a defective incident, MNPfr = 1,500 corresponds to about 667 defective incidents per million, or 4.75 Sigma. The ability of a QC procedure to detect critical SE and RE received particular attention. The critical system error (SEc) was calculated as follows: $SE_c = (TE_a - \mathrm{bias})/CV_a - 1.65$. In this formula, 1.65 is a z‐value that sets the maximum defect rate at 5% (i.e., when the mean of patient test results has shifted by an amount that causes 5% of individual patient test results to have errors exceeding the total error requirement, the run will be considered unstable). The critical random error (REc) was calculated as follows: $RE_c = (TE_a - \mathrm{bias})/(1.65 \times CV_a)$. As all the results were exported from a stable system, bias was deemed to be zero. An accumulative MNPed (∑MNPed), the sum of MNPeds for errors greater than or equal to the critical error, was calculated to evaluate the overall performance of each QC procedure. The QC procedure with the minimum ∑MNPed was selected as the optimal strategy. All the data simulations and statistics were performed with Microsoft®Office Excel 2019 and its extended functions. The general flowchart for data simulation and QC procedure optimization is shown in Figure 1.
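As a worked check of the critical-error formulas, the short sketch below plugs in the values given in the Methods (TEa = 4%, CVa = TEa/3, bias = 0). The function names are illustrative only.

```python
def critical_systematic_error(tea, cva, bias=0.0, z=1.65):
    """SEc = (TEa - bias) / CVa - 1.65, in multiples of CVa."""
    return (tea - bias) / cva - z


def critical_random_error(tea, cva, bias=0.0, z=1.65):
    """REc = (TEa - bias) / (1.65 * CVa), in multiples of CVa."""
    return (tea - bias) / (z * cva)


TEA = 4.0        # allowable total error for sodium, in percent
CVA = TEA / 3    # analytical CV, in percent

se_c = critical_systematic_error(TEA, CVA)   # about 1.35 CVa
re_c = critical_random_error(TEA, CVA)       # about 1.82 CVa

# One false rejection per 1,500 results, expressed per million results.
defects_per_million = 1e6 / 1500             # about 667
```

These values agree with the critical SE of 1.35 CVa and critical RE of 1.82 CVa reported in the Results.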

FIGURE 1

Flowchart for data simulation and QC procedure optimization. Serum sodium results of patients were exported from the laboratory information system and sorted by time. All the results were divided into 400 virtual days with 1,700 measurements each. The last 200 days served as training dataset to optimize QC procedures and the first 200 days as testing dataset to verify conclusions. Systematic error was simulated by changing mean of patients’ data, and random error was simulated by changing SD. Various QC procedures which consist of algorithms, truncation limits, control limits and block size were assessed with two basic parameters (MNPfr and MNPed) and one advanced parameter (∑MNPed). The ability of a QC procedure to detect critical errors got particular attention. The QC procedure with minimum ∑MNPed and MNPfr≥1,500 was the optimized QC procedure

RESULTS

Data distribution

As listed in Table 2, serum sodium was nearly normally distributed with low skewness (−0.84 for training set and −0.65 for testing set) and had low RNSs (0.021 for training and 0.014 for testing set). The change of mean from training to testing set, which was the difference of means between training and testing set in relation to the mean of the training set, was −0.26%.

TABLE 2

Statistical properties for serum sodium in the training and testing datasets

Datasets	Mean	SD	Median	Max	Min	Skew	Kurtosis	IQR	RNS
Training	141.29	2.27	141	178	110	−0.84	6.36	3	0.021
Testing	140.92	2.25	141	185	108	−0.65	6.36	2	0.014
All	141.10	2.27	141	185	108	−0.74	6.24	3	0.021

Unit: mmol/L.

Abbreviations: IQR, interquartile range; RNS, robust normalized spread; SD, standard deviation.
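The RNS values and the change of mean quoted in the text can be re-derived from the Table 2 entries. The sketch below is only an arithmetic check; the dictionary layout is illustrative.

```python
# Values copied from Table 2 (mmol/L).
table2 = {
    "Training": {"mean": 141.29, "median": 141, "iqr": 3},
    "Testing": {"mean": 140.92, "median": 141, "iqr": 2},
}

# RNS = interquartile range / median.
rns = {name: row["iqr"] / row["median"] for name, row in table2.items()}

# Change of mean from training to testing, relative to the training mean (%).
mean_change_pct = (
    100.0
    * (table2["Testing"]["mean"] - table2["Training"]["mean"])
    / table2["Training"]["mean"]
)
```

Rounded, these give 0.021 and 0.014 for the RNS and −0.26% for the change of mean.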

A3.09 and SC4 were excluded because their false rejection rate was too high (MNPfr <1,500). P3SD, N = 25, with T0 and T1% (both MNPfr = 1137) were also ruled out for the same reason. The remaining rules were then further assessed. The ability to detect SE and RE, quantified with MNPed for QC procedures under various block sizes and truncations, is shown in Tables S1 and S2, respectively. The moving average and moving median were sensitive to SE, and the moving SD tended to detect RE. Unexpectedly, P3SD demonstrated excellent performance for both SE and RE. Figure 2 shows the ability to detect SE, quantified with MNPed, for QC procedures under the same block size and optimized TLs. The procedures were more capable of detecting negative SE than positive SE, which was due to the slight negative skewness of the original sodium distribution. P3SD was clearly superior to the others in SE detection. Overall, A0.1%, M0.1% and P0.1% were better at detecting SE than Amm, Mmm and Pmm, but the difference in QC performance among these three procedures (A0.1%, M0.1% and P0.1%) was small and variable. Figure 3 shows the ability to detect RE with the same block size and optimized TLs. On the whole, the ability to detect RE decreased in the order P3SD, S0.1%, Smm, P0.1% and Pmm.

FIGURE 2

Median number of patients affected until error was detected (MNPed) as a function of induced systematic error magnitude. The MNPed of quality control procedures for systematic error (SE) are shown in Figure 2A–F. The first capital letter is the quality control algorithm and the subscripts denote quality control limits (CLs). A0.1% is the moving average with 0.1% false rejection rate as CLs. Amm is the moving average with CLs based on minimum and maximum control data without extra error. ARCV is the moving average with CLs based on reference change values. M0.1% is the moving median with 0.1% false rejection rate as CLs. Mmm is the moving median with CLs based on minimum and maximum control data without extra error. P0.1% is the moving proportion of normal results with 0.1% false rejection rate as CLs. Pmm is the moving proportion of normal results with CLs based on minimum and maximum control data without extra error. P3SD is the moving proportion of normal results with CLs = meanproportion ± 3 × SDproportion. T0, T1% and T5% were the truncation limits which were set to exclude the outer 0, 1 and 5% of all results, respectively. Procedures that had the same performance for different truncations were marked with dotted lines. MNPed: the median number of patient samples processed from the start of an out‐of‐control error condition until it was detected. CVa (the analytical CV) represents analytical inherent precision. SE: system error. The critical SE was 1.35 CVa. The MNPed for critical SE was marked with red arrows (↑). Figure 2 shows that the procedures were more capable of detecting negative SE than positive SE. P3SD was clearly superior to the others in SE detection. Overall, A0.1%, M0.1% and P0.1% were better at detecting SE than Amm, Mmm and Pmm, but the difference among them was small and variable.

FIGURE 3

Median number of patients affected until error was detected (MNPed) as a function of induced random error magnitude. The MNPed of quality control procedures for random error (RE) are shown in Figure 3A–F. The first capital letter is the quality control algorithm and the subscripts denote quality control limits (CLs). P3SD is the moving proportion of normal results with CLs = meanproportion ± 3 × SDproportion. P0.1% is the moving proportion of normal results with 0.1% false rejection rate as CLs. Pmm is the moving proportion of normal results with CLs based on minimum and maximum control data without extra error. S0.1% is the moving SD with 0.1% false rejection rate as CLs. Smm is the moving SD with CLs based on minimum and maximum control data without extra error. T0, T1% and T5% were the truncation limits which were set to exclude the outer 0, 1 and 5% of all results, respectively. Procedures that had the same performance for different truncations were marked with dotted lines. MNPed: the median number of patient samples processed from the start of an out‐of‐control error condition until it was detected. CVa (the analytical CV) represents analytical inherent precision. RE: random error. The critical RE was 1.82 CVa. The MNPed for critical RE was marked with red arrows (↑). Figure 3 shows that the ability to detect RE decreased in the order P3SD, S0.1%, Smm, P0.1% and Pmm.

The influence of block size on QC performance was somewhat complex (Figure 4). The main trend was that both MNPed and MNPfr decreased with smaller block sizes. A reduction in block size led to quicker error detection, but it also led to a higher rate of false rejection. Typical cases were A0.1%, ARCV, P0.1% and P3SD for SE, and S0.1% and P3SD for RE. As a result, the performance curves of A0.1% for SE and S0.1% for RE descended with decreasing block size in Figure 4. However, there were exceptions for procedures with the minimum and maximum as CLs. Some QC procedures with a small block size, such as Amm (N = 25) and Pmm (N = 25), were not sensitive to errors <2.0 CVa. Nevertheless, their performance improved rapidly as the error increased. Compared with QC procedures with larger block sizes (N ≥ 50), procedures with N = 25 detected a small error (<2.0 CVa) more slowly but a large error (≥2.0 CVa) more quickly. That is why the performance curves of Amm, N = 25 for SE and Smm, N = 25 for RE intersected with those of N = 75 and 125 in Figure 4.

FIGURE 4

Influence of block sizes on QC performance. A0.1% is the moving average with 0.1% false rejection rate as CLs. Amm is the moving average with CLs based on minimum and maximum control data without extra error. S0.1% is the moving SD with 0.1% false rejection rate as CLs. Smm is the moving SD with CLs based on minimum and maximum control data without extra error. MNPed: the median number of patient samples processed from the start of an out‐of‐control error condition until it was detected. CVa (the analytical CV) represents analytical inherent precision. A0.1% and S0.1% were marked with dotted lines. Amm and Smm were marked with solid lines. (A) shows the performance of A0.1% and Amm (without truncation) for system error (SE). (B) shows the performance of S0.1% and Smm (set 5% outliers’ exclusion as truncation limits) for random error (RE). The main trend was that both MNPed and MNPfr decreased with smaller block sizes. As a result, the performance curves of A0.1% and S0.1% descended with decreasing block size in Figure 4. However, there were exceptions for procedures with the minimum and maximum as CLs. Compared with N = 75 and 125, procedures with N = 25 detected a small error (<2.0 CVa) more slowly but a large error (≥2.0 CVa) more quickly. That is why the performance curves of Amm, N = 25 and Smm, N = 25 intersected with those of N = 75 and 125 in Figure 4.

The impact of truncation on QC performance depended on the TLs, the QC algorithm and the type of error (Figure 5). Truncation did not improve the QC performance of the moving average and moving median, which were sensitive to SE, but resulted in a slight increase of MNPed. Figure 2, which lists the optimized truncation limits for various procedures, shows that T0 was the optimal TL for most procedures of these two algorithms. In contrast, proper TLs can significantly improve the QC performance of the moving SD. Smm and S0.1% with T1% and T5% performed much better than without truncation (Figure 5). T1% was slightly superior to T5%. Figure 3, which lists the optimized TLs for each procedure, also shows that T1% was the optimal TL for most moving SD procedures. The effect of truncation on the moving proportion of normal results depended on the TLs. If the TLs were wider than the reference range, truncation had no impact on QC performance. If the TLs fell within the reference range, its impact was severe. For example, the TLs of T1% were 134.49 ~ 148.07 mmol/L, and the reference range was 137 ~ 147 mmol/L. There was no difference in MNPeds between the moving proportion of normal results with T0 and with T1% (Figures 2 and 3). Conversely, the upper truncation limit of T5% was 145.72 mmol/L, which was lower than the upper limit of the reference range (147 mmol/L). Compared with no truncation, the ability of Pmm, P0.1% and P3SD with T5% to detect RE and positive SE decreased sharply or was lost entirely, as for P3SD, N = 25, T5% in Figure 2A.
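The interaction between Winsorization and the moving proportion of normal results can be illustrated with a toy block of results. The upper truncation limits and the reference range are taken from the text; the sample values and the lower T5% limit are hypothetical, chosen only to show the mechanism.

```python
import numpy as np

REF_LOW, REF_HIGH = 137.0, 147.0  # reference range from the text (mmol/L)


def winsorize(results, lower, upper):
    """Replace values beyond a truncation limit with that limit."""
    return np.clip(np.asarray(results, dtype=float), lower, upper)


def proportion_normal(results):
    """Fraction of results inside the reference range."""
    x = np.asarray(results, dtype=float)
    return float(np.mean((x >= REF_LOW) & (x <= REF_HIGH)))


# Hypothetical block of results after a positive shift.
shifted = np.array([146.0, 148.0, 149.0, 150.0])

t1 = winsorize(shifted, 134.49, 148.07)  # T1% limits from the text
t5 = winsorize(shifted, 136.0, 145.72)   # upper limit from the text; lower limit hypothetical

p_raw = proportion_normal(shifted)  # 0.25: three of four results are abnormal
p_t1 = proportion_normal(t1)        # 0.25: limits lie outside the reference range
p_t5 = proportion_normal(t5)        # 1.0: high results are pulled back inside the range
```

Because T5% clamps high results to 145.72 mmol/L, below the 147 mmol/L reference limit, every shifted result is counted as normal and the positive shift becomes invisible to the proportion.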

FIGURE 5

Influence of truncation limits on QC performance. S0.1% is the moving SD with 0.1% false rejection rate as CLs. Smm is the moving SD with CLs based on minimum and maximum control data without extra error. MNPed: the median number of patient samples processed from the start of an out‐of‐control error condition until it was detected. CVa (the analytical CV) represents analytical inherent precision. T0, T1% and T5% were set to exclude the outer 0, 1 and 5% of all results, respectively. The procedures without truncation (T0) were marked with solid lines, and those with truncation (T1% and T5%) were marked with dotted lines. Proper truncation limits can significantly improve the QC performance of the moving SD. Smm and S0.1% with T1% and T5% performed much better than without truncation. T1% was slightly superior to T5%.

Optimized QC procedures

The critical system error (SEc) was 1.35 CVa for serum sodium. P3SD,N=50,T0&1% detected SEc the fastest (MNPed = 258 tests for positive SEc and 33.5 tests for negative SEc). As T0 and T1% of P3SD,N = 50 had the same performance, T0 which was more convenient was selected. The selection of T0, T1% and T5% for the other procedures followed the same way. The best 10 QC procedures for SE based on ∑MNPed were as follows: P3SD,N = 50,T0,∑MNPed = 716; P3SD,N = 75,T0,∑MNPed = 924; P3SD,N = 100,T0, ∑MNPed = 1,315.5; P3SD,N = 125,T0,∑MNPed = 1,519; P3SD,N = 150,T0,∑MNPed = 1,586; A0.1%,N=25,T0,∑MNPed = 2,412; A0.1%,N = 25,T1%,∑MNPed = 2,460.5; M0.1%,N=50,T0, ∑MNPed = 2,707.5; A0.1%,N = 25,T5%,∑MNPed = 2,767; A0.1%,N = 50,T0,∑MNPed = 3,861. Similarly, the critical random error (REc) was 1.82CVa. S0.1%,N = 25,T1% detected REc the fastest (MNPed = 24 tests). The best 10 QC procedures for RE were as follows: S0.1%,N = 25,T1%,∑MNPed = 107.5; P3SD,N = 50,T0,∑MNPed = 114.5; S0.1%,N = 25,T5%, ∑MNPed = 116.5; P3SD,N = 25,T5%,∑MNPed = 132; P3SD,N = 75,T0,∑MNPed = 152.5; Smm,N = 25,T1%,∑MNPed = 169; S0.1%,N = 50,T1%,∑MNPed = 171; S0.1%,N=50,T5%, ∑MNPed = 191; P3SD,N = 100,T0,∑MNPed = 191; P3SD,N = 50,T5%,∑MNPed = 194.5; P3SD, N = 125,T0,∑MNPed = 229.5. In all, P3SD,N = 50,T0 and S0.1%,N = 25,T1% were the optimized QC procedures for serum sodium, and their detailed parameters and performance are listed in Table 3.
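The ranking by ∑MNPed can be illustrated with the random-error MNPeds listed in Table 5. The sketch below sums the first row of each procedure pair, which matches the training-set MNPed of 24 tests quoted for REc; because the tabulated values appear to be rounded, the sums differ slightly from the ∑MNPeds reported in the text (for example 108 versus 107.5).

```python
# MNPed for RE = 1.82, 2, 2.5, 3, 3.5, 4, 4.5 and 5 CVa, copied from Table 5.
mnped_by_procedure = {
    "S_0.1%,N=25,T_1%": [24, 20, 15, 12, 10, 10, 9, 8],
    "P_3SD,N=50,T_0": [27, 23, 16, 12, 10, 10, 9, 8],
    "S_0.1%,N=25,T_5%": [22, 19, 16, 14, 12, 12, 11, 11],
}


def rank_by_sum(table):
    """Return (sum of MNPeds, procedure) pairs, smallest sum first."""
    return sorted((sum(values), name) for name, values in table.items())


ranking = rank_by_sum(mnped_by_procedure)
best_sum, best_procedure = ranking[0]
```

The resulting order is the same as in the text: S0.1%, N = 25, T1% first, then P3SD, N = 50, T0, then S0.1%, N = 25, T5%.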

TABLE 3

Parameters of the optimized procedures for serum sodium

Procedures	P_3SD,N = 50,T₀	S_0.1%,N = 25,T_1%
Main function	Monitor system error	Monitor random error
Truncation limits	None	134.49 ~ 148.07 mmol/L
Algorithms	Moving proportion of normal results	Moving standard deviation
Block size	50 tests	25 tests
Control limits	88.92% ~ 100%	0.9601 ~ 3.4546 mmol/L
MNPfr	1,650 tests	1,650 tests
MNPed for critical error	33.5 tests for negative SEc; 258 tests for positive SEc	24 tests for REc
∑MNPed	716 tests	107.5 tests

MNPfr: the median number of patient samples between two false rejections. MNPed: the median number of patient samples processed from the start of an out‐of‐control error condition until it was detected. ∑MNPed: the sum of MNPeds for error greater than or equal to critical error. SEc: critical system error. REc: critical random error.


The stability of the QC performance

To evaluate the stability of the PBRTQC performance over time, MNPeds from the training and the test datasets with the same method and error were compared. Table 4 shows the SE detection performance of QC procedures in the training and test dataset. MNPeds were basically close, except for MNPeds for SE = 1.35CVa. Table 5 shows that these candidate QC procedures had highly consistent RE detection performance in the training and test dataset.

TABLE 4

Difference of system error detection between the training and testing datasets

Procedure	Dataset	−4	−3.5	−3	−2.5	−2	−1.35	1.35	2	2.5	3	3.5	4
P_3SD,N = 50	Training	6	6	7	9	15	34	258	46	20	11	7	6
P_3SD,N = 50	Testing	5	6	6	8	13.5	27	598	64	22	11	7	6
P_3SD,N = 75	Training	7	7	9	12	20	42	334	64	24	14	9	8
P_3SD,N = 75	Testing	7	7	8	11	17	34	791	77	30	15	10	8
P_3SD,N = 100	Training	9	9	11	15	24	54	496	80	30	17	11	10
P_3SD,N = 100	Testing	8	9	10	13	20	43	975	96	37	18	11	9
P_3SD,N = 125	Training	10	11	13	18	29	64	566	96	37	21	14	12
P_3SD,N = 125	Testing	10	11	12	16	25	51	1189	114	42	22	14	11
P_3SD,N = 150	Training	12	13	15	21	34	72	571	112	42	25	16	13
P_3SD,N = 150	Testing	11	12	14	18	30	59	1650	136	49	25	17	13
A_0.1%,N = 50	Training	26	29	34	41	50	731	527	54	41	35	30	26
A_0.1%,N = 50	Testing	25	28	32	39	48	369	1650	84	44	36	31	27
M_0.1%,N = 50	Training	24	25	27	32	43	436	749	71	36	29	27	26
M_0.1%,N = 50	Testing	24	24	26	30	41	149	1650	121	39	31	28	26
A_mm,N = 50	Training	31	35	41	48	436	1650	1650	273	48	40	34	30
A_mm,N = 50	Testing	30	34	39	46	234	1650	1650	640	50	42	35	31
M_mm,N = 50	Training	26	27	32	44	436	1650	1650	762	74	37	30	28
M_mm,N = 50	Testing	26	27	31	41	151	1650	1650	1650	121	39	31	28

A_0.1% is the moving average with a 0.1% false rejection rate as CLs. A_mm is the moving average with CLs based on the minimum and maximum of control data without extra error. M_0.1% is the moving median with a 0.1% false rejection rate as CLs. M_mm is the moving median with CLs based on the minimum and maximum of control data without extra error. P_3SD is the moving proportion of normal results with CLs = mean_proportion ± 3 × SD_proportion. All the procedures in the table had no truncation and had the same MNPfr (1,650 tests). The MNPeds for various system errors (from 1.35 CVa to 4.0 CVa, in both directions) are listed in the table. For each procedure, the first row (shaded gray in the original table) is from the training dataset and the second row is from the testing dataset. MNPeds were basically close, except for MNPeds for SE = 1.35 CVa.
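The training-versus-testing comparison described above can be sketched numerically. The following is a minimal illustration that uses the P_3SD, N = 50 values transcribed from Table 4; it assumes that the first row of each pair is the training dataset. The helper name `compare` is hypothetical, and the totals at the end sum every listed error size, which may not match exactly how the authors computed ∑MNPed.

```python
# Values transcribed from Table 4 for P_3SD, N = 50.
# Assumption: the first row of each pair is training, the second testing.
se_levels = [-4, -3.5, -3, -2.5, -2, -1.35, 1.35, 2, 2.5, 3, 3.5, 4]
train = [6, 6, 7, 9, 15, 34, 258, 46, 20, 11, 7, 6]
test = [5, 6, 6, 8, 13.5, 27, 598, 64, 22, 11, 7, 6]


def compare(train_mnped, test_mnped):
    """Return (testing - training, testing / training) for each error size."""
    return [(b - a, b / a) for a, b in zip(train_mnped, test_mnped)]


results = compare(train, test)
for se, (diff, ratio) in zip(se_levels, results):
    print(f"SE = {se:+.2f} CVa: difference = {diff:+.1f} tests, ratio = {ratio:.2f}")

# Error size with the largest relative disagreement between the datasets.
worst_index = max(range(len(results)), key=lambda i: results[i][1])
print("Largest ratio at SE =", se_levels[worst_index], "CVa")

# Illustrative totals over all listed error sizes (table values are rounded).
sum_train = sum(train)
sum_test = sum(test)
print("Sum of MNPeds:", sum_train, "(training) vs", sum_test, "(testing)")
```

With these values, the only large relative disagreement is at SE = +1.35 CVa, which is consistent with the statement in the table note.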

TABLE 5

Difference of random error detection between the training and testing datasets

Procedure	Dataset	MNPfr	1.82	2	2.5	3	3.5	4	4.5	5
S_0.1%,N = 25,T_5%	Training	1650	22	19	16	14	12	12	11	11
S_0.1%,N = 25,T_5%	Testing	1650	22	19	17	14	13	12	11	11
S_0.1%,N = 25,T_1%	Training	1650	24	20	15	12	10	10	9	8
S_0.1%,N = 25,T_1%	Testing	1650	25	21	16	12	11	9	9	8
P_3SD,N = 50,T_1%	Training	1650	27	23	16	12	10	10	9	8
P_3SD,N = 50,T_1%	Testing	1650	27	22	16	13	10	9	9	8
P_3SD,N = 50,T_0	Training	1650	27	23	16	12	10	10	9	8
P_3SD,N = 50,T_0	Testing	1650	27	22	16	13	10	9	9	8
P_3SD,N = 75,T_0	Training	1650	36	31	21	17	14	13	11	10
P_3SD,N = 75,T_0	Testing	1650	36	29	21	16	13	12	11	10
S_0.1%,N = 50,T_1%	Training	1650	38	33	24	19	16	15	14	13
S_0.1%,N = 50,T_1%	Testing	1650	39	33	24	19	16	15	14	13
P_3SD,N = 100,T_0	Training	1650	46	39	26	21	17	16	14	13
P_3SD,N = 100,T_0	Testing	1650	46	37	25	20	16	15	14	12
P_3SD,N = 125,T_0	Training	1650	56	47	32	25	20	18	16	16
P_3SD,N = 125,T_0	Testing	1650	52	45	30	23	19	18	16	15
S_mm,N = 25,T_1%	Training	1650	50	32	21	17	13	13	12	11
S_mm,N = 25,T_1%	Testing	1650	53	33	21	17	14	13	12	11
S_mm,N = 50,T_1%	Training	1650	55	44	32	25	22	19	18	17
S_mm,N = 50,T_1%	Testing	1650	55	43	33	25	22	20	18	17
P_3SD,N = 150,T_0	Training	1650	64	55	36	28	24	21	19	18
P_3SD,N = 150,T_0	Testing	1650	61	53	36	27	23	21	19	18
S_mm,N = 25,T_5%	Training	1650	140	76	29	23	20	19	18	17
S_mm,N = 25,T_5%	Testing	1650	133	69	30	23	21	19	18	17

P_3SD is the moving proportion of normal results with CLs = mean_proportion ± 3 × SD_proportion. S_0.1% is the moving SD with a 0.1% false rejection rate as CLs. S_mm is the moving SD with CLs based on the minimum and maximum of control data without extra error. T_0, T_1% and T_5% were set to exclude the outer 0, 1 and 5% of all results, respectively. All the procedures in the table had the same MNPfr (1,650 tests). The MNPeds for various random errors (from 1.82 CVa to 5.0 CVa) are listed in the table. For each procedure, the first row (shaded gray in the original table) is from the training dataset and the second row is from the testing dataset. These candidate QC procedures had highly consistent random error detection performance in the training and test datasets.


DISCUSSION

The error detection characteristics of the various algorithms were analyzed and compared. The moving average and moving median were sensitive to SE, and the moving SD tended to detect RE. P3SD demonstrated excellent performance for both SE and RE. Overall, A0.1%, M0.1% and P0.1% detected SE better than Amm, Mmm and Pmm, but the differences among them were not obvious and varied. In general, CLs calculated with a 0.1% false alarm rate performed better than CLs that set the false alarm rate to zero (minimum and maximum as CLs). The ability to detect RE decreased in the order P3SD, S0.1%, Smm, P0.1% and Pmm. For serum sodium, P3SD,N = 50,T0 and S0.1%,N = 25,T1% were the optimized QC procedures for SE and RE, respectively.
A3.09 and SC4 were excluded from our study because of their high false rejection rates. In previous reports, they showed acceptable, even satisfactory, performance. That is because those reports assessed QC performance with simulated patient data instead of real patient data. Simulated patient data usually follow a perfect Gaussian distribution, whereas real measurements probably do not.
Both MNPed and MNPfr increased with larger block sizes, except for procedures with minimum and maximum as CLs. This is understandable. Suppose a sample with a sodium of 110 mmol/L is incorporated into the QC calculation. Owing to this outlier, the average of 25 tests decreases noticeably, whereas the average of 150 tests changes only slightly. Only after more such outliers are incorporated does the QC value for N = 150 decrease appreciably. Thus, the main trend was that both MNPed and MNPfr increased with larger block sizes. Additionally, the smaller the block size, the larger the fluctuations in the QC data. For procedures with small block sizes (such as N = 25), the CLs derived from minimum and maximum values were therefore wide and insensitive to small errors.
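The block-size argument above can be checked with a few lines of arithmetic. The sketch below assumes a hypothetical baseline of 140 mmol/L for every other result in the block (the baseline is not taken from the study) and replaces one result with the 110 mmol/L outlier mentioned in the text. The function name `block_mean_shift` is illustrative only.

```python
def block_mean_shift(n, baseline=140.0, outlier=110.0):
    """Shift in the mean of n results when one baseline value becomes an outlier.

    baseline is a hypothetical in-control sodium value (mmol/L);
    outlier is the 110 mmol/L example from the text.
    """
    values = [baseline] * (n - 1) + [outlier]
    return sum(values) / n - baseline


for n in (25, 150):
    print(f"N = {n}: block mean shifts by {block_mean_shift(n):+.2f} mmol/L")
```

Under these assumptions, a single outlier moves the N = 25 mean six times as far as the N = 150 mean, which is why small blocks react faster but also fluctuate more.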
The impact of truncation on QC performance depended on the TLs, the QC algorithm and the type of error. A significant improvement in QC performance due to truncation was found only for the moving SD, so truncation limits are recommended only for the moving SD.
There are well‐known significant differences between traditional QC and patient‐based QC, and the limitations of traditional commercial QC have been increasingly recognized. The 4th edition of the Clinical and Laboratory Standards Institute (CLSI) C24 document recommends that laboratories introduce additional QC performance metrics that are more directly related to patient risk. In other words, the traditional performance metrics (probabilities of error detection and false rejection) are not suitable for risk management. More importantly, the C24 document proposes that the frequency of QC events and their relationship to patient risk should be the focus of QC practices. PBRTQC may be an effective way to address these problems because it monitors performance in real time and focuses directly on patients' results.
Additionally, the TEa based on desirable biological variation is usually demanding for serum calcium, chloride, sodium and albumin. The biological variation of these tests is small, so their TEa is strict. As a result, the sigma metrics are so low that even multiple rules cannot achieve satisfactory performance. In contrast, the smaller the biological variation, the more powerful the error detection of PBRTQC. Tests with small biological variation are therefore both the most suitable for PBRTQC and the most in need of it to make up for the inability of traditional QC to detect error. That is also why we selected sodium for our research. Nevertheless, PBRTQC is more complex and less predictable than traditional QC.
Overall, these differences between traditional QC and PBRTQC offer an opportunity to strengthen QC plans by combining the two, rather than using one in place of the other. Specifically, Figure 6 shows the proposed flowchart for PBRTQC of serum sodium in routine clinical chemistry.
The laboratory should first set the parameters of the QC procedures. The traditional individualized QC is designed based on the sigma metrics of analytical performance. To set the PBRTQC parameters, enough patient results should be collected from a stable analytical system, over at least one month. Outliers are then excluded according to the TLs, if needed, and suitable CLs are defined according to the optimized procedures.
Once the parameters are set, the whole protocol is performed in routine work. Traditional QC, which is usually performed at the initial phase of analysis, serves as a confirmatory tool. If the traditional QC is in control, the analytical system starts to measure patients' samples. As measurement results are produced, PBRTQC is initiated. PBRTQC serves as an alarm tool for monitoring performance in real time and, in part, also decides when traditional QC should be performed again. The combination of P3SD,N = 50,T0 and S0.1%,N = 25,T1% is recommended as the optimized PBRTQC for serum sodium; the detailed parameters are listed in Table 3.
In practice, each new patient result yields a new QC value, so a large block size does not delay the startup of PBRTQC. Taking N = 150 as an example, the first result of today can be combined with the last 149 results of yesterday to calculate a new QC value. If PBRTQC is out of control, further measures are needed to confirm the analytical status, such as additional commercial QC or retesting of retained samples.
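The workflow above (collect in-control results, optionally truncate, derive CLs, then update the QC value with every new patient result) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The reference interval of 135 to 145 mmol/L, the simulated Gaussian results, the 3 mmol/L shift and all function names are assumptions introduced for the example, and only the mean ± 3 SD control limits of P3SD are shown (the 0.1% false rejection limits used for the moving SD are not derived here).

```python
import random
from collections import deque
from statistics import mean, stdev


def moving_proportion(results, n, low, high):
    """Proportion of results within [low, high] for each full window of n results."""
    window = deque(maxlen=n)
    out = []
    for x in results:
        window.append(low <= x <= high)
        if len(window) == n:
            out.append(sum(window) / n)
    return out


def moving_sd(results, n, trunc_low=float("-inf"), trunc_high=float("inf")):
    """Sample SD of each full window; results outside the truncation limits are skipped."""
    window = deque(maxlen=n)
    out = []
    for x in results:
        if not (trunc_low <= x <= trunc_high):
            continue  # truncated results never enter the window
        window.append(x)
        if len(window) == n:
            out.append(stdev(list(window)))
    return out


def control_limits_3sd(qc_values):
    """Control limits as mean +/- 3 SD of in-control QC values (the P3SD rule)."""
    m, s = mean(qc_values), stdev(qc_values)
    return m - 3 * s, m + 3 * s


def first_alarm(qc_values, lcl, ucl):
    """Index of the first QC value outside the control limits, or None."""
    for i, v in enumerate(qc_values):
        if v < lcl or v > ucl:
            return i
    return None


# Hypothetical data: in-control sodium results, then a +3 mmol/L step shift.
rng = random.Random(0)
n = 50
training = [rng.gauss(140.0, 2.5) for _ in range(2000)]
lcl, ucl = control_limits_3sd(moving_proportion(training, n, 135.0, 145.0))

shifted = [rng.gauss(140.0, 2.5) + 3.0 for _ in range(500)]
# Carry over the last n - 1 earlier results so the first new result already
# produces a QC value, as in the N = 150 example in the text.
stream = training[-(n - 1):] + shifted
alarm = first_alarm(moving_proportion(stream, n, 135.0, 145.0), lcl, ucl)
print("control limits:", round(lcl, 3), round(ucl, 3))
print("results processed before alarm:", None if alarm is None else alarm + 1)
```

Carrying over the last N − 1 results from the previous period is what lets the first result of the day produce a QC value immediately, whatever the block size.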

FIGURE 6

Proposed flowchart for patient‐based real‐time quality control (PBRTQC) of serum sodium. P3SD,N = 50,T0 is the moving proportion of normal results with CLs = mean_proportion ± 3 × SD_proportion, without truncation. S0.1%,N = 25,T1% is the moving SD with a 0.1% false rejection rate as CLs and 1% outlier exclusion as truncation limits. SE: system error. RE: random error. CLs: control limits. The traditional individualized QC was designed based on sigma metrics of analytical performance. Enough patient results were collected from a stable analytical system to set the PBRTQC parameters. Outliers were then excluded according to the truncation limits, if needed. After that, proper CLs were defined according to the optimized procedures. Finally, the combination of traditional QC and PBRTQC would be performed in the laboratory. Traditional QC, which was performed at the initial phase of analysis, was applied as a confirmatory tool. If the traditional QC was in control, the analytical system started to measure patients' samples. As measurement results were produced, PBRTQC was initiated. PBRTQC was considered an alarm tool for monitoring performance in real time. If PBRTQC was out of control, further measures were needed to confirm the analytical status, such as additional commercial QC or retesting of retained samples.

First, we investigated and compared the error detection characteristics of various algorithms (i.e., moving average, moving median, moving SD and moving proportion of normal results) together with a variety of methods for defining CLs. The QC procedures investigated in this paper are common and easy for routine laboratories to implement. In addition, both SE and RE were investigated in this study. Second, the optimized QC procedure was based on the critical error instead of a percentage of TEa. The critical error, which is determined by the TEa and the analytical performance, is closely related to the sigma metric of the analytical system (SEc = sigma metric − 1.65).
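The relationship SEc = sigma metric − 1.65 stated above can be illustrated with a short calculation. The sketch assumes the conventional sigma metric definition, (TEa − |bias|) / CVa, which is not spelled out in this section; the TEa, bias and CVa values are hypothetical and are not the study's sodium data. By the same relationship, the smallest system error in Table 4 (1.35 CVa) would correspond to a sigma metric of about 3.0.

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Conventional sigma metric: (TEa - |bias|) / CVa, all in percent (assumed definition)."""
    return (tea_pct - abs(bias_pct)) / cv_pct


def critical_system_error(sigma):
    """Critical system error in multiples of CVa, as given in the text: sigma - 1.65."""
    return sigma - 1.65


# Hypothetical inputs chosen for illustration only.
sigma = sigma_metric(tea_pct=4.0, bias_pct=0.4, cv_pct=1.2)
print(f"sigma metric = {sigma:.2f}")
print(f"SEc = {critical_system_error(sigma):.2f} CVa")
```

A lower sigma metric gives a smaller critical error, so the QC procedure has to detect smaller shifts, which is the situation described for analytes with a strict TEa.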
The critical error was initially applied in the design of traditional QC procedures, and it deserves the same attention in PBRTQC. Thus, it is scientifically reasonable to optimize QC procedures according to their capacity to detect the critical error. Third, ∑MNPed was used as an advanced parameter to evaluate overall QC performance and to select the optimized QC procedures in this article. MNPed replaced ANPed as the basic parameter of QC performance. ANPed is the average number of patient results affected before error detection, whereas MNPed is the median number. Because the number of results needed for error detection is not normally distributed, MNPed is more suitable. A QC procedure's ability to detect the critical error is of primary concern, but ideally any error greater than the critical error should be detected too. MNPeds for errors greater than the critical error should therefore also be taken into account, which makes ∑MNPed a more powerful measure than MNPed alone.
This study has several specific limitations. First, only serum sodium was investigated, and more analytes with significantly different characteristics should be investigated in the future. Nevertheless, both our research and previous research have demonstrated that serum sodium is probably the most suitable chemistry test for PBRTQC because of its small biological variation. PBRTQC is not suitable for every test, particularly tests with low production numbers (e.g., iron), tests with an extreme variation in results (e.g., C‐reactive protein and urea) or a combination of both (i.e., lipase and amylase). In practice, it seems more feasible for PBRTQC to start with several typical tests and then be extended to most tests. Second, the errors were introduced using a step‐shift strategy rather than a gradual degradation in the data simulation. In fact, a gradually degrading error may be closer to reality and more difficult to detect.
In conclusion, the combination of P3SD,N = 50,T0 and S0.1%,N = 25,T1%, which were the quickest to detect any type of critical error, is recommended as the optimized QC procedure for serum sodium.

CONFLICT OF INTEREST

The authors declare that there is no conflict of interest or financial disclosure related to this publication.

AUTHOR CONTRIBUTIONS

Yuanyuan Li took part in conceptualization, formal analysis, funding acquisition, writing the original draft, and writing, reviewing and editing. Qian Yu carried out investigation and data curation. Xiaoyan Zhang was involved in methodology and data curation. Xiaoling Chen contributed to writing, reviewing and editing.
