Claude Lambert, Gulderen Yanikkaya Demirel, Thomas Keller, Frank Preijers, Katherina Psarra, Matthias Schiemann, Mustafa Özçürümez, Ulrich Sack.
Abstract
Many anticancer therapies, such as antibody-based therapies, cellular therapeutics (e.g., genetically modified cells, regulators of cytokine signaling, and signal transduction), and other biologically tailored interventions, strongly influence the immune system and require tools for research, diagnosis, and monitoring. In flow cytometry, in vitro diagnostic (IVD) test kits that have been compiled and validated by the manufacturer are not available for all requirements. Laboratories therefore usually depend on modifying commercially available assays or, most often, on developing their own assays to meet clinical needs. However, both variants must then undergo full validation to fulfill the IVD regulatory requirements. Flow cytometric immunophenotyping is a multiparametric analysis in which some parameters have to be adjusted repeatedly; this must be considered when developing specific antibody panels. Careful adjustment of general rules is required to meet legal and regulatory requirements in the analysis of these assays. Here, we describe the relevant regulatory framework for flow cytometry-based assays and describe methods for introducing new antibody combinations into routine work, including the development of performance specifications, validation, and statistical methodology for the design and analysis of the experiments. The aim is to increase reliability, efficiency, and auditability after the introduction of in-house-developed flow cytometry assays.
Keywords: accreditation; flow cytometry; laboratory diagnostics; procedures; quality control; validation
Year: 2020 PMID: 33042129 PMCID: PMC7528430 DOI: 10.3389/fimmu.2020.02169
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Total number of cells to collect for the detection of rare events.
| Frequency (1 event in N cells) | Frequency (%) | Total cells for CV 30% | Total cells for CV 10% | Total cells for CV 5% | Total cells for CV 3% |
|---|---|---|---|---|---|
| 20 | 5 | 222 | 2,000 | 8,000 | 22,222 |
| 50 | 2 | 556 | 5,000 | 20,000 | 55,556 |
| 100 | 1 | 1,111 | 10,000 | 40,000 | 111,111 |
| 1,000 | 0.1 | 11,111 | 100,000 | 400,000 | 1,111,111 |
| 10,000 | 0.01 | 111,111 | 1,000,000 | 4,000,000 | 11,111,111 |
| 100,000 | 0.001 | 1,111,111 | 10,000,000 | 40,000,000 | 111,111,111 |
| 1,000,000 | 0.0001 | 11,111,111 | 100,000,000 | 400,000,000 | 1,111,111,111 |
For very rare cell populations, the number of cells to be analyzed increases substantially.
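The table values follow from Poisson counting statistics: if r rare events are collected, their CV is 1/√r, so r = 1/CV² events are needed and the total number of cells is r multiplied by N. The minimal sketch below assumes that Poisson model and also applies the LOD/LOQ formula given for analytical sensitivity, (MRD cluster events / total cells acquired) × 100%. The function names and the 20-event LOD cluster in the usage lines are illustrative choices (20 lies within the 10–50 events usually considered adequate), not prescribed values.

```python
def total_cells_for_cv(one_in_n: float, target_cv: float) -> float:
    """Cells to acquire so that the rare-event count reaches a target CV.

    Assumes Poisson-distributed event counts, so CV = 1/sqrt(r),
    where r is the number of rare events collected.
    """
    events_needed = (1.0 / target_cv) ** 2  # r = 1 / CV^2
    return events_needed * one_in_n         # total cells = r * N


def detection_limit_percent(cluster_events: int, total_cells: int) -> float:
    """LOD or LOQ in %: (MRD cluster events / total cells acquired) x 100."""
    return cluster_events / total_cells * 100.0


# Reproduce one table row: a population occurring at 1 in 1,000 cells.
for cv in (0.30, 0.10, 0.05, 0.03):
    print(f"CV {cv:.0%}: {total_cells_for_cv(1_000, cv):,.0f} cells")

# Hypothetical acquisition of 5,000,000 cells.
print(detection_limit_percent(20, 5_000_000))  # LOD with an assumed 20-event cluster
print(detection_limit_percent(50, 5_000_000))  # LOQ with the 50-event minimum
```

Because the required number of events is fixed by the target CV, every tenfold drop in frequency multiplies the number of cells to acquire by ten, which is the pattern visible down each column of the table.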
Clinical performance characteristics defined by the EU IVDR, which manufacturers must state to demonstrate “fitness for purpose” and which must be maintained throughout the lifetime of an IVD.
| Characteristic | Definition | Comment | Application in flow cytometry |
|---|---|---|---|
| Diagnostic sensitivity | Test positivity in disease (true positive fraction): the ability of a test to correctly identify disease at a particular decision threshold ( | “Diagnostic sensitivity” is used in Europe and “clinical sensitivity” is used in the United States ( | Clinical performance assessment requires sufficient analytical evaluation. The initial analytical performance assessment must include “abnormal” samples, which must be distinguishable from normal or negative samples. Well-defined clinical conditions that specify positivity are crucial for any diagnostic performance study. |
| Diagnostic specificity | Test negativity in healthy subjects (true negative fraction): the ability of a test to identify the absence of disease at a particular decision threshold ( | The following question is addressed: To what degree does the test reflect the true disease state? The specificity (spec) is the fraction of patients correctly identified by the test as not having the disease (true test negatives) among all patients without the disease (as defined by an independent reference standard). | As for sensitivity, diagnostic specificity assessment relies on sufficient initial analytical performance studies. Clinical studies, retrospective evaluation, and thorough plausibility checks are proposed; these need to be planned and documented using the form sheets provided and the defined assessment strategies. |
| Positive predictive value | The percentage of positive test results that are true positives when the test is applied to a population containing both healthy and diseased subjects ( | The following question is addressed: How likely is the disease given the test result? The positive predictive value (PPV) describes the perspective of a physician or a patient in view of a positive test result: it is the probability that the patient has the disease (as defined by an independent reference standard) given a positive test result (post-test probability). | Immunophenotyping of certain diseases with specific markers provides information on the positive predictive value, e.g., CD200 for the diagnosis of chronic lymphocytic leukemia (CLL), which is specific except for nodal mantle cell lymphoma (MCL) ( |
| Negative predictive value | The percentage of negative test results that are true negatives when the test is applied to a population containing both healthy and diseased subjects. | The following question is addressed: How likely is the absence of disease given the test result? The negative predictive value (NPV) describes the perspective of a physician or a patient in view of a negative test result: it is the probability that the patient does not have the disease (as defined by an independent reference standard) given a negative test result (post-test probability). | The presence or absence of an antigen provides information on the negative predictive value (NPV). A good example is the 100% NPV (prevalence = 4%, PPV = 5.4%) of neutrophil CD64 expression for excluding sepsis, cited by ( |
| Likelihood ratio | “Likelihood ratio” means the likelihood of a given result arising in an individual with the target clinical condition or physiological state compared to the likelihood of the same result arising in an individual without that clinical condition or physiological state ( | DLR+: The following question is addressed: By how much does the test change knowledge of the disease status? | The presence or absence of a single marker can affect the likelihood ratio of flow cytometry results, e.g., CD49d for CLL prognosis. CD49d is an unfavorable prognostic marker; comparison of the likelihood ratio along with other performance measures indicated that omitting CD49d significantly reduces the prognostic power of the prediction models ( |
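All clinical performance measures in this table derive from a 2 × 2 cross-tabulation of test results against an independent reference standard. The sketch below computes them from such counts; the class and function names and the counts in the usage lines are hypothetical. It also applies Bayes' theorem to show why PPV and NPV depend on prevalence and not only on the test, which explains how a test can have a 100% NPV alongside a PPV of only a few percent at low prevalence. The negative likelihood ratio (DLR−) is included as the standard counterpart of DLR+.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from comparing a test with an independent reference standard."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def lr_positive(self) -> float:
        # DLR+ = sensitivity / (1 - specificity)
        return self.sensitivity() / (1.0 - self.specificity())

    def lr_negative(self) -> float:
        # DLR- = (1 - sensitivity) / specificity
        return (1.0 - self.sensitivity()) / self.specificity()


def ppv_from_prevalence(sens: float, spec: float, prevalence: float) -> float:
    """Bayes' theorem: the post-test probability depends on prevalence."""
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical counts, for illustration only.
table = TwoByTwo(tp=45, fp=10, fn=5, tn=140)
print(f"sens={table.sensitivity():.2f} spec={table.specificity():.3f}")
print(f"PPV={table.ppv():.3f} NPV={table.npv():.3f} LR+={table.lr_positive():.1f}")
```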
Analytical performance characteristics defined by the EU IVDR, which manufacturers must state to demonstrate “fitness for purpose” and which must be maintained throughout the lifetime of an IVD.
| Characteristic | Definition | Comment | Application in flow cytometry |
|---|---|---|---|
| Analytical sensitivity | Quotient of the change in an indication of a measuring system and the corresponding change in a value of a quantity being measured (slope of an empirical calibration curve; indirect reference measurements). | There are several definitions of “analytical sensitivity” with different meanings. Within this document we use the term “analytical sensitivity” to describe any performance evaluation in terms of LoB, LoD (see below), and/or LoQ (see below), as in the IMDRF framework. Another general term, which is used by CLSI ( | Sensitivity refers to the precision and accuracy of rare-event and dim-antigen measurements. It is important for measurable/minimal residual disease (MRD) analysis in leukemia, lymphoma, and multiple myeloma samples. For these samples, reaching a high level of sensitivity requires a minimum number of acquired cells. The lower limit of detection (LOD) is defined by the lowest number of events counted; usually 10–50 events are enough for adequate calculations. At least 50 events are necessary for the lower limit of quantitation (LOQ). LOD and LOQ can be obtained with the following formula: LOD or LOQ = (MRD cluster events/total cells acquired) × 100% ( |
| Analytical specificity | Note: analytical specificity resembles the concept named selectivity. Selectivity gives an indication of how strongly the result is affected by other components in the sample ( | Specificity is how well a flow cytometry test identifies the specific cell population and/or the antigen evaluated. This includes all stages of cytometry analysis, from sample collection to release of the patient report. Sample type, antibody selection, panel design, analysis, and standardized interpretation of results are important for analytical specificity ( | |
| Trueness (bias) | Closeness of agreement between the average of an infinite number of replicate measured quantity values and a reference quantity value ( | Measurement trueness is inversely related to systematic measurement error. The estimate of the systematic error is the bias, measured as the difference between an average of quantity values and a reference quantity value used as a measure of the “true quantity.” | Not required or not possible to establish in the majority of immuno-oncological applications because there is no gold standard. Therefore, most EQA schemes use consensus values. |
| Precision | Closeness of agreement between indications or measured quantity values obtained by replicate measurements on the same or similar objects under specified conditions. | Comment: Measurement precision is usually expressed numerically by measures of imprecision, such as standard deviation, variance, or coefficient of variation under the specified conditions of measurement. Precision is inversely related to the random error of a measurement, which has several sources. | Intra-assay and inter-assay precision need to be assessed. Intra-assay precision describes how close the results are when the same sample is measured repeatedly under the same conditions. Accepted criteria for immunophenotyping are coefficients of variation (CV) of 10–25% ( |
| Repeatability | Measurement precision under a set of repeatability conditions of measurement with | The most effective and sufficient experiment follows a hierarchical design, in which several variance components (e.g., repeatability, operator-to-operator variability, and day-to-day variability) are evaluated together, for example a design with nested factors in which 3 operators each measure 3 replicates on 5 days (3 × 5 × 3 = 45 measurements). In the case of one factor plus repeatability, the analysis can be performed using simple Excel spreadsheets. | Repeatability can be measured by preparing 3–6 samples in at least three replicates; all samples can be tested in one run. This assay should be run on one instrument by one technician. It should be measured on the most representative sample type and the most representative cell subset, at different levels. |
| Intermediate precision | Measurement precision under a set of intermediate precision conditions of measurement with | This type of measurement can only be assessed with QC samples when available. Because of the sample shortage and the cost of the analysis, repeats cannot be done as many times as usually recommended in biochemistry. Dorn-Beineke et al. recommend higher numbers ( | |
| Reproducibility | Measurement precision under reproducibility conditions of measurement, with reproducibility condition: condition of measurement, out of a set of conditions that includes different locations, operators, measuring systems, and replicate measurements on the same or similar objects | Reproducibility between instruments can be assessed by two different technicians (one for each instrument). If the results are inconsistent, both the technician and the instrument need to be evaluated. Stabilized IQC material, if available, can be analyzed daily, keeping in mind that the stabilization procedure alters cell shape and marker expression. Again, because of the limited sample volume and the cost of the analysis, we propose testing at least one IQC per level, per available sample type, per operating day. Inter-operator reproducibility can be estimated by comparing IQC analyses between different operators at different times. | |
| Accuracy (resulting from trueness and precision) | Closeness of agreement between a measured quantity value and a true quantity value of a measurand. | Accuracy is a conceptual term describing the agreement of a single measured value with the true quantity. | If bias cannot be established, accuracy is given by precision. Comparison of results from different laboratories may be used to calculate accuracy. Participation in external QC/proficiency testing programs, when available, provides the most useful information on systematic error. |
| Limits of detection | Measured quantity value, obtained by a given measurement procedure, for which the probability of falsely claiming the absence of a component in a material is β, given a probability α of falsely claiming its presence. | The LoD signals the presence of a measurand in the sample. It is the lowest measured quantity value at which it is statistically shown that “something” of the component is in the sample (qualitative statement). α and β are typically set to 5%. | MRD is a good example. There are different options for determining the LOD. FMO (fluorescence minus one) controls, which omit the antibody of interest, can be used as an LOD tool. Using healthy donor samples is also possible. Rare events require high numbers of analyzed cells (Poisson challenge). Cell identification is based on a good separation of positive and negative labeling and on detection sensitivity, which is limited if the fluorescence of the conjugate is poor or if the antigen is expressed at low density on cells, e.g., below 1,000 molecules/cell ( |
| Limit of quantitation | Lowest amount of measurand in a sample that can be quantitatively determined with stated acceptable precision and trueness under stated experimental conditions | The tools used for determining the LOD can also be used for LOQ determination. Spiking known dilutions of leukemia samples into healthy donor samples can also provide data for determining the LOQ. Resolution allows two populations in a mixture of particles that differ in mean signal intensity to be distinguished ( | |
| Measuring range | Working interval: set of values of quantities of the same kind that can be measured by a given measuring instrument or measuring system with specified instrumental measurement uncertainty, under defined conditions. | For fit-for-purpose validation, verification with a minimum of ten donors is recommended when validated IVD/CE assays are used ( | |
| Linearity | Assuming no constant bias, the ability (within a given range) to provide results that are directly proportional to the concentration (amount) of the measurand in the test sample. | According to CLSI EP06 ( | Linearity of the fluorescence detectors on the measurement device can be controlled using standard calibrators. On biological samples, linearity can be assessed by spiking healthy donor samples with known cells, such as leukemia cells. |
| Cut-off | The cut-off refers to a specific measurement value which is used as a decision limit to distinguish between different categories of test results, typically between positive and negative test results. | Cut-off level is a test value or statistic that marks the upper (or lower) boundary between diagnostic categories, i.e., between negative (acceptable or unaffected) results and positive (unacceptable or affected) results ( | Cut-off values are used for clinical performance determination and for qualitative tests such as the detection of allergen-specific basophil granulocytes. For quantitative analysis (expression strength), the minimal level of fluorescence intensity measured on each cell directly depends on (a) the antigen density ( |
| Determination of appropriate criteria for specimen collection and handling | Common criteria are defined in the laboratory's pre-analytical handbook. | For different matrices (bone marrow, peripheral blood, body fluids) and different analyses (such as platelets or activated platelets), appropriate specimen collection and handling instructions should be validated and provided in written form. Clotting, contamination, and mucus must be avoided. | |
| Robustness | Demonstration that specific factors have no influence on measurement results | When the aim is to show that a factor has no influence, analysis with equivalence tests (TOST) is appropriate. To use criteria like “no statistical significance ( | Robustness can be assessed by measuring the impact of the tested parameters on the results. |
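For the one-factor-plus-repeatability case mentioned under repeatability (e.g., several days with replicates on each day), the spreadsheet calculation amounts to a one-way ANOVA. The sketch below, assuming a balanced design, estimates the repeatability SD from the within-day mean square and the between-day SD from the difference of mean squares, then combines them into an intermediate-precision SD. The function name and the data are hypothetical; a fully nested design such as 3 operators × 5 days × 3 replicates would need a nested ANOVA or mixed model, which is not shown here.

```python
from statistics import mean


def variance_components(groups: list[list[float]]) -> dict[str, float]:
    """Balanced one-factor design: k groups (e.g. days) x n replicates each."""
    k, n = len(groups), len(groups[0])
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes a balanced design with k >= 2, n >= 2")
    group_means = [mean(g) for g in groups]
    grand = mean(group_means)  # equals the overall mean in a balanced design
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ss_between = n * sum((m - grand) ** 2 for m in group_means)
    ms_within = ss_within / (k * (n - 1))
    ms_between = ss_between / (k - 1)
    var_r = ms_within                                 # repeatability variance
    var_day = max(0.0, (ms_between - ms_within) / n)  # negative estimates set to 0
    sd_ip = (var_r + var_day) ** 0.5                  # intermediate precision
    return {
        "sd_repeatability": var_r ** 0.5,
        "sd_between_day": var_day ** 0.5,
        "sd_intermediate": sd_ip,
        "cv_intermediate_pct": sd_ip / grand * 100.0,
    }


# Hypothetical IQC results (% of a lymphocyte subset): 5 days x 3 replicates.
days = [[40.1, 40.5, 39.8], [41.0, 40.6, 41.3], [39.5, 39.9, 40.2],
        [40.8, 40.3, 40.9], [40.0, 40.4, 39.7]]
for name, value in variance_components(days).items():
    print(f"{name}: {value:.3f}")
```

Truncating a negative between-day variance at zero is a common convention when day-to-day variation is smaller than the within-day noise.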
Figure 1. Demonstration of a statistical proof of equivalence using confidence intervals (A). When this problem is formulated as a statistical test, it corresponds to the two one-sided tests (TOST) approach (B).
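The confidence-interval formulation in Figure 1 can be sketched for a paired robustness experiment: equivalence at α = 5% per side is claimed when the 90% confidence interval of the mean paired difference lies entirely within the acceptance range (−margin, +margin), which is the same decision as TOST. To stay within the standard library, this sketch hard-codes one-sided 95% Student t quantiles for small degrees of freedom (standard table values); `scipy.stats.t.ppf(0.95, df)` would be the usual alternative. The function name and the example data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# One-sided 95% Student t quantiles t(0.95, df), standard table values.
T95 = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015, 6: 1.943, 7: 1.895,
       8: 1.860, 9: 1.833, 10: 1.812, 11: 1.796, 12: 1.782, 13: 1.771,
       14: 1.761, 15: 1.753, 16: 1.746, 17: 1.740, 18: 1.734, 19: 1.729,
       20: 1.725}


def paired_tost(x: list[float], y: list[float],
                margin: float) -> tuple[float, float, bool]:
    """Return the 90% CI of the mean paired difference and the equivalence decision."""
    if len(x) != len(y):
        raise ValueError("a paired design needs equal-length inputs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    df = n - 1
    if df not in T95:
        raise ValueError("extend T95 (or use scipy.stats.t.ppf) for this sample size")
    half_width = T95[df] * stdev(diffs) / sqrt(n)
    lower, upper = mean(diffs) - half_width, mean(diffs) + half_width
    return lower, upper, (-margin < lower and upper < margin)


# Hypothetical: reference condition vs. a modified pre-analytical condition.
reference = [10.0, 12.0, 11.0, 13.0]
modified = [10.2, 11.8, 11.1, 12.9]
print(paired_tost(reference, modified, margin=1.0))
```

Using a 90% (not 95%) interval is deliberate: each of the two one-sided tests is carried out at 5%, and together they correspond to a two-sided 90% interval.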
Sample sizes necessary to demonstrate equivalence via TOST in a paired design when the acceptance criteria cover the range (−1, 1), depending on the standard deviation of the pairwise differences, the real deviation, and power.
| SD of differences | Power | Sample size (real deviation increasing from 0, left, to largest, right) | | | | | | |
|---|---|---|---|---|---|---|---|---|
| 0.25 | 80% | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 0.5 | 80% | 4 | 4 | 5 | 5 | 5 | 6 | 8 |
| 0.75 | 80% | 7 | 7 | 8 | 8 | 9 | 12 | 16 |
| 1 | 80% | 11 | 11 | 12 | 13 | 15 | 19 | 27 |
| 1.25 | 80% | 15 | 16 | 18 | 19 | 22 | 29 | 41 |
| 1.5 | 80% | 21 | 22 | 25 | 27 | 30 | 41 | 58 |
| 1.75 | 80% | 28 | 29 | 33 | 36 | 41 | 54 | 78 |
| 2 | 80% | 36 | 37 | 42 | 47 | 53 | 71 | 101 |
| 0.25 | 90% | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 0.5 | 90% | 5 | 5 | 6 | 6 | 7 | 8 | 11 |
| 0.75 | 90% | 8 | 9 | 10 | 11 | 12 | 15 | 21 |
| 1 | 90% | 13 | 13 | 15 | 17 | 19 | 26 | 36 |
| 1.25 | 90% | 19 | 20 | 23 | 26 | 29 | 39 | 55 |
| 1.5 | 90% | 26 | 28 | 32 | 36 | 41 | 55 | 79 |
| 1.75 | 90% | 35 | 37 | 43 | 49 | 55 | 75 | 107 |
| 2 | 90% | 45 | 48 | 56 | 63 | 72 | 97 | 139 |
Overall alpha level is set to 5%. Because acceptance criteria, standard deviation, and real deviation scale proportionally, the table can be used directly to derive sample sizes for other scenarios. Example: acceptance criteria ±30%, CV of the differences = 15%, real deviation = 0%, power = 80% → sample size = 4 (obtained by using SD = 0.5, deviation = 0, and power = 80%). The CV of the differences should be taken as the precision (CV) of a single experiment multiplied by 1.4 (≈ square root of 2).
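The rescaling described in the footnote can be written as a small helper: divide the CV of the paired differences (single-experiment CV × √2) and the real deviation by the acceptance limit to obtain the table's normalized SD and deviation, then read the sample size from the row and column that match. The function name and argument units (percent) are assumptions for this sketch.

```python
from math import sqrt


def normalized_scenario(acceptance_pct: float, cv_single_pct: float,
                        real_deviation_pct: float) -> tuple[float, float]:
    """Rescale a scenario to the table's acceptance range (-1, 1).

    The CV of a paired difference is the single-experiment CV times sqrt(2),
    because both paired measurements contribute random error.
    Returns (SD of differences, real deviation) in table units.
    """
    cv_diff = cv_single_pct * sqrt(2)
    return cv_diff / acceptance_pct, real_deviation_pct / acceptance_pct


# Footnote example: +/-30% acceptance, CV of differences 15%, no real deviation.
sd_norm, dev_norm = normalized_scenario(30.0, 15.0 / sqrt(2), 0.0)
print(round(sd_norm, 2), dev_norm)  # prints: 0.5 0.0
```

With SD = 0.5, deviation = 0, and 80% power, the table gives a sample size of 4, as stated in the footnote.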
Figure 2. Results of 1,000 simulations of a repeatability experiment using 3, 5, 10, 20, and 50 replicates (mean = 10, standard deviation = 2), shown as dot plots with overlaid box-and-whisker plots.
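A simulation in the spirit of Figure 2 shows how strongly the estimated SD scatters when few replicates are used. The sketch below assumes normally distributed measurements (mean 10, SD 2), repeats each experiment 1,000 times with a fixed seed, and summarizes the SD estimates by their median and interquartile range instead of plotting them. The function name and the seed are arbitrary choices for illustration.

```python
import random
from statistics import quantiles, stdev


def simulate_sd_estimates(n_replicates: int, n_sim: int = 1000,
                          true_mean: float = 10.0, true_sd: float = 2.0,
                          seed: int = 1) -> list[float]:
    """Sample SD from each of n_sim simulated repeatability experiments."""
    rng = random.Random(seed)
    return [stdev([rng.gauss(true_mean, true_sd) for _ in range(n_replicates)])
            for _ in range(n_sim)]


for n in (3, 5, 10, 20, 50):
    q1, median, q3 = quantiles(simulate_sd_estimates(n), n=4)
    print(f"n={n:2d}: median SD={median:.2f}, IQR={q3 - q1:.2f}")
```

The interquartile range narrows markedly as the number of replicates grows, which is why a precision estimate from only 3 replicates can differ substantially from the true SD.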
Specific method validation and acceptance limits.
| Risks | Sample, reagents, operator, data analysis | + | + | + | + | |
| Sample type | Typical | + | + | + | + | |
| Repeatability | RSD (%) | 11 repeats, 2 levels, preferably combined with reproducibility in a hierarchical precision experiment ( | + | NA | 7–10 | <10% |
| Reproducibility | IQC Levey-Jennings, possibly interlaboratory comparison | 18–24 tests, 2 levels | NA | NA | NA | <10–15% Precision |
| Trueness (bias) | EQC usual workflow | 3–5/year 2 levels | + | NA | NA | <15% |
| Global uncertainty | Uncertainty² = | + | + | NA | NA | |
| Working range | 6–10 × 1/3 or 1/4 dil. | clinical relevance | + | NA | + | Set deviations from linearity in relationship with repeatability |
| LOQ (low) | % of leukocytes | 10⁻³% (10 cells/μL) | 10⁻⁴–10⁻⁵% | Extrapolated | | |
| Sample | 10 fresh samples | Subpopulations | + | + | + | <10% |
| Stability of | 2–3 fresh samples | Subpopulations (%) | + | + | + | <10% |
| Interferences | Atypical phenotype | Generic form | + | Extrapolated | Extrapolated | |
| Carry-over | 3 (very) high, 3 low, | (L1-L3)/(meanH-L3) | + | Extrapolated | Extrapolated | <1% |
| Method comparison | At least 30 double tests | Multiple instruments | Few tests | – | – | Difference~0, Slope~1 |
| Reference values | 30 healthy donors (F/M) | Most representative | – | – | – | |
| Special groups | literature | Children, elderly. | – | – | – | |
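The carry-over criterion in the table uses three (very) high samples followed by three low samples, computed as (L1 − L3)/(mean H − L3) with an acceptance limit below 1%. A minimal sketch, assuming the six measurements are acquired in the order H1, H2, H3, L1, L2, L3 and expressing the result in percent; the function name and the example values are hypothetical.

```python
from statistics import mean


def carry_over_pct(high: list[float], low: list[float]) -> float:
    """Carry-over (%) = (L1 - L3) / (mean(H) - L3) x 100.

    Assumes acquisition order H1, H2, H3, then L1, L2, L3, so that any
    contamination from the high samples shows up mainly in L1.
    """
    if len(high) != 3 or len(low) != 3:
        raise ValueError("expects three high and three low measurements")
    l1, _, l3 = low
    return (l1 - l3) / (mean(high) - l3) * 100.0


# Hypothetical event counts for the population of interest.
result = carry_over_pct([9500.0, 9600.0, 9400.0], [12.0, 10.0, 10.0])
print(f"carry-over = {result:.3f}% (acceptance: < 1%)")
```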
Figure 3. Presentation of the structure proposed for the accreditation documents. A generic form is to record and report all common information (including environment, material, management, manpower) and method characteristics that cannot be tested for each panel. Then specific forms should be written individually per panel (several parameters, several assays). Technical details (antibodies, clones, conjugates, gating strategy, risks of error, and guidelines for interpretation) should be presented in an easy-to-update SOP. Results with technical and reference information should be managed by the laboratory informatics system to be published for correct interpretation. Any redundancy should be avoided for safety and management reasons.
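For the IQC monitoring with Levey-Jennings charts listed under reproducibility in the validation table (18–24 tests per level), a minimal check might derive the mean and SD from a baseline period and classify each new result. This sketch uses the conventional ±2 SD warning and ±3 SD rejection limits; multirule schemes (e.g., Westgard rules) are not implemented, and the function name and data are hypothetical.

```python
from statistics import mean, stdev


def levey_jennings_flags(baseline: list[float], new_values: list[float]) -> list[str]:
    """Classify IQC results against mean +/- 2 SD and +/- 3 SD of a baseline."""
    m, s = mean(baseline), stdev(baseline)
    flags = []
    for value in new_values:
        z = abs(value - m) / s  # distance from the baseline mean in SD units
        if z > 3:
            flags.append("reject (>3 SD)")
        elif z > 2:
            flags.append("warning (>2 SD)")
        else:
            flags.append("in control")
    return flags


# Hypothetical baseline IQC values for one level, then three new runs.
baseline = [10, 11, 9, 10, 11, 9, 10, 10, 11, 9]
print(levey_jennings_flags(baseline, [10.5, 11.8, 13.0]))
```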