Literature DB >> 31383961

"Interchangeability" of PD-L1 immunohistochemistry assays: a meta-analysis of diagnostic accuracy.

Emina Torlakovic1,2, Hyun J Lim3, Julien Adam4, Penny Barnes5, Gilbert Bigras6, Anthony W H Chan7, Carol C Cheung8,9, Jin-Haeng Chung10, Christian Couture11, Pierre O Fiset12, Daichi Fujimoto13, Gang Han14, Fred R Hirsch15, Marius Ilie16, Diana Ionescu17, Chao Li18, Enrico Munari19, Katsuhiro Okuda20, Marianne J Ratcliffe21, David L Rimm22, Catherine Ross23, Rasmus Røge24, Andreas H Scheel25, Ross A Soo26, Paul E Swanson27,28, Maria Tretiakova27, Ka F To8, Gilad W Vainer29, Hangjun Wang30, Zhaolin Xu6, Dirk Zielinski31, Ming-Sound Tsao8,9.   

Abstract

Different clones, protocol conditions, instruments, and scoring/readout methods may pose challenges in introducing different PD-L1 assays for immunotherapy. The diagnostic accuracy of using different PD-L1 assays interchangeably for various purposes is unknown. The primary objective of this meta-analysis was to address PD-L1 assay interchangeability based on assay diagnostic accuracy for established clinical uses/purposes. A systematic search of the MEDLINE database using PubMed platform was conducted using "PD-L1" as a search term for 01/01/2015 to 31/08/2018, with limitations "English" and "human". 2,515 abstracts were reviewed to select for original contributions only. 57 studies on comparison of two or more PD-L1 assays were fully reviewed. 22 publications were selected for meta-analysis. Additional data were requested from authors of 20/22 studies in order to enable the meta-analysis. Modified GRADE and QUADAS-2 criteria were used for grading published evidence and designing data abstraction templates for extraction by reviewers. PRISMA was used to guide reporting of systematic review and meta-analysis and STARD 2015 for reporting diagnostic accuracy study. CLSI EP12-A2 was used to guide test comparisons. Data were pooled using random-effects model. The main outcome measure was diagnostic accuracy of various PD-L1 assays. The 22 included studies provided 376 2×2 contingency tables for analyses. Results of our study suggest that, when the testing laboratory is not able to use an Food and Drug Administration-approved companion diagnostic(s) for PD-L1 assessment for its specific clinical purpose(s), it is better to develop a properly validated laboratory developed test for the same purpose(s) as the original PD-L1 Food and Drug Administration-approved immunohistochemistry companion diagnostic, than to replace the original PD-L1 Food and Drug Administration-approved immunohistochemistry companion diagnostic with a another PD-L1 Food and Drug Administration-approved companion diagnostic that was developed for a different purpose.

Entities:  

Mesh:

Substances:

Year:  2019        PMID: 31383961      PMCID: PMC6927905          DOI: 10.1038/s41379-019-0327-4

Source DB:  PubMed          Journal:  Mod Pathol        ISSN: 0893-3952            Impact factor:   7.842


Introduction

Clinical trials have shown that it is possible to successfully restore host immunity against various malignant neoplasms even in advanced stage disease by deploying drugs that target the PD-1/PD-L1 axis [1-5]. In most of these studies, higher expression of PD-L1 was associated with a more robust clinical response, suggesting that detection of PD-L1 expression could be used as predictive biomarker. However, anti-PD1/PD-L1 therapy companies developed distinct immunohistochemistry protocols for assessing a single biomarker (PD-L1 expression), as well as different scoring schemes for the readouts. The latter include differences in the cell type assessed for the expression and different cut-off points as thresholds [2-4]. “Intended use” in this context is a part of the so-called “3D” concept, where a “fit-for-purpose” approach to test development and validation establishes explicit links between Disease, Drug, and Diagnostic assay [6]. Several such fit-for-purpose immunohistochemistry kits are commercially available, but in clinical practice, and especially in publicly funded health care, it is challenging to make all such testing available to patients [7-9]. Because of the great need to simplify testing, either by reducing the number of immunohistochemistry assays being used or the number of interpretative schemes employed or both, many studies have been conducted that compared the analytical performance of the various immunohistochemistry PD-L1 assays to determine if they might be deemed “interchangeable”. The concordance in the analytical performance of the immunohistochemistry assays and scoring algorithms derived from these studies have been reviewed by Büttner et al. and Udall et al. respectively [7, 10]. Although most of these studies have compared different PD-L1 immunohistochemistry assays to one another, there is little guidance on how the results of these studies may be applied clinically. The goal of this study is to assess the performance of PD-L1 immunohistochemistry assays based on their diagnostic accuracy at specific cut-points, as defined for specific immunotherapies according to the clinical efficacy demonstrated in their respective pivotal clinical trials. In other words, given an Food add Drug Administration-cleared assay, which other assays can be considered substantially equivalent for that specific purpose? Although comparison of immunohistochemistry assays for their analytical similarities is warranted and useful for clinical immunohistochemistry laboratories, it is an insufficient foundation on which to make an informed decision whether an Food and Drug Administration-approved companion diagnostic with a specific clinical purpose can be replaced by another assay, whether the substitute assay is an Food and Drug Administration-approved companion diagnostic for a different purpose or a laboratory developed test. The more appropriate approach for these qualitative assays would be comparing the results of the candidate assay for its diagnostic accuracy against a comparative method/assay or designated reference standard [11]. We report here the results of our meta-analyses of 376 assay comparisons from 22 studies for different cut-off points, focusing on the sensitivity and specificity of these tests, based on their intended clinical utility.

Methods

Methodology including data sources, study selection, data abstraction, and grading evidence, are detailed in Supplementary Files Methodology. Modified GRADE and QUADAS-2 criteria were used for grading published evidence and designing data abstraction templates to guide independent extraction by multiple reviewers [12-15]. PRISMA was used to guide reporting of the systematic review and meta-analysis and STARD 2015 for reporting the diagnostic accuracy study [16-18]. CLSI EP12-A2 was used to guide test comparisons [11] Data were pooled using a random-effects model.

Framework

A systematic review of literature was conducted as a part of a national project for developing Canadian guidelines for PD-L1 testing. The Canadian Association of Pathologists – Association canadienne des pathologistes (CAP-ACP) National Standards Committee for High Complexity Testing initiated development of CAP-ACP Guidelines for PD-L1 testing to facilitate introduction of PD-L1 testing for various purposes to Canadian clinical immunohistochemistry laboratories. This review was also used to guide the selection of publications to be used in this meta-analysis.

Purpose-based approach

The purposes identified in the systematic review of published literature were based on either the clinical purpose that was specifically identified in the published study or the intended purpose for which the included specific companion diagnostic assay was clinically validated. Although a large number of potential purposes were identified, only few could be included in this meta-analysis. The selection was based on the type of data available, including which immunohistochemistry protocols and which readout was performed by the authors. The greatest limitation in the accrual of data from these published studies was based on the selection of the readout employed to assess the results; in most studies the readout was limited to tumor proportion score with 1% and 50% cut-offs, which is essentially based on the clinically meaningful cut-offs for pembrolizumab and nivolumab therapy. Hence, these two readouts were selected for our analysis and form the basis for outlining different purposes that are derived from the combination of the readouts and immunohistochemistry kits/protocols that use these readouts and are approved by regulatory agencies (e.g., Food and Drug Administration) for different clinical uses. Most published studies on PD-L1 test comparison did not include 2 × 2 tables that would allow calculations of either diagnostic sensitivity and specificity or positive percent agreement and negative percent agreement. The CAP-ACP National Standards Committee for High Complexity Testing requested this information from the authors of studies where it was evident that the authors generated such results, but did not include them in their published manuscript. Most studies required generation of multiple 2 × 2 tables, as each one was designed for a specific purpose and set of candidate’ and ‘comparator’ assays. For primary studies that provided sufficient detail, information on study setting, comparative method/reference standard and 2 x 2 tables for different tumor proportion score cut-offs were extracted, from which accuracy results were reported. Studies of PD-L1 immunohistochemistry assay comparisons that did not compare the performance of the assay to any designated or potential reference standard (e.g., where analytical comparison of PD-L1 assays were all laboratory developed tests and/or no specific purpose was identified or where positive percent agreement and negative percent agreement could not be generated from study data) [19-22] were not included in this meta-analysis, because in such studies, diagnostic accuracy for a specific clinical purpose could not be determined. The acquisition of data resulted in cumulative evidence of 376 assay comparisons from 22 published studies [6, 23–43].

Study tissue model(s)

Most studies evaluated PD-L1 immunohistochemistry in non-small cell lung cancer, resulting in 337 test comparisons. Comparisons of test performance in other tumors were much less common. These included analysis of urothelial carcinoma (20 test comparisons), mesothelioma (9 test comparisons) and thymic carcinoma (9 test comparisons).

Meta-analysis

Reported or calculated diagnostic accuracy (sensitivity and specificity) from the individual studies were summarized. Random-effects models were fitted [44, 45]. For the qualitative review, a forest plot was used to obtain an overview of sensitivity and specificity for each study. Cochran’s heterogeneity statistics Q and I2 were used to examine heterogeneity among studies. Funnel plots and Egger’s test were applied to detect possible publication bias (see Supplementary Files for images of funnel plots) [46, 47]. The significance level of 0.05 was set for all analyses. Meta-analysis was performed using software Strata 15 SE.

Interpretation of results

Clinically acceptable diagnostic accuracy

For the purpose of this study, the immunohistochemistry candidate assays were considered to be acceptable for clinical applications if both sensitivity and specificity for the stated clinical purpose/application were ≥90% [48].

Applicability of meta-analysis results: Food and Drug Administrtion-approved immunohistochemistry kits vs. laboratory developed tests

Assuming that laboratories follow the instructions for use provided with Food and Drug Administration-approved or European CE-marked immunohistochemistry kits, the overall results of this meta-analysis could be considered highly representative and generalizable of Food and Drug Administration-approved assay performance and the diagnostic accuracy of that assay against a designated reference standard for the stated specific purpose in any laboratory. However, this assumption cannot be applied to the results of laboratory developed tests, because laboratory developed test immunohistochemistry protocol conditions were often different in different laboratories even when the same primary antibody was used, and the protocol was performed on the same automated instrument with the same detection system (e.g., different type and duration of antigen retrieval, primary antibody dilution or incubation time, number of steps of amplification, etc.). When the results of the meta-analysis for laboratory developed tests were suboptimal, but one or more laboratories achieved ≥90% sensitivity and specificity; we cannot exclude the possibility that with appropriate immunohistochemistry protocol modification and assay validation, other laboratories could also achieve optimal results. This contrasts with the use of Food and Drug Administration-approved assays, where no protocol modifications are allowed. Therefore, where the results of laboratory developed tests are excellent, they are representative of what could be achieved by laboratory developed tests rather than that they are generalizable and that they will automatically be achieved in all laboratories.

Results

Meta-analysis (all tissue models)

The number of studies comparing different assays in this meta-analysis was larger than the number of published manuscripts, due to the frequent inclusion of multiple test comparisons in single publication as well as use of different cut-off points for “positive” vs. “negative” test result. Table 1A (non-small cell lung cancer), 1B (all tissue types), 2, and 3 summarize the number of studies that included both candidate and comparator test for a specific, clinically relevant purpose/cut-off point for a specific tissue model. Figures 1–13 illustrate forest plots with all studies using non-small cell lung cancer as tissue model (see Supplementary files for Figures 2–13). There was no significant difference in the results when non-small cell lung cancer studies were analyzed separately vs. meta-analysis of all tissue models (compare Table 1A to Table 1B).
Table 1A

Summary of NSCLC results from all studies (combined estimate of sensitivity and specificity)

TPS Cut-offGold StandardCandidate AssayNo. of ComparisonsSensitivity (95% CI)Specificity (95% CI)
1%PD-L1 IHC 22C3 pharmDx22C3 LDT110.99 (0.91–1.00)1.00 (0.78–1.00)
PD-L1 IHC 28-8 pharmDx180.96 (0.93–0.98)0.84 (0.77–0.88)
Ventana PD-L1 (SP263)150.93 (0.90–0.96)0.82 (0.78–0.86)
E1L3N LDT130.84 (0.78–0.89)0.92 (0.87–0.95)
Ventana PD-L1 (SP142)170.60 (0.53–0.66)0.96 (0.93–0.98)
28-8 LDT6See Table B
SP142 LDT3See Table C
SP263 LDT2See Table C
73-10 Assay1See Table C
PD-L1 IHC 28-8 pharmDxVentana PD-L1 (SP263)130.91 (0.87–0.94)0.87 (0.80–0.92)
PD-L1 IHC 22C3 pharmDx210.88 (0.84–0.92)0.93 (0.91–0.95)
28-8 LDT40.82 (0.67–0.91)0.91 (0.82- 0.96)
E1L3N LDT110.81 (0.77–0.85)0.96 (0.89–0.99)
Ventana PD-L1 (SP142)120.57 (0.50–0.64)0.99 (0.89–1.00)
SP142 LDT2See Table C
SP263 LDT2See Table C
73-10 Assay1See Table C
Ventana PD-L1 (SP263)PD-L1 IHC 28–8 pharmDx130.93 (0.86–0.97)0.84 (0.79–0.88)
PD-L1 IHC 22C3 pharmDx160.84 (0.77–0.89)0.91 (0.85–0.94)
E1L3N LDT80.81 (0.75–0.86)0.93 (0.85–0.96)
Ventana PD-L1 (SP142)120.57 (0.49–0.64)0.98 (0.97–0.99)
28–8 LDT4See Table B
SP142 LDT1See Table C
73-10 Assay1See Table C
22C3 LDT1See Table C
50%PD-L1 IHC 22C3 pharmDx22C3 LDT10See Table B
PD-L1 IHC 28-8 pharmDx180.94 (0.88–0.97)0.95 (0.92–0.97)
28-8 LDT60.95 (0.81–0.99)0.76 (0.67–0.83)
Ventana PD-L1 (SP263)150.91 (0.83–0.95)0.92 (0.89–0.95)
E1L3N LDT130.76 (0.62–0.86)0.97 (0.95–0.99)
Ventana PD-L1 (SP142)160.41 (0.29–0.53)1.00 (0.99–1.00)
SP142 LDT3See Table C
SP263 LDT1See Table C
73-10 Assay1See Table C
Ventana PD-L1 (SP263)28-8 LDT4See Table B
PD-L1 IHC 28-8 pharmDx120.68 (0.54–0.79)0.98 (0.97–0.99)
PD-L1 IHC 22C3 pharmDx160.57 (0.45–0.69)0.99 (0.97–1.00)
Ventana PD-L1 (SP142)10See Table B
SP142 LDT1See Table C
22C3 LDT1See Table C
73-10 Assay1See Table C
Table 1B

Summary of results from all studies (combined estimate of sensitivity and specificity)

TPS Cut-offGold StandardCandidate AssayNo. of ComparisonsSensitivity (95% CI)Specificity (95% CI)
1%PD-L1 IHC 22C3 pharmDx22C3 LDT110.99 (0.91 -1.00)1.00 (0.78–1.00)
PD-L1 IHC 28-8 pharmDx210.96 (0.92–0.97)0.84 (0.78–0.88)
Ventana PD-L1 (SP263)160.93 (0.88–0.95)0.83 (0.77–0.86)
E1L3N LDT140.84 (0.78–0.88)0.94 (0.90–0.96)
Ventana PD-L1 (SP142)180.61 (0.55–0.67)0.97 (0.94–0.98)
28-8 LDT6See Table B
SP142 LDT3See Table C
SP263 LDT2See Table C
73-10 Assay1See Table C
PD-L1 IHC 28-8 pharmDxVentana PD-L1 (SP263)140.90 (0.86–0.94)0.88 (0.82–0.93)
PD-L1 IHC 22C3 pharmDx240.88 (0.84–0.91)0.94 (0.92–0.96)
28-8 LDT40.82(0.67–0.91)0.91 (0.82–0.96)
E1L3N LDT120.80 (0.76–0.84)0.99 (0.94–0.99)
Ventana PD-L1 (SP142)130.59 (0.52–0.66)0.99 (0.96–1.00)
SP142 LDT2See Table C
SP263 LDT2See Table C
73-10 Assay1See Table C
Ventana PD-L1 (SP263)PD-L1 IHC 28-8 pharmDx140.93 (0.86–0.96)0.85 (0.80–0.88)
PD-L1 IHC 22C3 pharmDx170.83 (0.76–0.89)0.91 (0.85–0.94)
E1L3N LDT90.78 (0.71–0.84)0.93 (0.88–0.96)
Ventana PD-L1 (SP142)130.58 (0.51–0.66)0.98 (0.96–0.99)
28-8 LDT4See Table B
SP142 LDT1See Table C
73-10 Assay1See Table C
22C3 LDT1See Table C
50%PD-L1 IHC 22C3 pharmDx22C3 LDT10See Table B
PD-L1 IHC 28-8 pharmDx210.94 (0.88–0.97)0.95 (0.93–0.97)
28-8 LDT60.95 (0.81–0.99)0.76 (0.67–0.83)
Ventana PD-L1 (SP263)160.92 (0.84–0.96)0.92 (0.90–0.94)
E1L3N LDT130.76 (0.62–0.86)0.97 (0.95–0.99)
Ventana PD-L1 (SP142)170.42 (0.31–0.54)1.00 (0.99–1.00)
SP142 LDT3See Table C
SP263 LDT1See Table C
73-10 Assay1See Table C
Ventana PD-L1 (SP263)28-8 LDT4See Table B
PD-L1 IHC 28-8 pharmDx130.69 (0.56–0.79)0.98 (0.96–0.99)
PD-L1 IHC 22C3 pharmDx170.57 (0.46–0.68)0.99 (0.98–1.00)
E1L3N LDT90.36 (0.28–0.44)0.99 (0.93–1.00)
Ventana PD-L1 (SP142)10See Table B
SP142 LDT1See Table C
22C3 LDT1See Table C
73-10 Assay1See Table C
Summary of NSCLC results from all studies (combined estimate of sensitivity and specificity) Summary of results from all studies (combined estimate of sensitivity and specificity) Cochran’s heterogeneity statistic Q and I2 for sensitivity and specificity across all studies are shown in Supplementary Files Table 1.

Non-converging data

Where the number of studies was less than four or when the data were sparse due to the presence of a zero result in contingency tables (e.g., where sensitivity or specificity was 100%), the models did not converge and did not allow for meta-analysis calculations. As summarized in Tables 2 and 3, the latter occurred in a number of studies that had excellent results for both sensitivity and specificity (e.g., 22C3 laboratory developed test compared to PD-L1 IHC 22C3 pharmDx) or specificity only (e.g., Ventana PD-L1 (SP142) compared to Ventana PD-L1 (SP263) and other assays).
Table 2

Sensitivity and specificity of individual studies for which meta-analysis was not performed because of non-converging data

TPS Cut-offGold Standard AssayCandidate AssayAuthorYearTumor Type*Sample SizeSensitivitySpecificity
50%PD-L1 IHC 22C3 pharmDx22C3 LDTIlie et al.2018L1201.001.00
Ilie et al2017L1201.001.00
Røge et al.2017L750.931.00
Røge et al.2017L751.001.00
Neuman et al.2016L411.001.00
Ilie et al.2018L1201.001.00
Røge et al.2017L751.001.00
Ilie et al2017L1201.001.00
Neuman et al.2016L411.001.00
Munari et al.2018L1830.850.97
Hendry et al.2018L5510.841.00
50%Ventana PD-L1 (SP263)Ventana PD-L1 (SP142)Adam et al.2018L410.191.00
Adam et al.2018L410.111.00
Adam et al.2018L410.191.00
Adam et al.2018L410.111.00
Chan et al.2018L7130.671.00
Fujimoto et al.2017L400.331.00
Scheel et al.2016L1350.541.00
Soo et al.2018L180.331.00
Tretiakova et al.2018U1610.460.96
Kim et al.2017L970.001.00
Cheung et al.2019L540.601.00
Hendry et al.2018L3550.331.00
Tsao et al.2018L810.281.00
1%Ventana PD-L1 (SP263)28-8 LDTAdam et al.2018L410.701.00
Adam et al.2018L410.731.00
Adam et al.2018L410.761.00
Adam et al.2018L410.670.93
50%Ventana PD-L1 (SP263)28-8 LDTAdam et al.2018L411.000.81
Adam et al.2018L411.000.76
Adam et al.2018L410.820.67
Adam et al.2018L411.000.70
1%PD-L1 IHC 22C3 pharmDx28-8 LDTAdam et al.2018L410.661.00
Adam et al.2018L410.631.00
Adam et al.2018L410.681.00
Adam et al.2018L320.701.00
Adam et al.2018L320.671.00
Adam et al.2018L320.731.00

* L NSCLC, U UC

Table 3

Sensitivity and specificity of individual studies for which meta-analysis was not performed because of insufficient number of studies

TPS Cut-offGold Standard AssayCandidate AssayAuthorYearTumour TypeSample SizeSensitivitySpecificity
1%PD-L1 IHC 22C3 pharmDxSP142 LDTSakane et al.2018TC531.00 (0.90–1.00)0.53 (0.32–0.73)
Soo et al.2018NSCLC181.00 (0.77–1.00)0.00 (0.00–0.43)
Watanabe et al.2018M320.83 (0.61–0.94)0.86 (0.60–0.96)
SP263 LDTSakane et al.2018TC530.97 (0.85–1.0)0.63 (0.41–0.81)
Watanabe et al.2018M320.78 (0.55-0.91)0.93 (0.69–0.99)
73-10 AssayTsao et al.2018NSCLC811.00 (0.92–1.00)0.53 (0.37–0.68)
PD-L1 IHC 28-8 pharmDxSP142 LDTSakane et al.2018TC530.95 (0.84–0.99)0.67 (0.39–0.86)
Watanabe et al.2018M320.82 (0.59–0.94)0.80 (0.55–0.93)
SP263 LDTSakane et al.2018TC530.90 (0.78–0.96)0.75 (0.47–0.91)
Watanabe et al.2018M320.88 (0.66–0.97)1.00 (0.80–1.00)
73-10 AssayTsao et al.2018NSCLC810.96 (0.88–0.99)0.63 (0.44–0.79)
Ventana PD-L1 (SP263)SP142 LDTSoo et al.2018NSCLC181.00 (0.80–1.00)0.00 (0.00–0.56)
73-10 AssayTsao et al.2018NSCLC810.96 (0.87-0.99)0.55 (0.38-0.71)
22C3 LDTMunari et al.2018NSCLC1840.66 (0.55–0.76)0.99 (0.95–1.00)
50%PD-L1 IHC 22C3 pharmDxSP142 LDTSakane et al.2018TC531.00 (0.85–1.00)0.74 (0.56–0.86)
Soo et al.2018NSCLC181.00 (0.34–1.00)0.75 (0.51–0.90)
Watanabe et al.2018M320.67 (0.21–0.94)0.66 (0.47–0.45)
SP263 LDTSakane et al.2018TC530.82 (0.62–0.93)0.97 (0.84–0.99)
Watanabe et al.2018M320.67 (0.21–0.94)0.86 (0.69–0.70)
73-10 AssayTsao et al.2018NSCLC811.00 (0.80-1.00)0.82 (0.71–0.89)
Ventana PD-L1 (SP263)SP142 LDTSoo et al.2018NSCLC181.00 (0.44–1.00)0.80 (0.55–0.93)
22C3 LDTMunari et al.2018NSCLC1840.64 (0.45–0.80)0.99 (0.97–1.00)
73-10 AssayTsao et al.2018NSCLC810.94 (0.74–0.99)0.84 (0.73–0.91)

L NSCLC, M mesothelioma, U UC, TC Thymic Carcinoma

Sensitivity and specificity of individual studies for which meta-analysis was not performed because of non-converging data * L NSCLC, U UC Sensitivity and specificity of individual studies for which meta-analysis was not performed because of insufficient number of studies L NSCLC, M mesothelioma, U UC, TC Thymic Carcinoma

PD-L1 IHC 22C3 pharmDx as reference standard

The highest diagnostic accuracy was shown for well-designed 22C3 laboratory developed tests compared to PD-L1 IHC pharmDx 22C3. The sensitivity and specificity were both 100% in 8/9 assays for the 50% tumor proportion score cut-off point (Table 2). The results were almost identical, and only slightly less robust for the 1% cut-off (Fig. 1a, Table 2). Both PD-L1 IHC 28-8 pharmDx and Ventana PD-L1 (SP263) showed acceptable diagnostic accuracy for the 50% cut-off, but both had <90% specificity against the 1% tumor proportion score cut-off (Fig. 1b–e, Table 1).
Fig. 1

a 22C3 laboratory developed tests (candidate) vs. PD-L1 IHC pharmDx 22C3 (reference standard) for 1% tumor proportion score cut-off; b PD-L1 IHC pharmDx 28-8 (candidate) vs. PD-L1 IHC pharmDx 22C3 (reference standard) for 50% tumor proportion score cut-off; c Ventana PD-L1 (SP263) (candidate) vs. PD-L1 IHC pharmDx 22C3 (reference standard) for 50% tumor proportion score cut-off; d PD-L1 IHC pharmDx 28-8 (candidate) vs. PD-L1 IHC pharmDx 22C3 (reference standard) for 1% tumor proportion score cut-off; e Ventana PD-L1 (SP263) (candidate) vs. PD-L1 IHC pharmDx 22C3 (reference standard) for 1% tumor proportion score cut-off; f E1L3N laboratory developed tests (candidate) vs. PD-L1 IHC pharmDx 22C3 (reference standard) for 1% tumor proportion score cut-of; g Ventana PD-L1 (SP263) (candidate) vs. PD-L1 IHC pharmDx 28-8 (reference standard) for 1% tumor proportion score cut-off, and h PD-L1 IHC pharmDx 22C3 (candidate) vs. PD-L1 IHC pharmDx 28-8 (reference standard) for 1% tumor proportion score cut-off

a 22C3 laboratory developed tests (candidate) vs. PD-L1 IHC pharmDx 22C3 (reference standard) for 1% tumor proportion score cut-off; b PD-L1 IHC pharmDx 28-8 (candidate) vs. PD-L1 IHC pharmDx 22C3 (reference standard) for 50% tumor proportion score cut-off; c Ventana PD-L1 (SP263) (candidate) vs. PD-L1 IHC pharmDx 22C3 (reference standard) for 50% tumor proportion score cut-off; d PD-L1 IHC pharmDx 28-8 (candidate) vs. PD-L1 IHC pharmDx 22C3 (reference standard) for 1% tumor proportion score cut-off; e Ventana PD-L1 (SP263) (candidate) vs. PD-L1 IHC pharmDx 22C3 (reference standard) for 1% tumor proportion score cut-off; f E1L3N laboratory developed tests (candidate) vs. PD-L1 IHC pharmDx 22C3 (reference standard) for 1% tumor proportion score cut-of; g Ventana PD-L1 (SP263) (candidate) vs. PD-L1 IHC pharmDx 28-8 (reference standard) for 1% tumor proportion score cut-off, and h PD-L1 IHC pharmDx 22C3 (candidate) vs. PD-L1 IHC pharmDx 28-8 (reference standard) for 1% tumor proportion score cut-off No other candidate assays reached 90% sensitivity and specificity in the meta-analysis for either the 50% or the 1% tumor proportion score cut-off for PD-L1 IHC 22C3 pharmDx (Table 1 and Table 3). Although the overall performance of E1L3N laboratory developed tests in the meta-analysis was not good, E1L3N laboratory developed tests achieved very high sensitivity and specificity in 3 of 12 comparisons (Fig. 1f–h, Tables 1A and 1B) [18, 27].

PD-L1 IHC 28-8 pharmDx as reference standard

The highest results were achieved by the Ventana PD-L1 (SP263) assay; it had acceptable accuracy in the meta-analysis compared to PD-L1 IHC pharm Dx 28-8 at the 1% cut-off (6/12 tests were clinically acceptable) (Fig. 1h, Table 1). PD-L1 IHC 22C3 pharmDx did not reach ≥ 90% for both sensitivity and specificity in the meta-analysis when compared to PD-L1 IHC pharmDx 28-8 at the 1% cut-off, although 9/19 individual assay comparisons showed sensitivity and specificity of ≥ 90% (Fig. 1i, Table 1A and 1B).

Ventana PD-L1 (SP263) as reference standard

No candidate assays achieved required diagnostic accuracy for either the 1% or the 50% cut-off. Most candidate assays achieved acceptable specificity, but the sensitivity was too low for both cut-off points (Tables 1–3).

Discussion

The most dominant result of this meta-analysis is that properly designed laboratory developed tests that are performed in an individual immunohistochemistry laboratory (usually a reference laboratory or expert-led laboratory) and are developed for the same purpose as the relevant comparative reference method standard may perform essentially equally to the original Food and Drug Administration-approved assay, but also generally better than the Food and Drug Administration-approved companion diagnostics that were originally developed for different purposes. For example, to identify patients with non-small cell lung cancer for second line therapy with pembrolizumab where PD-L1 IHC pharmDx 22C3 is not available, the results of our study indicate that it is more likely that 22C3 or E1L3N well-developed, fit-for-purpose laboratory developed tests would identify the same patients as positive and/or negative as PD-L1 IHC pharmDx 22C3, rather than Ventana PD-L1 (SP263), Ventana PD-L1 (SP142), or PD-L1 IHC pharmDx 28-8, which were developed for different purposes [49-53]. The accuracy of laboratory developed tests varied in our meta-analysis. 22C3 laboratory developed tests achieved the best results, with both sensitivity and specificity of 100% in 8/9 studies. E1L3N also showed excellent results, but in only 3/12 comparisons. Its success in 3 separate comparisons illustrates that it is possible to develop an acceptable laboratory developed test with this clone and that this antibody can be optimized for clinical applications for which the PD-L1 IHC 22C3 pharmDx was developed. The successful applications of some of the laboratory developed tests reinforce the importance of considering the original purpose of the immunohistochemistry assay, a point emphasized in the ISIMM and IQN Path series of papers entitled “Evolution of Quality Assurance for Immunohistochemistry in the Era of Personalized Medicine” [54-57]. It should be pointed out that our meta-analysis indicates that excellent diagnostic accuracy by laboratory developed tests can be achieved in some laboratories where the laboratory developed tests that were included in this study were originally developed; it remains to be determined whether the same laboratory developed tests would perform the same if more widely tested in different laboratories with different operators using different equipment. External quality assurance including inter-laboratory comparisons, as well as proficiency testing demonstrated that as high as 20–30% or more of the participating laboratories may produce poor results with immunohistochemistry laboratory developed test protocols [58-62]. The success of laboratory developed tests depends on multiple parameters, including which test performance characteristics and which tissue tools may have been used for test development and validation [56, 57]. In the case of predictive PD-L1 immunohistochemistry assays, recognition and careful definition of the assay purpose according to the 3D approach (Disease, Drug, Diagnostic assay) must also be considered, along with proper selection of the comparative method for determination of diagnostic accuracy of the newly developed candidate test. Several studies have demonstrated that when laboratories follow this approach, they are able to produce excellent results [24, 32, 36–38]. Our study and previously published results do not imply generalizable analytical robustness of laboratory developed tests, whether de novo laboratory developed tests or “kit-derived laboratory developed tests” [6, 32]. When protocols for laboratory developed tests are shared between laboratories, it is essential that the adopting laboratory conducts initial technical validation, which would increase the likelihood of similar diagnostic accuracy [48, 56]. However, the purpose of predictive PD-L1 immunohistochemistry assays is not to demonstrate the best signal-to-noise ratio (“nice” and highly sensitive results), but to identify patients that are more likely to benefit from specific drug(s) as demonstrated in clinical trials. Therefore, consideration of this purpose and direct or indirect link with the clinical trial results is always required and it should be considered in test development, test validation, test maintenance, as well as in test performance comparison. As so far there are no tools to measure analytical sensitivity and specificity of immunohistochemistry assays; this presents a significant problem in assay development, methodology transfer, and daily monitoring of assay performance, as well as direct comparison of assay calibration. The lack of tools that could assess analytical sensitivity and specificity also hinders attempts of immunohistochemistry protocol standardization/harmonization for the PD-L1 assays; without such tools it is not possible to determine the desirable range of analytical sensitivity and specificity of relevance for diagnostic accuracy for any of the PD-L1 assays. This is one cause that we can identify as a potential source of the discrepancy between previously published works that suggested analytical interchangeability of the several Food and Drug Administration-approved PD-L1 assays, but did not necessarily lead to interchangeability based on calculated diagnostic accuracy as shown in our study. The Ventana PD-L1 (SP263) assay had very high diagnostic sensitivity against all other Food and Drug Administration-approved PD-L1 assays, but its diagnostic specificity was consequently lower. Although several of the studies included in this meta-analysis demonstrated substantial analytical similarity between PD-L1 IHC 22C3 pharmDx, PD-L1 IHC 28-8 pharmDx, and Ventana PD-L1 (SP263), our cumulative results suggest that the diagnostic sensitivity of these various assays (and indirectly their analytical sensitivity) is ordered as follows: PD-L1 IHC 22C3 pharmDx < PD-L1 IHC 28-8 pharmDx < Ventana PD-L1 (SP263). The results of this meta-analysis confirm previous observations that the Ventana PD-L1 (SP142) assay’s analytical sensitivity is significantly lower than that of the three other Food and Drug Administration-approved PD-L1 assays and that the diagnostic sensitivity of Ventana PD-L1 (SP142) against PD-L1 IHC 22C3 pharmDx, PD-L1 IHC 28-8 pharmDx, and Ventana PD-L1 (SP263) assays is prohibitively low for both the 1% and the 50% tumor proportion score in non-small cell lung cancer and other tumor models. Several investigators have evaluated the so-called “interchangeability” of PD-L1 immunohistochemistry assays. The term “interchangeability” has also been used widely by the pharmacological industry to designate drugs that have demonstrated the following characteristics: same amount of the same active ingredients, comparable pharmacokinetics, same clinically significant formulation characteristics, and to be administered in the same way as the drug prescribed [63]. Basically, interchangeable drugs have the same safety profile and therapeutic effectiveness, as demonstrated in clinical trials [64, 65]. To apply this term to an immunohistochemistry predictive assay, the manufacturer of the assay, be it industry for a companion/complementary diagnostic or a clinical immunohistochemistry laboratory for an laboratory developed test, would need to prove that the alternative assay will produce the same clinical outcomes. Since none of the assay comparisons were performed in the setting of a prospective clinical trial, this type of evidence is not available for PD-L1 immunohistochemistry assays and therefore, none can be deemed “interchangeable” with another in this same sense of the word. In addition, candidate assays and comparative assays cannot interchange their positions for the purpose of calculations without consequences [11]. If “interchangeability” would be defined as achieving ≥90% sensitivity and specificity for both the 1% and the 50% tumor proportion score cut-off points, none of the studies in this meta-analysis demonstrated “interchangeability” of the Food and Drug Administration-approved assays PD-L1 IHC 22C3 pharmDx, PD-L1 IHC 28-8 pharmDx, Ventana PD-L1 (SP142), or Ventana PD-L1 (SP263) for each other. Although they cannot be designated as “interchangeable”, the diagnostic accuracy of assays for a specific clinical purpose may be compared. In this manner, the comparison indirectly generates results that can be used to justify clinical usage of assays other than those included in the clinical trials. We employed ≥90% diagnostic sensitivity and ≥90% diagnostic specificity because these values are often used in other settings, including performance of immunohistochemistry assays [66-68]. While it is reasonable that a candidate assay should have at least 90% diagnostic sensitivity, it is unclear whether the required diagnostic specificity should be at the same level, or whether lower specificity could also be clinically acceptable. From the perspective of patient safety, lower diagnostic specificity could potentially be acceptable for those indications/purposes where clinical trials demonstrated that progression free survival, overall survival, and adverse effects in patients with PD-L1-negative tumors treated by immunotherapy are at least comparable if not better to that of conventional chemotherapy. The strengths of this meta-analysis are the focus on diagnostic accuracy, fit-for-purpose approach, and the access to previously unpublished data from a large number of studies, which all resulted in pooled PD-L1 assay comparison in a way that has not been done before. The most significant limitation is that this is a meta-analysis of test comparisons where designated reference standards are other tests rather than clinical outcomes. However, to complete a meta-analysis with clinical outcomes may not be possible for many years, if ever. Other limitations of this meta-analysis are that only two cut-off points were assessed (1% and 50%), no assessment for readout that includes inflammatory cells was included, the impact of pathologists’ readout as potential source of variation between the studies was not assessed, and it is somewhat uncertain how the results apply to tumors other than non-small cell lung cancer due to the smaller number of such studies.

Conclusions

The complexity of the PD-L1 immunohistochemistry testing cannot be safely simplified without consideration of the original test purpose. Determination of the diagnostic accuracy and indirect clinical validation of a candidate assay can be achieved by comparing the results of that assay to a previously designated reference standard assay, when direct access to clinical trial data or clinical outcomes is not possible.

Our meta-analysis indicates that

1) Well-designed, fit-for-purpose PD-L1 laboratory developed test candidate assays may achieve higher accuracy than PD-L1 Food and Drug Administration-approved kits that were designed and approved for a different purpose, when both are compared to an appropriate designated reference standard; 2) More candidate assays achieved ≥ 90% sensitivity and specificity for 50% tumor proportion score cut-off than for 1% tumor proportion score cut-off; 3) The overall diagnostic sensitivity and specificity analyses indicates that the relative analytical sensitivities of the Food and Drug Administration-approved kits for tumor cell scoring, most specifically in non-small cell lung cancer, are as follows: Ventana PD-L1 (SP142) << PD-L1 IHC 22C3 pharmDx < PD-L1 IHC 28-8 pharmDx < Ventana PD-L1 (SP263). Conflict of Interest (Appendix A) Meta-Analysis Methodology Supplementary Files Table 1 Supplemetary Figures 2 - 13 Funnel Plots
  50 in total

1.  GRADE guidelines: 3. Rating the quality of evidence.

Authors:  Howard Balshem; Mark Helfand; Holger J Schünemann; Andrew D Oxman; Regina Kunz; Jan Brozek; Gunn E Vist; Yngve Falck-Ytter; Joerg Meerpohl; Susan Norris; Gordon H Guyatt
Journal:  J Clin Epidemiol       Date:  2011-01-05       Impact factor: 6.437

2.  PD-L1 Testing in Cancer: Challenges in Companion Diagnostic Development.

Authors:  Aaron R Hansen; Lillian L Siu
Journal:  JAMA Oncol       Date:  2016-01       Impact factor: 31.777

3.  Pembrolizumab for the treatment of non-small-cell lung cancer.

Authors:  Edward B Garon; Naiyer A Rizvi; Rina Hui; Natasha Leighl; Ani S Balmanoukian; Joseph Paul Eder; Amita Patnaik; Charu Aggarwal; Matthew Gubens; Leora Horn; Enric Carcereny; Myung-Ju Ahn; Enriqueta Felip; Jong-Seok Lee; Matthew D Hellmann; Omid Hamid; Jonathan W Goldman; Jean-Charles Soria; Marisa Dolled-Filhart; Ruth Z Rutledge; Jin Zhang; Jared K Lunceford; Reshma Rangwala; Gregory M Lubiniecki; Charlotte Roach; Kenneth Emancipator; Leena Gandhi
Journal:  N Engl J Med       Date:  2015-04-19       Impact factor: 91.245

4.  Nivolumab in metastatic urothelial carcinoma after platinum therapy (CheckMate 275): a multicentre, single-arm, phase 2 trial.

Authors:  Padmanee Sharma; Margitta Retz; Arlene Siefker-Radtke; Ari Baron; Andrea Necchi; Jens Bedke; Elizabeth R Plimack; Daniel Vaena; Marc-Oliver Grimm; Sergio Bracarda; José Ángel Arranz; Sumanta Pal; Chikara Ohyama; Abdel Saci; Xiaotao Qu; Alexandre Lambert; Suba Krishnan; Alex Azrilevich; Matthew D Galsky
Journal:  Lancet Oncol       Date:  2017-01-26       Impact factor: 41.316

5.  Programmed Death Ligand-1 Immunohistochemistry--A New Challenge for Pathologists: A Perspective From Members of the Pulmonary Pathology Society.

Authors:  Lynette M Sholl; Dara L Aisner; Timothy Craig Allen; Mary Beth Beasley; Alain C Borczuk; Philip T Cagle; Vera Capelozzi; Sanja Dacic; Lida Hariri; Keith M Kerr; Sylvie Lantuejoul; Mari Mino-Kenudson; Kirtee Raparia; Natasha Rekhtman; Sinchita Roy-Chowdhuri; Eric Thunnissen; Ming Sound Tsao; Yasushi Yatabe
Journal:  Arch Pathol Lab Med       Date:  2016-01-18       Impact factor: 5.534

6.  Nivolumab versus Docetaxel in Advanced Nonsquamous Non-Small-Cell Lung Cancer.

Authors:  Hossein Borghaei; Luis Paz-Ares; Leora Horn; David R Spigel; Martin Steins; Neal E Ready; Laura Q Chow; Everett E Vokes; Enriqueta Felip; Esther Holgado; Fabrice Barlesi; Martin Kohlhäufl; Oscar Arrieta; Marco Angelo Burgio; Jérôme Fayette; Hervé Lena; Elena Poddubskaya; David E Gerber; Scott N Gettinger; Charles M Rudin; Naiyer Rizvi; Lucio Crinò; George R Blumenschein; Scott J Antonia; Cécile Dorange; Christopher T Harbison; Friedrich Graf Finckenstein; Julie R Brahmer
Journal:  N Engl J Med       Date:  2015-09-27       Impact factor: 91.245

7.  Durvalumab after Chemoradiotherapy in Stage III Non-Small-Cell Lung Cancer.

Authors:  Scott J Antonia; Augusto Villegas; Davey Daniel; David Vicente; Shuji Murakami; Rina Hui; Takashi Yokoi; Alberto Chiappori; Ki H Lee; Maike de Wit; Byoung C Cho; Maryam Bourhaba; Xavier Quantin; Takaaki Tokito; Tarek Mekhail; David Planchard; Young-Chul Kim; Christos S Karapetis; Sandrine Hiret; Gyula Ostoros; Kaoru Kubota; Jhanelle E Gray; Luis Paz-Ares; Javier de Castro Carpeño; Catherine Wadsworth; Giovanni Melillo; Haiyi Jiang; Yifan Huang; Phillip A Dennis; Mustafa Özgüroğlu
Journal:  N Engl J Med       Date:  2017-09-08       Impact factor: 91.245

Review 8.  Programmed Death-Ligand 1 Immunohistochemistry Testing: A Review of Analytical Assays and Clinical Implementation in Non-Small-Cell Lung Cancer.

Authors:  Reinhard Büttner; John R Gosney; Birgit Guldhammer Skov; Julien Adam; Noriko Motoi; Kenneth J Bloom; Manfred Dietel; John W Longshore; Fernando López-Ríos; Frédérique Penault-Llorca; Giuseppe Viale; Andrew C Wotherspoon; Keith M Kerr; Ming-Sound Tsao
Journal:  J Clin Oncol       Date:  2017-10-20       Impact factor: 44.544

9.  QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies.

Authors:  Penny F Whiting; Anne W S Rutjes; Marie E Westwood; Susan Mallett; Jonathan J Deeks; Johannes B Reitsma; Mariska M G Leeflang; Jonathan A C Sterne; Patrick M M Bossuyt
Journal:  Ann Intern Med       Date:  2011-10-18       Impact factor: 25.391

Review 10.  PD-L1 diagnostic tests: a systematic literature review of scoring algorithms and test-validation metrics.

Authors:  Margarita Udall; Maria Rizzo; Juliet Kenny; Jim Doherty; SueAnn Dahm; Paul Robbins; Eric Faulkner
Journal:  Diagn Pathol       Date:  2018-02-09       Impact factor: 2.644

View more
  43 in total

1.  SITC cancer immunotherapy resource document: a compass in the land of biomarker discovery.

Authors:  Siwen Hu-Lieskovan; Srabani Bhaumik; Kavita Dhodapkar; Jean-Charles J B Grivel; Sumati Gupta; Brent A Hanks; Sylvia Janetzki; Thomas O Kleen; Yoshinobu Koguchi; Amanda W Lund; Cristina Maccalli; Yolanda D Mahnke; Ruslan D Novosiadly; Senthamil R Selvan; Tasha Sims; Yingdong Zhao; Holden T Maecker
Journal:  J Immunother Cancer       Date:  2020-12       Impact factor: 13.751

2.  Real-world outcomes of anti-PD1 antibodies in platinum-refractory, PD-L1-positive recurrent and/or metastatic non-small cell lung cancer, and its potential practical predictors: first report from Korean Cancer Study Group LU19-05.

Authors:  Hyonggin An; Jin Hyoung Kang; Ji Hyun Park; Gun Lyung You; Myung-Ju Ahn; Sang-We Kim; Min Hee Hong; Ji-Youn Han; Chan-Young Ock; Jong-Seok Lee; In Jae Oh; Shin Yup Lee; Cheol Hyeon Kim; Young Joo Min; Yoon Hee Choi; Jeong-Seon Ryu; Sun Hyo Park; Hee Kyung Ahn; Byoung-Yong Shim; Ki Hyeong Lee; Sung Yong Lee; Jin-Soo Kim; Jiun Yi; Su Kyung Choi
Journal:  J Cancer Res Clin Oncol       Date:  2021-02-01       Impact factor: 4.553

3.  A Phase II Trial of Albumin-Bound Paclitaxel and Gemcitabine in Patients with Newly Diagnosed Stage IV Squamous Cell Lung Cancers.

Authors:  Paul K Paik; Rachel K Kim; Linda Ahn; Andrew J Plodkowski; Ai Ni; Mark T A Donoghue; Philip Jonsson; Miguel Villalona-Calero; Kenneth Ng; Daniel McFarland; John J Fiore; Afsheen Iqbal; Juliana Eng; Mark G Kris; Charles M Rudin
Journal:  Clin Cancer Res       Date:  2020-01-09       Impact factor: 12.531

4.  Routine Molecular Pathology Diagnostics in Precision Oncology.

Authors:  Carina Wenzel; Sylvia Herold; Martin Wermke; Daniela E Aust; Gustavo B Baretton
Journal:  Dtsch Arztebl Int       Date:  2021-04-16       Impact factor: 5.594

5.  Society for Immunotherapy of Cancer (SITC) clinical practice guideline on immunotherapy for the treatment of breast cancer.

Authors:  Leisha A Emens; Sylvia Adams; Ashley Cimino-Mathews; Mary L Disis; Margaret E Gatti-Mays; Alice Y Ho; Kevin Kalinsky; Heather L McArthur; Elizabeth A Mittendorf; Rita Nanda; David B Page; Hope S Rugo; Krista M Rubin; Hatem Soliman; Patricia A Spears; Sara M Tolaney; Jennifer K Litton
Journal:  J Immunother Cancer       Date:  2021-08       Impact factor: 13.751

Review 6.  PD-L1 as a biomarker of response to immune-checkpoint inhibitors.

Authors:  Deborah Blythe Doroshow; Sheena Bhalla; Mary Beth Beasley; Lynette M Sholl; Keith M Kerr; Sacha Gnjatic; Ignacio I Wistuba; David L Rimm; Ming Sound Tsao; Fred R Hirsch
Journal:  Nat Rev Clin Oncol       Date:  2021-02-12       Impact factor: 66.675

Review 7.  Programmed cell death-ligand 1 assessment in urothelial carcinoma: prospect and limitation.

Authors:  Kyu Sang Lee; Gheeyoung Choe
Journal:  J Pathol Transl Med       Date:  2021-04-07

8.  Plasma complement C7 as a target in non-small cell lung cancer patients to implement 3P medicine strategies.

Authors:  Jae Gwang Park; Beom Kyu Choi; Youngjoo Lee; Eun Jung Jang; Sang Myung Woo; Jun Hwa Lee; Kyung-Hee Kim; Heeyoun Hwang; Wonyoung Choi; Se-Hoon Lee; Byong Chul Yoo
Journal:  EPMA J       Date:  2021-11-25       Impact factor: 6.543

Review 9.  Novel uses of immunohistochemistry in breast pathology: interpretation and pitfalls.

Authors:  Ashley Cimino-Mathews
Journal:  Mod Pathol       Date:  2020-10-27       Impact factor: 7.842

10.  Integration of Spatial PD-L1 Expression with the Tumor Immune Microenvironment Outperforms Standard PD-L1 Scoring in Outcome Prediction of Urothelial Cancer Patients.

Authors:  Veronika Weyerer; Pamela L Strissel; Reiner Strick; Danijel Sikic; Carol I Geppert; Simone Bertz; Fabienne Lange; Helge Taubert; Sven Wach; Johannes Breyer; Christian Bolenz; Philipp Erben; Bernd J Schmitz-Draeger; Bernd Wullich; Arndt Hartmann; Markus Eckstein
Journal:  Cancers (Basel)       Date:  2021-05-12       Impact factor: 6.639

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.