Literature DB >> 32216035

A critical review of graphics for subgroup analyses in clinical trials.

Nicolás M Ballarini¹, Yi-Da Chiu^2,3, Franz König¹, Martin Posch¹, Thomas Jaki⁴.

Abstract

Subgroup analyses are a routine part of clinical trials to investigate whether treatment effects are homogeneous across the study population. Graphical approaches play a key role in subgroup analyses to visualise effect sizes of subgroups, to aid the identification of groups that respond differentially, and to communicate the results to a wider audience. Many existing approaches do not capture the core information and are prone to lead to a misinterpretation of the subgroup effects. In this work, we critically appraise existing visualisation techniques, propose useful extensions to increase their utility and attempt to develop an effective visualisation approach. We focus on forest plots, UpSet plots, Galbraith plots, subpopulation treatment effect pattern plot, and contour plots, and comment on other approaches whose utility is more limited. We illustrate the methods using data from a prostate cancer study.

Entities: Chemical

Keywords: Galbraith plot; STEPP; UpSet plot; contour plot; data visualisation; exploratory data analysis; forest plot; treatment effect heterogeneity

Mesh：

Year: 2020 PMID： 32216035 PMCID： PMC8647927 DOI： 10.1002/pst.2012

Source DB: PubMed Journal: Pharm Stat ISSN： 1539-1604 Impact factor: 1.894

INTRODUCTION

Investigating target populations that potentially benefit from an innovative intervention is essential in clinical trials. Even if efficacy is established in the overall population, a complete benefit/risk assessment of subgroups should be undertaken before deciding whether the treatment is administered to the whole population or targeted to specific subgroups.1 Such investigations pose numerous challenges such as recruiting patients with diverse baseline characteristics, which may create a large number of subgroups. The presence of promising results in subgroup analyses can be attributed to small sample sizes or to the fact that many potential subgroups are explored, which affects the credibility of the findings. Subgroup analyses might be prospective or post‐hoc in different settings of clinical trials. Their primary purpose could be to establish efficacy claims, subgroup discovery and/or consistency assessments across subgroups. Many researchers have proposed novel analysis approaches and trial designs for different types of subgroup analysis.2, 3, 4 Subgroups have further received extensive attention in recent clinical research for the development of stratified medicine. Visualisation techniques, when properly used, are powerful tools. It is argued that graphics allow a more direct interpretation of results than tables.5 There is extensive literature on principles for good graphics in general6, 7, 8, 9, 10, 11, 12 particularly in visualisation of healthcare data.13, 14, 15 It is also true that good graphics require careful crafting16 and there is scope to improve when it comes to figures found in clinical trial reports.17, 18 Graphical approaches are routinely employed in subgroup analysis, typically for describing treatment effect sizes of subgroups. Such visualisations encapsulate subgroup information and boost the clinical decision‐making process. However, current literature does not adequately provide solutions to producing effective graphics in subgroup analyses. Existing approaches still have inherent drawbacks and their use may lead to misinterpretations of subgroup effect sizes.2 In this article, we critically evaluate and refine effective visualisation approaches for subgroup analysis. Our considerations apply mainly to exploratory settings. Some of these visualisations have previously been proposed for subgroup analysis and were refined in this work. There are existing alternative techniques primarily developed for other applications which we have applied and/or extended to provide visual insight of subgroup information. The remainder of the article is structured as follows. In Section 2 we describe: the framework for assessment, the dataset we use for illustration, and the graphical approaches for displaying subgroup information. We focus on graphics that allow a direct comparison of subgroup treatment effects. We summarise the findings in the case study and the assessments and features of all graphical approaches in Section 3. Section 4 provides a conclusion with final remarks.

GRAPHICAL APPROACHES TO SUBGROUP PROBLEMS

Framework to assess the properties of the graphical displays

It is fundamental that graphics in subgroup analysis display treatment effects for the subgroups under considerations. There are several other desirable characteristics for graphical approaches as initial subgroup analysis tools. Displaying sample sizes and uncertainty measures underpins the credibility of promising and adverse findings within subgroups. While many subgroup analysis techniques consider subgroups that are defined based on each baseline factor separately (univariate subgroups), it is also important to reveal information on those defined based on multiple factors (multivariate subgroups). For example, instead of looking at the subgroups defined by gender (male/female) and bone metastasis (yes/no) separately, it may be of interest to look at the intersection of the marginal subgroups: male with bone metastasis, male without bone metastasis, female with bone metastasis, and female without bone metastasis. These characteristics can certainly constitute sensible criteria for assessment. Our framework to assess the properties of the graphical displays consists of the criteria outlined in Table 1.

Table 1

Criteria to assess the properties of the graphical displays

Criteria	Label	Description
C1	Effect size	Displays effect sizes for subgroups
C2	Uncertainty	Provides confidence intervals or standard errors of the treatment effect estimates
C3	Sample size	Exhibits subgroup sample sizes
C4	Intersections	Shows effect sizes for multivariate subgroups
C5	Many covariates	Applicable to a large number of subgroup‐defining covariates

Criteria to assess the properties of the graphical displays Each graphical approach is judged according to whether it meets the criteria set out. Even if a criterion is met, the information may be represented or encoded differently. For example, some graphics show the treatment effects in the subgroups using a colour scale while others represent them with the position of a point along a common scale. We discuss different levels of information in each of the graphics.

Case study: The prostate cancer dataset

To illustrate the different graphical approaches, we use data from a prostate carcinoma clinical trial19 which is available on the web20 and has previously been used to demonstrate subgroup selection methods.21 The trial included 506 subjects that were randomised to either a placebo group or one of three dose levels of diethylstilbestrol. In line with previous work, we combine the placebo and the lowest dose level of diethylstilbestrol to give the control arm, and the higher doses to give the experimental arm. Only 475 subjects with complete data are used in our illustration. We aim to describe the estimates for treatment effect across the different subgroups of patients. To illustrate the graphics, we consider six pre‐treatment covariates, four of which are binary and two continuous: existence of bone metastasis (bm: 0, no; 1, yes), disease stage (3 or 4), performance rating (pf: 0, normal; 1, limitation of activity), history of cardiovascular events (hx 0, no; 1, yes), age, and weight index (wt: weight in kg − height in cm + 200). The considered endpoint in this analysis is death from all causes combined, and the log‐hazard ratio for treatment vs control is used as the treatment effect measure.

Visualisation methods

In this subsection, we present the graphical approaches that are best suited for subgroup analysis based on our review. The first three approaches, Galbraith, forest, and UpSet plots, apply to both binary and categorical subgroup‐defining covariates. We also include two methods, subpopulation treatment effect pattern plot (STEPP) and contour plots, that allow exploring changes in the treatment effect over one or two continuous variables, respectively, as it is suggested in the current EMA guideline.1 These five approaches represent or provide a measure of the treatment effect and therefore allow direct comparison across subgroups. Additional graphics that we found less practical are deferred to Appendix available online as Supporting Information while other approaches that may be used to describe subgroup composition but do not fulfil the criterion C1 (effect size) are presented in Supporting Information. In most of the graphics, for simplicity, the treatment effect is estimated by merely partitioning the dataset and using only subjects from the considered subgroups. We acknowledge that there are more advanced approaches that make more efficient use of the data,1 but these are not required to fulfil the purposes of this article (see also Reference 4, 22, and 23). The graphics evaluated in this article can be used to display the treatment effect estimates resulting from such techniques. All graphics are created using the R statistical software24 and the code is publicly available as an R package for reproducibility.25 In most of the cases, we draw the plots using functions from the grid and graphics packages which are part of the base R language. For some of the plots, we use additional packages that are cited in each section accordingly. Although we acknowledge that the choice of colours is an essential and challenging task when producing graphics, we do not discuss this topic in our work as it is discussed elsewhere.26, 27 Several of the considered plots make use of colour coding to represent the magnitude of the treatment effect across subgroups, for which we use a divergent colour palette generated by the colorspace R package.28 We follow Tufte's principles29 to enhance graphical integrity. This is particularly relevant when we depict sample sizes with two‐dimensional areas/shapes which are proportional to one‐dimensional sample sizes. Numerical quantities are then properly represented and comparisons can be made accurately. Additionally, we take into account research on graphical perception30 to judge the graphics.

Forest plot

Although forest plots are a common graphic used in meta‐analysis,31 they are also extensively used for subgroup analysis.32, 33 Figure 1 shows its application for the prostate cancer dataset considering the four binary covariates. The middle panel displays the subgroup treatment effect estimates with their confidence intervals. The squares in the centre of each error bar are proportional to the subgroup sample sizes. A vertical line at the overall treatment effect level is added to facilitate seeing if a subgroup confidence interval differs significantly from the overall effect.32 Additional information in a table format are usually included to provide the magnitude of the estimates. The text in the left panel shows the estimates of the treatment effects, lower/upper bounds of the 95% confidence intervals and subgroup sample sizes (further divided into treatment and control arms). When using continuous endpoints, it is appropriate to display the mean response for each treatment arm in an additional panel. In our implementation for a survival endpoint, we include the Kaplan‐Meier estimate for each subgroup. The summary statistics in the left panel and the survival curves on the right may be dropped if additional space is required. The Kaplan‐Meier estimates are drawn with the ggplot2 package.34

Figure 1

Forest plot for subgroups defined by performance (pf), stage, history of cardiovascular events (hx), and existence of bone metastasis (bm). Effect sizes in terms of the log‐hazard ratio and associated treatment and control group Kaplan‐Meier curves are displayed Forest plots are popular because they are simple and effective. In the main panel, they allow a direct comparison of the treatment effect estimates with low cognitive effort. According to our assessment, forest plots meet C1 and C2 displaying treatment effects and confidence intervals. Criteria C3 and C5 are also met as the subgroup sample sizes are depicted through the area of the treatment effect and many subgroup‐defining covariates can be easily displayed. A downside of forest plots is that as subgroup intersections (C4) are not shown. In Figure 1, it is quite clear that the subgroup defined by a positive outcome for bone metastasis is the subgroup with the largest benefit from the treatment since the log‐hazard ratio is negative. Interestingly, its upper confidence interval does not cover the average treatment effect, therefore suggesting treatment effect heterogeneity. The Kaplan‐Meier curves also allow to rapidly recognise the differential survival pattern for the subgroup with bone metastasis: patients with bone metastasis in the control group have shorter survival when given the control treatment, while those in the treatment group have a survival pattern that is similar to patients without bone metastasis. For the rest of the subgroups, their treatment effect estimates are closer to the estimate in the overall population. While we observe a positive log‐hazard ratio for some subgroups suggesting the experimental treatment is worse than control, all their confidence intervals cover the average treatment effect, which implies that no treatment effect heterogeneity is present.

UpSet plot

UpSet plots are a novel visualisation technique for the quantitative analysis of sets and their intersections.35 It was proposed to overcome the restriction to a small number of sets of Venn diagrams. In Figure 2, we use the UpSetR R package36 to create the plot with four binary subgroup‐defining covariates. The sizes of the univariate subgroups for these covariates are shown in the horizontal bar plot in the bottom‐left corner of the figure. The matrix layout on the bottom allows visualising the composition of the subgroup by showing which sets are intersected. The main bar plot displays the sizes of the subgroups that are defined by the respective intersections. For example, the first and tallest bar indicates there are 115 subjects with normal performance rating (pf = 0), no existence of bone metastasis (bm = 0), disease stage 3, and history of cardiovascular events (hx = 1). Moreover, we display the number of subjects in each treatment arm for each subset.

Figure 2

Upset plot displaying the subgroups formed by the intersection of all binary subgroup‐defining covariates

Upset plot displaying the subgroups formed by the intersection of all binary subgroup‐defining covariates We extend the UpSetR R package to display effect sizes in an extra panel (Figure 3). While the log‐hazard ratio and its confidence interval for each subgroup are shown as in a forest plot, the UpSet plot provides the advantage of displaying intersections of sets. If one were to use a statistical model with treatment‐by‐covariate interactions to derive the treatment effect estimates, then each row would correspond to a linear combination of the coefficients in the model.

Figure 3

Improved UpSet plot for subgroups defined by performance (pf), bone metastasis (bm), and history of cardiovascular events (hx). The panel on the left (matrix) displays how the subgroups are formed by assigning a “+” if the variable is equal to 1 and a “−” if the variable is equal to 0. The bar plot on top of the matrix panel indicates the marginal set sizes in relation to the total sample size, with the black region corresponding to the 1 or “yes” category and the white region corresponding to the 0 or “no” category. Treatment effect sizes and their confidence intervals are displayed in the panel in the middle and the subgroup sizes in the horizontal bar plot on the right Our extension of the UpSet plot also allows displaying lower level intersections as compared to the original UpSet proposal. We implement a new icon for the matrix panel: a “+” symbol if a variable is equal to 1 or “yes,” a “−” if a variable is equal to 0 or “no,” and empty if this variable is not considered for the subgroup definition. For example, the first bar of the plot corresponds to the overall population (no subgroup division), which has a size of 475. The second bar with a size of 428 corresponds to the subgroup of normal performance rating (pf = 0), irrespective of the values of the other two variables. Since the number of subgroups to consider increases dramatically in this modification (3 subgroups when considering p binary covariates), only three covariates are used. One could include more covariates and filter the number of subgroups according to different criteria, such as total subgroup sample size or sample size per treatment. Finally, the bar plot on top of the matrix panel indicates the marginal subgroup sizes with the black region corresponding to the 1 or “yes” category and the white region corresponding to the 0 or “no” category. The UpSet plot loses the simplicity observed in forest plots and requires the beholder to be familiar with the graphical approach before drawing conclusions. Nevertheless, the UpSet plot has some advantages. Effect sizes (C1) and confidence intervals (C2) are displayed as in a forest plot and many covariates (C5) can also be used. Compared to a forest plot, subgroup sample sizes (C3) are displayed in a panel as a bar plot. This is a more effective way to display the information in contrast to the proportional areas in the forest plot. Another advantage is that the UpSet plot shows subgroup intersections (C4) and allows inferring relations among the subgroups. In our example, we order the subgroups in terms of their sizes, but it is also possible to arrange the subgroups according to their effect sizes or the number of subgroup‐defining covariates involved in their composition. As the overall treatment effect and its confidence interval are also included, it allows to compare treatment effects and check for treatment effect heterogeneity. However, unlike a forest plot, it does not show the mean response for treatment and control arms in each subgroup.

Galbraith plot

A Galbraith plot37, 38 is an alternative to a forest plot for examining heterogeneity among studies or subgroups in a meta‐analysis. The variant that is shown in Figure 4 exhibits the estimation of treatment effect sizes for K = 8 subgroups defined by the four binary covariates. The xy‐coordinates correspond to the points: where is the treatment effect estimate in the full population and is the treatment effect estimate in subgroup i, . The grey band can be used to detect effect heterogeneity. Points outside the band show larger than expected heterogeneity. The slope of the line from the origin through each subgroup point corresponds to the effect size estimate of the corresponding subgroup. An additional radial axis is drawn to depict the subgroup effect sizes which are represented with the red tick marks. The central line at y = 0 points to the average treatment effect for the full population. This plot was drawn using the ggplot2 R package together with ggrepel to avoid overlapping labels.

Figure 4

Galbraith plot for subgroups defined by existence of bone metastasis (bm), history of cardiovascular events (hx), stage, and performance rating (pf)

Galbraith plot for subgroups defined by existence of bone metastasis (bm), history of cardiovascular events (hx), stage, and performance rating (pf) We note that, as is itself a random variable, it might better to consider its variance. This can be achieved by considering the xy‐coordinates: The resulting plot is given in Supporting Information. The drawback of this modification is that the x‐axis does no longer represent the standard error of the treatment effect estimates. The result of the graphical assessment of Galbraith plots is satisfactory, since it displays effect sizes (C1), standard deviations (C2), and a large number of subgroup‐defining covariates can undoubtedly be used (C5). On the other hand, this plot does not display sample sizes (C3) nor intersections (C4). Although Galbraith plots might require more effort to be explained and understood, these plots can certainly handle a large number of subgroup covariates, perhaps better than any of the other considered graphics. In this case, special care needs to be paid to the labels of subgroups and the location of red tick marks as they may not be distinguishable. In terms of our example, we conclude, just as in the forest plot, that treatment effect heterogeneity may be present in the subgroup of patients with bone metastasis since its point is immediately visible outside the grey band.

Subpopulation treatment effect pattern plot

The STEPP39, 40 gained popularity in breast cancer recently. It is a non‐parametric method mainly for examining whether treatment‐covariate interactions exist. In Figure 5, we adopted the slide‐window fashion of STEPP to represent the estimation of treatment effect size (log‐hazard ratio) in overlapping subgroups defined by age. To do so, we form subgroups with sample sizes of around with an overlap of subjects with immediately neighbouring subgroups. The band bounded by the blue dashed lines is constructed for 95% simultaneous confidence interval. The other band bounded by the orange dashed lines is built based on individual 95% CI (without multiplicity adjustment). The red line is formed by connecting the point estimates of treatment effect (log‐hazard ratio) for all formed subgroups. The green line represents the log‐hazard ratio estimate for the full patient population. It is worth noting that the point estimates are positioned at the mean value of age for each subgroup for the x‐axis. If the green line does not lie in the region formed by simultaneous confidence intervals, it reveals that interaction may exist.

Figure 5

STEPP plot of overlapping subgroups defined by age. Each subgroup has a sample size of around and is controlled to have about subjects overlapping with the neighboring subgroups. STEPP, subpopulation treatment effect pattern plot In the original publication,40 the points were placed equidistantly along the x‐axis annotating the median values of the variable for each subgroup as reference. An illustration of this alternative plot is given in Supporting Information. We believe it is better to use the proper scale to reflect the mean (or median) values of the variable used to define subgroups. This helps indicate whether the values cover a small or large range of the variable of interest. It is a quite common problem in subgroup analysis to define subgroups based on continuous biomarkers. Since it is advised against using arbitrary cutoff points in initial subgroup investigations, STEPP plots are a good way to characterise changes of the estimated treatment effect over the range of the considered continuous covariate. This is the suggestion from the current EMA Guideline on the investigation of subgroups in confirmatory clinical trials.1 The STEPP approach satisfies C1 displaying effect sizes and C2 for displaying confidence intervals. Here, the subgroup sample sizes (C3) are adopted by design and only annotated in the figure title but are not represented graphically. This plot only considers one continuous covariate and therefore, C4 (intersections) cannot be met. The plot does show intersections of contiguous subgroups, where the total number of subgroups depends on the sample size of subgroups and the overlap proportions. In some situations, it might not be clear how to determine the subgroup sizes or overlap and sensitivity analyses might need to be conducted for different configurations. The analysis results may further be compared with the results when using fractional polynomials41, 42 or non‐parametric methods such as Gaussian processes.43 In Figure 5, we observe that the treatment effect for subgroups defined by age fluctuates closely around the overall treatment effect. When approaching the ends of the range of the covariate the estimate of the log‐hazard ratio departs from the estimate for the full population although the confidence intervals for the subgroup treatment effects still cover the overall effect. This graph may be particularly useful to derive subgroups from a continuous variable.

Contour plot

While STEPP considers only one continuous biomarker, a contour plot could be regarded as an extension to explore continuous changes in two continuous biomarkers. We propose two different implementations of contour plots for the treatment effects across age and weight. In Figure 6A, subgroups of sample sizes N 11 are formed by using a horizontal sliding window across the values of age with an overlap of N 12 subjects. Subsequently, each subgroup is further divided into smaller subgroups of sample sizes, N 21, using a vertical sliding window across the values of weight with an overlap of N 22. Sample sizes and overlap to form subgroups are adopted by design based on sensible judgement. For example, subgroups should have a considerable sample size to ensure that patients in both treatment and control arms are represented. For each formed subgroup, we then calculate the log‐hazard ratio for treatment vs control. The contour areas are obtained through a bivariate interpolation and smooth surface fitting (LOESS) for irregularly distributed data points over the range of values from the subjects under study. A divergent colour scale is used for the effect sizes. A limitation of this approach is that there may be regions of the covariate space in which the treatment effect estimates are not reliable due to small sample sizes or no data points.

Figure 6

Contour plot of treatment effect in terms of the log‐hazard ratio over the plane of age and weight. A, Contour lines are drawn by forming subgroups with neighboring subjects, calculating the treatment effect for subgroups and interpolating the results using loess. N 11 stands for the sample size of a marginal subgroup defined by a range of age, N 12 is the overlap size of the immediate marginal subgroups on age, N 21 is the sample size of the subset of a marginal subgroup on age but further defined by a range of weight, and N 22 is the overlap size of the immediate subgroups (which are the subset of a marginal subgroup on age) on weight. B, Contour lines are drawn by fitting a local regression at each point of the grid, using subjects weights according to their distance to the point of the grid. Points with few subjects in the vicinity of the grid point were left blank We also propose using local regression techniques to calculate the treatment effect at each coordinate. In Figure 6B, a weighted Cox proportional‐hazards model is fitted at each combination of weight and age (using a step of 1 unit). A normal kernel with the centre at the coordinate values under consideration is used to assign weights to each subject. If there are less than 20 subjects within two standard deviations, the effect size is not calculated and the area is left blank. This helps to avoid extrapolating the results to areas in which we do not have enough information. Contour plots match criterion C1 since effect sizes are represented through a colour scale which is one of the least accurate ways to encode information.30 This is because for a particular coordinate in the plot, it might be hard to decipher what is the precise value of the treatment effect. This plot helps to uncover patterns in specific regions of continuous covariates that might not be visible otherwise. Contour plots also meet C4 as the intersection of two subgroup‐defining covariates are used. The uncertainty of the treatment effect estimates (C2) and sample sizes (C3) are not represented in the graphic which is a significant drawback. Contour plots only consider two covariates. Contour plots are particularly useful when there are enough subjects well distributed over the entire range of the covariates of interest. The interpolated treatment effect sizes may be unreliable in regions where there are no data points. In situations where the values of two covariates are sparsely distributed over the region, it can be unclear how smooth the interpolated surface should be. Note that it is also possible to use other local regression algorithms to calculate the treatment effect at each coordinate or even other modelling strategies such as including a generalised additive model with interactions.41 Recent proposals that investigate the predicted individual treatment effect can also be applied to estimate the effect of treatment across the covariate space.44, 45, 46 We observe that older patients seem not to benefit from the new treatment. However, this interpretation should be cautious as the precision of the estimates is not displayed.

Additional graphical approaches

We also consider further graphical approaches that may be applied to the subgroup analysis framework: level plot, mosaic plot, Venn diagram, bar chart, tree plot, L'Abbé plot, chord diagram, and Coxcomb plot. Compared with the aforementioned methods, their assessment is less favourable and hence they are only presented in Appendix available online as Supporting Information. In most of the cases, they convey the information of treatment effect through colour coding. This way of presenting the information is more challenging to decode. Additionally, most of them do not display a measure of uncertainty for the treatment effect estimates which is essential for assessing treatment effect heterogeneity. The use of auxiliary plots might help to display additional information, such as overlap between subgroups, that might be relevant. The Supporting Information provides an overview of some options. Some of the graphics allow visualising subgroup composition or overlap between subgroups by displaying the relative overlap or dissimilarity measures. Other graphics, such as a mosaic plot with a binary response, an alluvial plot or a coxcomb plot, may complement the analysis by displaying absolute response rates in treatment and control arms across subgroups.

SUBGROUP ANALYSIS SUMMARY

Summary of case study: The prostate cancer dataset

Throughout the manuscript, we have analysed the prostate cancer dataset to explore subgroup effects. Here, we present an overall summary of the main findings related to subgroups. In the forest plot (Figure 1), we explored the marginal treatment effects for subgroups defined by binary covariates. The treatment effect was similar across all the subgroups except for the group of patients with bone metastasis. The graph suggests that patients with bone metastasis might have larger benefit from the experimental treatment because the confidence interval for this subgroup does not cover the line that represents the treatment effect in the overall population. The same pattern is observed using a Galbraith plot (Figure 4), as the only point lying outside the (−2, 2) band is the one corresponding to this subgroup. Figure 3 allows, in addition, to observe subgroups formed by the subgroup intersections. It can be seen that patients without bone metastasis and with a history of cardiovascular events might have been harmed by the experimental treatment. The variable age was explored alone in Figure 5 and together with weight in Figures 6. In the latter, we find that the treatment appears more beneficial for younger patients with weight index above 90, while for older patients the treatment may have led to worse outcomes than control. We remind here that these analyses are exploratory and must be interpreted with care. Despite this, they may bring useful insights to plan additional studies and collect more information from subgroups of interest in the future.

Summary of graphical methods

In this section, we provide a summary regarding the criteria C1 to C5 presented in Table 1. We discuss only the graphics presented in the previous section. The assessment and characteristics of the improved graphical approaches are summarised in Table 2 where we also include the graphics in Appendix available online as Supporting Information.

Table 2

The assessment summary of graphical techniques for subgroup problems

	Criterion					Additional features
	C1	C2	C3	C4	C5	T/C effect	Covariate type
Galbraith plota	✓	✓			✓		B/Cat
Forest plota	✓	✓	✓		✓	✓	B/Cat
UpSet plota	✓	✓	✓	✓	✓		B/Cat
STEPP	✓	✓					Cont
Contour plot	✓			✓			Cont
Tree plot	✓	✓		✓			B/Cat
Level plot	✓		✓	✓			B/Cat
Mosaic plot	✓		✓	✓			B/Cat
Venn diagrama	✓		✓	✓			B
Bar chart	✓		✓	✓			B/Cat
L'Abbé plota	✓		✓		✓	✓	B/Cat
Chord diagram	✓		✓		✓		B/Cat
Coxcomb plot	✓		✓	✓			B/Cat

Note: The assessment criteria are: C1: effect size; C2: uncertainty; C3: sample size; C4: intersections; C5: many covariates. T/C effects stands for displaying the average response in treatment and control arms. Covariate types are B = binary, Cat = categorical, Cont = continuous. Graphics in Appendix available online as Supporting Information are also included in this table for comparison purposes.

Abbreviation: STEPP, subpopulation treatment effect pattern plot.

The plot has been improved or modified to make it available for the subgroup analysis framework.

The assessment summary of graphical techniques for subgroup problems Note: The assessment criteria are: C1: effect size; C2: uncertainty; C3: sample size; C4: intersections; C5: many covariates. T/C effects stands for displaying the average response in treatment and control arms. Covariate types are B = binary, Cat = categorical, Cont = continuous. Graphics in Appendix available online as Supporting Information are also included in this table for comparison purposes. Abbreviation: STEPP, subpopulation treatment effect pattern plot. The plot has been improved or modified to make it available for the subgroup analysis framework. C1 (effect size): This information is encoded in different ways in the studied graphics. Forest plots, UpSet plots, Galbraith plots, and STEPP allow a straightforward comparison across subgroups as the treatment effect estimates are illustrated along a common axis. This way of encoding information is the most accurate according to theoretical arguments and experimental results on graphical perception.30 Contour plots use a less accurate encoding that is effective to only give a general overview of the estimated treatment effect over the range of the covariates. Therefore, even if all of the graphical techniques satisfy the primary criterion of displaying subgroup treatment effect sizes, some may be more effective in communicating the results from the analysis than others. The judgement of heterogeneity also depends on the treatment effect estimate in the full population, which is displayed in all the considered graphics. Additionally, forest plots can provide absolute subgroup responses for the treatment and control arms. C2 (uncertainty): Forest plot, STEPP, and UpSet plot display the confidence intervals of the treatment effects while Galbraith plot shows their standard error. This is important since visualisations that do not adequately demonstrate the uncertainty in the estimates may be misleading and can lead to an over‐interpretation of the heterogeneity among subgroup effects. C3 (sample size): Only UpSet plot and forest plot provide a visual display on subgroup sample sizes. The UpSet plot displays the subgroup sample sizes in an additional panel using a bar plot which allows more efficient and accurate comparison of subgroup sizes in contrast to the forest plot. While one can add a bar plot showing sample sizes to any of the other graphics, the particular assembly of the UpSet plot enables to decode the information quickly and efficiently. C4 (intersections): This criterion is only met for UpSet plots and contour plots. UpSet plots can display intersections of two or more subgroups remarkably, allowing great flexibility in how the information is presented. C5 (many covariates): Forest plots, Galbraith plots, and UpSet plots can display a large number of subgroups‐defining covariates. However, Galbraith plots should be highlighted in this criterion as its design makes it more appropriate when considering a large number of covariates.

DISCUSSIONS AND CONCLUSION

We made use of several graphical approaches and assessed their characteristics for subgroup problems. We also attempted to improve some methods correcting flaws or adapting graphics for the subgroup analysis setting. It is important to note that the considered graphical approaches are descriptive only and do not adjust for potential selection bias of point estimates, inflated type 1 errors due to multiple testing, or reduced simultaneous coverage probabilities of confidence intervals. These consequences of multiple testing and selective estimation may become substantial as the number of considered subgroups increases. In exploratory settings where the definition and selection of subgroups are post‐hoc and may be data‐driven, frequentist error rates or coverage probabilities cannot be controlled anyway. In contrast, if the subgroups to be considered are pre‐defined (or selected independently of outcome data) there is a broad range of statistical approaches available to account for the associated multiplicity.3, 47 Most of the considered graphical approaches can be used to show multiplicity adjusted treatment effects and uncertainty measures. One can, for example, use simultaneous confidence intervals based on the Bonferroni correction, post‐selection confidence intervals,46 treatment effects estimates after model averaging,48 bias‐adjusted estimates,21 and so on. Comparative plots showing both the adjusted and unadjusted estimates may also provide valuable insights. In this article, we provide tools to visualise essential information on subgroups, as effect size estimates and subgroup sample sizes. The considered approaches are descriptive only and serve as exploratory tools for hypotheses generation for future investigations. The choice of the visualisation method depends on: the type of biomarkers that define the subgroups, the type of outcome variable, the sample sizes, and the objective of the subgroup analysis. For example, we have seen that contour plots and STEPP are only suitable for continuous covariates, while the other plots allow the use of binary or categorical covariates. On the other hand, Galbraith plots might be particularly suited for the case of a very large number subgroups and Forest plots may show not only the treatment effect estimates but also the average response in each treatment arm. As some graphics do not display all information, combining several plots can be advantageous. In this work, we focused on non‐interactive graphical displays. We recognise the usefulness of adding interactivity which can improve the flexibility of the studied graphics. For example, there exist work on interactive mosaic plots49 which allows easy inclusion of many subgroup‐defining covariates avoiding the problem of overlapping labelling. Interactive UpSet plots allow inclusion/exclusion of covariates, ordering them according to different characteristics, and displaying additional variables; which makes this graphic a powerful analysis tool (https://caleydo.org/tools/upset/). Galbraith plots might benefit from interactivity when using a large number of covariates, by using mouse hover over the points to display the corresponding labels and subgroup effect sizes. The recently published subscreen package50, 51 enables the analysis of thousands of subgroups by using a scatter plot and allowing the user to display additional information thanks to interactive tools like the Shiny R package.52 Existing interactive approaches can be adapted to subgroup analysis, or interactivity can be added to the graphics introduced in this article. Finally, the dataset we used for illustration contained information on causes of death. However, the considered endpoint in the analysis in this article was death from all causes combined. Additionally, while four treatment options were used to treat the patients, we combined them into two categories. These adaptations allowed us to frame the analysis in the typical situation where an experimental treatment is compared against a control. Modifications to the considered graphics could be explored to enable the comparison of multiple treatments or multiple endpoints. Again, interactivity may help in these situations to explore and understand the data. Appendix S1: Supporting Information Click here for additional data file.

34 in total

1. More medical journals should inform their contributors about three key principles of graph construction.

Authors: Milo A Puhan; Gerben ter Riet; Klaus Eichler; Johann Steurer; Lucas M Bachmann
Journal: J Clin Epidemiol Date: 2006-08-07 Impact factor: 6.437

2. Graphical perception and graphical methods for analyzing scientific data.

Authors: W S Cleveland; R McGill
Journal: Science Date: 1985-08-30 Impact factor: 47.728

3. Model averaging for treatment effect estimation in subgroups.

Authors: Björn Bornkamp; David Ohlssen; Baldur P Magnusson; Heinz Schmidli
Journal: Pharm Stat Date: 2016-12-09 Impact factor: 1.894

Review 4. Subgroup analyses with special reference to the effect of antiplatelet agents in acute coronary syndromes.

Authors: D Aronson
Journal: Thromb Haemost Date: 2014-03-06 Impact factor: 5.249

5. A Bayesian credible subgroups approach to identifying patient subgroups with positive treatment effects.

Authors: Patrick M Schnell; Qi Tang; Walter W Offen; Bradley P Carlin
Journal: Biometrics Date: 2016-05-09 Impact factor: 2.571

6. Tutorial on statistical considerations on subgroup analysis in confirmatory clinical trials.

Authors: Mohamed Alosh; Mohammad F Huque; Frank Bretz; Ralph B D'Agostino
Journal: Stat Med Date: 2016-11-28 Impact factor: 2.373

7. Patterns of treatment effects in subsets of patients in clinical trials.

Authors: Marco Bonetti; Richard D Gelber
Journal: Biostatistics Date: 2004-07 Impact factor: 5.899

8. Model-Based Recursive Partitioning for Subgroup Analyses.

Authors: Heidi Seibold; Achim Zeileis; Torsten Hothorn
Journal: Int J Biostat Date: 2016-05-01 Impact factor: 0.968

9. A Systematic Approach for Post Hoc Subgroup Analyses With Applications in Clinical Case Studies.

Authors: Christoph Muysers; Alex Dmitrienko; Hermann Kulmann; Bodo Kirsch; Susanne Lippert; Thomas Schmelter; Anke Schulz; Nicole Mentenich; Heinz Schmitz; Matthias Schaefers; Gerold Meinhardt; Thomas Keil; Stephanie Roll
Journal: Ther Innov Regul Sci Date: 2019-06-16 Impact factor: 1.778

10. Dichotomizing continuous predictors in multiple regression: a bad idea.

Authors: Patrick Royston; Douglas G Altman; Willi Sauerbrei
Journal: Stat Med Date: 2006-01-15 Impact factor: 2.373

6 in total

1. Investigating treatment-effect modification by a continuous covariate in IPD meta-analysis: an approach using fractional polynomials.

Authors: Willi Sauerbrei; Patrick Royston
Journal: BMC Med Res Methodol Date: 2022-04-06 Impact factor: 4.615

2. Engagement With a Behavior Change App for Alcohol Reduction: Data Visualization for Longitudinal Observational Study.

Authors: Lauren Bell; Claire Garnett; Tianchen Qian; Olga Perski; Elizabeth Williamson; Henry Ww Potts
Journal: J Med Internet Res Date: 2020-12-11 Impact factor: 5.428

3. Gut Microbiota Alteration Is Associated With Cognitive Deficits in Genetically Diabetic (Db/db) Mice During Aging.

Authors: Jiawei Zhang; Yaxuan Zhang; Yuan Yuan; Lan Liu; Yuwu Zhao; Xiuzhe Wang
Journal: Front Aging Neurosci Date: 2022-01-26 Impact factor: 5.750

Review 4. A novel concept of screening for subgrouping factors for the association between socioeconomic status and respiratory allergies.

Authors: Christoph Muysers; Fabrizio Messina; Thomas Keil; Stephanie Roll
Journal: J Expo Sci Environ Epidemiol Date: 2021-07-12 Impact factor: 5.563

5. Machine learning for tumor growth inhibition: Interpretable predictive models for transparency and reproducibility.

Authors: Andreas D Meid; Alexander Gerharz; Andreas Groll
Journal: CPT Pharmacometrics Syst Pharmacol Date: 2022-02-01

Review 6. A critical review of graphics for subgroup analyses in clinical trials.

Authors: Nicolás M Ballarini; Yi-Da Chiu; Franz König; Martin Posch; Thomas Jaki
Journal: Pharm Stat Date: 2020-03-25 Impact factor: 1.894

6 in total