**Research** Special Paper

# Visualising harms in publications of randomised controlled trials: consensus and recommendations

### Rachel Phillips, Suzie Cro, Graham Wheeler, et al

*BMJ* 2022;377 doi: https://doi.org/10.1136/bmj-2021-068983 (Published 16 May 2022) Cite this as: *BMJ* 2022;377:e068983

## Abstract

**Objective** To improve communication of harm in publications of randomised controlled trials via the development of recommendations for visually presenting harm outcomes.

**Design** Consensus study.

**Setting** 15 clinical trials units registered with the UK Clinical Research Collaboration, an academic population health department, Roche Products, and *The BMJ*.

**Participants** Experts in clinical trials: 20 academic statisticians, one industry statistician, one academic health economist, one data graphics designer, and two clinicians.

**Main outcome measures** A methodological review of statistical methods identified visualisations along with those recommended by consensus group members. Consensus on visual recommendations was achieved (at least 60% of the available votes) over a series of three meetings with participants. The participants reviewed and critically appraised candidate visualisations against an agreed framework and voted on whether to endorse each visualisation. Scores marginally below this threshold (50-60%) were revisited for further discussions and votes retaken until consensus was reached.

**Results** 28 visualisations were considered, of which 10 are recommended for researchers to consider in publications of main research findings. The choice of visualisations to present will depend on outcome type (eg, binary, count, time-to-event, or continuous), and the scenario (eg, summarising multiple emerging events or one event of interest). A decision tree is presented to assist trialists in deciding which visualisations to use. Examples are provided of each endorsed visualisation, along with an example interpretation, potential limitations, and signposting to code for implementation across a range of standard statistical software. Clinician feedback was incorporated into the explanatory information provided in the recommendations to aid understanding and interpretation.

**Conclusions** Visualisations provide a powerful tool to communicate harms in clinical trials, offering an alternative perspective to the traditional frequency tables. Increasing the use of visualisations for harm outcomes in clinical trial manuscripts and reports will provide clearer presentation of information and enable more informative interpretations. The limitations of each visualisation are discussed and examples of where their use would be inappropriate are given. Although the decision tree aids the choice of visualisation, the statistician and clinical trial team must ultimately decide the most appropriate visualisations for their data and objectives. Trialists should continue to examine crude numbers alongside visualisations to fully understand harm profiles.

## Recommendations for multiple binary outcomes

### Dot plot

#### Plot description

The dot plot summarises both the absolute and the relative risk for multiple events (fig 1, supplement 2 fig A1). The left panel displays the percentage of participants who had an event (labelled on the vertical axis) in each treatment group. The central panel displays a measure of comparison; in our example, this is the relative risk of observing each event in the treatment group compared with the control group, shown with corresponding 95% confidence intervals on the log10 scale and a line marking the value of no difference (for relative risks, this is 1). Events on the vertical axis are ordered by relative risk, from the highest at the top to the lowest at the bottom of the graph. The 95% confidence interval shows the uncertainty around the comparative estimate, and its proximity to the value of no difference indicates the strength of evidence against the null hypothesis of no difference in event risk between treatment and control groups. The right panel displays a data table containing the number of participants with at least one event and the number of events by treatment group.
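
The quantity shown in the central panel can be illustrated with a short calculation. The sketch below is not the authors' implementation (their code is signposted under Software); it assumes the usual large-sample Wald interval on the log scale, and the function name and the counts are hypothetical.

```python
import math


def relative_risk_ci(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Relative risk (treatment v control) with a Wald interval on the log scale.

    Assumes at least one event in each group; zero counts need a correction.
    """
    risk_trt = events_trt / n_trt
    risk_ctl = events_ctl / n_ctl
    rr = risk_trt / risk_ctl
    # Large-sample standard error of log(RR).
    se_log_rr = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 10 of 100 participants with the event on treatment, 20 of 100 on control.
rr, lower, upper = relative_risk_ci(10, 100, 20, 100)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With these hypothetical counts the point estimate is 0.5 but the interval extends just above 1, which is the kind of pattern the central panel is designed to make visible.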

#### Implementation and interpretation

In our example (fig 1, supplement 2 fig A1), the overall impression is that point estimates for the relative summary statistic are evenly distributed on either side of the vertical line but with great differences in levels of precision, shown by the length of the confidence interval, due to the marked differences in the frequencies of the outcome. The largest relative risk communicates increased risk of infection in the intervention group, but the absolute risk and frequencies in the data table show small numbers of participants who had this event. The data show a reduced risk of respiratory events and renal and urinary events in the intervention group; again, the absolute risks and the raw numbers in the data table show only small numbers who had these events. Of note are the estimates for blood and lymphatic disorders and gastrointestinal events, where the relative risks indicate a reduced risk in the intervention group with confidence intervals that do not cross 1. Although these estimates look small compared with the other relative risks, the left side of the plot clearly shows a noticeable difference in absolute numbers, and the data table shows the large numbers of patients who had these events. Therefore, this finding suggests a potential beneficial effect of the intervention on these harm outcomes that might warrant closer inspection.

#### Recommendation

The consensus group unanimously endorsed the dot plot for presenting data for multiple binary outcomes. The dot plot provides a comprehensive presentation of the data that incorporates the traditional table of events. The dot plot was the only visualisation to receive 100% endorsement (see supplement 6 for the endorsement consensus for the other recommended plots).

#### Potential amendments

The relative risk, risk difference, odds ratio, or incidence rate ratio (adjusted or unadjusted as desired) can be plotted as the measure of comparison in the central panel of this plot. Some researchers might also prefer to present the data table in the central panel so that it appears alongside the absolute summary. This plot can be presented in grayscale without any loss of meaning. A small number of additional arms can be added for multiarm studies by plotting multiple non-overlapping estimates on the same plot (eg, by use of jittering); however, as the number of active treatment groups increases, the arms can become difficult to distinguish.

#### Limitations

Confidence intervals around the relative differences are useful to identify potential signals (that is, information that raises the possibility of a causal relationship between the intervention and event) of harm for further investigation. However, confidence intervals should not be used as a proxy for hypothesis testing, which will increase the chance of finding spurious significant differences resulting from multiple hypothesis tests.16 Clinician feedback indicated that trialists should consider varying the horizontal axis range for the absolute summary and scale for the relative summary to ensure clarity without exaggerating effects; for example, presentation of the entire 0-100 scale for the absolute summary might not be appropriate for rare events. When presenting the odds ratios or risk ratios, if no events were reported in one of the treatment groups, a common, simple correction is to add half an event to each group (numerator and denominator). This continuity correction is commonly used but has been shown to be inferior when undertaking meta-analyses for rare events; therefore, alternative corrections might warrant consideration.17 18 Although this plot gives a comprehensive overview, some potentially important pieces of information are not included, such as the relative severity of different harm outcomes, and even though recurrent events can be presented using the incidence rate ratio, this information cannot be easily displayed on the left panel. In scenarios where information on severity is important, the stacked bar chart can be used, and for recurrent events, the mean cumulative function plot can be used (see later).
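
To make the continuity correction concrete, the sketch below applies one common form of it to an odds ratio: 0.5 is added to every cell of the 2×2 table when any cell is zero. This is an assumption about how "half an event" is applied, not a statement of the authors' method, and other variants exist. The function name and counts are hypothetical.

```python
import math


def odds_ratio_with_correction(events_trt, n_trt, events_ctl, n_ctl,
                               correction=0.5, z=1.96):
    """Odds ratio with a Wald interval, adding `correction` to all cells if any is zero."""
    a, b = events_trt, n_trt - events_trt   # treatment: events, non-events
    c, d = events_ctl, n_ctl - events_ctl   # control: events, non-events
    if 0 in (a, b, c, d):
        # Assumed form of the correction: 0.5 added to every cell of the 2x2 table.
        a, b, c, d = (x + correction for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical rare event: 0 of 50 participants on treatment, 5 of 50 on control.
print(odds_ratio_with_correction(0, 50, 5, 50))
```

Without the correction the estimate would be exactly zero and its logarithm undefined, so the event could not be drawn on a log scale axis at all.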

#### Software

The dot plot can be produced in Stata by use of the aedot or aedots command, in R with the code available in supplement 4, and in SAS with the code available from the CTSpedia Wiki page (https://www.ctspedia.org/do/view/CTSpedia/ClinAEGraph000). The SAS example does not include code to incorporate the data table.19

### Stacked bar chart

#### Plot description

The horizontal stacked bar chart presents the percentage of participants with an event by treatment group and by maximum severity; that is, if a participant had the same event twice, once classified as mild and once as moderate, this participant would be counted once as having a moderate event (fig 1, supplement 2 fig A2). The bars are labelled with the corresponding number of participants. Bars are split by colour gradient to indicate different severity grades, and the total bar length shows the proportion of participants who have had that event at least once. The most severe grade is displayed closest to the vertical axis to allow ease of informal comparison across treatment groups for the most harmful or burdensome events.
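
The maximum severity rule described above can be sketched as a small data preparation step. The record layout, the three-level grading, and the function name are assumptions made for illustration; they are not taken from the authors' code.

```python
from collections import Counter

# Assumed severity grading, ordered from least to most severe.
SEVERITY_ORDER = {"mild": 1, "moderate": 2, "severe": 3}


def max_severity_counts(records):
    """Count participants by arm, event, and worst recorded severity.

    records: iterable of (participant_id, arm, event_term, severity).
    Each participant contributes once per event, at their maximum severity.
    """
    worst = {}
    for participant, arm, term, severity in records:
        key = (participant, arm, term)
        if key not in worst or SEVERITY_ORDER[severity] > SEVERITY_ORDER[worst[key]]:
            worst[key] = severity
    return Counter((arm, term, severity) for (_, arm, term), severity in worst.items())


# Hypothetical records: p1 has the same event twice, mild then moderate.
records = [
    ("p1", "intervention", "nausea", "mild"),
    ("p1", "intervention", "nausea", "moderate"),
    ("p2", "intervention", "nausea", "mild"),
    ("p3", "placebo", "nausea", "severe"),
]
counts = max_severity_counts(records)
print(counts[("intervention", "nausea", "moderate")])  # p1 is counted here, once
```

Dividing each count by the number of participants randomised to the arm gives the bar segment lengths.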

#### Implementation and interpretation

In our example (fig 1, supplement 2 fig A2), the most frequent events reported were at least one event of the blood and lymphatic system or gastrointestinal disorders. Although more blood and lymphatic events were noted in the placebo group, the stacked bar chart shows that the proportions in the most severe categories (severe plus moderate) were similar across treatment groups, and the difference in numbers between treatment groups was because of the difference in participants who had a mild event. For gastrointestinal disorders, the stacked bar chart showed that fewer events were recorded for the intervention group across each of the severity grades compared with those in the placebo group. The plot also displays events classified as other that were dominated by severe and moderate events in the intervention group compared with the placebo group, which could warrant closer inspection of the type of events. The stacked bar chart highlights the most frequent events because of the increased physical space that these events occupy. This display contrasts with the dot plot, in which the most frequent events take up the least space in the central panel because of the increased precision and hence narrower confidence intervals around the treatment effect estimate.

#### Recommendations

The stacked bar chart is easy to understand and is useful when it is important to present information on severity of multiple events. This display can be used to informally compare severe or severe plus moderate events or the overall number of events between groups. Treatment groups are recommended to be displayed directly adjacent to each other for each event and horizontally aligned to allow labelling that is easy to read.

#### Potential amendments

This plot can be adapted to multiarm studies, and graduation from black to white is possible without loss of meaning to avoid use of other colours. The single event setting can make use of this graph by replacing events on the vertical axis with representation of time—for example, visits or treatment cycle, an example of which can be found in Thanarajasingam et al.20

#### Limitations

Direct comparisons of stacked bars within severity ratings between treatment groups are not possible beyond the segment closest to the vertical axis; however, cumulative comparisons such as severe plus moderate are possible and are perhaps more meaningful. Stacked bar charts promote presentation of information on participants with at least one event at maximum severity rather than number of events, and additional information on repeated events should also be presented. In addition, the effect sizes for differences between groups are not explicitly displayed.

#### Software

Stacked bar charts are easily implemented as standard plots across the variety of statistical packages (graph hbar, Stata; barplot or the ggplot2 package with geom_bar, R; proc gchart, SAS).

## Recommendations for single binary outcomes

### Bar chart

#### Plot description

A bar chart presents information on the number or count of adverse events reported per participant (fig 1, supplement 2 figs A3a and b). Each bar represents the percentage of participants by number of events experienced for each treatment group.
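
The bar heights in this plot come from a simple tabulation, sketched below. Participants with no events must be included in the denominator, which is why the list of all randomised participants is passed in separately. The data layout and function name are hypothetical.

```python
from collections import Counter


def event_count_distribution(participants_by_arm, event_participant_ids):
    """Percentage of participants in each arm by number of events experienced.

    participants_by_arm: dict mapping arm -> list of all participant ids in that arm.
    event_participant_ids: list with one participant id per recorded event.
    """
    events_per_participant = Counter(event_participant_ids)
    distribution = {}
    for arm, ids in participants_by_arm.items():
        # Participants absent from the event list had zero events.
        tally = Counter(events_per_participant.get(pid, 0) for pid in ids)
        distribution[arm] = {k: 100 * v / len(ids) for k, v in sorted(tally.items())}
    return distribution


# Hypothetical trial: four participants on placebo, two on the intervention.
arms = {"placebo": ["a", "b", "c", "d"], "intervention": ["e", "f"]}
events = ["a", "a", "b", "e"]
print(event_count_distribution(arms, events))
```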

#### Implementation and interpretation

Figure 1 (supplement 2 fig A3a) displays the distribution of multiple events, with higher numbers of multiple events recorded more often for participants in the placebo group than participants in the intervention group. In the alternative figure (fig 1, supplement 2 fig A3b) the distributions indicate that participants in either of the intervention groups had multiple events more often than those in the placebo group.

#### Recommendations

The bar chart is recommended to present information on the number of events experienced. This plot is simple and can be useful to illustrate differences in counts of binary events between treatment groups and is potentially useful to highlight differences in the burden of harm experienced by participants. A bar chart can depict an overall summary of events, such as the total number of serious adverse events, a limited number of events of interest, or a single event of interest. This plot can also be used in an exploratory setting to show the distribution of repeated events.21 22 Vertical bars with treatment groups presented alongside each other are the recommended format (fig 1, supplement 2 fig A3a) when comparing two treatment groups. For more than two treatment groups, the recommended alternative is to use separate plots stacked above each other for each group (fig 1, supplement 2 fig A3b).

#### Potential amendments

This plot can be easily adapted to multiarm studies and can be produced in grayscale if necessary. Additionally, bars could be labelled with number of participants to ensure accurate communication.

#### Limitations

Although this plot is helpful for summarising and comparing the overall burden of different treatments, it does not make a distinction between the types of events. Therefore, trialists should still explore and report the individual event data, giving careful consideration as to whether such a plot for overall events could be misleading. In addition, although bar charts could potentially reveal patterns in the data, clinician feedback indicated that subtle differences would be less obvious, and careful consideration of when to use this plot and the accompanying message it supports is needed.

#### Software

Bar charts are easily implemented as standard plots across the variety of statistical packages (graph bar, Stata; barplot or the ggplot2 package with geom_bar, R; proc gchart, SAS).

## Recommendations for single time-to-event outcomes

### Kaplan-Meier plot

#### Plot description

The Kaplan-Meier plot for single time-to-event outcomes shows the cumulative proportion of participants remaining event-free over time by treatment group (fig 1, supplement 2 fig A4). The 95% confidence interval bands indicate the precision of the within group estimates of being event-free. The table below the plot shows the number of participants who remain at risk for the specific event of interest, the cumulative number who have been censored, and the cumulative number who had the event of interest at each discrete time point.
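
The estimate and the extended risk table described above can be sketched in a few lines. This is an illustrative product-limit calculation for one treatment group, not the KMunicate implementation; confidence bands are omitted, and the function name and data are hypothetical.

```python
def kaplan_meier_table(observations):
    """Kaplan-Meier estimate with an extended risk table for one treatment group.

    observations: list of (time, had_event) pairs, one per participant;
    had_event is False for censored participants.
    """
    rows = []
    event_free = 1.0
    at_risk = len(observations)
    cum_events = cum_censored = 0
    for t in sorted({time for time, _ in observations}):
        events = sum(1 for time, had_event in observations if time == t and had_event)
        censored = sum(1 for time, had_event in observations if time == t and not had_event)
        if events:
            # Product-limit step: multiply by the proportion event-free at t.
            event_free *= 1 - events / at_risk
        cum_events += events
        cum_censored += censored
        rows.append({
            "time": t,
            "at_risk": at_risk,  # number at risk just before time t
            "cum_events": cum_events,
            "cum_censored": cum_censored,
            "event_free": event_free,
        })
        at_risk -= events + censored
    return rows


# Hypothetical follow-up in days for five participants.
for row in kaplan_meier_table([(1, True), (2, False), (3, True), (4, True), (5, False)]):
    print(row)
```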

#### Implementation and interpretation

In our example the extended risk table (fig 1) indicates that by the end of follow-up, little difference was noted between treatment groups in the number of participants who had an infection or infestation. However, the event curves show that 50% of the placebo group had this event within about 100 days of randomisation, whereas it took until 160 days after randomisation for 50% of the mepolizumab group to experience the event.

#### Recommendations

We recommend the Kaplan-Meier plot with within group confidence intervals and extended risk table for specific events of interest to detect either a large between treatment group difference or a potential disproportionality over time, especially as adverse drug reactions are often time dependent.

#### Potential amendments

For rare events, trialists might want to reverse the vertical axis to display the cumulative proportion with the event to aid interpretation. This plot can be created in grayscale, with different line styles used to differentiate between groups. Extensions to multiple events or multiarm studies are potentially feasible but can become incomprehensible when displaying multiple overlying confidence bands. Therefore, trialists should consider only plotting the survival estimates with extended risk tables, or present separate plots for comparison of each intervention group, with a common comparator or separate plots for different events.

#### Limitations

Kaplan-Meier plots depict only time-to-first event, failing to consider recurrent events. For clarity in presentation, these graphs are also typically limited to one type of event at a time. To present information on recurrent events over time, a plot of the mean cumulative function (see later) is recommended. Some generic limitations of using time-to-event plots in this setting are discussed later.

#### Software

Kaplan-Meier plots are easily implemented as standard plots across a variety of statistical packages. To incorporate the extended risk tables, trialists can use the R package KMunicate and a program for implementation in Stata (https://github.com/sarwarislam/kmunicate_stata).23

### Mean cumulative function plot

#### Plot description

For recurrent events or a summary of the total burden of events, the mean cumulative function plot is recommended. This plot is a non-parametric estimate of the mean cumulative number of events per participant (displayed on the vertical axis) as a function of time (horizontal axis) by treatment group (fig 1, supplement 2 fig A5). The 95% confidence interval bands show the precision of the within group estimate. The risk table includes information on the number of participants who remain at risk of an event at discrete time points.
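
A minimal sketch of a non-parametric mean cumulative function estimate follows: at each event time, the number of events is divided by the number of participants still under observation, and these increments are summed. Confidence bands are omitted, events are assumed to fall within each participant's follow-up, and the data layout and function name are hypothetical.

```python
def mean_cumulative_function(follow_up, events):
    """Non-parametric mean cumulative number of events per participant over time.

    follow_up: dict mapping participant id -> end of follow-up time.
    events: list of (participant_id, event_time); a participant can appear repeatedly.
    Returns a list of (time, mcf) pairs at each distinct event time.
    """
    mcf = 0.0
    curve = []
    for t in sorted({time for _, time in events}):
        # Participants still under observation at time t.
        at_risk = sum(1 for end in follow_up.values() if end >= t)
        n_events = sum(1 for _, time in events if time == t)
        mcf += n_events / at_risk
        curve.append((t, mcf))
    return curve


# Hypothetical data: p3 leaves follow-up at day 5, so only two participants remain at day 6.
follow_up = {"p1": 10, "p2": 10, "p3": 5}
events = [("p1", 2), ("p1", 6), ("p2", 4), ("p3", 3)]
print(mean_cumulative_function(follow_up, events))
```

Unlike the Kaplan-Meier estimate, the curve can exceed 1 because each participant can contribute more than one event.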

#### Implementation and interpretation

Over the first week after randomisation, the mean number of events per participant is similar across treatment groups, but by day 20 a divergence becomes apparent (fig 1). In the paroxetine group, a mean of two events per participant was observed by day 20, but in the placebo group at that time a mean of approximately 1.5 events per participant was observed. The plot of the mean cumulative function shows the participant burden of recurrent events, highlighting in this example that over follow-up, participants in the paroxetine group had on average a greater number of events than participants in the placebo group, suggesting that some events are associated with the intervention.

#### Recommendations

Unlike the Kaplan-Meier plot, this plot can display information on recurrent events, providing a visual summary of the expected time until a certain number of an event will be recorded per participant by group. This visualisation can show the burden of any event as in the example that we present, or the recurrence of events of special interest. As highlighted in the clinician feedback, these plots are potentially useful when investigating long term treatments for chronic conditions and can provide valuable insight into periods when the treatment might be considered safe or well tolerated. When used to present data for any event, this plot serves as an alternative to the bar chart of counts that incorporates time. This graph also usefully summarises overall burden in place of, or in addition to, summaries of time to discontinuation that are often reported as a proxy for harm.

#### Potential amendments

As with the Kaplan-Meier plot, this plot can be created in grayscale without loss of meaning. Extension to multiarm studies or multiple events is potentially feasible, but displaying multiple overlying confidence bands could make the plot incomprehensible. Similar to the recommendation for the Kaplan-Meier plot, trialists should therefore consider plotting only the mean cumulative function (without confidence bands) and risk table, or present separate plots for comparison of each intervention group with a common comparator, or separate plots for different events.

#### Limitations

For clarity in presentation, mean cumulative function plots are typically limited to one type of event at a time. More generic limitations and cautions of use of time-to-event plots in the harm setting are provided later in this paper.

#### Software

The mean cumulative function with confidence interval bands can be implemented in SAS using the reliability procedure (proc reliability) with the mcfplot statement.

#### Limitations applicable to time-to-event methods

The measure of uncertainty (confidence interval bands) in the Kaplan-Meier plot and the plot of the mean cumulative function are within treatment groups and not between treatment groups, which is the inference of interest in comparative clinical trials. To incorporate an estimate of the between group difference with a measure of uncertainty, the survival ratio plot can be used (see later). Additionally, when time-to-event methods for harm data are used, trialists must remain aware of the limitations around competing risks and consider these when performing the underlying time-to-event analyses. More information on alternative strategies to account for competing risks can be found in Proctor and Schumacher24 and include use of appropriate estimates (eg, Aalen-Johansen estimator or Fine and Gray method) to plot the cumulative incidence function.

### Survival ratio plot

#### Plot description

The survival ratio plot displays the ratio of non-parametric estimates of the survival probabilities (ie, the probabilities for being event-free in the harm setting) between treatment groups over time along with 95% confidence intervals. Unlike the Kaplan-Meier and mean cumulative function plots, this plot allows a direct comparison between treatment groups (fig 1, supplement 2 fig A6). As the plot displays the ratio of survival probabilities over time, departures from unity (point of unity is the survival ratio of 1) indicate potential differences between treatment groups. The green horizontal bar at the bottom of the plot changes to red if the confidence interval excludes unity.25
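
The sketch below shows how a survival ratio and a pointwise interval could be computed at a single time point. It is an illustration under stated assumptions, not the method used for the figure: it swaps in a Greenwood-type variance on the log scale for each group, whereas the R implementation described under Software uses bootstrap confidence bands. Function names and data are hypothetical.

```python
import math


def km_steps(observations):
    """Kaplan-Meier steps as (time, survival, variance of log survival).

    observations: list of (time, had_event) pairs for one group.
    Raises ValueError if survival reaches zero, where the log scale is undefined.
    """
    steps = []
    survival, var_log, at_risk = 1.0, 0.0, len(observations)
    for t in sorted({time for time, _ in observations}):
        events = sum(1 for time, had_event in observations if time == t and had_event)
        leaving = sum(1 for time, _ in observations if time == t)
        if events:
            if events == at_risk:
                raise ValueError(f"survival reaches zero at time {t}")
            survival *= 1 - events / at_risk
            # Greenwood-type increment for the variance of log(survival).
            var_log += events / (at_risk * (at_risk - events))
            steps.append((t, survival, var_log))
        at_risk -= leaving
    return steps


def value_at(steps, t):
    """Survival and log-scale variance at time t (step function lookup)."""
    survival, var_log = 1.0, 0.0
    for time, s, v in steps:
        if time > t:
            break
        survival, var_log = s, v
    return survival, var_log


def survival_ratio(group_a, group_b, t, z=1.96):
    """Ratio of event-free probabilities (group_a over group_b) at time t with interval."""
    s_a, v_a = value_at(km_steps(group_a), t)
    s_b, v_b = value_at(km_steps(group_b), t)
    ratio = s_a / s_b
    se_log_ratio = math.sqrt(v_a + v_b)  # groups are independent
    return ratio, ratio * math.exp(-z * se_log_ratio), ratio * math.exp(z * se_log_ratio)


# Hypothetical groups of four participants each.
group_a = [(1, True), (2, True), (3, False), (4, False)]
group_b = [(1, False), (2, False), (3, True), (4, False)]
print(survival_ratio(group_a, group_b, t=2))
```

Evaluating `survival_ratio` over a grid of time points gives the curve and band that the plot displays; an interval that excludes 1 at a given time corresponds to the red segment of the horizontal bar.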

#### Implementation and interpretation

The survival ratio plot (fig 1) depicts a point estimate indicating a greater risk of infection and infestation in the placebo group compared with the intervention group, with a value between 0.9 and 1.0 until day 40 and dropping below 0.9 thereafter. Compared with the Kaplan-Meier plot, this plot shows the confidence band for the between group comparison (rather than within-group confidence intervals in the Kaplan-Meier plot). The confidence band includes the point of unity across all time periods and therefore would not provide sufficient evidence to raise a signal for this event to undergo further investigation.

#### Recommendation

The survival ratio plot would be suitable for signal detection analysis for emerging events because it provides a between group comparison that can be used to spot departures from unity and helps to identify the time that such divergences occur, which can help detect potential signals for adverse drug reactions. For events of specific interest when the focus is on accurately estimating survival probabilities over time, this plot is less suitable. This plot can be presented alongside the Kaplan-Meier plot to show both a relative and an absolute measure.

#### Potential amendments

Our example displays the ratio of survival probabilities estimated using the Kaplan-Meier method; alternatively, the display could show the difference in survival probabilities. As with both the Kaplan-Meier and the mean cumulative function plots, multiple lines can be added to one graph to display estimates for different events or multiple treatment comparisons.

#### Limitations

As with Kaplan-Meier plots, the survival ratio plot allows for only time-to-first event; therefore, this graph is not suitable for recurrent events. The plot is also limited to one type of event; however, in some situations multiple estimates can be added to the same plot but with the same considerations as plotting multiple lines on the Kaplan-Meier plot. As with other time-to-event plots, competing risks are important to consider when performing the underlying time-to-event analysis, further details of which are discussed above. The confidence interval band around the relative differences is useful to detect signals of potential harm for further monitoring, but we are not encouraging hypothesis testing in this setting.16 Despite survival ratio plots first being proposed in 2006, little evidence exists of their application in the clinical trial literature; use of this plot will need to be accompanied by a detailed explanation until audiences become more familiar with it and its interpretation.25 This postulation was supported in discussions with clinicians, who initially struggled to interpret this plot but who indicated a strong endorsement after further explanation was provided.

#### Software

The survival ratio plot can be implemented in R using the survRatio package with the drsurv function to take the time, censoring indicator, and treatment indicator as inputs. This package returns Kaplan-Meier survival estimates and corresponding confidence intervals to create an object of the survival ratio, survival difference, and pointwise (bootstrap) confidence bands. The ggsurv function is then used to create the plot of the survival ratio and confidence interval bands.

## Recommendations for single continuous outcomes

### Line graph

#### Plot description

In the line graph plot, markers display mean values and vertical lines indicate the standard deviation (not standard error) of raw values at each discrete time point, connected with a line to the point closest in time for each treatment group (fig 1, supplement 2 fig A7). Horizontal reference lines are included to indicate the upper and lower limits of normal values for the outcome, and a table of numbers of participants at risk at each discrete time point is included.
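
The markers and vertical lines in this plot are per-visit summaries, which can be sketched as follows. The data layout and function name are hypothetical, and the sample standard deviation (n−1 denominator) is assumed.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarise_by_visit(measurements):
    """Number of observations, mean, and standard deviation per arm and visit.

    measurements: iterable of (arm, visit, value).
    Returns a dict mapping (arm, visit) -> (n, mean, sd).
    """
    grouped = defaultdict(list)
    for arm, visit, value in measurements:
        grouped[(arm, visit)].append(value)
    summary = {}
    for key, values in sorted(grouped.items()):
        # A standard deviation needs at least two observations.
        sd = stdev(values) if len(values) > 1 else float("nan")
        summary[key] = (len(values), mean(values), sd)
    return summary


# Hypothetical laboratory values at the baseline visit (visit 0).
data = [("A", 0, 1.0), ("A", 0, 3.0), ("B", 0, 2.0), ("B", 0, 2.0), ("B", 0, 5.0)]
print(summarise_by_visit(data))
```

The count returned for each arm and visit is the figure that would appear in the table beneath the plot.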

#### Implementation and interpretation

An immediate decrease can be seen in the mean eosinophil count after randomisation in the mepolizumab group (fig 1), and this decreased level is maintained across follow-up. The mean values for the placebo group fluctuate around the baseline value and the error bars exceed the upper limit of normal during follow-up.

#### Recommendations

This plot can be used to describe continuous harm outcomes of interest over time by use of an appropriate summary statistic, together with an indication of variability. This plot can be helpful to identify shifts in distributions between treatment groups and highlight any potential trends; as a result, this display might be better suited to depict clinical outcomes (such as vital signs) rather than blood markers, where clinicians are more often interested in the tails of the distribution (ie, the ends or extremes of the distribution of observed values).

#### Potential adaptations

The summary statistic displayed in this plot should be chosen to reflect each individual dataset and the purpose of the plot. For example, when interest is in describing the distributions, either means and standard deviations or medians and interquartile ranges can be plotted; if interest is in drawing inferences about between group comparisons, estimates from mixed effects models for repeated measures with 95% confidence intervals can be presented. This plot can easily incorporate multiple groups or outcomes and can be modified to exclude the use of colour.

#### Limitations

Changes in the tails of the distributions are usually of most interest when monitoring blood markers for harm, and such changes might be difficult to see using this plot. This graph is also unsuitable for skewed distributions; alternative plots for such data are presented below. Appropriate colours and line styles should be considered for clarity, particularly when adapting line graphs to multiarm trials.

#### Software

Line graphs are easily implemented as standard plots across the variety of statistical packages (eg, twoway connected and twoway rspike, Stata; plot and lines or using the ggplot2 package with geom_line and geom_errorbar, R; and proc gplot, SAS).

### Violin plot

#### Plot description

The hollow circle marker on the violin plot indicates the median value, the narrow rectangular boxes indicate the interquartile range, and lines extend from the box to the minimum and maximum points for each group at each time point. These parts are overlaid with kernel density plots (see later), which summarise the distribution of the raw values (fig 1, supplement 2 fig A8). The two horizontal dashed lines indicate the upper and lower limits of normal values.
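
The box and whisker elements of the violin plot can be sketched as a five-number summary for one group at one time point. Quartile definitions differ between software packages; the "inclusive" method used here is an assumption, as are the function name and the optional counts of values outside the normal limits.

```python
from statistics import median, quantiles


def violin_summary(values, lower_limit=None, upper_limit=None):
    """Median, interquartile range, and extremes for one group at one time point."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    summary = {
        "min": min(values),
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "max": max(values),
    }
    # Optional counts of values outside the limits of normal.
    if upper_limit is not None:
        summary["n_above_upper"] = sum(v > upper_limit for v in values)
    if lower_limit is not None:
        summary["n_below_lower"] = sum(v < lower_limit for v in values)
    return summary


# Hypothetical values with a hypothetical upper limit of normal of 4.
print(violin_summary([1, 2, 3, 4, 5], upper_limit=4))
```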

#### Implementation and interpretation

At time 0 (randomisation) the distributions were similar across treatment groups, but from week 2 onwards the distribution of values in the mepolizumab group was narrower than in the placebo group (fig 1). The distribution of the values in the placebo group was largely unchanged over time and indicated that a proportion of the participants remained in the upper tail exceeding the upper boundary of normal throughout follow-up. This display indicates a benefit for the mepolizumab group by reducing eosinophil concentrations to within the normal limits.

#### Recommendation

This plot is an alternative to the line graph to describe continuous data that can be used even if the outcome of interest is not normally distributed. Outlying values are displayed and these can be labelled to highlight participants who persistently record values of concern.

#### Possible adaptations

In the current format, information is duplicated because the kernel density plot is mirrored. Presenting only one kernel density would improve clarity and produce a more space efficient plot.

#### Limitations

The violin plot only allows for informal between group comparisons of distributions and does not allow for presentation of formal between group inferences such as the estimates from mixed effects models, which can be presented in a line graph. Adaptations to multiarm trials are not as space efficient as for the line graph. Kernel density estimates for some data might extend to values outside the plausible range—for example, some kernel densities are estimated to be below 0 for eosinophil counts, which is not feasible clinically.

#### Software

The violin plot can be implemented in Stata by use of vioplot or by use of the ggplot2 package in R with geom_violin or SAS proc sgpanel.

### Kernel density plot

#### Plot description

The kernel density plot displays the distribution of a continuous outcome. Data can be for a single time point or a derived change score—for example, the difference between the baseline value and maximum value while receiving treatment (fig 1, supplement 2 fig A9). Vertical reference lines can be included to indicate the upper and lower limits of normal values for the outcome.
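
A kernel density estimate is an average of smooth bumps centred on each observation. The sketch below assumes a Gaussian kernel and the normal reference rule of thumb for the bandwidth; the statistical packages listed under Software might use different defaults. The function name and data are hypothetical.

```python
import math
from statistics import stdev


def gaussian_kde(values, bandwidth=None):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(values)
    if bandwidth is None:
        # Normal reference rule of thumb (assumed default).
        bandwidth = 1.06 * stdev(values) * n ** (-1 / 5)
    scale = 1 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return scale * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


# Hypothetical positive laboratory values.
density = gaussian_kde([0.1, 0.2, 0.3, 0.5, 0.9])
print(density(0.3), density(-0.05))
```

The second value printed is greater than zero even though every observation is positive, which illustrates how an estimated density can extend beyond the plausible range of the outcome.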

#### Implementation and interpretation

Although figure 1 shows that values are similarly distributed in the placebo and paroxetine groups when within the normal range (ie, <390 U/L (6.51 μkat/L)), the plot clearly shows a high alkaline phosphatase value for some participants in the paroxetine group through the long right tail. This plot highlights the increased alkaline phosphatase concentrations in some participants taking paroxetine as an important event for closer monitoring in future trials or in the postmarketing setting.

#### Recommendations

The kernel density plot is recommended to explore an outcome of interest at a specific time point or a change score—for example, the change from baseline to a specific point in time or maximum change over the entire trial. This plot can be used to informally compare whole distributions of data between treatment groups and can highlight important differences in these distributions.

#### Possible adaptations

This plot can easily incorporate multiple groups and can be modified to not require use of colour.

#### Limitations

The kernel density plot allows only informal between group comparisons of distributions, and information on repeated measures is lost because the plot displays data for only one time point.

#### Software

The kernel density plot can be implemented in Stata with twoway kdensity, in R with geom_density from the ggplot2 package, or in SAS with densityplot.

## Recommendations for multiple continuous outcomes

### Scatterplot matrix

#### Plot description

The scatterplot matrix arranges multiple scatterplots of continuous outcomes in a matrix, with each scatterplot displaying the relationship between values at two different time points, for example, baseline values along the horizontal axis and the participant’s maximum value over follow-up along the vertical axis (fig 1, supplement 2 fig A10). The dashed lines represent the boundary between normal and abnormal thresholds.

#### Implementation and interpretation

In our example, where a higher value is worse, participants of most concern would be in the top left quadrant (ie, participants’ baseline values were normal and are now abnormal), and participants who have improved would be in the bottom right quadrant (ie, participants’ baseline values were abnormal and are now normal). If more participants from the intervention group than from the control group were in the top left quadrant, this would be cause for concern. In figure 1, slightly more participants in the placebo group (n=4) than in the mepolizumab group (n=2) had higher alanine transaminase (ALT) values when receiving treatment than at baseline.
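The quadrant reading described above can also be tabulated by treatment group. The following is a minimal sketch under stated assumptions: a single threshold (an assumed upper limit of normal of 40 U/L) applies to both axes, baseline is on the horizontal axis and the maximum value on the vertical axis, and the records and arm labels are hypothetical.

```python
from collections import Counter


def quadrant(baseline, maximum, threshold):
    """Locate a participant on a baseline (x) versus maximum value (y) scatterplot."""
    vertical = "top" if maximum > threshold else "bottom"
    horizontal = "right" if baseline > threshold else "left"
    return f"{vertical} {horizontal}"


def count_quadrants(records, threshold):
    """Count participants per (arm, quadrant) pair."""
    return Counter((arm, quadrant(b, m, threshold)) for arm, b, m in records)


# Hypothetical (arm, baseline, maximum) alanine transaminase values in U/L,
# with an assumed upper limit of normal of 40 U/L; not trial data.
records = [
    ("placebo", 22, 35), ("placebo", 30, 55), ("placebo", 48, 38),
    ("active", 25, 28), ("active", 33, 61), ("active", 36, 72),
]
counts = count_quadrants(records, threshold=40)
for (arm, quad), n in sorted(counts.items()):
    print(arm, quad, n)
```

Counts in the top left quadrant (normal at baseline, abnormal during follow-up) are the ones to compare between arms. Such a table supplements the plot but does not replace inspection of the individual points.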

#### Recommendations

The scatterplot matrix is recommended in an exploratory setting to identify any outliers or patterns of interest. We suggest labelling outlying values with a participant identifier, as shown in figure 1, to assess if one or more participants have abnormal measurements across outcomes. This could be useful to monitor participants in ongoing studies and might also help to raise signals for potential adverse drug reactions in final analyses.
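Deciding which points to label with a participant identifier can be supported by a simple check across outcomes. The sketch below is illustrative only: the outcome names, the upper limits of normal, and the values are assumptions, and "abnormal" is taken to mean a maximum value above the upper limit.

```python
def abnormal_outcomes(max_values, upper_limits):
    """Map each participant to the outcomes whose maximum exceeds the upper limit."""
    flagged = {}
    for pid, outcomes in max_values.items():
        above = sorted(name for name, value in outcomes.items()
                       if value > upper_limits[name])
        if above:
            flagged[pid] = above
    return flagged


# Hypothetical maximum on-treatment values and assumed upper limits of normal.
upper_limits = {"ALT": 40, "AST": 40, "bilirubin": 21}
max_values = {
    "P01": {"ALT": 35, "AST": 30, "bilirubin": 12},
    "P02": {"ALT": 88, "AST": 64, "bilirubin": 15},
    "P03": {"ALT": 42, "AST": 31, "bilirubin": 10},
}
flagged = abnormal_outcomes(max_values, upper_limits)
# Participants who are abnormal on more than one outcome are candidates for labelling.
repeat_ids = [pid for pid, names in flagged.items() if len(names) > 1]
print(flagged, repeat_ids)
```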

#### Possible adaptations

This plot could be used to explore two continuous measures at any time point over study follow-up. Variations in symbol style and colours should be used to help separate overlapping measurements between groups. Reference lines could be included to indicate both upper and lower limits of normal for each outcome.

#### Limitations

This plot presents several visual problems. Use of solid colours results in occlusion, making it difficult to distinguish individual points; transparency options could help with this issue.

#### Software

Scatterplots are easily implemented as standard plots across a variety of statistical packages. For example, in Stata, twoway scatter can be used to produce the individual plots, and the graph combine command or the grc1leg command can then be used to produce the scatterplot matrix.

## Areas for further development

Among the visualisations considered for displaying multiple time-to-event outcomes, the options available were judged to be poor. Although multiple Kaplan-Meier plots could be used to display information on a limited number of prespecified events of interest, a gap remains in how to visualise multiple time-to-event outcomes simultaneously on the same plot. We discussed the development of novel plots in this setting and we will pursue this in future work.