Bankers love to benchmark. Or at least this can be assumed based on the number of benchmarking charts in a pitchbook. Comparing a data set across companies comprises a large part of client conversations, and as such, it’s no surprise that bar charts, such as the below, are often produced by bankers.
In addition to facilitating company-to-company comparison, benchmarking bar charts also can feature summary statistics, such as the comps’ average or median, to give a more digestible comparison.
Also common is “stacking”. In the chart below, each bar shows the total payout ratio for a comp and how that breaks down into component payout metrics.
So it would appear perfectly reasonable to try to combine these two approaches to create a stacked benchmarking bar chart that includes summary statistics.
But not so fast…
In attempting to apply the summary stat overlays to the stacked chart above, you can see there is an issue with clarity. Throwing a summary line across the data set works fine for the bottom stack because the line traverses the data markers (i.e. the bars) at the appropriate point due to a shared baseline of zero. Additionally, there is nothing “under them” to disconnect the median’s position on the value axis from the position of the component bar segments with respect to that axis (i.e. their height in the plot area).
However, this isn’t the case for the second (and all subsequent) stack segments. There is no single vertical position where the median share repurchase reference line could be plotted that both sits at the correct position on the value axis and traverses the individual bars at the “right” place. Below I’ve used a step line instead, positioning each segment at a height derived by adding the given company’s dividend payout ratio to the median share repurchase payout ratio.
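To see why no single horizontal line works, here is a minimal Python sketch of how the step line's heights could be computed. The company names and payout figures are hypothetical, not the values behind the chart.

```python
from statistics import median

# Hypothetical payout ratios (% of net income); not the article's data.
dividend = {"Company A": 10, "Company B": 20, "Company C": 30}
repurchase = {"Company A": 50, "Company B": 0, "Company C": 10}

# Median of the second (upper) stack segment across the comps.
median_repurchase = median(repurchase.values())

# The upper segment of each bar starts where that company's dividend
# segment ends, so the reference line has to start there too.
step_heights = {
    name: dividend[name] + median_repurchase for name in dividend
}

for name, height in step_heights.items():
    print(f"{name}: median line drawn at {height}")
```

Because each company has a different dividend payout ratio, each bar gets a different line height, which is why the overlay ends up as a staircase rather than a straight line.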
You might be thinking, “Aha! But, I’ve seen you plot summary statistics as bars in other charts. We can just stack the component summary stats.” But the issue is actually more fundamental than a choice of visual treatment. The math doesn’t work. The median of A plus the median of B does not (necessarily) equal the median of A + B.
So while it is visually simpler to just stack the component medians, the resultant stack misrepresents what it really should be displaying: the median total payout ratio of the comps.
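A quick way to convince yourself is to try it on a tiny data set. The sketch below uses three hypothetical companies with made-up payout ratios (not the comps in the charts) and Python's standard `statistics` module.

```python
from statistics import median

# Hypothetical payout ratios (%) for three companies, in the same order.
dividend = [10, 20, 30]
repurchase = [50, 0, 10]

# Total payout ratio per company: [60, 20, 40]
total = [d + r for d, r in zip(dividend, repurchase)]

sum_of_medians = median(dividend) + median(repurchase)  # 20 + 10
median_of_totals = median(total)  # middle value of 20, 40, 60

print(sum_of_medians)    # 30
print(median_of_totals)  # 40
```

The two numbers differ because the company with the median dividend payout is not the company with the median share repurchase payout, so stacking the component medians describes a company that does not exist in the data.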
Unfortunately, there is no simple way to plot component and aggregate summary stats (like medians) on the same chart without making data visualization sacrifices.
What I prefer to do is consider what my chart is supposed to be saying and then choose an approach accordingly. I literally go through the process of asking myself whether benchmarking the components, such as dividend and share repurchase payout ratios, or aggregates like total payout ratio is the main message. Fortunately, several options exist for both cases.
If you want to talk about aggregates
Usually, I do want to talk about aggregates. In this case, stacks should not be thrown together willy-nilly; the sums need to have meaning. In fact, as the principal curator of content for Pellucid, I don’t actually think of them as bottom-up stacks, but instead as top-down “decomposable metrics”. In other words, I use stacked bars for metrics that are naturally defined as the sum of component metrics, like total payout ratio.
If aggregates are the point, they should be the values used in summary statistics. The visual treatment is simple whether your preference is summary lines or summary bars. Pick a neutral color and plot the aggregate metric’s value.
If you want to talk about components
While stacked bar data should ALWAYS maintain the discipline of aggregating to a naturally decomposable metric, it could be more important to focus on the component metrics’ values or, above all, their proportionate split. After all, that’s what the visualization cognitively facilitates.
While the challenges of applying component summary stats with lines have been demonstrated, it can be done using summary bars, but thoughtful consideration needs to be given to the ease with which the output can mislead. First of all, aggregate data labels should NEVER be displayed on the summary bars as they would be misconstrued as the median aggregates (rather than the sum of component medians).
If you’re willing to go a bit beyond the basics, you can actually make this pretty clear by doing a little custom charting work.
But most of the time, your best bet would be to convert your stacked bars to clustered bars. After all, if components really are the point, this provides a better way to benchmark data not only across entities but across component metrics as well.
Once you’ve flattened out your stacks into clusters, you can easily use either summary lines or summary bars, depending on your preference.
What do you think? I think the resulting visual is worth the little bit of extra thought. If you have any questions about any of this, just email me at email@example.com.
The only summary statistic to which these issues do not apply is the aggregate. Another reason why they’re our favorite. This is because aggregates are actually composite index calculations, and indexes, like individual companies, and like every other entity in the platform, conform to the same data definitions, rules, and relationships.
This is because missing (i.e. “NA”) data values are excluded from summary statistic calculations. For instance, in our example, if Bed Bath & Beyond’s dividend payout ratio data was missing, the average total payout ratio (which would exclude Bed Bath & Beyond’s NA) could be different than the average dividend payout ratio (which would also exclude Bed Bath & Beyond’s NA value) plus the average share repurchase payout ratio (which would take Bed Bath & Beyond’s ratio into account).
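The missing-data effect in this note can be sketched in a few lines of Python. The companies and figures are hypothetical, and the sketch assumes that a company's total payout ratio is itself treated as NA whenever one of its components is missing.

```python
from statistics import mean

# Hypothetical payout ratios (%); None marks a missing ("NA") value.
dividend = {"Company A": None, "Company B": 20, "Company C": 30}
repurchase = {"Company A": 50, "Company B": 0, "Company C": 10}

def mean_excluding_na(values):
    """Average only the values that are present."""
    return mean([v for v in values if v is not None])

# Assumption: the total is NA if either component is NA.
total = {
    name: None
    if dividend[name] is None or repurchase[name] is None
    else dividend[name] + repurchase[name]
    for name in dividend
}

mean_of_totals = mean_excluding_na(total.values())  # (20 + 40) / 2
sum_of_means = (
    mean_excluding_na(dividend.values())      # (20 + 30) / 2
    + mean_excluding_na(repurchase.values())  # (50 + 0 + 10) / 3
)

print(mean_of_totals)  # 30
print(sum_of_means)    # 45
```

Company A drops out of the dividend and total averages but still counts toward the share repurchase average, so even averages stop adding up once data is missing.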