Chief Content Officer, Eric Rattner, shares his thoughts on scatter charts and how these charts should be used in pitchbooks.
1. Why use a scatter chart?
A scatter chart shows a relationship between two variables. While other fundamental chart types—like line charts with two series or side-by-side benchmarking charts—can also depict how variables are related, scatter charts are preferred for illustrating correlations. When causality (i.e., one variable drives the other variable) is assumed, regression models can be fit to scatter data, and their stats added to the chart to provide a predictive analytical element to the visualization.
Scatter plots can also help communicate the degree to which two comparable data sets conform to each other. In this case, an identity line (“45-degree” or “slope = one”) is often added as a reference. The more the two variables “agree”, the more the plot markers tend to concentrate along the identity line; if one metric is generally higher or lower than the other, the dots fall above or below the reference line, respectively.
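As a rough illustration of the identity-line comparison, the sketch below counts the share of observations that sit above, on, and below the line y = x. The function name and the sample figures are hypothetical and are not taken from Pellucid.

```python
def identity_line_split(xs, ys):
    """Return the share of points above, on, and below the identity line y = x."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    n = len(xs)
    above = sum(1 for x, y in zip(xs, ys) if y > x)
    below = sum(1 for x, y in zip(xs, ys) if y < x)
    on = n - above - below
    return {"above": above / n, "on": on / n, "below": below / n}


# Hypothetical metric values for four entities in two periods.
prior = [10.0, 12.0, 8.0, 15.0]     # x-axis
current = [11.0, 12.0, 7.0, 18.0]   # y-axis

shares = identity_line_split(prior, current)
print(shares)  # {'above': 0.5, 'on': 0.25, 'below': 0.25}
```

Shares like these are the kind of figure that a donut chart next to the scatter could display.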
2. What kind of visual treatments can be added to scatter charts?
Standard scatter charts can be enhanced to better tease out insights from the data. For example, adding marginal distributions to time series scatters can help reveal the independent variability of each metric, especially when they are highly correlated, thus adding a dimension to the analytical focus of the visualization. Likewise, applying background shading (in semantic “good” and “bad” color shades) for observations lying in the regions above or below the identity line aids comprehension of the overall direction of the sample’s change in a cross-sectional change scatter (and the addition of donut charts indicating the proportion of the sample in these regions surfaces this information even more explicitly). Finally, to the extent that a manifest relationship does exist, it is often useful to fit a regression model to the data to help quantify it. You can do all of this and much more in Pellucid with a few clicks.
3. You've designed a “scatter” series that includes alternative visual representations of the underlying data sets, such as “bivariate bubble distributions”. How do these work and when is this more useful than a single dot per entity?
The “bivariate bubble distribution” charts can be thought of as 2-dimensional histograms. Instead of plotting each individual observation, they plot the frequency with which observations fall within 2-dimensional ranges as bubbles whose areas are proportional to the frequencies.
Generally, traditional scatters are better suited to evincing variable relationships, but bivariate distributions (including heat maps) are preferred for categorical data, such as credit ratings, where there is a discrete set of possible positions along one or both axes (i.e., where lots of scatter markers might sit “on top of” each other).
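To make the “2-dimensional histogram” idea concrete, here is a minimal sketch that counts how often each pair of categories occurs and converts the counts into bubble radii so that bubble area, not radius, is proportional to frequency. The ratings data, function names, and `max_radius` parameter are illustrative assumptions.

```python
from collections import Counter


def bivariate_frequencies(pairs):
    """Count how many observations fall in each (x, y) cell."""
    return Counter(pairs)


def bubble_radii(counts, max_radius=20.0):
    """Scale radii so that bubble AREA is proportional to frequency.

    Area grows with the square of the radius, so the radius is scaled
    by the square root of the relative frequency.
    """
    peak = max(counts.values())
    return {cell: max_radius * (n / peak) ** 0.5 for cell, n in counts.items()}


# Hypothetical credit ratings for six issuers from two agencies.
ratings = [
    ("A", "A"), ("A", "A"), ("A", "A"), ("A", "A"),
    ("A", "BBB"),
    ("BBB", "BBB"),
]

counts = bivariate_frequencies(ratings)
radii = bubble_radii(counts)
print(counts[("A", "A")], radii[("A", "A")])      # 4 20.0
print(counts[("A", "BBB")], radii[("A", "BBB")])  # 1 10.0
```

With a plain scatter, the four ("A", "A") issuers would be drawn on top of each other; here they become one bubble with four times the area of a single-issuer bubble.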
4. You mentioned “regressions” a couple times. Tell me about the regression features in Pellucid’s scatter charts.
As I mentioned earlier, the central point of a scatter is to show a relationship, and regression analysis is geared towards quantification of a predictive relationship. The slope coefficient (the steepness of the trendline) shows the magnitude of the relationship between the two metrics plotted in a scatter. The R², also known as the explanatory power, is more about the strength of the model’s fit: how well the movements of the independent variable (the x-axis) predict the movements of the dependent variable (the y-axis).
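For readers who want to see where the two statistics come from, the following is a minimal sketch of a simple least-squares fit that returns the slope, the intercept, and R². It assumes an ordinary linear regression with one independent variable; the function name and data are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary for the fit to be meaningful")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (unexplained variation / total variation in y)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_res / syy
    return slope, intercept, r_squared


# Hypothetical data: y is roughly twice x, with a little noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r_squared = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

The slope says how much y changes per unit of x; R² says how much of the variation in y the line accounts for. A steep slope with a low R² is still a weak fit.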
In my banking experience, I’ve never seen a banker put a scatter into a pitchbook and NOT add a line of best fit and its R², whether it made any sense for the data in question or not. Given that empirical fact, Pellucid allows users to easily fit trend lines and add regression stats to any dataset consumed in a scatter visualization. However, given some of the ridiculous applications of regressions we’ve witnessed in client pitchbooks, we also offer visual annotations explaining what each regression statistic means, to help provide perspective on the interpretation of a regression result as applied to a particular dataset.
5. How long would it typically take to create these charts in Excel?
Appearance options in the scatter series provide flexibility to produce visualizations with a varying degree of complexity. The more basic scatters probably wouldn’t take too long to create in Excel, and these are the charts you typically see in pitchbooks because of that. But the scatter charts that have additional visual elements such as marginal histograms could take the better part of a day to create in Excel. And this is the reason why we offer these more complex charts in addition to the staple pitchbook content.
More so than the visual complexity of making scatters in Excel, scatter use in banking is limited (as is histogram use) because it’s difficult to access, collect, and manage larger data sets, which are a calling card of well-executed scatters and histograms. Most banking scatters just plot comps. Using Pellucid, bankers can quickly pull and calculate metrics over much larger cross-sections.
6. Are there any of these charts that you think clients should check out?
I really like adding marginal histograms to scatters for the different dimensionality of data, so I would recommend bankers check out the cross-sectional scatter with marginal histograms.
The cross-sectional change scatter is also a good one. It extends the visualization treatment you usually see in an investment banking pitchbook. Both of these charts make use of pretty simple visual extensions to a standard concept that really call attention to new concepts.
7. Are there any scatter chart “gotchas”?
One of the main “gotchas” with scatter charts is trying to label all data. If you are plotting a data point for each of the S&P 500 constituents, you shouldn’t even try to label each one. However, for smaller comps sets, it is often important to call out individual observations. Finding space for each of those labels in a legible way can often be manual, time-consuming, and annoying to update. I’ve developed a process which algorithmically optimizes label placement to put each label as close to its marker as possible, while avoiding all “collisions” with other labels and chart elements.
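The interview does not describe how that placement process works, so the sketch below is only a simple greedy heuristic, not the author’s method and not a true optimization. For each marker it tries a few candidate positions (right, left, above, below) and keeps the first one that does not overlap an already placed label or cover any marker. All names, the fixed label size, and the `gap` parameter are assumptions.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps(a: Box, b: Box) -> bool:
    """True if two axis-aligned rectangles intersect."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def place_labels(points, label_w, label_h, gap=0.1) -> list[Optional[Box]]:
    """Greedy placement: one label box (or None) per point, in input order."""
    # Offsets of the label's lower-left corner relative to its marker.
    candidates = [
        (gap, -label_h / 2),              # right of the marker
        (-gap - label_w, -label_h / 2),   # left
        (-label_w / 2, gap),              # above
        (-label_w / 2, -gap - label_h),   # below
    ]
    placed: list[Box] = []
    result: list[Optional[Box]] = []
    for px, py in points:
        chosen = None
        for dx, dy in candidates:
            box = Box(px + dx, py + dy, px + dx + label_w, py + dy + label_h)
            hits_label = any(overlaps(box, other) for other in placed)
            hits_marker = any(
                box.x0 < qx < box.x1 and box.y0 < qy < box.y1
                for qx, qy in points
            )
            if not hits_label and not hits_marker:
                chosen = box
                break
        if chosen is not None:
            placed.append(chosen)
        result.append(chosen)  # None means no collision-free spot was found
    return result


# Two hypothetical markers close together on the same horizontal line.
markers = [(0.0, 0.0), (0.5, 0.0)]
labels = place_labels(markers, label_w=1.0, label_h=0.4)
for marker, box in zip(markers, labels):
    print(marker, box)
```

In this example the first label cannot go to the right without covering the second marker, so it falls back to the left. A production approach would more likely score many candidate positions jointly instead of taking the first one that fits.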
The other gotcha is outliers. They are tricky to handle in scatter charts using the standard banker approach of broken axes. Extending axis ranges to include all observations can obscure the relationship you are trying to show, while cutting off axes and excluding outliers is just analytically shady. I prefer to plot outlier data in dedicated panels to indicate their existence while minimizing their impact on the real estate dedicated to showing the core data.
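One way to prepare data for such a layout is to split the observations into a core set and an outlier set before plotting. The interview does not say how outliers are identified, so this sketch assumes the common 1.5 × IQR fence rule applied to each axis; the function names and data are hypothetical.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return (low, high) fences at k * IQR beyond the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def split_outliers(points, k=1.5):
    """Split (x, y) points into (core, outliers).

    A point counts as an outlier if either coordinate lies outside
    the fences for its axis.
    """
    x_low, x_high = iqr_fences([x for x, _ in points], k)
    y_low, y_high = iqr_fences([y for _, y in points], k)
    core, outliers = [], []
    for x, y in points:
        inside = x_low <= x <= x_high and y_low <= y <= y_high
        (core if inside else outliers).append((x, y))
    return core, outliers


# Hypothetical comps set with one extreme x value.
points = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (100, 9)]

core, outliers = split_outliers(points)
print(len(core), outliers)  # 8 [(100, 9)]
```

The core list can then drive the main panel’s axis ranges, while the outliers list feeds a separate, smaller panel so those observations remain visible.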