FIX THE PITCH


How to craft compelling, dazzling pitchbooks: thoughts, ideas, and inspiration to help you construct advanced financial analysis, build stunning data visualizations, and master client meetings.

Scatter charts, regressions, and what to look out for when plotting data

Eric Rattner

The Pellucid Library is filled with charts, pages, and books designed specifically for investment banking pitchbooks and analysis. Chief Content Officer Eric Rattner shares his thoughts on the newly expanded scatter chart series and how these charts should be used.

1. Why use a scatter chart?

A scatter chart shows the relationship between two variables. While other fundamental chart types, such as line charts with two series or side-by-side benchmarking charts, can also depict how variables are related, scatter charts are the preferred way to illustrate correlation. When causality is assumed (i.e., one variable drives the other), a regression model can be fit to the scatter data and its statistics added to the chart, giving the visualization a predictive analytical element.
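For readers who want the arithmetic behind "correlation": the number a scatter most often summarizes is Pearson's correlation coefficient r, which runs from -1 (a perfect negative linear relationship) to +1 (a perfect positive one). The sketch below is a plain standard-library illustration, not Pellucid's implementation; the function name `pearson_r` is hypothetical. (Python 3.10+ also ships `statistics.correlation`, which computes the same quantity.)

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances; the 1/n factors cancel in the final ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly aligned: 1.0
```

Note that r says nothing about which variable drives which; the causal reading is an assumption the analyst brings to the chart.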


A typical Pellucid scatter chart, as found in a Pellucid pitchbook

Scatter plots can also show how closely two comparable data sets agree. In this case, an identity line (the “45-degree” or “slope = 1” line, y = x) is often added as a reference. The more the two variables agree, the more tightly the markers cluster along the identity line; if the y-axis metric is generally higher or lower than the x-axis metric, the markers sit above or below the reference line, respectively.
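One simple way to quantify how well two comparable metrics agree is to count the share of markers above, on, and below the identity line y = x. A minimal sketch, assuming each observation is an (x, y) pair; the function name and the `tol` tolerance for "on the line" are hypothetical choices for illustration, not Pellucid settings.

```python
def identity_line_split(points, tol=0.0):
    """Share of (x, y) points above, on, and below the identity line y = x.

    `tol` is a hypothetical tolerance: points with |y - x| <= tol count as "on".
    """
    if not points:
        raise ValueError("no points to classify")
    above = on = below = 0
    for x, y in points:
        diff = y - x
        if diff > tol:
            above += 1      # y-axis metric exceeds x-axis metric
        elif diff < -tol:
            below += 1      # y-axis metric falls short of x-axis metric
        else:
            on += 1
    n = len(points)
    return {"above": above / n, "on": on / n, "below": below / n}


shares = identity_line_split([(1, 2), (2, 1), (3, 3), (4, 5)])
print(shares)  # {'above': 0.5, 'on': 0.25, 'below': 0.25}
```

With `tol=0.0` only exact ties count as "on" the line; for noisy data, a small tolerance keeps near-ties from being read as disagreement.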

2. What types of data are in the Pellucid scatters?

In Pellucid, we offer cross sectional scatters (data across a selection of companies, like a market index) and time series scatters (data over a time horizon, like every day over the past year). Each perspective comprises two sub-categories of data. For a cross section, you can look at the relationship between two metrics at a single point in time, or at the change in one metric between two points in time. For a time series, you can evaluate the relationship between two metrics for a single entity, or the correlation of a given metric between two distinct entities.
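Whichever perspective you choose, building the scatter comes down to pairing two keyed datasets: companies as keys for a cross section (for example, one metric at two points in time for a change scatter), or dates as keys for a time series (for example, the same metric for two entities). A minimal sketch, assuming each dataset is a Python dict; `pair_observations`, the tickers, and the values are all hypothetical.

```python
from datetime import date


def pair_observations(x_by_key, y_by_key):
    """Join two keyed datasets into labeled scatter points (key, x, y).

    Keys present in only one dataset are dropped, because a marker needs
    both coordinates. Points are returned sorted by key.
    """
    shared = sorted(x_by_key.keys() & y_by_key.keys())
    return [(key, x_by_key[key], y_by_key[key]) for key in shared]


# Time series: one metric for two entities, keyed by date (hypothetical values).
entity_a = {date(2024, 1, 2): 1.0, date(2024, 1, 3): 1.5, date(2024, 1, 4): 0.5}
entity_b = {date(2024, 1, 3): 2.0, date(2024, 1, 4): 1.0, date(2024, 1, 5): 3.0}
ts_points = pair_observations(entity_a, entity_b)

# Cross-sectional change: one metric at two dates, keyed by (hypothetical) ticker.
metric_then = {"AAA": 8.0, "BBB": 11.5, "CCC": 9.0}
metric_now = {"AAA": 9.5, "BBB": 10.0, "DDD": 7.0}
change_points = pair_observations(metric_then, metric_now)
```

The inner join is a deliberate simplification: an entity missing either value silently drops out, so it is worth tracking how many were excluded.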

3. What kind of visual treatments can be added to scatter charts?

Standard scatter charts can be enhanced to better tease insights out of the data. For example, adding marginal distributions to time series scatters can reveal the independent variability of each metric, especially when the two are highly correlated, adding another dimension to the analytical focus of the visualization. Likewise, in a cross sectional change scatter, shading the regions above and below the identity line in semantic “good” and “bad” colors helps readers grasp the overall direction of change across the sample (and adding donut charts that show the proportion of the sample in each region surfaces this information even more explicitly). Finally, where a clear relationship does exist, it is often useful to fit a regression model to the data to quantify it. You can do all of this and much more in Pellucid with a few clicks.
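A marginal distribution is simply the histogram of one axis's values taken on its own, drawn along the edge of the scatter. A minimal sketch, assuming equal-width bins over a fixed range per axis; the function names, ranges, and bin counts are illustrative, not Pellucid's settings.

```python
def histogram(values, lo, hi, bins):
    """Count values into `bins` equal-width bins spanning [lo, hi].

    The last bin is closed on the right so that `hi` itself is counted;
    values outside [lo, hi] are ignored.
    """
    if bins < 1 or hi <= lo:
        raise ValueError("need bins >= 1 and hi > lo")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


def marginal_histograms(points, x_range, y_range, bins=10):
    """Histograms of the x values and the y values of (x, y) points, separately."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (
        histogram(xs, x_range[0], x_range[1], bins),
        histogram(ys, y_range[0], y_range[1], bins),
    )
```

Passing the scatter's own axis ranges as `x_range` and `y_range` keeps the marginal bars aligned with the main plot.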

4. The “scatter” series includes alternative visual representations of the underlying data sets, such as “bivariate bubble distributions”. How do these work, and when are they more useful than a single dot per entity?

The “bivariate bubble distribution” charts can be thought of as 2-dimensional histograms. Instead of plotting each individual observation, they plot the frequency with which observations fall within 2-dimensional ranges as bubbles whose areas are proportional to the frequencies.
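In code terms, a bivariate bubble distribution is a 2-D histogram: bin both coordinates, count the points in each cell, and size each bubble so that its area (not its radius) is proportional to the count, which means the radius grows with the square root of the count. A minimal sketch with hypothetical names and equal-width bins; NumPy users can get the same cell counts from `numpy.histogram2d`.

```python
import math
from collections import Counter


def bubble_distribution(points, x_range, y_range, bins, max_radius=1.0):
    """2-D histogram of (x, y) points, returned as bubbles.

    Each bubble is (x_bin, y_bin, count, radius). Radius is scaled so that
    bubble area (pi * r**2) is proportional to count, i.e. r ~ sqrt(count).
    Points outside the given ranges are ignored.
    """
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    x_w = (x_hi - x_lo) / bins
    y_w = (y_hi - y_lo) / bins
    counts = Counter()
    for x, y in points:
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            i = min(int((x - x_lo) / x_w), bins - 1)  # upper edge -> last bin
            j = min(int((y - y_lo) / y_w), bins - 1)
            counts[(i, j)] += 1
    if not counts:
        return []
    biggest = max(counts.values())
    return [
        (i, j, c, max_radius * math.sqrt(c / biggest))
        for (i, j), c in sorted(counts.items())
    ]
```

For categorical axes such as credit ratings, the bins are simply the categories, so the same idea reduces to counting identical (x, y) pairs, for example with `collections.Counter`.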


Pellucid bivariate bubble distribution chart

Generally, traditional scatters are better suited to evincing relationships between variables, but bivariate distributions (heat maps included) are preferred for categorical data, such as credit ratings, where there is a discrete set of possible positions along one or both axes (i.e., where many scatter markers would otherwise sit on top of each other).

5. You mentioned “regressions” a couple of times. Tell me about the regression features in Pellucid’s scatter charts.

As I mentioned earlier, the central point of a scatter is to show a relationship, and regression analysis is geared toward quantifying a predictive relationship. The slope coefficient (the steepness of the trendline) shows the magnitude of the relationship between the two metrics plotted in a scatter. R², also known as the explanatory power, is more about the strength of the model’s fit: how well movements in the independent variable (on the x-axis) predict movements in the dependent variable (on the y-axis).
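To make slope and R² concrete, here is a minimal ordinary-least-squares sketch for a single x variable (the classic "line of best fit"). Formally, R² is the share of the variance in y explained by the fitted line; for a simple regression with an intercept it equals the square of Pearson's r. The function name is hypothetical and this is not Pellucid's implementation.

```python
def fit_line(xs, ys):
    """OLS fit of y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # trendline steepness
    intercept = mean_y - slope * mean_x    # line passes through the means
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("y has no variation, so R^2 is undefined")
    return slope, intercept, 1 - ss_res / ss_tot


slope, intercept, r2 = fit_line([1, 2, 3], [1, 3, 2])  # 0.5, 1.0, 0.25
```

The example shows why slope and R² answer different questions: the line rises by 0.5 per unit of x, yet it explains only a quarter of the variation in y.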

In my banking experience, I’ve never seen a banker put a scatter into a pitchbook and NOT add a line of best fit and its R², whether or not it made any sense for the data in question. Given that reality, Pellucid lets users easily fit trendlines and add regression statistics to any dataset shown in a scatter visualization. However, given some of the ridiculous applications of regressions we’ve witnessed in client pitchbooks, we also offer visual annotations explaining what each regression statistic means, to help put a regression result into perspective for a particular dataset.
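One conventional sanity check on a trendline (not necessarily what Pellucid's annotations show) is the slope's t-statistic: the slope divided by its standard error, compared against a t-distribution with n - 2 degrees of freedom. Small comps sets can produce a visible slope that is statistically indistinguishable from noise. A minimal sketch with a hypothetical function name:

```python
import math


def slope_t_stat(xs, ys):
    """OLS slope, its standard error, t-statistic, and degrees of freedom.

    A large |t| relative to the t-distribution with n - 2 degrees of freedom
    suggests the slope is distinguishable from zero; a small |t| suggests
    the trendline may just be fitting noise.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2
    # Standard error of the slope: sqrt(residual variance / Sxx)
    se = math.sqrt(ss_res / df / sxx)
    if se > 0:
        t = slope / se
    elif slope != 0:
        t = math.copysign(math.inf, slope)  # perfect fit with a nonzero slope
    else:
        t = math.nan                        # flat, perfect fit: nothing to test
    return slope, se, t, df
```

For `slope_t_stat([1, 2, 3], [1, 3, 2])`, t is about 0.58 on 1 degree of freedom, far short of the roughly 12.71 needed for two-sided significance at the 5% level, even though the fitted line visibly slopes upward.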

6. How long would it typically take to create these charts in Excel?

Appearance options in the scatter series provide the flexibility to produce visualizations of varying complexity. The more basic scatters probably wouldn’t take too long to create in Excel, which is why they are the charts you typically see in pitchbooks. But scatter charts with additional visual elements, such as marginal histograms, could take the better part of a day to create in Excel. That is why we offer these more complex charts in addition to the staple pitchbook content. A sophisticated scatter chart with donuts, summary stats, and other contextual elements takes only a couple of minutes to order in Pellucid.

More so than the visual complexity of building them in Excel, what limits scatter use in banking (and histogram use, too) is how difficult it is to access, collect, and manage larger data sets, which are a calling card of well-executed scatters and histograms. Most banking scatters just plot comps. Using Pellucid, bankers can quickly pull and calculate metrics over much larger cross sections.

7. Are there any of these charts that you think our users should really check out?

I really like adding marginal histograms to scatters for the extra dimension they bring to the data, so I would recommend bankers check out the cross sectional scatter with marginal histograms:


Pellucid pitchbook scatter chart with marginal histogram

The cross sectional change scatter is also a good one. It extends the visual treatment you usually see in an investment banking pitchbook. Both of these charts use fairly simple visual extensions of a standard concept that really call attention to new insights.


Pellucid pitchbook scatter chart showing cross sectional change

8. Are there any scatter chart “gotchas”?

One of the main “gotchas” with scatter charts is trying to label every data point. If you are plotting a point for each S&P 500 constituent, you shouldn’t even try to label each one. For smaller comps sets, however, it is often important to call out individual observations, and finding legible space for each of those labels can be manual, time-consuming, and annoying to update. In Pellucid, we algorithmically optimize label placement to put each label as close to its marker as possible while avoiding all “collisions” with other labels and chart elements.
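Pellucid's placement algorithm isn't described here, so the following is only a simple greedy heuristic illustrating the idea: for each marker, try candidate label positions on rings of increasing distance (right, left, above, below) and take the first one that overlaps no marker and no previously placed label. All names, the fixed label box size, and the candidate gaps are assumptions; a production placer would also treat axes, legends, and other chart elements as obstacles and might search more exhaustively.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in chart coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float

    def overlaps(self, other):
        # Strict inequalities: boxes that merely touch do not collide.
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


def place_labels(markers, label_w, label_h, marker_size=1.0, gaps=(0.5, 1.5, 3.0)):
    """Greedily place one label per marker; markers are (name, x, y) tuples.

    Returns {name: Box or None}; None means no collision-free spot was found.
    """
    half = marker_size / 2
    obstacles = [Box(x - half, y - half, x + half, y + half) for _, x, y in markers]
    placed = {}
    for name, x, y in markers:
        choice = None
        for gap in gaps:  # nearest ring first, so labels stay close to markers
            d = half + gap
            candidates = (
                Box(x + d, y - label_h / 2, x + d + label_w, y + label_h / 2),  # right
                Box(x - d - label_w, y - label_h / 2, x - d, y + label_h / 2),  # left
                Box(x - label_w / 2, y + d, x + label_w / 2, y + d + label_h),  # above
                Box(x - label_w / 2, y - d - label_h, x + label_w / 2, y - d),  # below
            )
            choice = next((b for b in candidates
                           if not any(b.overlaps(o) for o in obstacles)), None)
            if choice is not None:
                break
        placed[name] = choice
        if choice is not None:
            obstacles.append(choice)  # later labels must avoid this one
    return placed
```

Greedy placement is order-dependent (earlier markers get the best spots), which is one reason more thorough placers search over all labels jointly.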

The other gotcha is outliers. They are tricky to handle in scatter charts using the standard banker approach of broken axes. Extending axis ranges to include every observation can obscure the relationship you are trying to show, while cutting off axes and excluding outliers is just analytically shady. I prefer to plot outliers in dedicated panels (similar to the “outlier bins” in Pellucid’s histogram series), which indicates their existence while minimizing their impact on the real estate dedicated to the core data.
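The text doesn't say how outliers should be identified, so this sketch assumes Tukey's conventional 1.5 × IQR fences on each axis (one of several reasonable rules) to route each point either to the main panel or to a dedicated outlier panel. Function names are hypothetical; `statistics.quantiles` requires Python 3.8+.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Lower and upper fences q1 - k*IQR and q3 + k*IQR (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(points, k=1.5):
    """Split (name, x, y) points into (core, outliers).

    A point is an outlier if either coordinate falls outside that axis's
    Tukey fences. The core set can then drive the main panel's axis ranges,
    while the outliers are shown in a separate panel.
    """
    x_lo, x_hi = tukey_fences([x for _, x, _ in points], k)
    y_lo, y_hi = tukey_fences([y for _, _, y in points], k)
    core, outliers = [], []
    for point in points:
        _, x, y = point
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            core.append(point)
        else:
            outliers.append(point)
    return core, outliers
```

Keeping the outliers in their own panel, rather than discarding them, preserves the full sample while letting the main axes fit the core data.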

Explore Pellucid’s content, created specifically for pitchbooks and financial analysis. Request a demo.

Eric Rattner
Author

Investment banking lifer and native New Yorker. Broadway and movie buff and aspiring soccer star. Building innovative, beautiful charts filled with really smart data analysis.
