This post is part of a series covering the topic of donor insights. Here, we review the concerns associated with skewed data and share tips for reviewing fundraising data most effectively. To keep learning how to leverage donor data to raise more money, see our recommended next read at the end of this post.
While fundraising reports often cite the mean, or average, this data point is not always the best summary statistic to represent a data set. To interpret and use data effectively, it’s important to know the implications of using different summary statistics. For this reason, in our report, The State of Modern Philanthropy, we chose to report both the median and the mean.
This blog post aims to clarify some of these concepts and support better data-driven decision making and benchmarking at your nonprofit. We’ll go over a few examples of when mean and median are interchangeable, when you should use the median, and when you should use the mean. Below, we’ll discuss:
- The difference between mean and median
- When it’s better to use mean or median
- The difference between normal and skewed data
Let’s get started.
The Difference Between Mean and Median
More and more leaders in the nonprofit space are learning about the importance of data. Sometimes the sheer amount of data can be overwhelming, and it can be unclear how best to talk about it.
Enter summary statistics: a single number, or a small set of numbers, that describes a data set. In the nonprofit space, examples include “standard donation size” or “typical number of participants.” The most common summary statistic is the mean (often also called the average). Another one you’re likely familiar with is the median.
We’ll start with some quick definitions for our three summary statistics:
- The mean is the sum of all values divided by the number of values
- The median is the middle number in an ordered list
- Average is usually just another word for mean, but people will use it in place of mean or median (or mode…but let’s not worry about that)
To understand the difference between mean and median let’s take a very simple example. Say you had five recurring donors last year. Over the course of the year they each gave respective totals of $10, $20, $30, $40, and $60.
- The mean is the sum of those numbers (10 + 20 + 30 + 40 + 60 = 160) divided by the number of donors (160 / 5 = $32)
- The median takes a sorted list, generally in ascending order, (10, 20, 30, 40, 60) and finds the middle number ($30)
- In this case, the mean and median are pretty similar ($32 and $30, respectively), but slightly different because they are calculated differently. Again, either of these statistics could be used to talk about the “average” of the data.
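If you’d rather let software do the arithmetic, here is a minimal sketch using Python’s built-in `statistics` module on the five donors above. One detail the definition glosses over: when a list has an even number of values there is no single middle number, and `statistics.median` averages the two middle values instead.

```python
from statistics import mean, median

# Yearly totals for the five recurring donors in the example above.
donations = [10, 20, 30, 40, 60]

# Mean: the sum of all values divided by how many there are (160 / 5).
donation_mean = mean(donations)

# Median: the middle value once the list is sorted.
# With an even number of values, statistics.median averages the two middle ones.
donation_median = median(donations)

print(f"mean = ${donation_mean:.2f}, median = ${donation_median:.2f}")
# prints "mean = $32.00, median = $30.00"
```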
Mean is Typical for “Normal” Data
Let’s use another simple example, but one with a full dataset: San Diego weather. San Diego (where Classy is based) is known for its perfect weather year-round. Yet each day isn’t exactly the same: some are a little colder and some are a little hotter. To visualize this data, I pulled the daily temperature for every day in 2017 from the National Oceanic and Atmospheric Administration (NOAA), taking each day’s temperature as the average of its minimum and maximum.
Instead of looking at every single day individually, I’m going to summarize the data with a histogram.
Some key observations about the plot are:
- On the x-axis are ranges of temperatures (for example, 50°F to 55°F)
- The height of the bar (the y-axis) is the number of days in that temperature range
- From this plot, I know there were about 10 days in 2017 with a temperature between 50°F and 55°F (or winter as we call it in San Diego).
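Each bar in a histogram like this is just a count of how many values land in a fixed range. The sketch below shows that counting step, assuming 5°F bins and the min/max averaging described above. The readings are a handful of made-up values for illustration, not the NOAA 2017 data, and the helper names `daily_temperature` and `histogram_counts` are hypothetical.

```python
from collections import Counter

def daily_temperature(t_min: float, t_max: float) -> float:
    """Daily temperature as the average of the day's minimum and maximum."""
    return (t_min + t_max) / 2

def histogram_counts(values, bin_width=5):
    """Count how many values fall into each [start, start + bin_width) range."""
    counts = Counter()
    for v in values:
        start = (v // bin_width) * bin_width  # lower edge of the bin holding v
        counts[start] += 1
    return dict(sorted(counts.items()))

# Made-up (min, max) readings for illustration only; NOT the NOAA 2017 data.
readings = [(48, 58), (55, 65), (58, 70), (60, 72), (62, 70), (64, 76), (66, 80)]
temps = [daily_temperature(lo, hi) for lo, hi in readings]
# temps == [53.0, 60.0, 64.0, 66.0, 66.0, 70.0, 73.0]

# Print a simple text histogram: one "#" per day in each 5°F range.
for start, count in histogram_counts(temps).items():
    print(f"{start:g}°F to {start + 5:g}°F: {'#' * count}")
```

Empty ranges (here, 55°F to 60°F) simply don’t appear; spreadsheet histogram charts in Excel or Google Sheets do the same counting for you and draw the bars.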
Notice that bar height (the number of days in a temperature range) increases as we get to middle temperatures and then decreases as we get to higher temperatures. A plot like this, with short bars on either side and tall bars in the middle, has the shape of a normal distribution, or bell curve.
This plot tells us a lot about temperatures in San Diego, but ideally we want one number to talk about the “typical” temperature in San Diego. To summarize the data:
- The mean daily temperature in San Diego in 2017 was 66.1°F
- The median daily temperature was 65.6°F
- The lines for mean and median on the histogram are located near the tallest bars, which shows that our mean and median represent what most days in San Diego are like
- While the two numbers are different, they are pretty similar, so probably either one could be used to represent the data (I at least can’t tell the difference between a 66.1°F day and a 65.6°F day!)
- Plot your data in a histogram to get a good look at the shape of your data
- For normally distributed data, either the mean or the median can be used to talk about “typical” measurements
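To see why mean and median agree for bell-shaped data, you can simulate some. This sketch draws 365 values from a normal distribution with Python’s `random` module; the center (66°F) and spread (5°F) are assumptions chosen to resemble the example, and the values are simulated, not real San Diego readings.

```python
import random
from statistics import mean, median

# Simulated bell-shaped "daily temperatures" for illustration only:
# 365 draws from a normal distribution centered at 66°F with a 5°F spread.
# These are NOT real San Diego readings.
rng = random.Random(2017)  # fixed seed so the run is repeatable
temps = [rng.gauss(66, 5) for _ in range(365)]

temp_mean = mean(temps)
temp_median = median(temps)

# For symmetric, bell-shaped data the two should land close together.
print(f"mean = {temp_mean:.1f}°F, median = {temp_median:.1f}°F")
print(f"difference = {abs(temp_mean - temp_median):.2f}°F")
```

Because a bell curve is symmetric, values above and below the center roughly balance out, so the mean ends up near the middle value.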
Mean is Atypical for Skewed Data
Our last example showed some normal data, where either mean or median was a good summary statistic to use. However, not all data looks like this, and this is especially true for fundraising data.
Fundraising data tends not to look normal; it is skewed. Data is skewed when, instead of a similar number of low and high values (short bars on either side), there are, for example, many small values (one or two tall bars at the beginning) followed by a long tail of large values (many short bars).
As you can see in the above image, there is still one bar that is the tallest, but it is pushed to one side of the histogram. Let’s walk through an example with recurring donations to see how this might come up when looking at fundraising data.
Below, I examine a histogram of how much recurring donors gave in 2017 on Classy.
In this histogram:
- The x-axis shows ranges of dollar values
- The height of each bar is the number of recurring donors
We want to know how much the “typical” recurring donor gave last year. Our summary statistics are:
- The mean is $231
- The median is $100
In this case, the mean is 2.31 times the median. Why are these numbers so different, and which one should we use?
Looking at the plot, we see that most of the recurring donors fell in that first bar of $0 to $100.
However, some donors gave a lot more than that, so much so that some of them didn’t even fit on the plot! When we compute the mean by adding up all donations and dividing by the number of donors, donors who gave very large amounts contribute more to the mean than those who gave very little. We can say they “pull up” the mean. The median, by contrast, is not affected by a few very large donors; all it depends on is the middle value.
To better understand what’s going on, let’s go back to our simple example with 5 donors.
- Our donors gave $10, $20, $30, $40, $60
- What if the fifth donor actually gave $6,000?
- The median is still $30 (the middle number in 10, 20, 30, 40, 6,000)
- The mean is now $1,220 ((10 + 20 + 30 + 40 + 6,000) / 5) up from $32
- The donor who gave $6,000 had a big effect on the mean, but not median
- Fundraising data is highly skewed and typically not “normal”
- With skewed data, the median indicates the “typical” donor; most donors give values closer to the median than the mean
- With skewed data, big differences between median and mean can help indicate the presence of large donations
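The five-donor example above can be checked in a few lines of Python. This sketch compares the original list with the one where the fifth donor gave $6,000, and adds two quick diagnostics that are our own illustration rather than anything from the report: the mean-to-median ratio, and the share of the total that comes from the single largest gift.

```python
from statistics import mean, median

original = [10, 20, 30, 40, 60]
with_big_gift = [10, 20, 30, 40, 6000]  # fifth donor gave $6,000 instead of $60

for label, gifts in [("original", original), ("with $6,000 gift", with_big_gift)]:
    m, med = mean(gifts), median(gifts)
    # Fraction of the total that comes from the single largest gift.
    share = max(gifts) / sum(gifts)
    # A mean far above the median hints that a few large gifts are pulling it up.
    print(f"{label}: mean = ${m:,.2f}, median = ${med:,.2f}, "
          f"mean/median = {m / med:.2f}, largest gift share = {share:.1%}")

# original: mean = $32.00, median = $30.00, mean/median = 1.07, largest gift share = 37.5%
# with $6,000 gift: mean = $1,220.00, median = $30.00, mean/median = 40.67, largest gift share = 98.4%
```

One donor supplies about 98% of the total in the second list, which is exactly why the mean jumps while the median stays put.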
So…What Do We Use When?
Based on what we’ve looked at so far, you may be wondering: should I ever use the mean, then? When data was normal, using either the mean or the median was fine, and when data was skewed, the median was better. There are still times when it’s better to use the mean, though. If you are doing some kind of forecasting (such as predicting the total raised for a campaign), it is useful, and maybe even necessary, to use the mean.
For example, imagine if instead of wanting to know how much a typical recurring donor gave, you wanted to predict how much you will raise from all recurring donors in 2018 based on values from 2017. The only numbers you have are the mean, median, and the number of recurring donors from 2017. Let’s use our example from earlier with our five donors who gave $10, $20, $30, $40, $6,000.
- 2017 numbers:
  - Mean = $1,220
  - Median = $30
  - Number of recurring donors = 5
- The mean ($1,220) times the number of donors (5) results in a total of $6,100
- The median ($30) times the number of donors (5) results in a total of $150
- We can see that the calculation using the mean returns the correct total raised (10 + 20 + 30 + 40 + 6,000 = 6,100)
- If you’re doing any forecasting, use mean over median
- Median produces an artificially low total when forecasting
- The mean times the number of donors reproduces the actual total exactly, which makes it the better basis for a forecast
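Here is the same calculation as a short Python sketch. It reconstructs the implied total by multiplying each per-donor figure by the donor count; a real forecast would also need assumptions about next year’s donor count and giving levels, which this sketch does not model.

```python
from statistics import mean, median

gifts_2017 = [10, 20, 30, 40, 6000]
n_donors = len(gifts_2017)

# Multiply a "per-donor" figure by the donor count to get an implied total.
total_from_mean = mean(gifts_2017) * n_donors      # 1,220 * 5
total_from_median = median(gifts_2017) * n_donors  # 30 * 5

# Only the mean-based figure matches the actual total, because
# mean * count == sum by definition.
print(f"mean-based total: ${total_from_mean:,.0f}, "
      f"median-based total: ${total_from_median:,.0f}, "
      f"actual: ${sum(gifts_2017):,.0f}")
# prints "mean-based total: $6,100, median-based total: $150, actual: $6,100"
```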
Leveraging Summary Statistics in Your Data
Summary statistics are extremely helpful when reviewing fundraising data. As you examine your results throughout the year, keep these things in mind to leverage the right statistics.
- Fundraising data is highly unbalanced, or “skewed.” For example, most people give small amounts, but some people give much larger amounts.
- Always try to visualize your data through a histogram to get a sense of how your data is distributed. You can use Excel or Google Sheets to make a histogram of your data.
- Compute both mean and median. If they are different, make note of that and think about why. Use your histogram to see if the difference has to do with the shape of the data.
- If they are different:
  - and you want to talk about a “typical” example of something, use the median.
  - and you want to do forecasting to get numbers like total amount of money, use the mean.
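If you compute these numbers in code rather than in a spreadsheet, the checklist above can be wrapped in a small helper. This is a hypothetical sketch: the function name `describe` and the 20% cutoff for calling mean and median “different” are our own assumptions, not a standard rule, so tune the threshold to your data.

```python
from statistics import mean, median

def describe(values, rel_threshold=0.2):
    """Hypothetical helper: compute mean and median and flag a large gap.

    rel_threshold is an arbitrary cutoff (20% of the median by default)
    for treating mean and median as "different".
    """
    m, med = mean(values), median(values)
    if med == 0:
        different = m != 0
    else:
        different = abs(m - med) / abs(med) > rel_threshold
    return {
        "mean": m,      # use this for forecasting totals either way
        "median": med,
        "different": different,
        # For a "typical" value, prefer the median when the two disagree.
        "typical": med if different else m,
    }

print(describe([10, 20, 30, 40, 60]))
print(describe([10, 20, 30, 40, 6000]))
```

A flag is only a prompt to look closer: as the checklist says, plot a histogram to see whether the gap comes from the shape of the data.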
Also, keep these ideas in mind both when reading reports that cite summary statistics and when computing your own numbers.
Interested in learning more about fundraising data? Check out our report, The State of Modern Philanthropy.
Happy data analysis!
This is the final post in our series covering donor insights. For more tips around how to leverage data at your organization, be sure to download the free guide, The Quick Start Guide to Data-Driven Fundraising.