Explaining the Empirical Rule for Normal Distribution

The empirical rule, also known as the 68-95-99.7 rule, represents the spread of data within a normal distribution. Here’s what you need to know.

Written by Michael Galarnyk
Published on Jul. 29, 2022

The normal distribution is commonly associated with the 68-95-99.7 rule, or empirical rule, which you can see in the image below. About 68 percent of the data is within one standard deviation (σ) of the mean (μ), 95 percent is within two standard deviations, and 99.7 percent is within three standard deviations.

Sixty-eight percent of the data is within one standard deviation, 95 percent is within two standard deviations and 99.7 percent is within three standard deviations. | Image: Michael Galarnyk

This post explains how those numbers are derived, so that you can interpret them with more confidence in your own work.

What Is the Empirical Rule? 

The empirical rule, also known as the 68-95-99.7 rule, gives the percentage of values that fall within a given interval around the mean of a normal distribution. That is, 68 percent of data is within one standard deviation of the mean, 95 percent is within two standard deviations and 99.7 percent is within three standard deviations.
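One informal way to check these percentages is to simulate them. The sketch below is not part of the original code: it draws samples from a normal distribution with NumPy and counts the share that falls within one, two and three standard deviations of the mean. The seed, sample size, mean and standard deviation are arbitrary values chosen only for illustration.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration
mu, sigma = 50.0, 10.0

# Draw one million samples from a normal distribution
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=mu, scale=sigma, size=1_000_000)

# Fraction of samples within k standard deviations of the mean
fractions = {}
for k in (1, 2, 3):
    within = np.abs(samples - mu) <= k * sigma
    fractions[k] = within.mean()
    print(f"Within {k} standard deviation(s): {fractions[k]:.4f}")
```

Because the samples are random, the printed fractions will only be close to 0.68, 0.95 and 0.997, not exactly equal to them.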

As always, the code used to make everything — including the graphs — is available on my GitHub. With that, let’s get started.

A tutorial explaining the empirical rule for a normal distribution. | Video: Joshua Emmanuel


Empirical Rule & the Probability Density Function

To understand where the 68-95-99.7 percentages come from, it’s important to first understand the probability density function, known as the PDF. A PDF is used to specify the probability of the random variable falling within a particular range of values, as opposed to taking on any one value. The integral of the variable’s PDF over the range gives its probability. 

What Is a Probability Density Function?

A probability density function (PDF) specifies the probability of the random variable falling within a particular range of values, as opposed to taking on any one value. 

That is, the probability is given by the area under the density function, above the horizontal axis, and between the lowest and greatest values of the range. This definition might not make much sense yet, so let’s graph the probability density function for a normal distribution to clear it up. The probability density function for a normal distribution is represented in the equation below:

$$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
PDF for a Normal Distribution

Let’s simplify it by assuming we have a mean (μ) of zero and a standard deviation (σ) of one.

$$f(x) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{x^2}{2}}$$
PDF for a Normal Distribution (μ = 0, σ = 1)
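The general and simplified forms of the PDF can be written as a single Python function. This is a minimal sketch: the name `normal_pdf` and its default arguments are hypothetical, chosen for illustration rather than taken from the original code. It checks that setting the mean to zero and the standard deviation to one reproduces the simplified formula.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    constant = 1.0 / (sigma * np.sqrt(2 * np.pi))
    return constant * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

x = np.linspace(-3, 3, num=100)

# General formula evaluated at mu = 0, sigma = 1
general = normal_pdf(x, mu=0.0, sigma=1.0)

# Simplified formula written out directly
simplified = (1.0 / np.sqrt(2 * np.pi)) * np.exp(-(x ** 2) / 2.0)

print(np.allclose(general, simplified))  # True when the two forms agree
print(normal_pdf(0.0))                   # density at the mean
```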

Now that the function is simpler, let’s graph it over the range from -3 to 3.

# Import all libraries for the rest of the blog post
from scipy.integrate import quad
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
# Jupyter-only magic; remove this line when running as a plain Python script
%matplotlib inline

# Evaluate the PDF for a mean of zero and a standard deviation of one on [-3, 3]
x = np.linspace(-3, 3, num=100)
constant = 1.0 / np.sqrt(2 * np.pi)
pdf_normal_distribution = constant * np.exp(-(x**2) / 2.0)

# Plot the density curve
fig, ax = plt.subplots(figsize=(10, 5));
ax.plot(x, pdf_normal_distribution);
ax.set_title('Normal Distribution', size=20);
ax.set_ylabel('Probability Density', size=20);
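Probability density function of a normal distribution with a mean of zero and a standard deviation of one, plotted from -3 to 3. | Image: Michael Galarnyk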



How to Find the Probability of Events 

The graph above does not show you the probability of events but their probability density. We need to integrate to get the probability of an event within a given range. Suppose we are interested in finding the probability of a random data point landing within one standard deviation of the mean. Because the mean is zero and the standard deviation is one here, that means integrating from -1 to 1. This can be done with SciPy.

# Define the PDF for a normal distribution (mean 0, standard deviation 1) as a function
def normalProbabilityDensity(x):
    constant = 1.0 / np.sqrt(2 * np.pi)
    return constant * np.exp(-(x**2) / 2.0)

# Integrate the PDF from -1 to 1; quad returns the integral and an error estimate
result, _ = quad(normalProbabilityDensity, -1, 1, limit=1000)
print(result)
Code to integrate the PDF of a normal distribution (left) and visualization of the integral (right). | Image: Michael Galarnyk

You’ll see that about 68 percent (0.6827) of the data is within one standard deviation (σ) of the mean (μ).

If you are interested in finding the probability of a random data point landing within two standard deviations of the mean, you need to integrate from -2 to 2.

Code to integrate the PDF of a normal distribution (left) and visualization of the integral (right). | Image: Michael Galarnyk

Now, about 95 percent (0.9545) of the data is within two standard deviations (σ) of the mean (μ).
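The same integral works for a normal distribution that is not standardized, as long as the limits are expressed in standard deviations from the mean. The sketch below assumes a hypothetical distribution with a mean of 100 and a standard deviation of 15 and integrates from μ - 2σ to μ + 2σ. The helper `normal_pdf` is an illustrative name, not a function from the original code.

```python
import numpy as np
from scipy.integrate import quad

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    constant = 1.0 / (sigma * np.sqrt(2 * np.pi))
    return constant * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Hypothetical parameters, chosen only for illustration
mu, sigma = 100.0, 15.0

# Integrate over the interval within two standard deviations of the mean
lower, upper = mu - 2 * sigma, mu + 2 * sigma
result, _ = quad(normal_pdf, lower, upper, args=(mu, sigma))
print(f"P({lower:.0f} <= X <= {upper:.0f}) = {result:.4f}")
```

The result should match the value from integrating the standardized PDF from -2 to 2, which is why the rule can be stated in terms of standard deviations rather than raw units.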

If you are interested in finding the probability of a random data point landing within three standard deviations of the mean, you need to integrate from -3 to 3.

Code to integrate the PDF of a normal distribution (left) and visualization of the integral (right). | Image: Michael Galarnyk

And now, about 99.7 percent (0.9973) of the data is within three standard deviations (σ) of the mean (μ).
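For a normal distribution, these three integrals also have a closed form: the probability of landing within k standard deviations of the mean equals erf(k / √2), where erf is the error function. The following cross-check uses only Python’s standard library; the function name `within_k_sigma` is hypothetical.

```python
import math

def within_k_sigma(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"Within {k} standard deviation(s): {within_k_sigma(k):.4f}")
```

Rounded to four decimal places, this gives 0.6827, 0.9545 and 0.9973, which is where the 68-95-99.7 shorthand comes from.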

It is important to note that for any probability density function, the total area under the curve must be one, because the probability of drawing some number from the function’s range is always one.
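This property can be checked numerically for the PDF used in this post. As a rough sketch, SciPy’s `quad` accepts infinite limits, so integrating over the whole real line should return a value very close to one.

```python
import numpy as np
from scipy.integrate import quad

def normalProbabilityDensity(x):
    """PDF of a normal distribution with mean 0 and standard deviation 1."""
    constant = 1.0 / np.sqrt(2 * np.pi)
    return constant * np.exp(-(x**2) / 2.0)

# Integrate over the entire real line; quad also returns an error estimate
total, abs_error = quad(normalProbabilityDensity, -np.inf, np.inf)
print(total, abs_error)
```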

It is also possible for observations to fall four, five or even more standard deviations from the mean, but this is very rare if you have a normal, or nearly normal, distribution.
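To put a number on “very rare,” the probability of landing more than k standard deviations from the mean of a normal distribution is erfc(k / √2), where erfc is the complementary error function. The sketch below assumes an exactly normal distribution and uses a hypothetical helper named `beyond_k_sigma`.

```python
import math

def beyond_k_sigma(k):
    """Probability that a normal variable lies more than k standard deviations from its mean."""
    return math.erfc(k / math.sqrt(2))

for k in (3, 4, 5):
    p = beyond_k_sigma(k)
    print(f"Beyond {k} standard deviations: {p:.2e} (about 1 in {1 / p:,.0f})")
```

Real data is rarely exactly normal, so extreme observations can show up more often than these figures suggest.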

If you want to learn how I made some of my graphs or how to make your data visualizations better, please consider taking my Python for Data Visualization course. | Image: Michael Galarnyk

You can now take this knowledge and apply it to boxplots.
