If you want to convert a NumPy array to Pandas DataFrame, you have three options. The first two boil down to passing in a one-dimensional or two-dimensional NumPy array to a call to pd.DataFrame(), and the last one leverages the built-in from_records() method. You’ll learn all three approaches today, with a ton of hands-on examples.
How to Convert a NumPy Array to Pandas DataFrame
The following code snippet converts a one-dimensional NumPy array to a Pandas DataFrame with a single, default-named column:
arr = np.array([1, 2, 3])
data = pd.DataFrame(arr)
data
While there are many more ways to convert a NumPy array to DataFrame, you only need these three. Everything else is just a modification and brings no novelty to the table.
Regarding library imports, you’ll need both NumPy and Pandas today, so stick these two lines at the top of your Python script or notebook:
import numpy as np
import pandas as pd
1. Convert One and Two-Dimensional NumPy Arrays to Pandas DataFrame
Think of one-dimensional arrays as vectors or distinct features in the data set. For example, a one-dimensional array can represent age, first name, date of birth or job title, but it can’t represent all of them. You’d need four one-dimensional arrays to do so.
The following code snippet converts a one-dimensional NumPy array to Pandas DataFrame:
arr = np.array([1, 2, 3])
data = pd.DataFrame(arr)
data
It’s just a vector of numbers, so the resulting DataFrame won’t be too interesting:
Image 1 - DataFrame from NumPy array
In case you want to convert a NumPy array to Pandas DataFrame with a column name, you’ll have to provide a value to the columns argument. It has to be list-like (a list is the usual choice) rather than a bare string, even for a single column, so keep that in mind:
arr = np.array([1, 2, 3])
data = pd.DataFrame(arr, columns=["Number"])
data
The resulting DataFrame has a bit more context now:
Image 2 - DataFrame from NumPy array with column name
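If each feature lives in its own one-dimensional array, one way to combine them is to pass a dictionary of arrays to pd.DataFrame(). The following is a minimal sketch; the array names and values (first_names, ages) are made up for illustration, and all arrays are assumed to have the same length.

```python
import numpy as np
import pandas as pd

# Hypothetical one-dimensional arrays, one per feature
first_names = np.array(["Bob", "Mark", "Jane"])
ages = np.array([34, 28, 41])

# Each dictionary key becomes a column label
people = pd.DataFrame({"First name": first_names, "Age": ages})

print(people.shape)          # (3, 2)
print(list(people.columns))  # ['First name', 'Age']
```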
Now, DataFrames with only a single feature aren’t the most interesting, so let’s see how we can spice things up with multidimensional NumPy arrays.
Two-Dimensional NumPy Array to Pandas DataFrame
Think of two-dimensional arrays as matrices. We have rows and columns, where each row represents the values for one observation, measured across multiple features (columns). Each column contains information on the same feature across multiple observations.
Let’s go through a dummy example first, just so you can grasp how to leverage Pandas to create a DataFrame from an array:
arr = np.array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
])
data = pd.DataFrame(arr, columns=["Num 1", "Num 2", "Num 3"])
data
The DataFrame has three observations (rows) measured through three features (columns):
Image 3 - DataFrame from Multidimensional NumPy array
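To check the row and column mapping described above, the sketch below reads one row and one column back from the same array. The index argument and its labels ("a", "b", "c") are an optional addition for illustration and aren’t part of the original example.

```python
import numpy as np
import pandas as pd

arr = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# index is optional; the labels here are illustrative
data = pd.DataFrame(
    arr,
    columns=["Num 1", "Num 2", "Num 3"],
    index=["a", "b", "c"]
)

print(data.loc["a"].tolist())  # first row of arr: [1, 2, 3]
print(data["Num 1"].tolist())  # first column of arr: [1, 4, 7]
```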
Maybe the dummy example doesn’t paint you the full picture, so take a look at the following example. In it, we’re declaring a two-dimensional NumPy array of employees.
Each row is a single observation showcasing details of each employee’s first name, last name and email address. Each column is essentially a one-dimensional array (vector) representing either first names, last names or emails, across all records:
employees = np.array([
["Bob", "Doe", "[email protected]"],
["Mark", "Markson", "[email protected]"],
["Jane", "Swift", "[email protected]"],
["Patrick", "Johnson", "[email protected]"]
])
data = pd.DataFrame(employees, columns=["First name", "Last name", "Email"])
data
Here’s the resulting DataFrame:
Image 4 - DataFrame from real data in NumPy arrays
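One thing to watch for with arrays like this: a NumPy array has a single dtype, so mixing strings and numbers in one array usually turns the numbers into strings. The sketch below uses a made-up "Years" column to show the effect and one possible way around it (dtype=object). Treat it as an illustration rather than the only fix.

```python
import numpy as np
import pandas as pd

# Mixed strings and integers: NumPy stores everything as strings
mixed = np.array([["Bob", 5], ["Mark", 3]])
df = pd.DataFrame(mixed, columns=["First name", "Years"])

print(mixed.dtype.kind)                  # 'U' (Unicode string)
print(df["Years"].astype(int).tolist())  # [5, 3] after an explicit conversion

# dtype=object keeps the original Python objects instead
as_objects = np.array([["Bob", 5], ["Mark", 3]], dtype=object)
df_obj = pd.DataFrame(as_objects, columns=["First name", "Years"])

print(df_obj["Years"].sum())             # 8
```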
And that’s how you can convert both one-dimensional and two-dimensional NumPy arrays to Pandas DataFrames. Let’s take a look at another way of doing the same thing, which is with the built-in from_records() method.
2. Convert NumPy Array to Pandas DataFrame With the from_records() Method
Pandas has a built-in method that allows you to convert a multidimensional NumPy array to Pandas DataFrame. It’s called from_records(), and it’s specific to the DataFrame class.
Truth be told, you don’t have to use it for a plain two-dimensional array, since it provides no real advantage there over the conversion approaches we’ve covered so far. But still, if you want a dedicated method, here’s how to use it:
employees = np.array([
["Bob", "Doe", "[email protected]"],
["Mark", "Markson", "[email protected]"],
["Jane", "Swift", "[email protected]"],
["Patrick", "Johnson", "[email protected]"]
])
data = pd.DataFrame.from_records(employees, columns=["First name", "Last name", "Email"])
data
The resulting Pandas DataFrame is identical to the one from the previous section:
Image 5 - DataFrame from NumPy array with from_records()
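Where from_records() tends to fit more naturally is with structured (record) arrays, where each field already has a name. Here’s a minimal sketch, with hypothetical field names first_name and years:

```python
import numpy as np
import pandas as pd

# Structured array: each record has named, typed fields (names are illustrative)
records = np.array(
    [("Bob", 5), ("Mark", 3)],
    dtype=[("first_name", "U10"), ("years", "i4")]
)

# Column labels are taken from the field names
data = pd.DataFrame.from_records(records)

print(list(data.columns))      # ['first_name', 'years']
print(data["years"].tolist())  # [5, 3]
```

A plain pd.DataFrame(records) call accepts structured arrays as well, so the choice here is largely a matter of preference.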
3. Convert NumPy Array to DataFrame Column
You can use NumPy to add additional columns to an existing Pandas DataFrame. For example, the following code snippet declares a Pandas DataFrame from a two-dimensional NumPy array:
employees = np.array([
["Bob", "Doe", "[email protected]"],
["Mark", "Markson", "[email protected]"],
["Jane", "Swift", "[email protected]"],
["Patrick", "Johnson", "[email protected]"]
])
data = pd.DataFrame(employees, columns=["First name", "Last name", "Email"])
data
Image 6 - DataFrame from 2D NumPy array (Image by author)
To convert a NumPy array to a DataFrame column, you only have to declare a new NumPy array and assign it to a new column. Here’s the code:
years_of_experience = np.array([5, 3, 8, 12])
data["Years of Experience"] = years_of_experience
data
The DataFrame now has four columns instead of three:
Image 7 - Adding a DataFrame column from NumPy array
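The new array needs one value per row. As a rough sketch of what happens otherwise, assigning an array of the wrong length raises a ValueError. The smaller DataFrame and the "Too short" column name below are made up for illustration:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"First name": ["Bob", "Mark", "Jane", "Patrick"]})

# Matching length: the assignment works
data["Years of Experience"] = np.array([5, 3, 8, 12])

# Mismatched length: Pandas refuses the assignment
try:
    data["Too short"] = np.array([1, 2])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(data.shape)       # (4, 2)
print(mismatch_raised)  # True
```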
And that’s all for today. Let’s make a short recap next.
Understanding How to Convert NumPy Array to Pandas DataFrame
To conclude, Python’s Pandas library provides a user-friendly API for converting most common data types into Pandas DataFrames, with NumPy array being one of them. This article covered three ways to convert a NumPy array to Pandas DataFrame.
There are some variations to these approaches, but they mostly come down to how you prepare the NumPy array rather than to Pandas itself. Learn these three, and you’ll be ready for any data analysis project coming your way.
Frequently Asked Questions
Can Pandas work with NumPy arrays?
Yes, Pandas can work with NumPy arrays, just as well as with plain Python lists. You can declare either a bunch of 1D arrays or a single 2D NumPy array and convert it to a Pandas DataFrame by passing it into the pd.DataFrame() call. Just remember to specify the column names; otherwise, the columns get default integer labels (0, 1, 2, and so on).
How can you convert a NumPy array into a Pandas DataFrame?
You can use either a call to pd.DataFrame() or the pd.DataFrame.from_records() method. For a plain 2D NumPy array (matrix), both give you the same rows and columns, so you can leverage either one for the conversion.
Can a NumPy array contain a list?
You can create a NumPy array from a plain Python list, or even multiple lists. If the lists have equal lengths, NumPy will convert them into a multi-dimensional array. If they have different lengths, you need to pass `dtype=object` explicitly, and the NumPy array will then contain separate list objects. Recent NumPy versions (1.24 and later) raise a ValueError otherwise, while older versions fell back to `dtype=object` with a deprecation warning.
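Here’s a short sketch of both cases. The example passes dtype=object explicitly for the unequal-length lists, which should behave the same across NumPy versions:

```python
import numpy as np

# Equal-length lists become a regular two-dimensional array
regular = np.array([[1, 2, 3], [4, 5, 6]])
print(regular.shape)    # (2, 3)

# Unequal-length lists: request an object array explicitly
ragged = np.array([[1, 2, 3], [4, 5]], dtype=object)
print(ragged.shape)     # (2,)
print(type(ragged[0]))  # <class 'list'>
```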