Unique values from multiple columns in Pandas DataFrame
In a typical data science project, the dataset is often large and complex. It may contain many columns with different types of attributes. Sometimes you will need to extract the unique values from several columns at once for further computation or visualization.
In this article, we will discuss various methods to obtain the distinct values from multiple columns in a Pandas DataFrame.
Use Pandas unique() and concat() Methods to Filter Out Unique Values
This section covers the pandas unique() and concat() methods. A pandas Series (a single DataFrame column) has a unique() method that returns the distinct items in that column, in order of first appearance. In the example code, the first print statement shows only the unique FirstName values, which is what we want in this case.
We can extend this to several columns with pandas concat(): concatenate all the desired columns into one single column, then call unique() on the result. Note that unique() returns an array rather than a DataFrame, so wrap the result in pd.DataFrame() if you want to save it as a new DataFrame.
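The step of saving the combined unique values as a new DataFrame can be sketched as follows. This is a minimal illustration rather than the only way to do it; the column name `UniqueValue` and the variable names are hypothetical and chosen just for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'FirstName': ['Anmol', 'Sakshi', 'Maryam'],
    'LastName': ['Lohana', 'Chawla', 'Pathan'],
    'Age': [23, 25, 27],
})

# Stack the three columns into one Series; ignore_index avoids repeated index labels.
combined = pd.concat([df['FirstName'], df['LastName'], df['Age']], ignore_index=True)

# unique() keeps the first occurrence of each value, in order of appearance.
unique_values = combined.unique()

# Hypothetical column name, used only for this illustration.
unique_df = pd.DataFrame({'UniqueValue': unique_values})
print(unique_df)
```

Because the combined column mixes strings and integers, the resulting column holds generic Python objects, so convert it back to a specific type if later steps need numeric operations.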
Example Code:
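import pandas as pd
# Sample data: two string columns and one integer column
df = pd.DataFrame({'FirstName': ['Anmol', 'Sakshi', 'Maryam'],
                   'LastName': ['Lohana', 'Chawla', 'Pathan'],
                   'Age': [23, 25, 27]})
# Unique values in a single column
print(f"Unique FN: {df['FirstName'].unique()}")
# Stack all three columns into one Series, then take its unique values
combined = pd.concat([df['FirstName'], df['LastName'], df['Age']])
print(f"Unique Values from 3 Columns: {combined.unique()}")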
Output

Using numpy.unique() to Get Unique Values from an Array
The numpy.unique() function returns the sorted unique items of the array passed to it.
This approach has one limitation: because np.unique() sorts its input, we cannot combine string and numerical columns, since those values cannot be compared with each other. If you need to combine columns of different data types, use the pandas concat() and unique() approach described above, which handles mixed data types.
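The limitation described above can be reproduced with a short sketch. It assumes the same sample DataFrame used throughout this article; the variable names `mixed`, `sort_failed`, and `fallback` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'FirstName': ['Anmol', 'Sakshi', 'Maryam'],
    'LastName': ['Lohana', 'Chawla', 'Pathan'],
    'Age': [23, 25, 27],
})

# Selecting a string column together with an integer column gives an object array.
mixed = df[['FirstName', 'Age']].values

try:
    np.unique(mixed)
    sort_failed = False
except TypeError as exc:
    # np.unique sorts its input, and str and int values cannot be ordered.
    sort_failed = True
    print(f"np.unique failed: {exc}")

# Fall back to the concat-based approach, which does not sort the values.
fallback = pd.concat([df['FirstName'], df['Age']]).unique()
print(fallback)
```

The fallback keeps values in order of first appearance instead of sorted order, which is worth keeping in mind if later code relies on ordering.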
Example Code
import pandas as pd
import numpy as np
df = pd.DataFrame({'FirstName': ['Anmol', 'Sakshi', 'Maryam'],
                   'LastName': ['Lohana', 'Chawla', 'Pathan'],
                   'Age': [23, 25, 27]})
# Flatten the two string columns into one array and return its sorted unique values
print(np.unique(df[['LastName', 'FirstName']].values))
Output

Creating Sets in Python: Union of Unique Values
The set object in Python is a mutable, unordered data structure that contains only unique values. This means that it can be used to remove duplicates.
The set union operation (the | operator or the union() method) works across columns of different data types, unlike np.unique(), which needs values that can be sorted against each other.
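When the number of columns is not fixed, the same union can be written without chaining the | operator. The sketch below is one possible way to do this; the `columns` list is a hypothetical selection made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'FirstName': ['Anmol', 'Sakshi', 'Maryam'],
    'LastName': ['Lohana', 'Chawla', 'Pathan'],
    'Age': [23, 25, 27],
})

# Hypothetical list of columns to combine.
columns = ['FirstName', 'LastName', 'Age']

# set().union() accepts any number of iterables, so each column is passed as one argument.
unique_values = set().union(*(df[col] for col in columns))
print(unique_values)
```

Sets are unordered, so the printed order may differ between runs; use sorted() or the concat-based approach if a stable order matters.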
Example Code
import pandas as pd
df = pd.DataFrame({'FirstName': ['Anmol', 'Sakshi', 'Maryam'],
                   'LastName': ['Lohana', 'Chawla', 'Pathan'],
                   'Age': [23, 25, 27]})
# Convert each column to a set, then take the union of all three sets
print(set(df.FirstName) | set(df.LastName) | set(df.Age))
Output

Conclusion
When it comes to analysis, there are a number of ways to obtain the unique values from one or more columns. In this post, we covered three of them: pandas concat() with unique(), numpy.unique(), and the union of Python sets.