ListenData: Python : 10 Ways to Filter Pandas DataFrame

In this article, we will cover various methods to filter a pandas DataFrame in Python. Data filtering is one of the most frequent data manipulation operations. It is similar to the WHERE clause in SQL, or to the filter in MS Excel that you may have used to select specific rows based on some conditions. In terms of speed, Python has an efficient way to perform filtering and aggregation: it has an excellent package called pandas for data wrangling tasks. pandas is built on top of the NumPy package, whose core is written in C, a low-level language. Hence, data manipulation with pandas is a fast and convenient way to handle large datasets.

Examples of Data Filtering

Data filtering is one of the first steps of data preparation for predictive modeling or any reporting project. It is also called 'subsetting data'. Some examples of data filtering are listed below.

  • Select all the active customers whose accounts were opened after 1st January 2019
  • Extract details of all the customers who made more than 3 transactions in the last 6 months
  • Fetch information on employees who have spent more than 3 years in the organization and received the highest rating in the past 2 years
  • Analyze complaints data and identify customers who filed more than 5 complaints in the last 1 year
  • Extract details of metro cities where per capita income is greater than 40K dollars
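As a rough illustration of the first example in the list above, the sketch below filters a small, made-up customer table. The column names (customer_id, status, opened_on) and all values are assumptions invented for this illustration; they are not part of the flights dataset used later in the article.

```python
import pandas as pd

# Hypothetical customer data, invented for this illustration
customers = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "status": ["active", "inactive", "active", "active"],
    "opened_on": pd.to_datetime(
        ["2018-11-05", "2019-03-12", "2019-06-30", "2018-01-20"]
    ),
})

# Active customers whose accounts were opened after 1st January 2019.
# Each condition is wrapped in parentheses and combined with & (logical AND).
mask = (customers["status"] == "active") & (
    customers["opened_on"] > pd.Timestamp("2019-01-01")
)
active_recent = customers[mask]
print(active_recent)
```

The boolean mask plays the same role as a WHERE clause in SQL: only rows where the mask is True are kept.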
Import Data

Make sure the pandas package is installed before running the following code. You can check this by running !pip show pandas in the IPython console. If it is not installed, you can install it with the command !pip install pandas.

We are going to use a dataset containing details of flights departing from NYC in 2013. This dataset has 32735 rows and 16 columns. See the column names below. To import the dataset, we use the read_csv( ) function from the pandas package.

['year', 'month', 'day', 'dep_time', 'dep_delay', 'arr_time',
'arr_delay', 'carrier', 'tailnum', 'flight', 'origin', 'dest',
'air_time', 'distance', 'hour', 'minute']

import pandas as pd
df = pd.read_csv("https://dyurovsky.github.io/psyc201/data/lab2/nycflights.csv")
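After importing, it can help to confirm that the shape and column names match what you expect. The sketch below uses a tiny inline CSV with two illustrative rows (not taken from the real file) so that it runs without network access; on the real data you would inspect the same attributes on df.

```python
import io
import pandas as pd

# Two illustrative rows using the column names listed above
sample_csv = io.StringIO(
    "year,month,day,dep_time,dep_delay,arr_time,arr_delay,carrier,tailnum,"
    "flight,origin,dest,air_time,distance,hour,minute\n"
    "2013,6,30,940,15,1216,-4,VX,N626VA,407,JFK,LAX,313,2475,9,40\n"
    "2013,5,7,1657,-3,2104,10,DL,N3760C,329,JFK,SJU,216,1598,16,57\n"
)
sample = pd.read_csv(sample_csv)

print(sample.shape)             # (number of rows, number of columns)
print(sample.columns.tolist())  # column names as a plain list
print(sample.head())            # first few rows
```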

Filter pandas dataframe by column value

Select the details of JetBlue Airways flights (two-letter carrier code B6) that originate from JFK airport.
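One possible way to express this filter is sketched below. It uses a small made-up DataFrame named flights (an assumption for illustration) that reuses the carrier, origin and dest column names from the flights data, so it runs on its own; the same expressions should apply to df loaded above.

```python
import pandas as pd

# Small made-up sample reusing column names from the flights data
flights = pd.DataFrame({
    "carrier": ["B6", "B6", "UA", "DL"],
    "origin": ["JFK", "LGA", "JFK", "JFK"],
    "dest": ["BOS", "FLL", "SFO", "ATL"],
})

# Boolean indexing: wrap each condition in parentheses and combine with &
jetblue_jfk = flights[(flights["carrier"] == "B6") & (flights["origin"] == "JFK")]
print(jetblue_jfk)

# The same filter written with DataFrame.query
jetblue_jfk_q = flights.query("carrier == 'B6' and origin == 'JFK'")
print(jetblue_jfk_q)
```

Both forms return the same rows. The parentheses in the first form matter because & binds more tightly than == in Python.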


