Data filtering is one of the first steps of data preparation for predictive modeling or any reporting project. It is also called 'subsetting data'. Some examples of data filtering are listed below.
- Select all the active customers whose accounts were opened after 1st January 2019
- Extract details of all the customers who made more than 3 transactions in the last 6 months
- Fetch information on employees who have spent more than 3 years in the organization and received the highest rating in the past 2 years
- Analyze complaints data and identify customers who filed more than 5 complaints in the last year
- Extract details of metro cities where per capita income is greater than 40K dollars
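Each of these tasks boils down to the same pattern in pandas: build a boolean mask per condition and combine the masks. A minimal sketch of the first example follows, assuming a hypothetical customer table whose column names (`customer_id`, `status`, `open_date`) and values are made up for illustration.

```python
import pandas as pd

# Hypothetical customer table: column names and values are illustrative only
customers = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "status": ["active", "inactive", "active", "active"],
    "open_date": pd.to_datetime(["2018-11-05", "2019-03-14", "2019-07-01", "2017-02-20"]),
})

# One boolean mask per condition
is_active = customers["status"] == "active"
opened_after = customers["open_date"] > pd.Timestamp("2019-01-01")

# Combine the masks with & (element-wise AND) and keep only matching rows
subset = customers[is_active & opened_after]

print(subset["customer_id"].tolist())  # [103]
```

Only customer 103 is both active and opened after 1 January 2019; customer 102 opened later but is inactive.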
Make sure the pandas package is installed before running the following code. You can check by running
!pip show pandas in an IPython console. If it is not installed, you can install it with the command
!pip install pandas.
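As an alternative to `!pip show pandas`, the check can be done from plain Python with `importlib.util.find_spec` from the standard library, which returns `None` when a package cannot be found. The helper name `is_installed` below is hypothetical; this is a minimal sketch, not part of pandas.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located in this environment."""
    # find_spec returns None when no importable module/package has this name
    return importlib.util.find_spec(package_name) is not None


print(is_installed("pandas"))
```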
We are going to use a dataset containing details of flights departing from NYC in 2013. It has 32735 rows and 16 columns; the column names are shown below. To import the dataset, we use the
read_csv() function from the pandas package.
['year', 'month', 'day', 'dep_time', 'dep_delay', 'arr_time',
'arr_delay', 'carrier', 'tailnum', 'flight', 'origin', 'dest',
'air_time', 'distance', 'hour', 'minute']
# Import pandas and read the NYC flights CSV directly from the URL
import pandas as pd
df = pd.read_csv("https://dyurovsky.github.io/psyc201/data/lab2/nycflights.csv")
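To confirm the import worked, `df.shape` and `df.columns` can be inspected; for the full file they should match the row count, column count and column names given above. The sketch below uses a two-row in-memory CSV with the same 16 columns so that it runs without network access; the row values are illustrative placeholders, not taken from the real file.

```python
import io

import pandas as pd

# Two-row stand-in with the same 16 columns as the flights file (illustrative values)
sample_csv = io.StringIO(
    "year,month,day,dep_time,dep_delay,arr_time,arr_delay,carrier,tailnum,"
    "flight,origin,dest,air_time,distance,hour,minute\n"
    "2013,1,1,517,2,830,11,UA,N14228,1545,EWR,IAH,227,1400,5,17\n"
    "2013,1,1,544,-1,1004,-18,B6,N804JB,725,JFK,BQN,183,1576,5,44\n"
)
flights = pd.read_csv(sample_csv)

# shape is (rows, columns); columns holds the header names parsed by read_csv
print(flights.shape)  # (2, 16)
print(list(flights.columns))
```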
Filter pandas dataframe by column value
B6 with origin from
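Filtering by column value works by comparing a column to a value, which yields a boolean Series, and passing that Series inside `df[...]`. A minimal sketch on a small made-up frame with two of the flights columns follows; `ORIGIN = "JFK"` is purely an illustrative assumption, so substitute the origin airport you actually need.

```python
import pandas as pd

# Small illustrative frame with a few of the flights columns; values are made up
flights = pd.DataFrame({
    "carrier": ["B6", "UA", "B6", "AA"],
    "origin": ["JFK", "EWR", "LGA", "JFK"],
    "flight": [725, 1545, 507, 1141],
})

# Single condition: keep rows whose carrier equals "B6"
b6 = flights[flights["carrier"] == "B6"]

# Two conditions: wrap each comparison in parentheses and combine with &
ORIGIN = "JFK"  # illustrative assumption; replace with the origin you need
b6_from_origin = flights[(flights["carrier"] == "B6") & (flights["origin"] == ORIGIN)]

print(b6["flight"].tolist())              # [725, 507]
print(b6_from_origin["flight"].tolist())  # [725]
```

The parentheses matter: `&` binds more tightly than `==` in Python, so omitting them raises an error instead of combining the two masks.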