Join two DataFrames and filter the rows based on a condition in Python
PythonLet's walk through an example where we have two DataFrames representing two tables, and we want to join them based on a common column, filter the rows based on a condition, and then iterate through the resulting rows to print the data.
Example Scenario:
- Table 1 (df1): Contains employee information (EmployeeID, Name, DepartmentID).
- Table 2 (df2): Contains department information (DepartmentID, DepartmentName).
We want to join these two tables on the DepartmentID
column, filter the rows where
the DepartmentName
is "Sales", and then iterate through the resulting rows to print
the employee details.
Step-by-Step Implementation:
- Import necessary libraries:
import pandas as pd
- Create the DataFrames:
# DataFrame for employee information
data1 = {
'EmployeeID': [1, 2, 3, 4],
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'DepartmentID': [101, 102, 101, 103]
}
df1 = pd.DataFrame(data1)
# DataFrame for department information
data2 = {
'DepartmentID': [101, 102, 103],
'DepartmentName': ['Sales', 'HR', 'IT']
}
df2 = pd.DataFrame(data2)
- Join the DataFrames:
# Perform an inner join on 'DepartmentID'
merged_df = pd.merge(df1, df2, on='DepartmentID', how='inner')
- Filter rows based on a condition:
# Filter rows where DepartmentName is 'Sales'
filtered_df = merged_df[merged_df['DepartmentName'] == 'Sales']
- Iterate through the filtered rows and print the data:
# Iterate through the rows and print the employee details
for index, row in filtered_df.iterrows():
print(f"EmployeeID: {row['EmployeeID']}, Name: {row['Name']}, DepartmentID: {row['DepartmentID']}, DepartmentName: {row['DepartmentName']}")
Full Code:
import pandas as pd
# DataFrame for employee information
data1 = {
'EmployeeID': [1, 2, 3, 4],
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'DepartmentID': [101, 102, 101, 103]
}
df1 = pd.DataFrame(data1)
# DataFrame for department information
data2 = {
'DepartmentID': [101, 102, 103],
'DepartmentName': ['Sales', 'HR', 'IT']
}
df2 = pd.DataFrame(data2)
# Perform an inner join on 'DepartmentID'
merged_df = pd.merge(df1, df2, on='DepartmentID', how='inner')
# Filter rows where DepartmentName is 'Sales'
filtered_df = merged_df[merged_df['DepartmentName'] == 'Sales']
# Iterate through the rows and print the employee details
for index, row in filtered_df.iterrows():
print(f"EmployeeID: {row['EmployeeID']}, Name: {row['Name']}, DepartmentID: {row['DepartmentID']}, DepartmentName: {row['DepartmentName']}")
Output:
EmployeeID: 1, Name: Alice, DepartmentID: 101, DepartmentName: Sales
EmployeeID: 3, Name: Charlie, DepartmentID: 101, DepartmentName: Sales
Explanation:
- Join: We used
pd.merge()
to joindf1
anddf2
on theDepartmentID
column. - Filter: We filtered the merged DataFrame to include only rows where the
DepartmentName
is "Sales". - Iterate: We used
iterrows()
to iterate through the filtered DataFrame and print the relevant employee details.
This example demonstrates how to join two DataFrames, filter the resulting rows based on a condition, and then iterate through the filtered rows to print the data.