Monday, 24 February 2025

Join two DataFrames and filter the rows based on a condition in Python

Python 

Let's walk through an example where we have two DataFrames representing two tables, and we want to join them based on a common column, filter the rows based on a condition, and then iterate through the resulting rows to print the data.

Example Scenario:

  • Table 1 (df1): Contains employee information (EmployeeID, Name, DepartmentID).
  • Table 2 (df2): Contains department information (DepartmentID, DepartmentName).

We want to join these two tables on the DepartmentID column, filter the rows where the DepartmentName is "Sales", and then iterate through the resulting rows to print the employee details.

Step-by-Step Implementation:

  1. Import necessary libraries:
import pandas as pd
  1. Create the DataFrames:
# DataFrame for employee information
data1 = {
    'EmployeeID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'DepartmentID': [101, 102, 101, 103]
}
df1 = pd.DataFrame(data1)

# DataFrame for department information
data2 = {
    'DepartmentID': [101, 102, 103],
    'DepartmentName': ['Sales', 'HR', 'IT']
}
df2 = pd.DataFrame(data2)
  1. Join the DataFrames:
# Perform an inner join on 'DepartmentID'
merged_df = pd.merge(df1, df2, on='DepartmentID', how='inner')
  1. Filter rows based on a condition:
# Filter rows where DepartmentName is 'Sales'
filtered_df = merged_df[merged_df['DepartmentName'] == 'Sales']
  1. Iterate through the filtered rows and print the data:
# Iterate through the rows and print the employee details
for index, row in filtered_df.iterrows():
    print(f"EmployeeID: {row['EmployeeID']}, Name: {row['Name']}, DepartmentID: {row['DepartmentID']}, DepartmentName: {row['DepartmentName']}")

Full Code:

import pandas as pd

# DataFrame for employee information
data1 = {
    'EmployeeID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'DepartmentID': [101, 102, 101, 103]
}
df1 = pd.DataFrame(data1)

# DataFrame for department information
data2 = {
    'DepartmentID': [101, 102, 103],
    'DepartmentName': ['Sales', 'HR', 'IT']
}
df2 = pd.DataFrame(data2)

# Perform an inner join on 'DepartmentID'
merged_df = pd.merge(df1, df2, on='DepartmentID', how='inner')

# Filter rows where DepartmentName is 'Sales'
filtered_df = merged_df[merged_df['DepartmentName'] == 'Sales']

# Iterate through the rows and print the employee details
for index, row in filtered_df.iterrows():
    print(f"EmployeeID: {row['EmployeeID']}, Name: {row['Name']}, DepartmentID: {row['DepartmentID']}, DepartmentName: {row['DepartmentName']}")

Output:

EmployeeID: 1, Name: Alice, DepartmentID: 101, DepartmentName: Sales
EmployeeID: 3, Name: Charlie, DepartmentID: 101, DepartmentName: Sales

Explanation:

  • Join: We used pd.merge() to join df1 and df2 on the DepartmentID column.
  • Filter: We filtered the merged DataFrame to include only rows where the DepartmentName is "Sales".
  • Iterate: We used iterrows() to iterate through the filtered DataFrame and print the relevant employee details.

This example demonstrates how to join two DataFrames, filter the resulting rows based on a condition, and then iterate through the filtered rows to print the data.



Search