Monday, 24 February 2025

Extract rows that meet specific conditions from a Pandas DataFrame

Let's walk through an example of how to extract the rows of a Pandas DataFrame that meet specific conditions and then iterate over those rows.

Example Scenario:

Let's assume we have a DataFrame containing information about employees, including their names, departments, salaries, and ages. We want to extract all rows where the employee's salary is greater than 50,000 and their age is less than 40. After extracting these rows, we will iterate over them to perform some operations.

Step 1: Import Pandas and Create the DataFrame

import pandas as pd

# Sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Department': ['HR', 'Finance', 'IT', 'IT', 'Finance'],
    'Salary': [60000, 45000, 70000, 55000, 80000],
    'Age': [35, 42, 38, 29, 45]
}

# Create DataFrame
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

Step 2: Extract Rows Based on Conditions

We can use boolean indexing to filter the DataFrame based on the conditions.

# Define the conditions
condition = (df['Salary'] > 50000) & (df['Age'] < 40)

# Apply the conditions to the DataFrame
filtered_df = df[condition]

print("\nFiltered DataFrame:")
print(filtered_df)
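
The same filter can also be written with .loc or with DataFrame.query(). The following is a minimal sketch, assuming the same sample data as above; the variable names via_mask, via_loc, and via_query are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Department': ['HR', 'Finance', 'IT', 'IT', 'Finance'],
    'Salary': [60000, 45000, 70000, 55000, 80000],
    'Age': [35, 42, 38, 29, 45]
})

# Boolean mask, as in Step 2. Each comparison is wrapped in parentheses
# because & binds more tightly than > and <.
via_mask = df[(df['Salary'] > 50000) & (df['Age'] < 40)]

# The same mask passed to .loc
via_loc = df.loc[(df['Salary'] > 50000) & (df['Age'] < 40)]

# The same filter expressed as a query string
via_query = df.query('Salary > 50000 and Age < 40')

# All three approaches should select the same rows
print(via_mask.equals(via_loc) and via_mask.equals(via_query))
```

Which form to use is largely a matter of readability: the query string tends to be shorter when several conditions are combined, while the explicit mask can be stored and reused.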

Step 3: Iterate Over the Filtered Rows

Now that we have the filtered DataFrame, we can iterate over the rows using the iterrows() method or itertuples() method.

Using iterrows():

print("\nIterating over filtered rows using iterrows():")
for index, row in filtered_df.iterrows():
    print(f"Index: {index}, Name: {row['Name']}, Department: {row['Department']}, Salary: {row['Salary']}, Age: {row['Age']}")

Using itertuples():

print("\nIterating over filtered rows using itertuples():")
for row in filtered_df.itertuples(index=False):
    print(f"Name: {row.Name}, Department: {row.Department}, Salary: {row.Salary}, Age: {row.Age}")
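
The loops above only print each row. As a rough sketch of performing an operation during iteration, the example below collects a per-employee value into a dictionary. The 10% bonus rule and the names BONUS_PERCENT and bonuses are hypothetical and chosen purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Department': ['HR', 'Finance', 'IT', 'IT', 'Finance'],
    'Salary': [60000, 45000, 70000, 55000, 80000],
    'Age': [35, 42, 38, 29, 45]
})
filtered_df = df[(df['Salary'] > 50000) & (df['Age'] < 40)]

# Hypothetical rule: a 10% bonus for every employee who passed the filter
BONUS_PERCENT = 10

bonuses = {}
for row in filtered_df.itertuples(index=False):
    # Integer arithmetic keeps the result free of floating-point rounding
    bonuses[row.Name] = row.Salary * BONUS_PERCENT // 100

print(bonuses)
```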

Full Example Code:

import pandas as pd

# Sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Department': ['HR', 'Finance', 'IT', 'IT', 'Finance'],
    'Salary': [60000, 45000, 70000, 55000, 80000],
    'Age': [35, 42, 38, 29, 45]
}

# Create DataFrame
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Define the conditions
condition = (df['Salary'] > 50000) & (df['Age'] < 40)

# Apply the conditions to the DataFrame
filtered_df = df[condition]

print("\nFiltered DataFrame:")
print(filtered_df)

# Iterate over filtered rows using iterrows()
print("\nIterating over filtered rows using iterrows():")
for index, row in filtered_df.iterrows():
    print(f"Index: {index}, Name: {row['Name']}, Department: {row['Department']}, Salary: {row['Salary']}, Age: {row['Age']}")

# Iterate over filtered rows using itertuples()
print("\nIterating over filtered rows using itertuples():")
for row in filtered_df.itertuples(index=False):
    print(f"Name: {row.Name}, Department: {row.Department}, Salary: {row.Salary}, Age: {row.Age}")

Output:

Original DataFrame:
      Name Department  Salary  Age
0    Alice         HR   60000   35
1      Bob    Finance   45000   42
2  Charlie         IT   70000   38
3    David         IT   55000   29
4      Eva    Finance   80000   45

Filtered DataFrame:
      Name Department  Salary  Age
0    Alice         HR   60000   35
2  Charlie        IT   70000   38
3    David        IT   55000   29

Iterating over filtered rows using iterrows():
Index: 0, Name: Alice, Department: HR, Salary: 60000, Age: 35
Index: 2, Name: Charlie, Department: IT, Salary: 70000, Age: 38
Index: 3, Name: David, Department: IT, Salary: 55000, Age: 29

Iterating over filtered rows using itertuples():
Name: Alice, Department: HR, Salary: 60000, Age: 35
Name: Charlie, Department: IT, Salary: 70000, Age: 38
Name: David, Department: IT, Salary: 55000, Age: 29

Explanation:

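  • Boolean Indexing: We built a boolean mask by combining (df['Salary'] > 50000) and (df['Age'] < 40) with the & operator, then used it to select the matching rows. Each comparison must be wrapped in parentheses because & has higher precedence than > and <.
  • iterrows(): This method returns an iterator yielding each row as an (index, Series) pair. It is useful when you want to access values by column label, but because every row is converted to a Series, values may be upcast to a common dtype.
  • itertuples(): This method returns an iterator yielding namedtuples of the rows. By default the index is included as the first field, Index; passing index=False, as in this example, omits it. It is generally faster than iterrows().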
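
To make the difference between the two iteration methods concrete, here is a small sketch that inspects the first item each one yields. It assumes the same sample data as above; the variable names with_index, without_index, idx, and row are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Department': ['HR', 'Finance', 'IT', 'IT', 'Finance'],
    'Salary': [60000, 45000, 70000, 55000, 80000],
    'Age': [35, 42, 38, 29, 45]
})
filtered_df = df[(df['Salary'] > 50000) & (df['Age'] < 40)]

# itertuples() includes the index by default, as a field named Index
with_index = next(filtered_df.itertuples())
print(with_index._fields)

# index=False drops that field
without_index = next(filtered_df.itertuples(index=False))
print(without_index._fields)

# iterrows() yields (index, Series) pairs
idx, row = next(filtered_df.iterrows())
print(idx, type(row).__name__)
```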

This example demonstrates how to filter rows based on specific conditions and then iterate over the filtered rows in a Pandas DataFrame.


