Extract rows that meet specific conditions from a Pandas DataFrame
Let's walk through an example of how to extract rows from a Pandas DataFrame that meet specific conditions and then iterate over those rows.
Example Scenario:
Let's assume we have a DataFrame containing information about employees, including their names, departments, salaries, and ages. We want to extract all rows where the employee's salary is greater than 50,000 and their age is less than 40. After extracting these rows, we will iterate over them to perform some operations.
Step 1: Import Pandas and Create the DataFrame
import pandas as pd
# Sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Department': ['HR', 'Finance', 'IT', 'IT', 'Finance'],
    'Salary': [60000, 45000, 70000, 55000, 80000],
    'Age': [35, 42, 38, 29, 45]
}
# Create DataFrame
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Step 2: Extract Rows Based on Conditions
We can use boolean indexing to filter the DataFrame based on the conditions.
# Define the conditions
condition = (df['Salary'] > 50000) & (df['Age'] < 40)
# Apply the conditions to the DataFrame
filtered_df = df[condition]
print("\nFiltered DataFrame:")
print(filtered_df)
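The same filter can be written in a few equivalent ways. The sketch below rebuilds the sample DataFrame so that it runs on its own, then compares the boolean mask with DataFrame.loc and DataFrame.query. The variable names by_mask, by_loc, and by_query are illustrative and not part of the original example. Each comparison needs its own parentheses because & binds more tightly than > and < in Python.

```python
import pandas as pd

# Same sample data as in Step 1
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Department': ['HR', 'Finance', 'IT', 'IT', 'Finance'],
    'Salary': [60000, 45000, 70000, 55000, 80000],
    'Age': [35, 42, 38, 29, 45]
})

# Parenthesize each comparison before combining them with &
mask = (df['Salary'] > 50000) & (df['Age'] < 40)

by_mask = df[mask]        # plain boolean indexing
by_loc = df.loc[mask]     # label-based indexer with the same mask
by_query = df.query("Salary > 50000 and Age < 40")  # expression string

# All three approaches should select the same rows
print(by_mask.equals(by_loc))
print(by_mask.equals(by_query))
```

Which form to use is mostly a matter of taste: the mask is explicit and reusable, while query() can be easier to read when the column names are valid Python identifiers.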
Step 3: Iterate Over the Filtered Rows
Now that we have the filtered DataFrame, we can iterate over its rows using either the iterrows() method or the itertuples() method.
Using iterrows():
print("\nIterating over filtered rows using iterrows():")
for index, row in filtered_df.iterrows():
    print(f"Index: {index}, Name: {row['Name']}, Department: {row['Department']}, Salary: {row['Salary']}, Age: {row['Age']}")
Using itertuples():
print("\nIterating over filtered rows using itertuples():")
for row in filtered_df.itertuples(index=False):
    print(f"Name: {row.Name}, Department: {row.Department}, Salary: {row.Salary}, Age: {row.Age}")
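The itertuples() call above passes index=False, which drops the index from each tuple. As a small sketch of the default behavior, the block below rebuilds the same data and calls itertuples() without index=False, so the index is available as the Index field. The tuple name 'Employee' is an illustrative choice passed through the name parameter and is not part of the original example.

```python
import pandas as pd

# Same sample data and filter as in the steps above
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Department': ['HR', 'Finance', 'IT', 'IT', 'Finance'],
    'Salary': [60000, 45000, 70000, 55000, 80000],
    'Age': [35, 42, 38, 29, 45]
})
filtered_df = df[(df['Salary'] > 50000) & (df['Age'] < 40)]

# With the default index=True, the first field of each tuple is named Index
for row in filtered_df.itertuples(name='Employee'):
    print(f"Index: {row.Index}, Name: {row.Name}, Salary: {row.Salary}")

# Grab the first tuple to inspect its fields
first = next(filtered_df.itertuples(name='Employee'))
print(first._fields)
```

This means itertuples() can cover the case where the index is needed as well, without falling back to iterrows().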
Full Example Code:
import pandas as pd
# Sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Department': ['HR', 'Finance', 'IT', 'IT', 'Finance'],
    'Salary': [60000, 45000, 70000, 55000, 80000],
    'Age': [35, 42, 38, 29, 45]
}
# Create DataFrame
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Define the conditions
condition = (df['Salary'] > 50000) & (df['Age'] < 40)
# Apply the conditions to the DataFrame
filtered_df = df[condition]
print("\nFiltered DataFrame:")
print(filtered_df)
# Iterate over filtered rows using iterrows()
print("\nIterating over filtered rows using iterrows():")
for index, row in filtered_df.iterrows():
    print(f"Index: {index}, Name: {row['Name']}, Department: {row['Department']}, Salary: {row['Salary']}, Age: {row['Age']}")
# Iterate over filtered rows using itertuples()
print("\nIterating over filtered rows using itertuples():")
for row in filtered_df.itertuples(index=False):
    print(f"Name: {row.Name}, Department: {row.Department}, Salary: {row.Salary}, Age: {row.Age}")
Output:
Original DataFrame:
      Name Department  Salary  Age
0    Alice         HR   60000   35
1      Bob    Finance   45000   42
2  Charlie         IT   70000   38
3    David         IT   55000   29
4      Eva    Finance   80000   45

Filtered DataFrame:
      Name Department  Salary  Age
0    Alice         HR   60000   35
2  Charlie         IT   70000   38
3    David         IT   55000   29

Iterating over filtered rows using iterrows():
Index: 0, Name: Alice, Department: HR, Salary: 60000, Age: 35
Index: 2, Name: Charlie, Department: IT, Salary: 70000, Age: 38
Index: 3, Name: David, Department: IT, Salary: 55000, Age: 29

Iterating over filtered rows using itertuples():
Name: Alice, Department: HR, Salary: 60000, Age: 35
Name: Charlie, Department: IT, Salary: 70000, Age: 38
Name: David, Department: IT, Salary: 55000, Age: 29
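Note that the filtered rows keep their original index labels (0, 2, 3) rather than being renumbered. If consecutive labels are preferred, one option is reset_index(drop=True), sketched below on the same data. The variable name renumbered is illustrative and does not appear in the original example.

```python
import pandas as pd

# Same sample data and filter as in the full example
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Department': ['HR', 'Finance', 'IT', 'IT', 'Finance'],
    'Salary': [60000, 45000, 70000, 55000, 80000],
    'Age': [35, 42, 38, 29, 45]
})
filtered_df = df[(df['Salary'] > 50000) & (df['Age'] < 40)]

# Filtering keeps the index labels of the original DataFrame
print(list(filtered_df.index))   # [0, 2, 3]

# drop=True discards the old labels instead of adding them as a column
renumbered = filtered_df.reset_index(drop=True)
print(list(renumbered.index))    # [0, 1, 2]
```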
Explanation:
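- Boolean Indexing: We used boolean indexing to filter the DataFrame on the conditions (df['Salary'] > 50000) and (df['Age'] < 40), combined with the & operator. Each condition is wrapped in parentheses because & binds more tightly than the comparison operators.
- iterrows(): This method returns an iterator yielding each row as an (index, Series) pair. It is convenient when you want to look up values by column label, but because each row is converted to a Series, values may not keep the dtype of their original column.
- itertuples(): This method returns an iterator yielding a namedtuple for each row. It is generally faster than iterrows(). By default the index is the first field of each tuple; passing index=False, as in this example, leaves it out.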
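To illustrate the dtype difference between the two methods, here is a minimal sketch using a small numeric-only DataFrame. The scores frame and its 'Rating' column are hypothetical and are not part of the employee data above; they are there only because the effect shows up when integer and float columns are mixed.

```python
import pandas as pd

# Hypothetical numeric-only frame: 'Rating' is not in the original data
scores = pd.DataFrame({'Salary': [60000, 70000], 'Rating': [4.5, 3.8]})

# iterrows() packs each row into one Series, so the integer salary
# is upcast to float to share a dtype with the rating
_, first_row = next(scores.iterrows())
print(first_row.dtype)
print(type(first_row['Salary']))

# itertuples() takes each value from its own column, so the salary
# stays an integer
first_tuple = next(scores.itertuples(index=False))
print(type(first_tuple.Salary))
```

If the exact numeric type matters in the loop body, itertuples() is usually the safer choice.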
This example demonstrates how to filter rows based on specific conditions and then iterate over the filtered rows in a Pandas DataFrame.