Saturday, 22 February 2025

Load data into a Pandas DataFrame in Python

Using Python to load data is highly beneficial for several reasons:

  1. Ease of Use: Python's syntax is intuitive and straightforward, which makes writing and understanding code easier, especially for loading data.
  2. Comprehensive Libraries: Third-party libraries such as pandas and NumPy, together with standard-library modules such as csv and json, provide tools for loading and manipulating data from CSV files, Excel files, databases, and more.
  3. Flexibility: Python can read many data formats, including CSV, Excel, JSON, and XML, and can query databases directly, making it a versatile choice for data loading tasks.
  4. Performance: pandas and NumPy run their core operations in compiled code, so loading and processing stay efficient even on large datasets, as long as the data fits in memory.
  5. Community Support: Python has a large and active community, meaning there are abundant resources, tutorials, and forums available for troubleshooting and learning best practices.
  6. Integration: Python can easily integrate with other technologies and tools, allowing for seamless data workflows and pipelines.

The sections below take one small sample dataset and show how to load it into a Pandas DataFrame from several different sources.

Sample Data

Let's assume we have the following data about employees:

ID  Name     Age  City         Income
1   Alice    25   New York     50000
2   Bob      30   Los Angeles  60000
3   Charlie  35   Chicago      70000
4   David    40   Houston      80000
5   Eve      45   Phoenix      90000

1. Loading Data from a CSV File

If the data is stored in a CSV file (e.g., employees.csv), you can load it into a Pandas DataFrame as follows:

Step 1: Create the CSV File

Save the following content into a file named employees.csv:

ID,Name,Age,City,Income
1,Alice,25,New York,50000
2,Bob,30,Los Angeles,60000
3,Charlie,35,Chicago,70000
4,David,40,Houston,80000
5,Eve,45,Phoenix,90000

Step 2: Load the CSV File into a DataFrame

import pandas as pd

# Load the CSV file into a DataFrame
df = pd.read_csv('employees.csv')

# Display the DataFrame
print(df)

Output:

   ID     Name  Age         City  Income
0   1    Alice   25     New York   50000
1   2      Bob   30  Los Angeles   60000
2   3  Charlie   35      Chicago   70000
3   4    David   40      Houston   80000
4   5      Eve   45      Phoenix   90000
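
If you would rather not create the file by hand, the two steps can be combined into one script. The sketch below is a minimal round trip: it writes the same content to employees.csv inside a temporary directory (the temporary directory and the CSV_TEXT name are assumptions made for illustration), reads it back with pd.read_csv, and then inspects the inferred column types and the shape.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Same content as the employees.csv file shown in Step 1
CSV_TEXT = """ID,Name,Age,City,Income
1,Alice,25,New York,50000
2,Bob,30,Los Angeles,60000
3,Charlie,35,Chicago,70000
4,David,40,Houston,80000
5,Eve,45,Phoenix,90000
"""

# Write the file to a throwaway directory, then read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "employees.csv"
    csv_path.write_text(CSV_TEXT, encoding="utf-8")
    df = pd.read_csv(csv_path)

# Check what pandas inferred for each column
print(df.dtypes)
print(df.shape)  # (5, 5)
```

Checking dtypes right after loading is a cheap way to confirm that numeric columns such as Age and Income were parsed as numbers rather than text.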

2. Loading Data from a Dictionary

If the data is stored in a Python dictionary, you can create a DataFrame directly from it.

Step 1: Define the Data as a Dictionary

data = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Income': [50000, 60000, 70000, 80000, 90000]
}

Step 2: Create a DataFrame from the Dictionary

import pandas as pd

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)

# Display the DataFrame
print(df)

Output:

   ID     Name  Age         City  Income
0   1    Alice   25     New York   50000
1   2      Bob   30  Los Angeles   60000
2   3  Charlie   35      Chicago   70000
3   4    David   40      Houston   80000
4   5      Eve   45      Phoenix   90000
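
In this form, each dictionary key becomes a column name and each list supplies that column's values, so all the lists need to have the same length. The short sketch below uses a deliberately broken dictionary (bad_data is a made-up name for illustration) to show what happens when one list is too short.

```python
import pandas as pd

# "Name" has two values while "ID" has three
bad_data = {
    "ID": [1, 2, 3],
    "Name": ["Alice", "Bob"],
}

error_message = None
try:
    pd.DataFrame(bad_data)
except ValueError as exc:
    # pandas refuses to build a DataFrame from columns of unequal length
    error_message = str(exc)
    print("Could not build DataFrame:", error_message)
```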

3. Loading Data from a List of Lists

If the data is stored as a list of lists, you can also create a DataFrame from it.

Step 1: Define the Data as a List of Lists

data = [
    [1, 'Alice', 25, 'New York', 50000],
    [2, 'Bob', 30, 'Los Angeles', 60000],
    [3, 'Charlie', 35, 'Chicago', 70000],
    [4, 'David', 40, 'Houston', 80000],
    [5, 'Eve', 45, 'Phoenix', 90000]
]

# Define column names
columns = ['ID', 'Name', 'Age', 'City', 'Income']

Step 2: Create a DataFrame from the List of Lists

import pandas as pd

# Create a DataFrame from the list of lists
df = pd.DataFrame(data, columns=columns)

# Display the DataFrame
print(df)

Output:

   ID     Name  Age         City  Income
0   1    Alice   25     New York   50000
1   2      Bob   30  Los Angeles   60000
2   3  Charlie   35      Chicago   70000
3   4    David   40      Houston   80000
4   5      Eve   45      Phoenix   90000
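
Here each inner list is one row, so the column names have to be supplied separately. As a small illustration (using a shortened, three-column version of the data), the sketch below compares what you get with and without the columns argument.

```python
import pandas as pd

# A shortened version of the employee rows
rows = [
    [1, "Alice", 25],
    [2, "Bob", 30],
]

# Without column names, pandas labels the columns 0, 1, 2, ...
unnamed = pd.DataFrame(rows)
print(list(unnamed.columns))  # [0, 1, 2]

# With column names, the labels are the ones you pass in
named = pd.DataFrame(rows, columns=["ID", "Name", "Age"])
print(list(named.columns))  # ['ID', 'Name', 'Age']
```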

4. Loading Data from an Excel File

If the data is stored in an Excel file (e.g., employees.xlsx), you can load it into a DataFrame as follows:

Step 1: Create the Excel File

Save the sample data into an Excel file named employees.xlsx, on a sheet named Employees, with the column names in the first row.

Step 2: Load the Excel File into a DataFrame

import pandas as pd

# Load the Excel file into a DataFrame
# (by default, reading .xlsx files requires the openpyxl package)
df = pd.read_excel('employees.xlsx', sheet_name='Employees')

# Display the DataFrame
print(df)

Output:

   ID     Name  Age         City  Income
0   1    Alice   25     New York   50000
1   2      Bob   30  Los Angeles   60000
2   3  Charlie   35      Chicago   70000
3   4    David   40      Houston   80000
4   5      Eve   45      Phoenix   90000
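
Step 1 does not say how to produce the workbook. One option, sketched below, is to build the DataFrame in Python and write it out with DataFrame.to_excel before reading it back. This assumes the openpyxl package is installed (the script checks for it and prints a hint otherwise), and it uses a temporary directory purely for illustration.

```python
import importlib.util
import tempfile
from pathlib import Path

import pandas as pd

# The same sample data used throughout this post
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, 45],
    "City": ["New York", "Los Angeles", "Chicago", "Houston", "Phoenix"],
    "Income": [50000, 60000, 70000, 80000, 90000],
})

loaded = None
if importlib.util.find_spec("openpyxl") is None:
    # Assumption: openpyxl is the engine used for .xlsx files
    print("openpyxl is not installed; try: pip install openpyxl")
else:
    with tempfile.TemporaryDirectory() as tmp_dir:
        xlsx_path = Path(tmp_dir) / "employees.xlsx"
        # index=False keeps the row index out of the spreadsheet
        df.to_excel(xlsx_path, sheet_name="Employees", index=False)
        loaded = pd.read_excel(xlsx_path, sheet_name="Employees")
    print(loaded)
```

Passing index=False matters here: without it, the row index is written as an extra first column and comes back as an unnamed column when the file is read.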

5. Loading Data from a JSON File

If the data is stored in a JSON file (e.g., employees.json), you can load it into a DataFrame as follows:

Step 1: Create the JSON File

Save the following content into a file named employees.json:

[
    {"ID": 1, "Name": "Alice", "Age": 25, "City": "New York", "Income": 50000},
    {"ID": 2, "Name": "Bob", "Age": 30, "City": "Los Angeles", "Income": 60000},
    {"ID": 3, "Name": "Charlie", "Age": 35, "City": "Chicago", "Income": 70000},
    {"ID": 4, "Name": "David", "Age": 40, "City": "Houston", "Income": 80000},
    {"ID": 5, "Name": "Eve", "Age": 45, "City": "Phoenix", "Income": 90000}
]

Step 2: Load the JSON File into a DataFrame

import pandas as pd

# Load the JSON file into a DataFrame
df = pd.read_json('employees.json')

# Display the DataFrame
print(df)

Output:

   ID     Name  Age         City  Income
0   1    Alice   25     New York   50000
1   2      Bob   30  Los Angeles   60000
2   3  Charlie   35      Chicago   70000
3   4    David   40      Houston   80000
4   5      Eve   45      Phoenix   90000
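
A JSON array of objects like this one maps naturally onto rows: each object is a record and each key is a column. To make that mapping visible, the sketch below loads a shortened copy of the data in two ways, once with pd.read_json and once by parsing it with the standard json module and handing the resulting list of dictionaries to pd.DataFrame. The JSON_TEXT name and the use of io.StringIO in place of a file on disk are assumptions made to keep the example self-contained.

```python
import io
import json

import pandas as pd

# A shortened copy of employees.json, held in a string
JSON_TEXT = """[
    {"ID": 1, "Name": "Alice", "Age": 25, "City": "New York", "Income": 50000},
    {"ID": 2, "Name": "Bob", "Age": 30, "City": "Los Angeles", "Income": 60000}
]"""

# Option 1: let pandas parse the JSON (StringIO stands in for a file)
df_from_reader = pd.read_json(io.StringIO(JSON_TEXT))

# Option 2: parse with the json module, then build the DataFrame
records = json.loads(JSON_TEXT)  # a list of dictionaries, one per row
df_from_records = pd.DataFrame(records)

print(df_from_reader)
print(df_from_records)
```

The second route is handy when the JSON needs some cleanup in plain Python before it becomes a DataFrame.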

These examples demonstrate how to load data into a Pandas DataFrame from various sources.
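
The introduction also lists databases as a possible source, which the numbered sections do not cover. As a rough sketch only, the example below uses Python's built-in sqlite3 module with an in-memory database and pd.read_sql_query. The table name employees and its schema are assumptions chosen to match the sample data; a real database would need its own connection details.

```python
import sqlite3

import pandas as pd

rows = [
    (1, "Alice", 25, "New York", 50000),
    (2, "Bob", 30, "Los Angeles", 60000),
    (3, "Charlie", 35, "Chicago", 70000),
    (4, "David", 40, "Houston", 80000),
    (5, "Eve", 45, "Phoenix", 90000),
]

# Create a throwaway in-memory database and fill it with the sample data
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees "
    "(ID INTEGER, Name TEXT, Age INTEGER, City TEXT, Income INTEGER)"
)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# Run a query and load the result set into a DataFrame
df = pd.read_sql_query("SELECT * FROM employees ORDER BY ID", conn)
conn.close()

print(df)
```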


