1.2.4 Data Analysis and Manipulation in Pandas

Information

On this page, you will find the content of this section in both video and text formats. The videos are interactive and contain embedded content (explanations, links, or exercises) throughout their playback.

At the end of this page, there is a link to the Jupyter/Colab notebook where you can practice the theory from this section.

Data Analysis and Manipulation in Pandas

Welcome back to the Pandas module of the course. This section covers data analysis and manipulation techniques: grouping and aggregating data, merging and joining DataFrames, and performing basic statistical operations. These techniques help you summarize your data and prepare it for further analysis. Let's get started!

Grouping and Aggregating Data

Grouping and aggregating let you summarize a large dataset by category. The groupby method splits the rows into groups that share a value in one or more columns. You can then apply an aggregation function, such as mean or sum, to each group independently.

# Grouping data by 'City' and calculating the mean age
# (assumes `dataframe` is an existing DataFrame with 'City' and 'Age' columns)
grouped = dataframe.groupby('City')['Age'].mean()
print(grouped)

In this example, we group the DataFrame by the 'City' column and then calculate the mean age for each city.
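To run the grouping above end to end, here is a minimal sketch. The sample rows (names, cities, and ages) are invented for illustration and do not come from the course data.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only
dataframe = pd.DataFrame({
    'Name': ['Ana', 'Luis', 'Marta', 'Pedro'],
    'City': ['Madrid', 'Vigo', 'Madrid', 'Vigo'],
    'Age': [30, 40, 20, 50],
})

# Split the rows by 'City', select 'Age', then reduce each group to its mean
grouped = dataframe.groupby('City')['Age'].mean()
print(grouped)
# City
# Madrid    25.0
# Vigo      45.0
# Name: Age, dtype: float64
```

The result is a Series indexed by city, so a single value can be read with grouped['Madrid'].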

You can also apply multiple aggregation functions using the agg method:

# Applying multiple aggregation functions
aggregated = dataframe.groupby('City').agg({'Age': ['mean', 'max'], 'Name': 'count'})
print(aggregated)

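This code groups the data by 'City' and then calculates the mean and maximum age for each city, along with the number of non-null 'Name' values. Because 'Age' receives two functions, the result has two-level column labels such as ('Age', 'mean').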
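If you prefer flat, self-describing column names, named aggregation is one option. The sketch below assumes a small invented DataFrame, and the output names mean_age, max_age, and name_count are our own choices, not part of the course data.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only
dataframe = pd.DataFrame({
    'Name': ['Ana', 'Luis', 'Marta', None],
    'City': ['Madrid', 'Vigo', 'Madrid', 'Vigo'],
    'Age': [30, 40, 20, 50],
})

# Named aggregation: output_column=(input_column, function)
summary = dataframe.groupby('City').agg(
    mean_age=('Age', 'mean'),
    max_age=('Age', 'max'),
    name_count=('Name', 'count'),  # counts non-null names only
)
print(summary)
```

Here the missing name in Vigo is not counted, so name_count is 1 for Vigo and 2 for Madrid.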

Merging and Joining DataFrames

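Merging and joining DataFrames are crucial operations when working with multiple datasets. Pandas provides several ways to combine DataFrames: merge matches rows on one or more key columns (similar to a SQL join), join matches rows on the index by default, and concat stacks DataFrames along an axis without matching keys.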

Here’s how to merge two DataFrames:

import pandas as pd

# Creating two DataFrames to merge
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [4, 5, 6]})

# Merging DataFrames on the 'key' column
merged_df = pd.merge(df1, df2, on='key', how='inner')
print(merged_df)

In this example, we merge df1 and df2 on the 'key' column using an inner join, which includes only the rows with matching keys in both DataFrames. Only 'B' and 'C' appear in both, so the result has two rows.
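The other combining options can be sketched with the same two DataFrames. This is an illustrative comparison rather than a complete tour. The variable names left_df, outer_df, joined_df, and stacked_df are our own.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [4, 5, 6]})

# Left join: keep every row of df1; value2 is NaN where df2 has no match
left_df = pd.merge(df1, df2, on='key', how='left')

# Outer join: keep keys from both sides; indicator=True adds a '_merge'
# column that records which side each row came from
outer_df = pd.merge(df1, df2, on='key', how='outer', indicator=True)

# join aligns on the index, so move 'key' into the index first
joined_df = df1.set_index('key').join(df2.set_index('key'), how='inner')

# concat stacks the DataFrames; it does not match rows by key
stacked_df = pd.concat([df1, df2], ignore_index=True)

print(left_df)
print(outer_df)
print(joined_df)
print(stacked_df)
```

As a rule of thumb, reach for merge when the key lives in a column, join when it lives in the index, and concat when you simply want to append rows or columns.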

Performing Basic Statistical Operations

Pandas makes it easy to perform basic statistical operations on your data. Here are some common operations:

# Calculating summary statistics
print(dataframe['Age'].sum())  # Sum of ages
print(dataframe['Age'].mean())  # Mean age
print(dataframe['Age'].describe())  # Summary statistics

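The sum and mean methods each return a single value, while describe returns the count, mean, standard deviation, minimum, quartiles, and maximum in one call. Together they give a quick view of the distribution and central tendency of your data.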
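To see these operations on concrete numbers, here is a small self-contained sketch. The ages are invented for illustration, and median and std are shown as two further methods you may find useful.

```python
import pandas as pd

# Hypothetical ages, invented for illustration only
ages = pd.Series([20, 30, 40, 50], name='Age')

print(ages.sum())     # 140
print(ages.mean())    # 35.0
print(ages.median())  # 35.0
print(ages.std())     # sample standard deviation (ddof=1 by default)

# describe() bundles count, mean, std, min, quartiles and max
stats = ages.describe()
print(stats)
```

Individual entries of the summary can be read by label, for example stats['50%'] for the median.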

Practical Example

Let’s apply what we've learned so far in a practical example. Suppose we have a DataFrame with sales data, and we want to analyze the total and average sales per product category:

import pandas as pd

# Sample sales data
sales_data = {
    'Product': ['A', 'B', 'A', 'B', 'C'],
    'Category': ['Electronics', 'Furniture', 'Electronics', 'Furniture', 'Electronics'],
    'Sales': [200, 150, 300, 100, 250]
}
sales_df = pd.DataFrame(sales_data)

# Grouping and aggregating sales by category
category_sales = sales_df.groupby('Category').agg({'Sales': ['sum', 'mean']})
print(category_sales)

In this example, we group the sales data by 'Category' and calculate the total and average sales for each category.
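Because 'Sales' receives two functions, the result has two-level column labels. The sketch below reuses the same sales data and shows one way to read individual values and, optionally, flatten the labels. The names total_electronics, mean_furniture, and flat are our own.

```python
import pandas as pd

sales_df = pd.DataFrame({
    'Product': ['A', 'B', 'A', 'B', 'C'],
    'Category': ['Electronics', 'Furniture', 'Electronics', 'Furniture', 'Electronics'],
    'Sales': [200, 150, 300, 100, 250],
})

category_sales = sales_df.groupby('Category').agg({'Sales': ['sum', 'mean']})

# Columns are a two-level MultiIndex: ('Sales', 'sum') and ('Sales', 'mean')
total_electronics = category_sales.loc['Electronics', ('Sales', 'sum')]  # 750
mean_furniture = category_sales.loc['Furniture', ('Sales', 'mean')]      # 125.0

# Optional: flatten the column labels to 'Sales_sum' and 'Sales_mean'
flat = category_sales.copy()
flat.columns = ['_'.join(col) for col in flat.columns]
print(flat)
```

Flat labels are often easier to work with when you export the result or merge it with another DataFrame.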

Summary

In this section, we explored key data analysis and manipulation techniques in Pandas, including grouping and aggregating data, merging and joining DataFrames, and performing basic statistical operations. These skills are essential for any data analysis workflow and will help you derive meaningful insights from your data.

Next, we will cover data visualization with Pandas, which will allow you to create graphical representations of your data to uncover trends and patterns. Stay tuned!