Welcome back to our Pandas course module. In this section, we will delve into data analysis and manipulation techniques. These techniques are essential for deriving insights from your data and preparing it for further analysis. We will cover grouping and aggregating data, merging and joining DataFrames, and performing basic statistical operations. Let's get started!
Grouping and Aggregating Data
Grouping and aggregating data lets you summarize large datasets according to specific criteria. The groupby method splits the data into groups (for example, by the values in one or more columns), and you can then apply aggregation functions to each group independently.
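import pandas as pd
# Hypothetical sample data with the Name, Age and City columns used in this section
dataframe = pd.DataFrame({'Name': ['Ana', 'Ben', 'Cleo', 'Dev', 'Eli'], 'Age': [28, 35, 42, 23, 31], 'City': ['Paris', 'Rome', 'Paris', 'Rome', 'Oslo']})
# Grouping data by 'City' and calculating the mean age
grouped = dataframe.groupby('City')['Age'].mean()
print(grouped)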
In this example, we group the DataFrame by the 'City' column and then calculate the mean age for each city.
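To make the split-apply-combine idea behind groupby concrete, the sketch below computes the same per-city mean twice: once with groupby and once by filtering each group by hand. The sample names, ages and cities are hypothetical illustration data, not from the course.

```python
import pandas as pd

# Hypothetical sample data with the columns used in this section
dataframe = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cleo', 'Dev', 'Eli'],
    'Age': [28, 35, 42, 23, 31],
    'City': ['Paris', 'Rome', 'Paris', 'Rome', 'Oslo'],
})

# groupby: split by City, apply mean to each group's Age, combine into a Series
grouped = dataframe.groupby('City')['Age'].mean()

# The same result computed by hand, one group at a time
manual = {}
for city in dataframe['City'].unique():
    subset = dataframe[dataframe['City'] == city]  # split
    manual[city] = subset['Age'].mean()             # apply
# combine; groupby sorts the group keys by default, so sort to match
manual = pd.Series(manual).sort_index()

print(grouped)
print(manual)
```

Both approaches give the same values; groupby simply performs the split, apply and combine steps in one call.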
You can also apply multiple aggregation functions using the agg method:
# Applying multiple aggregation functions
aggregated = dataframe.groupby('City').agg({'Age': ['mean', 'max'], 'Name': 'count'})
print(aggregated)
This code groups the data by 'City' and then calculates the mean and maximum age, as well as the number of non-missing names, in each city.
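Passing a dictionary to agg produces columns with a two-level MultiIndex, such as ('Age', 'mean'). As an alternative, pandas also supports named aggregation, where each keyword argument is a (column, function) tuple and the keyword becomes a flat output column name. The sketch below compares the two; the sample data and the output names mean_age, max_age and people are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical sample data
dataframe = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cleo', 'Dev', 'Eli'],
    'Age': [28, 35, 42, 23, 31],
    'City': ['Paris', 'Rome', 'Paris', 'Rome', 'Oslo'],
})

# Dictionary form: columns become a MultiIndex like ('Age', 'mean')
aggregated = dataframe.groupby('City').agg({'Age': ['mean', 'max'], 'Name': 'count'})
print(aggregated.columns.tolist())

# Named aggregation: each keyword becomes a flat output column name
named = dataframe.groupby('City').agg(
    mean_age=('Age', 'mean'),
    max_age=('Age', 'max'),
    people=('Name', 'count'),
)
print(named)
```

Flat column names are often easier to work with afterwards, for example when selecting columns or exporting results.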
Merging and Joining DataFrames
Merging and joining DataFrames are crucial operations when working with multiple datasets. Pandas provides several methods for combining DataFrames, such as merge, join, and concat.
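Of the three methods listed, join and concat are not otherwise demonstrated in this section. As a minimal sketch with made-up frames: join aligns rows on the index (by default keeping every row of the calling frame), while concat stacks frames along an axis. The frame names left, right and more_rows are hypothetical.

```python
import pandas as pd

# Two hypothetical frames that share some index labels
left = pd.DataFrame({'value1': [1, 2, 3]}, index=['A', 'B', 'C'])
right = pd.DataFrame({'value2': [4, 5, 6]}, index=['B', 'C', 'D'])

# join aligns on the index; the default how='left' keeps every row of `left`,
# so label 'A' gets NaN in value2 and label 'D' is dropped
joined = left.join(right)
print(joined)

# concat stacks frames; axis=0 (the default) appends rows
more_rows = pd.DataFrame({'value1': [7]}, index=['E'])
stacked = pd.concat([left, more_rows])
print(stacked)
```
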
Here’s how to merge two DataFrames:
import pandas as pd
# Creating two DataFrames to merge
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [4, 5, 6]})
# Merging DataFrames on the 'key' column
merged_df = pd.merge(df1, df2, on='key', how='inner')
print(merged_df)
In this example, we merge df1 and df2 on the 'key' column using an inner join, which keeps only the rows whose keys appear in both DataFrames, so the result contains the rows for 'B' and 'C'.
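The how parameter controls which keys survive the merge. The sketch below reuses the df1 and df2 from the example above, compares the four common join types, and uses indicator=True, which adds a _merge column recording whether each row came from the left frame, the right frame, or both.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [4, 5, 6]})

# Record which keys survive under each join type
keys_by_how = {}
for how in ['inner', 'left', 'right', 'outer']:
    result = pd.merge(df1, df2, on='key', how=how)
    keys_by_how[how] = result['key'].tolist()
    print(how, keys_by_how[how])

# indicator=True adds a '_merge' column showing where each row came from
outer = pd.merge(df1, df2, on='key', how='outer', indicator=True)
print(outer)
```

In the left, right and outer results, rows without a match get NaN in the columns that come from the other frame.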
Performing Basic Statistical Operations
Pandas makes it easy to perform basic statistical operations on your data. Here are some common operations:
# Calculating summary statistics
print(dataframe['Age'].sum()) # Sum of ages
print(dataframe['Age'].mean()) # Mean age
print(dataframe['Age'].describe()) # Summary statistics: count, mean, std, min, quartiles, max
These operations provide quick insights into the distribution and central tendency of your data.
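For a numeric column, describe returns a Series of summary statistics, and its '50%' entry is the median. A small sketch on a hypothetical Series of ages shows which statistics it includes:

```python
import pandas as pd

# Hypothetical ages for illustration
ages = pd.Series([28, 35, 42, 23, 31], name='Age')

summary = ages.describe()
print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# Individual statistics are also available as methods
print(ages.sum(), ages.mean(), ages.median())
```
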
Practical Example
Let’s apply what we've learned so far in a practical example. Suppose we have a DataFrame with sales data, and we want to analyze the total and average sales per product category:
import pandas as pd
# Sample sales data
sales_data = {
    'Product': ['A', 'B', 'A', 'B', 'C'],
    'Category': ['Electronics', 'Furniture', 'Electronics', 'Furniture', 'Electronics'],
    'Sales': [200, 150, 300, 100, 250]
}
sales_df = pd.DataFrame(sales_data)
# Grouping and aggregating sales by category
category_sales = sales_df.groupby('Category').agg({'Sales': ['sum', 'mean']})
print(category_sales)
In this example, we group the sales data by 'Category' and calculate the total and average sales for each category: Electronics totals 750 with an average of 250, and Furniture totals 250 with an average of 125.
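The same sales data can be broken down further. The sketch below groups by two keys at once, which yields a Series indexed by (Category, Product), and then uses as_index=False with sort_values to produce a flat table of categories ranked by total sales. Ranking by total is an illustrative choice, not part of the original example.

```python
import pandas as pd

sales_df = pd.DataFrame({
    'Product': ['A', 'B', 'A', 'B', 'C'],
    'Category': ['Electronics', 'Furniture', 'Electronics', 'Furniture', 'Electronics'],
    'Sales': [200, 150, 300, 100, 250],
})

# Group by two keys: the result is a Series with a (Category, Product) MultiIndex
by_product = sales_df.groupby(['Category', 'Product'])['Sales'].sum()
print(by_product)

# as_index=False keeps Category as an ordinary column; then rank by total sales
category_totals = (
    sales_df.groupby('Category', as_index=False)['Sales']
    .sum()
    .sort_values('Sales', ascending=False)
)
print(category_totals)
```
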
Summary
In this section, we explored key data analysis and manipulation techniques in Pandas, including grouping and aggregating data, merging and joining DataFrames, and performing basic statistical operations. These skills are essential for any data analysis workflow and will help you derive meaningful insights from your data.
Next, we will cover data visualization with Pandas, which will allow you to create graphical representations of your data to uncover trends and patterns. Stay tuned!