Welcome back to our Pandas course module. In this section, we will explore the core data structures in Pandas: Series and DataFrames. Understanding these structures is crucial as they form the foundation of data manipulation and analysis in Pandas. Let's dive in!
Series
A Series is a one-dimensional labeled array that can hold any data type, such as integers, strings, floats, and even arbitrary Python objects. Think of a Series as a single column in an Excel spreadsheet. Each element in a Series has a label, and the collection of these labels is called the index.
import pandas as pd
# Creating a Series
serie = pd.Series([1, 2, 3, 4, 5])
print(serie)
In this example, we create a Series from a list of numbers. Pandas automatically generates an integer index starting from 0. You can also specify custom index labels:
# Creating a Series with custom indices
serie_custom_index = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(serie_custom_index)
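Every Series pairs its values with an index and a single data type (dtype), which pandas infers from the data. The minimal sketch below inspects these three parts; the variable names are illustrative only, and the exact integer dtype can vary by platform, so treat the commented outputs as typical rather than guaranteed.

```python
import pandas as pd

# Integers: pandas infers an integer dtype (int64 on most platforms)
ints = pd.Series([1, 2, 3])
# Mixing integers and floats upcasts the whole Series to float64
floats = pd.Series([1, 2.5, 3])
# Strings use the generic object dtype in pandas 2.x; newer versions may use a string dtype
names = pd.Series(['Ana', 'Brais'])
# When built from a dictionary, the keys become the index labels
ages = pd.Series({'Ana': 23, 'Brais': 24})

print(ints.dtype)
print(floats.dtype)       # float64
print(names.dtype)
print(list(ages.index))   # ['Ana', 'Brais']
print(ages.to_numpy())    # [23 24]
```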
DataFrames
A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It is similar to a table in a database or an Excel spreadsheet. DataFrames are very versatile and can be created in several ways, for example from a dictionary of lists, a list of lists, a list of dictionaries, or a NumPy array.
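Besides dictionaries, lists are a common starting point. Here is a minimal sketch of two list-based shapes, a list of rows and a list of dictionaries; `rows`, `records`, and the `df_from_*` names are hypothetical, chosen just for this example.

```python
import pandas as pd

# From a list of rows (list of lists): column names are passed separately
rows = [['Ana', 23], ['Brais', 24]]
df_from_rows = pd.DataFrame(rows, columns=['Name', 'Age'])

# From a list of dictionaries: each dict is one row, keys become column names
records = [{'Name': 'Ana', 'Age': 23}, {'Name': 'Brais', 'Age': 24}]
df_from_records = pd.DataFrame(records)

# Both approaches produce the same table
print(df_from_rows.equals(df_from_records))  # True
```

A list of rows suits data that arrives row by row (for example, from a CSV reader), while a list of dictionaries is convenient when each record already carries its own field names.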
# Creating a DataFrame from a dictionary
data = {
    'Name': ['Ana', 'Brais', 'Carlos', 'Diana'],
    'Age': [23, 24, 22, 25],
    'City': ['Santiago', 'Vigo', 'Ourense', 'Lugo']
}
dataframe = pd.DataFrame(data)
print(dataframe)
In this example, we create a DataFrame with three columns: 'Name', 'Age', and 'City'. Each column is a Series, and the DataFrame is essentially a collection of Series that share the same index.
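You can check the "collection of Series" view directly. This minimal sketch reuses a trimmed version of the example data and confirms that a selected column is a Series sharing the DataFrame's index, and that a single row comes back as a Series labeled by column names.

```python
import pandas as pd

dataframe = pd.DataFrame({
    'Name': ['Ana', 'Brais', 'Carlos', 'Diana'],
    'Age': [23, 24, 22, 25],
})

# Selecting one column gives back a Series
ages = dataframe['Age']
print(isinstance(ages, pd.Series))         # True

# That Series carries the DataFrame's row index
print(ages.index.equals(dataframe.index))  # True

# A single row selected with .loc is also a Series, labeled by column names
first_row = dataframe.loc[0]
print(list(first_row.index))               # ['Name', 'Age']
```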
Accessing Data in Series and DataFrames
You can access elements in a Series by label or by position:
# Accessing elements in a Series
print(serie[0])       # Label 0; with the default index, labels match positions
print(serie.iloc[0])  # Explicitly by position
print(serie_custom_index['a']) # Access by custom index
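The difference between labels and positions matters as soon as an integer index does not start at 0 or is not in order. In this minimal sketch (the Series `s` is illustrative), `.loc` looks up labels, `.iloc` looks up positions, and plain `[]` on an integer index follows the labels.

```python
import pandas as pd

# An integer index whose labels do NOT match positions
s = pd.Series([10, 20, 30], index=[2, 1, 0])

print(s.loc[0])   # 30 -> the element whose label is 0
print(s.iloc[0])  # 10 -> the first element by position
print(s[0])       # 30 -> with an integer index, [] looks up labels
```

Using `.loc` or `.iloc` explicitly makes the intent obvious to anyone reading the code.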
Similarly, you can access data in a DataFrame using column names and row labels or positions:
# Accessing columns in a DataFrame
print(dataframe['Name'])
# Accessing rows in a DataFrame using .loc and .iloc
print(dataframe.loc[0])  # Access by row label (here, the label 0)
print(dataframe.iloc[0]) # Access by position
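With custom row labels, `.loc` and `.iloc` clearly diverge. The minimal sketch below (with an illustrative DataFrame `df`) shows single-cell access, selecting columns with one name versus a list of names, and the fact that `.loc` slices include the end label while `.iloc` slices exclude the end position, as with Python lists.

```python
import pandas as pd

# A small DataFrame with custom (string) row labels
df = pd.DataFrame(
    {'Name': ['Ana', 'Brais', 'Carlos'], 'Age': [23, 24, 22]},
    index=['a', 'b', 'c'],
)

# Single cell: row label + column name vs. row position + column position
print(df.loc['b', 'Age'])  # 24
print(df.iloc[1, 1])       # 24

# One column name returns a Series; a list of names returns a DataFrame
print(isinstance(df['Name'], pd.Series))       # True
print(isinstance(df[['Name']], pd.DataFrame))  # True

# Slicing: .loc includes the end label, .iloc excludes the end position
print(len(df.loc['a':'b']))  # 2
print(len(df.iloc[0:1]))     # 1
```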
Basic Operations on DataFrames
DataFrames support a wide range of operations, such as adding new columns, deleting columns, and performing arithmetic operations. Here are some examples:
# Adding a new column (the single value 'Spain' is repeated for every row)
dataframe['Country'] = 'Spain'
print(dataframe)
# Deleting a column (drop returns a new DataFrame, so we reassign the result)
dataframe = dataframe.drop('Country', axis=1)
print(dataframe)
# Performing arithmetic operations
dataframe['Age'] = dataframe['Age'] + 1
print(dataframe)
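Two details behind these operations are worth seeing explicitly. In this minimal sketch (all variable names are illustrative), `drop` leaves the original untouched unless you reassign it, arithmetic on a column is applied to every row at once, and arithmetic between two Series matches elements by index label, filling unmatched labels with NaN.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ana', 'Brais'], 'Age': [23, 24]})

# drop() returns a new DataFrame; df keeps its columns unless reassigned
without_age = df.drop(columns='Age')  # same as df.drop('Age', axis=1)
print(list(df.columns))           # ['Name', 'Age']
print(list(without_age.columns))  # ['Name']

# Arithmetic is vectorized: it applies to every row at once
df['AgeInMonths'] = df['Age'] * 12
print(df['AgeInMonths'].tolist())  # [276, 288]

# Series arithmetic aligns on index labels; unmatched labels become NaN
a = pd.Series([1, 2], index=['x', 'y'])
b = pd.Series([10, 20], index=['y', 'z'])
total = a + b
print(total)  # x: NaN, y: 12.0, z: NaN
```

Label alignment is why adding columns from the same DataFrame always lines up row by row: they share one index.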
Summary
In this section, we introduced the two primary data structures in Pandas: Series and DataFrames. Series are one-dimensional labeled arrays, while DataFrames are two-dimensional tables with labeled rows and columns. Understanding these structures and how to manipulate them is key to using Pandas effectively for data analysis.
Next, we will cover basic operations in Pandas, including reading and writing data, selecting and indexing data, filtering, and modifying data. These operations will build on the foundational knowledge you’ve gained in this section. Stay tuned!