
1.2.1 Introduction to Pandas

Information

On this page, you will find the content of the section in both video and text formats. Videos are interactive and contain embedded content (explanations, links or exercises) throughout their playback.

At the end of this page, you will find a link to the Jupyter/Colab notebook where you can practice the theory from this section.

Video

Introduction to Pandas

What is Pandas?

Pandas is an open-source data analysis and manipulation library for the Python programming language. It provides the data structures and functions needed to work with structured (tabular) data. The name "Pandas" is derived from the term "panel data," an econometrics term for multidimensional data sets that include observations over multiple time periods for the same individuals.

Why Use Pandas?

Pandas is highly favored for its ability to:

  • Handle large tabular datasets efficiently, as long as they fit in memory.
  • Perform data cleaning and preprocessing.
  • Provide powerful data aggregation and transformation tools.
  • Integrate seamlessly with other Python libraries, such as NumPy, Matplotlib, and Scikit-Learn.

Installing Pandas

To get started with Pandas, you need to have it installed on your system. You can install it using pip, Python's package installer. In a Jupyter/Colab notebook cell, run the following command (when working from a terminal, omit the leading "!"):

!pip install pandas

Once installed, you can import it into your Python environment:

import pandas as pd
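
To confirm that the installation and import worked, one quick check is to print the installed version. This is a minimal sketch; the variable name installed_version is only illustrative, and the version string you see depends on your environment.

```python
import pandas as pd

# pd.__version__ holds the installed Pandas version as a string
installed_version = pd.__version__
print(installed_version)
```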

Exploring Pandas Data Structures

Series

A Series is a one-dimensional labeled array that can hold any data type, such as integers, strings, floats, and even Python objects. Think of a Series as a single column in an Excel spreadsheet. Each element in a Series has a label, and together these labels form the index.

# Creating a Series
serie = pd.Series([1, 2, 3, 4, 5])
print(serie)

In this example, we create a Series from a list of numbers. Pandas automatically generates an integer index starting from 0. You can also specify custom indices:

# Creating a Series with custom indices
serie_custom_index = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(serie_custom_index)
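
Once a Series has custom labels, its elements can be retrieved either by label or by position. The following is a small sketch of that idea using the .loc and .iloc accessors; the names by_label, by_position and label_slice are illustrative and do not come from the course material.

```python
import pandas as pd

serie_custom_index = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# .loc selects by label, .iloc selects by integer position
by_label = serie_custom_index.loc['c']
by_position = serie_custom_index.iloc[2]
print(by_label, by_position)  # both refer to the same element

# Label-based slices include the end label ('d' is part of the result)
label_slice = serie_custom_index.loc['b':'d']
print(label_slice)
```

Note that, unlike positional slicing in plain Python, a label-based slice with .loc includes both endpoints.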

DataFrame

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It is similar to a table in a database or an Excel spreadsheet. A DataFrame can be created in several ways, for example from dictionaries, lists, or other data structures such as NumPy arrays.

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Ana', 'Brais', 'Carlos', 'Diana'],
    'Age': [23, 24, 22, 25],
    'City': ['Santiago', 'Vigo', 'Ourense', 'Lugo']
}
dataframe = pd.DataFrame(data)
print(dataframe)

In this example, we create a DataFrame with three columns: 'Name', 'Age', and 'City'. Each column is a Series, and the DataFrame is essentially a collection of Series that share the same index.
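
The relationship between a DataFrame and its columns can be checked directly. The sketch below rebuilds the same DataFrame so that it runs on its own, then selects one column and compares its index with the index of the DataFrame; the variable name ages is illustrative.

```python
import pandas as pd

data = {
    'Name': ['Ana', 'Brais', 'Carlos', 'Diana'],
    'Age': [23, 24, 22, 25],
    'City': ['Santiago', 'Vigo', 'Ourense', 'Lugo']
}
dataframe = pd.DataFrame(data)

# Selecting a single column returns a Series
ages = dataframe['Age']
print(type(ages))

# The column's index is the same as the DataFrame's index
print(ages.index.equals(dataframe.index))

# Each column keeps its own data type
print(dataframe.dtypes)
```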

Course Outline

Throughout this course, we will cover the following key areas:

  1. Basic Operations with Pandas:
    • Reading and writing data to files (CSV, Excel).
    • Selecting and indexing data.
    • Filtering and modifying data.
  2. Data Analysis and Manipulation:
    • Grouping and aggregating data.
    • Merging and joining DataFrames.
    • Performing basic statistical operations.
  3. Data Visualization:
    • Creating basic plots using Pandas.
    • Visualizing data trends and distributions.
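
As a brief preview of the first two areas, the sketch below reads a small table, filters it, and aggregates it. The CSV content is hypothetical sample data made up for this illustration, and it is held in memory (via io.StringIO) so that no file on disk is needed; later sections cover these operations properly.

```python
import io
import pandas as pd

# Hypothetical CSV content, held in memory so the example needs no file on disk
csv_text = """Name,Age,City
Ana,23,Santiago
Brais,24,Vigo
Carlos,22,Vigo
Diana,25,Santiago
"""

# Reading data: pd.read_csv accepts a path or a file-like object
people = pd.read_csv(io.StringIO(csv_text))

# Filtering: keep only the rows where Age is greater than 22
older = people[people['Age'] > 22]

# Grouping and aggregating: mean age per city
mean_age_by_city = people.groupby('City')['Age'].mean()

print(older)
print(mean_age_by_city)
```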

By the end of this module, you will have a solid understanding of how to use Pandas to manage and analyze data effectively.

Let's dive in and start exploring the capabilities of Pandas!

Practice

Below is a link to the Jupyter/Colab notebook where you can practice the theory from this section:

Introduction to Pandas
