3.2 Convolutional Networks

Introduction to CNNs

Welcome to the third section of our Deep Learning module, where we will explore Convolutional Neural Networks, or CNNs. CNNs are a specialized kind of neural network designed for processing data that has a grid-like topology, such as images. Unlike fully connected networks, which connect every input value to every unit regardless of its position, CNNs use the convolution operation to exploit the spatial structure of the data: the same small set of weights is reused across all positions of the input. This makes them highly effective for image and video recognition tasks.

Convolutional Layers

The core building block of a CNN is the convolutional layer. A convolutional layer applies a set of filters (also known as kernels) to its input, which allows the network to detect features such as edges, textures, and patterns. Each filter slides over the input and, at every position, multiplies its weights element-wise with the patch underneath and sums the results. The sums collected over all positions form a feature map, and the layer produces one feature map per filter.
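The sliding-window operation described above can be sketched in plain NumPy. This is only an illustration under simplifying assumptions (a single-channel image, one filter, no padding, stride 1); the function name `conv2d_valid`, the toy image, and the hand-written vertical-edge filter are made up for this example and are not part of Keras. Note that, as in most deep learning libraries, the filter is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise multiplication of the patch with the filter, then a sum
            patch = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# Toy 4x5 image: dark on the left (0), bright on the right (1)
image = np.array([
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
])

# Hand-written filter that responds to dark-to-bright vertical edges
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
])

feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

The feature map is zero where the patch is uniformly dark and positive where the patch covers the edge. In a trained CNN the filter weights are learned from data rather than written by hand.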

Example:

To illustrate, let's implement a simple convolutional layer using TensorFlow:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

# Define a simple model with one convolutional layer
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1))  # 32 filters, 3x3 kernel size
])

# Summary of the model
model.summary()

This code snippet defines a simple CNN with one convolutional layer using the Keras API in TensorFlow.
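To make the output of `model.summary()` easier to read, the following sketch computes the output size and parameter count of the layer by hand. It assumes the Keras defaults for `Conv2D` used above (no padding, stride 1, one bias per filter); the helper names `conv2d_output_size` and `conv2d_param_count` are hypothetical and exist only for this illustration.

```python
def conv2d_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial size of a convolution output along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

def conv2d_param_count(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter plus one bias, multiplied by the number of filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

out_side = conv2d_output_size(28, 3)        # 28x28 input, 3x3 kernel
params = conv2d_param_count(3, 3, 1, 32)    # 1 input channel, 32 filters

print(f"Output shape: ({out_side}, {out_side}, 32), parameters: {params}")
```

For the layer defined above, this gives a 26x26 feature map for each of the 32 filters and 320 trainable parameters, which should match what the summary reports.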

Pooling Layers

After the convolutional layer, it's common to apply a pooling layer to reduce the spatial dimensions of the feature maps. Pooling down-samples its input, which reduces the computational load of the following layers and makes the detected features more robust to small shifts in the position of a pattern. The most common type is max pooling, which keeps only the maximum value from each patch of the feature map.
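As a rough sketch of what max pooling computes, the NumPy function below splits a single feature map into non-overlapping windows and keeps the largest value in each. It assumes a square window whose stride equals its size, with incomplete windows at the border dropped; the name `max_pool2d` and the sample values are made up for this illustration.

```python
import numpy as np

def max_pool2d(feature_map, pool_size=2):
    """Keep the maximum of each non-overlapping pool_size x pool_size window."""
    fm = np.asarray(feature_map, dtype=float)
    out_h = fm.shape[0] // pool_size
    out_w = fm.shape[1] // pool_size
    # Drop trailing rows/columns that do not fill a complete window
    trimmed = fm[:out_h * pool_size, :out_w * pool_size]
    # Reshape so that each window gets its own pair of axes, then reduce over them
    windows = trimmed.reshape(out_h, pool_size, out_w, pool_size)
    return windows.max(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
])

pooled = max_pool2d(feature_map)
print(pooled)
```

A 4x4 feature map becomes 2x2: each spatial dimension is halved, so the following layers process a quarter of the values.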

Example:

Let's add a max pooling layer to our CNN:

from tensorflow.keras.layers import MaxPooling2D

# Add a max pooling layer to the model
model.add(MaxPooling2D(pool_size=(2, 2)))

# Summary of the model
model.summary()

This code snippet adds a max pooling layer to the CNN.

Applications of CNNs

Computer Vision

One of the most significant applications of CNNs is in the field of computer vision. CNNs are widely used for tasks such as image classification, object detection, and image segmentation. For example, in image classification, a CNN can be trained to recognize different classes of objects, such as cats and dogs, from a dataset of labeled images.

Example:

Let's build a simple CNN for image classification using the MNIST dataset of handwritten digits:

from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Flatten, Dense

# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Define the model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

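# Train the model (the test set doubles as validation data here for simplicity,
# so the final test accuracy is not a fully independent estimate)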
history = model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test), verbose=2)

# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f'Test accuracy: {accuracy}')

This code snippet defines and trains a simple CNN for image classification on the MNIST dataset.
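Two steps in the example above are easy to gloss over: the labels are one-hot encoded with `to_categorical`, and the `softmax` output is compared against them with the `categorical_crossentropy` loss. The NumPy sketch below mimics both steps on a toy three-class problem. The functions `one_hot` and `categorical_crossentropy` are simplified stand-ins written for this illustration, not the Keras implementations, and the predicted probabilities are invented.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows (similar in spirit to to_categorical)."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Two samples whose true classes are 2 and 0
y_true = one_hot([2, 0], num_classes=3)

# Made-up softmax outputs: each row sums to 1
y_pred = np.array([
    [0.1, 0.2, 0.7],
    [0.5, 0.3, 0.2],
])

loss = categorical_crossentropy(y_true, y_pred)
predicted = np.argmax(y_pred, axis=1)  # most probable class per sample
accuracy = float(np.mean(predicted == np.argmax(y_true, axis=1)))

print(f"loss={loss:.4f}, predicted={predicted.tolist()}, accuracy={accuracy}")
```

The loss shrinks as the probability assigned to the correct class approaches 1, which is the quantity the optimizer minimizes during training. Accuracy only checks whether the most probable class is the correct one.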

Image Recognition

Image recognition is another area where CNNs excel. Beyond simple classification, CNNs can be used for more complex tasks such as identifying and localizing multiple objects within an image. Advanced CNN architectures like ResNet, Inception, and YOLO are used in applications ranging from autonomous driving to medical image analysis.

Example:

Let's visualize the performance of our CNN on the MNIST dataset:

import matplotlib.pyplot as plt

# Plot the training and validation accuracy
plt.plot(history.history['accuracy'], label='train_accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)
plt.show()

# Plot the training and validation loss
plt.plot(history.history['loss'], label='train_loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.title('Model Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.show()

This code snippet plots the training and validation accuracy and loss of the CNN model.
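When reading these curves, a common warning sign is validation loss that stops improving while training loss keeps falling, which suggests the model is starting to overfit. As a small sketch, the helper below finds the epoch with the lowest validation loss in a dictionary shaped like `history.history`. The function `best_epoch` is hypothetical, and the numbers in `example_history` are illustrative only; they are not results from the model above.

```python
def best_epoch(history_dict, metric="val_loss"):
    """Return (1-based epoch, value) where `metric` is lowest."""
    values = history_dict[metric]
    best_index = min(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]

# Illustrative numbers only, not measurements from a real training run
example_history = {
    "loss": [0.30, 0.12, 0.08, 0.05, 0.03],
    "val_loss": [0.15, 0.09, 0.07, 0.08, 0.10],
}

epoch, value = best_epoch(example_history)
print(f"Lowest validation loss {value} at epoch {epoch}")
```

With a real run, the same helper could be called as `best_epoch(history.history)`. In this made-up example the training loss keeps decreasing after epoch 3 while the validation loss rises, so training longer would not be expected to help.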

Summary

In this section, we've covered the basics of Convolutional Neural Networks, including convolutional and pooling layers. We also explored practical examples of building and training CNNs for image classification. These concepts and techniques are fundamental to understanding and working with CNNs in various computer vision applications.

Practice

Below, you have a link to the Jupyter/Colab notebook where you can practice the theory from this section:
