
3.1 Artificial Neural Networks (ANN)

Introduction

Basic Concepts of Neural Networks

Welcome to the second section of our Deep Learning module, where we delve into Artificial Neural Networks, commonly known as ANNs. ANNs are the backbone of Deep Learning, inspired by the biological neural networks that constitute animal brains. At a high level, an ANN is composed of interconnected units called neurons, which are organized in layers.

Each neuron receives input, processes it, and passes on the output to the next layer of neurons. This process allows the network to learn and make predictions. The power of ANNs lies in their ability to learn complex patterns and representations from data, making them suitable for a wide range of applications.
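
At its simplest, a single neuron computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function. Here is a minimal NumPy sketch of that idea; the input, weight, and bias values are made up purely for illustration:

import numpy as np

def neuron(x, w, b):
    # Weighted sum of the inputs plus a bias, followed by a sigmoid activation
    z = np.dot(w, x) + b
    return 1 / (1 + np.exp(-z))

# Illustrative values: 3 inputs, 3 weights, 1 bias
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.7, -0.2])
b = 0.1

print(neuron(x, w, b))  # a single output between 0 and 1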

Example:

To illustrate the basic concept, consider a small feed-forward network with an input layer of 10 features, two hidden layers, and an output layer. Here’s how you can implement it using TensorFlow:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Define the model
model = Sequential([
    Dense(32, input_shape=(10,), activation='relu'),  # First hidden layer: 32 neurons, 10 input features
    Dense(16, activation='relu'),                     # Second hidden layer: 16 neurons
    Dense(1, activation='sigmoid')                    # Output layer: 1 neuron
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Summary of the model
model.summary()

This code snippet defines a simple ANN using the Keras API in TensorFlow.
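
The summary also reports the number of trainable parameters per layer. For a Dense layer this is simply (number of inputs × number of neurons) weights plus one bias per neuron, so the counts for the model above can be checked by hand:

# Parameters of a Dense layer = inputs * neurons (weights) + neurons (biases)
layer1 = 10 * 32 + 32   # 352 parameters
layer2 = 32 * 16 + 16   # 528 parameters
layer3 = 16 * 1 + 1     # 17 parameters
print(layer1, layer2, layer3, layer1 + layer2 + layer3)  # 352 528 17 897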

Architecture of a Neural Network

Input, Hidden, and Output Layers

The architecture of an ANN consists of three main types of layers: the input layer, hidden layers, and the output layer. The input layer receives the initial data, which could be anything from an image's pixel values to numerical data points. This data is then passed through one or more hidden layers where the actual computation and learning occur.

Hidden layers are made up of neurons that apply transformations to the input data using weights, biases, and activation functions. These layers allow the network to learn intricate patterns by adjusting the weights during the training process. Finally, the output layer produces the final prediction or classification, depending on the task at hand.
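
As an illustration of how data flows through these layers, the sketch below performs a forward pass by hand with NumPy. The layer sizes mirror the Keras model defined earlier, but the random weights only stand in for the values that training would normally learn:

import numpy as np

rng = np.random.default_rng(0)

# Random weights and biases standing in for learned parameters
W1, b1 = rng.normal(size=(10, 32)), np.zeros(32)   # input (10 features) -> hidden (32 neurons)
W2, b2 = rng.normal(size=(32, 16)), np.zeros(16)   # hidden (32) -> hidden (16)
W3, b3 = rng.normal(size=(16, 1)), np.zeros(1)     # hidden (16) -> output (1)

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = rng.normal(size=(1, 10))        # one sample with 10 features
h1 = relu(x @ W1 + b1)              # first hidden layer
h2 = relu(h1 @ W2 + b2)             # second hidden layer
y_hat = sigmoid(h2 @ W3 + b3)       # output layer prediction
print(y_hat)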

Example:

Let’s visualize a simple neural network architecture:

from tensorflow.keras.utils import plot_model

# Visualize the model
plot_model(model, show_shapes=True, show_layer_names=True)

This code snippet generates a graphical representation of the neural network, showing the structure and the shapes of the layers. Note that plot_model requires the pydot and graphviz packages to be installed.

Activation Functions

Activation functions play a crucial role in neural networks by introducing non-linearity into the model. This non-linearity enables the network to learn complex patterns that linear models cannot capture. There are several activation functions commonly used in ANNs:

  • Sigmoid Function: Outputs values between 0 and 1, often used in binary classification problems.
  • Tanh Function: Outputs values between -1 and 1, providing stronger gradients than the sigmoid function.
  • ReLU (Rectified Linear Unit): Outputs zero for negative inputs and the input itself for positive inputs, helping mitigate the vanishing gradient problem.

Choosing the right activation function can significantly impact the performance and convergence of the neural network.

Example:

Let’s plot these activation functions to understand their behavior:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 100)

# Sigmoid function
sigmoid = 1 / (1 + np.exp(-x))

# Tanh function
tanh = np.tanh(x)

# ReLU function
relu = np.maximum(0, x)

# Plotting
plt.figure(figsize=(10, 6))
plt.plot(x, sigmoid, label='Sigmoid')
plt.plot(x, tanh, label='Tanh')
plt.plot(x, relu, label='ReLU')
plt.title('Activation Functions')
plt.xlabel('Input')
plt.ylabel('Output')
plt.legend()
plt.grid(True)
plt.show()

This code snippet plots the sigmoid, tanh, and ReLU activation functions.

Training Process

Backpropagation Algorithm

Training a neural network involves adjusting its weights and biases to minimize the error in its predictions. This is done using a process called backpropagation. Backpropagation is an algorithm that calculates the gradient of the loss function with respect to each weight by applying the chain rule. These gradients are then used to update the weights in a direction that reduces the error.
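
To see the gradients that backpropagation produces, TensorFlow's GradientTape can record a forward pass and return the gradient of the loss with respect to each weight. The snippet below is a minimal sketch on a single made-up sample, separate from the training example that follows:

import tensorflow as tf

# A tiny model and one made-up sample, used only to inspect gradients
toy_model = tf.keras.Sequential([
    tf.keras.layers.Dense(2, input_shape=(2,), activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
loss_fn = tf.keras.losses.BinaryCrossentropy()

x_sample = tf.constant([[0.0, 1.0]])
y_sample = tf.constant([[1.0]])

# Record the forward pass, then apply the chain rule to get the gradients
with tf.GradientTape() as tape:
    y_pred = toy_model(x_sample)
    loss = loss_fn(y_sample, y_pred)

grads = tape.gradient(loss, toy_model.trainable_variables)
for var, grad in zip(toy_model.trainable_variables, grads):
    print(var.name, grad.numpy())

In practice you rarely compute these gradients yourself: calling model.fit, as in the example below, runs this forward pass, gradient computation, and weight update on every batch.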

Example:

Here’s a simple example in which backpropagation runs automatically inside model.fit to train the network:

# A small XOR-style dataset: the label is 1 when exactly one input is 1
X = np.array([[0, 1], [1, 0], [1, 1], [0, 0]])
y = np.array([[1], [1], [0], [0]])

# Define a simple neural network model
model = Sequential([
    Dense(2, input_dim=2, activation='relu'),   # hidden layer with 2 neurons
    Dense(1, activation='sigmoid')              # output layer with 1 neuron
])

# Compile the model
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(X, y, epochs=1000, verbose=0)

# Plot the loss over epochs
plt.plot(history.history['loss'])
plt.title('Model Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.grid(True)
plt.show()

This code trains a simple neural network and plots the loss over epochs.

Optimization and Gradient Descent

The optimization process in neural networks aims to find the best set of weights that minimize the loss function. Gradient descent is the most common optimization algorithm used for this purpose. It iteratively adjusts the weights in the opposite direction of the gradient of the loss function. There are various forms of gradient descent, including:

  • Stochastic Gradient Descent (SGD): Updates weights using one training example at a time.
  • Mini-Batch Gradient Descent: Uses a small batch of training examples to update weights.
  • Batch Gradient Descent: Uses the entire training dataset to compute the gradients.

These variations help in speeding up the convergence and avoiding local minima.
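
Whichever variant is used, the update rule itself is the same: each weight is moved a small step, set by the learning rate, in the opposite direction of the gradient. The NumPy sketch below applies this rule to fit a single weight on a made-up one-dimensional dataset:

import numpy as np

# Made-up data: ys is roughly 3 * xs
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([3.1, 5.9, 9.2, 11.8])

w = 0.0     # the single weight we want to learn
lr = 0.01   # learning rate

for step in range(200):
    preds = w * xs
    grad = np.mean(2 * (preds - ys) * xs)   # gradient of the mean squared error with respect to w
    w -= lr * grad                          # gradient descent update: w <- w - lr * grad

print(w)  # ends up close to 3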

Example:

Here’s how you can use different optimizers in TensorFlow:

# Using the SGD optimizer (a fresh model, so the comparison starts from scratch)
model_sgd = Sequential([
    Dense(2, input_dim=2, activation='relu'),
    Dense(1, activation='sigmoid')
])
model_sgd.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])
model_sgd.fit(X, y, epochs=100, verbose=0)
loss_sgd = model_sgd.evaluate(X, y, verbose=0)[0]

# Using the Adam optimizer (another fresh model with the same architecture)
model_adam = Sequential([
    Dense(2, input_dim=2, activation='relu'),
    Dense(1, activation='sigmoid')
])
model_adam.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model_adam.fit(X, y, epochs=100, verbose=0)
loss_adam = model_adam.evaluate(X, y, verbose=0)[0]

# Compare results
print(f'SGD Loss: {loss_sgd}, Adam Loss: {loss_adam}')

This code snippet trains two models with the same architecture and compares the final loss reached with the SGD and Adam optimizers.

Summary

In this section, we covered the basic concepts of neural networks, the architecture of ANNs including input, hidden, and output layers, and the importance of activation functions. We also explored the training process involving backpropagation and gradient descent optimization, supported by practical examples and code snippets. These foundational concepts are essential for understanding how ANNs learn and make predictions.

Practice

Below is a link to the Jupyter/Colab notebook where you can practice the theory from this section:
