
3.3 Recurrent Neural Networks

Introduction to RNNs

Welcome to this section of our Deep Learning module, where we will explore Recurrent Neural Networks, or RNNs. Unlike traditional neural networks, RNNs are designed to recognize patterns in sequences of data, such as time series, text, and speech. The key feature of RNNs is their ability to maintain a memory of previous inputs through internal states, allowing them to capture temporal dependencies.

Architecture of an RNN
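
At each timestep, an RNN combines the current input with its previous hidden state to produce a new hidden state, which acts as the network's memory. The following minimal NumPy sketch (with hypothetical toy dimensions and random weights, purely for illustration) shows this recurrence in action:

import numpy as np

# One recurrent step: the new hidden state depends on the current input x_t
# and the previous hidden state h_prev (h_t = tanh(x_t·W_x + h_prev·W_h + b))
def rnn_step(x_t, h_prev, W_x, W_h, b):
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

# Toy dimensions: 1 input feature, 4 hidden units (illustrative values only)
rng = np.random.default_rng(0)
W_x = rng.normal(size=(1, 4))
W_h = rng.normal(size=(4, 4))
b = np.zeros(4)

# Process a sequence of 10 timesteps, carrying the hidden state forward
h = np.zeros(4)
for t in range(10):
    x_t = rng.normal(size=(1,))
    h = rnn_step(x_t, h, W_x, W_h, b)

print(h)  # the final hidden state summarizes the whole sequence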

Long Short-Term Memory (LSTM) Networks

One of the most common challenges with traditional RNNs is the vanishing gradient problem, which makes it difficult for the network to learn long-term dependencies. Long Short-Term Memory (LSTM) networks address this issue by introducing a more complex architecture that includes gates to control the flow of information. LSTMs are capable of learning long-term dependencies and are widely used in various applications of sequence modeling.
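
To see why this happens, note that backpropagation through time multiplies the gradient by the recurrent derivative at every timestep. The toy sketch below (with an assumed single-step derivative of 0.5, purely for illustration) shows how quickly the gradient shrinks over long sequences:

# Toy illustration of the vanishing gradient problem (assumed numbers, not a real network)
recurrent_derivative = 0.5  # assumed magnitude of the per-step derivative
gradient = 1.0
for t in range(1, 51):
    gradient *= recurrent_derivative
    if t in (10, 30, 50):
        print(f"gradient after {t} timesteps: {gradient:.2e}")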

Example:

Let’s implement a simple LSTM using TensorFlow:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Define a simple LSTM model
model = Sequential([
    LSTM(50, input_shape=(10, 1)),  # 50 LSTM units, input shape (timesteps, features)
    Dense(1, activation='linear')   # Output layer with 1 neuron
])

# Compile the model
model.compile(optimizer='adam', loss='mse')

# Summary of the model
model.summary()

This code snippet defines a simple LSTM model using the Keras API in TensorFlow.
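
To check that the model accepts data in the expected (timesteps, features) format, we could train it briefly on random values (dummy data, purely for illustration):

import numpy as np

# 32 random sequences, each with 10 timesteps and 1 feature, and one target value per sequence
X_dummy = np.random.rand(32, 10, 1)
y_dummy = np.random.rand(32, 1)

# Train for a couple of epochs just to verify that the shapes are compatible
model.fit(X_dummy, y_dummy, epochs=2, verbose=0)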

Gated Recurrent Units (GRU)

Gated Recurrent Units (GRU) are a variant of LSTMs that simplify the architecture by combining the forget and input gates into a single update gate. GRUs have fewer parameters than LSTMs and can be faster to train while still addressing the vanishing gradient problem.

Example:

Let’s implement a simple GRU using TensorFlow:

from tensorflow.keras.layers import GRU

# Define a simple GRU model
model = Sequential([
    GRU(50, input_shape=(10, 1)),   # 50 GRU units, input shape (timesteps, features)
    Dense(1, activation='linear')   # Output layer with 1 neuron
])

# Compile the model
model.compile(optimizer='adam', loss='mse')

# Summary of the model
model.summary()

This code snippet defines a simple GRU model using the Keras API in TensorFlow.
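
Since the snippet above reuses the imports from the previous examples, we can also compare the two architectures directly. Here is a quick sketch (assuming the earlier imports are still in scope) to verify that a GRU layer has fewer parameters than an LSTM layer of the same size:

# Build both models with the same input shape and count their trainable parameters
lstm_model = Sequential([LSTM(50, input_shape=(10, 1)), Dense(1)])
gru_model = Sequential([GRU(50, input_shape=(10, 1)), Dense(1)])

lstm_model.build(input_shape=(None, 10, 1))
gru_model.build(input_shape=(None, 10, 1))

print("LSTM parameters:", lstm_model.count_params())
print("GRU parameters: ", gru_model.count_params())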

Applications of RNNs

Natural Language Processing (NLP)

RNNs are extensively used in Natural Language Processing (NLP) for tasks such as language translation, sentiment analysis, and text generation. By processing sequences of words, RNNs can learn contextual relationships and generate coherent sentences.

Example:

Let’s build a simple RNN for text generation using an LSTM:

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

# Sample text data
text = "Deep learning is a subset of machine learning in artificial intelligence."

# Tokenize the text
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])
total_words = len(tokenizer.word_index) + 1

# Create n-gram sequences of words (each prefix of the text predicts its next word)
token_list = tokenizer.texts_to_sequences([text])[0]
input_sequences = []
for i in range(1, len(token_list)):
    input_sequences.append(token_list[:i+1])

# Pad sequences
max_sequence_len = max([len(x) for x in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))

# Split data into inputs and labels
X = input_sequences[:, :-1]
X = X[..., np.newaxis]  # add a feature dimension: (samples, timesteps, 1)
y = input_sequences[:, -1]
y = tf.keras.utils.to_categorical(y, num_classes=total_words)

# Define the model
model = Sequential([
    LSTM(50, input_shape=(max_sequence_len-1, 1)),
    Dense(total_words, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy')

# Train the model
model.fit(X, y, epochs=100, verbose=2)
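
Once trained, the model can generate text by repeatedly predicting the next word and feeding it back in. A minimal greedy-decoding sketch (assuming the variables defined in the snippet above) might look like this:

# Start from a seed phrase and predict the next 5 words one at a time
seed_text = "deep learning"
for _ in range(5):
    # Encode and pad the current text, matching the (timesteps, 1) shape used in training
    token_list = tokenizer.texts_to_sequences([seed_text])[0]
    token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, padding='pre')
    token_list = token_list[..., np.newaxis]

    # Pick the most likely next word (greedy decoding)
    predicted_id = int(np.argmax(model.predict(token_list, verbose=0), axis=-1)[0])
    for word, index in tokenizer.word_index.items():
        if index == predicted_id:
            seed_text += " " + word
            break

print(seed_text)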

Time Series Forecasting

Another significant application of RNNs is in time series forecasting, where the goal is to predict future values based on historical data. RNNs can capture temporal dependencies and patterns, making them suitable for tasks such as stock price prediction and weather forecasting.

Example:

Let’s build a simple RNN for time series forecasting using an LSTM:

import numpy as np

# Generate sample time series data: two sine waves with random frequency and offset, plus noise
def generate_time_series(n_steps):
    freq1, freq2, offsets1, offsets2 = np.random.rand(4)
    time = np.linspace(0, 1, n_steps)
    series = 0.5 * np.sin((time - offsets1) * (freq1 * 10 + 10))   # wave 1
    series += 0.2 * np.sin((time - offsets2) * (freq2 * 20 + 20))  # + wave 2
    series += 0.1 * (np.random.rand(n_steps) - 0.5)                # + noise
    return series[..., np.newaxis].astype(np.float32)

# Generate dataset
n_steps = 50
series = generate_time_series(n_steps + 1)
X_train, y_train = series[:-1], series[1:]

# Reshape data
X_train = X_train.reshape((1, n_steps, 1))
y_train = y_train.reshape((1, n_steps, 1))

# Define the model
model = Sequential([
    LSTM(50, input_shape=(n_steps, 1), return_sequences=True),
    Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mse')

# Train the model
model.fit(X_train, y_train, epochs=200, verbose=2)

This code snippet defines and trains a simple LSTM model for time series forecasting.
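
As a quick check of the trained model, we can compare its prediction for the final timestep with the actual value (using the variables from the snippet above):

# The model outputs one prediction per input timestep because return_sequences=True
y_pred = model.predict(X_train, verbose=0)   # shape (1, n_steps, 1)

print("Actual final value:   ", float(y_train[0, -1, 0]))
print("Predicted final value:", float(y_pred[0, -1, 0]))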

Summary

In this section, we've covered the basics of Recurrent Neural Networks, including LSTMs and GRUs. We also explored practical examples of building and training RNNs for natural language processing and time series forecasting. These concepts and techniques are fundamental to understanding and working with RNNs in various sequence modeling applications.

Practice

Below is a link to the Jupyter/Colab notebook where you can practice the theory from this section:
