{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Coursework 3 - Full questions.ipynb",
"provenance": [],
"collapsed_sections": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "zNMD5M6mQ2YY"
},
"source": [
"# Coursework 3: RNNs\n",
"\n",
"#### Instructions\n",
"\n",
"Please submit on CATe a zip file named *CW3_RNNs.zip* containing a version of this notebook with your answers. Write your answers in the cells below for each question.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LKehhGDF-Qte"
},
"source": [
"## Recurrent models coursework\n",
"\n",
"This coursework is separated into a coding and a theory component.\n",
"\n",
"For the first part, you will use the Google Speech Commands v0.02 subset that you used in the RNN tutorial: http://www.doc.ic.ac.uk/~pam213/co460_files/ \n",
"\n",
"### Part 1 - Coding\n",
"In this part you will have to:\n",
"\n",
"- Implement an LSTM\n",
"- Implement a GRU\n",
"\n",
"### Part 2 - Theory\n",
"\n",
"Here you will answer some theoretical questions about RNNs -- no detailed proofs and no programming."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qaI4P8SZ-U2j"
},
"source": [
"### Part 1: Coding"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bWGb-eUeXtex"
},
"source": [
"### Dataset\n",
"\n",
"We will be using the Google [*Speech Commands*](https://www.tensorflow.org/tutorials/sequences/audio_recognition) v0.02 [1] dataset.\n",
"\n",
"[1] Warden, P. (2018). [Speech commands: A dataset for limited-vocabulary speech recognition](https://arxiv.org/abs/1804.03209). *arXiv preprint arXiv:1804.03209.*"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "SB5975CR7Gjg",
"outputId": "2a95c3e1-1347-4319-f376-b83691cea922"
},
"source": [
"from google.colab import drive\n",
"drive.mount('/content/drive')"
],
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"text": [
"Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "5FWXvc8G6le4"
},
"source": [
"## MAKE SURE THIS POINTS INSIDE THE DATASET FOLDER.\n",
"dataset_folder = \"/content/drive/MyDrive/data/\" # this should change depending on where you have stored the data files"
],
"execution_count": 2,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "SPZTUa2i6le5"
},
"source": [
"### Initial code before coursework questions start:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Qt3KzJzBPdHU"
},
"source": [
"import math\n",
"import os\n",
"import random\n",
"from collections import defaultdict\n",
"\n",
"import torch\n",
"import torch.nn as nn\n",
"from torch.autograd import Variable\n",
"from torch.utils.data import Dataset\n",
"import numpy as np\n",
"from scipy.io.wavfile import read\n",
"import librosa\n",
"from matplotlib import pyplot as plt\n",
"\n",
"cuda = True if torch.cuda.is_available() else False\n",
"\n",
"Tensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor\n"
],
"execution_count": 3,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "D_huYnDW6le6"
},
"source": [
"def set_seed(seed_value):\n",
"    \"\"\"Set seed for reproducibility.\n",
"    \"\"\"\n",
"    random.seed(seed_value)\n",
"    np.random.seed(seed_value)\n",
"    torch.manual_seed(seed_value)\n",
"    torch.cuda.manual_seed_all(seed_value)\n",
"\n",
"set_seed(42)"
],
"execution_count": 4,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "QKSnqpAJLVwx"
},
"source": [
"class SpeechCommandsDataset(Dataset):\n",
"    \"\"\"Google Speech Commands dataset.\"\"\"\n",
"\n",
"    def __init__(self, root_dir, split):\n",
"        \"\"\"\n",
"        Args:\n",
"            root_dir (string): Directory with all the data files.\n",
"            split (string): In [\"train\", \"valid\", \"test\"].\n",
"        \"\"\"\n",
"        self.root_dir = root_dir\n",
"        self.split = split\n",
"\n",
"        self.number_of_classes = len(self.get_classes())\n",
"\n",
"        self.class_to_file = defaultdict(list)\n",
"\n",
"        self.valid_filenames = self.get_valid_filenames()\n",
"        self.test_filenames = self.get_test_filenames()\n",
"\n",
"        for c in self.get_classes():\n",
"            file_name_list = sorted(os.listdir(self.root_dir + \"data_speech_commands_v0.02/\" + c))\n",
"            for filename in file_name_list:\n",
"                if split == \"train\":\n",
"                    if (filename not in self.valid_filenames[c]) and (filename not in self.test_filenames[c]):\n",
"                        self.class_to_file[c].append(filename)\n",
"                elif split == \"valid\":\n",
"                    if filename in self.valid_filenames[c]:\n",
"                        self.class_to_file[c].append(filename)\n",
"                elif split == \"test\":\n",
"                    if filename in self.test_filenames[c]:\n",
"                        self.class_to_file[c].append(filename)\n",
"                else:\n",
"                    raise ValueError(\"Invalid split name.\")\n",
"\n",
"        self.filepath_list = list()\n",
"        self.label_list = list()\n",
"        for cc, c in enumerate(self.get_classes()):\n",
"            f_extension = sorted(list(self.class_to_file[c]))\n",
"            l_extension = [cc for i in f_extension]\n",
"            f_extension = [self.root_dir + \"data_speech_commands_v0.02/\" + c + \"/\" + filename for filename in f_extension]\n",
"            self.filepath_list.extend(f_extension)\n",
"            self.label_list.extend(l_extension)\n",
"        self.number_of_samples = len(self.filepath_list)\n",
"\n",
"    def __len__(self):\n",
"        return self.number_of_samples\n",
"\n",
"    def __getitem__(self, idx):\n",
"        sample = np.zeros((16000, ), dtype=np.float32)\n",
"\n",
"        sample_file = self.filepath_list[idx]\n",
"\n",
"        sample_from_file = read(sample_file)[1]\n",
"        sample[:sample_from_file.size] = sample_from_file\n",
"        sample = sample.reshape((16000, ))\n",
"\n",
"        sample = librosa.feature.mfcc(y=sample, sr=16000, hop_length=512, n_fft=2048).transpose().astype(np.float32)\n",
"\n",
"        label = self.label_list[idx]\n",
"\n",
"        return sample, label\n",
"\n",
"    def get_classes(self):\n",
"        return ['one', 'two', 'three']\n",
"\n",
"    def get_valid_filenames(self):\n",
"        class_names = self.get_classes()\n",
"\n",
"        class_to_filename = defaultdict(set)\n",
"        with open(self.root_dir + \"data_speech_commands_v0.02/validation_list.txt\", \"r\") as fp:\n",
"            for line in fp:\n",
"                clean_line = line.strip().split(\"/\")\n",
"\n",
"                if clean_line[0] in class_names:\n",
"                    class_to_filename[clean_line[0]].add(clean_line[1])\n",
"\n",
"        return class_to_filename\n",
"\n",
"    def get_test_filenames(self):\n",
"        class_names = self.get_classes()\n",
"\n",
"        class_to_filename = defaultdict(set)\n",
"        with open(self.root_dir + \"data_speech_commands_v0.02/testing_list.txt\", \"r\") as fp:\n",
"            for line in fp:\n",
"                clean_line = line.strip().split(\"/\")\n",
"\n",
"                if clean_line[0] in class_names:\n",
"                    class_to_filename[clean_line[0]].add(clean_line[1])\n",
"\n",
"        return class_to_filename"
],
"execution_count": 5,
"outputs": []
},
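{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each item returned by `SpeechCommandsDataset` is an MFCC matrix, and the training cell later unpacks its shape as `seq_dim, input_dim`. The sketch below works that shape out by hand. It assumes librosa's defaults for the arguments `__getitem__` does not set (`center=True`, `n_mfcc=20`) and never calls librosa, so treat it as a back-of-the-envelope check rather than a guarantee."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Shape arithmetic for the MFCC features built in SpeechCommandsDataset.__getitem__.\n",
"# Assumption: librosa defaults center=True (pad n_fft // 2 on both sides) and n_mfcc=20.\n",
"n_samples = 16000      # one second at sr=16000, zero-padded as in __getitem__\n",
"n_fft = 2048\n",
"hop_length = 512\n",
"n_mfcc_default = 20\n",
"\n",
"padded_len = n_samples + 2 * (n_fft // 2)          # centre padding\n",
"n_frames = 1 + (padded_len - n_fft) // hop_length  # number of analysis windows\n",
"\n",
"feature_shape = (n_frames, n_mfcc_default)  # after .transpose(): (time steps, coefficients)\n",
"print(feature_shape)  # -> (32, 20)"
],
"execution_count": null,
"outputs": []
},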
{
"cell_type": "code",
"metadata": {
"id": "vx8ptirGKa9u"
},
"source": [
"\n",
"train_dataset = SpeechCommandsDataset(dataset_folder, \"train\")\n",
"valid_dataset = SpeechCommandsDataset(dataset_folder, \"valid\")\n",
"\n",
"test_dataset = SpeechCommandsDataset(dataset_folder, \"test\")\n",
"\n",
"batch_size = 100\n",
"\n",
"\n",
"num_epochs = 5\n",
"valid_every_n_steps = 20\n",
"train_loader = torch.utils.data.DataLoader(dataset=train_dataset,\n",
"                                           batch_size=batch_size,\n",
"                                           shuffle=True)\n",
"valid_loader = torch.utils.data.DataLoader(dataset=valid_dataset,\n",
"                                           batch_size=batch_size,\n",
"                                           shuffle=False)\n",
"\n",
"test_loader = torch.utils.data.DataLoader(dataset=test_dataset,\n",
"                                          batch_size=batch_size,\n",
"                                          shuffle=False)"
],
"execution_count": 6,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "eJwOesOQOSh9"
},
"source": [
"### Question 1: Finalise the LSTM and GRU cells by completing the missing code\n",
"\n",
"You are allowed to use nn.Linear."
]
},
{
"cell_type": "code",
"metadata": {
"id": "LQu9Yxfy-Wqj"
},
"source": [
"class LSTMCell(nn.Module):\n",
"    def __init__(self, input_size, hidden_size, bias=True):\n",
"        super(LSTMCell, self).__init__()\n",
"        self.input_size = input_size\n",
"        self.hidden_size = hidden_size\n",
"        self.bias = bias\n",
"\n",
"        ########################################################################\n",
"        ## START OF YOUR CODE - Question 1a) Complete the missing code\n",
"        ########################################################################\n",
"\n",
"        # Each gate has an input-to-hidden (x) and a hidden-to-hidden (h) projection\n",
"        self.w_c_x = nn.Linear(input_size, hidden_size, bias=bias)  # candidate cell state\n",
"        self.w_c_h = nn.Linear(hidden_size, hidden_size, bias=bias)\n",
"\n",
"        self.w_i_x = nn.Linear(input_size, hidden_size, bias=bias)  # input gate\n",
"        self.w_i_h = nn.Linear(hidden_size, hidden_size, bias=bias)\n",
"\n",
"        self.w_f_x = nn.Linear(input_size, hidden_size, bias=bias)  # forget gate\n",
"        self.w_f_h = nn.Linear(hidden_size, hidden_size, bias=bias)\n",
"\n",
"        self.w_o_x = nn.Linear(input_size, hidden_size, bias=bias)  # output gate\n",
"        self.w_o_h = nn.Linear(hidden_size, hidden_size, bias=bias)\n",
"\n",
"        ########################################################################\n",
"        ## END OF YOUR CODE\n",
"        ########################################################################\n",
"        self.reset_parameters()\n",
"\n",
"    def reset_parameters(self):\n",
"        std = 1.0 / math.sqrt(self.hidden_size)\n",
"        for w in self.parameters():\n",
"            w.data.uniform_(-std, std)\n",
"\n",
"    def forward(self, input, hx=None):\n",
"        if hx is None:\n",
"            hx = input.new_zeros(input.size(0), self.hidden_size, requires_grad=False)\n",
"            hx = (hx, hx)\n",
"\n",
"        # We used hx to pack both the hidden and cell states\n",
"        hx, cx = hx\n",
"\n",
"        ########################################################################\n",
"        ## START OF YOUR CODE - Question 1b) Complete the missing code\n",
"        ########################################################################\n",
"        c_t = torch.tanh(self.w_c_x(input) + self.w_c_h(hx))     # candidate cell state\n",
"        i_t = torch.sigmoid(self.w_i_x(input) + self.w_i_h(hx))  # input gate\n",
"        f_t = torch.sigmoid(self.w_f_x(input) + self.w_f_h(hx))  # forget gate\n",
"        o_t = torch.sigmoid(self.w_o_x(input) + self.w_o_h(hx))  # output gate\n",
"        cy = f_t * cx + i_t * c_t   # new cell state\n",
"        hy = o_t * torch.tanh(cy)   # new hidden state\n",
"\n",
"        ########################################################################\n",
"        ## END OF YOUR CODE\n",
"        ########################################################################\n",
"\n",
"        return (hy, cy)\n",
"\n",
"\n",
"class BasicRNNCell(nn.Module):\n",
"    def __init__(self, input_size, hidden_size, bias=True, nonlinearity=\"tanh\"):\n",
"        super(BasicRNNCell, self).__init__()\n",
"        self.input_size = input_size\n",
"        self.hidden_size = hidden_size\n",
"        self.bias = bias\n",
"        self.nonlinearity = nonlinearity\n",
"        if self.nonlinearity not in [\"tanh\", \"relu\"]:\n",
"            raise ValueError(\"Invalid nonlinearity selected for RNN.\")\n",
"\n",
"        self.x2h = nn.Linear(input_size, hidden_size, bias=bias)\n",
"        self.h2h = nn.Linear(hidden_size, hidden_size, bias=bias)\n",
"\n",
"        self.reset_parameters()\n",
"\n",
"    def reset_parameters(self):\n",
"        std = 1.0 / math.sqrt(self.hidden_size)\n",
"        for w in self.parameters():\n",
"            w.data.uniform_(-std, std)\n",
"\n",
"    def forward(self, input, hx=None):\n",
"        if hx is None:\n",
"            hx = input.new_zeros(input.size(0), self.hidden_size, requires_grad=False)\n",
"\n",
"        activation = getattr(nn.functional, self.nonlinearity)\n",
"        hy = activation(self.x2h(input) + self.h2h(hx))\n",
"\n",
"        return hy\n",
"\n",
"\n",
"class GRUCell(nn.Module):\n",
"    def __init__(self, input_size, hidden_size, bias=True):\n",
"        super(GRUCell, self).__init__()\n",
"        self.input_size = input_size\n",
"        self.hidden_size = hidden_size\n",
"        self.bias = bias\n",
"\n",
"        ########################################################################\n",
"        ## START OF YOUR CODE - Question 1c) Complete the missing code\n",
"        ########################################################################\n",
"        self.w_z_x = nn.Linear(input_size, hidden_size, bias=bias)  # update gate\n",
"        self.w_z_h = nn.Linear(hidden_size, hidden_size, bias=bias)\n",
"\n",
"        self.w_r_x = nn.Linear(input_size, hidden_size, bias=bias)  # reset gate\n",
"        self.w_r_h = nn.Linear(hidden_size, hidden_size, bias=bias)\n",
"\n",
"        self.w_n_x = nn.Linear(input_size, hidden_size, bias=bias)  # candidate state\n",
"        self.w_n_h = nn.Linear(hidden_size, hidden_size, bias=bias)\n",
"        ########################################################################\n",
"        ## END OF YOUR CODE\n",
"        ########################################################################\n",
"        self.reset_parameters()\n",
"\n",
"    def reset_parameters(self):\n",
"        std = 1.0 / math.sqrt(self.hidden_size)\n",
"        for w in self.parameters():\n",
"            w.data.uniform_(-std, std)\n",
"\n",
"    def forward(self, input, hx=None):\n",
"        if hx is None:\n",
"            hx = input.new_zeros(input.size(0), self.hidden_size, requires_grad=False)\n",
"\n",
"        ########################################################################\n",
"        ## START OF YOUR CODE - Question 1d) Complete the missing code\n",
"        ########################################################################\n",
"        z_t = torch.sigmoid(self.w_z_x(input) + self.w_z_h(hx))     # update gate\n",
"        r_t = torch.sigmoid(self.w_r_x(input) + self.w_r_h(hx))     # reset gate\n",
"        n_t = torch.tanh(self.w_n_x(input) + r_t * self.w_n_h(hx))  # candidate state\n",
"        hy = torch.mul((1 - z_t), n_t) + torch.mul(z_t, hx)         # interpolate new and old state\n",
"        ########################################################################\n",
"        ## END OF YOUR CODE\n",
"        ########################################################################\n",
"\n",
"        return hy"
],
"execution_count": 7,
"outputs": []
},
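{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a cross-check of the gate equations in `LSTMCell.forward` and `GRUCell.forward`, the sketch below re-implements one time step of each in NumPy for a single example. Sizes, weights and helper names (`make_params`, `lstm_step`, `gru_step`) are illustrative assumptions, and each gate gets a single bias rather than the two `nn.Linear` biases. It checks two limiting cases. An LSTM whose forget gate is forced to 1 and input gate to 0 copies its cell state, and a GRU whose update gate is forced to 1 copies its hidden state."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"n_in, n_hid = 4, 3  # illustrative sizes\n",
"\n",
"def sigmoid(a):\n",
"    return 1.0 / (1.0 + np.exp(-a))\n",
"\n",
"def make_params(names, bias_overrides=None):\n",
"    # One (W_x, W_h, b) triple per gate: small random weights, zero bias unless overridden.\n",
"    # Simplification: a single bias per gate instead of the two nn.Linear biases.\n",
"    bias_overrides = bias_overrides or {}\n",
"    params = {}\n",
"    for name in names:\n",
"        w_x = 0.1 * rng.standard_normal((n_hid, n_in))\n",
"        w_h = 0.1 * rng.standard_normal((n_hid, n_hid))\n",
"        b = np.full(n_hid, bias_overrides.get(name, 0.0))\n",
"        params[name] = (w_x, w_h, b)\n",
"    return params\n",
"\n",
"def pre(p, x, h):\n",
"    w_x, w_h, b = p\n",
"    return w_x @ x + w_h @ h + b\n",
"\n",
"def lstm_step(x, h, c, P):\n",
"    # Same structure as LSTMCell.forward\n",
"    g = np.tanh(pre(P['c'], x, h))   # candidate cell state\n",
"    i = sigmoid(pre(P['i'], x, h))   # input gate\n",
"    f = sigmoid(pre(P['f'], x, h))   # forget gate\n",
"    o = sigmoid(pre(P['o'], x, h))   # output gate\n",
"    c_new = f * c + i * g\n",
"    return o * np.tanh(c_new), c_new\n",
"\n",
"def gru_step(x, h, P):\n",
"    # Same structure as GRUCell.forward: the reset gate scales only the recurrent term\n",
"    z = sigmoid(pre(P['z'], x, h))   # update gate\n",
"    r = sigmoid(pre(P['r'], x, h))   # reset gate\n",
"    w_x, w_h, b = P['n']\n",
"    n = np.tanh(w_x @ x + b + r * (w_h @ h))  # candidate state\n",
"    return (1 - z) * n + z * h\n",
"\n",
"x0 = rng.standard_normal(n_in)\n",
"h0_demo = rng.standard_normal(n_hid)\n",
"c0_demo = rng.standard_normal(n_hid)\n",
"\n",
"# Forget gate forced open (bias +50), input gate shut (bias -50): the cell state is copied\n",
"P_lstm = make_params(['c', 'i', 'f', 'o'], {'f': 50.0, 'i': -50.0})\n",
"h1, c1 = lstm_step(x0, h0_demo, c0_demo, P_lstm)\n",
"print(np.allclose(c1, c0_demo))  # True\n",
"\n",
"# Update gate forced open (bias +50): the GRU copies its previous hidden state\n",
"P_gru = make_params(['z', 'r', 'n'], {'z': 50.0})\n",
"print(np.allclose(gru_step(x0, h0_demo, P_gru), h0_demo))  # True"
],
"execution_count": null,
"outputs": []
},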
{
"cell_type": "markdown",
"metadata": {
"id": "7kEGenNJ6le_"
},
"source": [
"### Question 2: Finalise the RNNModel and BidirRecurrentModel"
]
},
{
"cell_type": "code",
"metadata": {
"id": "F15K5FwA6lfA"
},
"source": [
"class RNNModel(nn.Module):\n",
"    def __init__(self, mode, input_size, hidden_size, num_layers, bias, output_size):\n",
"        super(RNNModel, self).__init__()\n",
"        self.mode = mode\n",
"        self.input_size = input_size\n",
"        self.hidden_size = hidden_size\n",
"        self.num_layers = num_layers\n",
"        self.bias = bias\n",
"        self.output_size = output_size\n",
"\n",
"        self.rnn_cell_list = nn.ModuleList()\n",
"\n",
"        if mode == 'LSTM':\n",
"            ########################################################################\n",
"            ## START OF YOUR CODE - Question 2a) Complete the missing code\n",
"            #\n",
"            # Append the appropriate LSTM cells to rnn_cell_list\n",
"            ########################################################################\n",
"            # The first layer reads the input features; deeper layers read the hidden state below\n",
"            self.rnn_cell_list.append(LSTMCell(self.input_size, self.hidden_size, self.bias))\n",
"            for l in range(1, self.num_layers):\n",
"                self.rnn_cell_list.append(LSTMCell(self.hidden_size, self.hidden_size, self.bias))\n",
"\n",
"            ########################################################################\n",
"            ## END OF YOUR CODE\n",
"            ########################################################################\n",
"\n",
"        elif mode == 'GRU':\n",
"\n",
"            ########################################################################\n",
"            ## START OF YOUR CODE - Question 2b) Complete the missing code\n",
"            #\n",
"            # Append the appropriate GRU cells to rnn_cell_list\n",
"            ########################################################################\n",
"            self.rnn_cell_list.append(GRUCell(self.input_size, self.hidden_size, self.bias))\n",
"            for l in range(1, self.num_layers):\n",
"                self.rnn_cell_list.append(GRUCell(self.hidden_size, self.hidden_size, self.bias))\n",
"\n",
"            ########################################################################\n",
"            ## END OF YOUR CODE\n",
"            ########################################################################\n",
"\n",
"        elif mode == 'RNN_TANH':\n",
"\n",
"            ########################################################################\n",
"            ## START OF YOUR CODE - Question 2c) Complete the missing code\n",
"            #\n",
"            # Append the appropriate RNN cells to rnn_cell_list\n",
"            ########################################################################\n",
"            self.rnn_cell_list.append(BasicRNNCell(self.input_size, self.hidden_size, self.bias, \"tanh\"))\n",
"            for l in range(1, self.num_layers):\n",
"                self.rnn_cell_list.append(BasicRNNCell(self.hidden_size, self.hidden_size, self.bias, \"tanh\"))\n",
"\n",
"            ########################################################################\n",
"            ## END OF YOUR CODE\n",
"            ########################################################################\n",
"\n",
"        elif mode == 'RNN_RELU':\n",
"\n",
"            ########################################################################\n",
"            ## START OF YOUR CODE - Question 2d) Complete the missing code\n",
"            #\n",
"            # Append the appropriate RNN cells to rnn_cell_list\n",
"            ########################################################################\n",
"            self.rnn_cell_list.append(BasicRNNCell(self.input_size, self.hidden_size, self.bias, \"relu\"))\n",
"            for l in range(1, self.num_layers):\n",
"                self.rnn_cell_list.append(BasicRNNCell(self.hidden_size, self.hidden_size, self.bias, \"relu\"))\n",
"\n",
"            ########################################################################\n",
"            ## END OF YOUR CODE\n",
"            ########################################################################\n",
"\n",
"        else:\n",
"            raise ValueError(\"Invalid RNN mode selected.\")\n",
"\n",
"        self.att_fc = nn.Linear(self.hidden_size, 1)\n",
"        self.fc = nn.Linear(self.hidden_size, self.output_size)\n",
"\n",
"    def forward(self, input, hx=None):\n",
"\n",
"        outs = []\n",
"        h0 = [None] * self.num_layers if hx is None else list(hx)\n",
"\n",
"        # In this forward pass we want to create our RNN from the rnn cells,\n",
"        # ..taking the hidden states from the final RNN layer and passing these\n",
"        # ..through our fully connected layer (fc).\n",
"\n",
"        # The multi-layered RNN should be able to run when the mode is either\n",
"        # .. LSTM, GRU, RNN_TANH or RNN_RELU.\n",
"\n",
"        ########################################################################\n",
"        ## START OF YOUR CODE - Question 2e) Complete the missing code\n",
"        #\n",
"        # HINT: You may need a special case for LSTMs\n",
"        ########################################################################\n",
"\n",
"        for seq in range(input.size(1)):\n",
"            x = input[:, seq, :]\n",
"            for l, cell in enumerate(self.rnn_cell_list):\n",
"                h0[l] = cell(x, h0[l])\n",
"                # LSTM cells return (hidden, cell); only the hidden state feeds the next layer\n",
"                if self.mode == 'LSTM':\n",
"                    x, _ = h0[l]\n",
"                else:\n",
"                    x = h0[l]\n",
"            outs.append(x)  # top-layer output at this time step\n",
"\n",
"        ########################################################################\n",
"        ## END OF YOUR CODE\n",
"        ########################################################################\n",
"\n",
"        out = outs[-1].squeeze()\n",
"\n",
"        out = self.fc(out)\n",
"\n",
"        return out\n",
"\n",
"\n",
"class BidirRecurrentModel(nn.Module):\n",
"    def __init__(self, mode, input_size, hidden_size, num_layers, bias, output_size):\n",
"        super(BidirRecurrentModel, self).__init__()\n",
"        self.mode = mode\n",
"        self.input_size = input_size\n",
"        self.hidden_size = hidden_size\n",
"        self.num_layers = num_layers\n",
"        self.bias = bias\n",
"        self.output_size = output_size\n",
"\n",
"        self.rnn_cell_list = nn.ModuleList()\n",
"        self.rnn_cell_list_rev = nn.ModuleList()\n",
"\n",
"        ########################################################################\n",
"        ## START OF YOUR CODE - Question 2f) Complete the missing code\n",
"        #\n",
"        # Create code for the following 'mode' values:\n",
"        # 'LSTM', 'GRU', 'RNN_TANH' and 'RNN_RELU'\n",
"        ########################################################################\n",
"        # Two independent stacks: one reads the sequence forwards, the other backwards\n",
"        if mode == 'LSTM':\n",
"            self.rnn_cell_list.append(LSTMCell(self.input_size, self.hidden_size, self.bias))\n",
"            for l in range(1, self.num_layers):\n",
"                self.rnn_cell_list.append(LSTMCell(self.hidden_size, self.hidden_size, self.bias))\n",
"            self.rnn_cell_list_rev.append(LSTMCell(self.input_size, self.hidden_size, self.bias))\n",
"            for l in range(1, self.num_layers):\n",
"                self.rnn_cell_list_rev.append(LSTMCell(self.hidden_size, self.hidden_size, self.bias))\n",
"\n",
"        elif mode == 'GRU':\n",
"            self.rnn_cell_list.append(GRUCell(self.input_size, self.hidden_size, self.bias))\n",
"            for l in range(1, self.num_layers):\n",
"                self.rnn_cell_list.append(GRUCell(self.hidden_size, self.hidden_size, self.bias))\n",
"            self.rnn_cell_list_rev.append(GRUCell(self.input_size, self.hidden_size, self.bias))\n",
"            for l in range(1, self.num_layers):\n",
"                self.rnn_cell_list_rev.append(GRUCell(self.hidden_size, self.hidden_size, self.bias))\n",
"\n",
"        elif mode == 'RNN_TANH':\n",
"            self.rnn_cell_list.append(BasicRNNCell(self.input_size, self.hidden_size, self.bias, \"tanh\"))\n",
"            for l in range(1, self.num_layers):\n",
"                self.rnn_cell_list.append(BasicRNNCell(self.hidden_size, self.hidden_size, self.bias, \"tanh\"))\n",
"            self.rnn_cell_list_rev.append(BasicRNNCell(self.input_size, self.hidden_size, self.bias, \"tanh\"))\n",
"            for l in range(1, self.num_layers):\n",
"                self.rnn_cell_list_rev.append(BasicRNNCell(self.hidden_size, self.hidden_size, self.bias, \"tanh\"))\n",
"\n",
"        elif mode == 'RNN_RELU':\n",
"            self.rnn_cell_list.append(BasicRNNCell(self.input_size, self.hidden_size, self.bias, \"relu\"))\n",
"            for l in range(1, self.num_layers):\n",
"                self.rnn_cell_list.append(BasicRNNCell(self.hidden_size, self.hidden_size, self.bias, \"relu\"))\n",
"            self.rnn_cell_list_rev.append(BasicRNNCell(self.input_size, self.hidden_size, self.bias, \"relu\"))\n",
"            for l in range(1, self.num_layers):\n",
"                self.rnn_cell_list_rev.append(BasicRNNCell(self.hidden_size, self.hidden_size, self.bias, \"relu\"))\n",
"        else:\n",
"            raise ValueError(\"Invalid RNN mode selected.\")\n",
"\n",
"        # x2 hidden-to-output connections (forward and backward states are concatenated)\n",
"        self.fc = nn.Linear(self.hidden_size*2, self.output_size)\n",
"\n",
"        ########################################################################\n",
"        ## END OF YOUR CODE\n",
"        ########################################################################\n",
"\n",
"    def forward(self, input, hx=None):\n",
"\n",
"        # In this forward pass we want to create our Bidirectional RNN from the rnn cells,\n",
"        # .. taking the hidden states from the final RNN layer with their reversed counterparts\n",
"        # .. before concatenating these and running them through the fully connected layer (fc)\n",
"\n",
"        # The multi-layered RNN should be able to run when the mode is either\n",
"        # .. LSTM, GRU, RNN_TANH or RNN_RELU.\n",
"\n",
"        ########################################################################\n",
"        ## START OF YOUR CODE - Question 2g) Complete the missing code\n",
"        ########################################################################\n",
"        outs = []\n",
"        outs_rev = []\n",
"        h0 = [None] * self.num_layers if hx is None else list(hx)\n",
"        h0_rev = [None] * self.num_layers if hx is None else list(hx)\n",
"\n",
"        for seq in range(input.size(1)):\n",
"            x = input[:, seq, :]\n",
"            x_rev = input[:, -seq-1, :]  # the same step counted from the end of the sequence\n",
"            for l, cell in enumerate(self.rnn_cell_list):\n",
"                h0[l] = cell(x, h0[l])\n",
"                if self.mode == 'LSTM':\n",
"                    x, _ = h0[l]\n",
"                else:\n",
"                    x = h0[l]\n",
"            for l, cell_rev in enumerate(self.rnn_cell_list_rev):\n",
"                h0_rev[l] = cell_rev(x_rev, h0_rev[l])\n",
"                if self.mode == 'LSTM':\n",
"                    x_rev, _ = h0_rev[l]\n",
"                else:\n",
"                    x_rev = h0_rev[l]\n",
"            outs.append(x)\n",
"            outs_rev.append(x_rev)\n",
"        ########################################################################\n",
"        ## END OF YOUR CODE\n",
"        ########################################################################\n",
"\n",
"        out = outs[-1].squeeze()\n",
"        out_rev = outs_rev[0].squeeze()\n",
"        out = torch.cat((out, out_rev), 1)\n",
"\n",
"        out = self.fc(out)\n",
"        return out"
],
"execution_count": 8,
"outputs": []
},
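{
"cell_type": "markdown",
"metadata": {},
"source": [
"`BidirRecurrentModel.forward` concatenates `outs[-1]` with `outs_rev[0]`. These are the forward and backward outputs aligned at the last time step, the same alignment as taking the last position of a bidirectional `nn.LSTM` output. The toy below uses a hypothetical stand-in for a recurrent cell whose state simply records the frames it has seen, so you can check what each list entry has been exposed to. Using `outs_rev[-1]` instead would give the classifier a backward summary of the whole clip; which choice works better here is an empirical question."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"def run_direction(sequence):\n",
"    # Hypothetical stand-in for a recurrent cell: the 'state' is the tuple of frames seen so far\n",
"    outputs, state = [], ()\n",
"    for frame in sequence:\n",
"        state = state + (frame,)\n",
"        outputs.append(state)\n",
"    return outputs\n",
"\n",
"toy_frames = list(range(5))                     # time steps 0..4\n",
"toy_outs = run_direction(toy_frames)            # forward direction: input[:, seq, :]\n",
"toy_outs_rev = run_direction(toy_frames[::-1])  # backward direction: input[:, -seq-1, :]\n",
"\n",
"print(toy_outs[-1])      # (0, 1, 2, 3, 4): forward output after the whole sequence\n",
"print(toy_outs_rev[0])   # (4,): backward output after only the last frame\n",
"print(toy_outs_rev[-1])  # (4, 3, 2, 1, 0): backward output after the whole sequence"
],
"execution_count": null,
"outputs": []
},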
{
"cell_type": "markdown",
"metadata": {
"id": "NSGwF4XR6lfC"
},
"source": [
"The code below trains a network based on your code above. This should work without error:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "q5tW2k016lfC",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "c37dfc6b-a560-476f-b5e0-6ad450a01b80"
},
"source": [
"seq_dim, input_dim = train_dataset[0][0].shape\n",
"output_dim = 3\n",
"\n",
"hidden_dim = 128\n",
"layer_dim = 4\n",
"bias = True\n",
"\n",
"### Change the code below to try running different models:\n",
"# model = RNNModel(\"LSTM\", input_dim, hidden_dim, layer_dim, bias, output_dim)\n",
"model = BidirRecurrentModel(\"LSTM\", input_dim, hidden_dim, layer_dim, bias, output_dim)\n",
"\n",
"if torch.cuda.is_available():\n",
"    model.cuda()\n",
"\n",
"criterion = nn.CrossEntropyLoss()\n",
"\n",
"learning_rate = 0.001\n",
"optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)\n",
"\n",
"loss_list = []\n",
"iter = 0\n",
"max_v_accuracy = 0\n",
"reported_t_accuracy = 0\n",
"max_t_accuracy = 0\n",
"for epoch in range(num_epochs):\n",
"    for i, (audio, labels) in enumerate(train_loader):\n",
"        if torch.cuda.is_available():\n",
"            audio = Variable(audio.view(-1, seq_dim, input_dim).cuda())\n",
"            labels = Variable(labels.cuda())\n",
"        else:\n",
"            audio = Variable(audio.view(-1, seq_dim, input_dim))\n",
"            labels = Variable(labels)\n",
"\n",
"        optimizer.zero_grad()\n",
"\n",
"        outputs = model(audio)\n",
"\n",
"        loss = criterion(outputs, labels)\n",
"\n",
"        if torch.cuda.is_available():\n",
"            loss.cuda()\n",
"\n",
"        loss.backward()\n",
"\n",
"        optimizer.step()\n",
"\n",
"        loss_list.append(loss.item())\n",
"        iter += 1\n",
"\n",
"        if iter % valid_every_n_steps == 0:\n",
"            correct = 0\n",
"            total = 0\n",
"            for audio, labels in valid_loader:\n",
"                if torch.cuda.is_available():\n",
"                    audio = Variable(audio.view(-1, seq_dim, input_dim).cuda())\n",
"                else:\n",
"                    audio = Variable(audio.view(-1, seq_dim, input_dim))\n",
"\n",
"                outputs = model(audio)\n",
"\n",
"                _, predicted = torch.max(outputs.data, 1)\n",
"\n",
"                total += labels.size(0)\n",
"\n",
"                if torch.cuda.is_available():\n",
"                    correct += (predicted.cpu() == labels.cpu()).sum()\n",
"                else:\n",
"                    correct += (predicted == labels).sum()\n",
"\n",
"            v_accuracy = 100 * correct // total\n",
"\n",
"            is_best = False\n",
"            if v_accuracy >= max_v_accuracy:\n",
"                max_v_accuracy = v_accuracy\n",
"                is_best = True\n",
"\n",
"            if is_best:\n",
"                # Reset the counters so the test accuracy is not mixed with the validation counts\n",
"                correct = 0\n",
"                total = 0\n",
"                for audio, labels in test_loader:\n",
"                    if torch.cuda.is_available():\n",
"                        audio = Variable(audio.view(-1, seq_dim, input_dim).cuda())\n",
"                    else:\n",
"                        audio = Variable(audio.view(-1, seq_dim, input_dim))\n",
"\n",
"                    outputs = model(audio)\n",
"\n",
"                    _, predicted = torch.max(outputs.data, 1)\n",
"\n",
"                    total += labels.size(0)\n",
"\n",
"                    if torch.cuda.is_available():\n",
"                        correct += (predicted.cpu() == labels.cpu()).sum()\n",
"                    else:\n",
"                        correct += (predicted == labels).sum()\n",
"\n",
"                t_accuracy = 100 * correct // total\n",
"                reported_t_accuracy = t_accuracy\n",
"\n",
"            print('Iteration: {}. Loss: {}. V-Accuracy: {} T-Accuracy: {}'.format(iter, loss.item(), v_accuracy, reported_t_accuracy))\n",
"\n"
],
"execution_count": 9,
"outputs": [
{
"output_type": "stream",
"text": [
"Iteration: 20. Loss: 0.9298540353775024. V-Accuracy: 58 T-Accuracy: 57\n",
"Iteration: 40. Loss: 0.728527307510376. V-Accuracy: 68 T-Accuracy: 69\n",
"Iteration: 60. Loss: 0.5860320925712585. V-Accuracy: 73 T-Accuracy: 73\n",
"Iteration: 80. Loss: 0.4155048727989197. V-Accuracy: 81 T-Accuracy: 82\n",
"Iteration: 100. Loss: 0.4415952265262604. V-Accuracy: 81 T-Accuracy: 80\n",
"Iteration: 120. Loss: 0.22947010397911072. V-Accuracy: 84 T-Accuracy: 85\n",
"Iteration: 140. Loss: 0.323339581489563. V-Accuracy: 88 T-Accuracy: 88\n",
"Iteration: 160. Loss: 0.38627883791923523. V-Accuracy: 90 T-Accuracy: 91\n",
"Iteration: 180. Loss: 0.1758715659379959. V-Accuracy: 90 T-Accuracy: 91\n",
"Iteration: 200. Loss: 0.31855612993240356. V-Accuracy: 92 T-Accuracy: 92\n",
"Iteration: 220. Loss: 0.24535949528217316. V-Accuracy: 92 T-Accuracy: 93\n",
"Iteration: 240. Loss: 0.23214882612228394. V-Accuracy: 94 T-Accuracy: 93\n",
"Iteration: 260. Loss: 0.3989745080471039. V-Accuracy: 93 T-Accuracy: 93\n",
"Iteration: 280. Loss: 0.11613195389509201. V-Accuracy: 94 T-Accuracy: 94\n",
"Iteration: 300. Loss: 0.14072883129119873. V-Accuracy: 93 T-Accuracy: 94\n",
"Iteration: 320. Loss: 0.12234506011009216. V-Accuracy: 95 T-Accuracy: 95\n",
"Iteration: 340. Loss: 0.1554853469133377. V-Accuracy: 94 T-Accuracy: 95\n",
"Iteration: 360. Loss: 0.12669718265533447. V-Accuracy: 95 T-Accuracy: 95\n",
"Iteration: 380. Loss: 0.1484023630619049. V-Accuracy: 94 T-Accuracy: 95\n",
"Iteration: 400. Loss: 0.15945787727832794. V-Accuracy: 95 T-Accuracy: 95\n",
"Iteration: 420. Loss: 0.036814115941524506. V-Accuracy: 95 T-Accuracy: 96\n",
"Iteration: 440. Loss: 0.07982578128576279. V-Accuracy: 95 T-Accuracy: 96\n",
"Iteration: 460. Loss: 0.05860297754406929. V-Accuracy: 96 T-Accuracy: 96\n"
],
"name": "stdout"
}
]
},
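{
"cell_type": "markdown",
"metadata": {},
"source": [
"Validation and test accuracy are only comparable if each is computed from its own `correct` and `total` counts. Below is a minimal sketch of a hypothetical helper that starts fresh counters on every call. It works on plain lists of predicted and true labels rather than on the DataLoaders, so the example batches are made up for illustration."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"def accuracy_percent(batches):\n",
"    # batches: iterable of (predicted_labels, true_labels) pairs, one per mini-batch\n",
"    correct, total = 0, 0  # fresh counters for every split\n",
"    for predicted, labels in batches:\n",
"        correct += sum(int(p == t) for p, t in zip(predicted, labels))\n",
"        total += len(labels)\n",
"    return 100 * correct // total  # integer percentage, like v_accuracy in the loop\n",
"\n",
"valid_batches = [([0, 1, 2], [0, 1, 1]), ([2, 2], [2, 2])]  # 4 of 5 correct\n",
"test_batches = [([0, 0], [0, 1])]                           # 1 of 2 correct\n",
"\n",
"print(accuracy_percent(valid_batches))                 # 80\n",
"print(accuracy_percent(test_batches))                  # 50\n",
"print(accuracy_percent(valid_batches + test_batches))  # 71: what shared counters would report"
],
"execution_count": null,
"outputs": []
},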
{
"cell_type": "markdown",
"metadata": {
"id": "0H6oJyOX-W7n"
},
"source": [
"## Part 2: Theoretical questions"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CK0guD_b-ZAR"
},
"source": [
"#### Theory question 1: \n",
"What is the _vanishing gradients problem_ and why does it occur? Which activation functions are more or less impacted by this, and why?"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Ha2y337Li6b4"
},
"source": [
"#### Your answers (see next cell):\n",
"* Your answer here describing vanishing gradients problem\n",
"* Two examples of activation functions more impacted by vanishing gradients\n",
"* Two examples of activation functions less impacted by vanishing gradients, why are they impacted less?"
]
},
{
"source": [
"#### Answers for question 1:\n",
"\n",
"1) Consider a deep feed-forward network such as:\n",
"\n",
"![](./image/1.jpg)\n",
"\n",
"In each layer, $y_{i}=\\sigma\\left(z_{i}\\right)=\\sigma\\left(w_{i} x_{i}+b_{i}\\right)$, where $\\sigma$ is the activation function and the input of layer $i+1$ is $x_{i+1}=y_{i}$.\n",
"\n",
"By backpropagation (the chain rule), the gradient of the cost with respect to the first bias is:\n",
"\n",
"$$\\frac{\\partial C}{\\partial b_{1}}=\\frac{\\partial C}{\\partial y_{n}} \\cdot \\prod_{i=2}^{n}\\left(\\sigma^{\\prime}\\left(z_{i}\\right) w_{i}\\right)\\sigma^{\\prime}\\left(z_{1}\\right)$$\n",
"\n",
"If each factor satisfies $\\left|\\sigma^{\\prime}\\left(z_{i}\\right) w_{i}\\right| < 1$ (for example when $\\sigma^{\\prime} < 1$ and the weights are initialised with $|w| < 1$), the product shrinks exponentially as the network gets deeper. The gradient that reaches the early layers then tends to zero and those layers barely learn. This is the vanishing gradients problem.\n",
"\n",
"**Sigmoid and tanh** are more affected. The derivative of the sigmoid lies in $(0, 0.25]$ and that of tanh in $(0, 1]$, and both approach 0 when the unit saturates (large $|z|$). **ReLU, Leaky ReLU and ELU** are less affected because their derivative is exactly 1 for $z>0$. The gradient from the deep layers is therefore passed back to the shallow layers without being shrunk by the activation.\n",
"\n",
"2) For an RNN:\n",
"\n",
"![](./image/2.png)\n",
"\n",
"backpropagation through time gives:\n",
"\n",
"$$\\frac{\\partial \\operatorname{Loss}_{t}}{\\partial w_{h}}=\\sum_{k=0}^{t} \\frac{\\partial \\operatorname{Loss}_{t}}{\\partial L_{t}} \\frac{\\partial L_{t}}{\\partial h_{t}}\\left(\\prod_{j=k+1}^{t} \\frac{\\partial h_{j}}{\\partial h_{j-1}}\\right) \\frac{\\partial h_{k}}{\\partial w_{h}}$$\n",
"\n",
"The product $\\prod_{j=k+1}^{t} \\frac{\\partial h_{j}}{\\partial h_{j-1}}$ causes the vanishing gradients problem for the same reason as in 1). Each factor contains the activation derivative multiplied by the shared recurrent weight $w_{h}$. When these factors are smaller than 1, the contribution of step $k$ decays exponentially in $t-k$, so long-range dependencies are hard to learn.\n"
],
"cell_type": "markdown",
"metadata": {}
},
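{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following NumPy sketch evaluates the product $\\prod_{i}\\sigma^{\\prime}\\left(z_{i}\\right) w_{i}$ from 1) numerically for sigmoid, tanh and ReLU at depths 5, 10 and 20. For illustration only, it assumes every layer shares the same pre-activation $z = 0.5$ and weight $w = 1$, so any decay comes from the activation derivative alone; the helper names are hypothetical. The same kind of product appears over time steps in the RNN formula of 2)."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"\n",
"def sigmoid(z):\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"# Derivatives of the activations discussed in 1)\n",
"def d_sigmoid(z):\n",
"    s = sigmoid(z)\n",
"    return s * (1.0 - s)  # at most 0.25, reached at z = 0\n",
"\n",
"def d_tanh(z):\n",
"    return 1.0 - np.tanh(z) ** 2  # at most 1, reached at z = 0\n",
"\n",
"def d_relu(z):\n",
"    return (np.asarray(z) > 0).astype(float)  # exactly 1 for z > 0\n",
"\n",
"def gradient_factor(d_act, depth, w=1.0, z=0.5):\n",
"    # prod_i sigma'(z_i) * w_i over `depth` layers, all sharing the same z and w\n",
"    zs = np.full(depth, z)\n",
"    return np.prod(d_act(zs) * w)\n",
"\n",
"for name, d_act in [('sigmoid', d_sigmoid), ('tanh', d_tanh), ('relu', d_relu)]:\n",
"    factors = [gradient_factor(d_act, n) for n in (5, 10, 20)]\n",
"    print(name, [f'{v:.2e}' for v in factors])"
],
"execution_count": null,
"outputs": []
},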
{
"cell_type": "markdown",
"metadata": {
"id": "xNXeJWpn6lfE"
},
"source": [
"#### Theory question 2: \n",
"Why do LSTMs help address the vanishing gradient problem compared to a vanilla RNN?"
]
},
{
"source": [
"#### Answers for question 2:\n",
"\n",
"$$\n",
"\\begin{array}{l}\n",
"i_{t}=\\sigma\\left(W_{i i} x_{t}+b_{i i}+W_{h i} h_{t-1}+b_{h i}\\right) \\\\\n",
"f_{t}=\\sigma\\left(W_{i f} x_{t}+b_{i f}+W_{h f} h_{t-1}+b_{h f}\\right) \\\\\n",
"o_{t}=\\sigma\\left(W_{i o} x_{t}+b_{i o}+W_{h o} h_{t-1}+b_{h o}\\right) \\\\\n",
"g_{t}=\\tanh \\left(W_{i g} x_{t}+b_{i g}+W_{h g} h_{t-1}+b_{h g}\\right) \\\\\n",
"c_{t}=f_{t} \\odot c_{t-1}+i_{t} \\odot g_{t} \\\\\n",
"h_{t}=o_{t} \\odot \\tanh \\left(c_{t}\\right)\n",
"\\end{array}\n",
"$$\n",
"\n",
"The input, forget and output gates use the sigmoid, so their values lie in $(0, 1)$ and act as soft switches that the network learns to open or close. The key difference from a vanilla RNN is the additive cell-state update $c_{t}=f_{t} \\odot c_{t-1}+i_{t} \\odot g_{t}$. Along this direct path, $\\frac{\\partial c_{t}}{\\partial c_{t-1}}=\\operatorname{diag}\\left(f_{t}\\right)$ (plus indirect terms through the gates' dependence on $h_{t-1}$), with no weight matrix and no tanh derivative in the factor, unlike $\\frac{\\partial h_{j}}{\\partial h_{j-1}}$ in a vanilla RNN.\n",
"\n",
"When the forget gate is close to 1, the gradient is carried back through many time steps almost unchanged, which greatly reduces gradient vanishing. When the forget gate is close to 0, the previous cell state is deliberately discarded: the information from the previous step has no effect on the current one, so there is no need to pass the gradient back through it."
],
"cell_type": "markdown",
"metadata": {}
},
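{
"cell_type": "markdown",
"metadata": {},
"source": [
"To check the key step of this argument, the sketch below implements one LSTM step in NumPy directly from the equations above, merging the two bias vectors of each gate into one. It uses small random weights and estimates $\\frac{\\partial c_{t}}{\\partial c_{t-1}}$ by finite differences while holding $x_{t}$ and $h_{t-1}$ fixed, i.e. along the direct cell-state path only. The sizes, seed and variable names are illustrative assumptions. The estimate should match $\\operatorname{diag}\\left(f_{t}\\right)$."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"n_in, n_h = 3, 4  # illustrative input and hidden sizes\n",
"\n",
"# Random weights for the four gate blocks i, f, g, o; each b[k] stands for b_ik + b_hk\n",
"W_x = {k: rng.normal(size=(n_h, n_in)) for k in 'ifgo'}\n",
"W_h = {k: rng.normal(size=(n_h, n_h)) for k in 'ifgo'}\n",
"b = {k: rng.normal(size=n_h) for k in 'ifgo'}\n",
"\n",
"def sigmoid(z):\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"def lstm_step(x, h_prev, c_prev):\n",
"    # One LSTM step following the equations for i_t, f_t, o_t, g_t, c_t, h_t\n",
"    pre = {k: W_x[k] @ x + W_h[k] @ h_prev + b[k] for k in 'ifgo'}\n",
"    i, f, o = sigmoid(pre['i']), sigmoid(pre['f']), sigmoid(pre['o'])\n",
"    g = np.tanh(pre['g'])\n",
"    c = f * c_prev + i * g\n",
"    h = o * np.tanh(c)\n",
"    return h, c, f\n",
"\n",
"x_t, h_prev, c_prev = rng.normal(size=n_in), rng.normal(size=n_h), rng.normal(size=n_h)\n",
"_, c_t, f_t = lstm_step(x_t, h_prev, c_prev)\n",
"\n",
"# Finite-difference Jacobian dc_t/dc_{t-1}, with x_t and h_{t-1} held fixed\n",
"eps = 1e-6\n",
"J = np.zeros((n_h, n_h))\n",
"for j in range(n_h):\n",
"    dc = np.zeros(n_h)\n",
"    dc[j] = eps\n",
"    _, c_plus, _ = lstm_step(x_t, h_prev, c_prev + dc)\n",
"    J[:, j] = (c_plus - c_t) / eps\n",
"\n",
"print(np.round(J, 4))\n",
"print('matches diag(f_t):', np.allclose(J, np.diag(f_t), atol=1e-5))"
],
"execution_count": null,
"outputs": []
},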
{
"cell_type": "markdown",
"metadata": {
"id": "BpefJuwe_c8o"
},
"source": [
"#### Theory question 3: \n",
"\n",
"The plot below shows the training curves for three models A, B, and C, trained on the same dataset for up to 100 epochs. The three models are an RNN, an LSTM and a GRU, not necessarily in that order.\n",
"\n",
"* Which could plausibly be which? Why? Please explain your reasoning.\n",
"\n",
"(In the cell below please set the values for A_model, B_model and C_model to be 'RNN', 'LSTM' or 'GRU'. This needs to be exact for the automatic marking.)"
]
},
{
"cell_type": "code",
"metadata": {
"id": "FBJJMuOUV-jJ",
"scrolled": false,
"colab": {
"base_uri": "https://localhost:8080/",
"height": 301
},
"outputId": "b4a371f1-a170-4c7f-892f-455540d50793"
},
"source": [
"from IPython.display import Image, display\n",
"display(Image(filename='Performance by epoch.png', width=550))"
],
"execution_count": 10,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"metadata": {
"tags": [],
"image/png": {
"width": 550
}
}
}
]
},
{
"source": [
"# Answers below:\n",
"\n",
"A_model = 'GRU'\n",
"B_model = 'LSTM'\n",
"C_model = 'RNN'\n",
"\n",
"# Give your reasons below:\n",
"# See next cell."
],
"cell_type": "code",
"metadata": {
"id": "xoULWB4S6lfF"
},
"execution_count": null,
"outputs": []
},
{
"source": [
"Model C has the worst performance of the three models, so it is plausibly the vanilla RNN, which is the most susceptible to the vanishing gradient problem.\n",
"\n",
"Model B also takes longer than model A to reach similar performance, so model B is plausibly the LSTM: for the same hidden size, an LSTM has four gate blocks against the GRU's three, so it has more parameters to train than a GRU."
],
"cell_type": "markdown",
"metadata": {}
},
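{
"cell_type": "markdown",
"metadata": {},
"source": [
"The parameter argument can be made concrete with a short count for a single recurrent layer. The sketch assumes each gate block has an input-to-hidden matrix, a hidden-to-hidden matrix and two bias vectors, as in the LSTM equations of question 2; this is the layout used by PyTorch's `nn.RNN`, `nn.GRU` and `nn.LSTM`. The input size 40 and hidden size 128 are hypothetical values, not taken from the plot."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"def recurrent_layer_params(input_size, hidden_size, n_gate_blocks):\n",
"    # Per gate block: W_i* (hidden x input), W_h* (hidden x hidden), b_i* and b_h*\n",
"    per_block = hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size\n",
"    return n_gate_blocks * per_block\n",
"\n",
"# Vanilla RNN: 1 block; GRU: 3 blocks (r, z, n); LSTM: 4 blocks (i, f, g, o)\n",
"counts = {name: recurrent_layer_params(40, 128, blocks)\n",
"          for name, blocks in [('RNN', 1), ('GRU', 3), ('LSTM', 4)]}\n",
"print(counts)  # {'RNN': 21760, 'GRU': 65280, 'LSTM': 87040}"
],
"execution_count": null,
"outputs": []
},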
{
"cell_type": "markdown",
"metadata": {
"id": "94yhrfZk6lfF"
},
"source": [
"#### Theory question 4: \n",
"\n",
"When might you choose to use each of the three different types of models?"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "NuLS-d3LkOUR"
},
"source": [
"#### Your answers:\n",
"* Type of problem when best to use vanilla RNN: We use a vanilla RNN when the problem is simple and the input sequences are short, so long-range dependencies and vanishing gradients are not a concern.\n",
"* Type of problem to use GRU: If the task requires a fairly deep network, such as speech recognition, a GRU makes sense for its efficiency, since it has fewer parameters than an LSTM.\n",
"* Type of problem to use LSTM: If we have enough data and computational resources, we can use an LSTM, since its extra parameters may give better results than a GRU.\n"
]
}
]
}