Deep Learning Lecture 37: Introduction to Recurrent Neural Networks (RNN)

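Author: louwill. Personal WeChat account: 機器學習實驗室 (Machine Learning Lab), devoted to the application and research of data science, machine learning and deep learning. Many years of programming experience in R and Python.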

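Companion video course:

Python機器學習全流程專案實戰精 (hands-on, end-to-end machine learning projects in Python)

https://edu.hellobi.com/course/284

Covers requirements analysis -> data collection -> data cleaning and preprocessing -> data analysis and visualization -> feature engineering -> machine learning modeling -> model tuning -> report output, using Python to implement the full machine learning pipeline.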

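In earlier lectures we studied deep neural networks (DNN) and convolutional neural networks (CNN), spending a good deal of time on the principles of CNNs and their applications in computer vision. Starting with this lecture, we turn to a new network architecture: recurrent neural networks (RNN).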

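CNNs are widely used for image recognition and detection. RNNs, as sequence models, are used in other exciting fields such as speech recognition, machine translation and natural language processing. Just as the CNN lectures focused on computer vision, the RNN lectures will focus on applications and research in natural language processing.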

When to use an RNN

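What makes the RNN architecture different from DNNs and CNNs, and which structural choices set it apart? Before looking at the structure in detail, we need to be clear about the kind of problem RNNs are meant to solve. In speech recognition, we are given an input audio clip x and must output a text transcript y. The input x is audio that unfolds over time, and y is a sentence made of words in a specific order, so both the input and the output are sequences. For supervised learning on such (x, y) pairs, the recurrent neural network is a natural fit. To see why, let us first look at what goes wrong when a standard neural network is applied to a sequence problem.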

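Suppose we want to label each word of an input sentence as a person's name or not: the input is a sequence of words and the output is a sequence of labels saying, for each word, whether it is a name. If the sentence has 9 words, we can convert them into 9 one-hot vectors, feed them into a standard neural network and, after some hidden layers and activation functions, obtain 9 outputs, each 0 or 1. This approach has two problems: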

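First, input and output lengths may differ, and the input length varies from one example to the next. In speech recognition, the input audio sequence and the output text sequence rarely have the same length, and a standard network, whose input and output sizes are fixed, handles this poorly.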

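Second, a standard network does not share features learned at different positions in the text. Ideally, if the network has learned from position 1 that louwill is a person's name, it should also recognize louwill as a name when it appears at any other position. This kind of sharing reduces the number of parameters to train and makes the network more efficient, and a standard network cannot achieve it.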
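To put rough numbers on both problems, here is a back-of-the-envelope sketch. The vocabulary size (10,000), hidden size (100), word indices and the helper names `one_hot_sentence`, `dense_first_layer_params` and `rnn_params` are hypothetical, chosen only for illustration. The dense count covers just the first layer of a standard network that reads the whole sentence as one concatenated input; the RNN count uses the cell parameters (Wax, Waa, ba, Wya, by) from the NumPy implementation in this lecture.

```python
import numpy as np

def one_hot_sentence(word_ids, vocab_size):
    # Hypothetical helper: one column per word, a single 1 at the word's vocabulary index
    X = np.zeros((vocab_size, len(word_ids)))
    X[word_ids, np.arange(len(word_ids))] = 1.0
    return X

def dense_first_layer_params(T, vocab_size, n_hidden):
    # A standard network flattens T one-hot vectors into one input of length T * vocab_size,
    # so its first-layer weights (plus biases) grow with the sentence length T
    return T * vocab_size * n_hidden + n_hidden

def rnn_params(vocab_size, n_a, n_y):
    # Wax, Waa, ba (hidden state) and Wya, by (output) are shared by every time step,
    # so the sentence length T never appears in the count
    return n_a * vocab_size + n_a * n_a + n_a + n_y * n_a + n_y

V, H = 10_000, 100  # assumed vocabulary size and hidden size
X = one_hot_sentence([4, 27, 9, 300, 5, 81, 2, 6000, 13], V)
print(X.shape)                             # (10000, 9)
print(dense_first_layer_params(9, V, H))   # 9000100
print(dense_first_layer_params(12, V, H))  # 12000100: a 12-word sentence needs a different layer
print(rnn_params(V, H, 1))                 # 1010201, whatever the sentence length
```

The dense layer is tied to one sentence length, while the RNN reuses a single set of weights at every position, which is exactly the sharing described above.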

Intuitively, the difference between a standard neural network and a recurrent neural network is shown in the figure below:

So what does an RNN actually look like?

RNN structure

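Suppose we feed a sentence into an RNN. The first input word is x1: the network passes x1 through the hidden layer and outputs y1, its prediction of whether x1 is a name. At the same time, the network initializes a hidden-layer activation (commonly a vector of zeros), combines it with x1 in the hidden layer, and passes the resulting activation on to the next time step. When the second word x2 arrives, the network does not predict y2 from x2 alone: the hidden activation at this step is also computed from the activation of the previous time step, so the second time step uses information from the first. This is what "recurrent" means. The process continues until, at the last time step, the network outputs yn and the activation an. In short, at every time step an RNN passes an activation on to the next time step for use in its computation.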

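The figure above shows the basic structure of a recurrent neural network: on the left is the compact (rolled) form, and on the right is its unrolled form. When such a network predicts yt, it uses not only xt but also information from xt-1 (and, through the chain of hidden states, from all earlier inputs), because the hidden activation passed along the horizontal path carries that information forward to help predict yt.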
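A toy sketch of the recurrence, assuming a hypothetical one-dimensional hidden state with hand-picked weights (`w_aa`, `w_ax`) and no bias: two input sequences differ only at the first step, yet the hidden states at step 3 still differ, because the effect of x1 travels forward through the hidden state.

```python
import numpy as np

def run_scalar_rnn(xs, w_aa=0.5, w_ax=1.0, a0=0.0):
    # Hypothetical scalar RNN (no bias): a_t = tanh(w_aa * a_{t-1} + w_ax * x_t)
    a = a0
    states = []
    for x in xs:
        a = np.tanh(w_aa * a + w_ax * x)  # the previous state feeds into the current one
        states.append(float(a))
    return states

# The two sequences differ only in the first input
s1 = run_scalar_rnn([1.0, 0.0, 0.0])
s2 = run_scalar_rnn([-1.0, 0.0, 0.0])
# x2 and x3 are identical, but the step-3 states are not
print(round(s1[2], 3), round(s2[2], 3))  # 0.18 -0.18
```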

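An RNN cell therefore performs two computations: one computes the new hidden state from the previous hidden state and the current input, and the other computes the output from that new hidden state. A single RNN cell and its two computations are shown in the figure below: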

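The hidden-state activation is usually tanh, while the output activation is usually sigmoid (for a binary output such as name / not a name) or softmax (for a multi-class output).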
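Written out for time step t, with the same weight names as the NumPy code in this lecture (Wax, Waa, Wya, ba, by), the two computations of one cell are as follows; for a binary output, softmax would be replaced by sigmoid. The same weights are used at every time step, which is the parameter sharing discussed earlier.

```latex
\begin{aligned}
a^{\langle t \rangle} &= \tanh\left(W_{aa}\, a^{\langle t-1 \rangle} + W_{ax}\, x^{\langle t \rangle} + b_a\right) \\
\hat{y}^{\langle t \rangle} &= \mathrm{softmax}\left(W_{ya}\, a^{\langle t \rangle} + b_y\right)
\end{aligned}
```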

Chaining multiple RNN cells together gives the full RNN structure:

NumPy implementation of the RNN structure

Define the sigmoid and softmax functions:

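import numpy as np

def softmax(x):
    # Subtract each column's max before exponentiating, for numerical stability
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    # Normalize each column (one column per example) so it sums to 1
    return e_x / e_x.sum(axis=0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))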
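A quick sanity check, assuming a small hand-made logit matrix (3 classes as rows, 2 examples as columns). Subtracting each column's max (`axis=0`) keeps columns of very different scale from underflowing; subtracting a single global max `np.max(x)` would turn the first column here into 0/0.

```python
import numpy as np

def softmax(x):
    # Column-wise softmax, stabilized by subtracting each column's max
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e_x / e_x.sum(axis=0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical logits; the second column is the first column shifted by 999
z = np.array([[1.0, 1000.0],
              [2.0, 1001.0],
              [3.0, 1002.0]])
p = softmax(z)
print(p.sum(axis=0))                  # [1. 1.]
print(np.allclose(p[:, 0], p[:, 1]))  # True: softmax ignores a constant shift per column
print(sigmoid(0.0))                   # 0.5
```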

Define the RNN cell:

def rnn_cell_forward(xt, a_prev, parameters):
    """
    Implements a single forward step of the RNN cell.

    Arguments:
    xt -- your input data at timestep "t", numpy array of shape (n_x, m).
    a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
    parameters -- python dictionary containing:
        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
        ba -- Bias, numpy array of shape (n_a, 1)
        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains (a_next, a_prev, xt, parameters)
    """
    # Retrieve parameters from "parameters"
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]

    # Compute the next hidden state: a<t> = tanh(Wax x<t> + Waa a<t-1> + ba)
    a_next = np.tanh(np.matmul(Wax, xt) + np.matmul(Waa, a_prev) + ba)

    # Compute the output of the current cell: y<t> = softmax(Wya a<t> + by)
    yt_pred = softmax(np.matmul(Wya, a_next) + by)

    # Store the values needed for backpropagation in cache
    cache = (a_next, a_prev, xt, parameters)

    return a_next, yt_pred, cache

Computation example:
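The original example output did not survive extraction, so the following is a stand-in sketch rather than the author's result: it calls the cell once with random inputs and parameters (the sizes, the seed and the use of `numpy.random.default_rng` are our own assumptions) and checks the output shapes. The cell and softmax are re-declared compactly so the block runs on its own.

```python
import numpy as np

def softmax(x):
    # Column-wise softmax; each column is one example
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e_x / e_x.sum(axis=0)

def rnn_cell_forward(xt, a_prev, parameters):
    # Same two computations as the cell above: new hidden state, then output
    a_next = np.tanh(parameters["Wax"] @ xt + parameters["Waa"] @ a_prev + parameters["ba"])
    yt_pred = softmax(parameters["Wya"] @ a_next + parameters["by"])
    return a_next, yt_pred, (a_next, a_prev, xt, parameters)

# Assumed sizes: 3 input features, 5 hidden units, 2 output classes, 10 examples
rng = np.random.default_rng(1)
n_x, n_a, n_y, m = 3, 5, 2, 10
xt = rng.standard_normal((n_x, m))
a_prev = rng.standard_normal((n_a, m))
parameters = {
    "Wax": rng.standard_normal((n_a, n_x)),
    "Waa": rng.standard_normal((n_a, n_a)),
    "Wya": rng.standard_normal((n_y, n_a)),
    "ba": rng.standard_normal((n_a, 1)),
    "by": rng.standard_normal((n_y, 1)),
}
a_next, yt_pred, cache = rnn_cell_forward(xt, a_prev, parameters)
print(a_next.shape)   # (5, 10)
print(yt_pred.shape)  # (2, 10)
```

Each column of `yt_pred` is a probability distribution over the 2 classes for one example, and every entry of `a_next` lies within the tanh range [-1, 1].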

Build the RNN network on top of the RNN cell:

def rnn_forward(x, a0, parameters):
    """
    Implements the forward propagation of the recurrent neural network.

    Arguments:
    x -- Input data for every time-step, of shape (n_x, m, T_x).
    a0 -- Initial hidden state, of shape (n_a, m)
    parameters -- python dictionary containing:
        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
        ba -- Bias numpy array of shape (n_a, 1)
        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
    y_pred -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
    caches -- tuple of values needed for the backward pass, contains (list of caches, x)
    """
    # Initialize "caches" which will contain the list of all caches
    caches = []

    # Retrieve dimensions from shapes of x and parameters["Wya"]
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape

    # Initialize "a" and "y_pred" with zeros
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))

    # Initialize a_next with the initial hidden state
    a_next = a0

    # Loop over all time-steps
    for t in range(T_x):
        # Update next hidden state, compute the prediction, get the cache
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        # Save the value of the new "next" hidden state in a
        a[:, :, t] = a_next
        # Save the value of the prediction in y_pred
        y_pred[:, :, t] = yt_pred
        # Append "cache" to "caches"
        caches.append(cache)

    # Store values needed for backward propagation in cache
    caches = (caches, x)

    return a, y_pred, caches

Computation example:
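As with the cell, the original output is missing, so here is a stand-in sketch with random data (sizes, seed and the zero initial state are our own assumptions). It re-declares the cell and the loop compactly so it runs on its own, checks the shapes, and confirms that every time step reuses the very same parameter dictionary, which is the weight sharing described earlier.

```python
import numpy as np

def softmax(x):
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e_x / e_x.sum(axis=0)

def rnn_cell_forward(xt, a_prev, p):
    a_next = np.tanh(p["Wax"] @ xt + p["Waa"] @ a_prev + p["ba"])
    yt_pred = softmax(p["Wya"] @ a_next + p["by"])
    return a_next, yt_pred, (a_next, a_prev, xt, p)

def rnn_forward(x, a0, p):
    n_x, m, T_x = x.shape
    n_y, n_a = p["Wya"].shape
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))
    a_next, caches = a0, []
    for t in range(T_x):
        # The hidden state produced at step t-1 is fed into step t
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, p)
        a[:, :, t] = a_next
        y_pred[:, :, t] = yt_pred
        caches.append(cache)
    return a, y_pred, (caches, x)

# Assumed sizes: 3 features, 5 hidden units, 2 classes, 10 examples, 4 time steps
rng = np.random.default_rng(1)
n_x, n_a, n_y, m, T_x = 3, 5, 2, 10, 4
x = rng.standard_normal((n_x, m, T_x))
a0 = np.zeros((n_a, m))  # assumed zero initial hidden state
parameters = {
    "Wax": rng.standard_normal((n_a, n_x)),
    "Waa": rng.standard_normal((n_a, n_a)),
    "Wya": rng.standard_normal((n_y, n_a)),
    "ba": rng.standard_normal((n_a, 1)),
    "by": rng.standard_normal((n_y, 1)),
}
a, y_pred, caches = rnn_forward(x, a0, parameters)
print(a.shape)          # (5, 10, 4)
print(y_pred.shape)     # (2, 10, 4)
print(len(caches[0]))   # 4, one cache per time step
print(all(c[3] is parameters for c in caches[0]))  # True: every step uses the same weights
```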

With that, a simple RNN structure is in place. Backpropagation through an RNN and more complex RNN architectures will be covered in the next lecture.

