I spent more than a week working through recurrent neural networks and am slowly starting to get a feel for them. Can't help it, my brain just isn't quick enough.

There are of course plenty of English references that explain this fairly clearly; reading the Chinese material tends to just make things harder for yourself.

https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/

http://nikhilbuduma.com/2015/01/11/a-deep-dive-into-recurrent-neural-networks/

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

These four links are basically enough to understand what a recurrent neural network is. Deep learning has gotten a bit overheated, and a lot of it is just being used to dazzle investors.

Either way, when everyone is a product manager and everyone is doing deep learning, won't it all become commonplace sooner or later?

Let me put up a few figures and give a simple explanation.

Recurrent Neural Networks have loops.

This is a simple recurrent network node. The self-loop on the middle node is just shorthand; unroll it and you get the figure below.


An unrolled recurrent neural network.
The figure above lays out the architecture of a recurrent neural network clearly. The problem is that it is too simple and doesn't capture the real complexity of a recurrent network.
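To make the unrolling itself concrete, here is a minimal sketch in plain NumPy (the sizes and weight names W_xh and W_hh are my own placeholders, not taken from the posts above): the looped node and the unrolled chain are the same computation, applied once per timestep.

import numpy as np

# Hypothetical sizes and randomly initialized weights, just for illustration.
x_dim, h_dim = 3, 4
W_xh = np.random.randn(x_dim, h_dim) * 0.1       # input  -> hidden
W_hh = np.random.randn(h_dim, h_dim) * 0.1       # hidden -> hidden (the self-loop)
xs = [np.random.randn(x_dim) for _ in range(4)]  # four timesteps of input

h = np.zeros(h_dim)                   # empty hidden state at t = 0
for x in xs:                          # "unrolling" the loop through time
    h = np.tanh(x @ W_xh + h @ W_hh)  # the same node, reused at every step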


Neural networks struggle with long term dependencies.

Don't be misled by the A in the figure above: each node is actually made up of three parts. Think of the box in the middle as a memory cell.

It has three gates: a write gate, a keep gate, and a read gate.


LSTM Networks


An LSTM neural network.

Neural networks have hidden layers. Normally, the state of your hidden layer is based ONLY on your input data. So, normally a neural network’s information flow would look like this:

input -> hidden -> output

This is straightforward. Certain types of input create certain types of hidden layers. Certain types of hidden layers create certain types of output layers. It’s kind of a closed system. Memory changes this. Memory means that the hidden layer is a combination of your input data at the current timestep and the hidden layer of the previous timestep.

(input + prev_hidden) -> hidden -> output
Why the hidden layer? Well, we could technically do this.

(input + prev_input) -> hidden -> output
However, we’d be missing out. I encourage you to sit and consider the difference between these two information flows. For a little helpful hint, consider how this plays out. Here, we have 4 timesteps of a recurrent neural network pulling information from the previous hidden layer.

(input + empty_hidden) -> hidden -> output
(input + prev_hidden) -> hidden -> output
(input + prev_hidden) -> hidden -> output
(input + prev_hidden) -> hidden -> output
And here, we have 4 timesteps of a recurrent neural network pulling information from the previous input layer.

(input + empty_input) -> hidden -> output
(input + prev_input) -> hidden -> output
(input + prev_input) -> hidden -> output
(input + prev_input) -> hidden -> output
Focus on the last hidden layer (4th line). In the hidden layer recurrence, we see a presence of every input seen so far. In the input layer recurrence, it’s exclusively defined by the current and previous inputs. This is why we model hidden recurrence. Hidden recurrence learns what to remember whereas input recurrence is hard wired to just remember the immediately previous datapoint.
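As a rough sketch of that difference (NumPy again, with hypothetical weight names): hidden-layer recurrence feeds the previous hidden state back in, so a compressed summary of every earlier input can survive, while input-layer recurrence only ever sees the current and the previous raw inputs.

import numpy as np

def step_hidden_recurrence(x, prev_hidden, W_xh, W_hh):
    # (input + prev_hidden) -> hidden: prev_hidden can carry traces
    # of everything the network has seen so far.
    return np.tanh(x @ W_xh + prev_hidden @ W_hh)

def step_input_recurrence(x, prev_x, W_xh, W_x2h):
    # (input + prev_input) -> hidden: only the current and the
    # immediately previous datapoint can influence this hidden state.
    return np.tanh(x @ W_xh + prev_x @ W_x2h)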


Long Short Term Memory

To address these problems, researchers proposed a modified architecture for recurrent neural networks to help bridge long time lags between forcing inputs and appropriate responses and protect against exploding gradients. The architecture forces constant error flow (thus, neither exploding nor vanishing) through the internal state of special memory units. This long short term memory (LSTM) architecture utilized units that were structured as follows:


Structure of the basic LSTM unit

The LSTM unit consists of a memory cell which attempts to store information for extended periods of time. Access to this memory cell is protected by specialized gate neurons – the keep, write, and read gates – which are all logistic units. These gate cells, instead of sending their activities as inputs to other neurons, set the weights on edges connecting the rest of the neural net to the memory cell. The memory cell is a linear neuron that has a connection to itself. When the keep gate is turned on (with an activity of 1), the self connection has weight one and the memory cell writes its contents into itself. When the keep gate outputs a zero, the memory cell forgets its previous contents. The write gate allows the rest of the neural net to write into the memory cell when it outputs a 1 while the read gate allows the rest of the neural net to read from the memory cell when it outputs a 1.
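Read as equations, one timestep of such a unit might look like the sketch below. This is my own simplified notation, keeping the keep/write/read names used here rather than the more common forget/input/output, and the weight matrices in params are hypothetical.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_unit_step(x, h_prev, cell_prev, params):
    # The gates are logistic units in (0, 1); they scale the connections
    # into and out of the memory cell instead of feeding it directly.
    keep  = sigmoid(x @ params["W_keep"]  + h_prev @ params["U_keep"])
    write = sigmoid(x @ params["W_write"] + h_prev @ params["U_write"])
    read  = sigmoid(x @ params["W_read"]  + h_prev @ params["U_read"])
    candidate = np.tanh(x @ params["W_cell"] + h_prev @ params["U_cell"])

    # keep = 1 lets the linear self-connection copy the old contents through;
    # keep = 0 forgets them. write gates what gets stored this step.
    cell = keep * cell_prev + write * candidate
    # read gates what the rest of the network gets to see.
    h = read * np.tanh(cell)
    return h, cell

Note that the cell update is linear in cell_prev; that linear self-connection is exactly what the next paragraph unrolls through time.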

So how exactly does this force a constant error flow through time to locally protect against exploding and vanishing gradients? To visualize this, let’s unroll the LSTM unit through time:


Unrolling the LSTM unit through the time domain

At first, the keep gate is set to 0 and the write gate is set to 1, which places 4.2 into the memory cell. This value is retained in the memory cell by a subsequent keep value of 1 and protected from read/write by values of 0. Finally, the cell is read and then cleared. Now we try to follow the backpropagation from the point of loading 4.2 into the memory cell to the point of reading 4.2 from the cell and its subsequent clearing.
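Before following the gradient, it may help to replay that gate sequence numerically. This is a hypothetical trace that tracks only the raw stored value, ignoring any squashing.

# Replaying the gate settings from the unrolled figure.
cell = 0.0
steps = [
    # (keep, write, read, incoming value)
    (0.0, 1.0, 0.0, 4.2),  # keep=0, write=1: 4.2 is loaded into the cell
    (1.0, 0.0, 0.0, 0.0),  # keep=1, write=0, read=0: 4.2 retained and protected
    (1.0, 0.0, 1.0, 0.0),  # read=1: the rest of the net reads 4.2 out
    (0.0, 0.0, 0.0, 0.0),  # keep=0: the previous contents are cleared
]
for keep, write, read, value in steps:
    cell = keep * cell + write * value
    print(f"cell = {cell:.1f}, read out = {read * cell:.1f}")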