Learning In RNN Part II

Learning target: minimize the loss as evaluated by the cost function.

figure1

Unfortunately

  • RNN-based networks are not always easy to train.

figure2

  • The error surface is rough: it is either very flat or very steep.

figure3

Helpful Techniques

Long Short-Term Memory (LSTM)

Why replace the RNN with an LSTM?

The LSTM can deal with gradient vanishing (gradients shrinking to zero), but not with gradient explosion (gradients blowing up).
It makes the error surface flatter rather than steep.
Concretely, it removes the flat regions of the error surface and thereby solves gradient vanishing, but it does not solve gradient explosion.

How it works:

The operational difference between the RNN and the LSTM lies in what happens to the memory after each computation: the RNN overwrites the value in memory with the new value, whereas the LSTM adds the new value to the previous value in the memory cell (the exact mix depends on the value of the forget gate).
So once a weight influences the value in memory, the influence never disappears unless the forget gate is closed.
As long as the forget gate is open, there is no gradient vanishing; the sketch below contrasts the two update rules.
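To make the contrast concrete, here is a minimal NumPy sketch of the two update rules (the weight names and shapes are illustrative, not the lecture's notation): the RNN replaces its state wholesale, while the LSTM cell accumulates additively through the forget gate.

```python
import numpy as np

def rnn_step(h_prev, x, W, U, b):
    # Vanilla RNN: the new state completely overwrites the old one,
    # so an old value's influence fades (or explodes) multiplicatively.
    return np.tanh(W @ x + U @ h_prev + b)

def lstm_step(c_prev, xh, Wf, Wi, Wg, bf, bi, bg):
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    f = sigmoid(Wf @ xh + bf)   # forget gate: near 1 keeps the old memory
    i = sigmoid(Wi @ xh + bi)   # input gate: how much new value enters
    g = np.tanh(Wg @ xh + bg)   # candidate value
    # Memory is updated *additively*: the previous content survives
    # for as long as the forget gate stays open.
    return f * c_prev + i * g
```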

Summarization

  • Can deal with gradient vanishing (not gradient explosion)
  • Memory and input are added
  • The influence never disappears unless the forget gate is closed
  • No gradient vanishing (as long as the forget gate is open)

Gated Recurrent Unit (GRU): simpler than LSTM

Other helpful techniques:

figure4

More Applications

Many to One

  • The input is a vector sequence, but the output is only one vector.

Sentiment Analysis:

figure5

figure6
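As an illustration of the many-to-one pattern, here is a minimal PyTorch sketch (the layer sizes, names, and the use of an LSTM are assumptions made for the example, not the lecture's exact model): the network reads the whole word sequence, and only the final hidden state is passed to the classifier.

```python
import torch
import torch.nn as nn

class ManyToOneSentiment(nn.Module):
    """Read a word sequence; output a single sentiment prediction."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):                  # (batch, seq_len)
        _, (h_last, _) = self.rnn(self.embed(token_ids))
        return self.out(h_last[-1])                # only the final hidden state

logits = ManyToOneSentiment(vocab_size=10000)(torch.randint(0, 10000, (4, 20)))
```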

Many to Many (Output is shorter)

  • Both the input and the output are sequences, but the output is shorter.

Speech Recognition:

figure7

How to differentiate?

  • Connectionist Temporal Classification (CTC)

==Add an extra symbol “Φ” representing “null”.==

figure8

This method solves problems like distinguishing “好棒” from “好棒棒”.

CTC Training

Acoustic Features:
All possible alignments are considered correct, because we do not know which alignment is the right one; so training enumerates (sums over) all alignments, as in the sketch below.
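As a concrete illustration, PyTorch's built-in `nn.CTCLoss` performs exactly this sum over all valid alignments (inserting the blank symbol Φ) internally; the shapes below are made up for the example.

```python
import torch
import torch.nn as nn

T, N, C = 50, 4, 30                    # frames, batch size, classes (class 0 = blank Φ)
log_probs = torch.randn(T, N, C).log_softmax(dim=2)   # per-frame class log-probs
targets = torch.randint(1, C, (N, 10))                # label sequences, no blanks
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

# The loss marginalizes over every alignment of the targets with the frames.
loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
```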

figure9

CTC: example

figure10

Many to Many (No Limitation)

  • Both the input and the output are sequences, with different lengths. ➡ Sequence-to-sequence learning
    Machine Translation (e.g. “machine learning” ➡ 机器学习)

Bag-of-words:

figure11

The model above cannot stop generating until it is interrupted.

How to make the network stop

  • Add a stop symbol “===” (a “break” marker); see the decoding sketch below.

figure12
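A sketch of the resulting decoding loop (`decoder_step` here is a hypothetical one-step decoder function, not a real API): generation halts as soon as the network emits the stop symbol.

```python
import torch

def greedy_decode(decoder_step, start_id, eos_id, max_len=50):
    """Generate one token at a time; stop at the learned '===' symbol."""
    tokens, state = [start_id], None
    for _ in range(max_len):                 # hard cap so decoding always halts
        logits, state = decoder_step(tokens[-1], state)
        next_id = int(torch.argmax(logits))
        if next_id == eos_id:                # the stop symbol was produced
            break
        tokens.append(next_id)
    return tokens[1:]
```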

Beyond Sequence

  • Syntactic parsing

Transforming a Tree Structure into a Sequence

Conversion principle:

figure13

Using this principle, we can transform a sentence's parse tree into a sequence and train a sequence model to recognize sentences; a toy linearization follows.
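As a toy illustration of the conversion principle (the tree encoding and the example sentence are assumptions made for this sketch), a parse tree can be linearized into a bracketed token sequence:

```python
def tree_to_sequence(node):
    """Linearize a parse tree, given as (label, children), into tokens."""
    label, children = node
    if not children:                      # leaf: emit the word itself
        return [label]
    seq = ["(" + label]                   # opening bracket carries the label
    for child in children:
        seq.extend(tree_to_sequence(child))
    seq.append(")")
    return seq

tree = ("S", [("NP", [("John", [])]),
              ("VP", [("has", []), ("NP", [("a", []), ("dog", [])])])])
print(" ".join(tree_to_sequence(tree)))
# -> (S (NP John ) (VP has (NP a dog ) ) )
```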

Sequence-to-sequence

Auto-encoder-Text

  • To understand the meaning of a word sequence, the order of the words cannot be ignored.

figure14

figure15

figure16

Auto-encoder-Speech

  • Dimension reduction for a sequence with variable length

Audio segments (word-level) ➡ fixed-length vectors

figure17

Audio Search Principle:

figure18

How to transform an audio segment into a vector

figure19
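A minimal sketch of the encoder half (the GRU choice and all sizes are assumptions; the jointly trained decoder that reconstructs the input is omitted): each variable-length segment of acoustic features is reduced to one fixed-length vector, and audio search can then rank archive segments by cosine similarity to the spoken query.

```python
import torch
import torch.nn as nn

class AudioSegmentEncoder(nn.Module):
    """Encoder of a sequence-to-sequence autoencoder for audio segments."""
    def __init__(self, feat_dim=39, embed_dim=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, embed_dim, batch_first=True)

    def forward(self, features):          # (batch, frames, feat_dim)
        _, h_last = self.rnn(features)
        return h_last[-1]                 # (batch, embed_dim), fixed length

enc = AudioSegmentEncoder()
query = enc(torch.randn(1, 80, 39))       # spoken query, 80 frames
segment = enc(torch.randn(1, 120, 39))    # archive segment, 120 frames
score = torch.cosine_similarity(query, segment)   # rank segments by this
```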


Visualizing embedding vectors of the words

figure20

Sequence-to-sequence Learning Demo: Chat-bot

Learning Principle:

figure21

Data set:
40,000 sentences from movie dialogue and discussions of the American presidential election.

Attention-based Model

Structure Version 1:

figure22

Structure Version 2:
==Neural Turing Machine==

figure23
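Both versions read from a memory with an attention mechanism; a minimal dot-product attention sketch (the dimensions are illustrative) shows the core read operation:

```python
import torch

def attention_read(query, keys, values):
    """Match the query against each key, normalize, and blend the values."""
    scores = keys @ query                     # (num_slots,) match scores
    weights = torch.softmax(scores, dim=0)    # attention distribution
    return weights @ values                   # weighted sum: the read vector

q = torch.randn(64)                           # current query
K = torch.randn(10, 64)                       # keys of 10 memory slots
V = torch.randn(10, 32)                       # values of the same slots
read_vector = attention_read(q, K, V)         # (32,)
```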

Reading Comprehension

figure24

Visual Question Answering

figure25

Principle:
==A vector for each region==

figure26

Speech Question Answering

  • TOEFL Listening Comprehension Test By Machine

Example:

  1. Audio Story: the original story is 5 minutes long
  2. Question: “What is a possible origin of Venus’ clouds?”
  3. Choices:
    1. gases released as a result of volcanic activity
    2. chemical reactions caused by high surface temperatures
    3. bursts of radio energy from the planet’s surface
    4. strong winds that blow dust into the atmosphere

Model Architecture

Everything is learned from training examples.

figure27

Deep & Structured

Integrated together

  • Speech Recognition: CNN/LSTM/DNN+HMM
    figure28

Bayes’ theorem

figure29
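The network outputs the posterior of an HMM state given the acoustic features, while the HMM decoder needs the emission likelihood; Bayes' theorem converts between the two (a standard step in hybrid DNN-HMM systems, written out here as a sketch rather than taken from the figure, with x the acoustic feature and s the state):

```latex
P(x \mid s) = \frac{P(s \mid x)\,P(x)}{P(s)} \propto \frac{P(s \mid x)}{P(s)}
```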

  • Semantic Tagging: Bi-directional LSTM + CRF/Structured SVM
    Testing:
    figure31

figure30
