๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ

๐Ÿ”ฌ Science/๐Ÿ“ป Signal

Neural Network (CNN, RNN, LSTM, BiLSTM)

CNN (Convolutional Neural Network)

์ˆ˜๋ฉด ์•Œ๊ณ ๋ฆฌ์ฆ˜์— ๊ด€ํ•œ ๋…ผ๋ฌธ์„ ๊ฒ€์ƒ‰ํ•˜๋ฉด, ๋Œ€๋ถ€๋ถ„ CNN์„ ํ™œ์šฉํ•œ ๋ถ„๋ฅ˜ ๋ฐฉ๋ฒ•์ด ๋‚˜์˜จ๋‹ค. 

 

์ด ์‹ ๊ฒฝ๋ง์€ ์ž…๋ ฅ์ด '์ด๋ฏธ์ง€'๋กœ ๊ตฌ์„ฑ๋ผ ์žˆ๋‹ค๋Š” ์ ์„ ํ™œ์šฉํ•œ๋‹ค.

์ผ๋ฐ˜ ์‹ ๊ฒฝ๋ง๊ณผ ๋‹ฌ๋ฆฌ CNN์˜ ๋ ˆ์ด์–ด์—๋Š” ๋„ˆ๋น„, ๋†’์ด, ๊นŠ์ด๋กœ 3์ฐจ์› ๋ฐฐ์—ด๋œ ๋‰ด๋Ÿฐ์ด ์žˆ๋‹ค. 

"๊นŠ์ด" : ์ „์ฒด ์‹ ๊ฒฝ๋ง์˜ ๊นŠ์ด๊ฐ€ ์•„๋‹Œ, ํ™œ์„ฑํ™” ๋ณผ๋ฅจ์˜ 3์ฐจ์›์„ ์˜๋ฏธํ•˜๋ฉฐ, ๋„คํŠธ์›Œํฌ์˜ ์ด ๋ ˆ์ด์–ด ์ˆ˜๋ฅผ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ์Œ.

ํ•œ ๋ ˆ์ด์–ด์˜ ๋‰ด๋Ÿฐ์€ ๋ชจ๋“  ๋‰ด๋Ÿฐ์ด ์™„์ „ํžˆ ์—ฐ๊ฒฐ๋œ ๋ฐฉ์‹์ด ์•„๋‹ˆ๋ผ, ๊ทธ ์•ž์— ์žˆ๋Š” ๋ ˆ์ด์–ด์˜ ์ž‘์€ ์˜์—ญ์—๋งŒ ์—ฐ๊ฒฐ๋จ. 

์ฒ˜์ ์ž…๋ ฅ ์ด๋ฏธ์ง€๊ฐ€ 32*32*3์ด๋ฉด -> ์ตœ์ข… ์ถœ๋ ฅ ๋ ˆ์ด์–ด์˜ ํฌ๊ธฐ๋Š” 1*1*10์ด ๋˜๋Š”๋ฐ,

Convent ์•„ํ‚คํ…์ฒ˜์˜ ๋งˆ์ง€๋ง‰์—๋Š” full image๋ฅผ ๊นŠ์ด dimension์„ ๋”ฐ๋ผ ๋ฐฐ์—ด๋œ single vector of class scores๋กœ ๋ณ€ํ™˜. 

 

๊ฐ„๋‹จํ•œ ConvNet -> sequence of layers์ž„.

๋ชจ๋“  ๋ ˆ์ด์–ด๋Š” ์ฐจ๋ณ„ํ™” ๊ฐ€๋Šฅํ•œ ๊ธฐ๋Šฅ์„ ํ†ตํ•ด, ํ•œ ๋ณผ๋ฅจ์˜ ํ™œ์„ฑํ™”๋ฅผ ๋‹ค๋ฅธ ๋ณผ๋ฅจ์œผ๋กœ ๋ณ€ํ™˜ํ•จ. 

์„ธ ๊ฐ€์ง€ ์ฃผ์š” ์œ ํ˜•์˜ ์ฃผ์š” ๋ ˆ์ด์–ด๋ฅผ ์‚ฌ์šฉํ•ด ์•„ํ‚คํ…์ณ ๊ตฌ์ถ•

Convolution Layer - Pooling Layer - Fully Connected Layer

[INPUT - CONV - RELU - POOL - FC]

 

์›๋ณธ ํ”ฝ์…€ ๊ฐ’์—์„œ ์ตœ์ข… ํด๋ž˜์Šค scores๋กœ ์›๋ณธ ์ด๋ฏธ์ง€๋ฅผ ๋ ˆ์ด์–ด๋ณ„๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๊ฒƒ.

์ผ๋ถ€ ๋ ˆ์ด์–ด์—๋Š” ๋งค๊ฐœ ๋ณ€์ˆ˜๊ฐ€ ํฌํ•จ๋ผ ์žˆ๊ณ , ๊ทธ๋ ‡์ง€ ์•Š์€ ๋ ˆ์ด์–ด๋„ ์กด์žฌ.

ํŠนํžˆ, CONV/FC ๋ ˆ์ด์–ด๋Š” ์ž…๋ ฅ ๋ณผ๋ฅจ์˜ ํ™œ์„ฑํ™”๋ฟ๋งŒ ์•„๋‹ˆ๋ผ, ๋งค๊ฐœ๋ณ€์ˆ˜(๋‰ด๋Ÿฐ์˜ weight์™€ biases)์˜ ํ•จ์ˆ˜์ธ ๋ณ€ํ™˜์„ ์ˆ˜ํ–‰.

๋ฐ˜๋ฉด, RELU/POOL ๋ ˆ์ด์–ด๋Š” ๊ณ ์ •๋œ ํ•จ์ˆ˜๋ฅผ ๊ตฌํ˜„. CONV/FC ๋ ˆ์ด์–ด์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋Š” gradient descent๋กœ ํ›ˆ๋ จ๋˜์–ด,

ConvNet์ด ๊ณ„์‚ฐํ•œ class scores๊ฐ€ ๊ฐ ์ด๋ฏธ์ง€์˜ training set์—์„œ์˜ label๊ณผ ์ผ์น˜ํ•จ.

 

CNN ์•„ํ‚คํ…์ณ์—๋„ ๋‹ค์–‘ํ•œ ์ข…๋ฅ˜๊ฐ€ ์žˆ์Œ. ImageNet, AlexNet, VGG 16,19, GoogLeNet, ResNet, SENet

 

 

 

RNN (Recurrent Neural Network)

A deep-learning architecture for variable-length sequential or time-series data.

 

์ˆœ์ฐจ ๋ฐ์ดํ„ฐ๋ž€ : ์ˆœ์ฐจ์  ๊ตฌ์„ฑ ์š”์†Œ๊ฐ€ ๋ณต์žกํ•œ ์˜๋ฏธ์™€ ๊ทœ์น™์— ๋”ฐ๋ผ ์ƒํ˜ธ ์—ฐ๊ด€๋˜๋Š” ๋ฐ์ดํ„ฐ

์ด๋•Œ, Image Caption์„ ํ•˜๋ ค๋ฉด Image -> Sequence of words (one to many) ๋ฐฉ๋ฒ•์„ ์ทจํ•˜๊ณ ,

action prediction์ด ํ•„์š”ํ•  ๊ฒฝ์šฐ, sequence of video frames -> action class (many to one) ๋ฐฉ๋ฒ•์„ ์ทจํ•จ.

for video captioning, sequence of video frames -> caption (many to many) ๋ฐฉ๋ฒ•. 

์ˆ˜๋ฉด ๋ถ„๋ฅ˜ ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ๊ฒฝ์šฐ ๋‘๋ฒˆ ์งธ์ด์ง€ ์•Š์„๊นŒ. 

 

RNN = internal state (updated as the sequence is processed)

 

๋ช‡๋ช‡ ์‹œ๊ฐ„ ๋‹จ๊ณ„์—์„œ old state์— input vector๊ฐ’์„ ์ง‘์–ด ๋„ฃ์–ด parameters W๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ํ•จ์ˆ˜์— ์ง‘์–ด๋„ฃ์–ด ๋Œ๋ฆฌ๊ณ ,

new state์„ ์–ป๋Š” ์‹์ด๋‹ค. (new state = f(old state, xt)

 

 

 

LSTM (Long Short-Term Memory) Network

A way to solve the long-term dependency problem that RNNs have.

 

RNN์€ ํ˜„์žฌ ์ •๋ณด์— ๋Œ€ํ•œ ์ดํ•ด๋ฅผ ์œ„ํ•ด ์ด์ „ ์ •๋ณด๋ฅผ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•œ๋‹ค. 

๊ทธ๋Ÿฐ๋ฐ ํ˜„์žฌ ๋‹จ๊ณ„์—์„œ ํ•„์š”ํ•œ ์ •๋ณด๊ฐ€ ์ดˆ๊ธฐ ๋‹จ๊ณ„ ํ˜น์€ ๋จผ ๊ณผ๊ฑฐ์˜ ๋‹จ๊ณ„์—์„œ์˜ ์ •๋ณด๋ผ๋ฉด? ๊ฒฉ์ฐจ๊ฐ€ ์ปค์ง€๋ฉด ์ •๋ณด ์—ฐ๊ฒฐ์„ฑ์ด ๋ถ€์กฑํ•˜๋‹ค.

๊ทธ๋ž˜์„œ ๊ทธ ์žฅ๊ธฐ ์˜์กด์„ฑ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋‚˜์˜จ๊ฒŒ LSTM ๋„คํŠธ์›Œํฌ

 

๊ธฐ๋ณธ์ ์ธ ํ‹€ ์ž์ฒด๋Š” RNN์ฒ˜๋Ÿผ ์ˆœ์ฐจ ๋ฐ์ดํ„ฐ๋ฅผ ์—ฐ๊ฒฐ~์—ฐ๊ฒฐ~ํ•ด์„œ ์ „๋‹ฌํ•˜๋Š” ๊ตฌ์กฐ์ธ๋ฐ,

[Cell State - ์„ ํ˜•์ ์ธ ์ƒํ˜ธ์ž‘์šฉ๋งŒ ์ ์šฉํ•˜๋ฉด์„œ, ์ผ์ •ํ•œ ์ •๋ณด๋ฅผ ๊ทธ๋Œ€๋กœ ์ „๋‹ฌํ•˜๋Š” ์ƒํƒœ] - ๋งจ ์œ„ ๋ผ์ธ

[Forget Gate Layer - ๊ณผ๊ฑฐ์˜ ์ •๋ณด๋ฅผ ๋ฒ„๋ฆด ์ง€ ๊ฒฐ์ •ํ•ด์„œ ์ณ๋‚ด๋Š” ๋ถ€๋ถ„] - ์•„๋ž˜์—์„œ ์ฒซ๋ฒˆ์งธ ์„ธ๋กœ ๋ผ์ธ

[Input Gate Layer - ํ˜„์žฌ์˜ cell state value์— ์–ผ๋งˆ๋ฅผ ๋”ํ• ์ง€] - ์•„๋ž˜ ๋‘๋ฒˆ์งธ ์„ธ๋กœ ๋ผ์ธ

[Update Gate - forget gate๋ฅผ ํ†ต๊ณผํ•œ ๊ฐ’ ์ •๋ณด & input gate๋ฅผ ํ†ต๊ณผํ•œ ๊ฐ’ ์ •๋ณด๋ฅผ ํ™œ์šฉํ•ด update] - ์•„๋ž˜ ์„ธ๋ฒˆ์งธ

[Output Gate - ์ตœ์ข…๊ฐ’] ๊ตฌ์กฐ๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ, ๊ณผ๊ฑฐ ๋ฐ์ดํ„ฐ๋ฅผ ์ข€ ๋” ์ฒด๊ณ„์ ์œผ๋กœ ํ‰๊ฐ€ํ•˜๊ณ  ์ €์žฅํ•˜์—ฌ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.
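The gate structure above can be sketched as a single LSTM step with a scalar state (real cells operate on vectors, and all weights here are hypothetical): the forget gate f scales the old cell state, the input gate i scales the candidate values g, and the output gate o filters what leaves the cell.

```python
import math

def sigmoid(z):
    """Gate activation: squashes to (0, 1), i.e. 'how much to let through'."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(c_old, h_old, x, p):
    """One LSTM step with a scalar state; p maps each gate to (w_h, w_x, b)."""
    f = sigmoid(p["f"][0] * h_old + p["f"][1] * x + p["f"][2])    # forget gate
    i = sigmoid(p["i"][0] * h_old + p["i"][1] * x + p["i"][2])    # input gate
    g = math.tanh(p["g"][0] * h_old + p["g"][1] * x + p["g"][2])  # candidate value
    o = sigmoid(p["o"][0] * h_old + p["o"][1] * x + p["o"][2])    # output gate
    c_new = f * c_old + i * g     # update: forget some old state, add some new
    h_new = o * math.tanh(c_new)  # output: filtered view of the cell state
    return c_new, h_new

# Hypothetical gate parameters (w_h, w_x, bias)
params = {"f": (0.1, 0.2, 1.0), "i": (0.1, 0.2, 0.0),
          "g": (0.3, 0.5, 0.0), "o": (0.1, 0.2, 0.0)}

c, h = 0.0, 0.0               # cell state and hidden state
for x_t in [1.0, -0.5, 0.8]:  # toy input sequence
    c, h = lstm_step(c, h, x_t, params)
print(c, h)
```

The cell state line `c_new = f * c_old + i * g` is the mostly-linear top line from the description: when f is near 1 and i is near 0, old information flows through unchanged, which is what lets gradients survive long gaps.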

 

 

 

Bi-LSTM (Bidirectional LSTM)

On top of the forward training pass, a second LSTM is added that runs backward (from the last node to the first).

 

์—ญ๋ฐฉํ–ฅ์œผ๋กœ ์ •๋ณด๋ฅผ ์ „๋‹ฌํ•˜๋Š” hidden layer์„ ์ถ”๊ฐ€ํ•˜๊ธฐ ๋•Œ๋ฌธ์—, ๊ฐ ์‹œ์ ์—์„œ hidden state๊ฐ€ ์ด์ „ ์‹œ์  & ๋ฏธ๋ž˜ ์‹œ์ ์˜ ์ •๋ณด๋ฅผ ๋ชจ๋‘ ๊ฐ–๋Š” ํšจ๊ณผ๊ฐ€ ์žˆ์Œ. 

 


์ฐธ๊ณ 

http://cs231n.stanford.edu/schedule.html

 

Stanford University CS231n: Deep Learning for Computer Vision

04/20 Lecture 6: CNN Architectures Batch Normalization Transfer learning AlexNet, VGG, GoogLeNet, ResNet [slides] AlexNet, VGGNet, GoogLeNet, ResNet

cs231n.stanford.edu

https://colah.github.io/posts/2015-08-Understanding-LSTMs/

 

Understanding LSTM Networks -- colah's blog

Posted on August 27, 2015 <!-- by colah --> Humans don’t start their thinking from scratch every second. As you read this essay, you understand each word based on your understanding of previous words. You don’t throw everything away and start thinking

colah.github.io

https://sirzzang.github.io/ai/AI-01-LSTM-04/

 

[DL] LSTM_4.์–‘๋ฐฉํ–ฅ ๋ชจ๋ธ ์•„ํ‚คํ…์ณ ๋ฐ ๊ตฌํ˜„

«Neural Network» ์–‘๋ฐฉํ–ฅ LSTM ๋ชจ๋ธ์— ๋Œ€ํ•ด ์•Œ์•„๋ณด์ž.

sirzzang.github.io

 

๋ฐ˜์‘ํ˜•