
Neural Network: CNN, RNN, LSTM, BiLSTM

CNN (Convolutional Neural Network)

์ˆ˜๋ฉด ์•Œ๊ณ ๋ฆฌ์ฆ˜์— ๊ด€ํ•œ ๋…ผ๋ฌธ์„ ๊ฒ€์ƒ‰ํ•˜๋ฉด, ๋Œ€๋ถ€๋ถ„ CNN์„ ํ™œ์šฉํ•œ ๋ถ„๋ฅ˜ ๋ฐฉ๋ฒ•์ด ๋‚˜์˜จ๋‹ค. 

 

์ด ์‹ ๊ฒฝ๋ง์€ ์ž…๋ ฅ์ด '์ด๋ฏธ์ง€'๋กœ ๊ตฌ์„ฑ๋ผ ์žˆ๋‹ค๋Š” ์ ์„ ํ™œ์šฉํ•œ๋‹ค.

์ผ๋ฐ˜ ์‹ ๊ฒฝ๋ง๊ณผ ๋‹ฌ๋ฆฌ CNN์˜ ๋ ˆ์ด์–ด์—๋Š” ๋„ˆ๋น„, ๋†’์ด, ๊นŠ์ด๋กœ 3์ฐจ์› ๋ฐฐ์—ด๋œ ๋‰ด๋Ÿฐ์ด ์žˆ๋‹ค. 

"๊นŠ์ด" : ์ „์ฒด ์‹ ๊ฒฝ๋ง์˜ ๊นŠ์ด๊ฐ€ ์•„๋‹Œ, ํ™œ์„ฑํ™” ๋ณผ๋ฅจ์˜ 3์ฐจ์›์„ ์˜๋ฏธํ•˜๋ฉฐ, ๋„คํŠธ์›Œํฌ์˜ ์ด ๋ ˆ์ด์–ด ์ˆ˜๋ฅผ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ์Œ.

ํ•œ ๋ ˆ์ด์–ด์˜ ๋‰ด๋Ÿฐ์€ ๋ชจ๋“  ๋‰ด๋Ÿฐ์ด ์™„์ „ํžˆ ์—ฐ๊ฒฐ๋œ ๋ฐฉ์‹์ด ์•„๋‹ˆ๋ผ, ๊ทธ ์•ž์— ์žˆ๋Š” ๋ ˆ์ด์–ด์˜ ์ž‘์€ ์˜์—ญ์—๋งŒ ์—ฐ๊ฒฐ๋จ. 

์ฒ˜์ ์ž…๋ ฅ ์ด๋ฏธ์ง€๊ฐ€ 32*32*3์ด๋ฉด -> ์ตœ์ข… ์ถœ๋ ฅ ๋ ˆ์ด์–ด์˜ ํฌ๊ธฐ๋Š” 1*1*10์ด ๋˜๋Š”๋ฐ,

Convent ์•„ํ‚คํ…์ฒ˜์˜ ๋งˆ์ง€๋ง‰์—๋Š” full image๋ฅผ ๊นŠ์ด dimension์„ ๋”ฐ๋ผ ๋ฐฐ์—ด๋œ single vector of class scores๋กœ ๋ณ€ํ™˜. 

 

๊ฐ„๋‹จํ•œ ConvNet -> sequence of layers์ž„.

๋ชจ๋“  ๋ ˆ์ด์–ด๋Š” ์ฐจ๋ณ„ํ™” ๊ฐ€๋Šฅํ•œ ๊ธฐ๋Šฅ์„ ํ†ตํ•ด, ํ•œ ๋ณผ๋ฅจ์˜ ํ™œ์„ฑํ™”๋ฅผ ๋‹ค๋ฅธ ๋ณผ๋ฅจ์œผ๋กœ ๋ณ€ํ™˜ํ•จ. 

์„ธ ๊ฐ€์ง€ ์ฃผ์š” ์œ ํ˜•์˜ ์ฃผ์š” ๋ ˆ์ด์–ด๋ฅผ ์‚ฌ์šฉํ•ด ์•„ํ‚คํ…์ณ ๊ตฌ์ถ•

Convolution Layer - Pooling Layer - Fully Connected Layer

[INPUT - CONV - RELU - POOL - FC]
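A minimal sketch of this [INPUT - CONV - RELU - POOL - FC] stack (my own PyTorch example, not from the lecture; the channel counts are arbitrary), using the 32×32×3 input and 10 class scores mentioned above:

```python
import torch
import torch.nn as nn

convnet = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),  # CONV: 32x32x3 -> 32x32x16
    nn.ReLU(),                                                            # RELU: elementwise, shape unchanged
    nn.MaxPool2d(kernel_size=2),                                          # POOL: 32x32x16 -> 16x16x16
    nn.Flatten(),                                                         # flatten the volume into one vector
    nn.Linear(16 * 16 * 16, 10),                                          # FC: a single vector of 10 class scores
)

x = torch.randn(1, 3, 32, 32)   # one 32x32x3 input image (batch, depth, height, width)
scores = convnet(x)
print(scores.shape)             # torch.Size([1, 10])
```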

 

์›๋ณธ ํ”ฝ์…€ ๊ฐ’์—์„œ ์ตœ์ข… ํด๋ž˜์Šค scores๋กœ ์›๋ณธ ์ด๋ฏธ์ง€๋ฅผ ๋ ˆ์ด์–ด๋ณ„๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๊ฒƒ.

์ผ๋ถ€ ๋ ˆ์ด์–ด์—๋Š” ๋งค๊ฐœ ๋ณ€์ˆ˜๊ฐ€ ํฌํ•จ๋ผ ์žˆ๊ณ , ๊ทธ๋ ‡์ง€ ์•Š์€ ๋ ˆ์ด์–ด๋„ ์กด์žฌ.

ํŠนํžˆ, CONV/FC ๋ ˆ์ด์–ด๋Š” ์ž…๋ ฅ ๋ณผ๋ฅจ์˜ ํ™œ์„ฑํ™”๋ฟ๋งŒ ์•„๋‹ˆ๋ผ, ๋งค๊ฐœ๋ณ€์ˆ˜๋‰ด๋Ÿฐ์˜weight์™€biases์˜ ํ•จ์ˆ˜์ธ ๋ณ€ํ™˜์„ ์ˆ˜ํ–‰.

๋ฐ˜๋ฉด, RELU/POOL ๋ ˆ์ด์–ด๋Š” ๊ณ ์ •๋œ ํ•จ์ˆ˜๋ฅผ ๊ตฌํ˜„. CONV/FC ๋ ˆ์ด์–ด์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋Š” gradient descent๋กœ ํ›ˆ๋ จ๋˜์–ด,

ConvNet์ด ๊ณ„์‚ฐํ•œ class scores๊ฐ€ ๊ฐ ์ด๋ฏธ์ง€์˜ training set์—์„œ์˜ label๊ณผ ์ผ์น˜ํ•จ.

 

CNN ์•„ํ‚คํ…์ณ์—๋„ ๋‹ค์–‘ํ•œ ์ข…๋ฅ˜๊ฐ€ ์žˆ์Œ. ImageNet, AlexNet, VGG 16,19, GoogLeNet, ResNet, SENet

 

 

 

RNN (Recurrent Neural Network)

๊ฐ€๋ณ€ ๊ธธ์ด์˜ ์ˆœ์ฐจ์  ํ˜น์€ ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๋”ฅ๋Ÿฌ๋‹ ์•„ํ‚คํ…์ณ 

 

์ˆœ์ฐจ ๋ฐ์ดํ„ฐ๋ž€ : ์ˆœ์ฐจ์  ๊ตฌ์„ฑ ์š”์†Œ๊ฐ€ ๋ณต์žกํ•œ ์˜๋ฏธ์™€ ๊ทœ์น™์— ๋”ฐ๋ผ ์ƒํ˜ธ ์—ฐ๊ด€๋˜๋Š” ๋ฐ์ดํ„ฐ

์ด๋•Œ, Image Caption์„ ํ•˜๋ ค๋ฉด Image -> Sequence of words onetomany ๋ฐฉ๋ฒ•์„ ์ทจํ•˜๊ณ ,

action prediction์ด ํ•„์š”ํ•  ๊ฒฝ์šฐ, sequence of video frames -> action class manytoone ๋ฐฉ๋ฒ•์„ ์ทจํ•จ.

for video captioning, sequence of video frames -> caption manytomany ๋ฐฉ๋ฒ•. 

์ˆ˜๋ฉด ๋ถ„๋ฅ˜ ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ๊ฒฝ์šฐ ๋‘๋ฒˆ ์งธ์ด์ง€ ์•Š์„๊นŒ. 

 

RNN = an internal state that is updated as the sequence is processed.

 

๋ช‡๋ช‡ ์‹œ๊ฐ„ ๋‹จ๊ณ„์—์„œ old state์— input vector๊ฐ’์„ ์ง‘์–ด ๋„ฃ์–ด parameters W๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ํ•จ์ˆ˜์— ์ง‘์–ด๋„ฃ์–ด ๋Œ๋ฆฌ๊ณ ,

new state์„ ์–ป๋Š” ์‹์ด๋‹ค. newstate=f(oldstate,xt

 

 

 

LSTM (Long Short-Term Memory Network)

RNN์ด ๊ฐ€์ง„ long-term dependencies๋ฅผ ํ•ด๊ฒฐํ•˜๋Š” ๋ฐฉ๋ฒ•

 

RNN์€ ํ˜„์žฌ ์ •๋ณด์— ๋Œ€ํ•œ ์ดํ•ด๋ฅผ ์œ„ํ•ด ์ด์ „ ์ •๋ณด๋ฅผ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•œ๋‹ค. 

๊ทธ๋Ÿฐ๋ฐ ํ˜„์žฌ ๋‹จ๊ณ„์—์„œ ํ•„์š”ํ•œ ์ •๋ณด๊ฐ€ ์ดˆ๊ธฐ ๋‹จ๊ณ„ ํ˜น์€ ๋จผ ๊ณผ๊ฑฐ์˜ ๋‹จ๊ณ„์—์„œ์˜ ์ •๋ณด๋ผ๋ฉด? ๊ฒฉ์ฐจ๊ฐ€ ์ปค์ง€๋ฉด ์ •๋ณด ์—ฐ๊ฒฐ์„ฑ์ด ๋ถ€์กฑํ•˜๋‹ค.

๊ทธ๋ž˜์„œ ๊ทธ ์žฅ๊ธฐ ์˜์กด์„ฑ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋‚˜์˜จ๊ฒŒ LSTM ๋„คํŠธ์›Œํฌ

 

๊ธฐ๋ณธ์ ์ธ ํ‹€ ์ž์ฒด๋Š” RNN์ฒ˜๋Ÿผ ์ˆœ์ฐจ ๋ฐ์ดํ„ฐ๋ฅผ ์—ฐ๊ฒฐ~์—ฐ๊ฒฐ~ํ•ด์„œ ์ „๋‹ฌํ•˜๋Š” ๊ตฌ์กฐ์ธ๋ฐ,

[Cell State - ์„ ํ˜•์ ์ธ ์ƒํ˜ธ์ž‘์šฉ๋งŒ ์ ์šฉํ•˜๋ฉด์„œ, ์ผ์ •ํ•œ ์ •๋ณด๋ฅผ ๊ทธ๋Œ€๋กœ ์ „๋‹ฌํ•˜๋Š” ์ƒํƒœ] - ๋งจ ์œ„ ๋ผ์ธ

[Forget Gate Layer - ๊ณผ๊ฑฐ์˜ ์ •๋ณด๋ฅผ ๋ฒ„๋ฆด ์ง€ ๊ฒฐ์ •ํ•ด์„œ ์ณ๋‚ด๋Š” ๋ถ€๋ถ„] - ์•„๋ž˜์—์„œ ์ฒซ๋ฒˆ์งธ ์„ธ๋กœ ๋ผ์ธ

[Input Gate Layer - ํ˜„์žฌ์˜ cell state value์— ์–ผ๋งˆ๋ฅผ ๋”ํ• ์ง€] - ์•„๋ž˜ ๋‘๋ฒˆ์งธ ์„ธ๋กœ ๋ผ์ธ

[Update Gate - forget gate๋ฅผ ํ†ต๊ณผํ•œ ๊ฐ’ ์ •๋ณด & input gate๋ฅผ ํ†ต๊ณผํ•œ ๊ฐ’ ์ •๋ณด๋ฅผ ํ™œ์šฉํ•ด update] - ์•„๋ž˜ ์„ธ๋ฒˆ์งธ

[Output Gate - ์ตœ์ข…๊ฐ’] ๊ตฌ์กฐ๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ, ๊ณผ๊ฑฐ ๋ฐ์ดํ„ฐ๋ฅผ ์ข€ ๋” ์ฒด๊ณ„์ ์œผ๋กœ ํ‰๊ฐ€ํ•˜๊ณ  ์ €์žฅํ•˜์—ฌ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.
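A minimal NumPy sketch of one LSTM step under the standard gate equations (weight shapes and sizes are my own assumptions; it is only meant to show how the gates listed above interact):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(0)
# one weight matrix per gate, each acting on the concatenation [h_{t-1}, x_t]
W_f, W_i, W_c, W_o = (rng.standard_normal((hidden_size, hidden_size + input_size)) for _ in range(4))
b_f = b_i = b_c = b_o = np.zeros(hidden_size)

def lstm_step(h_old, c_old, x_t):
    z = np.concatenate([h_old, x_t])
    f = sigmoid(W_f @ z + b_f)        # forget gate: which past information to throw away
    i = sigmoid(W_i @ z + b_i)        # input gate: how much to add to the cell state
    c_tilde = np.tanh(W_c @ z + b_c)  # candidate values to add
    c_new = f * c_old + i * c_tilde   # update: combine forget-gated past and input-gated candidate
    o = sigmoid(W_o @ z + b_o)        # output gate
    h_new = o * np.tanh(c_new)        # final output / new hidden state
    return h_new, c_new               # the cell state itself flows on through linear interactions only

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
h, c = lstm_step(h, c, rng.standard_normal(input_size))
```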

 

 

 

Bi-LSTM (Bidirectional LSTM)

์ •๋ฐฉํ–ฅ ํ•™์Šต ์ง„ํ–‰ ๊ณผ์ •์—์„œ, ๋งˆ์ง€๋ง‰ ๋…ธ๋“œ์—์„œ ๋’คto์•ž ์—ญ๋ฐฉํ–ฅ์œผ๋กœ ์‹คํ–‰๋˜๋Š” ๋‹ค๋ฅธ LSTM์„ ์ถ”๊ฐ€ํ•œ ๊ฒƒ

 

์—ญ๋ฐฉํ–ฅ์œผ๋กœ ์ •๋ณด๋ฅผ ์ „๋‹ฌํ•˜๋Š” hidden layer์„ ์ถ”๊ฐ€ํ•˜๊ธฐ ๋•Œ๋ฌธ์—, ๊ฐ ์‹œ์ ์—์„œ hidden state๊ฐ€ ์ด์ „ ์‹œ์  & ๋ฏธ๋ž˜ ์‹œ์ ์˜ ์ •๋ณด๋ฅผ ๋ชจ๋‘ ๊ฐ–๋Š” ํšจ๊ณผ๊ฐ€ ์žˆ์Œ. 

 


์ฐธ๊ณ 

Stanford University CS231n: Deep Learning for Computer Vision
http://cs231n.stanford.edu/schedule.html

Understanding LSTM Networks (colah's blog)
https://colah.github.io/posts/2015-08-Understanding-LSTMs/

[DL] LSTM_4. Bidirectional model architecture and implementation (sirzzang.github.io)
https://sirzzang.github.io/ai/AI-01-LSTM-04/

 

๋ฐ˜์‘ํ˜•