softmax function

링크된 언급

...put 을 생성할때 encoder 에서 건네준 hidden state 를 이용하는데, 계산하려는 output 이 각 hidden state 와 얼마나 관련이 있는지 score 를 계산한다 (softmax). 그리고 계산된 score 를 hidden state 에 적용하고 (vector dot), 이렇게 가중치가 적용된 state 들을 하나로 더하여 context vector 로 완성시킨다....

embedding matrix

마지막으로, softmax function output 중 target word 의 확률을 높이는 방향으로 모델을 학습시킵니다.

GPT-2

dropout 은 왜 softmax 이후에 적용하는 걸까? GPT 모델에서 cheating 방지를 위해 masking 하는 방식은 아직도 이해를 잘 못하겠음.

hierarchical softmax

Hierarchical Softmax Hierarchical Softmax 는 softmax function 를 전체로 계산하기 보다는 Tree 구조로 Hierarchical 하게 Softmax 를 계산하는 방법이다.

logit

주로 logit 은 Normalization 을 위해 softmax function 의 입력값으로 사용된다.

negative sampling

D) Related hierarchical softmax E) References

Recurrent Neural Network

위와 같은 확률들은 softmax function 으로 표현되며, loss function 은 아래와 같이 cross-entropy 형태로 계산된다.

transformer

... 토큰을 잘 설명할 수 있는 시퀀스 내 또 다른 토큰을 찾는다. 물론 자기 자신에 대한 내적값도 계산한다 (e.g. q^<3> · k^<3>). 이후 내적값 (정확히는 softmax 값) 에 토큰 별 value vector v 를 곱한 뒤 모두 더하면, 해당 벡터가 attention value 를 표현하는 vector 가 된다. 각 토큰의 attention vector...

Zzong's Notes

탐색기

softmax function

Softmax Function

B) Log-sum-exp Trick (lse function)

D) References

링크된 언급

목차

탐색기

softmax function

Softmax Function

B) Log-sum-exp Trick (lse function)

C) Related

D) References

링크된 언급

함께 보면 좋은 글

목차