Llama

B) Llama 2

Model sizes

  • 7B, 13B, 34B, and 70B parameters, plus a Llama 2-Chat variant at each size (the 34B models are reported in the paper but were not released)

B.1) Comparison with the previous model (Llama 1)

  • Increased the size of the pretraining corpus by 40%
  • Doubled the context length from 2k to 4k tokens
  • Adopted grouped-query attention (GQA) in the larger (34B and 70B) models
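
As a rough illustration of what grouped-query attention changes, the sketch below maps each query head to the key/value head it shares and compares KV-cache sizes. The function names and the configuration values are assumptions chosen for the example; they are not taken from the Llama 2 code.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Return the index of the shared key/value head used by a query head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads  # query heads per KV head
    return q_head // group_size


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Size of the KV cache for one sequence (factor 2 = keys and values)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Illustrative configuration (assumed values, for the arithmetic only).
N_LAYERS, N_Q_HEADS, N_KV_HEADS, HEAD_DIM, SEQ_LEN = 32, 32, 8, 128, 4096

# Multi-head attention keeps one KV head per query head.
mha_bytes = kv_cache_bytes(N_LAYERS, N_Q_HEADS, HEAD_DIM, SEQ_LEN)
# Grouped-query attention keeps only N_KV_HEADS KV heads.
gqa_bytes = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, SEQ_LEN)

print(mha_bytes // gqa_bytes)  # 4, i.e. N_Q_HEADS / N_KV_HEADS
```

With one KV head per query head this reduces to ordinary multi-head attention, and with a single KV head it reduces to multi-query attention. GQA sits between the two.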

B.2) Training method

The paper emphasizes that RLHF is critical to model performance.

  • Uses two separate reward models, one for helpfulness and one for safety, to avoid the safety-helpfulness tradeoff identified in Anthropic’s work.
  • Uses a two-stage RLHF approach: rejection sampling first, then rejection sampling followed by Proximal Policy Optimization (PPO).
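
The rejection sampling step can be sketched as best-of-K selection under a combined reward. This is a minimal sketch, not the paper's implementation: `generate`, `score_helpfulness`, and `score_safety` are hypothetical stand-ins for the policy and the two reward models, and the threshold value is an illustrative assumption.

```python
from typing import Callable, List


def combined_reward(helpfulness: float, safety: float,
                    safety_threshold: float = 0.15) -> float:
    """Let the safety score decide when it is low, otherwise use helpfulness."""
    return safety if safety < safety_threshold else helpfulness


def rejection_sample(prompt: str,
                     generate: Callable[[str], str],
                     score_helpfulness: Callable[[str, str], float],
                     score_safety: Callable[[str, str], float],
                     k: int = 4) -> str:
    """Sample k candidates and keep the one with the highest combined reward."""
    candidates: List[str] = [generate(prompt) for _ in range(k)]
    return max(
        candidates,
        key=lambda c: combined_reward(score_helpfulness(prompt, c),
                                      score_safety(prompt, c)),
    )


# Toy stand-ins so the sketch runs without a real model (all hypothetical).
_answers = iter(["short", "a longer answer", "UNSAFE long long long answer"])


def toy_generate(prompt: str) -> str:
    return next(_answers)


def toy_helpfulness(prompt: str, answer: str) -> float:
    return len(answer) / 100  # longer answers count as "more helpful" here


def toy_safety(prompt: str, answer: str) -> float:
    return 0.0 if "UNSAFE" in answer else 1.0


best = rejection_sample("hi", toy_generate, toy_helpfulness, toy_safety, k=3)
print(best)  # a longer answer
```

In training, the selected answers would then serve as fine-tuning targets. The PPO stage is not sketched here.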

B.3) Text Generation

  • Temperature: raise the temperature if you want more creative (more varied) answers.
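
To make the effect of temperature concrete, here is a minimal sketch of temperature-scaled softmax sampling over a toy logit vector. The function names and logits are assumptions for the example, and real generation code typically adds top-k or top-p filtering as well.

```python
import math
import random
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: List[float], temperature: float, rng: random.Random) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical next-token logits
sharp = softmax_with_temperature(toy_logits, 0.5)  # low T: peaked distribution
flat = softmax_with_temperature(toy_logits, 2.0)   # high T: flatter distribution
token = sample_token(toy_logits, 1.0, random.Random(0))
```

A low temperature concentrates probability on the top token, so outputs become more deterministic. A high temperature spreads probability toward less likely tokens, which gives more varied output.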

C) Tokenizer

During fine-tuning, the model does not learn to predict the EOS token if the tokenizer does not append it to the training sequences.

See "LLaMA FastTokenizer does not add `eos_token_id` at the end." (huggingface/transformers, Issue #22794)
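
One possible workaround is to append the EOS id by hand after tokenizing, so every training sequence ends with it. The sketch below works on a plain dict shaped like a tokenizer output. The helper name `append_eos` and the token ids in the usage lines are assumptions for illustration, and with Hugging Face tokenizers the id would normally come from `tokenizer.eos_token_id`.

```python
from typing import Dict, List


def append_eos(encoding: Dict[str, List[int]], eos_token_id: int) -> Dict[str, List[int]]:
    """Return a copy of the encoding that ends with eos_token_id."""
    input_ids = list(encoding["input_ids"])
    attention_mask = list(encoding["attention_mask"])
    # Append only if the sequence does not already end with EOS.
    if not input_ids or input_ids[-1] != eos_token_id:
        input_ids.append(eos_token_id)
        attention_mask.append(1)  # the EOS token must be attended to
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Illustrative ids only: 1 as BOS and 2 as EOS are assumed values here.
example = {"input_ids": [1, 15043, 3186], "attention_mask": [1, 1, 1]}
fixed = append_eos(example, eos_token_id=2)
print(fixed["input_ids"])  # [1, 15043, 3186, 2]
```

The helper is idempotent, so applying it to a sequence that already ends with EOS leaves the sequence unchanged.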

D) Related

E) References