Supervised Fine-tuning

Before we start training reward models and tuning our model with RL, it helps if the model is already good in the domain we are interested in. The easiest way to achieve this is by continuing to train the language model with the language modeling objective on texts from the domain or task.

There is nothing special about fine-tuning the model before doing RLHF - it’s just the causal language modeling objective from pretraining that we apply here.
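As a rough illustration of that objective, the sketch below computes the mean next-token negative log-likelihood for a single sequence in plain Python. The function names (`log_softmax`, `causal_lm_loss`) and the toy logits are assumptions made for this example, not code from the StackLLaMA guide. A real training run would use a deep learning framework's cross-entropy loss over batches.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized score vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def causal_lm_loss(logits: Sequence[Sequence[float]], token_ids: Sequence[int]) -> float:
    """Mean next-token negative log-likelihood for one sequence.

    logits[t] holds the model's scores over the vocabulary after reading
    token_ids[: t + 1], so the target for position t is token_ids[t + 1].
    The last position has no target and is skipped.
    """
    if len(logits) != len(token_ids):
        raise ValueError("need one logit vector per input token")
    n_targets = len(token_ids) - 1
    if n_targets < 1:
        raise ValueError("need at least two tokens to form a prediction target")
    total = 0.0
    for t in range(n_targets):
        total -= log_softmax(logits[t])[token_ids[t + 1]]
    return total / n_targets


# Toy example (hypothetical values): vocabulary of 3 tokens, sequence of 3 tokens.
token_ids = [0, 1, 2]
uniform_logits = [[0.0, 0.0, 0.0]] * 3  # a model that knows nothing
uniform_loss = causal_lm_loss(uniform_logits, token_ids)

# A model that puts almost all of its mass on the correct next token.
confident_logits = [[0.0, 20.0, 0.0], [0.0, 0.0, 20.0], [0.0, 0.0, 0.0]]
confident_loss = causal_lm_loss(confident_logits, token_ids)
```

With uniform scores the loss equals the log of the vocabulary size, and it approaches zero as the model concentrates probability on the correct next token. Fine-tuning on domain text minimizes this same quantity, only on domain data.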

B) Example: Stack Exchange Dataset

The StackExchange dataset is enormous (over 10 million instructions), so we can easily train the language model on a subset of it.

In this case we want the model to answer questions. For other use cases we might want it to follow instructions, in which case instruction tuning is a great idea.
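One way to draw a fixed-size training subset from a dataset too large to load at once is reservoir sampling. This is a minimal sketch under assumptions: the `sample_subset` helper, the record fields `question` and `answer`, and the synthetic stream are all hypothetical, and the notes do not say how the subset was actually chosen.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def sample_subset(records: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniformly sample up to k records from a stream in one pass (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


# Hypothetical stand-in for a stream of question/answer records.
stream = ({"question": f"q{i}", "answer": f"a{i}"} for i in range(10_000))
subset = sample_subset(stream, k=100)
```

Because the records are consumed one at a time, memory use depends only on `k`, not on the size of the full dataset.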

D) References

StackLLaMA: A hands-on guide to train LLaMA with RLHF
