DistributedDataParallel


A batch is sent to each GPU worker, and each worker holds its own copy of the model. Every worker computes gradients on its own batch; the gradients are then aggregated (averaged across workers) so that each copy of the model receives the same update.
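The flow described above can be sketched in plain Python, without torch. This is a toy simulation, not the DDP API: the model `y = w * x`, the helper names `local_gradient` and `data_parallel_step`, and the two-shard split are all assumptions made for illustration. The averaging step stands in for the all-reduce that DDP performs between GPUs.

```python
# Toy simulation of data-parallel training (plain Python, no torch required).
# Assumed model: y = w * x, assumed loss: mean squared error.

def local_gradient(w, shard):
    """Gradient of the mean squared error w.r.t. w on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, shards, lr):
    """One step: each worker computes a gradient, then gradients are averaged."""
    grads = [local_gradient(w, shard) for shard in shards]  # one per worker
    avg_grad = sum(grads) / len(grads)  # stands in for the all-reduce
    # Every replica applies the same averaged gradient, so they stay identical.
    return [w - lr * avg_grad for _ in shards]


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[0:2], data[2:4]]  # two equal-sized shards, one per simulated GPU
replicas = data_parallel_step(0.0, shards, lr=0.01)
full_batch_grad = local_gradient(0.0, data)
```

Because the shards are the same size, the averaged gradient matches the gradient of the full batch, which is why the replicas behave like a single model trained on a larger batch.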

B) `local_rank`

torch.cuda.set_device(args.local_rank)

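The keyword local means the rank is counted within a single machine (node).
- When the job spans several machines, each process also has a global rank that is unique across the whole job.
rank can be thought of as the ID of a worker process, typically one process per GPU.
- The package is designed around multi-machine setups, so a rank is still expected on one machine. With a single process on a single machine it is simply 0; with several GPUs on one machine, the launcher assigns each process a local rank from 0 to N-1.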
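The distinction between local and global rank can be sketched as follows. The helper names `resolve_local_rank` and `global_rank` are hypothetical, and the node-major numbering in `global_rank` is an assumption about how the launcher lays out processes. The `LOCAL_RANK` environment variable is the one `torchrun` sets for each worker process.

```python
import os


def resolve_local_rank(cli_local_rank=None, env=None):
    """Return this process's index on the current machine (hypothetical helper)."""
    env = os.environ if env is None else env
    if "LOCAL_RANK" in env:  # set by torchrun for each worker process
        return int(env["LOCAL_RANK"])
    if cli_local_rank is not None:  # e.g. args.local_rank parsed by argparse
        return int(cli_local_rank)
    return 0  # single process on a single machine


def global_rank(node_rank, nproc_per_node, local_rank):
    """Global rank under a node-major layout (an assumption, not a guarantee)."""
    return node_rank * nproc_per_node + local_rank
```

With this in place, the call from the note could be written as `torch.cuda.set_device(resolve_local_rank(args.local_rank))`. For example, on the second of two machines with 4 GPUs each, local rank 2 would correspond to global rank 6 under the assumed layout.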

D) References

[PYTORCH] DistributedDataParallel이란? - Nvidia APEX로 구현하기 :: PAINTYCODE

