policy

state-value function

...e function v_π(s), as it is called. It is the expected discounted return obtained when starting from state s. In an MDP, this function is defined with respect to a policy. Since the policy determines the transition probability matrix, changing the policy ultimately means ending up with a different state-value function...
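As a rough illustration of the point that a different policy gives a different state-value function, here is a minimal sketch using iterative policy evaluation. The two-state MDP, its rewards, the discount factor, and names such as `evaluate_policy`, `always_stay`, and `move_then_stay` are all hypothetical and chosen only for this example; they do not come from the original note.

```python
# Hypothetical 2-state, 2-action MDP; every number here is made up for illustration.
GAMMA = 0.9  # assumed discount factor

# P[s][a] = list of (next_state, probability)
P = {
    0: {"stay": [(0, 1.0)], "move": [(1, 1.0)]},
    1: {"stay": [(1, 1.0)], "move": [(0, 1.0)]},
}
# R[s][a] = expected immediate reward for taking action a in state s
R = {
    0: {"stay": 0.0, "move": 1.0},
    1: {"stay": 2.0, "move": 0.0},
}


def evaluate_policy(policy, gamma=GAMMA, tol=1e-10):
    """Approximate v_pi by repeating the Bellman expectation backup until it stops changing."""
    v = {s: 0.0 for s in P}
    while True:
        new_v = {}
        for s in P:
            total = 0.0
            # policy[s] maps each action to the probability of choosing it in state s
            for a, action_prob in policy[s].items():
                expected_next = sum(p * v[s2] for s2, p in P[s][a])
                total += action_prob * (R[s][a] + gamma * expected_next)
            new_v[s] = total
        if max(abs(new_v[s] - v[s]) for s in P) < tol:
            return new_v
        v = new_v


# Two deterministic policies over the same MDP
always_stay = {0: {"stay": 1.0}, 1: {"stay": 1.0}}
move_then_stay = {0: {"move": 1.0}, 1: {"stay": 1.0}}

v_stay = evaluate_policy(always_stay)
v_move = evaluate_policy(move_then_stay)
print(v_stay)
print(v_move)
```

Under these assumed numbers, state 0 is worth about 0 under `always_stay` and about 19 under `move_then_stay`, while state 1 is worth about 20 under both. The MDP itself is unchanged; only the policy, and therefore the induced transition behaviour, differs.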

Zzong's Notes
