A) Abstract
Problem addressed: efficient exploration in reinforcement learning
Randomized value functions offer a promising approach to efficient exploration with generalization, but existing algorithms are not compatible with nonlinearly parameterized value functions.
B) Introduction
Randomized value functions can implement something similar to Thompson sampling without the need for an intractable exact posterior update.
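A toy illustration of that idea, not the paper's algorithm: for a single scalar value with a Gaussian prior and Gaussian observation noise, fitting a regularized least-squares estimate to randomly perturbed data gives a draw whose distribution matches the exact posterior. The exact posterior is tractable here, so it is used only as a check. The function names, the data, and the zero-mean prior are all assumptions made for this sketch.

```python
import random
import statistics


def exact_posterior(ys, noise_var, prior_var):
    """Closed-form posterior for a scalar mean with prior N(0, prior_var)."""
    precision = len(ys) / noise_var + 1.0 / prior_var
    mean = (sum(ys) / noise_var) / precision
    return mean, 1.0 / precision


def randomized_estimate(ys, noise_var, prior_var, rng):
    """One approximate posterior sample with no explicit posterior update.

    Perturb each observation with fresh noise, regularize toward a random
    draw from the prior, and solve the resulting least-squares problem.
    """
    prior_draw = rng.gauss(0.0, prior_var ** 0.5)
    perturbed = [y + rng.gauss(0.0, noise_var ** 0.5) for y in ys]
    precision = len(ys) / noise_var + 1.0 / prior_var
    return (sum(perturbed) / noise_var + prior_draw / prior_var) / precision


# Hypothetical observations of one state-action value.
rng = random.Random(0)
ys = [1.2, 0.7, 1.9, 1.1]
noise_var, prior_var = 1.0, 4.0

samples = [randomized_estimate(ys, noise_var, prior_var, rng) for _ in range(20000)]
post_mean, post_var = exact_posterior(ys, noise_var, prior_var)

print("exact posterior mean/var:", post_mean, post_var)
print("randomized fit mean/var: ", statistics.mean(samples), statistics.variance(samples))
```

The randomized fit only needs a regression solver, so the same recipe (perturb the targets, regularize toward a prior draw, refit) can be attempted where no closed-form posterior exists. Acting greedily with respect to one such draw plays the role of a Thompson sample.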