Paper link: https://arxiv.org/pdf/1205.2606.pdf

Exploring Compact Reinforcement-learning Representations with Linear Regression

KWIK Linear Regression
- KWIK (Knows What It Knows) is a framework for studying supervised learning algorithms and was designed to unify the analysis of model-based reinforcement-learning algorithms.
- Formally, a KWIK learner operates over an input space $X$ and an output space $Y$. At every time step $t$, an input $x_{t} \in X$ is chosen and presented to the learner.
- If the learner can make an accurate prediction on this input, it outputs a prediction $\hat{y}_{t}$ of the true value $y_{t}$; otherwise it must admit it does not know by returning $⊥$ (“I don’t know”), which allows it to see the true $y_{t}$ or a noisy version $z_{t}$.
  - $z_{t} \in R$ and $y_{t} \in R$
- An algorithm is said to be KWIK if and only if, with probability at least $1 - δ$, every prediction $\hat{y}_{t}$ it makes satisfies $∥ \hat{y}_{t} - y_{t} ∥ < ϵ$, and the number of $⊥$s returned over the agent’s lifetime is bounded by a polynomial function of the size of the input problem.
- One of the first uses of the KWIK framework was in the analysis of an online linear regression algorithm used to learn linear transitions in continuous-state MDPs.
  - This algorithm uses the least squares estimate of the weight vector for inputs where the output is known with high certainty.
    - Certainty is measured by two terms representing (1) the number and proximity of previous samples to the current point and (2) the appropriateness of the previous samples for making a least squares estimate.
  - When certainty is low for either measure, the algorithm reports $⊥$ .
- Some notation
  - Let $X := \{x \in R^{n} ∣ ∥ x ∥ \leq 1\}$, and let $f : X \to R$ be a linear function with slope $θ^{*} \in R^{n}$, $∥ θ^{*} ∥ \leq M$, i.e. $f(x) := x^{T} θ^{*}$.
  - Fix a timestep $t$ .
  - For each $i \in \{1, \dots, t\}$, denote the stored samples by $x_{i}$, their (unknown) expected values by $y_{i} := x_{i}^{T} θ^{*}$, and their observed values by $z_{i} := x_{i}^{T} θ^{*} + η_{i}$,
    - where the noise $η_{i}$ is assumed to form a martingale difference sequence, i.e. $E (η_{i} ∣ η_{1}, \dots, η_{i - 1}) = 0$, and to be bounded: $∣ η_{i} ∣ \leq S$.
  - Define the matrix $D_{t} := [x_{1}, x_{2}, \dots, x_{t}]^{T} \in R^{t \times n}$ and the vectors $y_{t} := [y_{1}; \dots; y_{t}] \in R^{t}$ and $z_{t} := [z_{1}; \dots; z_{t}] \in R^{t}$, and let $I$ be the $n \times n$ identity matrix. From here on, $y_{t}$ and $z_{t}$ denote these stacked vectors rather than the scalar values at step $t$.
- Suppose that a new query $x$ arrives. If we were able to solve the linear regression problem $D_{t} θ = z_{t}$ , then we could predict $y = x^{T} θ$ , where $θ$ is the least-squares solution to the system.
  - However, solving this system directly is problematic because
    1. If $D_{t}$ is rank-deficient the least-squares solution may not be unique.
    2. Even if we have a solution, we have no information on its confidence.
  - We can avoid the first problem by regularization, i.e. by augmenting the system with $I θ = v$, where $v$ is some arbitrary vector. Regularization certainly distorts the solution, but this gives us a measure of confidence: if the distortion is large, the predictor should have low confidence and output $⊥$. If the distortion is small, on the other hand, two important consequences follow. First, the choice of $v$ has little effect. Second, the fluctuations caused by using the observed values $z_{t}$ instead of the expected values $y_{t}$ are also minor.
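To make the effect of regularization concrete, the augmented system can be written out and solved in closed form. This is standard least-squares algebra applied to the notation above (with $v$ the arbitrary regularization vector), offered as a worked step rather than a formula quoted from the paper:

$$
\begin{bmatrix} D_{t} \\ I \end{bmatrix} θ = \begin{bmatrix} z_{t} \\ v \end{bmatrix}
\quad \Longrightarrow \quad
\hat{θ} = (D_{t}^{T} D_{t} + I)^{-1} (D_{t}^{T} z_{t} + v).
$$

Because $D_{t}^{T} D_{t} + I$ is positive definite, the solution is unique even when $D_{t}$ is rank-deficient. The prediction for a query $x$ is $x^{T} \hat{θ}$, and the part of it that depends on $v$ is $x^{T} (D_{t}^{T} D_{t} + I)^{-1} v$, whose magnitude is at most $∥ (D_{t}^{T} D_{t} + I)^{-1} x ∥ \, ∥ v ∥$. When that norm is small, the choice of $v$ barely moves the prediction.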

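The predict-or-$⊥$ loop described above can be sketched in a few lines of Python. This is a simplified illustration, not the paper's exact algorithm: it fixes the regularization vector to zero, so the estimate is $(I + D_{t}^{T} D_{t})^{-1} D_{t}^{T} z_{t}$, and it uses two sensitivity norms as the confidence test. The class name `KwikLinearRegressionSketch` and the thresholds `alpha_v` and `alpha_noise` are hypothetical choices made for this example and are not derived from the paper's bounds.

```python
import numpy as np

BOTTOM = None  # stands in for the "I don't know" output


class KwikLinearRegressionSketch:
    """Simplified KWIK-style online regressor (illustrative only)."""

    def __init__(self, n, alpha_v=0.1, alpha_noise=0.2):
        self.A = np.eye(n)    # I + D^T D, updated one sample at a time
        self.b = np.zeros(n)  # D^T z (regularization vector assumed to be 0)
        self.alpha_v = alpha_v          # hypothetical threshold
        self.alpha_noise = alpha_noise  # hypothetical threshold

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        w = np.linalg.solve(self.A, x)  # (I + D^T D)^{-1} x
        # How strongly the prediction would react to the regularization vector.
        v_sensitivity = np.linalg.norm(w)
        # How strongly it reacts to noise in z: ||D w||, using D^T D = A - I.
        gram = self.A - np.eye(len(x))
        noise_sensitivity = np.sqrt(max(w @ gram @ w, 0.0))
        if v_sensitivity > self.alpha_v or noise_sensitivity > self.alpha_noise:
            return BOTTOM
        return float(w @ self.b)  # x^T (I + D^T D)^{-1} D^T z

    def update(self, x, z):
        """Store a labelled sample; called only after returning BOTTOM."""
        x = np.asarray(x, dtype=float)
        self.A += np.outer(x, x)
        self.b += z * x


# Usage: query the learner, and reveal a noisy label only when it says BOTTOM.
theta_star = np.array([0.5, -0.3])  # made-up true weights for the demo
rng = np.random.default_rng(0)
learner = KwikLinearRegressionSketch(n=2)
for _ in range(200):
    x = rng.normal(size=2)
    x /= max(1.0, np.linalg.norm(x))  # keep ||x|| <= 1
    if learner.predict(x) is BOTTOM:
        learner.update(x, x @ theta_star + rng.uniform(-0.05, 0.05))
```

Both tests start out failing (with no data, the first norm equals $∥ x ∥$), so the learner initially answers $⊥$ and begins predicting only in directions where enough nearby samples have accumulated.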