Maximum a Posteriori Probability

A MAP (maximum a posteriori probability) estimate is an estimate of an unknown quantity that equals the mode of the posterior distribution.

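$$\theta_{MAP} = \underset{\theta}{\arg\max} \left( \log g(\theta) + \sum_{i=1}^{n} \log f(X_{i} \mid \theta) \right)$$

Here $g(\theta)$ is the prior density and $f(X_{i} \mid \theta)$ is the likelihood of the $i$-th observation.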
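The formula above can be evaluated directly by brute force. The following is a minimal sketch, not a production estimator: it assumes a one-dimensional parameter on $(0, 1)$, and the names `map_grid`, `log_prior`, `log_lik`, the data, and the grid resolution are all illustrative choices that do not come from the original note.

```python
import math

def map_grid(log_prior, log_lik, data, grid):
    """Return the grid point that maximizes log prior + summed log-likelihood."""
    def objective(theta):
        return log_prior(theta) + sum(log_lik(x, theta) for x in data)
    return max(grid, key=objective)

# Hypothetical setup: Bernoulli observations (1 = success) and an
# unnormalized Beta(2, 2)-shaped prior. Dropping the normalizing
# constant is safe because it does not change the argmax.
def log_prior(theta):
    return math.log(theta) + math.log(1.0 - theta)

def log_lik(x, theta):
    return math.log(theta) if x == 1 else math.log(1.0 - theta)

data = [1, 1, 0, 1]                         # assumed toy data: 3 successes, 1 failure
grid = [i / 1000 for i in range(1, 1000)]   # open interval (0, 1), step 0.001
theta_map = map_grid(log_prior, log_lik, data, grid)
print(theta_map)                            # close to the analytic mode 4/6
```

A grid search is only practical for low-dimensional parameters, but it makes the structure of the objective (log prior plus a sum of per-observation log-likelihoods) explicit.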

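Comparing MLE and MAP

Similarities

MLE and MAP are both methods for estimating the parameters of statistical models. Because the resulting parameter is a single fixed value, both MLE and MAP can be regarded as point estimators.

Differences

MAP takes a prior into account when computing the parameter, whereas MLE uses only the likelihood.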

MAP Examples

Example 1) MAP Estimation for the Binomial Distribution

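Suppose the number of matches $k$ that Liverpool wins out of $n$ follows a binomial distribution with parameter $\theta$. The probability of the observed result can then be written as follows.

$$P(k \text{ wins out of } n \text{ matches} \mid \theta) = P(D \mid \theta) = \binom{n}{k} \theta^{k} (1 - \theta)^{n - k}$$

Here $D$ is the observed data. Suppose Liverpool won 30 of 38 matches.

The maximum likelihood estimate of $\theta$ is found by differentiating the likelihood function $P(D \mid \theta)$ with respect to $\theta$ and setting the derivative to zero.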

$$\frac{d P(D \mid \theta)}{d \theta} = \binom{n}{k} \left( k \theta^{k - 1} (1 - \theta)^{n - k} - (n - k) \theta^{k} (1 - \theta)^{n - k - 1} \right) = \binom{n}{k} \theta^{k - 1} (1 - \theta)^{n - k - 1} \left( k (1 - \theta) - (n - k) \theta \right) = 0$$

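Solving this equation shows that the likelihood is maximized at $\theta = \frac{k}{n}$. The derivative is also zero at $\theta = 0$ and $\theta = 1$ (for $1 < k < n - 1$), but those points are minima of the likelihood.

With $k = 30$ and $n = 38$, $\theta = 30/38 \approx 0.789$.
Maximum likelihood estimation works well when there is plenty of data, but it can give unreasonable estimates when data are scarce.
- If $k = 2, n = 2$, is the win probability really 100%?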
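The closed form $\theta = k/n$ and the small-sample problem can be checked numerically. This is a small sketch using only the standard library; the function names `binomial_likelihood` and `binomial_mle` are illustrative, not part of the original note.

```python
from math import comb

def binomial_likelihood(theta, k, n):
    """P(D | theta) for k wins out of n matches."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

def binomial_mle(k, n):
    """Closed-form maximum likelihood estimate, theta = k / n."""
    return k / n

theta_hat = binomial_mle(30, 38)
print(round(theta_hat, 3))  # 0.789

# The likelihood at k/n is higher than at nearby values of theta.
for shift in (-0.05, 0.05):
    assert binomial_likelihood(theta_hat, 30, 38) > binomial_likelihood(theta_hat + shift, 30, 38)

# With only two matches, both won, the MLE claims a certain win.
print(binomial_mle(2, 2))   # 1.0
```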
To compensate for this weakness of MLE, let us find $\theta$ using MAP.
- Assume that, based on last season, Liverpool's prior probability of winning is about 50%.
- Since the match record ($D$) is also known, both pieces of information are combined through the posterior probability $P(\theta \mid D)$.
- MAP estimation finds the $\theta$ that maximizes $P(\theta \mid D)$.
- By Bayes' theorem, $P(\theta \mid D) = \frac{P(D \mid \theta) P(\theta)}{P(D)}$.
  - Here $P(D)$ is the evidence. It does not depend on $\theta$, so it acts as a constant and can be dropped when solving for $\theta$.
- The conjugate prior of a binomial likelihood $P(D \mid \theta)$ is the beta distribution, so for computational simplicity $P(\theta)$ is set to follow a beta distribution.

$$P(\theta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}$$

Here $\alpha$ and $\beta$ are hyperparameters. They are not determined by the data; they are chosen subjectively to express prior knowledge.
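To see how the hyperparameters encode prior knowledge, the density can be evaluated for a few settings. The sketch below assumes symmetric priors ($\alpha = \beta$), which are all centered on 0.5; the helper name `beta_pdf` and the specific values compared are illustrative choices.

```python
from math import exp, lgamma, log

def beta_pdf(theta, alpha, beta):
    """Density of Beta(alpha, beta) at theta in (0, 1), computed in log space."""
    log_norm = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    return exp(log_norm + (alpha - 1) * log(theta) + (beta - 1) * log(1 - theta))

# Larger alpha = beta puts more mass near 0.5 and less near the extremes,
# i.e. it expresses a stronger prior belief in a 50% win rate.
for alpha, beta in [(1, 1), (2, 2), (10, 10)]:
    at_half = beta_pdf(0.5, alpha, beta)
    at_high = beta_pdf(0.8, alpha, beta)
    print(f"Beta({alpha}, {beta}): pdf(0.5)={at_half:.3f}, pdf(0.8)={at_high:.3f}")
```

The sum $\alpha + \beta$ behaves like a count of imaginary prior observations: the larger it is, the more data it takes to move the estimate away from the prior.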

	* Now let us compute $P(D\mid\theta)P(\theta)$, the quantity to be maximized.

		* $\begin{aligned}P(D\mid\theta)P(\theta)&=\binom{n}{k}\theta^{k}(1-\theta)^{n-k}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1}\\&=\binom{n}{k}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta^{k+\alpha-1}(1-\theta)^{n-k+\beta-1}\end{aligned}$

	* Differentiating this expression with respect to $\theta$ and setting the derivative to zero gives $\displaystyle\theta=\frac{k+\alpha-1}{n+\alpha+\beta-2}$.

	* With $\alpha=10$ and $\beta=10$, the result is $\theta=39/56\approx 0.696$.
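The closed-form estimate is easy to reproduce. The sketch below uses exact fractions so that the value 39/56 is visible; the function name `beta_binomial_map` is illustrative, and the formula is only valid when $k + \alpha > 1$ and $n - k + \beta > 1$, so that the maximum lies inside $(0, 1)$.

```python
from fractions import Fraction

def beta_binomial_map(k, n, alpha, beta):
    """MAP estimate for a binomial likelihood with a Beta(alpha, beta) prior.

    Assumes integer hyperparameters with k + alpha > 1 and n - k + beta > 1.
    """
    return Fraction(k + alpha - 1, n + alpha + beta - 2)

season = beta_binomial_map(30, 38, 10, 10)
print(season, round(float(season), 3))   # 39/56 0.696

# The small-sample case that broke MLE: two wins out of two matches.
short = beta_binomial_map(2, 2, 10, 10)
print(short, float(short))               # 11/20 0.55
```

With 38 matches the estimate is pulled from the MLE of 0.789 toward the prior's 0.5; with only two matches it stays close to the prior instead of jumping to 1.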

Example 2) MAP Estimation for the Bernoulli Distribution

When tossing a coin, if only heads or only tails have come up so far, maximum likelihood estimation produces an extreme estimate, $\theta = 0$ or $\theta = 1$.

To prevent this kind of overfitting, a beta distribution is used as the prior.

$$p(\theta) = \mathrm{Beta}(\theta \mid a, b)$$

The log-likelihood plus the log prior is then as follows.

$$LL(\theta) = \log p(D \mid \theta) + \log p(\theta) = [N_{1} \log \theta + N_{0} \log (1 - \theta)] + [(a - 1) \log \theta + (b - 1) \log (1 - \theta)]$$

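Here $N_{1}$ and $N_{0}$ are the numbers of heads and tails, respectively.
The normalizing constant of the prior (the beta function) does not depend on $\theta$, so it is omitted from the log prior.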
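As an intermediate step, the stationarity condition can be written out explicitly. This is a sketch that assumes $N_{1} + a > 1$ and $N_{0} + b > 1$, so that the maximizer lies strictly inside $(0, 1)$:

$$\begin{aligned}
\frac{d\,LL(\theta)}{d\theta} &= \frac{N_{1} + a - 1}{\theta} - \frac{N_{0} + b - 1}{1 - \theta} = 0 \\
\Rightarrow\ (N_{1} + a - 1)(1 - \theta) &= (N_{0} + b - 1)\,\theta
\end{aligned}$$

Collecting the $\theta$ terms on one side yields a closed form in $N_{1}$, $N_{0}$, $a$, and $b$.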

Solving for $\theta$ gives the MAP estimate $\theta_{MAP} = \frac{N_{1} + a - 1}{N_{1} + N_{0} + a + b - 2}$.
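The effect on a run of identical outcomes can be illustrated with a short sketch. The flip sequence and the choice $a = b = 2$ are assumptions made for illustration, as are the function names `bernoulli_mle` and `bernoulli_map`.

```python
def bernoulli_mle(flips):
    """MLE of the heads probability; flips is a list of 1 (heads) / 0 (tails)."""
    return sum(flips) / len(flips)

def bernoulli_map(flips, a, b):
    """MAP estimate under a Beta(a, b) prior, using the closed form above."""
    n1 = sum(flips)            # number of heads
    n0 = len(flips) - n1       # number of tails
    return (n1 + a - 1) / (n1 + n0 + a + b - 2)

flips = [1, 1, 1]                    # three heads in a row
print(bernoulli_mle(flips))          # 1.0, the extreme estimate
print(bernoulli_map(flips, 2, 2))    # 0.8, pulled toward 0.5 by the prior
```

Setting $a = b = 1$ gives a uniform prior, in which case the MAP estimate coincides with the MLE.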

Relationship between the Mode and MAP

The estimate by MAP is the mode of the posterior distribution.

The posterior distribution itself can be computed through Bayesian inference.
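For the binomial example with a beta prior, this relationship can be made concrete. The following sketch relies on the standard conjugacy of the beta prior with the binomial likelihood and assumes $k + \alpha > 1$ and $n - k + \beta > 1$:

$$\begin{aligned}
P(\theta \mid D) &\propto P(D \mid \theta)\,P(\theta) \propto \theta^{k+\alpha-1}(1-\theta)^{n-k+\beta-1} \\
\Rightarrow\ \theta \mid D &\sim \mathrm{Beta}(k+\alpha,\ n-k+\beta), \qquad \text{mode} = \frac{k+\alpha-1}{n+\alpha+\beta-2}
\end{aligned}$$

With $k = 30$, $n = 38$, and $\alpha = \beta = 10$, the posterior is $\mathrm{Beta}(40, 18)$, whose mode is $39/56 \approx 0.696$, the same value as the MAP estimate in Example 1.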

Related

EAP

Zzong's Notes
