차원 축소 전략 중 하나인 Linear Discriminant Analysis

LDA vs PCA
- (1) PCA
- (2) LDA
Implementation
References

LDA (Linear Discriminant Analysis) and PCA (Principal Component Analysis) are both dimensionality reduction techniques, but they serve different purposes.

LDA vs PCA

(1) PCA

PCA

PCA 는 데이터의 분산을 최대화하는 방향 (principal components) 을 찾는다.
- 쉽게 생각하면, 데이터들을 특정 축(axis, principal component)에 사영(projection)했을때, 가장 높은 분산을 가지는 축을 찾아 그곳으로 차원을 축소한다.
간단한 방식이지만, 데이터의 클래스 정보를 고려하지 않는다는 문제가 있다.
주로 데이터 분석 및 노이즈 제거를 위해 사용된다.
단점: 종종 feature 의 variance 가 낮으면서 중요한 경우가 있는데, PCA 는 이를 고려하는데 어려움을 겪는다.

(2) LDA

LDA 는 데이터의 class label 을 고려하는 supervised 방식이다.
데이터의 여러 클래스 간 분리 정도(separation)를 최대화하는 방향을 찾는다.
LDA 는 주로 클래스 정보를 보존하기 위해서 분류 task 를 해결할 때 사용된다.
단점: 클래스 분포가 불균형적인 경우, 특정 클래스에 편향(bias)된 결과를 내기 쉽다.

Implementation

sklearn 에 이미 구현된 함수가 있다.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

clf = LinearDiscriminantAnalysis()
clf.fit(X, y)

print(clf.predict([[-0.8, -1]])) # 1

LDA vs PCA

(1) PCA

(2) LDA

Implementation

References

Enjoy Reading This Article?