Support Vector Machine

This post summarizes Chapter 9, Support Vector Machines, of An Introduction to Statistical Learning.

SVM: a classification approach that generalizes the maximal margin classifier

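Maximal margin classifier: requires the classes to be separable by a linear boundary, so it cannot be applied to most data sets
Support vector classifier: an extension of the MMC that can handle a broader range of data
Support vector machine: an extension of the SVC that can also handle non-linear class boundaries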

9.1 Maximal Margin Classifier

9.1.1 What is a Hyperplane?

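Hyperplane in p-dimensional space: a flat affine subspace of dimension p-1
- the set of points X = (X1, ..., Xp)^T that satisfy the single linear equation β0 + β1X1 + ... + βpXp = 0 forms a (p-1)-dimensional space (a line when p = 2, a plane when p = 3)
- affine: the subspace need not pass through the origin
⇒ A hyperplane divides p-dimensional space into two halves: points with β0 + β1X1 + ... + βpXp > 0 lie on one side, and points with β0 + β1X1 + ... + βpXp < 0 lie on the other.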

9.1.2 Classification Using a Separating Hyperplane

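Data: an n × p data matrix X, i.e., n training observations x1, ..., xn with xi = (xi1, ..., xip)^T, and class labels y1, ..., yn ∈ {-1, 1}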

Methods that can be used to classify these data into two classes:
- LDA
- logistic regression
- separating hyperplane
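  - if f(xi) = β0 + β1xi1 + ... + βpxip > 0, then yi = 1
  - if f(xi) < 0, then yi = -1
  - Whenever a separating hyperplane exists, it gives a very natural classifier: a test observation x* is assigned a class depending on which side of the hyperplane it falls on, i.e., on the sign of f(x*).
  - The magnitude of f(x*) can also be used: the larger |f(x*)|, the farther x* lies from the hyperplane, so we can be more confident about its class assignment.
  - Conversely, if |f(x*)| is small (close to 0), x* lies near the hyperplane and we are less certain about its class assignment.
  ⇒ A classifier based on a separating hyperplane has a linear decision boundary.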
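The rule above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions. The coefficients `beta0` and `beta` are hypothetical values chosen for illustration and are not fitted to any data. The function returns the predicted class, which is the sign of f(x*), together with |f(x*)| as a rough confidence score. |f(x*)| equals the distance to the hyperplane only when the β's have unit norm; otherwise it is proportional to that distance.

```python
def f(x, beta0, beta):
    """Evaluate f(x) = beta0 + beta1*x1 + ... + betap*xp."""
    return beta0 + sum(b * xj for b, xj in zip(beta, x))


def classify(x, beta0, beta):
    """Predict the class from the sign of f(x) and report |f(x)| as a confidence score."""
    value = f(x, beta0, beta)
    label = 1 if value > 0 else -1  # a point exactly on the hyperplane defaults to -1 here
    return label, abs(value)


# Hypothetical hyperplane in p = 2 dimensions: 1 + 2*X1 + 3*X2 = 0
beta0, beta = 1.0, [2.0, 3.0]
print(classify([1.0, 1.0], beta0, beta))    # (1, 6.0): far from the hyperplane
print(classify([-0.25, 0.0], beta0, beta))  # (1, 0.5): close to the hyperplane, low confidence
print(classify([-1.0, -0.5], beta0, beta))  # (-1, 2.5)
```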

9.1.3 The Maximal Margin Classifier

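If a hyperplane that separates the two classes perfectly exists, infinitely many of them can be constructed.
⇒ Why? A separating hyperplane can be shifted slightly or rotated without touching any observation, and each such move gives a different separating hyperplane.

The natural choice among them: the maximal margin hyperplane (MM hyperplane, also called the optimal separating hyperplane)
- the separating hyperplane with the largest margin, where the margin is the smallest perpendicular distance from the training observations to the hyperplane
  - it can overfit when p is large
- support vectors: the observations in p-dimensional space that are equidistant from the maximal margin hyperplane and lie on the edge of the margin
  - the MM hyperplane depends directly on the support vectors and only on them: moving any other observation, without letting it cross the margin, leaves the hyperplane unchanged
  - the hyperplane can therefore be defined by the few observations that are support vectors; this idea carries over later to the support vector classifier and the support vector machine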
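The margin and the support vectors of a given separating hyperplane can be computed directly from the definitions above. This minimal sketch does not search for the maximal margin hyperplane; it only evaluates one candidate. The toy data, the candidate hyperplane X1 + X2 = 0, and the tolerance `tol` used to detect ties are hypothetical choices made for illustration.

```python
import math


def distances(X, beta0, beta):
    """Perpendicular distance from each observation to the hyperplane beta0 + beta . x = 0."""
    norm = math.sqrt(sum(b * b for b in beta))
    return [abs(beta0 + sum(b * xj for b, xj in zip(beta, x))) / norm for x in X]


def margin_and_support_vectors(X, beta0, beta, tol=1e-9):
    """Margin = smallest distance; support vectors = indices of the observations attaining it."""
    d = distances(X, beta0, beta)
    margin = min(d)
    support = [i for i, di in enumerate(d) if abs(di - margin) <= tol]
    return margin, support


# Toy separable data (hypothetical): the first three observations are class +1, the last two class -1
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, -3.0]]
margin, support = margin_and_support_vectors(X, 0.0, [1.0, 1.0])  # hyperplane X1 + X2 = 0
print(margin, support)  # margin = sqrt(2) ~ 1.414, support vectors at indices [0, 3]
```

Moving observation 1, 2 or 4 farther from the hyperplane leaves both outputs unchanged. Moving observation 0 or 3 changes the margin.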

9.1.4 Construction of the Maximal Margin Classifier

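The maximal margin hyperplane is the solution to the following optimization problem, which is the exact definition of the MM hyperplane:

$$\underset{\beta_0,\beta_1,\ldots,\beta_p,\,M}{\text{maximize}}\; M \qquad (9.9)$$
$$\text{subject to } \sum_{j=1}^{p}\beta_j^2 = 1, \qquad (9.10)$$
$$y_i(\beta_0+\beta_1x_{i1}+\cdots+\beta_px_{ip}) \ge M \quad \forall\, i=1,\ldots,n. \qquad (9.11)$$

- (9.10) is not really a constraint on the hyperplane. If β0 + β1X1 + ... + βpXp = 0 defines a hyperplane, then so does k(β0 + β1X1 + ... + βpXp) = 0 for any k ≠ 0. Fixing the scale this way makes yi(β0 + β1xi1 + ... + βpxip) equal to the perpendicular distance from the ith observation to the hyperplane.
- (9.11) guarantees that every observation is on the correct side of the hyperplane, provided M is positive.
- Taken together, (9.10) and (9.11) ensure that each observation is on the correct side of the hyperplane and at least a distance M from it.

Under these constraints, every training observation is classified correctly by the maximal margin classifier.
- M: the margin of the hyperplane
- Optimization problem: choose β0, β1, ..., βp to maximize M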
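For p = 2 the problem (9.9)-(9.11) can be approximated by brute force, which makes the roles of the constraints concrete. This is an illustrative sketch, not the quadratic-programming solver used in practice, and the grid size `n_angles` is an arbitrary assumption. Each direction β = (cos t, sin t) has unit norm, so (9.10) holds. For a fixed direction, the best intercept places the hyperplane midway between the closest projections of the two classes, a and b, which gives M = (a - b) / 2. A result with M <= 0 means that no separating hyperplane was found, which is the non-separable case discussed next.

```python
import math


def max_margin_2d(X, y, n_angles=3600):
    """Approximate the maximal margin hyperplane for 2-D data by scanning unit-norm directions."""
    best = (-math.inf, None, None)
    for k in range(n_angles):
        t = 2 * math.pi * k / n_angles
        beta = (math.cos(t), math.sin(t))            # unit norm, so constraint (9.10) holds
        proj = [beta[0] * x1 + beta[1] * x2 for x1, x2 in X]
        a = min(p for p, yi in zip(proj, y) if yi == 1)    # closest +1 observation
        b = max(p for p, yi in zip(proj, y) if yi == -1)   # closest -1 observation
        M = (a - b) / 2                              # largest M satisfying (9.11) for this direction
        if M > best[0]:
            best = (M, -(a + b) / 2, beta)           # intercept halfway between the two classes
    return best  # (M, beta0, (beta1, beta2)); M <= 0 means the data are not separable


X = [[1.0, 1.0], [2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1]
M, beta0, beta = max_margin_2d(X, y)
print(M, beta0, beta)  # M ~ sqrt(2), beta0 ~ 0, beta ~ (0.707, 0.707)
```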

9.1.5 The Non-separable Case

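The maximal margin classifier is a very natural way to perform classification, if a separating hyperplane exists.
The problem is that it requires all observations to be separated by a single hyperplane. In many cases no such hyperplane exists, and the optimization problem then has no solution with M > 0.
⇒ Introduce a soft margin: instead of separating all observations, find a hyperplane that almost separates the classes and allows some violations (next section).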

9.2 Support Vector Classifiers

9.2.1 Overview of the Support Vector Classifier

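In real data, it is far more common that the two classes cannot be separated exactly.
Even when they can, the maximal margin classifier is sensitive to individual observations: adding a single observation can change the hyperplane dramatically.
The result can be a hyperplane that separates the training data perfectly but has a tiny margin. Because distance from the hyperplane measures our confidence in a classification, such a model is sensitive to changes in the data and more likely to overfit.
The problem is therefore changed to finding a hyperplane that is optimal rather than perfect, one that does not separate the classes perfectly, in the interest of
- greater robustness to individual observations
- better classification of most of the training observations
⇒ Misclassifying a few training observations can be worthwhile in exchange for the largest possible margin for the rest.

⇒ SVC, also called the soft margin classifier ("soft" because the margin can be violated)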

Wrong side of the margin: the observation lies closer to the hyperplane than the margin does.
Wrong side of the hyperplane: the observation is assigned to the wrong class.

9.2.2 Details of the Support Vector Classifier

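The SVC classifies a test observation depending on which side of a hyperplane it lies. The hyperplane is the solution to the optimization problem (9.12)-(9.15):

$$\underset{\beta_0,\ldots,\beta_p,\,\epsilon_1,\ldots,\epsilon_n,\,M}{\text{maximize}}\; M \qquad (9.12)$$
$$\text{subject to } \sum_{j=1}^{p}\beta_j^2 = 1, \qquad (9.13)$$
$$y_i(\beta_0+\beta_1x_{i1}+\cdots+\beta_px_{ip}) \ge M(1-\epsilon_i), \qquad (9.14)$$
$$\epsilon_i \ge 0, \quad \sum_{i=1}^{n}\epsilon_i \le C. \qquad (9.15)$$

C: nonnegative tuning parameter
M: width of the margin
ε1, ..., εn: slack variables, which allow individual observations to be on the wrong side of the margin or of the hyperplane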

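Details on the additional parameter and variables
- slack variable εi: tells where the ith observation is located relative to the hyperplane and the margin
  - εi = 0: correct side of the margin
  - εi > 0: wrong side of the margin (the observation has violated the margin)
  - εi > 1: wrong side of the hyperplane
- tuning parameter C: bounds the sum of the εi, so it determines the number and severity of the violations to the margin (and to the hyperplane) that will be tolerated
  - a budget for the amount by which the n observations can violate the margin
  - C = 0: no budget for violations → all εi = 0 → the problem reduces to the maximal margin hyperplane
  - C > 0: no more than C observations can be on the wrong side of the hyperplane, because each such observation has εi > 1
  - in practice C is chosen by cross-validation, and it controls the bias-variance trade-off
    - C small: narrow margins that are rarely violated; the classifier fits the data closely, with low bias but high variance
    - C large: a wider margin and more violations allowed; the classifier is more biased but has lower variance
    - the fitted support vector classifier therefore differs depending on the value of C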
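For a fixed hyperplane and margin M, the smallest slack values that satisfy (9.14) are εi = max(0, 1 - yi f(xi) / M), with β rescaled so that (9.13) holds. This minimal sketch computes them, labels each observation using the cases listed above, and checks the budget constraint from (9.15). The hyperplane X1 = 0, the margin M = 1, the budget C = 2, and the data are hypothetical values. In a real SVC, all of these are chosen jointly by the optimizer.

```python
import math


def slack_variables(X, y, beta0, beta, M):
    """Smallest slack values satisfying (9.14) for a fixed hyperplane and margin M."""
    norm = math.sqrt(sum(b * b for b in beta))
    beta0, beta = beta0 / norm, [b / norm for b in beta]  # rescale so that (9.13) holds
    eps = []
    for x, yi in zip(X, y):
        signed_dist = yi * (beta0 + sum(b * xj for b, xj in zip(beta, x)))
        eps.append(max(0.0, 1.0 - signed_dist / M))
    return eps


def describe(e):
    """Translate a slack value into the observation's position."""
    if e == 0:
        return "correct side of the margin"
    if e <= 1:
        return "wrong side of the margin"
    return "wrong side of the hyperplane"


X = [[2.0, 0.0], [0.5, 0.0], [-0.5, 0.0], [-3.0, 1.0]]
y = [1, 1, 1, -1]
eps = slack_variables(X, y, 0.0, [1.0, 0.0], M=1.0)  # hypothetical hyperplane X1 = 0, margin 1
print(eps)                          # [0.0, 0.5, 1.5, 0.0]
print([describe(e) for e in eps])
C = 2.0
print(sum(eps) <= C)                # True: these violations fit within the budget C = 2
```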

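SVC summary
- Only observations that lie directly on the margin or on the wrong side of it affect the hyperplane.
- ⇒ Observations that lie strictly on the correct side of the margin do not affect the hyperplane at all. ⇒ The observations on the margin or on the wrong side of it are the support vectors, hence the name support vector classifier.
- The SVC depends only on a small subset of the training observations, so it is quite robust to the behavior of observations far from the hyperplane. The fact that only part of the data determines the hyperplane distinguishes the SVC from other classification methods:
  - LDA: depends on the mean of all observations within each class, so every observation in the data affects the classifier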

9.3 Support Vector Machines

→ Converts a linear classifier into one that produces non-linear decision boundaries, in an automatic way

9.3.1 Classification with Non-linear Decision Boundaries

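The methods covered so far work when the boundary between the classes is linear.

→ What if a linear boundary does not fit the problem, i.e., the class boundary is non-linear?

Solution: enlarge the feature space
- use quadratic, cubic, or even higher-order polynomial functions of the predictors; for example, fit an SVC on the 2p features X1, X1², X2, X2², ..., Xp, Xp² instead of X1, ..., Xp
- the decision boundary is linear in the enlarged feature space but non-linear in the original feature space (with quadratic features, it is the solution set of a quadratic equation)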
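A small example shows how a boundary that is linear in the enlarged space becomes non-linear in the original space. In this sketch, the hyperplane coefficients `BETA0` and `BETA` are chosen by hand for illustration rather than fitted. In the four features (X1, X1², X2, X2²) they define a hyperplane. In the original (X1, X2) space, the same boundary is the circle X1² + X2² = 1.

```python
def enlarge(x):
    """Map (X1, ..., Xp) to the 2p features (X1, X1^2, ..., Xp, Xp^2)."""
    z = []
    for xj in x:
        z.extend([xj, xj * xj])
    return z


# Hypothetical hyperplane in the enlarged space (feature order X1, X1^2, X2, X2^2):
# -1 + 0*X1 + 1*X1^2 + 0*X2 + 1*X2^2 = 0, i.e. the circle X1^2 + X2^2 = 1 in the original space
BETA0 = -1.0
BETA = [0.0, 1.0, 0.0, 1.0]


def f(x):
    """Linear in the enlarged features, non-linear in the original x."""
    return BETA0 + sum(b * zj for b, zj in zip(BETA, enlarge(x)))


# (0, 0) lies inside the convex hull of the four outside points,
# so no line in (X1, X2) can separate the two classes
points = [(0.0, 0.0), (0.5, 0.5), (-0.5, 0.0), (2.0, 0.0), (-2.0, 0.0), (0.0, 2.0), (0.0, -2.0)]
labels = [-1, -1, -1, 1, 1, 1, 1]  # inside vs outside the unit circle
predictions = [1 if f(x) > 0 else -1 for x in points]
print(predictions == labels)  # True
```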

9.3.2 The Support Vector Machine

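An extension of the SVC that enlarges the feature space in a specific way: by using kernels.
The solution of the SVC problem involves the observations only through their inner products ⟨xi, xi'⟩ = Σj xij xi'j. If each inner product is replaced with a kernel K(xi, xi'), the class boundary becomes non-linear without the enlarged features ever being computed explicitly. Only K(xi, xi') for the n(n-1)/2 distinct pairs of observations is needed, which is a computational advantage when the enlarged feature space is very large. For the radial kernel, that space is even infinite-dimensional.

Support vector classifier combined with a non-linear kernel → support vector machine
- Using a kernel generalizes the SVC: f(x) = β0 + Σ_{i∈S} αi K(x, xi), where S is the set of support vectors.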

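Using different kernels, i.e., different functions that quantify the similarity of two observations, gives differently shaped boundaries:
- Linear kernel: K(xi, xi') = Σj xij xi'j, which gives back the SVC (linear boundary)
- Polynomial kernel of degree d: K(xi, xi') = (1 + Σj xij xi'j)^d
- Gaussian radial basis function (rbf): K(xi, xi') = exp(-γ Σj (xij - xi'j)²), with γ > 0
- Sigmoid: K(xi, xi') = tanh(γ Σj xij xi'j + r)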
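The kernels listed above and the SVM decision function f(x) = β0 + Σ_{i∈S} αi K(x, xi) can be written as plain functions. This is a minimal sketch. The parameter names `gamma`, `r` and `d`, and the α values in the usage example, are illustrative assumptions and not fitted values. The explicit map `phi_poly2` checks the kernel trick for p = 2 and d = 2: the polynomial kernel equals an ordinary inner product in a 6-dimensional enlarged feature space that never has to be built.

```python
import math


def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, d=2):
    return (1.0 + linear_kernel(x, z)) ** d


def rbf_kernel(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def sigmoid_kernel(x, z, gamma=1.0, r=0.0):
    return math.tanh(gamma * linear_kernel(x, z) + r)


def phi_poly2(x):
    """Explicit feature map for p = 2 with (1 + <x, z>)^2 == <phi(x), phi(z)>."""
    x1, x2 = x
    s = math.sqrt(2.0)
    return [1.0, s * x1, s * x2, x1 * x1, x2 * x2, s * x1 * x2]


def decision_function(x, beta0, alphas, support_X, kernel):
    """f(x) = beta0 + sum over support vectors of alpha_i * K(x, x_i)."""
    return beta0 + sum(a * kernel(x, xi) for a, xi in zip(alphas, support_X))


x, z = [1.0, 2.0], [3.0, -1.0]
print(polynomial_kernel(x, z))                              # 4.0
print(linear_kernel(phi_poly2(x), phi_poly2(z)))            # ~4.0, same value via the explicit map
print(decision_function([0.0, 0.0], -0.5, [1.0, -1.0],
                        [[1.0, 1.0], [-1.0, -1.0]], rbf_kernel))  # -0.5 (the two kernel terms cancel)
```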

9.3.3 An Application to the Heart Disease Data

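→ More flexible models (e.g., the radial kernel with a larger γ) achieve lower training error.

→ However, a more flexible model does not always guarantee better performance on test data.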

9.4 SVMs with More than Two Classes

9.4.1 One-Versus-One Classification

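Build a classifier for every pair of classes: with K classes, that is (K choose 2) = K(K-1)/2 SVMs.
Prediction: classify the test observation with each of the K(K-1)/2 classifiers and assign it to the class it was most frequently assigned to (majority vote).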
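The voting step can be sketched independently of how each pairwise SVM is fitted. In this minimal sketch, each pairwise classifier is a hypothetical stand-in: three classes have centers on a line, and each classifier splits at the midpoint between its two class centers. The names `CENTERS`, `pairwise_classifier` and `predict_ovo` are illustrative and do not come from any library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical 1-D setup: each pairwise "SVM" splits at the midpoint of two class centers
CENTERS = {"A": 0.0, "B": 5.0, "C": 10.0}


def pairwise_classifier(k, l):
    """Decision function for the pair (k, l): a positive value means class l, negative means class k."""
    mid = (CENTERS[k] + CENTERS[l]) / 2
    sign = 1.0 if CENTERS[l] > CENTERS[k] else -1.0
    return lambda x: sign * (x - mid)


def predict_ovo(x, classes):
    """Run all K(K-1)/2 pairwise classifiers and return the majority-vote class and the vote counts."""
    votes = Counter()
    for k, l in combinations(classes, 2):
        f = pairwise_classifier(k, l)
        votes[l if f(x) > 0 else k] += 1
    return votes.most_common(1)[0][0], votes


print(predict_ovo(4.0, ["A", "B", "C"]))  # B wins with 2 of 3 votes
print(predict_ovo(9.0, ["A", "B", "C"]))  # C wins with 2 of 3 votes
```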

9.4.2 One-Versus-All Classification

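Repeatedly fit one class (coded +1) against all remaining classes (coded -1).
This gives K models. The test observation is assigned to the class for which β0k + β1k x1* + ... + βpk xp* is largest, i.e., the class with the highest confidence.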
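The one-versus-all prediction rule is an argmax over the K fitted decision functions. This minimal sketch assumes that the K classifiers have already been fitted. The coefficients in `COEFS` are hypothetical values chosen for illustration.

```python
# Hypothetical coefficients (beta_0k, (beta_1k, beta_2k)) of K = 3 one-versus-all classifiers
COEFS = {
    "A": (0.0, (1.0, 0.0)),
    "B": (0.0, (0.0, 1.0)),
    "C": (0.5, (-1.0, -1.0)),
}


def score(x, b0, beta):
    """f_k(x) = beta_0k + beta_1k*x1 + ... + beta_pk*xp."""
    return b0 + sum(b * xj for b, xj in zip(beta, x))


def predict_ova(x, coefs):
    """Assign x to the class whose classifier gives the largest f_k(x)."""
    return max(coefs, key=lambda k: score(x, *coefs[k]))


print(predict_ova((2.0, 1.0), COEFS))    # A  (scores: A = 2.0, B = 1.0, C = -2.5)
print(predict_ova((-1.0, -1.0), COEFS))  # C  (scores: A = -1.0, B = -1.0, C = 2.5)
```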

9.5 Relationship to Logistic Regression

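SVMs did not attract much attention at first. In the mid-1990s, however, they proved to generalize well in practical applications such as handwritten digit recognition, and they drew intense interest from researchers in pattern recognition and machine learning.
Since then, deep connections between SVMs and other classical statistical classification methods have been studied. One of these connections: the SVC problem (9.12)-(9.15) can be rewritten as (9.25), which involves a nonnegative tuning parameter λ:

$$\underset{\beta_0,\beta_1,\ldots,\beta_p}{\text{minimize}} \left\{ \sum_{i=1}^{n} \max\bigl[0,\; 1 - y_i(\beta_0+\beta_1x_{i1}+\cdots+\beta_px_{ip})\bigr] + \lambda \sum_{j=1}^{p} \beta_j^2 \right\} \qquad (9.25)$$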

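Large λ: small coefficients and more violations to the margin tolerated → a low-variance, high-bias classifier
Small λ: large coefficients and fewer violations → a high-variance, low-bias classifier (so a small λ in (9.25) corresponds to a small C in (9.15))

→ The λΣβj² term is the ridge penalty, i.e., the shrinkage penalty from Chapter 6.

→ (9.25) therefore has the "loss + penalty" form: minimize L(X, y, β) + λP(β).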
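The loss + penalty form makes the SVC fittable with a generic optimizer. This minimal sketch minimizes (9.25) by full-batch subgradient descent. The solver, the learning rate `lr`, the number of `epochs`, λ = 0.1, and the toy data are all illustrative assumptions. This is not the procedure used in ISLR or by SVM libraries, which solve the problem with dedicated quadratic-programming methods. Only observations with yi f(xi) < 1 contribute to the hinge subgradient, which mirrors the fact that only support vectors shape the fit. As in (9.25), the intercept is not penalized.

```python
def f(x, b0, beta):
    return b0 + sum(bj * xj for bj, xj in zip(beta, x))


def objective(b0, beta, X, y, lam):
    """Objective (9.25): total hinge loss plus lam * sum of squared coefficients."""
    hinge = sum(max(0.0, 1.0 - yi * f(xi, b0, beta)) for xi, yi in zip(X, y))
    return hinge + lam * sum(bj * bj for bj in beta)


def fit_svc_subgradient(X, y, lam=0.1, lr=0.01, epochs=1000):
    """Minimize (9.25) by full-batch subgradient descent (an illustrative solver)."""
    p = len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(epochs):
        g0 = 0.0
        g = [2.0 * lam * bj for bj in beta]      # gradient of the ridge penalty
        for xi, yi in zip(X, y):
            if yi * f(xi, b0, beta) < 1.0:       # hinge active: subgradient is -yi * (1, xi)
                g0 -= yi
                g = [gj - yi * xj for gj, xj in zip(g, xi)]
        b0 -= lr * g0
        beta = [bj - lr * gj for bj, gj in zip(beta, g)]
    return b0, beta


# Toy linearly separable data (hypothetical)
X = [[2.0, 2.0], [3.0, 3.0], [3.0, 1.0], [-2.0, -2.0], [-3.0, -1.0], [-1.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
b0, beta = fit_svc_subgradient(X, y)
print("b0 =", b0, "beta =", beta)
print("objective:", objective(b0, beta, X, y, 0.1))
print("training predictions:", [1 if f(xi, b0, beta) > 0 else -1 for xi in X])
```

Increasing `lam` shrinks the coefficients and lets more observations fall inside the margin, which matches the large-λ case described above.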

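The loss function here is the hinge loss, L(X, y, β) = Σi max[0, 1 - yi(β0 + β1xi1 + ... + βpxip)]. It is closely related to the loss function used in logistic regression (Fig. 9.12).
Despite the similar losses, the SVM differs in that only the support vectors play a role in defining the classifier: the hinge loss is exactly zero for observations with yi(β0 + β1xi1 + ... + βpxip) ≥ 1, i.e., for observations on the correct side of the margin.
In logistic regression, by contrast, the loss is never exactly zero for any observation; it is just very small for observations far from the decision boundary.
The tuning parameter C might look like an unimportant nuisance parameter, but the loss + penalty form shows that it matters: it controls the bias-variance trade-off and therefore overfitting.
Enlarging the feature space (with kernels) to solve non-linear problems is not unique to SVMs; logistic regression, for example, can also use non-linear kernels. For historical reasons, however, the technique is used mostly with SVMs.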
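The difference between the two losses is easiest to see as functions of z = yi f(xi), the quantity on the horizontal axis of Fig. 9.12. This minimal sketch assumes labels coded as yi ∈ {-1, 1}. With that coding, the logistic regression loss (the negative log-likelihood) is log(1 + e^(-z)).

```python
import math


def hinge_loss(z):
    """SVM hinge loss as a function of z = y_i * f(x_i)."""
    return max(0.0, 1.0 - z)


def logistic_loss(z):
    """Logistic regression loss (negative log-likelihood) as a function of z = y_i * f(x_i)."""
    return math.log1p(math.exp(-z))


for z in [-2.0, 0.0, 1.0, 2.0, 5.0]:
    print(f"z = {z:4.1f}  hinge = {hinge_loss(z):.3f}  logistic = {logistic_loss(z):.3f}")
# The hinge loss is exactly 0 for z >= 1; the logistic loss stays positive, though it shrinks quickly.
```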


Bioinformatics and Evolution
