A Simulation Study on Prediction Error Variance Estimation

Korean title: 예측모형의 오차분산 추정에 관한 모의실험 연구
Publisher: The Korean Data Analysis Society
Author: Seong-Keon Lee (이성건)
Journal: Journal of The Korean Data Analysis Society (JKDAS), Vol. 22, No. 6, pp. 2383-2390 (8 pages)
Subject: Natural Sciences > Statistics
File format: PDF
Publication date: 2020-12-30

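Abstract (translated from Korean)

Growing interest in big data has brought machine learning methods into wide use alongside statistical methods. Statistical methodology, grounded in mathematical theory, provides inference procedures for the error under a variety of conditions, whereas machine learning has placed comparatively little emphasis on such inference. Widely used machine learning methods include neural networks, decision trees, support vector machines (SVM), and random forests along with bagging. These algorithm-based methods have paid little attention to inference about the error term, including its underlying distribution. Recently, however, research on prediction error and its variance estimation for these methods has become active. To estimate the error variance, this study examines the mean squared prediction error (MSPE) obtained by the out-of-bag (OOB) method and confirms its bias. To correct this bias, a bootstrap-based estimation method is proposed, and its efficiency is compared through simulations over various covariance structures of the explanatory variables and various functions. Random forest is used as the machine learning method, and the results show that the proposed bias correction is more efficient for relatively simple functions, such as polynomials, than for complex functions.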
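
For readers unfamiliar with the OOB approach mentioned in the abstract, the OOB estimate of the MSPE is commonly written as below. The notation is an assumption for illustration and is not taken from the paper: n is the sample size, \hat{f}_b is the model fitted to the b-th bootstrap sample, and B_i is the set of bootstrap samples that do not contain observation i.

```latex
\widehat{\mathrm{MSPE}}_{\mathrm{OOB}}
  = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{f}_{\mathrm{OOB}}(x_i)\bigr)^2,
\qquad
\hat{f}_{\mathrm{OOB}}(x_i)
  = \frac{1}{|B_i|}\sum_{b \in B_i}\hat{f}_b(x_i)
```

The abstract reports that this quantity is biased as an estimate of the error variance. The form of the proposed bootstrap correction is not given on this page, so it is not reproduced here.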

English Abstract

Recently, as interest in big data analysis has increased, machine learning has come into wide use alongside statistical analysis methodology. Statistical analysis methodology provides inference procedures grounded in mathematical theory, whereas comparatively little emphasis is placed on such inference in the field of machine learning. Machine learning methodologies include neural networks, decision trees, SVMs, and random forests as well as bagging. Methodologies based on these algorithms have paid little attention to inference about the prediction error term, including its distributional assumptions. Recently, studies on prediction error and variance estimation for these methodologies have been actively conducted. This study examines the mean squared prediction error (MSPE) obtained by the out-of-bag (OOB) method and its bias. To adjust for the bias, an estimation method using the bootstrap is proposed, and its efficiency is compared through simulation. Random forest is used as the machine learning methodology, and the analysis shows that the proposed bias correction method is more efficient in simple polynomial models than in complex models.
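
The OOB idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's procedure: a bagged least-squares line stands in for the random forest so that the example needs only the standard library, the simulated model y = 1 + 2x + noise is hypothetical, and the names `fit_line`, `oob_mspe`, and `simulate` are invented for this sketch. The bootstrap bias correction proposed in the paper is not implemented because its details are not given here.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        # Degenerate resample (all x equal): fall back to the mean.
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def oob_mspe(xs, ys, n_boot=200, seed=0):
    """OOB mean squared prediction error of a bagged linear fit.

    Each observation is predicted only by the fits whose bootstrap
    sample did not contain it, and those predictions are averaged.
    """
    rng = random.Random(seed)
    n = len(xs)
    pred_sum = [0.0] * n
    pred_cnt = [0] * n
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        in_bag = set(idx)
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        for i in range(n):
            if i not in in_bag:  # observation i is out of bag for this fit
                pred_sum[i] += a + b * xs[i]
                pred_cnt[i] += 1
    # Skip observations that were never out of bag.
    sq_errors = [
        (ys[i] - pred_sum[i] / pred_cnt[i]) ** 2
        for i in range(n)
        if pred_cnt[i] > 0
    ]
    return sum(sq_errors) / len(sq_errors)


def simulate(n=200, sigma=1.0, seed=1):
    """Hypothetical data: y = 1 + 2x + Gaussian noise with variance sigma^2."""
    rng = random.Random(seed)
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


xs, ys = simulate()
estimate = oob_mspe(xs, ys)
print(f"OOB MSPE: {estimate:.3f} (true error variance: 1.000)")
```

Because the true error variance is known in a simulation, repeating this over many simulated datasets and averaging the gap between the OOB MSPE and sigma^2 gives an empirical view of the bias the abstract refers to.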

Keywords

machine learning, bootstrap, random forest, mean squared prediction error
