Kyobo Book Centre

Academic Paper

Effects of Data Resampling for Improving the Classification Performance of Imbalanced Data

English title: Data resampling effect to improve the classification performance of imbalanced data: a case study on financial data
Publisher: The Korean Data Analysis Society (한국자료분석학회)
Authors: Mi Ji Kwon (권미지), Hyuncheol Kang (강현철)
Publication: Journal of The Korean Data Analysis Society (JKDAS), Vol. 26, No. 3, pp. 783-794 (12 pages)
Subject: Natural Sciences > Statistics
File format: PDF
Publication date: 2024-06-30

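Abstract (Korean, translated)

Imbalanced data are frequently encountered among the many types of data to which machine learning is applied. Imbalanced data appear in diverse fields such as fraud detection, unauthorized network intrusion detection, failure detection, and medical diagnosis. An imbalance in the data is known to affect the training stage and to degrade the classification performance of the learned model. Two families of techniques mitigate the imbalanced data problem. Undersampling reduces the class that makes up a larger share of the data to the size of the smaller class. Oversampling enlarges the class that makes up a smaller share to the size of the larger class. This study aims to improve classification performance on imbalanced data. It applies several data resampling techniques to a range of analysis methods and compares the resulting classification performance to determine whether the imbalance problem can be mitigated. To this end, we briefly introduce oversampling and undersampling techniques that can alleviate the imbalanced data problem. We then conduct a case study on financial data provided by DACON to compare how the data resampling techniques perform under different analysis methods.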
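The two strategies described above can be illustrated with a minimal plain-Python sketch. It assumes that each row's features are stored as a list of numbers and that the class labels sit in a parallel list. The function names `random_undersample` and `random_oversample` are hypothetical; the abstract does not state how the paper implements resampling.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect the feature rows of each class label into a dict of lists."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return groups


def random_undersample(X, y, seed=0):
    """Shrink every class to the size of the smallest class by dropping
    randomly chosen rows (sampling without replacement)."""
    rng = random.Random(seed)
    target = min(Counter(y).values())
    X_out, y_out = [], []
    for label, rows in _group_by_class(X, y).items():
        X_out.extend(rng.sample(rows, target))
        y_out.extend([label] * target)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Grow every class to the size of the largest class by duplicating
    randomly chosen rows (sampling with replacement)."""
    rng = random.Random(seed)
    target = max(Counter(y).values())
    X_out, y_out = [], []
    for label, rows in _group_by_class(X, y).items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out
```

Take a data set with 95 majority rows and 5 minority rows. Undersampling leaves 5 rows per class and discards majority information. Oversampling yields 95 rows per class but only repeats existing minority rows. Synthetic methods such as SMOTE try to address that second weakness.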

Abstract (English)

Imbalanced data are often found among the many types of data to which machine learning is applied. They arise in various fields such as fraud detection, unauthorized network intrusion detection, failure detection, and medical diagnosis. An imbalance in the data is known to affect the learning stage and reduce the classification performance of the learned model. Techniques to alleviate the imbalanced data problem include undersampling and oversampling. Undersampling reduces the majority class to the size of the minority class. Oversampling increases the minority class to the size of the majority class. This study aims to improve the classification performance on imbalanced data. It applies various data resampling techniques to various analysis methods and compares the resulting classification performance to determine whether the imbalanced data problem can be alleviated. To this end, oversampling and undersampling techniques that can alleviate the imbalanced data problem were briefly introduced. A case study was then conducted on financial data provided by DACON to compare the performance of the data resampling techniques across analysis methods.
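The keywords name SMOTE as one of the oversampling techniques. Instead of duplicating rows, SMOTE creates a synthetic minority point in three steps:

1. Pick a minority sample.
2. Choose one of its k nearest minority-class neighbours.
3. Interpolate at a random position on the segment between the two.

Below is a brute-force, stdlib-only sketch that assumes numeric features and Euclidean distance. It is not the paper's code. The function `smote` and its parameters are illustrative, and in practice a library implementation such as imbalanced-learn's `SMOTE` would normally be used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a list of minority-class
    feature vectors by interpolating toward random nearest neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` (itself excluded)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # random position in [0, 1) along the segment
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic
```

Every synthetic point lies on a segment between two real minority samples. SMOTE therefore fills in the minority region rather than stacking exact copies. The trade-off is that it can also place points in areas where the classes overlap.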

Keywords

Imbalanced data, Resampling, SMOTE, Tomek Links, XGBoost
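Tomek Links, also listed in the keywords, is a cleaning-style undersampling technique. Two samples from different classes form a Tomek link when each is the other's nearest neighbour. Removing the majority-class member of each link sharpens the class boundary. The following is a brute-force O(n²) sketch that assumes binary labels, numeric features, and Euclidean distance. The function names are hypothetical, and library implementations such as imbalanced-learn's `TomekLinks` exist for real use.

```python
import math


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that form Tomek links: samples of
    different classes that are each other's nearest neighbour."""
    n = len(X)
    nearest = []
    for i in range(n):
        # index of the closest other sample (ties go to the lower index)
        nearest.append(min((j for j in range(n) if j != i),
                           key=lambda j: math.dist(X[i], X[j])))
    links = []
    for i in range(n):
        j = nearest[i]
        if i < j and nearest[j] == i and y[i] != y[j]:
            links.append((i, j))
    return links


def remove_tomek_majority(X, y, majority_label):
    """Drop the majority-class member of every Tomek link."""
    drop = set()
    for i, j in tomek_links(X, y):
        drop.add(i if y[i] == majority_label else j)
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]
```

Unlike random undersampling, this method removes only a handful of borderline majority samples. For that reason it is often paired with an oversampler such as SMOTE rather than used alone to balance class counts.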