Applications of the Classification Algorithm for Unbalanced Time Series Data

English title: Applications of the classification algorithm for unbalanced time series data: Focusing on the corporate default model
Publisher: The Korean Data Analysis Society
Authors: Yongbok Cho, Dongwoo Cho, Boseung Choi
Publication: Journal of The Korean Data Analysis Society (JKDAS), Vol.24 No.2, pp. 639-651 (13 pages)
Subject: Natural Sciences > Statistics
File format: PDF
Publication date: 2022.04.30

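Korean Abstract (translated)

The data used in corporate default models are a typical example of imbalanced data, in which normal (non-default) firms are observed far more often than defaulted firms. They also have the character of time-series data, since default events are predicted from past and current financial variables. Building a prediction model therefore requires particular care in handling the class imbalance and in reflecting the time-series structure. This study compares approaches to resolving data imbalance and to model validation that reflects the characteristics of time-series data, both of which must be considered when building a default prediction model. For the empirical analysis, we built default models for companies listed on the Korea Exchange and compared their predictive performance. First, we confirmed that the generalization performance of the prediction model can be secured when oversampling techniques are applied to the imbalanced training data. However, no clear performance difference appeared among the oversampling techniques. Second, comparing the commonly used k-fold cross-validation with forward cross-validation, we confirmed that predictive performance can be overestimated when a model estimated without regard to the passage of time is used, which demonstrates the need for forward cross-validation on time-series data.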
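
The oversampling step described in the abstract can be illustrated with a minimal SMOTE-style sketch in plain Python. This is not the authors' implementation: the function name `smote_like`, the neighbour count `k`, and the toy feature vectors are assumptions made for illustration. The sketch creates synthetic default-class rows by interpolating between a minority row and one of its nearest minority neighbours.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Return n_new synthetic rows interpolated between minority-class rows.

    Hypothetical helper for illustration; not the paper's code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy training set (made-up values): 8 non-default rows, 3 default rows.
non_default = [[0.1 * i, 1.0 - 0.05 * i] for i in range(8)]
default = [[2.0, 0.1], [2.2, 0.3], [1.9, 0.2]]

# Add synthetic default rows until the two classes are the same size.
new_rows = smote_like(default, n_new=len(non_default) - len(default))
balanced_default = default + new_rows
```

In a setting like the one the abstract describes, resampling of this kind would normally be applied to the training portion only, so that the evaluation data keep their original class ratio.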

English Abstract

The data used in corporate default models are imbalanced: non-default observations overwhelmingly outnumber default cases. The data also have time-series characteristics, in that future default events are predicted from lagged financial statements. Therefore, both the class imbalance and the time-series structure must be considered when modeling default events. In this paper, we study how to handle the imbalance problem during modeling and how to validate models on time-series data. We conducted an empirical analysis of companies listed on the Korea Exchange, constructing default models and comparing prediction performance across various machine learning classification algorithms. First, we confirmed that, with imbalanced training data, the generalization performance of the prediction model could be secured only when an oversampling method was applied. However, there was no significant difference in predictive performance among the oversampling methods. Second, we compared k-fold cross-validation with time-series cross-validation and confirmed that prediction performance can be overestimated when the model is estimated without regard to temporal order. This confirms the necessity of time-series cross-validation for classification models that use time-series data.
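
The contrast between k-fold and forward (time-series) cross-validation can be sketched as follows. This is a simplified illustration under stated assumptions, not the validation scheme used in the paper: the helper names `forward_splits`, `interleaved_kfold`, and `leaks`, the expanding-window design, and the list of fiscal years are all hypothetical. The k-fold variant here splits whole periods rather than shuffled rows, which is enough to show how training data can come from after the test period.

```python
def forward_splits(periods, min_train=2, n_test=1):
    """Yield (train_periods, test_periods) with an expanding training window."""
    ordered = sorted(set(periods))
    for end in range(min_train, len(ordered) - n_test + 1):
        yield ordered[:end], ordered[end:end + n_test]

def interleaved_kfold(periods, k):
    """Naive k-fold over periods that ignores time order (for contrast)."""
    ordered = sorted(set(periods))
    for fold in range(k):
        test = ordered[fold::k]
        train = [p for p in ordered if p not in test]
        yield train, test

def leaks(splits):
    """Count folds in which some training period is later than a test period."""
    return sum(1 for train, test in splits if max(train) > min(test))

years = [2015, 2016, 2017, 2018, 2019]  # hypothetical fiscal years
forward = list(forward_splits(years))
kfold = list(interleaved_kfold(years, k=5))

for train, test in forward:
    print("train", train, "-> test", test)
print("folds using future data:", leaks(forward), "(forward) vs", leaks(kfold), "(k-fold)")
```

With forward splits every training window ends before its test period begins, whereas most of the k-fold folds train on years that follow the test year. That use of future information is the mechanism by which, according to the abstract, performance estimates can be overstated.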

Keywords

Corporate default model, Imbalanced data, Time series data, SMOTE, Forward cross validation
