Deep Learning Models Based on Subword Tokenization for Vulnerability Detection of Source Code

Original title (Korean): 소스 코드 취약점 탐지를 위한 서브워드 토큰화 기반의 딥러닝 모델
Publisher: 글로벌경영학회
Authors:
Publication: 『글로벌경영학회지』, Vol. 19, No. 3, pp. 47-64 (18 pages in total)
Subject classification: Economics and Business > Business Administration
File format: PDF
Publication date: 2022.06.30

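Korean Abstract (translated)

Because web applications are openly accessible and therefore exposed to external attacks, source code vulnerability detection has been drawing attention in both industry and academia. This study aims to build deep learning models for source code vulnerability detection and to evaluate their performance. The proposed models address three difficult problems in detecting source code vulnerabilities: class imbalance, long-term dependencies, and out-of-vocabulary tokens. In the experiments, the subword tokenization-based one-dimensional convolution model reached a precision of 39%, about 20 times the 1.92% precision expected from a model that predicts by chance.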
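To illustrate why subword tokenization helps with out-of-vocabulary identifiers in source code, here is a minimal sketch of greedy longest-match-first splitting in the style of WordPiece, the scheme used by the BERT tokenizer. The function name `wordpiece_tokenize` and the toy vocabulary `TOY_VOCAB` are assumptions made for illustration; the abstract does not describe the vocabulary or preprocessing the authors actually used.

```python
def wordpiece_tokenize(token, vocab, unk="[UNK]"):
    """Split one token into subword pieces, greedy longest-match-first.

    Pieces that do not start the token carry a "##" prefix, following the
    WordPiece convention. Illustrative sketch only, not the paper's pipeline.
    """
    pieces = []
    start = 0
    while start < len(token):
        end = len(token)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while end > start:
            piece = token[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No vocabulary piece fits at this position: the token is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary; a real one is learned from a corpus.
TOY_VOCAB = {"get", "str", "##_", "##user", "##input", "##cpy", "##cat"}

print(wordpiece_tokenize("get_userinput", TOY_VOCAB))
print(wordpiece_tokenize("strcpy", TOY_VOCAB))
print(wordpiece_tokenize("memcpy", TOY_VOCAB))
```

An identifier such as `get_userinput` would be a single unknown token for a word-level vocabulary, but here it decomposes into known pieces, so a downstream model still receives usable input. Only a token with no matching piece, such as `memcpy` under this toy vocabulary, falls back to `[UNK]`.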

English Abstract

Source code vulnerability detection has been attracting attention in both industry and academia because web applications, being openly accessible, can be vulnerable to outside attacks. This study aims to build deep learning models for source code vulnerability detection and to evaluate their performance. The proposed models tackle the class imbalance problem, the long-term dependency problem, and the out-of-vocabulary problem, all of which make detecting source code vulnerabilities difficult. In the experiments, the subword tokenization-based one-dimensional convolution model achieved a precision of 39%, about 20 times the 1.92% precision expected from a model that predicts by chance. Although the Conv1d+BT model, which uses the BERT tokenizer, showed the highest AUC of 0.9116, its precision and recall were 0.39 and 0.35, so further improvement is needed before practical application.
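The figures in the abstract can be checked with a few lines of arithmetic. The sketch below recomputes the "about 20 times" ratio from the reported precision and chance-level precision, and also derives an F1 score from the reported precision and recall. The F1 value is a derived quantity computed here for illustration; the abstract does not report it.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract.
chance_precision = 0.0192  # precision expected from a chance-level model
model_precision = 0.39     # Conv1d+BT precision
model_recall = 0.35        # Conv1d+BT recall

# How many times better than chance the model's precision is.
lift = model_precision / chance_precision

# Derived here, not reported in the abstract.
f1 = f1_score(model_precision, model_recall)

print(f"precision lift over chance: {lift:.1f}x")
print(f"F1 from reported precision and recall: {f1:.3f}")
```

The ratio comes out slightly above 20, consistent with the abstract's wording. The modest F1 alongside a high AUC is typical of heavily imbalanced data, where ranking quality can be good while the positive predictions at a fixed threshold remain mostly wrong.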

Keywords

Vulnerability detection, Source code, Subword tokenization, Deep learning model, One-dimensional convolution model
