A Study on Automatic Detection of Off-Topic Texts in KFL Automatic Writing Assessment Using Document Similarity

Original title: KFL 작문 자동 채점에서 문서 유사도를 활용한 주제 이탈 텍스트 자동 검출 방안 연구
Publisher: 우리말학회
Authors: 강혜림, 백재파, 공태수, 윤주희
Journal: 『우리말연구』 Vol. 80, pp. 89-111 (23 pages)
Subject classification: Humanities > Linguistics
File format: PDF
Publication date: 2025.01.31


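Korean Abstract (translated)

This study aims to establish a method for detecting off-topic texts in KFL writing assessment. To this end, a training dataset of 150 texts was built from the Korean learner corpus provided by the National Institute of the Korean Language. Cosine similarity, Euclidean similarity, and Manhattan similarity models were built from this training dataset. Two test datasets were then created, one on the same topic as the training dataset (30 texts) and one on different topics (30 texts), and the predictions of each model were compared. Confusion-matrix analysis showed that the cosine similarity model performed best (accuracy .98, precision 1.00, recall .96, F-1 score .97), followed by Euclidean similarity (accuracy .81, precision 1.00, recall .63, F-1 score .77), while Manhattan similarity performed worst (accuracy .56, precision 1.00, recall .13, F-1 score .23). These results confirm that automatic detection of off-topic texts using document similarity is usable in actual assessment.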
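The three similarity measures named in the abstract can be sketched as follows. This is only an illustrative sketch: the abstract does not say how texts were vectorised or how distances were turned into similarities, so the whitespace term counts and the `1 / (1 + distance)` conversion below are assumptions, and all function names are hypothetical.

```python
import math
from collections import Counter


def term_counts(text):
    """Whitespace-tokenised term counts (assumed vectorisation, not from the paper)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_similarity(a, b):
    """Euclidean distance mapped to (0, 1] via 1 / (1 + d) (assumed conversion)."""
    dist = math.sqrt(sum((a[t] - b[t]) ** 2 for t in a.keys() | b.keys()))
    return 1.0 / (1.0 + dist)


def manhattan_similarity(a, b):
    """Manhattan (L1) distance mapped to (0, 1] via 1 / (1 + d) (assumed conversion)."""
    dist = sum(abs(a[t] - b[t]) for t in a.keys() | b.keys())
    return 1.0 / (1.0 + dist)


# Toy strings, not data from the study.
reference = term_counts("travel plan travel")
on_topic = term_counts("travel plan")
off_topic = term_counts("music concert")
```

Under these assumptions, a text would be flagged as off-topic when its similarity to the reference falls below some threshold; the abstract does not report how that threshold was chosen.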

English Abstract

The purpose of this study is to propose a method for detecting off-topic texts in KFL (Korean as a Foreign Language) automatic writing assessments. To achieve this, a training dataset consisting of 150 texts was constructed using the Korean learner corpus provided by the National Institute of the Korean Language. Based on this dataset, cosine similarity, Euclidean similarity, and Manhattan similarity models were developed. Subsequently, two test datasets were created: one consisting of texts on the same topic as the training dataset (30 texts) and the other consisting of texts on different topics (30 texts). The predictive performance of each model was compared. Analysis using a confusion matrix indicated that the cosine similarity model performed best (accuracy: .98, precision: 1.00, recall: .96, F-1 score: .97), followed by the Euclidean similarity model (accuracy: .81, precision: 1.00, recall: .63, F-1 score: .77). The Manhattan similarity model showed the lowest performance (accuracy: .56, precision: 1.00, recall: .13, F-1 score: .23). These results confirm that document similarity is practically applicable to the automatic detection of off-topic texts in actual assessments.
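For readers less familiar with the four reported metrics, the sketch below shows how they are conventionally derived from confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from the paper; the abstract also does not state which class was treated as positive.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives; undefined ratios fall back to 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts for illustration only (not the study's results).
example = classification_metrics(tp=8, fp=0, fn=2, tn=10)
```

With no false positives, precision stays at 1.0 while recall drops with every missed positive, which matches the pattern in the reported figures, where precision is 1.00 for all three models and recall alone separates them.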

Keywords

automated scoring, writing assessment, off-topic text, Korean writing
