
A Study on Cluster Label Selection Using Blog Feature Information


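English title: A Study of Particular Cluster Labeling using the Blog Feature Information
Publisher: 제주대학교 교육과학연구소 (Educational Science Research Institute, Jeju National University)
Authors: Han Seung-min, Lee Eun-ji, Kim Pan-koo
Publication: 『교육과학연구』, Vol. 19, No. 1, pp. 39-59 (21 pages)
Subject: Social Sciences > Education
File format: PDF
Publication date: 2017.05.31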


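Abstract (translated from Korean)

Document clustering groups similar documents into a single cluster based on the occurrence frequency or features of the keywords they contain, and it is used in a growing variety of ways as the volume of document-type data increases. Cluster labels help users readily understand the meaning of each document cluster and grasp the relationships among clusters. The ultimate goal of this paper is to propose a method for clustering blogs, which have recently been used for knowledge sharing, and for selecting a representative label that expresses the features of a cluster while covering its overall content. First, the title, body, and tags of each blog are collected and only nouns are extracted; a candidate keyword set is then generated through keyword normalization and position-specific keyword weighting. Association rules are generated from the candidate keyword set using the FP-growth algorithm, and a representative label that is semantically associated with the cluster is selected. To evaluate performance, the proposed method was compared with a method that uses TF-IDF weights without considering blog feature information, and the comparison demonstrated that the proposed method determines a representative label for the blog clusters of a given search and provides it to the user.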
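The candidate keyword step (normalization plus position-specific weighting over title, body, and tags) might be sketched roughly as follows. This is only an illustration: the weight values in `FIELD_WEIGHTS`, the toy `normalize` function, and the `candidate_keywords` helper are all assumptions, since the abstract does not state the actual weights or normalization rules, and noun extraction is assumed to have happened already.

```python
from collections import Counter

# Hypothetical position weights; the abstract does not give the real values.
FIELD_WEIGHTS = {"title": 3.0, "tags": 2.0, "body": 1.0}


def normalize(keyword):
    """Toy normalization (assumption): trim whitespace and lowercase."""
    return keyword.strip().lower()


def candidate_keywords(post, top_k=5):
    """Score keywords by field-weighted frequency and return the top_k.

    `post` maps a field name ("title", "tags", "body") to a list of
    already-extracted nouns.
    """
    scores = Counter()
    for field, weight in FIELD_WEIGHTS.items():
        for kw in post.get(field, []):
            scores[normalize(kw)] += weight
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [kw for kw, _ in ranked[:top_k]]


post = {
    "title": ["Jeju", "travel"],
    "tags": ["jeju", "food"],
    "body": ["travel", "food", "hotel", "food"],
}
print(candidate_keywords(post, top_k=3))  # ['jeju', 'food', 'travel']
```

In this toy post, "jeju" outranks "food" even though "food" occurs more often, because its occurrences fall in the title and tags, which carry the larger (assumed) weights.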

English Abstract

Document clustering groups similar documents into clusters based on the occurrence frequency or characteristics of the keywords they contain, and it is used in various ways as the amount of document-type data increases. Cluster labels help users understand the meaning of each document cluster and the relationships between clusters. A label is therefore needed that covers the meaning of the cluster and expresses its characteristics. Clustering is also used in various fields beyond documents, such as blogs. However, because a large amount of blog information is produced in real time and its scope is broad, it is difficult to select a representative label that expresses the meaning of each cluster produced by clustering. To address this difficulty, this paper selects a representative label that expresses the characteristics of the cluster while covering its whole content. First, we collect the title, body, and tags of each blog, extract only nouns, and generate a candidate keyword set through keyword normalization and position-specific keyword weights. From the candidate keyword set, the FP-growth algorithm is used to select a label that is semantically associated with the cluster. In this way, performance improves over the existing representative label selection method that does not use position-specific weights, and we demonstrate that the representative label of the blog clusters for a specific search is determined and provided to the user.
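As a rough illustration of how frequent keyword co-occurrence can point to a cluster label, the sketch below counts frequent itemsets by brute-force enumeration. It is not FP-growth: brute-force counting is swapped in for brevity and yields the same frequent itemsets on small inputs, whereas FP-growth avoids enumerating candidates by building a prefix tree. The scoring rule in `rank_labels` (summing the support of frequent pairs a keyword takes part in) and all function names are assumptions for illustration, not the authors' method.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count itemsets up to max_size; keep those seen in >= min_support posts."""
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))  # sorted so each itemset has one tuple form
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def rank_labels(transactions, min_support=2):
    """Rank keywords by the total support of the frequent pairs they appear in.

    This scoring rule is a hypothetical stand-in for association-rule based
    label selection.
    """
    frequent = frequent_itemsets(transactions, min_support)
    scores = Counter()
    for itemset, n in frequent.items():
        if len(itemset) == 2:
            for kw in itemset:
                scores[kw] += n
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Each inner list is the candidate keyword set of one blog post in a cluster.
cluster = [
    ["jeju", "travel", "food"],
    ["jeju", "travel", "hotel"],
    ["jeju", "food"],
    ["travel", "hotel"],
]
print(rank_labels(cluster))
# [('jeju', 4), ('travel', 4), ('food', 2), ('hotel', 2)]
```

On real data, a library implementation of FP-growth would replace `frequent_itemsets`, and rule confidence or lift could replace the simple support sum used here.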

Keywords

Blog labeling, Cluster labeling, Association mining, FP-growth algorithm
