Rice Yield Prediction and Self-Attention Visualization Using Video Vision Transformer

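English title: Rice yield prediction and self-attention visualization using Video Vision Transformer
Publisher: The Korean Data Analysis Society (한국자료분석학회)
Authors: Dahyun Kim (김다현), Myung Hwan Na (나명환)
Journal information: Journal of The Korean Data Analysis Society (JKDAS), Vol. 25, No. 4, pp. 1249-1259 (11 pages in total)
Subject classification: Natural Sciences > Statistics
File format: PDF
Publication date: 2023-08-31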

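Korean Abstract (English translation)

The government and farmers' organizations pay close attention each year to the problem of predicting how much rice can be produced. However, factors that vary from year to year, such as abnormal weather and various pests and diseases, make it difficult to predict rice yield accurately. In this study, images were collected multiple times during the rice growing period with a multi-spectral sensor mounted on an unmanned aerial vehicle, and rice yield was predicted using a deep learning algorithm. Because the multi-spectral images were captured repeatedly at regular intervals, they can be regarded as a kind of video data, and rice yield was predicted with the Video Vision Transformer (ViViT) model, which applies the Transformer architecture to video computer vision. The ViViT model generates patches by dividing the input video into pieces of a fixed size. Training the model with different patch sizes showed that predictive performance improved as the patch size decreased. In addition, prediction performance was compared with a 3D CNN model, a variant of the CNN (Convolutional Neural Network) architecture long used in image processing that takes video as input, and the ViViT model with a small patch size performed better. Visualizing the learned weight matrix of the ViViT model as a heat map showed that images taken in mid-to-late August were important for yield prediction, which suggests that yield can be predicted about two months before the rice harvest.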
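
The patch splitting mentioned in the abstract can be illustrated with a minimal sketch. It assumes a toy single-band video stored as nested lists and non-overlapping tubelets. The function name `extract_tubelets`, the toy dimensions, and the tubelet sizes are hypothetical and are not taken from the paper. A real ViViT would additionally project each flattened tubelet to an embedding vector, which is omitted here.

```python
from itertools import product

def extract_tubelets(video, t, h, w):
    """Split a video indexed [frame][row][col] into non-overlapping
    tubelets of t frames x h rows x w cols, each flattened to one token."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % t or H % h or W % w:
        raise ValueError("video dimensions must be divisible by the tubelet size")
    tokens = []
    # Walk over the top-left-front corner of every tubelet.
    for t0, h0, w0 in product(range(0, T, t), range(0, H, h), range(0, W, w)):
        token = [
            video[f][r][c]
            for f in range(t0, t0 + t)
            for r in range(h0, h0 + h)
            for c in range(w0, w0 + w)
        ]
        tokens.append(token)
    return tokens

# Hypothetical toy input: 4 acquisition dates, 8 x 8 pixels, one spectral band.
video = [[[f * 100 + r * 8 + c for c in range(8)] for r in range(8)]
         for f in range(4)]

large = extract_tubelets(video, t=2, h=4, w=4)  # 2 * 2 * 2 = 8 tokens
small = extract_tubelets(video, t=2, h=2, w=2)  # 2 * 4 * 4 = 32 tokens
print(len(large), len(large[0]))  # 8 tokens of 32 values each
print(len(small), len(small[0]))  # 32 tokens of 8 values each
```

Halving the spatial patch size quadruples the number of tokens, so the model sees the field at a finer spatial resolution at the cost of a longer input sequence.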

English Abstract

The government and farmers' organizations pay close attention to the problem of predicting how much rice can be produced each year. However, it is difficult to predict rice yield accurately because of factors that vary from year to year, such as abnormal weather and various pests and diseases. In this study, images were collected several times during the rice growing season with a multi-spectral sensor mounted on an unmanned aerial vehicle, and rice yield was predicted using a deep learning algorithm. Because the multi-spectral images were taken several times at regular intervals, they can be viewed as a kind of video data, and rice yield was predicted using the Video Vision Transformer (ViViT) model, which applies the Transformer architecture to video computer vision. The ViViT model generates patches by dividing the input video into pieces of a fixed size. When the model was trained with different patch sizes, predictive performance improved as the patch size decreased. In addition, prediction performance was compared with a 3D CNN model, a variant of the CNN (Convolutional Neural Network) architecture used in image processing that takes video as input, and the ViViT model with a small patch size performed better. Visualizing the learned weight matrix of the ViViT model as a heat map showed that images taken in mid-to-late August are important for yield prediction, which suggests that yield can be predicted about two months before the rice harvest.
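
The abstract does not describe how the heat map was computed. As one plausible illustration only, the sketch below sums the softmax-normalized self-attention that the tokens of each acquisition date receive, giving one value per date that could be plotted as a heat map row. The function `attention_by_date`, the toy scores, and the token-to-date mapping are all hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax of one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_by_date(scores, token_date, n_dates):
    """Share of attention that each acquisition date receives.

    scores     : square list of raw query-key scores, one row per query token
    token_date : token_date[j] is the date index of key token j
    n_dates    : number of acquisition dates
    """
    weights = [softmax(row) for row in scores]   # each row now sums to 1
    received = [0.0] * n_dates
    for row in weights:
        for j, w in enumerate(row):
            received[token_date[j]] += w
    return [r / len(weights) for r in received]  # average over query tokens

# Hypothetical scores for 4 tokens: tokens 0-1 come from date 0,
# tokens 2-3 from date 1, and every query favors the date-1 tokens.
scores = [
    [0.0, 0.0, 2.0, 2.0],
    [0.0, 0.0, 2.0, 2.0],
    [0.0, 0.0, 2.0, 2.0],
    [0.0, 0.0, 2.0, 2.0],
]
shares = attention_by_date(scores, token_date=[0, 0, 1, 1], n_dates=2)
print(shares)  # the shares sum to 1, with date 1 receiving the larger share
```

Under this reading, a date with a high share (mid-to-late August in the paper's result) is one whose images the model attends to most when predicting yield.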

Keywords

Multi-spectral images, Self-attention, Tubelet embedding, Video Vision Transformer, Yield prediction
