IID-NORD: A Comprehensive Intrinsic Image Decomposition Dataset
Diclehan Ulucan, Oguzhan Ulucan, Marc Ebner
SPS
Length: 00:09:02
Vision-and-language models transfer easily to other tasks. In particular, they have been shown to work well for evaluating automatic image captioning, making it possible to evaluate systems without references or any information beyond the image and the caption. However, these models do not provide a straightforward way to evaluate videos. In this paper, we propose using them for video captioning evaluation. We explore both single-image evaluation and several methods for aggregating information from multiple frames. Experiments show that using clustering to select a few representative frames and computing the final score over them yields an excellent correlation with human judgment. Since bias in the human annotations can also influence the metric, we propose filtering the human assessments to discard outliers and improve the evaluation process.
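The abstract sketches two steps: selecting a few representative frames by clustering and discarding outlier human ratings. A minimal sketch of both ideas is given below; it is not the paper's implementation. The feature extractor and the per-frame image-caption scorer (e.g. a CLIP-style model) are abstracted away as inputs, and the tiny k-means loop, the frame-selection rule (frame nearest each centroid), and the z-score outlier filter are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's code): pick representative frames by
# clustering per-frame features, average a per-frame caption score over them,
# and filter outlier human ratings with a simple z-score rule.
import numpy as np

def kmeans(feats, k, iters=20, seed=0):
    """Plain Lloyd's k-means on an (n_frames, dim) feature matrix."""
    rng = np.random.default_rng(seed)
    centroids = feats[rng.choice(len(feats), size=k, replace=False)].copy()
    for _ in range(iters):
        # Distance of every frame to every centroid, shape (n, k).
        d = np.linalg.norm(feats[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = feats[labels == j].mean(axis=0)
    return centroids

def select_frames(feats, k):
    """Return indices of the frames closest to each cluster centroid."""
    centroids = kmeans(feats, k)
    d = np.linalg.norm(feats[:, None] - centroids[None], axis=2)
    return sorted({int(d[:, j].argmin()) for j in range(d.shape[1])})

def video_caption_score(frame_scores, feats, k=3):
    """Average a per-frame image-caption score over k selected frames."""
    idx = select_frames(np.asarray(feats, dtype=float), k)
    return float(np.mean([frame_scores[i] for i in idx]))

def filter_outliers(ratings, z=1.5):
    """Drop human ratings more than z standard deviations from the mean."""
    r = np.asarray(ratings, dtype=float)
    mu, sd = r.mean(), r.std()
    if sd == 0:
        return r
    return r[np.abs(r - mu) <= z * sd]
```

In this sketch the final video score is the mean over the selected frames; any other aggregation (max, weighted mean) would slot into `video_caption_score` the same way.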