IID-NORD: A Comprehensive Intrinsic Image Decomposition Dataset

Diclehan Ulucan, Oguzhan Ulucan, Marc Ebner

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:09:02
19 Oct 2022

Vision and language models are easily transferred to other tasks. In particular, they have been shown to work well for evaluating automatic image captioning, which makes it possible to evaluate systems without references or any information beyond the image and the caption. However, these models do not provide a straightforward way of evaluating videos. In this paper, we propose using these models for video captioning evaluation. We explore both single-image evaluation and different methods of including data from multiple frames. Experiments demonstrate that using clustering methods to select a few frames from which to compute the final score gives an excellent correlation with human judgment. Bias in the human annotations can also influence the metric, so we propose filtering the human assessments to discard outliers and improve the evaluation process.
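The frame-selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes frame and caption embeddings have already been produced by some vision-and-language model, uses a plain k-means written with NumPy as a stand-in for whichever clustering method the paper uses, and averages clipped cosine similarities as one possible way to combine per-frame scores. The function names `select_key_frames` and `video_caption_score` are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def select_key_frames(frame_emb, k, iters=20):
    """Cluster frame embeddings with plain k-means (an assumption, not the
    paper's method) and return the sorted indices of the frames closest
    to each centroid."""
    x = l2_normalize(np.asarray(frame_emb, dtype=float))
    # Deterministic initialisation: k frames evenly spaced in time.
    init = np.linspace(0, len(x) - 1, k).round().astype(int)
    centroids = x[init].copy()
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    # One representative frame per centroid; duplicates are merged.
    return sorted(set(dists.argmin(axis=0).tolist()))


def video_caption_score(frame_emb, caption_emb, k=3):
    """Average the clipped cosine similarity between the caption and the
    selected key frames (averaging is an illustrative choice)."""
    x = l2_normalize(np.asarray(frame_emb, dtype=float))
    c = l2_normalize(np.asarray(caption_emb, dtype=float))
    keys = select_key_frames(x, k)
    sims = x[keys] @ c
    return float(np.clip(sims, 0.0, None).mean())


if __name__ == "__main__":
    # Toy 2-D "embeddings": two frames near one direction, two near another.
    frames = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]]
    caption = [1.0, 0.0]
    print(select_key_frames(frames, k=2))
    print(video_caption_score(frames, caption, k=2))
```

Scoring only a few representative frames keeps the cost close to that of single-image evaluation while still covering visually distinct parts of the video; the number of clusters `k` is a free parameter in this sketch.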
