26 Oct 2020

In this paper, we propose a generative architecture for manipulating images and scenes with natural language descriptions. This is a challenging task, as the generative network is expected to carry out the given text instruction without changing the contents of the input image that the instruction does not refer to. Existing methods suffer from two main drawbacks: they can only perform changes that affect a limited region of the image, and they cannot handle complex instructions. The proposed approach is designed to address these limitations. It first uses two networks to extract image and text features, respectively. Rather than simply combining the two modalities during the manipulation process, we compose the image and text features with an improved technique. In addition, the generative network employs similarity learning, which improves text-driven manipulation and enforces only text-relevant changes on the input image. Experiments on the CSS and Fashion Synthesis datasets show that the proposed approach performs remarkably well and outperforms the baseline frameworks in terms of R-precision and FID.
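
The abstract does not spell out the composition technique or the similarity objective, so the following is only a minimal sketch of the two ideas it names: it assumes a TIRG-style gated-residual composition (a known way to combine image and text features beyond simple concatenation) and a standard batch-contrastive similarity loss. Module names, feature dimensions, and the loss choice are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedComposer(nn.Module):
    """Compose an image feature with a text feature (assumed TIRG-style).

    A sigmoid gate decides how much of the original image feature to
    preserve, while a residual branch injects the text-driven change,
    so instruction-irrelevant content can pass through largely intact.
    """

    def __init__(self, dim: int = 512):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, dim), nn.Sigmoid(),
        )
        self.residual = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        x = torch.cat([img_feat, txt_feat], dim=-1)
        return self.gate(x) * img_feat + self.residual(x)


def similarity_loss(composed: torch.Tensor, target: torch.Tensor,
                    temperature: float = 0.1) -> torch.Tensor:
    """Batch-contrastive similarity learning (an assumed stand-in).

    Pulls each composed feature toward the feature of its ground-truth
    edited image and pushes it away from the other targets in the batch,
    which encourages only the text-relevant change to be applied.
    """
    composed = F.normalize(composed, dim=-1)
    target = F.normalize(target, dim=-1)
    logits = composed @ target.t() / temperature
    labels = torch.arange(composed.size(0), device=composed.device)
    return F.cross_entropy(logits, labels)


# Toy usage with random stand-ins for the encoder outputs.
composer = GatedComposer(dim=512)
img = torch.randn(8, 512)   # image features from the image encoder
txt = torch.randn(8, 512)   # text features from the text encoder
tgt = torch.randn(8, 512)   # features of the ground-truth edited images
loss = similarity_loss(composer(img, txt), tgt)
```

The gated-residual form is one plausible reading of "an improved technique to compose image and text features": the gate lets the network copy unchanged regions from the input, which aligns with the stated goal of enforcing only text-relevant changes.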
