Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction

Roei Herzig,Moshiko Raboh,Gal Chechik,Jonathan Berant,A. Globerson

Published 2018 in Neural Information Processing Systems

ABSTRACT

Structured prediction is concerned with predicting multiple inter-dependent labels simultaneously. Classical methods like CRF achieve this by maximizing a score function over the set of possible label assignments. Recent extensions use neural networks to either implement the score function or in maximization. The current paper takes an alternative approach, using a neural network to generate the structured output directly, without going through a score function. We take an axiomatic perspective to derive the desired properties and invariances of a such network to certain input permutations, presenting a structural characterization that is provably both necessary and sufficient. We then discuss graph-permutation invariant (GPI) architectures that satisfy this characterization and explain how they can be used for deep structured prediction. We evaluate our approach on the challenging problem of inferring a {\em scene graph} from an image, namely, predicting entities and their relations in the image. We obtain state-of-the-art results on the challenging Visual Genome benchmark, outperforming all recent approaches.

PUBLICATION RECORD

  • Publication year

    2018

  • Venue

    Neural Information Processing Systems

  • Publication date

    2018-02-01

  • Fields of study

    Mathematics, Computer Science

  • Identifiers
  • External record

    Open on Semantic Scholar

  • Source metadata

    Semantic Scholar

CITATION MAP

EXTRACTION MAP

CLAIMS

  • No claims are published for this paper.

CONCEPTS

  • No concepts are published for this paper.

REFERENCES

Showing 1-38 of 38 references · Page 1 of 1

CITED BY

Showing 1-100 of 137 citing papers · Page 1 of 2