Structured prediction is concerned with predicting multiple inter-dependent labels simultaneously. Classical methods like CRF achieve this by maximizing a score function over the set of possible label assignments. Recent extensions use neural networks to either implement the score function or in maximization. The current paper takes an alternative approach, using a neural network to generate the structured output directly, without going through a score function. We take an axiomatic perspective to derive the desired properties and invariances of a such network to certain input permutations, presenting a structural characterization that is provably both necessary and sufficient. We then discuss graph-permutation invariant (GPI) architectures that satisfy this characterization and explain how they can be used for deep structured prediction. We evaluate our approach on the challenging problem of inferring a {\em scene graph} from an image, namely, predicting entities and their relations in the image. We obtain state-of-the-art results on the challenging Visual Genome benchmark, outperforming all recent approaches.
Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction
Roei Herzig,Moshiko Raboh,Gal Chechik,Jonathan Berant,A. Globerson
Published 2018 in Neural Information Processing Systems
ABSTRACT
PUBLICATION RECORD
- Publication year
2018
- Venue
Neural Information Processing Systems
- Publication date
2018-02-01
- Fields of study
Mathematics, Computer Science
- Identifiers
- External record
- Source metadata
Semantic Scholar
CITATION MAP
EXTRACTION MAP
CLAIMS
- No claims are published for this paper.
CONCEPTS
- No concepts are published for this paper.
REFERENCES
Showing 1-38 of 38 references · Page 1 of 1