Visual Relationship Detection with Language Priors

Cewu Lu, Ranjay Krishna, Michael S. Bernstein, Li Fei-Fei

Published 2016 in European Conference on Computer Vision

ABSTRACT

Visual relationships capture a wide variety of interactions between pairs of objects in images (e.g. “man riding bicycle” and “man pushing bicycle”). Consequently, the set of possible relationships is extremely large and it is difficult to obtain sufficient training examples for all possible relationships. Because of this limitation, previous work on visual relationship detection has concentrated on predicting only a handful of relationships. Though most relationships are infrequent, their objects (e.g. “man” and “bicycle”) and predicates (e.g. “riding” and “pushing”) independently occur more frequently. We propose a model that uses this insight to train visual models for objects and predicates individually and later combines them to predict multiple relationships per image. We improve on prior work by leveraging language priors from semantic word embeddings to fine-tune the likelihood of a predicted relationship. Our model can scale to predict thousands of types of relationships from a few examples. Additionally, we localize the objects in the predicted relationships as bounding boxes in the image. We further demonstrate that understanding relationships can improve content-based image retrieval.
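
The compositional idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: every probability, embedding, and weight below is made up, and the function names `language_prior` and `rank_relationships` are hypothetical. The sketch assumes that object and predicate models each return a probability, and that the language prior is a per-predicate linear score over the concatenated word vectors of the two objects.

```python
from itertools import product

def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def language_prior(subj, pred, obj, embeddings, pred_weights):
    """Score a triple from the word vectors of its two objects (assumed form)."""
    pair_vec = embeddings[subj] + embeddings[obj]  # list concatenation
    weights, bias = pred_weights[pred]
    return dot(weights, pair_vec) + bias

def rank_relationships(subj_probs, pred_probs, obj_probs, embeddings, pred_weights):
    """Combine separately trained object and predicate scores with the prior."""
    scored = []
    for (s, ps), (p, pp), (o, po) in product(
        subj_probs.items(), pred_probs.items(), obj_probs.items()
    ):
        visual = ps * pp * po  # objects and predicate are scored independently
        prior = language_prior(s, p, o, embeddings, pred_weights)
        scored.append(((s, p, o), visual * prior))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Made-up outputs for one pair of detected boxes.
subj_probs = {"man": 0.9, "dog": 0.1}
obj_probs = {"bicycle": 0.8, "horse": 0.2}
pred_probs = {"riding": 0.6, "pushing": 0.4}

# Toy 2-D word vectors and per-predicate weights (illustrative values only).
embeddings = {
    "man": [1.0, 0.0], "dog": [0.8, 0.2],
    "bicycle": [0.0, 1.0], "horse": [0.2, 0.8],
}
pred_weights = {
    "riding": ([0.5, 0.0, 0.0, 0.5], 0.0),
    "pushing": ([0.5, 0.0, 0.5, 0.0], 0.0),
}

ranked = rank_relationships(subj_probs, pred_probs, obj_probs, embeddings, pred_weights)
for triple, score in ranked[:3]:
    print(" ".join(triple), round(score, 4))
```

With two subject classes, two predicates, and two object classes, the sketch scores all eight triples from only six per-class scores, which is the scaling argument the abstract makes: the number of models grows with the number of objects and predicates rather than with the number of relationships.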

PUBLICATION RECORD

Publication year: 2016
Venue: European Conference on Computer Vision
Publication date: 2016-07-31
Fields of study: Computer Science
Identifiers: DOI 10.1007/978-3-319-46448-0_51; arXiv 1608.00187
Source metadata: Semantic Scholar

REFERENCES

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
2016, cited by this paper
Image retrieval using scene graphs
2015, influential reference
Learning semantic relationships for better action retrieval in images
2015, cited by this paper
Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval
2015, cited by this paper
Very Deep Convolutional Networks for Large-Scale Image Recognition
2014, cited by this paper
Incorporating Scene Context and Object Layout into Appearance Modeling
2014, cited by this paper
COSTA: Co-Occurrence Statistics for Zero-Shot Classification
2014, cited by this paper
Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild
2014, cited by this paper
Semantic Parsing for Text to 3D Scene Generation
2014, cited by this paper
From captions to visual concepts and back
2014, cited by this paper
Grounding Action Descriptions in Videos
2013, cited by this paper
Understanding Indoor Scenes Using 3D Geometric Phrases
2013, cited by this paper
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
2013, influential reference
Efficient Estimation of Word Representations in Vector Space
2013, influential reference
BabyTalk: Understanding and Generating Simple Image Descriptions
2013, cited by this paper
Learning the Visual Interpretation of Sentences
2013, cited by this paper
Translating Video Content to Natural Language Descriptions
2013, cited by this paper
YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition
2013, cited by this paper
Understanding and predicting importance in images
2012, cited by this paper
Semantic Compositionality through Recursive Matrix-Vector Spaces
2012, cited by this paper
Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation
2012, cited by this paper
Recognition using visual phrases
2011, influential reference
Action recognition from a distributed representation of pose and appearance
2011, cited by this paper
Measuring the Objectness of Image Windows
2011, cited by this paper
Learning to share visual appearance for multiclass object detection
2011, cited by this paper
Baby Talk: Understanding and Generating Image Descriptions
2011, cited by this paper
Efficiently selecting regions for scene understanding
2010, cited by this paper
Context based object categorization: A critical survey
2010, cited by this paper
Grouplet: A structured image representation for recognizing human and object interactions
2010, cited by this paper
Every Picture Tells a Story: Generating Sentences from Images
2010, cited by this paper
Modeling mutual context of object and human pose in human-object interaction activities
2010, cited by this paper
Graph Cut Based Inference with Co-occurrence Statistics
2010, cited by this paper
Exploiting hierarchical context on a large database of object categories
2010, cited by this paper
Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition
2009, cited by this paper
Object categorization using co-occurrence, location and appearance
2008, cited by this paper
Multi-Class Segmentation with Relative Location Prior
2008, cited by this paper
Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers
2008, cited by this paper
Tree Kernel-Based Relation Extraction with Context-Sensitive Structured Parse Tree Information
2007, cited by this paper
Objects in Context
2007, cited by this paper
Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts
2007, cited by this paper
Putting Objects in Perspective
2006, cited by this paper
Using Multiple Segmentations to Discover Objects and their Extent in Image Collections
2006, cited by this paper
Exploring Various Knowledge in Relation Extraction
2005, cited by this paper
Discovering objects and their location in images
2005, cited by this paper
Distinctive Image Features from Scale-Invariant Keypoints
2004, cited by this paper
Dependency Tree Kernels for Relation Extraction
2004, cited by this paper
Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope
2001, cited by this paper

CITED BY

Multimodal large model driven pseudo labeling for unbiased scene graph generation
2026, cites this paper
RAG-3DSG: Enhancing 3D Scene Graphs with Re-Shot Guided Retrieval-Augmented Generation
2026, cites this paper
Synthetic Visual Genome 2: Extracting Large-scale Spatio-Temporal Scene Graphs from Videos
2026, cites this paper
RS-Net: Context-Aware Relation Scoring for Dynamic Scene Graph Generation
2026, cites this paper
Graph Recognition via Subgraph Prediction
2026, cites this paper
NeSyVQA: Neurosymbolic Visual Question Answering With Knowledge-Enriched Scene Graphs
2026, influential citation
A multimodal spatiotemporal convolutional network with attention mechanism for athlete anxiety behavior recognition
2026, cites this paper
SGR3 Model: Scene Graph Retrieval-Reasoning Model in 3D
2026, cites this paper
Bias-aware learning for unbiased scene graph generation in remote sensing imagery
2026, cites this paper
Multi-scale scene graph generation for remote sensing imagery
2026, cites this paper
Salience-SGG: Enhancing Unbiased Scene Graph Generation with Iterative Salience Estimation
2026, cites this paper
SSGR-AR: Semantic-Enhanced Scene Graph Reasoning for Robust Video Action Recognition
2025, cites this paper
What and when to look? Temporal span proposal network for video relation detection
2025, cites this paper
Hierarchical Scene Graph Generation and Vectorization of Aerial Images
2025, cites this paper
STVRM: Spatio-temporal relational modeling with vision transformer for dynamic scene graph generation
2025, cites this paper
VisKnow: Constructing Visual Knowledge Base for Object Understanding
2025, cites this paper
With Great Context Comes Great Prediction Power: Classifying Objects via Geo-Semantic Scene Graphs
2025, cites this paper
Embedding Font Impression Word Tags Based on Co-occurrence
2025, cites this paper
A Framework for Generating Artificial Datasets to Validate Absolute and Relative Position Concepts
2025, cites this paper
Skynet-V1: Towards Early Warning of Video Abnormal Events via A Spatial-temporal Causal-enhanced MoE Framework
2025, cites this paper
VLPrompt-PSG: Vision-Language Prompting for Panoptic Scene Graph Generation
2025, influential citation
Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI
2025, cites this paper
C2F-Space: Coarse-to-Fine Space Grounding for Spatial Instructions using Vision-Language Models
2025, cites this paper
Scene Graph-Based Spatial Reasoning with VLM for High-Level Robotic Tasks
2025, influential citation
CoPa-SG: Dense Scene Graphs with Parametric and Proto-Relations
2025, cites this paper
M2ST-Net: Human-Object Interaction Recognition Using A Multi-stream Multi-feature Spatial-Temporal Network
2025, cites this paper
SPIN-SGG: spatial integration for open-vocabulary scene graph generation
2025, cites this paper
Remote Sensing Image Scene Graph Generation Method Based on Knowledge Graph Enhancement and Relationship Filtering
2025, cites this paper
Modality-aligned anchor learning based on multi-level fusion for accurate scene graph generation
2025, cites this paper
Learning by Imagining: Debiased Feature Augmentation for Compositional Zero-Shot Learning
2025, cites this paper
Learning Spatial-Aware Manipulation Ordering
2025, cites this paper
SGRD: A Ship Group Relationship Description Method Based on Scene Graph Generation With a Global-Local Context Fusion Network
2025, cites this paper
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
2025, cites this paper
Contrastive Learning of Image Representations Guided by Spatial Relations
2025, cites this paper
On the Potential of Logic and Reasoning in Neurosymbolic Systems Using OWL-Based Knowledge Graphs
2025, influential citation
Assured Autonomy with Neuro-Symbolic Perception
2025, cites this paper
SAMJAM: Zero-Shot Video Scene Graph Generation for Egocentric Kitchen Videos
2025, cites this paper
Two-dimensional spatial orientation relation recognition between image objects
2025, cites this paper
Synthetic Visual Genome
2025, cites this paper
HOIverse: A Synthetic Scene Graph Dataset with Human Object Interactions
2025, cites this paper
Reusing Attention for One-stage Lane Topology Understanding
2025, cites this paper
On the Interplay between Graph Structure and Learning Algorithms in Graph Neural Networks
2025, cites this paper
Enhancing visual-LLM for construction site safety compliance via prompt engineering and Bi-stage retrieval-augmented generation
2025, cites this paper
Contextual Object Grouping (COG): A Specialized Framework for Dynamic Symbol Interpretation in Technical Security Diagrams
2025, cites this paper
Scene-Specific Multiprototype Network for Remote Sensing Scene Graph Generation
2025, cites this paper
ESCA: Contextualizing Embodied Agents via Scene-Graph Generation
2025, cites this paper
Hierarchical Prototype Learning via Aggregation-Decomposition for Fine-Grained Geospatial Scene Graph Generation
2025, cites this paper
VOST-SGG: VLM-Aided One-Stage Spatio-Temporal Scene Graph Generation
2025, cites this paper
Self-Attention with State-Object Weighted Combination for Compositional Zero Shot Learning
2025, cites this paper
Self-Enhancing Video Data Management System for Compositional Events with Large Language Models
2025, cites this paper
DiffVsgg: Diffusion-Driven Online Video Scene Graph Generation
2025, cites this paper
MuRelSGG: Multimodal Relationship Prediction for Neurosymbolic Scene Graph Generation
2025, cites this paper
Dynamic Relation Inference via Verb Embeddings
2025, cites this paper
Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLM
2025, cites this paper
Remote sensing scene graph generation for improved retrieval based on spatial relationships
2025, cites this paper
Group Visual Relation Detection
2025, influential citation
Generation of Scene Graph and Semantic Image: A Review and Challenge Ahead
2025, cites this paper
Scene Graph Generation Approach with Integrated Environmental Contextual Information
2025, cites this paper
Open-Scene Understanding-oriented 3D Scene Graph Generation
2025, cites this paper
Hard-Label Black-Box Adversarial Attacks for Implicit Scene Interactions
2025, cites this paper
Vision Language Models Cannot Plan, but Can They Formalize?
2025, cites this paper
Inferential and Commonsense Visual Question Generation
2025, cites this paper
HOIEdit: Human–object interaction editing with text-to-image diffusion model
2025, cites this paper
RelationLMM: Large Multimodal Model as Open and Versatile Visual Relationship Generalist
2025, influential citation
GroupRF: Panoptic Scene Graph Generation with group relation tokens
2025, cites this paper
KnowZRel: Common Sense Knowledge-Based Zero-Shot Relationship Retrieval for Generalized Scene Graph Generation
2025, cites this paper
History-Enhanced 3D Scene Graph Reasoning From RGB-D Sequences
2025, cites this paper
A survey on dynamic scene understanding using temporal knowledge graphs: From scene knowledge representation to extrapolation
2025, cites this paper
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
2025, cites this paper
Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation
2025, cites this paper
Salient Temporal Encoding for Dynamic Scene Graph Generation
2025, cites this paper
Universal Scene Graph Generation
2025, cites this paper
A Multi-Hop Graph Reasoning Network for Knowledge-Based VQA
2025, cites this paper
CrowdRadar: a mobile crowdsensing framework for urban traffic green travel safety risk assessment
2025, cites this paper
Scene graph generation based on lightweight entity pair object detection and relation classification ensemble
2025, cites this paper
Deep Object Occlusion Relationship Detection Based on Associative Embedding Clustering
2025, cites this paper
Union-Redefined Prototype Network for scene graph generation
2025, cites this paper
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?
2025, cites this paper
D-Fusion: Direct Preference Optimization for Aligning Diffusion Models with Visually Consistent Samples
2025, cites this paper
CVCPSG: Discovering Composite Visual Clues for Panoptic Scene Graph Generation
2025, influential citation
Open World Scene Graph Generation using Vision Language Models
2025, cites this paper
Enhancing visual question answering with common sense knowledge: a data-driven neurosymbolic graph routing approach
2025, cites this paper
Hybrid Reciprocal Transformer with Triplet Feature Alignment for Scene Graph Generation
2025, influential citation
What Demands Attention in Urban Street Scenes? From Scene Understanding towards Road Safety: A Survey of Vision-driven Datasets and Studies
2025, cites this paper
ART: Adaptive Relation Tuning for Generalized Relation Prediction
2025, cites this paper
Navigating the Unseen: Zero-shot Scene Graph Generation via Capsule-Based Equivariant Features
2025, cites this paper
RoRA-VLM: Robust Retrieval-Augmented Vision Language Models
2024, cites this paper
A Progressive-Assisted Object Detection Method Based on Instance Attention
2024, cites this paper
Mastering Scene Understanding: Scene Graphs to the Rescue
2024, cites this paper
Dynamic Scene Graph Generation with Unified Temporal Modeling
2024, cites this paper
EgoSG: Learning 3D Scene Graphs from Egocentric RGB-D Sequences
2024, influential citation
Relationship detection for manipulation in object stacking scene with fully connected CRF
2024, cites this paper
SceneGraMMi: Scene Graph-boosted Hybrid-fusion for Multi-Modal Misinformation Veracity Prediction
2024, cites this paper
Bézier Everywhere All at Once: Learning Drivable Lanes as Bézier Graphs
2024, cites this paper
Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation
2024, cites this paper
Beyond Seen Primitive Concepts and Attribute-Object Compositional Learning
2024, cites this paper
A Modern Take on Visual Relationship Reasoning for Grasp Planning
2024, cites this paper
ViRED: Prediction of Visual Relations in Engineering Drawings
2024, influential citation
A Simple and Efficient Approach for Extracting Object Hierarchy in Image Data
2024, cites this paper
Generating Visual Stories with Grounded and Coreferent Characters
2024, cites this paper