Multi-view Gaze Target Estimation

Qiaomu Miao, Vivek Raju Golani, Jingyi Xu, Progga Paromita Dutta, Minh Hoai, Dimitris Samaras

Published 2025 in arXiv.org

ABSTRACT

This paper presents a method that uses multiple camera views for the gaze target estimation (GTE) task. The approach integrates information from different camera views to improve accuracy and expand applicability, addressing limitations of existing single-view methods, which struggle with face occlusion, target ambiguity, and out-of-view targets. Our method takes a pair of camera views as input and incorporates three modules: a Head Information Aggregation (HIA) module, which leverages head information from both views for more accurate gaze estimation; an Uncertainty-based Gaze Selection (UGS) module, which identifies the most reliable gaze output; and an Epipolar-based Scene Attention (ESA) module, which shares background information across views. This approach significantly outperforms single-view baselines, especially when the second camera provides a clear view of the person's face. Additionally, our method can estimate the gaze target in the first view using only the image of the person in the second view, a capability that single-view GTE methods lack. Furthermore, the paper introduces a multi-view dataset for developing and evaluating multi-view GTE methods. Data and code are available at https://www3.cs.stonybrook.edu/~cvl/multiview_gte.html
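
The abstract does not spell out how the epipolar constraint or the uncertainty-based selection is implemented, so the following is only a minimal sketch of the two general ideas, not the authors' code. It assumes a known fundamental matrix F between the two views (standard two-view geometry: a pixel x in view 1 maps to the line F x in view 2) and a scalar uncertainty per gaze prediction. All function names, the pixel-distance threshold, and the "lower uncertainty wins" rule are hypothetical choices made for illustration.

```python
import numpy as np


def epipolar_line(F, point_xy):
    """Epipolar line in view 2 for a pixel in view 1.

    Returns (a, b, c) with a*x + b*y + c = 0, scaled so that
    a^2 + b^2 = 1 (this makes point-line distances come out in pixels).
    """
    x = np.array([point_xy[0], point_xy[1], 1.0])
    line = F @ x
    norm = np.hypot(line[0], line[1])
    if norm == 0.0:
        raise ValueError("degenerate epipolar line for this point")
    return line / norm


def point_line_distance(line, point_xy):
    """Perpendicular distance from a pixel to a normalized line."""
    return abs(line[0] * point_xy[0] + line[1] * point_xy[1] + line[2])


def epipolar_mask(line, height, width, max_dist):
    """Boolean (height, width) mask of pixels within max_dist of the line.

    Assumption: a mask like this could restrict which view-2 locations
    a view-1 location attends to; the paper may do this differently.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    dist = np.abs(line[0] * xs + line[1] * ys + line[2])
    return dist <= max_dist


def select_by_uncertainty(gaze_a, sigma_a, gaze_b, sigma_b):
    """Return the gaze estimate whose predicted uncertainty is lower.

    Assumption: each view yields one gaze estimate plus one scalar
    uncertainty; ties go to the first view.
    """
    return gaze_a if sigma_a <= sigma_b else gaze_b
```

For two identical cameras related by a pure horizontal translation, F reduces to the skew-symmetric matrix of the translation, and the epipolar line of a pixel is the image row at the same height in the other view. That makes a convenient sanity check for the helper functions above.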

PUBLICATION RECORD

Publication year: 2025
Venue: arXiv.org
Publication date: 2025-08-07
Fields of study: Computer Science
Identifiers: DOI 10.48550/arXiv.2508.05857; arXiv 2508.05857
Source metadata: Semantic Scholar
