The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, Oliver Wang

Published 2018 in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

ABSTRACT

While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new dataset of human perceptual similarity judgments. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins on our dataset. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
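To make the contrast in the abstract concrete, the sketch below places a "shallow" metric (PSNR, computed directly on pixel values) next to one plausible way of comparing deep features. The abstract does not say how feature activations are compared, so the scheme used here (unit-normalize each channel vector, take the squared L2 difference, average over spatial positions, sum over layers) is an assumption of this sketch. The names psnr, unit_normalize, and feature_distance are hypothetical, and the feature inputs stand in for activations that a network such as VGG would produce; no network is run here.

```python
import math


def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-length flat pixel lists."""
    if len(img_a) != len(img_b) or not img_a:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def unit_normalize(vec, eps=1e-10):
    """Scale a channel vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def feature_distance(feats_a, feats_b):
    """Distance between two stacks of precomputed feature activations.

    Assumed layout (hypothetical): feats[layer][position] is a channel vector.
    Per layer: normalize each channel vector, take the squared L2 difference,
    and average over positions. The per-layer values are then summed.
    """
    total = 0.0
    for layer_a, layer_b in zip(feats_a, feats_b):
        layer_sum = 0.0
        for vec_a, vec_b in zip(layer_a, layer_b):
            na, nb = unit_normalize(vec_a), unit_normalize(vec_b)
            layer_sum += sum((x - y) ** 2 for x, y in zip(na, nb))
        total += layer_sum / len(layer_a)
    return total


# Toy inputs: one layer, one spatial position, two channels.
same_direction = feature_distance([[[2.0, 0.0]]], [[[5.0, 0.0]]])
orthogonal = feature_distance([[[1.0, 0.0]]], [[[0.0, 1.0]]])
```

In this toy example, same_direction is close to zero because normalization removes the difference in activation magnitude, while orthogonal is close to 2, the largest value two non-negative unit vectors can produce. PSNR, by contrast, depends only on the per-pixel error and has no notion of image structure.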

PUBLICATION RECORD

Publication year
2018
Venue
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Publication date
2018-01-11
Fields of study
Computer Science
Identifiers
DOI 10.1109/CVPR.2018.00068 arXiv 1801.03924
Source metadata
Semantic Scholar

REFERENCES

Video Frame Interpolation via Adaptive Separable Convolution
2017cited by this paper
NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study
2017cited by this paper
Eigen-Distortions of Hierarchical Representations
2017cited by this paper
Deep Video Deblurring for Hand-Held Cameras
2017cited by this paper
Deep Learning of Human Visual Sensitivity in Image Quality Assessment Framework
2017cited by this paper
Enhanced Deep Residual Networks for Single Image Super-Resolution
2017influential reference
Learned perceptual image enhancement
2017cited by this paper
NIMA: Neural Image Assessment
2017cited by this paper
Photographic Image Synthesis with Cascaded Refinement Networks
2017cited by this paper
DeepSim: Deep similarity for image quality assessment
2017cited by this paper
Generating Images with Perceptual Similarity Metrics based on Deep Networks
2016cited by this paper
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
2016cited by this paper
Colorful Image Colorization
2016cited by this paper
EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis
2016cited by this paper
Image Quality Assessment by Comparing CNN Features between Images
2016cited by this paper
Image-to-Image Translation with Conditional Adversarial Networks
2016cited by this paper
Image Style Transfer Using Convolutional Neural Networks
2016cited by this paper
Learning Representations for Automatic Colorization
2016cited by this paper
Context Encoders: Feature Learning by Inpainting
2016cited by this paper
Image database TID2013: Peculiarities, results and perspectives
2016cited by this paper
Using goal-driven deep learning models to understand sensory cortex
2016cited by this paper
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size
2016cited by this paper
Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction
2016cited by this paper
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
2016cited by this paper
Adversarial Feature Learning
2016influential reference
Learning Features by Watching Objects Move
2016influential reference
Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles
2016cited by this paper
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
2016cited by this paper
Learning visual groups from co-occurrences in space and time
2015cited by this paper
Learning to See by Moving
2015cited by this paper
Visually Indicated Sounds
2015cited by this paper
Image database TID2013: Peculiarities, results and perspectives
2015cited by this paper
Unsupervised Learning of Visual Representations using Videos
2015cited by this paper
Phase-based frame interpolation for video
2015cited by this paper
Deep Networks for Image Super-Resolution with Sparse Prior
2015cited by this paper
Massive Online Crowdsourced Study of Subjective and Objective Picture Quality
2015cited by this paper
Unsupervised Visual Representation Learning by Context Prediction
2015cited by this paper
Data-dependent Initializations of Convolutional Neural Networks
2015influential reference
Accurate Image Super-Resolution Using Very Deep Convolutional Networks
2015cited by this paper
RAISE: a raw images dataset for digital image forensics
2015influential reference
Hand-Held Video Deblurring Via Efficient Fourier Aggregation
2015cited by this paper
One weird trick for parallelizing convolutional neural networks
2014influential reference
Very Deep Convolutional Networks for Large-Scale Image Recognition
2014cited by this paper
ImageNet Large Scale Visual Recognition Challenge
2014cited by this paper
Evaluating Amazon's Mechanical Turk as a Tool for Experimental Behavioral Research
2013cited by this paper
AVA: A large-scale database for aesthetic visual analysis
2012cited by this paper
ImageNet classification with deep convolutional neural networks
2012influential reference
FSIM: A Feature Similarity Index for Image Quality Assessment
2011influential reference
HDR-VDP-2: a calibrated visual metric for visibility and quality predictions in all luminance conditions
2011cited by this paper
Most apparent distortion: full-reference image quality assessment and the role of strategy
2010cited by this paper
Beyond pixels: exploring new representations and applications for motion analysis
2009cited by this paper
Complex Wavelet Structural Similarity: A New Image Similarity Index
2009cited by this paper
Adobe Systems Inc
2007cited by this paper
A Database and Evaluation Methodology for Optical Flow
2007cited by this paper
The PASCAL Visual Object Classes Challenge
2006cited by this paper
The Pascal Visual Object Classes Challenge 2006 (VOC 2006) Results
2006cited by this paper
A Statistical Evaluation of Recent Full Reference Image Quality Assessment Algorithms
2006influential reference
Nonintentional similarity processing
2005cited by this paper
Image quality assessment: from error visibility to structural similarity
2004influential reference
TID2008 – A database for evaluation of full-reference visual quality assessment metrics
2004cited by this paper
Multiscale structural similarity for image quality assessment
2003cited by this paper
A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms
2001cited by this paper
Respects for similarity
1993cited by this paper
The Adaptive Character of Thought
1990cited by this paper
Features of Similarity
1977cited by this paper

CITED BY

ShapeAfford: Reconstructing 3D Shape With Manipulation Affordance via Geometry-Affordance Synergy
2026cites this paper
SeCo: Semantic-Guided Multimodal Color Splash Effects
2026cites this paper
A Spatially Aware Crowdsensing Framework for High-Fidelity 3D View Synthesis
2026cites this paper
DOGL-SLAM: Dynamic Object-Level SLAM via Joint Gaussian-Landmark Tracking
2026cites this paper
ICAD-UIE: Naturalness-Ensuring Underwater Image Enhancement With Interchannel Attenuation Difference-Based Dewatering Model
2026cites this paper
SV-GS: Sparse View 4D Reconstruction with Skeleton-Driven Gaussian Splatting
2026cites this paper
Foundation Model-Aided Channel-Adaptive Video Semantic Communication and Prototype Validation
2026cites this paper
An Efficient Accelerator for Dehazing Neural Network Based on Physical Perception Model and Cross-Scale Pixel Attention
2026cites this paper
Real-Scene Image Dehazing via Laplacian Pyramid-Based Conditional Diffusion Model
2026cites this paper
ICDSR: Integrated Conditional Diffusion Model for Single Image Super-Resolution
2026cites this paper
Live High-Fidelity Semantic Communication via Cross-Modal Fusion for Volumetric Video
2026cites this paper
Secure Spread Spectrum Image Steganography Using a CNN-Based Learned Detector
2026cites this paper
Super-Resolution Reconstruction for Neutron Radiography Using Improved Real-ESRGAN
2026cites this paper
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
2026cites this paper
High-fidelity mural inpainting via progressive reconstruction and damage-aware adaptation
2026cites this paper
URA-Net: Uncertainty-Integrated Anomaly Perception and Restoration Attention Network for Unsupervised Anomaly Detection
2026cites this paper
LXIE-Net and HLXray: A Mamba-Based Network and Real-World Dataset for Low-Dose X-Ray Image Enhancement in Industrial Inspection
2026cites this paper
SAT-UIR: Self-Assessment Training for Semi-Supervised Underwater Image Restoration
2026cites this paper
Multimodal LLM-driven language-embedded 3D gaussian splatting for semantic and realistic digitization of historical buildings
2026cites this paper
MDT-FI: Mask-Guided Dual-Branch Transformer With Texture and Structure Feature Interaction for Image Inpainting
2026cites this paper
MantleMark: Migrating Watermarks From Multi-View Images to Radiance Fields via Frequency Modulation
2026cites this paper
DM-VSR: Depth-Aware Diffusion Models With Adaptive Modulation for Video Super-Resolution
2026cites this paper
Reusing source diffusion model for domain perception: Towards few-shot image generation via fine-tuning
2026cites this paper
MPD-GS: Mask-guided point densification for Gaussian splatting
2026cites this paper
SAST: Semantic-Aware stylized Text-to-Image generation
2026cites this paper
HybridFlow: A Hybrid Velocity Generation Framework for Precipitation Nowcasting
2026influential citation
SAB: A stealing and robust backdoor attack based on steganographic algorithm against federated learning
2026cites this paper
VAE-GAN-Based Semantic Communication for High-Quality Image Transmission
2026cites this paper
Depth-Synergized Mamba Meets Memory Experts for All-Day Image Reflection Separation
2026cites this paper
LooC: Effective Low-Dimensional Codebook for Compositional Vector Quantization
2026cites this paper
Instruction-Driven 3D Facial Expression Generation and Transition
2026cites this paper
LRGD: Low-Rank Guided Diffusion for Robust Image Transmission in Semantic Communication
2026cites this paper
SAFAformer: Scale-Aware Frequency-Adaptive Guidance for Nighttime Flare Removal
2026cites this paper
Gradient-Guided Diffusion-Based Restoration of Extremely Compressed Backgrounds for Video Coding for Machines
2026cites this paper
Low-Light Image Enhancement via Global-Local Collaborative Transformer
2026cites this paper
A Framework With Multi-Scale Hybrid Mamba Voxel Flow for Video Prediction
2026cites this paper
Low-Light Image Enhancement via Diffusion Models With Semantic Priors of Any Region
2026cites this paper
Low-Light Image Enhancement Using a Retinex-Based Variational Model With Weighted $L_{p}$ Norm Constraint
2026cites this paper
A differentiable method for novel view SAR image generation via 3D Gaussian Splatting
2026cites this paper
Zoom-Anomaly: Multimodal vision-Language fusion industrial anomaly detection with synthetic data
2026cites this paper
Memory-Efficient Voxelized Renderable Neural 3D Spatial Representation for Vision-Based Robotics
2026cites this paper
Generic-to-Personalised Learning for Multimodal Image Synthesis With Bidirectional Variational GAN
2026cites this paper
Accelerating Adaptive Diffusion and Uncertainty Modeling for Underwater Image Enhancement
2026cites this paper
L-C4: Language-based video colorization for creative and consistent color
2026cites this paper
A Retinex-based variational model for low-light image enhancement with noise transformation
2026cites this paper
Generative Model for 2.5D-Assisted Future Urban Remote Sensing Image Synthesis
2026cites this paper
Temporal attention multi-resolution fusion of satellite image time-series, applied to Landsat-8/9 and Sentinel-2: all bands, any time, at best spatial resolution
2026cites this paper
Enabling diverse styles coverless image steganography with two-stage latent transformation and diffusion model
2026cites this paper
Frequency-aware crack segmentation network (FACS-net) and crack topology loss (CT-loss) for thin cracks
2026cites this paper
Frequency-domain multi-regularization-experts fusion for robust non-line-of-sight imaging
2026cites this paper
AnyoneCue: Gloss-Prompted Fine-Grained and Personalized Cued Speech Video Generation
2026cites this paper
VQ-DeepVSC: A Dual-Stage Vector Quantization System for Video Semantic Communication
2026cites this paper
Remote Sensing Image Blind Super-Resolution via a Convolutional Neural Network-Guided Conditional Diffusion Model
2026cites this paper
Stealth—Black-Box Attack on Industry 4.0 Medical AI: A Low-Rank Perturbation Approach
2026cites this paper
A survey on physical adversarial attacks against face recognition systems
2026cites this paper
Breaking the Diffraction Barrier: A Hybrid Framework Combining Physical Modulation and Neural Diffusion for Subdiffraction Target Recovery
2026cites this paper
Sustainable Steganography via Neural Trojans for Geographic Satellite Remote Sensing
2026cites this paper
A comparative review and benchmark for deep learning based digital image correlation method
2026cites this paper
REMAC: Reference-Based Martian Asymmetrical Image Compression
2026cites this paper
Support-free speckle-correlation imaging with implicit neural representation
2026cites this paper
DynaDrag: Dynamic Drag-Style Image Editing by Motion Prediction
2026cites this paper
TimeColor: Flexible Reference Colorization via Temporal Concatenation
2026cites this paper
H-RSSG: High-Fidelity Robotic Surgical Scene Generation With Implicit Deformable Neural Radiance Field
2026cites this paper
3DGS-Drag: Dragging Gaussians for Intuitive Point-Based 3D Editing
2026cites this paper
A Survey on Deep Learning-Based Chinese Font Style Transfer
2026cites this paper
A Self-Supervised Foundation Model for Robust and Generalizable Representation Learning in STED Microscopy
2026cites this paper
SemiSketch: An ancient mural sketch extraction network based on reference prior and gradient frequency compensation
2026cites this paper
ILVMamba: Illumination-Aware Lightweight Visual Mamba Framework for Efficient High-Resolution Image Enhancement
2026cites this paper
Clinical CT image super-resolution using pixel-wise hybrid high-dimension mapping-based implicit neural representation
2026cites this paper
Fine-Detailed Facial Sketch-to-Photo Synthesis With Detail-Enhanced Codebook Priors
2026cites this paper
Medical VLP Model Is Vulnerable: Toward Multimodal Adversarial Attack on Large Medical Vision-Language Models
2026cites this paper
Uncovering Risks of Data-Free Feature Vector Inversion Attacks Against Vector Databases
2026cites this paper
ComAI: The Convergence of Communication and Artificial Intelligence
2026cites this paper
Image Quality Assessment: Exploring the Similarity of Deep Features via Covariance-Constrained Spectra
2026cites this paper
Voice2Visage: Deciphering Faces From Voices
2026cites this paper
ViMAEdit: Vision-Guided and Mask-Enhanced Adaptive Editing Algorithm for Prompt-Based Image Editing
2026cites this paper
Taming Learnable Codebook Design and Modulation for Digital Semantic Image Communication
2026cites this paper
Reversible Unlearnable Examples: Toward the Copyright Protection in Deep Learning Era
2026cites this paper
Reliable image transmission by boosting multiple weak semantic communications
2026cites this paper
A review of instruction-guided image editing
2026cites this paper
End-to-end predictions of trabecular bone structural and mechanical properties from resolution adaptive CT imaging
2026cites this paper
SPG-GT: Structural Prior Guided GNN-Transformers for Ship Landmark Detection
2026cites this paper
An unsupervised image generation method for translation from fundus structure image to fluorescein angiography with fusion of hard exudates
2026cites this paper
Freeze-Frame With StaticNeRF: Uncertainty-Guided NeRF Map Reconstruction in Dynamic Scenes
2026cites this paper
Adversarial Pruning Networks for Compact 3D Gaussian Splatting
2026cites this paper
HDMDN: Hierarchical-Decoupling Based Meta-Knowledge Single Image Dehazing Network
2026cites this paper
Evaluation of one-image 3D reconstruction for plant model generation
2026cites this paper
Parameters-adaptive mechanism based on degradation degree for nonhomogeneous image dehazing
2026cites this paper
Time-resolved spray characterization via unified optical flow and binarization technique
2026cites this paper
FlowGS: End-to-end correspondence-guided 3D Gaussian Splatting from sparse unposed images
2026cites this paper
Adaptive gradient-oriented sampling and dynamic-blended shadow generation for realistic face relighting
2026cites this paper
PatchNeRF: Patch-based Neural Radiance Fields for real time view synthesis in wide-scale scenes
2026cites this paper
Megatron: Evasive Clean-Label Backdoor Attacks Against Vision Transformer
2026cites this paper
Multi-frequency feature fusion dehazing network based on selective state-space
2026cites this paper
An end-to-end dual-stream fusion framework with subsequent detail-aware diffusion-based refinement for joint low-light segmentation and enhancement
2026cites this paper
HPGS-SLAM: Hybrid Point-Guided Dense Visual SLAM With Online Mapping via Gaussian Splatting
2026cites this paper
An imperceptible dynamic anticipated backdoor attack in federated learning
2026cites this paper
Ordered Hierarchical Encoding for Robust Image Semantic Communication
2026influential citation
Image quality assessment: Unifying spatial and frequency distribution discrepancy in deep feature domains via Rényi divergence
2026cites this paper
EdgeNeRF: Edge-Guided Regularization for Neural Radiance Fields from Sparse Views
2026cites this paper