Webly-Supervised Image Manipulation Localization via Category-Aware Auto-Annotation

Chenfan Qu, Yiwu Zhong, Bin Li, Lianwen Jin

Published 2025 in arXiv.org

ABSTRACT

Images manipulated by image editing tools can mislead viewers and pose significant risks to social security. However, accurately localizing manipulated image regions remains challenging due to the severe scarcity of high-quality annotated data, which is laborious to create. To address this, we propose a novel approach that mitigates data scarcity by leveraging readily available web data. We utilize a large collection of manually forged images from the web, as well as automatically generated annotations derived from a simpler auxiliary task, constrained image manipulation localization. Specifically, we introduce CAAAv2, a novel auto-annotation framework that operates on a category-aware, prior-feature-denoising paradigm that notably reduces task complexity. To further ensure annotation reliability, we propose QES, a novel metric that filters out low-quality annotations. Combining CAAAv2 and QES, we construct MIMLv2, a large-scale, diverse, and high-quality dataset containing 246,212 manually forged images with pixel-level mask annotations. This is over 120 times larger than existing handcrafted datasets like IMD20. Additionally, we introduce Object Jitter, a technique that further enhances model training by generating high-quality manipulation artifacts. Building on these advances, we develop Web-IML, a new model designed to effectively leverage web-scale supervision for the task of image manipulation localization. Extensive experiments demonstrate that our approach substantially alleviates the data scarcity problem and significantly improves the performance of various models on multiple real-world forgery benchmarks. With the proposed web supervision, our Web-IML achieves a striking performance gain of 31% and surpasses the previous state-of-the-art SparseViT by 21.6 average IoU points. The dataset and code will be released at https://github.com/qcf-568/MIML.
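The abstract reports results in average IoU (intersection over union) between predicted and ground-truth manipulation masks. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The function names `mask_iou` and `mean_iou`, the 0.5 binarization threshold, the per-image averaging, and the convention that two empty masks score 1.0 are all assumptions made for illustration; the abstract does not specify the evaluation protocol.

```python
def mask_iou(pred, target, threshold=0.5):
    """IoU between a predicted score map and a binary ground-truth mask.

    Both arguments are 2D lists of equal shape. The threshold and the
    empty-mask convention below are assumptions, not taken from the paper.
    """
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p_on = p >= threshold   # pixel predicted as manipulated
            t_on = bool(t)          # pixel annotated as manipulated
            inter += p_on and t_on
            union += p_on or t_on
    # Assumption: if neither mask marks any pixel, treat it as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(preds, targets, threshold=0.5):
    """Average of per-image IoU scores over a set of images."""
    scores = [mask_iou(p, t, threshold) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [[0.9, 0.8], [0.1, 0.2]]   # hypothetical model output
    target = [[1, 0], [0, 0]]         # hypothetical ground-truth mask
    print(mask_iou(pred, target))     # one overlapping pixel out of two marked
```

Under this reading, a gain of 21.6 IoU points corresponds to a difference of 0.216 in the averaged value returned by `mean_iou`.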

PUBLICATION RECORD

Publication year
2025
Venue
arXiv.org
Publication date
2025-08-28
Fields of study
Computer Science
Identifiers
DOI 10.48550/arXiv.2508.20987; arXiv 2508.20987
Source metadata
Semantic Scholar

REFERENCES

IMDPrompter: Adapting SAM to Image Manipulation Detection by Cross-View Automated Prompt Learning
2025, cited by this paper
OpenSDI: Spotting Diffusion-Generated Images in the Open World
2025, cited by this paper
MUN: Image Forgery Localization Based on M³ Encoder and UN Decoder
2025, cited by this paper
DINOv3
2025, cited by this paper
DiRLoc: Disentanglement Representation Learning for Robust Image Forgery Localization
2025, cited by this paper
Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Spare-Coding Transformer
2024, influential reference
SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model
2024, cited by this paper
Image Manipulation Detection with Implicit Neural Representation and Limited Supervision
2024, cited by this paper
Omni-IML: Towards Unified Image Manipulation Localization
2024, cited by this paper
Noise-Assisted Prompt Learning for Image Forgery Detection and Localization
2024, cited by this paper
Multi-view Feature Extraction via Tunable Prompts is Enough for Image Manipulation Localization
2024, cited by this paper
FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models
2024, cited by this paper
UnionFormer: Unified-Learning Transformer with Multi-View Representation for Image Manipulation Detection and Localization
2024, cited by this paper
Revisiting Tampered Scene Text Detection in the Era of Generative AI
2024, cited by this paper
DiffForensics: Leveraging Diffusion Prior to Image Forgery Detection and Localization
2024, cited by this paper
HDF-Net: Capturing Homogeny Difference Features to Localize the Tampered Image
2024, cited by this paper
Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods
2024, influential reference
Exploring Multi-View Pixel Contrast for General and Robust Image Forgery Localization
2024, cited by this paper
Employing Reinforcement Learning to Construct a Decision-Making Environment for Image Forgery Localization
2024, cited by this paper
MGQFormer: Mask-Guided Query-Based Transformer for Image Manipulation Localization
2024, cited by this paper
Robust Text Image Tampering Localization via Forgery Traces Enhancement and Multiscale Attention
2024, cited by this paper
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
2024, cited by this paper
Fully Unsupervised Deepfake Video Detection Via Enhanced Contrastive Learning
2024, cited by this paper
Generalized Face Liveness Detection via De-Fake Face Generator
2024, cited by this paper
TextSleuth: Towards Explainable Tampered Text Detection
2024, cited by this paper
SUMI-IFL: An Information-Theoretic Framework for Image Forgery Localization with Sufficiency and Minimality Constraints
2024, cited by this paper
Segment Anything
2023, cited by this paper
DINOv2: Learning Robust Visual Features without Supervision
2023, influential reference
AutoSplice: A Text-prompt Manipulated Image Dataset for Media Forensics
2023, cited by this paper
Edge-aware Regional Message Passing Controller for Image Forgery Localization
2023, cited by this paper
Uncertainty-guided Learning for Improving Image Manipulation Detection
2023, cited by this paper
Toward Real Text Manipulation Detection: New Dataset and New Solution
2023, cited by this paper
A New Benchmark and Model for Challenging Image Manipulation Detection
2023, influential reference
Pixel-Inconsistency Modeling for Image Manipulation Localization
2023, cited by this paper
Pre-training-free Image Manipulation Localization through Non-Mutually Exclusive Contrastive Learning
2023, influential reference
Detecting and Grounding Multi-Modal Media Manipulation and Beyond
2023, cited by this paper
Towards Generic Image Manipulation Detection with Weakly-Supervised Self-Consistency Learning
2023, cited by this paper
Multi-scale Target-Aware Framework for Constrained Splicing Detection and Localization
2023, cited by this paper
Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution
2023, cited by this paper
Hierarchical Fine-Grained Image Forgery Detection and Localization
2023, cited by this paper
Explicit Visual Prompting for Low-Level Structure Segmentations
2023, cited by this paper
Jointly Defending DeepFake Manipulation and Adversarial Attack Using Decoy Mechanism
2023, cited by this paper
Face Forgery Detection by 3D Decomposition and Composition Search
2023, cited by this paper
CFL-Net: Image Forgery Localization Using Contrastive Learning
2022, cited by this paper
A ConvNet for the 2020s
2022, cited by this paper
Visual Attention Network
2022, cited by this paper
A Principled Design of Image Representation: Towards Forensic Tasks
2022, cited by this paper
ObjectFormer for Image Manipulation Detection and Localization
2022, cited by this paper
Spoof Trace Disentanglement for Generic Face Anti-Spoofing
2022, cited by this paper
Robust Image Forgery Detection over Online Social Network Shared Images
2022, cited by this paper
Scale-Adaptive Deep Matching Network for Constrained Image Splicing Detection and Localization
2022, cited by this paper
Dilated Neighborhood Attention Transformer
2022, cited by this paper
Towards JPEG-Resistant Image Forgery Detection and Localization Via Self-Supervised Domain Adaptation
2022, cited by this paper
Learning to Immunize Images for Tamper Localization and Self-Recovery
2022, cited by this paper
TruFor: Leveraging All-Round Clues for Trustworthy Image Forgery Detection and Localization
2022, influential reference
Extended IMD2020: a large-scale annotated dataset tailored for detecting manipulated images
2021, influential reference
MVSS-Net: Multi-View Multi-Scale Supervised Networks for Image Manipulation Detection
2021, cited by this paper
Masked-attention Mask Transformer for Universal Image Segmentation
2021, cited by this paper
Towards Flexible Blind JPEG Artifacts Removal
2021, cited by this paper
Learning JPEG Compression Artifacts for Image Manipulation Detection and Localization
2021, influential reference
Detection and Localization of Multiple Image Splicing using MobileNet V1
2021, influential reference
Self-Adversarial Training Incorporating Forgery Attention for Image Forgery Localization
2021, cited by this paper
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
2021, cited by this paper
PSCC-Net: Progressive Spatio-Channel Correlation Network for Image Manipulation Detection and Localization
2021, influential reference
Learning Transferable Visual Models From Natural Language Supervision
2021, cited by this paper
LoRA: Low-Rank Adaptation of Large Language Models
2021, cited by this paper
Reverse Engineering of Generative Models: Inferring Model Hyperparameters From Generated Images
2021, cited by this paper
Emerging Properties in Self-Supervised Vision Transformers
2021, cited by this paper
Image Tampering Localization Using a Dense Fully Convolutional Network
2021, cited by this paper
Image Manipulation Detection by Multi-View Multi-Scale Supervision
2021, cited by this paper
DeepFake Detection Based on Discrepancies Between Faces and Their Context
2020, cited by this paper
SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization
2020, cited by this paper
Constrained Image Splicing Detection and Localization With Attention-Aware Encoder-Decoder and Atrous Convolution
2020, cited by this paper
MFC Datasets: Large-Scale Benchmark Datasets for Media Forensic Challenge Evaluation
2019, influential reference
RRU-Net: The Ringed Residual U-Net for Image Splicing Forgery Detection
2019, cited by this paper
DEFACTO: Image and Face Manipulation Dataset
2019, influential reference
ManTra-Net: Manipulation Tracing Network for Detection and Localization of Image Forgeries With Anomalous Features
2019, cited by this paper
Adversarial Learning for Constrained Image Splicing Detection and Localization Based on Atrous Convolution
2019, cited by this paper
FakeCatcher: Detection of Synthetic Portrait Videos using Biological Signals
2019, cited by this paper
Fighting Fake News: Image Splice Detection via Learned Self-Consistency
2018, cited by this paper
Learning Rich Features for Image Manipulation Detection
2018, cited by this paper
Decoupled Weight Decay Regularization
2017, cited by this paper
Scene Parsing through ADE20K Dataset
2017, cited by this paper
Deep Matching and Validation Network: An End-to-End Solution to Constrained Image Splicing Localization and Detection
2017, influential reference
COVERAGE — A novel database for copy-move forgery detection
2016, influential reference
Very Deep Convolutional Networks for Large-Scale Image Recognition
2014, cited by this paper
Microsoft COCO: Common Objects in Context
2014, cited by this paper
CASIA Image Tampering Detection Evaluation Database
2013, influential reference
Implementation and Benchmarking of Perceptual Image Hash Functions
2010, cited by this paper
ImageNet: A large-scale hierarchical image database
2009, influential reference
A Threshold Selection Method from Gray-Level Histograms
1979, cited by this paper

CITED BY

TextShield-R1: Reinforced Reasoning for Tampered Text Detection
2026, cites this paper