DeFrame: Debiasing Large Language Models Against Framing Effects
Kahee Lim, Soyeon Kim, Steven Euijong Whang
Published 2026 in Unknown venue

ABSTRACT
As large language models (LLMs) are increasingly deployed in real-world applications, ensuring their fair responses across demographics has become crucial. Despite many efforts, an ongoing challenge is hidden bias: LLMs appear fair under standard evaluations, but can produce biased responses outside those evaluation settings. In this paper, we identify framing -- differences in how semantically equivalent prompts are expressed (e.g., "A is better than B" vs. "B is worse than A") -- as an underexplored contributor to this gap. We first introduce the concept of "framing disparity" to quantify the impact of framing on fairness evaluation. By augmenting fairness evaluation benchmarks with alternative framings, we find that (1) fairness scores vary significantly with framing and (2) existing debiasing methods improve overall (i.e., frame-averaged) fairness, but often fail to reduce framing-induced disparities. To address this, we propose a framing-aware debiasing method that encourages LLMs to be more consistent across framings. Experiments demonstrate that our approach reduces overall bias and improves robustness against framing disparities, enabling LLMs to produce fairer and more consistent responses.
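To make the abstract's "framing disparity" notion concrete, here is a minimal sketch of how such a quantity might be computed, assuming a fairness score is already available for each framing of a prompt. The max-minus-min spread, the function names, and the toy scores are illustrative assumptions, not the paper's actual definitions.

from statistics import mean

def frame_averaged_fairness(scores):
    # Overall (frame-averaged) fairness for one prompt:
    # the mean fairness score across its framings.
    return mean(scores)

def framing_disparity(scores):
    # Hypothetical disparity measure: the max-minus-min spread of
    # fairness scores across semantically equivalent framings.
    # The paper's actual definition may differ.
    return max(scores) - min(scores)

# Toy example: two framings of the same comparison prompt,
# with made-up fairness scores.
scores = [
    0.91,  # e.g., fairness under "A is better than B"
    0.74,  # e.g., fairness under "B is worse than A"
]
print(frame_averaged_fairness(scores))  # 0.825 -- looks fair on average
print(framing_disparity(scores))        # ~0.17 -- large framing disparity

Under these assumptions, a frame-averaged score can look acceptable even when the spread across framings is large, which mirrors the abstract's observation that existing debiasing methods improve frame-averaged fairness without reducing framing-induced disparities.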
PUBLICATION RECORD
- Publication year: 2026
- Venue: Unknown venue
- Publication date: 2026-02-04
- Fields of study: Linguistics, Computer Science
- Source metadata: Semantic Scholar