On the creation of narrow AI: hierarchy and nonlocality of neural network skills

Eric J. Michaud, Asher Parker-Sartori, Max Tegmark

Published 2025 in arXiv.org

ABSTRACT

We study the problem of creating strong, yet narrow, AI systems. While recent AI progress has been driven by the training of large general-purpose foundation models, the creation of smaller models specialized for narrow domains could be valuable for both efficiency and safety. In this work, we explore two challenges involved in creating such systems, having to do with basic properties of how neural networks learn and structure their representations. The first challenge regards when it is possible to train narrow models from scratch. Through experiments on a synthetic task, we find that it is sometimes necessary to train networks on a wide distribution of data to learn certain narrow skills within that distribution. This effect arises when skills depend on each other hierarchically, and training on a broad distribution introduces a curriculum which substantially accelerates learning. The second challenge regards how to transfer particular skills from large general models into small specialized models. We find that model skills are often not perfectly localized to a particular set of prunable components. However, we find that methods based on pruning can still outperform distillation. We investigate the use of a regularization objective to align desired skills with prunable components while unlearning unnecessary skills.

PUBLICATION RECORD

Publication year: 2025
Venue: arXiv.org
Publication date: 2025-05-21
Fields of study: Computer Science
Identifiers: DOI 10.48550/arXiv.2505.15811; arXiv 2505.15811
Source metadata: Semantic Scholar

CITED BY

Position: Capability Control Should be a Separate Goal From Alignment
2026 (cites this paper)
An Overview of Artificial Intelligence in Neurology
2025 (cites this paper)
Weight-sparse transformers have interpretable circuits
2025 (cites this paper)