Exploring 3D Dataset Pruning

Xiaohan Zhao, Xinyi Shang, Jiacheng Liu, Zhiqiang Shen

Published 2026 in Unknown venue

ABSTRACT

Dataset pruning has been widely studied for 2D images as a way to remove redundancy and accelerate training, whereas pruning methods designed specifically for 3D data remain largely unexplored. In this work, we study dataset pruning for 3D data. The long-tailed class distribution commonly observed in 3D datasets makes optimizing for the two conventional evaluation metrics, Overall Accuracy (OA) and Mean Accuracy (mAcc), inherently conflicting, which in turn makes pruning particularly challenging. To address this, we formulate pruning as approximating the full-data expected risk with a weighted subset. This formulation reveals two key sources of error: coverage error, which arises when the subset is insufficiently representative, and prior-mismatch bias, which arises when the class weights induced by the subset are inconsistent with the target metric. We propose representation-aware subset selection with per-class retention quotas for long-tail coverage, and prior-invariant teacher supervision using calibrated soft labels and embedding-geometry distillation. The retention quota also serves as a switch that controls the OA-mAcc trade-off. Extensive experiments on 3D datasets show that our method can improve both metrics across multiple settings while adapting to different downstream preferences. Our code is available at https://github.com/XiaohanZhao123/3D-Dataset-Pruning.
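
The abstract does not spell out the selection algorithm, so the following is only a minimal sketch, under stated assumptions, of how per-class retention quotas, a representativeness criterion, and a weighted subset can fit together. Everything here is hypothetical: the mixing parameter `alpha` (standing in for the quota "switch" between an OA-leaning and an mAcc-leaning allocation), the use of greedy k-center selection (the core-set heuristic of Sener and Savarese, swapped in as a stand-in for the paper's representation-aware criterion), and the importance weights n_c / q_c. The features are assumed to be embeddings from some pretrained 3D encoder. The teacher-supervision components (calibrated soft labels, embedding-geometry distillation) are not shown.

```python
import numpy as np


def class_quotas(labels, budget, alpha):
    """Hypothetical per-class retention quotas (an assumption, not the paper's rule).

    alpha = 0.0 keeps each class's full-data share (long-tail prior, OA-leaning);
    alpha = 1.0 gives every class an equal share (mAcc-leaning).
    """
    classes, counts = np.unique(labels, return_counts=True)
    prior = counts / counts.sum()
    uniform = np.full(len(classes), 1.0 / len(classes))
    share = (1.0 - alpha) * prior + alpha * uniform
    quota = np.floor(share * budget).astype(int)
    # Keep at least one sample per class and never more than the class contains,
    # so the total may differ slightly from `budget`.
    quota = np.clip(quota, 1, counts)
    return classes, counts, quota


def k_center_greedy(feats, k, rng):
    """Greedy k-center: repeatedly pick the point farthest from the chosen centers."""
    n = feats.shape[0]
    k = min(k, n)
    first = int(rng.integers(n))
    selected = [first]
    # Distance from every point to its nearest selected center.
    dist = np.linalg.norm(feats - feats[first], axis=1)
    dist[first] = -np.inf  # never re-pick a selected point
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(feats - feats[nxt], axis=1))
        dist[nxt] = -np.inf
    return np.array(selected)


def prune(feats, labels, budget, alpha, seed=0):
    """Return kept indices and per-sample weights for a hypothetical quota-based pruner."""
    rng = np.random.default_rng(seed)
    classes, counts, quota = class_quotas(labels, budget, alpha)
    keep = []
    for c, q in zip(classes, quota):
        idx = np.flatnonzero(labels == c)
        keep.append(idx[k_center_greedy(feats[idx], int(q), rng)])
    keep = np.sort(np.concatenate(keep))
    # Importance weights n_c / q_c: each class's kept weights sum to its full-data
    # count, so the weighted subset loss estimates the full-data loss under the
    # original class prior.
    pos = np.searchsorted(classes, labels[keep])
    weights = counts[pos] / quota[pos]
    return keep, weights
```

Raising `alpha` in this sketch moves budget from head classes to tail classes, which is one plausible way a single quota setting could trade OA against mAcc. Under these assumptions, a target of mAcc rather than OA would call for different class weights than n_c / q_c, which is the kind of prior mismatch the abstract describes.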

PUBLICATION RECORD

Publication year: 2026
Venue: Unknown venue
Publication date: 2026-02-28
Fields of study: Computer Science
Identifiers: arXiv 2603.00651
Source metadata: Semantic Scholar
