A Sample-Free Compilation Framework for Efficient Dynamic Tensor Computation

Yangjie Zhou, Honglin Zhu, Qian Qiu, Weihao Cui, Zihan Liu, Peng Chen, Mohamed Wahib, Cong Guo, Siyuan Feng, Jintao Meng, Haidong Lan, Jingwen Leng, Yun Lin, Jinsong Dong, Wenxi Zhu, Minwen Deng

Published 2025 in International Conference on Software Composition

ABSTRACT

Dynamic-shape tensor computation is hard to handle with shape-specific compilation because input dimensions vary at runtime. Existing compilers rely on shape samples, which incurs high tuning costs and degrades performance on unseen inputs. We present Helix, a dynamic tensor compilation framework that combines sample-free compilation with architecture-guided optimization to achieve both compilation efficiency and shape-general performance. Helix avoids shape sampling by decomposing computations across architectural layers, which makes compilation shape-agnostic. A bidirectional strategy combines top-down abstraction, which aligns tensor computations with architectural hierarchies, with bottom-up kernel construction, which builds efficient execution strategies from reusable, architecture-aligned micro-kernels. A hybrid analyzer achieves accuracy by profiling at the lower architectural levels and scalability through architecture-informed modeling at the higher levels and at runtime. This hierarchical design eliminates shape-specific tuning and enables shape-adaptive execution. Evaluations on x86 CPUs, ARM CPUs, and NVIDIA GPUs show that Helix reduces compilation time by 174× compared with existing compilers and delivers execution speedups of 2.26× and 3.29× over vendor libraries and dynamic-shape compilers, respectively.
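The abstract describes the hybrid analyzer only at a high level. As a rough illustration of the general idea (profile small fixed-size micro-kernels once, then model the higher levels analytically so that a kernel can be chosen for any runtime shape without shape samples), here is a minimal sketch. It assumes a GEMM workload, a two-entry micro-kernel library, and a simple wave-based cost model; the names `MicroKernel`, `estimate_cost`, `select_kernel`, `KERNELS` and all cost figures are hypothetical and are not Helix's actual API, cost model, or measurements.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroKernel:
    """A fixed-size GEMM tile kernel (hypothetical).

    cost_per_k stands in for a cost profiled once, offline, per unit of the
    reduction dimension K. It never depends on the full input shape.
    """
    name: str
    tm: int            # rows of the output tile computed per call
    tn: int            # columns of the output tile computed per call
    cost_per_k: float  # assumed profiled cost per call per unit of K


def estimate_cost(mk: MicroKernel, m: int, n: int, k: int, cores: int) -> float:
    """Model the upper levels analytically for a runtime shape (m, n, k)."""
    # Ceil-division pads ragged edges up to whole tiles.
    tiles = math.ceil(m / mk.tm) * math.ceil(n / mk.tn)
    # Tiles are spread over cores; each "wave" runs up to `cores` tiles in parallel.
    waves = math.ceil(tiles / cores)
    return waves * mk.cost_per_k * k


def select_kernel(kernels: list[MicroKernel], m: int, n: int, k: int, cores: int) -> MicroKernel:
    """Pick the cheapest micro-kernel for this shape; no shape samples are needed."""
    return min(kernels, key=lambda mk: estimate_cost(mk, m, n, k, cores))


# Illustrative cost figures only, not measurements from the paper.
KERNELS = [
    MicroKernel("8x8", 8, 8, 1.0),
    MicroKernel("16x16", 16, 16, 3.2),
]

if __name__ == "__main__":
    for shape in [(16, 16, 64), (1024, 1024, 64)]:
        print(shape, select_kernel(KERNELS, *shape, cores=4).name)
```

With these made-up costs and 4 cores, the small 16×16×64 problem selects the 8x8 kernel, because its four tiles still fit in a single wave, while the 1024×1024×64 problem selects 16x16, whose larger tiles need fewer waves overall. The choice shifts with the runtime shape, yet nothing was tuned for either shape in advance.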

PUBLICATION RECORD

Publication year: 2025
Venue: International Conference on Software Composition
Publication date: 2025-11-15
Fields of study: Computer Science
Identifiers: DOI 10.1145/3712285.3759779
Source metadata: Semantic Scholar

REFERENCES

Early-Exit Deep Neural Network - A Comprehensive Survey (2024)
Optimizing Dynamic-Shape Neural Networks on Accelerators via On-the-Fly Micro-Kernel Polymerization (2024)
Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion (2023)
LLaMA: Open and Efficient Foundation Language Models (2023)
BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach (2023)
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (2023)
AdaptGear: Accelerating GNN Training via Adaptive Subgraph-Level Kernels on GPUs (2023)
uGrapher: High-Performance Graph Operator Computation via Unified Abstraction for Graph Neural Networks (2023)
AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures (2022)
TensorIR: An Abstraction for Automatic Tensorized Program Optimization (2022)
Graphiler: Optimizing Graph Neural Networks with Message Passing Data Flow Graph (2022)
Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization (2022)
ROLLER: Fast and Efficient Tensor Compilation for Deep Learning (2022)
Seastar: vertex-centric programming for graph neural networks (2021)
Dynamic Neural Networks: A Survey (2021)
Equality Saturation for Tensor Graph Superoptimization (2021)
DNNFusion: accelerating deep neural networks execution with advanced operator fusion (2020)
GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs (2020)
FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System (2020)
The Arm Neoverse N1 Platform: Building Blocks for the Next-Gen Cloud-to-Edge Infrastructure SoC (2020)
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference (2020)
Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference (2020)
Ansor: Generating High-Performance Tensor Programs for Deep Learning (2020)
Lazy Batching: An SLA-aware Batching System for Cloud Machine Learning Inference (2020)
Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks (2020)
Fast Graph Representation Learning with PyTorch Geometric (2019)
A Versatile Software Systolic Execution Model for GPU Memory-Bound Kernels (2019)
Dynamic-stride-net: deep convolutional neural network with dynamic stride (2019)
TASO: optimizing deep learning computation with automatic generation of graph substitutions (2019)
A Comprehensive Survey on Graph Neural Networks (2019)
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning (2018)
Learning to Optimize Tensor Programs (2018)
Halide (2017)
Graph Attention Networks (2017)
Attention is All you Need (2017)
Scale-Aware Face Detection (2017)
TensorFlow: A system for large-scale machine learning (2016)
Deep Learning (2016)
An Introduction To Sieve Methods And Their Applications (2016)
Fast R-CNN (2015)
Deep Residual Learning for Image Recognition (2015)
Very Deep Convolutional Networks for Large-Scale Image Recognition (2014)
cuDNN: Efficient Primitives for Deep Learning (2014)
Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines (2013)
ImageNet classification with deep convolutional neural networks (2012)
Anatomy of high-performance matrix multiplication (2008)
Sieve methods (2000)

CITED BY

FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive Operators via Inter-Core Connection (2025)
Session Summary Podcast: Session 3: Auto-Tuning, Compilation, and Code Generation (2025)