Distilling Privileged Knowledge From Transformers to Lightweight CNNs for On-Device Time Series Forecasting
Sangjin Na, Yong-Jun Cho, Yunju Baek
Published 2026 in IEEE Access

ABSTRACT
Time Series Forecasting (TSF) is a pivotal capability for critical applications in industrial automation, energy management, and smart infrastructure. While Large Language Models (LLMs) and Transformer-based architectures have set new benchmarks in forecasting accuracy, their exorbitant computational costs and memory requirements prohibit their deployment on resource-constrained Microcontroller Units (MCUs). Lightweight models offer an alternative but often fail to capture complex long-term dependencies. This paper introduces a comprehensive framework that addresses these interconnected challenges of accuracy and efficiency. We propose a highly efficient Convolutional Neural Network (CNN) architecture, enhanced through a novel heterogeneous knowledge distillation methodology: a powerful LLM-based teacher model, utilizing self-attention mechanisms, guides the training of a compact student model that relies solely on lightweight convolutions during inference. This approach is augmented by the Gated Dilated depthwise separable Convolution (GDC) block, which efficiently captures long-range dependencies and multi-scale patterns while drastically reducing the parameter count to meet strict MCU constraints. The robustness and generalizability of our framework are validated through extensive experiments on six long-term and five short-term forecasting benchmarks. Critically, our framework establishes a new state of the art among lightweight architectures, demonstrating consistent performance gains across diverse forecasting horizons. Experimental results confirm that our method reduces Mean Squared Error (MSE) by 11.5% in long-term and 12.4% in short-term forecasting tasks compared to massive LLM-based baselines, while using less than 0.01% of their parameters. The final, optimized model is successfully deployed on a commercial ESP32-S3 MCU, demonstrating its practical viability for real-time, accurate, and efficient forecasting on edge devices with an inference latency of just 256 ms.
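The abstract names the Gated Dilated depthwise separable Convolution (GDC) block but does not specify its internals, so the following is only a minimal numpy sketch of one plausible form: a dilated depthwise convolution per channel, followed by two pointwise (1x1) convolutions, one of which is passed through a sigmoid gate, plus a residual connection. All function names, the gating arrangement, and the residual are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def dilated_depthwise_conv1d(x, w, dilation):
    """Per-channel 1D convolution with dilation and 'same' padding.
    x: (channels, length), w: (channels, kernel_size), kernel_size odd."""
    C, L = x.shape
    k = w.shape[1]
    pad = (k - 1) * dilation // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros((C, L))
    for c in range(C):          # each channel has its own filter
        for t in range(L):
            for j in range(k):
                out[c, t] += xp[c, t + j * dilation] * w[c, j]
    return out

def pointwise_conv1d(x, w):
    """1x1 convolution: mixes channels at each time step. w: (C_out, C_in)."""
    return w @ x

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gdc_block(x, w_dw, w_pw_main, w_pw_gate, dilation=2):
    """Sketch of a gated dilated depthwise separable convolution block.
    Assumes C_out == C_in so the residual connection is shape-compatible."""
    h = dilated_depthwise_conv1d(x, w_dw, dilation)
    main = pointwise_conv1d(h, w_pw_main)
    gate = sigmoid(pointwise_conv1d(h, w_pw_gate))
    return main * gate + x  # gated output plus residual
```

The parameter savings of the depthwise separable factorization come from replacing a standard convolution's C_out x C_in x k weights with C x k (depthwise) plus C_out x C_in (pointwise) weights; stacking such blocks with increasing dilations (e.g. 1, 2, 4) grows the receptive field rapidly without adding depth, which is one common way a small CNN can capture long-range dependencies.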
PUBLICATION RECORD
- Publication year: 2026
- Venue: IEEE Access
- Source metadata: Semantic Scholar