Multispectral (MS) and hyperspectral (HS) imagery is widely used in agriculture, environmental monitoring, and urban planning because the two modalities have complementary spatial and spectral characteristics. A fundamental tradeoff persists: MS imagery offers high spatial but limited spectral resolution, while HS imagery provides rich spectra at lower spatial resolution. Prior HS generation approaches often struggle to jointly preserve spatial detail and spectral fidelity. In response, we propose SpecSwin3D, a Swin Transformer-based model that generates HS imagery from MS inputs while preserving both spatial and spectral quality. Specifically, SpecSwin3D uses five MS input bands to generate 224 HS bands at the same spatial resolution. In addition, existing methods that construct all HS bands with a single global model suffer from increasing generation errors for bands that are spectrally distant from the input bands, while training separate band-specific models for individual bands is computationally intensive. To address this tradeoff, we propose a curriculum-based cascade training strategy that progressively expands the spectral range from easier, MS-adjacent bands to more challenging, spectrally distant bands. This approach enables stable learning from spectrally proximal to distal bands and improves reconstruction fidelity for each individual band while significantly reducing computational cost. Moreover, we design an optimized band sequence that strategically repeats and orders the five selected MS bands to better capture pairwise relations across the spectrum within a 3-D shifted-window Transformer framework. Quantitatively, our model achieves a peak signal-to-noise ratio (PSNR) of 35.84 dB, a spectral angle mapper (SAM) of 2.39°, and a structural similarity index measure (SSIM) of 0.96, outperforming a state-of-the-art deep-learning-based approach by 5.8 dB in PSNR and reducing the relative dimensionless global error in synthesis (ERGAS) by more than half.
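For readers unfamiliar with the reported metrics, the following is a minimal NumPy sketch of PSNR, SAM, and ERGAS using their commonly cited definitions. It is not the authors' evaluation code. The function names, the (bands, height, width) array layout, the `data_range` of 1.0, and the ERGAS resolution ratio of 1 (chosen here because input and output share the same spatial resolution) are all assumptions made for illustration.

```python
import numpy as np

def psnr(ref, est, data_range=1.0):
    """Peak signal-to-noise ratio in dB; data_range is an assumed peak value."""
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

def sam_degrees(ref, est, eps=1e-12):
    """Mean spectral angle in degrees for cubes shaped (bands, H, W)."""
    dot = np.sum(ref * est, axis=0)
    norms = np.linalg.norm(ref, axis=0) * np.linalg.norm(est, axis=0)
    # Guard against zero-norm pixels, then clip rounding noise before arccos.
    cos = np.clip(dot / np.maximum(norms, eps), -1.0, 1.0)
    return float(np.degrees(np.mean(np.arccos(cos))))

def ergas(ref, est, resolution_ratio=1.0, eps=1e-12):
    """ERGAS = 100 * ratio * sqrt(mean over bands of (RMSE_b / mean_b)^2)."""
    rmse = np.sqrt(np.mean((ref - est) ** 2, axis=(1, 2)))
    band_means = np.mean(ref, axis=(1, 2))
    rel = rmse / np.maximum(band_means, eps)
    return float(100.0 * resolution_ratio * np.sqrt(np.mean(rel ** 2)))

# Toy cubes (3 bands, 2x2 pixels); the values are illustrative only.
ref = np.full((3, 2, 2), 0.5)
est = np.full((3, 2, 2), 0.6)
print(psnr(ref, est), sam_degrees(ref, est), ergas(ref, est))
```

Higher PSNR is better, while lower SAM and ERGAS are better. SAM is insensitive to a uniform brightness scaling of the spectrum, which is why it is usually reported alongside an amplitude-sensitive metric such as PSNR or ERGAS.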
Beyond HS band generation, we further demonstrate the practical value of SpecSwin3D on two downstream tasks, land-use classification and burned-area segmentation, achieving satisfactory results with both spatial and spectral resolution enhanced. Although SpecSwin3D uses a single unified model, cascade training substantially reduces the training budget for the majority of target bands (e.g., from 80 to 20 epochs at later levels), resulting in about 75% lower training cost compared with uniform training.
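The curriculum idea and the quoted saving can be illustrated with a small sketch. The abstract does not specify how bands are assigned to cascade levels, so the grouping rule below (sort target bands by distance to the nearest input band, then split into equal-sized levels) is an assumption, as are the function names and the band-centre wavelengths, which are hypothetical placeholders. Only the 80 and 20 epoch figures come from the text.

```python
def build_cascade_levels(target_nm, input_nm, n_levels):
    """Group target-band indices into levels, spectrally nearest bands first.

    Assumed rule for illustration: a band's difficulty is its distance
    (in nm) to the closest input band.
    """
    def distance(wavelength):
        return min(abs(wavelength - centre) for centre in input_nm)

    ordered = sorted(range(len(target_nm)), key=lambda i: distance(target_nm[i]))
    size = -(-len(ordered) // n_levels)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def relative_saving(uniform_epochs, reduced_epochs):
    """Fraction of per-band training epochs saved versus uniform training."""
    return 1.0 - reduced_epochs / uniform_epochs

# Hypothetical band centres in nm; these are not values from the paper.
input_nm = [480, 560, 660, 830, 1650]
target_nm = [400 + 10 * i for i in range(20)]

levels = build_cascade_levels(target_nm, input_nm, n_levels=4)
saving = relative_saving(uniform_epochs=80, reduced_epochs=20)
print(levels[0], saving)
```

With the epoch counts quoted in the abstract, a band trained for 20 rather than 80 epochs saves 1 - 20/80 = 75% of its budget, which matches the stated figure for bands at later levels. The overall saving would depend on how many bands fall into those levels.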
SpecSwin3D: Generating Hyperspectral Imagery From Multispectral Data via Transformer Networks and Curriculum-Based Cascade Training
Tang Sui, Songxi Yang, Qunying Huang
Published 2025 in IEEE Transactions on Geoscience and Remote Sensing
PUBLICATION RECORD
- Publication year
2025
- Venue
IEEE Transactions on Geoscience and Remote Sensing
- Publication date
2025-09-07
- Fields of study
Computer Science, Environmental Science
- Source metadata
Semantic Scholar