SpecSwin3D: Generating Hyperspectral Imagery From Multispectral Data via Transformer Networks and Curriculum-Based Cascade Training

Tang Sui, Songxi Yang, Qunying Huang

Published 2025 in IEEE Transactions on Geoscience and Remote Sensing

ABSTRACT

Multispectral (MS) and hyperspectral (HS) imagery are widely used in agriculture, environmental monitoring, and urban planning because of their complementary spatial and spectral characteristics. A fundamental tradeoff persists: MS imagery offers high spatial but limited spectral resolution, while HS imagery provides rich spectra at lower spatial resolution. Prior HS generation approaches often struggle to jointly preserve spatial detail and spectral fidelity. In response, we propose SpecSwin3D, a Swin Transformer-based model that generates HS imagery from MS inputs while preserving both spatial and spectral quality. Specifically, SpecSwin3D uses five MS input bands to generate 224 HS bands at the same spatial resolution. In addition, existing methods that construct all HS bands with a single global model suffer from increasing generation errors for bands that are spectrally distant from the input bands, while training separate band-specific models for individual bands is computationally intensive. To address this tradeoff, we propose a curriculum-based cascade training strategy that progressively expands the spectral range from easier, MS-adjacent bands to more challenging, spectrally distant bands. This approach enables stable learning from spectrally proximal to distal bands and improves reconstruction fidelity for each individual band while significantly improving computational efficiency. Moreover, we design an optimized band sequence that strategically repeats and orders the five selected MS bands to better capture pairwise relations of the spectrum within a 3-D shifted-window Transformer framework. Quantitatively, our model achieves a peak signal-to-noise ratio (PSNR) of 35.84 dB, a spectral angle mapper (SAM) of 2.39°, and a structural similarity index metric (SSIM) of 0.96, outperforming the state-of-the-art deep-learning-based approach by 5.8 dB in PSNR and reducing the relative dimensionless global error in synthesis (ERGAS) by more than half.
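For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of the standard textbook definitions of PSNR and the spectral angle, written in plain Python. The function names `psnr` and `spectral_angle_deg`, the `data_range` default of 1.0, and the flat-list inputs are assumptions made for illustration; the abstract does not state the evaluation protocol (value range, per-band or per-pixel averaging), so this should not be read as the authors' evaluation code.

```python
import math


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length sequences.

    data_range is the assumed maximum possible signal value (hypothetical
    default of 1.0, i.e. reflectance scaled to [0, 1]).
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(data_range ** 2 / mse)


def spectral_angle_deg(ref_spectrum, est_spectrum):
    """Angle in degrees between two spectra treated as vectors (one pixel)."""
    if len(ref_spectrum) != len(est_spectrum):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(r * e for r, e in zip(ref_spectrum, est_spectrum))
    norm_r = math.sqrt(sum(r * r for r in ref_spectrum))
    norm_e = math.sqrt(sum(e * e for e in est_spectrum))
    if norm_r == 0 or norm_e == 0:
        raise ValueError("spectral angle is undefined for an all-zero spectrum")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_r * norm_e)))
    return math.degrees(math.acos(cos_angle))
```

Because the spectral angle depends only on the direction of the spectrum vector, a uniformly brighter or darker estimate of the same spectral shape scores an angle near zero, which is why it is usually reported alongside an intensity-sensitive metric such as PSNR.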
Beyond HS band generation, we further demonstrate the practical value of SpecSwin3D on two downstream tasks, land-use classification and burned-area segmentation, achieving satisfactory results with both spatial and spectral resolution enhanced. Although SpecSwin3D uses a single unified model, cascade training substantially reduces the training budget for the majority of target bands (e.g., from 80 to 20 epochs at later levels), resulting in about 75% lower training cost compared with uniform training.
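The training-cost figure can be checked with simple arithmetic: a band trained for 20 epochs instead of 80 costs 1 - 20/80 = 75% less, assuming cost scales linearly with epochs per band. The sketch below generalizes this to a multi-level schedule. The function name `relative_saving` and the two-level split in the usage example are hypothetical; the abstract gives only the 80 and 20 epoch figures and the 224-band total, not how many bands are assigned to each cascade level.

```python
def relative_saving(schedule, uniform_epochs):
    """Fraction of training cost saved relative to uniform training.

    schedule: list of (num_bands, epochs) pairs, one per cascade level.
    uniform_epochs: epochs every band would receive under uniform training.
    Assumes cost is proportional to (number of bands) x (epochs), which is
    a simplification and not a claim about the paper's cost accounting.
    """
    total_bands = sum(bands for bands, _ in schedule)
    if total_bands == 0 or uniform_epochs <= 0:
        raise ValueError("need at least one band and a positive epoch count")
    cascade_cost = sum(bands * epochs for bands, epochs in schedule)
    uniform_cost = total_bands * uniform_epochs
    return 1.0 - cascade_cost / uniform_cost


# Every band at the reduced budget: exactly the per-band 75% reduction.
all_late = relative_saving([(224, 20)], uniform_epochs=80)

# Hypothetical split: a few early-level bands keep the full 80 epochs,
# the remaining bands get 20. The band counts here are illustrative only.
mixed = relative_saving([(5, 80), (219, 20)], uniform_epochs=80)
```

Under this linear model the aggregate saving approaches 75% as the share of bands trained at the reduced budget grows, which is consistent with the "about 75%" wording when most target bands sit at later levels.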
