An Exploration of Using the Intel AVX2 Gather Load Instructions for Vectorised Image Processing

Published 2018 in Image and Vision Computing New Zealand

ABSTRACT

Processing image data with single-instruction multiple-data (SIMD) CPU instructions provides a means of vectorising, and thus speeding up, standard image processing operators. SIMD register loads normally read from consecutive locations in memory, that is, consecutive pixels in a row of the image. For some algorithms, however, data dependencies between pixels along rows render SIMD vectorisation useless. If pixels could be loaded efficiently from columns of an image, this problem would be solved. The Intel AVX2 CPU extension introduces instructions for gather loading data from multiple memory locations into a single CPU SIMD register. We explore using these instructions for column loads of image data in two common image operations, transposing images and mean filtering, to test 1) whether they provide useful speed-ups when other vectorised approaches exist (we find that they do not), and 2) whether they provide a means of implementing operations that would be difficult or extremely inefficient to achieve without a column load (they can provide speed-ups over scalar code).
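
To make the idea of a gather-based column load concrete, the following is a minimal sketch in C. It is not code from the paper. It assumes 32-bit pixels (the AVX2 gather instructions operate on 32-bit and 64-bit elements), a row-major image whose stride is given in pixels, and a GCC- or Clang-style compiler on x86. The function names load_column8, load_column8_gather and load_column8_scalar are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86_INTRINSICS 1
#endif

/* Scalar reference: read 8 vertically adjacent pixels, one per row. */
static void load_column8_scalar(const int32_t *img, size_t stride,
                                size_t row, size_t col, int32_t out[8])
{
    for (size_t k = 0; k < 8; ++k)
        out[k] = img[(row + k) * stride + col];
}

#ifdef HAVE_X86_INTRINSICS
/* AVX2 version: a single gather fetches the same 8 pixels.
   Assumption: 7 * stride fits in a signed 32-bit index. */
__attribute__((target("avx2")))
static void load_column8_gather(const int32_t *img, size_t stride,
                                size_t row, size_t col, int32_t out[8])
{
    assert(stride <= (size_t)INT32_MAX / 7);

    /* Element offsets 0, s, 2s, ..., 7s: one per image row. */
    const __m256i lane   = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    const __m256i step   = _mm256_set1_epi32((int)stride);
    const __m256i offset = _mm256_mullo_epi32(lane, step);

    /* Scale 4 converts element offsets to byte offsets (4-byte pixels). */
    const int *base = (const int *)(img + row * stride + col);
    const __m256i v = _mm256_i32gather_epi32(base, offset, 4);

    _mm256_storeu_si256((__m256i *)out, v);
}
#endif

/* Use the gather path when the CPU reports AVX2, else fall back to scalar. */
void load_column8(const int32_t *img, size_t stride,
                  size_t row, size_t col, int32_t out[8])
{
#ifdef HAVE_X86_INTRINSICS
    if (__builtin_cpu_supports("avx2")) {
        load_column8_gather(img, stride, row, col, out);
        return;
    }
#endif
    load_column8_scalar(img, stride, row, col, out);
}
```

Both paths return the same eight values, so the scalar routine can serve as a correctness check for the gather path. Whether the gather path is actually faster depends on the processor and the access pattern, which is the question the paper sets out to measure.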

PUBLICATION RECORD

Publication year: 2018
Venue: Image and Vision Computing New Zealand
Publication date: 2018-11-01
Fields of study: Computer Science
Identifiers: DOI 10.1109/IVCNZ.2018.8634707
Source metadata: Semantic Scholar

REFERENCES

Enhanced Vector Math Support on the Intel® AVX-512 Architecture (2018)
The ARM Scalable Vector Extension (2017)
Vectorised SIMD implementations of morphology algorithms (2015)
Efficient 2-D Grayscale Morphological Transformations With Arbitrary Flat Structuring Elements (2008)
A novel 2D filter design methodology for heterogeneous devices (2005)
Decomposition of morphological structuring elements (2005)
N-bit unsigned division via n-bit multiply-add (2005)
Division by invariant integers using multiplication (1994)
SIMD architectures and algorithms for image processing and computer vision (1989)
Morphological structuring element decomposition (1986)

CITED BY

Optimized Bit-Packing for Bit-Wise Software-Defined GNSS Radio (2021)