A framework for low-communication 1-D FFT

P. T. P. Tang, Jongsoo Park, Daehyun Kim, V. Petrov

Published 2012 in International Conference for High Performance Computing, Networking, Storage and Analysis

ABSTRACT

In high-performance computing on distributed-memory systems, communication often accounts for a significant part of the overall execution time. The relative cost of communication will certainly continue to rise as compute-density growth follows current technology and industry trends. The design of lower-communication alternatives to fundamental computational algorithms has become an important field of research. For distributed 1-D FFT, communication cost has hitherto remained high because all industry-standard implementations perform three all-to-all internode data exchanges (also called global transposes). These communication steps dominate execution time. In this paper, we present a mathematical framework from which many easy-to-implement 1-D FFT algorithms requiring only a single all-to-all exchange can be derived. For large-scale problems, our implementation can be twice as fast as leading FFT libraries on state-of-the-art computer clusters. Moreover, our framework allows a tradeoff between accuracy and performance, further boosting performance when reduced accuracy is acceptable.
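To make the "three all-to-all exchanges" concrete, here is a minimal sketch of one common conventional formulation, the textbook "six-step" factorization of a length-N DFT with N = N1 * N2 (Bailey's arrangement of Cooley-Tukey). It is written in plain Python, with a naive O(n^2) `dft` standing in for a real FFT kernel. This illustrates the three-transpose baseline the abstract refers to, not the single-all-to-all algorithms the paper derives. The function names `dft`, `transpose` and `six_step_fft` are hypothetical. The sketch assumes the vector is block-distributed across nodes in a real implementation. Under that assumption, each `transpose` call would move almost all data between nodes, so it corresponds to one global all-to-all.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT: X[k] = sum_j v[j] * exp(-2*pi*i*j*k/n).

    Stands in for a node-local FFT kernel; only the data movement matters here.
    """
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def transpose(m):
    """Matrix transpose. On a distributed-memory machine this is one all-to-all."""
    return [list(row) for row in zip(*m)]


def six_step_fft(x, n1, n2):
    """Length-(n1*n2) DFT via the six-step factorization (three transposes)."""
    n = n1 * n2
    assert len(x) == n
    # View x as an n2-by-n1 row-major matrix: a[r][c] = x[r*n1 + c].
    a = [x[r * n1:(r + 1) * n1] for r in range(n2)]
    b = transpose(a)                          # all-to-all #1: b[j1][j2] = x[j1 + n1*j2]
    c = [dft(row) for row in b]               # n1 local DFTs of length n2 -> c[j1][k2]
    for j1 in range(n1):                      # twiddle factors exp(-2*pi*i*j1*k2/n)
        for k2 in range(n2):
            c[j1][k2] *= cmath.exp(-2j * cmath.pi * j1 * k2 / n)
    d = transpose(c)                          # all-to-all #2: d[k2][j1]
    e = [dft(row) for row in d]               # n2 local DFTs of length n1 -> e[k2][k1]
    f = transpose(e)                          # all-to-all #3: f[k1][k2]
    return [val for row in f for val in row]  # X[k1*n2 + k2] in natural order


if __name__ == "__main__":
    x = [complex(i % 5, (3 * i) % 7) for i in range(12)]
    X = six_step_fft(x, 3, 4)
    print(max(abs(p - q) for p, q in zip(X, dft(x))))  # should be at round-off level
```

Replacing `dft` with an optimized FFT kernel would speed up the local work, but the three transposes, and therefore the three global exchanges, would remain. That communication pattern is what the abstract identifies as the dominant cost.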

PUBLICATION RECORD

  • Publication year

    2012

  • Venue

    International Conference for High Performance Computing, Networking, Storage and Analysis

  • Publication date

    2012-11-10

  • Fields of study

    Computer Science, Engineering

  • Source metadata

    Semantic Scholar

REFERENCES

35 references

CITED BY

20 citing papers