In high-performance computing on distributed-memory systems, communication often represents a significant part of the overall execution time. The relative cost of communication will certainly continue to rise as compute-density growth follows the current technology and industry trends. Design of lower-communication alternatives to fundamental computational algorithms has become an important field of research. For distributed 1-D FFT, communication cost has hitherto remained high as all industry-standard implementations perform three all-to-all internode data exchanges (also called global transposes). These communication steps indeed dominate execution time. In this paper, we present a mathematical framework from which many single-all-to-all and easy-to-implement 1-D FFT algorithms can be derived. For large-scale problems, our implementation can be twice as fast as leading FFT libraries on state-of-the-art computer clusters. Moreover, our framework allows tradeoff between accuracy and performance, further boosting performance if reduced accuracy is acceptable.
A framework for low-communication 1-D FFT
P. T. P. Tang,Jongsoo Park,Daehyun Kim,V. Petrov
Published 2012 in International Conference for High Performance Computing, Networking, Storage and Analysis
ABSTRACT
PUBLICATION RECORD
- Publication year
2012
- Venue
International Conference for High Performance Computing, Networking, Storage and Analysis
- Publication date
2012-11-10
- Fields of study
Computer Science, Engineering
- Identifiers
- External record
- Source metadata
Semantic Scholar
CITATION MAP
EXTRACTION MAP
CLAIMS
- No claims are published for this paper.
CONCEPTS
- No concepts are published for this paper.
REFERENCES
Showing 1-35 of 35 references · Page 1 of 1
CITED BY
Showing 1-20 of 20 citing papers · Page 1 of 1