We present a new implementation of the Floyd-Warshall All-Pairs Shortest Paths algorithm on CUDA. Our algorithm runs approximately 5 times faster than the previously best reported algorithm. In order to achieve this speedup, we applied a new technique to reduce usage of on-chip shared memory and allow the CUDA scheduler to more effectively hide instruction latency.
A Multi-Stage CUDA Kernel for Floyd-Warshall
Published 2010 in arXiv.org
ABSTRACT
PUBLICATION RECORD
- Publication year
2010
- Venue
arXiv.org
- Publication date
2010-01-22
- Fields of study
Mathematics, Computer Science
- Identifiers
- External record
- Source metadata
Semantic Scholar
CITATION MAP
EXTRACTION MAP
CLAIMS
- No claims are published for this paper.
CONCEPTS
- No concepts are published for this paper.
REFERENCES
Showing 1-9 of 9 references · Page 1 of 1
CITED BY
Showing 1-26 of 26 citing papers · Page 1 of 1