RACER: Avoiding End-to-End Slowdowns in Accelerated Chip Multi-Processors
Neel Patel, Ren Wang, Mohammad Alian
Published 2025 in ACM Transactions on Architecture and Code Optimization (TACO)

ABSTRACT

Recent chip multiprocessors incorporate several on-chip accelerators, marking the beginning of the Accelerated Chip Multi-Processor (XMP) era in datacenters. Despite the close proximity of accelerators and general-purpose cores, offloading functions to accelerators is not always beneficial: offloading can introduce end-to-end overheads that negate the speedup of the accelerable function. In this article, we design RACER, a hardware architecture and runtime system that avoids end-to-end slowdowns when using hardware acceleration. RACER combines a low-overhead interface between general-purpose cores and on-chip accelerators, fine-grained context switching, accelerator-initiated preemption, and seamless data motion between cores and accelerators to improve the performance of workloads that use on-chip accelerators. We evaluate RACER on five representative request-processing workloads featuring diverse memory access patterns, accelerable functions, and compute intensities. On a real XMP, RACER improves the performance of hardware acceleration by an average of 1.31× across these workloads and guarantees that accelerator offloads never cause slowdowns.
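The never-slowdown guarantee described in the abstract can be illustrated with a minimal sketch of the offload decision: offload only when the estimated end-to-end accelerator path (offload overhead plus accelerated compute) beats running the function on a general-purpose core. All names and the cost model below are illustrative assumptions, not APIs from the paper.

```python
# Hypothetical sketch of a never-slowdown offload policy in the spirit of
# RACER (names and cost model are illustrative, not from the paper).
from dataclasses import dataclass


@dataclass
class OffloadEstimate:
    cpu_time_us: float          # time to run the function on a general-purpose core
    accel_time_us: float        # time for the accelerator to run the function body
    offload_overhead_us: float  # interface, context-switch, and data-motion cost


def should_offload(est: OffloadEstimate) -> bool:
    """Offload only if the end-to-end accelerator path is strictly faster,
    so an offload can never cause a slowdown."""
    return est.accel_time_us + est.offload_overhead_us < est.cpu_time_us


def run_request(est: OffloadEstimate) -> float:
    """Return the end-to-end latency the runtime pays under this policy."""
    if should_offload(est):
        return est.accel_time_us + est.offload_overhead_us
    return est.cpu_time_us
```

Under this toy model, the latency returned by `run_request` is always the minimum of the two paths, which is the essence of the guarantee that acceleration never hurts end-to-end performance.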
PUBLICATION RECORD
- Publication year: 2025
- Publication date: 2025-07-24
- Venue: ACM Transactions on Architecture and Code Optimization (TACO)
- Fields of study: Computer Science, Engineering
- Source metadata: Semantic Scholar