Performance-Aligned LLMs for Generating Fast Code
Daniel Nichols, Pranav Polasam, Harshitha Menon, Aniruddha Marathe, Todd Gamblin, Abhinav Bhatele
Published 2024 in arXiv.org

ABSTRACT
Optimizing scientific software is a difficult task because codebases are often large and complex, and performance depends on many factors, including the algorithm, its implementation, and the hardware. Causes of poor performance can originate from disparate sources and be difficult to diagnose. Recent years have seen a multitude of efforts that use large language models (LLMs) to assist in software development tasks. However, these tools are trained to model the distribution of code as text and are not specifically designed to understand the performance characteristics of code. In this work, we introduce a reinforcement-learning-based methodology to align the outputs of code LLMs with performance. This allows us to build upon the current code modeling capabilities of LLMs and extend them to generate better-performing code. We demonstrate that our fine-tuned model improves the expected speedup of generated code over base models on a set of benchmark tasks, from 0.9 to 1.6 for serial code and from 1.9 to 4.5 for OpenMP code.
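The abstract describes aligning a code LLM with performance via reinforcement learning, which implies a scalar reward derived from measured runtimes. Below is a minimal sketch of one plausible reward shaping; the function name, the failure penalty, and the log-speedup form are assumptions for illustration, not the paper's actual design:

```python
import math

def speedup_reward(baseline_time: float, generated_time: float,
                   compiled_ok: bool) -> float:
    """Map a measured runtime into a scalar RL reward (hypothetical shaping).

    Code that fails to compile gets a fixed penalty. Otherwise the reward
    is the log of the speedup over a baseline implementation, so a 1x
    speedup maps to 0 and slowdowns map to negative rewards.
    """
    if not compiled_ok:
        return -1.0  # assumed penalty for invalid generations
    speedup = baseline_time / generated_time
    return math.log(speedup)
```

In such a setup, each sampled program would be compiled and timed, and the resulting reward would drive a policy-gradient update (e.g., PPO) of the code LLM.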
PUBLICATION RECORD
- Publication year: 2024
- Venue: arXiv.org
- Publication date: 2024-04-29
- Fields of study: Computer Science
- Source metadata: Semantic Scholar