Blended, precise semantic program embeddings

Published 2020 in ACM-SIGPLAN Symposium on Programming Language Design and Implementation

ABSTRACT

Learning neural program embeddings is key to utilizing deep neural networks in programming languages research: precise and efficient program representations enable the application of deep models to a wide range of program analysis tasks. Existing approaches predominantly learn to embed programs from their source code and, as a result, do not capture deep, precise program semantics. On the other hand, models learned from runtime information depend critically on the quality of program executions, which leads to trained models of highly variable quality. This paper tackles these inherent weaknesses of prior approaches by introducing a new deep neural network, Liger, which learns program representations from a mixture of symbolic and concrete execution traces. We have evaluated Liger on two tasks: method name prediction and semantics classification. Results show that Liger is significantly more accurate than the state-of-the-art static model code2seq in predicting method names, and requires on average around 10x fewer executions covering nearly 4x fewer paths than the state-of-the-art dynamic model DYPRO in both tasks. Liger offers a new, interesting design point in the space of neural program embeddings and opens up this new direction for exploration.

PUBLICATION RECORD

Publication year
2020
Venue
ACM-SIGPLAN Symposium on Programming Language Design and Implementation
Publication date
2020-06-06
Fields of study
Computer Science
Identifiers
DOI 10.1145/3385412.3385999
Source metadata
Semantic Scholar

REFERENCES

Learning Scalable and Precise Representation of Program Semantics
2019 (influential reference)
COSET: A Benchmark for Evaluating Neural Program Embeddings
2019 (influential reference)
code2vec: learning distributed representations of code
2018 (influential reference)
Path-based function embedding and its application to error-handling specification mining
2018 (cited by this paper)
code2seq: Generating Sequences from Structured Representations of Code
2018 (cited by this paper)
Code vectors: understanding programs through embedded abstracted symbolic traces
2018 (cited by this paper)
Dynamic Neural Program Embedding for Program Repair
2017 (influential reference)
Attention is All you Need
2017 (cited by this paper)
Learning to Represent Programs with Graphs
2017 (cited by this paper)
DeepFix: Fixing Common C Language Errors by Deep Learning
2017 (cited by this paper)
sk_p: a neural program corrector for MOOCs
2016 (cited by this paper)
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
2015 (cited by this paper)
Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
2015 (influential reference)
Attention-Based Models for Speech Recognition
2015 (cited by this paper)
End-to-end attention-based large vocabulary speech recognition
2015 (cited by this paper)
Multiple Object Recognition with Visual Attention
2014 (cited by this paper)
Convolutional Neural Networks over Tree Structures for Programming Language Processing
2014 (cited by this paper)
Distributed Representations of Sentences and Documents
2014 (cited by this paper)
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
2014 (cited by this paper)
Fast and Robust Neural Network Joint Models for Statistical Machine Translation
2014 (cited by this paper)
Neural Machine Translation by Jointly Learning to Align and Translate
2014 (cited by this paper)
Recurrent Models of Visual Attention
2014 (cited by this paper)
Distributed Representations of Words and Phrases and their Compositionality
2013 (cited by this paper)
Efficient Estimation of Word Representations in Vector Space
2013 (cited by this paper)
Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach
2011 (cited by this paper)
Randoop: feedback-directed random testing for Java
2007 (cited by this paper)
A Neural Probabilistic Language Model
2003 (cited by this paper)
Recurrent Neural Networks: Design and Applications
1999 (cited by this paper)

CITED BY

Complexity-Based Code Embeddings
2026 (cites this paper)
Trace Gadgets: Minimizing Code Context for Machine Learning-Based Vulnerability Prediction
2025 (influential citation)
Reimagining Unit Test Generation With AI: A Journey From Evolutionary Models to Transformers
2025 (influential citation)
Loupe: End-to-End Learning of Loop Unrolling Heuristics for Abstract Interpretation
2025 (cites this paper)
A Survey of Learning-based Method Name Prediction
2025 (cites this paper)
Deep Learning Representations of Programs: A Systematic Literature Review
2025 (cites this paper)
Combining Structured Static Code Information and Dynamic Symbolic Traces for Software Vulnerability Prediction
2024 (cites this paper)
Debugging convergence problems in probabilistic programs via program representation learning with SixthSense
2024 (cites this paper)
Revolutionizing Software Development: Autonomous Software Evolution
2024 (cites this paper)
Evaluating the Effectiveness of Deep Learning Models for Foundational Program Analysis Tasks
2024 (cites this paper)
Deep learning based identification of inconsistent method names: How far are we?
2024 (cites this paper)
Detecting Source Code Vulnerabilities using High-Precision Code Representation and Bimodal Contrastive Learning
2024 (cites this paper)
Exploiting Code Symmetries for Learning Program Semantics
2023 (cites this paper)
TRACED: Execution-Aware Pre-Training for Source Code
2023 (cites this paper)
Demystifying What Code Summarization Models Learned
2023 (cites this paper)
Discrete Adversarial Attack to Models of Code
2023 (cites this paper)
TraceFixer: Execution Trace-Driven Program Repair
2023 (cites this paper)
Learning Deep Semantics for Test Completion
2023 (cites this paper)
LExecutor: Learning-Guided Execution
2023 (cites this paper)
Learning Approximate Execution Semantics From Traces for Binary Function Similarity
2023 (cites this paper)
Abstract Syntax Tree for Method Name Prediction: How Far Are We?
2023 (influential citation)
Learning Generalizable Program and Architecture Representations for Performance Modeling
2023 (cites this paper)
An Explanation Method for Models of Code
2023 (cites this paper)
A text classification approach to API type resolution for incomplete code snippets
2023 (cites this paper)
NeuDep: neural binary memory dependence analysis
2022 (influential citation)
Statically Identifying XSS using Deep Learning
2022 (cites this paper)
VUDENC: Vulnerability Detection with Deep Learning on a Natural Codebase for Python
2022 (cites this paper)
SelfAPR: Self-supervised Program Repair with Test Execution Diagnostics
2022 (cites this paper)
CODE-MVP: Learning to Represent Source Code from Multiple Views with Contrastive Pre-Training
2022 (cites this paper)
Exploring GNN Based Program Embedding Technologies for Binary Related Tasks
2022 (cites this paper)
Represent Code as Action Sequence for Predicting Next Method Call
2022 (cites this paper)
SymLM: Predicting Function Names in Stripped Binaries via Context-Sensitive Execution-Aware Code Embeddings
2022 (cites this paper)
sem2vec: Semantics-aware Assembly Tracelet Embedding
2022 (cites this paper)
Code Quality Prediction Under Super Extreme Class Imbalance
2022 (cites this paper)
Graph Neural Networks Based Memory Inefficiency Detection Using Selective Sampling
2022 (cites this paper)
CURE: Code-Aware Neural Machine Translation for Automatic Program Repair
2021 (cites this paper)
A Context-Based Automated Approach for Method Name Consistency Checking and Suggestion
2021 (cites this paper)
Demystifying Code Summarization Models
2021 (cites this paper)
Nalin: learning from Runtime Behavior to Find Name-Value Inconsistencies in Jupyter Notebooks
2021 (cites this paper)
A Comparison of Code Embeddings and Beyond
2021 (cites this paper)
Lightweight global and local contexts guided method name recommendation with prior knowledge
2021 (cites this paper)
WheaCha: A Method for Explaining the Predictions of Code Summarization Models
2021 (cites this paper)
ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations
2021 (cites this paper)
TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer
2021 (cites this paper)
Self-Supervised Bug Detection and Repair
2021 (cites this paper)
Neural Program Repair with Execution-based Backpropagation
2021 (cites this paper)
How could Neural Networks understand Programs?
2021 (cites this paper)
WheaCha: A Method for Explaining the Predictions of Models of Code
2021 (cites this paper)
A Survey on Heap Analysis
2020 (cites this paper)
On the Generalizability of Neural Program Analyzers with respect to Semantic-Preserving Program Transformations
2020 (cites this paper)
On the generalizability of Neural Program Models with respect to semantic-preserving program transformations
2020 (cites this paper)
InferCode: Self-Supervised Learning of Code Representations by Predicting Subtrees
2020 (cites this paper)
Learning semantic program embeddings with graph interval neural network
2020 (influential citation)
AlgoLabel: A Large Dataset for Multi-Label Classification of Algorithmic Challenges
2020 (cites this paper)
Deep Data Flow Analysis
2020 (cites this paper)
GRAPHSPY: Fused Program Semantic-Level Embedding via Graph Neural Networks for Dead Store Detection
2020 (influential citation)
Neural software analysis
2020 (cites this paper)
Towards demystifying dimensions of source code embeddings
2020 (cites this paper)
Program Embeddings for Rapid Mechanism Evaluation
year unknown (cites this paper)