MalDy: Portable, data-driven malware detection using natural language processing and machine learning techniques on behavioral analysis reports

Published 2018 in Digital Investigation. The International Journal of Digital Forensics and Incident Response

ABSTRACT

In response to the volume and sophistication of malicious software (malware), security investigators rely on dynamic analysis for malware detection to counter obfuscation and packing. Dynamic analysis is the process of executing binary samples to produce reports that summarise their runtime behaviors. The investigator uses these reports to detect malware and attribute threat types, leveraging manually chosen features. However, the diversity of malware and of execution environments makes manual approaches unscalable, because the investigator must manually engineer fingerprinting features for each new environment. In this paper, we propose MalDy (mal die), a portable (plug and play) malware detection and family threat attribution framework using supervised machine learning techniques. The key idea behind MalDy's portability is to model the behavioral reports as a sequence of words and to apply advanced natural language processing (NLP) and machine learning (ML) techniques that automatically engineer relevant security features, so that malware is detected and attributed without investigator intervention. More precisely, we propose to use a bag-of-words (BoW) NLP model to represent the behavioral reports. Afterward, we build ML ensembles on top of the BoW features. We extensively evaluate MalDy on various datasets from different platforms (Android and Win32) and execution environments. The evaluation shows the effectiveness and the portability of MalDy across the spectrum of the analyses and settings.
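
The pipeline the abstract describes (behavioral report, then bag-of-words features, then an ML ensemble) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which classifiers make up the ensemble, so the three simple models below (naive Bayes, nearest centroid, nearest neighbour) and the majority vote are stand-ins, and the toy reports, labels, and all class and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(report):
    """Split a raw behavioral report into lowercase word tokens."""
    return re.findall(r"[a-z0-9_]+", report.lower())


def bow(report):
    """Bag-of-words: map each token to its count, ignoring word order."""
    return Counter(tokenize(report))


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing over BoW counts."""

    def fit(self, bags, labels):
        self.vocab = {w for b in bags for w in b}
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for b, y in zip(bags, labels):
            self.counts[y].update(b)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, bag):
        def score(c):
            denom = self.totals[c] + len(self.vocab)
            return math.log(self.prior[c]) + sum(
                n * math.log((self.counts[c][w] + 1) / denom)
                for w, n in bag.items() if w in self.vocab)
        return max(self.classes, key=score)


class NearestCentroid:
    """Pick the class whose summed BoW vector is most cosine-similar."""

    def fit(self, bags, labels):
        self.centroids = {}
        for b, y in zip(bags, labels):
            self.centroids.setdefault(y, Counter()).update(b)
        return self

    def predict(self, bag):
        def cosine(a, b):
            dot = sum(a[w] * b[w] for w in a if w in b)
            na = math.sqrt(sum(v * v for v in a.values()))
            nb = math.sqrt(sum(v * v for v in b.values()))
            return dot / (na * nb) if na and nb else 0.0
        return max(sorted(self.centroids),
                   key=lambda c: cosine(bag, self.centroids[c]))


class NearestNeighbour:
    """1-NN using Jaccard similarity between token sets."""

    def fit(self, bags, labels):
        self.train = [(set(b), y) for b, y in zip(bags, labels)]
        return self

    def predict(self, bag):
        words = set(bag)

        def jaccard(s):
            union = words | s
            return len(words & s) / len(union) if union else 0.0
        return max(self.train, key=lambda t: jaccard(t[0]))[1]


class VotingEnsemble:
    """Majority vote over models that all consume the same BoW features."""

    def __init__(self, models):
        self.models = models

    def fit(self, reports, labels):
        bags = [bow(r) for r in reports]
        for m in self.models:
            m.fit(bags, labels)
        return self

    def predict(self, report):
        bag = bow(report)
        votes = Counter(m.predict(bag) for m in self.models)
        return votes.most_common(1)[0][0]


# Hypothetical toy reports; real sandbox reports are far longer.
reports = [
    "process wrote registry run key and opened socket to remote host",
    "process encrypted user files and deleted shadow copies",
    "application read config file and drew window",
    "application opened document and saved file to disk",
]
labels = ["malware", "malware", "benign", "benign"]

ensemble = VotingEnsemble(
    [NaiveBayes(), NearestCentroid(), NearestNeighbour()]).fit(reports, labels)
print(ensemble.predict("process encrypted files and opened socket"))
```

Because every model sees only word counts, the same code applies unchanged to reports from a different sandbox or platform, which is the portability argument the abstract makes.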

PUBLICATION RECORD

Publication year: 2018
Venue: Digital Investigation. The International Journal of Digital Forensics and Incident Response
Publication date: 2018-12-26
Fields of study: Computer Science
Identifiers: DOI 10.1016/J.DIIN.2019.01.017; arXiv 1812.10327

REFERENCES

Automatic Investigation Framework for Android Malware Cyber-Infrastructures (2018, cited by this paper)
MalDozer: Automatic framework for android malware detection using deep learning (2018, influential reference)
ToGather: Automatic Investigation of Android Malware Cyber-Infrastructures (2018, cited by this paper)
Malrec: Compact Full-Trace Malware Recording for Retrospective Deep Analysis (2018, cited by this paper)
Android Malware Detection using Deep Learning on API Method Sequences (2017, cited by this paper)
Cypider: building community-based cyber-defense infrastructure for android malware detection (2016, cited by this paper)
IntelliDroid: A Targeted Input Generator for the Dynamic Analysis of Android Malware (2016, cited by this paper)
Glassbox: Dynamic Analysis Platform for Malware Android Applications on Real Devices (2016, cited by this paper)
Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection (2016, cited by this paper)
DySign: dynamic fingerprinting for the automatic detection of android malware (2016, cited by this paper)
StormDroid: A Streaminglized Machine Learning-Based System for Detecting Android Malware (2016, influential reference)
Dynalog: an automated dynamic analysis framework for characterizing android applications (2016, cited by this paper)
UNVEIL: A large-scale, automated approach to detecting ransomware (keynote) (2016, cited by this paper)
MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models (2016, cited by this paper)
I find your behavior disturbing: Static and dynamic app behavioral analysis for detection of Android malware (2016, cited by this paper)
Fingerprinting Android packaging: Generating DNAs for malware detection (2016, cited by this paper)
AndroZoo: Collecting Millions of Android Apps for the Research Community (2016, cited by this paper)
Needles in a Haystack: Mining Information from Public Dynamic Analysis Sandboxes for Malware Intelligence (2015, cited by this paper)
DROIT: Dynamic Alternation of Dual-Level Tainting for Malware Analysis (2015, cited by this paper)
DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket (2014, influential reference)
DroidDolphin: a dynamic Android malware detection framework using big data and machine learning (2014, cited by this paper)
MutantX-S: Scalable Malware Clustering Based on Static Features (2013, cited by this paper)
Dissecting Android Malware: Characterization and Evolution (2012, cited by this paper)
Automatic analysis of malware behavior using machine learning (2011, influential reference)
Kernel-based Behavior Analysis for Android Malware Detection (2011, cited by this paper)
A comparative assessment of malware classification using binary texture analysis and dynamic analysis (2011, cited by this paper)
Feature hashing for large scale multitask learning (2009, cited by this paper)
Effective and Efficient Malware Detection at the End Host (2009, cited by this paper)
Hash Kernels (2009, cited by this paper)
Scalable, Behavior-Based Malware Clustering (2009, cited by this paper)
Toward Automated Dynamic Malware Analysis Using CWSandbox (2007, cited by this paper)
N-gram-based detection of new malicious code (2004, influential reference)

CITED BY

A multimodal approach for windows malware detection using comprehensive analysis on called APIs (2026, cites this paper)
Hybrid feature extraction and integrated deep learning for cloud-based malware detection (2025, cites this paper)
Leveraging Fine-Tuned LightGBM for Advanced AI-Driven Android Malware Detection (2025, cites this paper)
Ransomware Detection Using Printable Strings (2025, cites this paper)
Demystifying Feature Engineering in Malware Analysis of API Call Sequences (2025, cites this paper)
A systematic review on insider threat detection using natural language processing (2025, cites this paper)
Data-Driven Incident Response: Enhancing Detection and Containment Through Adversarial Reasoning and Malware Behavior Analytics (2025, cites this paper)
Efficient malware detection using NLP and deep learning model (2025, cites this paper)
ViTGuard: a synergistic approach to malware detection using vision transformers and genetic algorithms optimization (2025, cites this paper)
Multimodal Windows Malware Detection via Hybrid Analysis and Enriched Graphs: Effectiveness and Explainability (2025, cites this paper)
DawnGNN: Documentation augmented windows malware detection using graph neural network (2024, cites this paper)
Android malware detection and identification frameworks by leveraging the machine and deep learning techniques: A comprehensive review (2024, cites this paper)
IMCNN: Intelligent Malware Classification using Deep Convolution Neural Networks as Transfer learning and ensemble learning in honeypot enabled organizational network (2024, cites this paper)
Enhanced Malware Detection in Distributed IoT Environment Using Optimized Cascaded LSTM-GRU Framework (2024, cites this paper)
EarlyMalDetect: A Novel Approach for Early Windows Malware Detection Based on Sequences of API Calls (2024, cites this paper)
CAG-Malconv: A Byte-Level Malware Detection Method With CBAM and Attention-GRU (2024, cites this paper)
ML-Based Behavioral Malware Detection Is Far From a Solved Problem (2024, cites this paper)
Leveraging deep learning and image conversion of executable files for effective malware detection: A static malware analysis approach (2024, cites this paper)
Cross-Platform Malware Classification: Fusion of CNN and GRU Models (2024, cites this paper)
Machine Learning Enables Malware Detection and Classification Techniques (2024, cites this paper)
Clustering android ransomware families using fuzzy hashing similarities (2024, cites this paper)
Metamorphic Malware and Obfuscation: A Survey of Techniques, Variants, and Generation Kits (2023, cites this paper)
Artificial Intelligence-Based Malware Detection, Analysis, and Mitigation (2023, cites this paper)
Exploring the Potential of Cyber Manufacturing System in the Digital Age (2023, cites this paper)
DeepCall: A Fast and Robust Malware Classification System with DGCNN and Function Call Graph (2023, cites this paper)
Considerations for Using Artificial Intelligence to Manage Authorized Push Payment (APP) Scams (2023, cites this paper)
A Survey on Malware Attacks Analysis and Detected (2023, cites this paper)
API-MalDetect: Automated malware detection framework for windows based on API calls and deep learning techniques (2023, cites this paper)
Decoding the Secrets of Machine Learning in Malware Classification: A Deep Dive into Datasets, Feature Extraction, and Model Performance (2023, cites this paper)
Threat Hunting System for Protecting Critical Infrastructures Using a Machine Learning Approach (2023, cites this paper)
Malware Analysis of Cryptojacks: Crackonosh and Winstar Nss Miner (2023, cites this paper)
A Comprehensive Survey on IoT Attacks: Taxonomy, Detection Mechanisms and Challenges (2023, cites this paper)
DFRWS EU 10-Year Review and Future Directions in Digital Forensic Research (2023, cites this paper)
A systematic literature review on Windows malware detection: Techniques, research issues, and future directions (2023, cites this paper)
Humans vs. Machines in Malware Classification (2023, cites this paper)
MalAnalyser: An effective and efficient Windows malware detection method based on API call sequences (2023, cites this paper)
Android-IoT Malware Classification and Detection Approach Using Deep URL Features Analysis (2023, influential citation)
Unsupervised Learning Approaches for Construction of Malware Families (2022, cites this paper)
Embedding vector generation based on function call graph for effective malware detection and classification (2022, cites this paper)
An Attribute Extraction for Automated Malware Attack Classification and Detection Using Soft Computing Techniques (2022, cites this paper)
A Deep Learning Approach for Identifying User Interest from Targeted Advertising (2022, cites this paper)
Identification of malware families using stacking of textural features and machine learning (2022, cites this paper)
MalDetConv: Automated Behaviour-based Malware Detection Framework Based on Natural Language Processing and Deep Learning Techniques (2022, cites this paper)
An Effective Malware Detection Method Using Hybrid Feature Selection and Machine Learning Algorithms (2022, cites this paper)
A malware detection system using a hybrid approach of multi-heads attention-based control flow traces and image visualization (2022, cites this paper)
Detecting malware using text documents extracted from spam email through machine learning (2022, cites this paper)
A Malware Family Classification Method Based on the Point Cloud Model DGCNN (2021, cites this paper)
Introduction (2021, cites this paper)
Hybrid sequence‐based Android malware detection using natural language processing (2021, cites this paper)
Naive bayes-correlation based feature weighting technique for sports match result prediction (2021, cites this paper)
PetaDroid: Adaptive Android Malware Detection Using Deep Learning (2021, cites this paper)
Resilient and Adaptive Framework for Large Scale Android Malware Fingerprinting using Deep Learning and NLP Techniques (2021, cites this paper)
AI-Driven Cybersecurity: An Overview, Security Intelligence Modeling and Research Directions (2021, cites this paper)
Deep learning feature exploration for Android malware detection (2021, cites this paper)
“Dirclustering”: a semantic clustering approach to optimize website structure discovery during penetration testing (2021, cites this paper)
Static Analysis of Malware in Android-based Platforms: A Progress Study (2021, cites this paper)
Malware Detection Using Ensemble N-gram Opcode Sequences (2021, cites this paper)
A study on malicious software behaviour analysis and detection techniques: Taxonomy, current trends and challenges (2021, cites this paper)
Conclusion (2021, cites this paper)
A Comprehensive Review on Malware Detection Approaches (2020, cites this paper)
Artificial Intelligence Security Threat, Crime, and Forensics: Taxonomy and Open Issues (2020, cites this paper)
Scalable and robust unsupervised Android malware fingerprinting using community-based network partitioning (2020, cites this paper)
Adaptive Machine learning: A Framework for Active Malware Detection (2020, cites this paper)
Revisión del estado del arte en técnicas de procesamiento de lenguaje natural para análisis de malware (2020, cites this paper)
A Survey of different machine learning models for static and dynamic malware detection (2020, cites this paper)
Android Malware Clustering using Community Detection on Android Packages Similarity Network (2020, cites this paper)
Using a Subtractive Center Behavioral Model to Detect Malware (2020, cites this paper)
K-NEAREST NEIGHBOUR CLASSIFIER USAGE FOR PERMISSION BASED MALWARE DETECTION IN ANDROID (2020, cites this paper)
Neurlux: dynamic malware analysis without feature engineering (2019, influential citation)
Malware Detection Using Multilevel Ensemble Supervised Learning (2019, cites this paper)
Identifying Malicious Software Using Deep Residual Long-Short Term Memory (2019, cites this paper)
Classification of malicious process using high‐level activity based dynamic analysis (2019, cites this paper)