Multi-Step LLM Pipeline for Enhancing TTP Extraction in Cyber Threat Intelligence

Hyoung Rok Kim, Donghyeon Lee, Insup Lee, Soohan Lee, Sangjin Lee

Published 2025 in IEEE Access

ABSTRACT

Tactics, techniques, and procedures (TTPs) are essential for modeling adversary behavior and supporting cyber defense operations. Despite their importance, most cyber threat intelligence (CTI) is provided in unstructured formats, making automated TTP extraction challenging. While manual identification is labor-intensive, current automated approaches suffer from limited accuracy and coverage. To address these challenges, we present a novel multi-step framework based on large language models (LLMs) for extracting MITRE ATT&CK techniques from raw CTI documents. Our framework consists of three components: an LLM-based Extractor for extracting procedure-level threat actions, an embedding-driven Technique Candidate Generator for retrieving semantically relevant technique candidates, and a Validator that ranks candidate techniques by likelihood using LLM inference to refine final predictions and reduce false positives. Experimental results on the benchmark dataset demonstrate that our approach significantly outperforms existing baselines, achieving an F1-score of 82.28%, thereby validating its effectiveness. Additionally, the modularity of our framework allows seamless integration of future LLMs, suggesting continual performance gains as foundation models evolve.
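
The three-stage structure described in the abstract (Extractor, Technique Candidate Generator, Validator) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the LLM-based Extractor is replaced by a sentence splitter, the embedding-driven retrieval by bag-of-words cosine similarity, and the LLM-based Validator by a simple score threshold. The technique descriptions, the function names, `top_k`, and `min_score` are all assumptions made for illustration; only the technique IDs come from MITRE ATT&CK.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base: ATT&CK technique ID -> paraphrased description.
TECHNIQUES = {
    "T1566": "phishing email with malicious attachment or link sent to victim",
    "T1059": "command and scripting interpreter such as powershell executes commands",
    "T1547": "boot or logon autostart execution via registry run keys for persistence",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def extract_actions(report):
    """Stage 1 stand-in: the paper uses an LLM; here we only split sentences."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]


def generate_candidates(action, top_k=2):
    """Stage 2 stand-in: rank techniques by bag-of-words similarity, keep top_k."""
    query = Counter(tokenize(action))
    scored = [
        (tid, cosine(query, Counter(tokenize(desc))))
        for tid, desc in TECHNIQUES.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def validate(candidates, min_score=0.2):
    """Stage 3 stand-in: the paper uses LLM ranking; here we threshold the score."""
    return [tid for tid, score in candidates if score >= min_score]


def run_pipeline(report):
    """Chain the three stages and return the set of predicted technique IDs."""
    predicted = set()
    for action in extract_actions(report):
        predicted.update(validate(generate_candidates(action)))
    return predicted


def f1_score(predicted, gold):
    """Set-based F1 between predicted and gold technique IDs."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    report = (
        "The actor sent a phishing email with a malicious attachment. "
        "The payload used powershell to execute commands."
    )
    predicted = run_pipeline(report)
    print(sorted(predicted))
    print(f1_score(predicted, {"T1566", "T1059", "T1547"}))
```

Because each stage is a separate function with a plain input and output, any one of them could be swapped for a stronger model without touching the others, which is the kind of modularity the abstract attributes to the framework.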

PUBLICATION RECORD

Publication year: 2025
Venue: IEEE Access
Fields of study: Computer Science, Engineering
DOI: 10.1109/ACCESS.2025.3622350
Source metadata: Semantic Scholar

