ForestRank: Automatic Keyphrase Extraction Leveraging Random Forest Classifier and Multi-Criteria Decision-Making
Published 2025 in 2025 5th Asia Conference on Information Engineering (ACIE)
ABSTRACT
Given the rapidly expanding volume of textual data, extracting meaningful insights is a significant challenge. We propose ForestRank, a novel method for automatic keyphrase extraction that integrates natural language processing (NLP) with the Random Forest classification model, offering a unique synthesis of their strengths. To obtain clean, meaningful, and labeled noun phrases, we first preprocess documents using standard NLP techniques, fuzzy string matching for similarity measurement, and data labeling. We then compute multiple feature scores to estimate each phrase's significance within a document. A Random Forest classifier identifies the most relevant features, with Feature Importance scores calculated from Gini Impurity. These top features are then used to extract keyphrases from the noun phrases via the Weighted Aggregated Sum Product Assessment (WASPAS), a multi-criteria decision-making (MCDM) technique, treating the noun phrases as alternatives, the selected features as criteria, and the Feature Importance scores as weights. We evaluate the method by measuring accuracy with both the initial feature set and the refined set identified by the Random Forest classifier, using a similarity threshold of 0.75 on SemEval 2017 Task 10, with n = 5, 10, and 15 (the top n ranked predicted keyphrases matched against the actual keyphrases). The selected features achieve higher accuracy than the full feature set. In addition, we compared ForestRank with state-of-the-art methods, and our method outperformed them.
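The WASPAS ranking step described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: noun phrases act as alternatives, the Random-Forest-selected features as (benefit) criteria, and the Feature Importance scores as weights; the linear normalization, the lambda = 0.5 mix of weighted-sum and weighted-product scores, and all example feature names and values are assumptions.

```python
def waspas_rank(phrases, scores, weights, lam=0.5):
    """Rank candidate phrases by the WASPAS score Q = lam*WSM + (1-lam)*WPM.

    phrases -- list of candidate noun phrases (the alternatives)
    scores  -- scores[i][j]: value of feature j for phrase i (benefit criteria)
    weights -- one importance weight per feature, summing to 1
    """
    n_feat = len(weights)
    # Linear normalization for benefit criteria: x / max(x) per column.
    col_max = [max(row[j] for row in scores) or 1.0 for j in range(n_feat)]
    ranked = []
    for phrase, row in zip(phrases, scores):
        norm = [row[j] / col_max[j] for j in range(n_feat)]
        wsm = sum(w * x for w, x in zip(weights, norm))  # weighted sum model
        wpm = 1.0
        for w, x in zip(weights, norm):                  # weighted product model
            wpm *= x ** w
        ranked.append((lam * wsm + (1 - lam) * wpm, phrase))
    return [p for _, p in sorted(ranked, reverse=True)]

# Toy usage with two hypothetical features (e.g. a TF-IDF score and a
# position score) weighted by made-up Random Forest importances.
phrases = ["random forest", "decision making", "the data"]
scores = [[0.9, 0.8], [0.7, 0.9], [0.2, 0.1]]
weights = [0.6, 0.4]
print(waspas_rank(phrases, scores, weights))
```

Using both the additive (WSM) and multiplicative (WPM) aggregations, as WASPAS does, makes the ranking less sensitive to any single inflated feature value than either model alone.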
PUBLICATION RECORD
- Publication year: 2025
- Venue: 2025 5th Asia Conference on Information Engineering (ACIE)
- Publication date: 2025-01-10