ForestRank: Automatic Keyphrase Extraction Leveraging Random Forest Classifier and Multi-Criteria Decision-Making
Published 2025 in 2025 5th Asia Conference on Information Engineering (ACIE)
ABSTRACT
Given the rapidly expanding volume of textual data, extracting meaningful insights is a significant challenge. We propose ForestRank, a novel method for automatic keyphrase extraction that integrates natural language processing (NLP) with the Random Forest classification model, offering a unique synthesis of their strengths. To obtain clean, meaningful, and labeled noun phrases, we first preprocess documents using standard NLP techniques, fuzzy string matching for similarity measurement, and data labeling. We then compute multiple feature scores to estimate each phrase's significance within a document. A Random Forest classifier identifies the most relevant features, with Feature Importance scores calculated from Gini Impurity. These top features are then used to extract keyphrases from the noun phrases via the Weighted Aggregated Sum Product Assessment (WASPAS), a multi-criteria decision-making (MCDM) technique, treating the noun phrases as alternatives, the selected features as criteria, and the Feature Importance scores as weights. We evaluate the method by measuring accuracy with both the initial feature set and the refined set identified by the Random Forest classifier, using a similarity threshold of 0.75 on SemEval 2017 Task 10, with n = 5, 10, and 15 (the top n ranked predicted keyphrases matched against the actual keyphrases). The selected features achieve higher accuracy than the full feature set. In addition, we compared ForestRank with state-of-the-art methods, and our method outperformed them.
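The WASPAS ranking step described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: noun phrases act as alternatives, the Random-Forest-selected features as (benefit) criteria, and the Feature Importance scores as weights; the linear normalization, the lambda = 0.5 mix of weighted-sum and weighted-product scores, and all example feature names and values are assumptions.

```python
def waspas_rank(phrases, scores, weights, lam=0.5):
    """Rank candidate phrases by the WASPAS score Q = lam*WSM + (1-lam)*WPM.

    phrases -- list of candidate noun phrases (the alternatives)
    scores  -- scores[i][j]: value of feature j for phrase i (benefit criteria)
    weights -- one importance weight per feature, summing to 1
    """
    n_feat = len(weights)
    # Linear normalization for benefit criteria: x / max(x) per column.
    col_max = [max(row[j] for row in scores) or 1.0 for j in range(n_feat)]
    ranked = []
    for phrase, row in zip(phrases, scores):
        norm = [row[j] / col_max[j] for j in range(n_feat)]
        wsm = sum(w * x for w, x in zip(weights, norm))  # weighted sum model
        wpm = 1.0
        for w, x in zip(weights, norm):                  # weighted product model
            wpm *= x ** w
        ranked.append((lam * wsm + (1 - lam) * wpm, phrase))
    return [p for _, p in sorted(ranked, reverse=True)]

# Toy usage with two hypothetical features (e.g. a TF-IDF score and a
# position score) weighted by made-up Random Forest importances.
phrases = ["random forest", "decision making", "the data"]
scores = [[0.9, 0.8], [0.7, 0.9], [0.2, 0.1]]
weights = [0.6, 0.4]
print(waspas_rank(phrases, scores, weights))
```

Using both the additive (WSM) and multiplicative (WPM) aggregations, as WASPAS does, makes the ranking less sensitive to any single inflated feature value than either model alone.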
PUBLICATION RECORD
- Publication year: 2025
- Venue: 2025 5th Asia Conference on Information Engineering (ACIE)
- Publication date: 2025-01-10