Most studies on course assignment in higher education rely primarily on administrative criteria or heuristic optimization, focusing on workload balance or availability constraints; they rarely account for the semantic alignment between course content and professors' academic profiles. Existing models typically frame course assignment as integer or mixed-integer linear programming problems, overlooking the rich textual information embedded in course plans and in faculty training, experience, and publications. Although some authors acknowledge the relevance of professors' preferences, few have systematically combined them with semantic similarity in their models. This gap limits the fairness and transparency of the assignment process, particularly in universities where courses and faculty profiles are highly specialized and heterogeneous. To address it, we propose a model that integrates natural language processing (NLP) techniques with faculty preferences for course assignment. First, three topic modeling methods (Latent Dirichlet Allocation (LDA), Top2Vec, and BERTopic) were compared using the UMass, C_v, and normalized pointwise mutual information (NPMI) coherence metrics; BERTopic achieved the best performance. Next, three sentence transformers (multi-qa-mpnet-base-dot-v1, all-MiniLM-L6-v2, and all-mpnet-base-v2) were evaluated with cosine similarity, and multi-qa-mpnet-base-dot-v1 was selected for its superior embeddings. Standard preprocessing steps (case normalization, stopword removal, lemmatization) were applied before generating semantic representations. A weighted score combined semantic similarity (70%) with professor preferences (30%). Five assignment strategies were then tested under identical conditions: manual, greedy, the Hungarian algorithm, a similarity threshold (0.65), and a hybrid Hungarian + threshold approach. The hybrid method was selected for its balance of accuracy and feasibility.
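The weighted scoring and the hybrid Hungarian + threshold strategy described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes scipy's `linear_sum_assignment` as the Hungarian solver, and the function and variable names (`hybrid_assign`, `alpha`, the toy matrices) are illustrative; only the 70/30 weighting and the 0.65 threshold come from the abstract.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hybrid_assign(semantic_sim, preference, alpha=0.70, threshold=0.65):
    """Hybrid Hungarian + threshold assignment (sketch).

    semantic_sim : (n_professors, n_courses) cosine-similarity matrix
    preference   : (n_professors, n_courses) preference scores in [0, 1]
    """
    # Weighted score: 70% semantic similarity, 30% professor preference.
    score = alpha * semantic_sim + (1 - alpha) * preference

    # The Hungarian algorithm minimizes cost, so negate to maximize score.
    rows, cols = linear_sum_assignment(-score)

    # Keep only (professor, course) pairs whose score clears the threshold.
    return [(int(p), int(c)) for p, c in zip(rows, cols)
            if score[p, c] >= threshold]

# Toy example: 3 professors x 3 courses.
sim = np.array([[0.9, 0.2, 0.1],
                [0.3, 0.8, 0.2],
                [0.1, 0.3, 0.4]])
pref = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.5],
                 [0.0, 0.0, 1.0]])
print(hybrid_assign(sim, pref))  # third pair scores 0.58 and is filtered out
```

The thresholding step is what trades recall for precision: pairs the Hungarian solver would force through on low similarity are left unassigned instead, which matches the perfect-precision, low-recall pattern reported in the results.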
Finally, two versions of the model, with and without preferences, were compared to assess the impact of incorporating professor preferences. All strategies were evaluated under identical conditions using precision, recall, and F1-score on a dataset of 42 courses and 35 professors. The hybrid strategy combining the Hungarian algorithm with the similarity threshold (0.65) performed best, achieving precision = 1.00, recall = 0.2736, and F1-score = 0.4296. The threshold-only method also reached perfect precision (1.00), with recall = 0.2925 and F1-score = 0.4525. The Hungarian algorithm alone obtained 0.8286, 0.2736, and 0.4113, respectively, and the greedy method performed less well (0.7143, 0.2358, 0.3546). Human-made assignments showed the lowest performance, with precision = 0.0227, recall = 0.0094, and F1-score = 0.0133.
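The precision, recall, and F1 figures above can be computed over sets of (professor, course) pairs, treating the reference assignment as ground truth. A minimal sketch, with illustrative names and toy data not taken from the paper:

```python
def prf1(predicted, reference):
    """Precision, recall, F1 over (professor, course) assignment pairs."""
    pred, ref = set(predicted), set(reference)
    tp = len(pred & ref)                         # correctly assigned pairs
    precision = tp / len(pred) if pred else 0.0  # of pairs we assigned
    recall = tp / len(ref) if ref else 0.0       # of pairs we should have
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: 2 of 3 predicted pairs are correct; the reference has 4 pairs.
pred = [("p1", "c1"), ("p2", "c2"), ("p3", "c9")]
ref = [("p1", "c1"), ("p2", "c2"), ("p3", "c3"), ("p4", "c4")]
print(prf1(pred, ref))  # precision 2/3, recall 2/4, F1 4/7
```

This formulation also explains how a strategy can score precision = 1.00 with low recall: every pair it emits is correct, but threshold filtering leaves many reference pairs unassigned.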
Optimizing course assignment in higher education using natural language processing and semantic similarity
Gabriel Cotera-Ramírez, Jaime Meza, Sebastián Ventura
Published 2026 in PeerJ Computer Science
PUBLICATION RECORD
- Publication date
2026-02-04
- Source metadata
Semantic Scholar