Most studies on course assignment in higher education rely primarily on administrative criteria or heuristic optimization, focusing on workload balance or availability constraints; they rarely account for the semantic alignment between course content and professors' academic profiles. Existing models typically frame course assignment as integer or mixed-integer linear programming problems, overlooking the rich textual information embedded in course plans and in faculty training, experience, and publications. Although some authors acknowledge the relevance of professors' preferences, few have systematically combined them with semantic similarity in their models. This gap limits the fairness and transparency of the assignment process, particularly in universities where courses and faculty profiles are highly specialized and heterogeneous. To address it, we propose a model that integrates natural language processing (NLP) techniques with faculty preferences for course assignment. First, three topic modeling methods (Latent Dirichlet Allocation (LDA), Top2Vec, and BERTopic) were compared using the UMass, C_v, and normalized pointwise mutual information (NPMI) coherence metrics; BERTopic achieved the best performance. Next, three sentence transformers (multi-qa-mpnet-base-dot-v1, all-MiniLM-L6-v2, and all-mpnet-base-v2) were evaluated with cosine similarity, and multi-qa-mpnet-base-dot-v1 was selected for its superior embeddings. Standard preprocessing steps (case normalization, stopword removal, lemmatization) were applied before generating semantic representations. A weighted score combined semantic similarity (70%) with professor preferences (30%). Five assignment strategies were then tested under identical conditions: manual, greedy, the Hungarian algorithm, a similarity threshold (0.65), and a hybrid Hungarian + threshold approach. The hybrid method was selected for its balance of accuracy and feasibility.
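The weighted scoring and the hybrid Hungarian + threshold strategy described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes scipy's `linear_sum_assignment` as the Hungarian solver, and the function and variable names (`hybrid_assign`, `alpha`, the toy matrices) are illustrative; only the 70/30 weighting and the 0.65 threshold come from the abstract.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hybrid_assign(semantic_sim, preference, alpha=0.70, threshold=0.65):
    """Hybrid Hungarian + threshold assignment (sketch).

    semantic_sim : (n_professors, n_courses) cosine-similarity matrix
    preference   : (n_professors, n_courses) preference scores in [0, 1]
    """
    # Weighted score: 70% semantic similarity, 30% professor preference.
    score = alpha * semantic_sim + (1 - alpha) * preference

    # The Hungarian algorithm minimizes cost, so negate to maximize score.
    rows, cols = linear_sum_assignment(-score)

    # Keep only (professor, course) pairs whose score clears the threshold.
    return [(int(p), int(c)) for p, c in zip(rows, cols)
            if score[p, c] >= threshold]

# Toy example: 3 professors x 3 courses.
sim = np.array([[0.9, 0.2, 0.1],
                [0.3, 0.8, 0.2],
                [0.1, 0.3, 0.4]])
pref = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.5],
                 [0.0, 0.0, 1.0]])
print(hybrid_assign(sim, pref))  # third pair scores 0.58 and is filtered out
```

The thresholding step is what trades recall for precision: pairs the Hungarian solver would force through on low similarity are left unassigned instead, which matches the perfect-precision, low-recall pattern reported in the results.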
Finally, two versions of the model, with and without preferences, were compared to assess the impact of incorporating professor preferences. All strategies were evaluated under identical conditions using precision, recall, and F1-score on a dataset of 42 courses and 35 professors. The hybrid strategy combining the Hungarian algorithm with the similarity threshold (0.65) performed best, achieving precision = 1.00, recall = 0.2736, and F1-score = 0.4296. The threshold-only method also reached perfect precision (1.00), with recall = 0.2925 and F1-score = 0.4525. The Hungarian algorithm alone obtained 0.8286, 0.2736, and 0.4113, respectively, and the greedy method performed less well (0.7143, 0.2358, 0.3546). Human-made assignments showed the lowest performance, with precision = 0.0227, recall = 0.0094, and F1-score = 0.0133.
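The precision, recall, and F1 figures above can be computed over sets of (professor, course) pairs, treating the reference assignment as ground truth. A minimal sketch, with illustrative names and toy data not taken from the paper:

```python
def prf1(predicted, reference):
    """Precision, recall, F1 over (professor, course) assignment pairs."""
    pred, ref = set(predicted), set(reference)
    tp = len(pred & ref)                         # correctly assigned pairs
    precision = tp / len(pred) if pred else 0.0  # of pairs we assigned
    recall = tp / len(ref) if ref else 0.0       # of pairs we should have
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: 2 of 3 predicted pairs are correct; the reference has 4 pairs.
pred = [("p1", "c1"), ("p2", "c2"), ("p3", "c9")]
ref = [("p1", "c1"), ("p2", "c2"), ("p3", "c3"), ("p4", "c4")]
print(prf1(pred, ref))  # precision 2/3, recall 2/4, F1 4/7
```

This formulation also explains how a strategy can score precision = 1.00 with low recall: every pair it emits is correct, but threshold filtering leaves many reference pairs unassigned.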
Optimizing course assignment in higher education using natural language processing and semantic similarity
Gabriel Cotera-Ramírez, Jaime Meza, Sebastián Ventura
Published 2026 in PeerJ Computer Science
PUBLICATION RECORD
- Publication date
2026-02-04
- Source metadata
Semantic Scholar