Unsupervised Detection of Anomalous Commits in Software Repositories

Nafiseh Soveizi, Tomás Candeias, M. Zivkovic, Zhiming Zhao

Published 2025 in 2025 IEEE International Conference on Software Services Engineering (SSE)

ABSTRACT

Identifying anomalous commits is essential for maintaining software quality and reliability, as these anomalies can indicate potential issues in code, development practices, or repository management. Current anomaly detection methods typically rely on predefined rules or supervised learning, which suffer from limitations such as dependence on labeled datasets, rigid rule definitions, and high maintenance overhead in rapidly evolving repositories. This paper introduces a novel unsupervised framework for effectively detecting anomalous commits without requiring labeled data or rigid rules, providing a scalable and adaptable solution to enhance code quality in modern version control systems. To address the high-dimensional and multifaceted nature of commit data, our approach combines dimensionality reduction techniques with targeted feature engineering, enhancing both precision and adaptability in anomaly detection. We systematically evaluate three state-of-the-art unsupervised techniques, Local Outlier Factor (LOF), Isolation Forest (IF), and Histogram-Based Outlier Score (HBOS), across five diverse open-source repositories. Our results demonstrate that Isolation Forest achieves the highest detection accuracy, effectively balancing precision and recall while capturing both global and local anomalies. Additionally, expert validation confirms the practical relevance of our approach, providing insights into frequent and high-impact anomalies encountered in real-world repositories.
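To make one of the evaluated techniques concrete, the following is a minimal sketch of a histogram-based outlier score in plain Python. It is not the authors' pipeline: the two commit features (lines changed, files touched), the sample values, the function name `hbos_scores`, and the `n_bins` parameter are all hypothetical, and the sketch uses static equal-width bins and omits the dimensionality reduction and feature engineering steps the abstract mentions.

```python
import math


def hbos_scores(rows, n_bins=5):
    """Histogram-based outlier score with static equal-width bins.

    rows: list of equal-length numeric feature vectors.
    Returns one score per row; a higher score means more anomalous.
    """
    n_features = len(rows[0])
    scores = [0.0] * len(rows)
    for j in range(n_features):
        column = [row[j] for row in rows]
        lo, hi = min(column), max(column)
        # A constant feature has zero range; fall back to a width of 1.0
        # so every value lands in the first bin.
        width = (hi - lo) / n_bins or 1.0
        # Map each value to a bin index, clamping the maximum into the last bin.
        bins = [min(int((v - lo) / width), n_bins - 1) for v in column]
        counts = [0] * n_bins
        for b in bins:
            counts[b] += 1
        peak = max(counts)
        for i, b in enumerate(bins):
            # Normalise so the tallest bin has density 1, then add
            # log(1 / density) for this feature to the row's score.
            scores[i] += math.log(peak / counts[b])
    return scores


# Hypothetical commits as (lines_changed, files_touched); values are made up.
commits = [
    (12, 1), (30, 2), (18, 1), (25, 3), (40, 2),
    (22, 2), (15, 1), (35, 3), (5000, 120),
]
scores = hbos_scores(commits)
most_anomalous = scores.index(max(scores))
```

In this toy data the final commit, which touches far more lines and files than the rest, receives the highest score, while commits that fall in the most populated bin of every feature score zero. Because each feature is scored independently, a sketch like this cannot see anomalies that only appear in combinations of features.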

PUBLICATION RECORD

  • Publication year

    2025

  • Venue

    2025 IEEE International Conference on Software Services Engineering (SSE)

  • Publication date

    2025-07-07

  • Fields of study

    Computer Science

  • Source metadata

    Semantic Scholar
