‘specleanr’: an R package for automated flagging of environmental outliers in ecological data for modeling workflows

Anthony Basooma, A. Schmidt‐Kloiber, Sami Domisch, Yusdiel Torres‐Cambas, Marija Smederevac‐Lalić, Vanessa Bremerich, P. Meulenbroek, Martin Tschikof, Andrea Funk, Thomas Hein, F. Borgwardt

Published 2025 in Ecography

ABSTRACT

Developing species distribution models (SDMs) requires high‐quality species occurrence records. These records, which stem from various sources with different sampling procedures, are often archived in open‐access databases, making automated data quality checks indispensable. Temporal, geographic, and taxonomic quality checks are usually conducted in SDM workflows, but checks for records that are distant in environmental space, i.e. outliers, are often ignored. Here, we present ‘specleanr’, an R package that contains 20 outlier detection methods (ODMs) that can be ensembled to identify potential outliers in environmental predictors. These methods are categorized into 1) species‐specific ecological range, 2) univariate, and 3) multivariate ODMs. All potential outliers flagged by the different methods are pooled to identify absolute outliers (records flagged by multiple methods). The local regression (LOESS) method is then used to automatically set a threshold that optimally identifies the absolute outliers. Additionally, records can be clustered into poor, fair, moderate, very strong, and perfect outliers, as well as non‐outliers, based on each record's likelihood of being a potential outlier, which allows expert assessment. We demonstrated the approach on 15 fish species from the Danube River Basin, including native, alien, threatened, and common species. We fitted SDMs using bioclimatic and hydromorphological parameters. We compared the model area under the curve (AUC) before and after outlier removal using three scenarios: 1) the LOESS method, 2) removing very strong outliers, and 3) removing perfect outliers. The results showed a significant improvement in the model AUC, with generally small to moderate effect sizes after outlier removal. ‘specleanr’ is generalizable across taxonomic groups, data types, ecological realms, and geographic regions. Beyond SDMs, it can also be used in general data analysis where outlier detection is essential. We provide detailed vignettes to support package use. ‘specleanr’ offers a user‐friendly and reproducible approach for handling outliers in biogeographical modeling and general data analysis workflows.
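
The pooling idea described in the abstract (several detection methods vote, and records flagged by multiple methods are treated as absolute outliers) can be illustrated with a minimal Python sketch. This is not the specleanr API: the three univariate detectors, the function names, the sample temperatures, and the fixed 0.5 threshold are all assumptions made for illustration. In particular, the fixed threshold stands in for the LOESS-derived threshold that the package sets automatically.

```python
from statistics import mean, median, quantiles, stdev


def zscore_flags(values, cutoff=3.0):
    """Flag values more than `cutoff` sample standard deviations from the mean."""
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return [False] * len(values)
    return [abs(v - mu) / sd > cutoff for v in values]


def iqr_flags(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    lower, upper = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < lower or v > upper for v in values]


def mad_flags(values, cutoff=3.0):
    """Flag values whose MAD-scaled distance from the median exceeds `cutoff`."""
    med = median(values)
    mad = 1.4826 * median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)
    return [abs(v - med) / mad > cutoff for v in values]


# Hypothetical ensemble: three generic univariate detectors.
METHODS = (zscore_flags, iqr_flags, mad_flags)


def flag_proportions(values, methods=METHODS):
    """Share of methods that flag each record (0 = none, 1 = all)."""
    votes = [method(values) for method in methods]
    return [sum(column) / len(methods) for column in zip(*votes)]


def absolute_outliers(values, methods=METHODS, threshold=0.5):
    """Indices of records flagged by at least `threshold` of the methods.

    The fixed threshold is an assumption of this sketch, not the
    LOESS-based threshold described in the abstract.
    """
    proportions = flag_proportions(values, methods)
    return [i for i, p in enumerate(proportions) if p >= threshold]


# Made-up water temperatures (degrees C) at occurrence sites of one species.
temps = [11.2, 12.1, 11.8, 12.5, 11.9, 12.3, 12.0, 11.6, 12.2, 29.5]
print(flag_proportions(temps))
print(absolute_outliers(temps))
```

In this toy dataset the z-score detector does not flag the extreme last record, because that record inflates the standard deviation, while the IQR and MAD detectors do. Pooling the votes still identifies it, which is the motivation for ensembling rather than relying on a single method.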
