Large language models possess some ecological knowledge, but how much?
Filip Dorm, Joseph W. Millard, Drew Purves, Michael Harfoot, Oisin Mac Aodha
Published 2026 in bioRxiv

ABSTRACT
Large Language Models (LLMs) have shown remarkable capabilities in question answering across various domains, yet their effectiveness in ecological knowledge remains underexplored. Understanding their potential to recall and synthesize ecological information is crucial as AI tools become increasingly integrated into scientific workflows. Here, we assess the ecological knowledge of two LLMs, Gemini 1.5 Pro and GPT-4o, across a suite of ecologically focused tasks. These tasks evaluate an LLM's ability to predict species presence, generate range maps, list critically endangered species, classify threats, and estimate species traits. We introduce a new benchmark dataset to quantify LLM performance against expert-derived data. While the LLMs tested outperform naive baselines, achieving around 20 percentage points higher accuracy in species presence prediction, they reach only around a third of the maximum achievable mean F1 score for range map generation and improve threat classification by only around 10 percentage points over random guessing. These results highlight both the promise and challenges of applying LLMs in ecology. Our findings suggest that domain-specific fine-tuning is necessary to improve ecological knowledge in LLMs. By providing a repeatable evaluation framework, our benchmark dataset will facilitate future research in this area, helping to refine AI applications for ecological science.
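The abstract reports range-map performance as a mean F1 score against expert-derived maps. A minimal sketch of how such a comparison could be scored, assuming both the LLM-generated and expert maps are flattened binary presence/absence grids; the `f1_score` helper and the toy grids below are illustrative, not the authors' actual evaluation code:

```python
# Score a predicted range map against an expert-derived one, treating both
# as flattened binary presence/absence grids (1 = species present in cell).

def f1_score(predicted, expert):
    """F1 between two equal-length binary presence/absence vectors."""
    tp = sum(p and e for p, e in zip(predicted, expert))          # correct presences
    fp = sum(p and not e for p, e in zip(predicted, expert))      # spurious presences
    fn = sum(e and not p for p, e in zip(predicted, expert))      # missed presences
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical 4-cell grids: one true positive, one false positive, one miss.
predicted = [1, 1, 0, 0]
expert = [1, 0, 1, 0]
print(round(f1_score(predicted, expert), 2))  # -> 0.5
```

A benchmark's mean F1 would then average this per-species score over all evaluated species, which is why a low mean F1 can coexist with above-baseline presence-prediction accuracy.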
PUBLICATION RECORD
- Publication year
2026
- Venue
bioRxiv
- Publication date
2026-02-04
- Fields of study
Biology, Computer Science, Environmental Science
- Source metadata
Semantic Scholar