Impacts of Data Splitting Strategies on Parameterized Link Prediction Algorithms

Xinshan Jiao, Yuxin Luo, Yilin Bi, Tao Zhou

Published 2025 in Unknown venue

ABSTRACT

Link prediction is a fundamental problem in network science, aiming to infer potential or missing links from observed network structures. With the increasing adoption of parameterized models, the rigor of evaluation protocols has become critically important. However, a previously common practice of using the test set during hyperparameter tuning has led to human-induced information leakage, thereby inflating reported model performance. To address this issue, this study introduces a novel evaluation metric, Loss Ratio, which quantitatively measures the extent of performance overestimation. We conduct large-scale experiments on 60 real-world networks across six domains. The results demonstrate that this information leakage leads to an average overestimation of about 3.6%, with the bias exceeding 15% for specific algorithms. Meanwhile, heuristic and random-walk-based methods exhibit greater robustness and stability. The analysis uncovers a pervasive information leakage issue in link prediction evaluation and underscores the necessity of adopting standardized data splitting strategies to enable fair and reproducible benchmarking of link prediction models.

PUBLICATION RECORD

Publication year
2025
Venue
Unknown venue
Publication date
2025-11-08
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 2511.05834
Source metadata
Semantic Scholar

REFERENCES

Comparing discriminating abilities of evaluation metrics in link prediction (2024)
Inconsistency among evaluation metrics in link prediction (2024)
Quantifying discriminability of evaluation metrics in link prediction for real networks (2024)
A comprehensive survey of link prediction methods (2023)
Graph Neural Networks for Link Prediction with Subgraph Sketching (2022)
Discriminating Abilities of Threshold-Free Evaluation Metrics in Link Prediction (2022)
Machine learning for medical imaging: methodological failures and recommendations for the future (2022)
Variational Graph Normalized AutoEncoders (2021)
Progresses and challenges in link prediction (2021)
Recommender Systems (2021)
Link prediction techniques, applications, and performance: A survey (2020)
OpenBioLink: a benchmarking framework for large-scale biomedical link prediction (2019)
Temporal Link Prediction: A Survey (2019)
How Powerful are Graph Neural Networks? (2018)
Link Prediction Based on Graph Neural Networks (2018)
Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure (2017)
Inductive Representation Learning on Large Graphs (2017)
Graph Attention Networks (2017)
Semi-Supervised Classification with Graph Convolutional Networks (2016, influential reference)
A Survey of Link Prediction in Complex Networks (2016)
node2vec: Scalable Feature Learning for Networks (2016)
Complex Embeddings for Simple Link Prediction (2016)
Toward link predictability of complex networks (2015)
The Network Data Repository with Interactive Graph Analytics and Visualization (2015)
Embedding Entities and Relations for Learning and Inference in Knowledge Bases (2014)
DeepWalk: online learning of social representations (2014)
Link prediction in social networks: the state-of-the-art (2014)
From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks (2013)
KONECT: the Koblenz network collection (2013)
Translating Embeddings for Modeling Multi-relational Data (2013)
Using community information to improve the precision of link prediction methods (2012)
Leakage in data mining: formulation, detection, and avoidance (2011)
On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation (2010)
Link prediction based on local random walk (2010, influential reference)
Apples-to-apples in cross-validation studies: pitfalls in classifier performance measurement (2010)
Link Prediction in Complex Networks: A Survey (2010)
Learning spectral graph transformations for link prediction (2009)
Predicting missing links via local information (2009)
A survey of cross-validation procedures for model selection (2009)
Missing and spurious interactions and the reconstruction of complex networks (2009)
Information filtering based on transferring similarity (2008)
Bias in error estimation when using cross-validation for model selection (2006)
Vertex similarity in networks (2005)
Friends and neighbors on the Web (2003)
The link prediction problem for social networks (2003)
Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms (1998)
A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection (1995)
A new status index derived from sociometric analysis (1953)

CITED BY

Domain matters: Towards domain-informed evaluation for link prediction (2025)