Actively Learning Unified Embeddings for Joint Open Knowledge Base Canonicalization and Linking
Binhan Yang, Junqing Gong, Wei Shen, Yinan Liu, Guoliang Li
Published 2026 in IEEE Transactions on Knowledge and Data Engineering
ABSTRACT
Recent years have seen growing interest in semantic knowledge integration between curated knowledge bases (CKBs) and open knowledge bases (OKBs), a non-trivial task because CKBs and OKBs are intrinsically heterogeneous. OKB canonicalization and OKB linking are two vital tasks for achieving this integration. Although the two tasks are inherently complementary, previous studies either solve them separately or couple them only through superficial interaction. To address this issue, we propose CLUE+, a novel framework that jointly encodes the OKB and the CKB into a unified embedding space, tackling OKB canonicalization and OKB linking simultaneously so that each benefits the other. We design an expectation-maximization (EM) based approach that iteratively refines the unified embedding space by alternating between seed generation and embedding refinement, leveraging the deep interaction between OKB canonicalization and OKB linking. Curriculum learning is employed to adaptively yield high-quality canonicalization seeds and linking seeds according to two purpose-built metrics (i.e., a margin-based linking metric and an entropy-based cluster metric). Additionally, active learning complements the seed generation process by selectively annotating the most informative noun phrases within low-quality clusters, driven by an innovative acquisition function comprising three key criteria (i.e., uncertainty, diversity, and specificity). A thorough experimental study on two public benchmark data sets demonstrates that CLUE+ consistently outperforms state-of-the-art baselines on OKB canonicalization (in terms of average F1) and OKB linking (in terms of accuracy).
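The abstract names a margin-based linking metric and an entropy-based cluster metric but does not define them. The sketch below illustrates one plausible reading only, and every definition in it is an assumption rather than the paper's formulation: the margin is taken as the gap between the best and second-best candidate-entity scores for a noun phrase, and the cluster entropy as the Shannon entropy of the top-linked entities among a cluster's members. The function names and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def linking_margin(scores):
    """Gap between the best and second-best candidate scores.

    Assumption: a larger gap means a more confident link. With a single
    candidate, the score itself is returned (an illustrative choice).
    """
    if not scores:
        return 0.0
    ranked = sorted(scores, reverse=True)
    if len(ranked) == 1:
        return ranked[0]
    return ranked[0] - ranked[1]


def cluster_entropy(linked_entities):
    """Shannon entropy (in nats) of the entities linked by a cluster's members.

    Assumption: low entropy means the members agree on one entity, so the
    cluster is treated as higher quality.
    """
    total = len(linked_entities)
    if total == 0:
        return 0.0
    counts = Counter(linked_entities)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_linking_seeds(candidate_scores, threshold):
    """Keep noun phrases whose linking margin reaches the threshold.

    candidate_scores maps a noun phrase to its candidate-entity scores.
    Lowering the (hypothetical) threshold over rounds would admit harder
    examples later, in the spirit of curriculum learning.
    """
    return [
        phrase
        for phrase, scores in candidate_scores.items()
        if linking_margin(scores) >= threshold
    ]
```

Under these assumptions, a noun phrase with two nearly tied candidates is held back as a seed until the threshold is relaxed, while a cluster whose members all link to the same entity has zero entropy.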
PUBLICATION RECORD
- Publication date
2026-01-01
- Fields of study
Computer Science
- Source metadata
Semantic Scholar