Construction of a Japanese Word Similarity Dataset

Published 2017 in International Conference on Language Resources and Evaluation

ABSTRACT

Distributed word representations are generally evaluated using a word similarity task and/or a word analogy task. Many datasets are readily available for these tasks in English. However, evaluating distributed representations in languages that lack such resources (e.g., Japanese) is difficult. Therefore, as a first step toward evaluating distributed representations in Japanese, we constructed a Japanese word similarity dataset. To the best of our knowledge, our dataset is the first resource that can be used to evaluate distributed representations in Japanese. Moreover, our dataset covers various parts of speech and includes rare words in addition to common words.
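
As an illustration of the word similarity task mentioned in the abstract, the sketch below scores word pairs with cosine similarity and compares the scores to human ratings using Spearman's rank correlation, a common choice for this kind of evaluation. The abstract does not state which metric or rating scale the authors use, so the metric, the function names, the toy vectors, and the toy ratings are all assumptions made for illustration, not the dataset's actual contents or the paper's procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def evaluate(embeddings, gold_pairs):
    """Return (rho, number of pairs scored); pairs with unknown words are skipped."""
    model_scores, human_scores = [], []
    for w1, w2, rating in gold_pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores), len(human_scores)


# Hypothetical 2-dimensional vectors and ratings, invented for this sketch.
toy_embeddings = {
    "猫": [1.0, 0.0],
    "犬": [0.9, 0.1],
    "車": [0.0, 1.0],
    "電車": [0.2, 0.8],
}
toy_gold = [
    ("猫", "犬", 8.0),
    ("車", "電車", 7.0),
    ("猫", "車", 1.0),
    ("犬", "電車", 2.0),
    ("猫", "未知語", 5.0),  # skipped: "未知語" has no vector
]

rho, covered = evaluate(toy_embeddings, toy_gold)
print(f"Spearman rho = {rho:.3f} over {covered} pairs")
```

Skipping out-of-vocabulary pairs is one design choice among several. Because the abstract notes that the dataset includes rare words, reporting the number of pairs actually scored alongside the correlation helps keep results comparable across models with different vocabularies.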

PUBLICATION RECORD

Publication year
2017
Venue
International Conference on Language Resources and Evaluation
Publication date
2017-03-17
Fields of study
Linguistics, Computer Science
Identifiers
arXiv 1703.05916
Source metadata
Semantic Scholar

REFERENCES

Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints
2017, cited by this paper
SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity
2016, cited by this paper
Counter-fitting Word Vectors to Linguistic Constraints
2016, cited by this paper
Controlled and Balanced Dataset for Japanese Lexical Simplification
2016, influential reference
Unsupervised Morphology Induction Using Word Embeddings
2015, cited by this paper
Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
2015, cited by this paper
Representation Based Translation Evaluation Metrics
2015, cited by this paper
Judgment Language Matters: Multilingual Vector Space Models for Judgment Language Aware Lexical Semantics
2015, cited by this paper
SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation
2014, cited by this paper
An Unsupervised Model for Instance Level Subcategorization Acquisition
2014, cited by this paper
Distributed Representations of Words and Phrases and their Compositionality
2013, cited by this paper
Better Word Representations with Recursive Neural Networks for Morphology
2013, cited by this paper
Improving Word Representations via Global Context and Multiple Word Prototypes
2012, influential reference
Investigations on Word Senses and Word Usages
2009, cited by this paper
Development of the Japanese WordNet
2008, influential reference
Placing search in context: the concept revisited
2002, influential reference
WordNet: A Lexical Database for English
1995, cited by this paper
Contextual correlates of semantic similarity
1991, cited by this paper
Contextual correlates of synonymy
1965, cited by this paper

CITED BY

MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs
2025, cites this paper
Semantic Shift Stability: Efficient Way to Detect Performance Degradation of Word Embeddings and Pre-trained Language Models
2022, cites this paper
JWSAN: Japanese word similarity and association norm
2021, influential citation
Multi-SimLex: A Large-Scale Evaluation of Multilingual and Crosslingual Lexical Semantic Similarity
2020, cites this paper
Manual Clustering and Spatial Arrangement of Verbs for Multilingual Evaluation and Typology Analysis
2020, cites this paper
Word Similarity Datasets for Thai: Construction and Evaluation
2019, influential citation
A survey of semantic relatedness evaluation datasets and procedures
2019, cites this paper
Unsupervised Learning of Style-sensitive Word Vectors
2018, cites this paper
Learning Style-sensitive Word Vector via Unsupervised-manner
2018, cites this paper
Segmentation-free compositional n-gram embedding
2018, cites this paper
Subcharacter Information in Japanese Embeddings: When Is It Worth It?
2018, cites this paper
ACL 2018 Relevance of Linguistic Structure in Neural Architectures for NLP
2018, cites this paper
Semantically Readable Distributed Representation Learning and Its Expandability Using a Word Semantic Vector Dictionary
2018, cites this paper