Improving Spoken Language Understanding by Wisdom of Crowds

Koichiro Yoshino, Kana Ikeuchi, Katsuhito Sudoh, Satoshi Nakamura

Published 2020 in International Conference on Computational Linguistics

ABSTRACT

Spoken language understanding (SLU), which converts user requests in natural language into machine-interpretable expressions, is becoming an essential task. The lack of training data is an important problem, especially for new system tasks, because existing SLU systems are based on statistical approaches. In this paper, we propose using two sources of the “wisdom of crowds,” crowdsourcing and a knowledge community website, to improve the SLU system. We first collected paraphrasing variations for new system tasks through crowdsourcing as seed data, and then augmented them using similar questions from a knowledge community website. We investigated the effects of the proposed data augmentation method on the SLU task, even with small seed data. In particular, the proposed architecture augmented more than 120,000 samples to improve SLU accuracies.

PUBLICATION RECORD

Publication year: 2020
Venue: International Conference on Computational Linguistics
Publication date: 2020-12-01
Fields of study: Linguistics, Computer Science
Identifiers: DOI 10.18653/V1/2020.COLING-MAIN.234
Source metadata: Semantic Scholar
