NewsQA: A Machine Comprehension Dataset
Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, Kaheer Suleman
Published 2016 in Rep4NLP@ACL
ABSTRACT
We present NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of spans of text in the articles. We collect this dataset through a four-stage process designed to solicit exploratory questions that require reasoning. Analysis confirms that NewsQA demands abilities beyond simple word matching and recognizing textual entailment. We measure human performance on the dataset and compare it to several strong neural models. The performance gap between humans and machines (13.3% F1) indicates that significant progress can be made on NewsQA through future research. The dataset is freely available online.
PUBLICATION RECORD
- Publication year
2016
- Venue
Rep4NLP@ACL
- Publication date
2016-11-29
- Fields of study
Computer Science
- Source metadata
Semantic Scholar
LINKED PAPERS
- SQuAD: 100,000+ Questions for Machine Comprehension of Text
- reading comprehension related to · NewsQA is described as a machine comprehension dataset of question-answer pairs with span answers, which matches the reading-comprehension task of answering questions grounded in a passage.
- BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
- reading comprehension related to
CONCEPTS
- exploratory questions
Questions intended to probe article understanding and require reasoning rather than direct lookup.
- F1 score
The overlap-based evaluation metric used to compare human and model answers.
Aliases: F1
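The overlap-based F1 used for span-answer evaluation can be sketched as a token-level precision/recall harmonic mean. This is a minimal illustration in the style of standard span-QA scoring; the exact normalization rules (punctuation and article stripping, multi-reference handling) used for NewsQA may differ.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted answer span and a reference span.

    Sketch of the standard overlap-based metric for span-answer QA;
    real evaluation scripts typically also normalize punctuation and
    articles before tokenizing.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("the red car", "red car")` yields 0.8: two of three predicted tokens match (precision 2/3) and both reference tokens are covered (recall 1.0).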
- four-stage process
The four-step annotation workflow used to collect the dataset from crowdworkers.
- NewsQA
A machine comprehension dataset of over 100,000 human-generated question-answer pairs from CNN news articles, with answers selected as text spans.
- performance gap between humans and machines
The measured difference in F1 between human answers and model answers on NewsQA.
Aliases: performance gap
- simple word matching and recognizing textual entailment
Shallow lexical matching and entailment-style inference abilities mentioned as comparison points in the analysis.
Aliases: word matching, textual entailment
- strong neural models
Neural machine comprehension systems used as comparison baselines in the paper.
Aliases: neural models