A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics, and Benchmark Datasets

Chengchang Zeng, Shaobo Li, Qin Li, Jie Hu, Jianjun Hu

Published 2020 in Applied Sciences

ABSTRACT

Machine Reading Comprehension (MRC) is a challenging Natural Language Processing (NLP) research field with wide real-world applications. The great progress of this field in recent years is mainly due to the emergence of large-scale datasets and deep learning. At present, many MRC models have already surpassed human performance on various benchmark datasets, despite the obvious large gap between existing MRC models and genuine human-level reading comprehension. This shows the need to improve existing datasets, evaluation metrics, and models to move current MRC models toward “real” understanding. To address the current lack of a comprehensive survey of existing MRC tasks, evaluation metrics, and datasets, herein, (1) we analyze 57 MRC tasks and datasets and propose a more precise classification method for MRC tasks with 4 different attributes; (2) we summarize 9 evaluation metrics of MRC tasks, and 7 attributes and 10 characteristics of MRC datasets; (3) we discuss key open issues in MRC research and highlight future research directions. In addition, we have collected, organized, and published our data on the companion website, where MRC researchers can directly access each MRC dataset along with its papers, baseline projects, and leaderboard.

PUBLICATION RECORD

Publication year
2020
Venue
Applied Sciences
Publication date
2020-06-21
Fields of study
Linguistics, Computer Science
Identifiers
DOI 10.3390/app10217640 arXiv 2006.11880
Source metadata
Semantic Scholar

CITED BY

Multi-choice machine reading comprehension benchmark datasets: A survey
2026, cites this paper
Exploring the potential of gamified reading: the effects of duolingo on L2 reading, self-efficacy, and learner experiences in a Chinese university EFL context
2025, cites this paper
Resolving passage ambiguity in machine reading comprehension using lightweight transformer architectures
2025, cites this paper
Exploring unanswerability in machine reading comprehension: approaches, benchmarks, and open challenges
2025, cites this paper
OutfitAI: Novel Architecture to Leverage Generative AI for Immersive Fashion E-Commerce Solutions
2025, cites this paper
A comprehensive evaluation of large language models for information extraction from unstructured electronic health records in residential aged care
2025, cites this paper
Integrated Survey Classification and Trend Analysis via LLMs: An Ensemble Approach for Robust Literature Synthesis
2025, cites this paper
UQuAD+: Benchmark Dataset for Urdu Machine Reading Comprehension
2025, cites this paper
Hajj-FQA: A benchmark Arabic dataset for developing question-answering systems on Hajj fatwas
2025, cites this paper
An Entity Linking Agent for Question Answering
2025, cites this paper
A Substring Extraction-Based RAG Method for Minimising Hallucinations in Aircraft Maintenance Question Answering
2025, cites this paper
Absolute Evaluation Measures for Machine Learning: A Survey
2025, cites this paper
IoT Based Health Monitoring with Diet, Exercise and Calories recommendation Using Machine Learning
2025, cites this paper
Automatic evaluation and enhancement of reading strategies in English reading comprehension based on the BERT model
2025, cites this paper
Hybrid large language model approach for prompt and sensitive defect management: A comparative analysis of hybrid, non-hybrid, and GraphRAG approaches
2025, cites this paper
ThaiMRC: A Comprehensive Corpus for Advancing Machine Reading Comprehension in Thai
2025, cites this paper
Desiderata For The Context Use Of Question Answering Systems
2024, cites this paper
NLPLego: Assembling Test Generation for Natural Language Processing Applications
2024, cites this paper
Prediction of Total Phosphorus Based on Distance Correlation and Machine Learning Methods—a Case Study of Dongjiang River, China
2024, cites this paper
Knowledge interaction graph guided prompting for event causality identification
2024, cites this paper
UnAnswGen: A Systematic Approach for Generating Unanswerable Questions in Machine Reading Comprehension
2024, cites this paper
READING PROFICIENCY AND COGNITIVE READING STRATEGIES THROUGH ONLINE DYNAMIC ASSESSMENT (ODA) IN ENGLISH FOR ECONOMY
2024, cites this paper
Improved bidirectional attention flow (BIDAF) model for Arabic machine reading comprehension
2024, cites this paper
Recent Advances in Multi-Choice Machine Reading Comprehension: A Survey on Methods and Datasets
2024, cites this paper
SESAME - self-supervised framework for extractive question answering over document collections
2024, cites this paper
It Is Not About What You Say, It Is About How You Say It: A Surprisingly Simple Approach for Improving Reading Comprehension
2024, cites this paper
A comparative evaluation for question answering over Greek texts by using machine translation and BERT
2024, cites this paper
Neural models for semantic analysis of handwritten document images
2024, cites this paper
Safeguarding large language models: a survey
2024, cites this paper
Numerical reasoning reading comprehension on Vietnamese COVID-19 news: task, corpus, and challenges
2024, cites this paper
Robot-Assisted Language Learning: A Meta-Analysis
2024, cites this paper
QLSC: A Query Latent Semantic Calibrator for Robust Extractive Question Answering
2024, cites this paper
emrQA-msquad: A Medical Dataset Structured with the SQuAD V2.0 Framework, Enriched with emrQA Medical Information
2024, cites this paper
EuSQuAD: Automatically Translated and Aligned SQuAD2.0 for Basque
2024, cites this paper
A survey of deep learning techniques for machine reading comprehension
2023, cites this paper
A Graph Fusion Approach for Cross-Lingual Machine Reading Comprehension
2023, cites this paper
Spatial-Semantic Collaborative Graph Network for Textbook Question Answering
2023, cites this paper
Mind Reasoning Manners: Enhancing Type Perception for Generalized Zero-Shot Logical Reasoning Over Text
2023, cites this paper
Slovak Dataset for Multilingual Question Answering
2023, cites this paper
Information Extraction from Documents: Question Answering vs Token Classification in real-world setups
2023, cites this paper
Neural Ranking with Weak Supervision for Open-Domain Question Answering : A Survey
2023, cites this paper
Machine Reading Comprehension using Case-based Reasoning
2023, cites this paper
NAG-NER: a Unified Non-Autoregressive Generation Framework for Various NER Tasks
2023, cites this paper
Building a deep learning-based QA system from a CQA dataset
2023, cites this paper
Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling
2023, cites this paper
On solving textual ambiguities and semantic vagueness in MRC based question answering using generative pre-trained transformers
2023, cites this paper
Human Fall Detection Using Spatial Temporal Graph Convolutional Networks.
2023, cites this paper
DAQAS: Deep Arabic Question Answering System based on duplicate question detection and machine reading comprehension
2023, cites this paper
RoBERTa-CoA: RoBERTa-Based Effective Finetuning Method Using Co-Attention
2023, cites this paper
PoQuAD - The Polish Question Answering Dataset - Description and Analysis
2023, cites this paper
Retracted: Analyzing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets
2023, influential citation
README: Bridging Medical Jargon and Lay Understanding for Patient Education through Data-Centric NLP
2023, cites this paper
CPCM-MRC: A Few-shot Machine Reading Comprehension Approach for Intelligent Bridge Management via Continual Pre-training and Copy Mechanism
2023, cites this paper
Investigating the role of Named Entity Recognition in Question Answering Models
2022, cites this paper
Building Narrative Structures from Knowledge Graphs
2022, cites this paper
RadQA: A Question Answering Dataset to Improve Comprehension of Radiology Reports
2022, cites this paper
Comparison of text preprocessing methods
2022, cites this paper
Cross-document attention-based gated fusion network for automated medical licensing exam
2022, cites this paper
Evaluation of Transfer Learning for Polish with a Text-to-Text Model
2022, cites this paper
TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages
2022, cites this paper
DTW at Qur’an QA 2022: Utilising Transfer Learning with Transformers for Question Answering in a Low-resource Domain
2022, cites this paper
ProQA: Structural Prompt-based Pre-training for Unified Question Answering
2022, cites this paper
Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension
2022, cites this paper
Feeding What You Need by Understanding What You Learned
2022, cites this paper
Recognition-free Question Answering on Handwritten Document Collections
2022, influential citation
Low-Resource Dense Retrieval for Open-Domain Question Answering: A Comprehensive Survey
2022, cites this paper
Learning Complex Natural Language Inferences with Relational Neural Models
2022, cites this paper
Special Issue on Machine Learning and Natural Language Processing
2022, cites this paper
Arabic machine reading comprehension on the Holy Qur'an using CL-AraBERT
2022, cites this paper
Answering Count Questions with Structured Answers from Text
2022, cites this paper
WeLM: A Well-Read Pre-trained Language Model for Chinese
2022, cites this paper
Multiple-Choice Question Generation: Towards an Automated Assessment Framework
2022, cites this paper
Exploring the Influence of Dialog Input Format for Unsupervised Clinical Questionnaire Filling
2022, cites this paper
To What Extent Do Natural Language Understanding Datasets Correlate to Logical Reasoning? A Method for Diagnosing Logical Reasoning.
2022, cites this paper
Automated Question Answering for Improved Understanding of Compliance Requirements: A Multi-Document Study
2022, cites this paper
Knowledge Graph Enhanced Relation Extraction Datasets
2022, cites this paper
ESSM: an extractive summarization model with enhanced spatial-temporal information and span mask encoding
2022, cites this paper
Bilingual Question Answering over DBpedia Abstracts through Machine Translation and BERT
2022, cites this paper
A Comprehensive Survey on Multi-hop Machine Reading Comprehension Datasets and Metrics
2022, cites this paper
A Comprehensive Survey on Multi-hop Machine Reading Comprehension Approaches
2022, cites this paper
DialogQAE: N-to-N Question Answer Pair Extraction from Customer Service Chatlog
2022, cites this paper
ViMRC - VLSP 2021: Improving Retrospective Reader for Vietnamese Machine Reading Comprehension
2022, cites this paper
Learning Invariant Representation Improves Robustness for MRC Models
2022, cites this paper
An exploratory study on the potential of machine reading comprehension as an instructional scaffolding device in second language reading lessons
2022, cites this paper
Pre-reading Activity over Question for Machine Reading Comprehension
2022, cites this paper
Knowledge-Enhanced Relation Extraction Dataset
2022, cites this paper
New State-of-the-Art for Question Answering on Portuguese SQuAD v1.1
2022, influential citation
UFRGSent at SemEval-2022 Task 10: Structured Sentiment Analysis using a Question Answering Model
2022, cites this paper
PaintTeR: Automatic Extraction of Text Spans for Generating Art-Centered Questions
2022, cites this paper
The effects of robot-assisted language learning: A meta-analysis
2021, cites this paper
A Survey on non-English Question Answering Dataset
2021, cites this paper
Enhance Text-to-Text Transfer Transformer with Generated Questions for Thai Question Answering
2021, cites this paper
Conversational Agents: Goals, Technologies, Vision and Challenges
2021, cites this paper
Semantic search as extractive paraphrase span detection
2021, cites this paper
Analysing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets
2021, influential citation
English Machine Reading Comprehension Datasets: A Survey
2021, influential citation
WebSRC: A Dataset for Web-Based Structural Reading Comprehension
2021, cites this paper
Analyzing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets
2021, cites this paper
ComQA: Compositional Question Answering via Hierarchical Graph Neural Networks
2021, cites this paper
TOS: A Relative Metric Approach for Model Selection in Machine Learning Solutions
2021, cites this paper