Rapidly Bootstrapping a Question Answering Dataset for COVID-19

Raphael Tang, Rodrigo Nogueira, Edwin Zhang, Nikhil Gupta, Phuong Cam, Kyunghyun Cho, Jimmy J. Lin

Published 2020 in arXiv.org

ABSTRACT

We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge. To our knowledge, this is the first publicly available resource of its type, and it is intended as a stopgap measure for guiding research until more substantial evaluation resources become available. While this dataset, comprising 124 question-article pairs as of the present version 0.1 release, does not have sufficient examples for supervised machine learning, we believe that it can be helpful for evaluating the zero-shot or transfer capabilities of existing models on topics specifically related to COVID-19. This paper describes our methodology for constructing the dataset and presents the effectiveness of a number of baselines, including term-based techniques and various transformer-based models. The dataset is available at this http URL

PUBLICATION RECORD

Publication year
2020
Venue
arXiv.org
Publication date
2020-04-23
Fields of study
Medicine, Computer Science
Identifiers
arXiv 2004.11339
Source metadata
Semantic Scholar

REFERENCES

Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset
2020, cited by this paper
Document Ranking with a Pretrained Sequence-to-Sequence Model
2020, influential reference
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019, cited by this paper
SciBERT: A Pretrained Language Model for Scientific Text
2019, influential reference
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
2019, cited by this paper
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
2019, cited by this paper
Cascade Ranking for Operational E-commerce Search
2017, cited by this paper
Anserini: Enabling the Use of Lucene for Information Retrieval Research
2017, cited by this paper
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
2016, influential reference
SQuAD: 100,000+ Questions for Machine Comprehension of Text
2016, cited by this paper
An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition
2015, cited by this paper
A cascade ranking model for efficient ranked retrieval
2011, cited by this paper
High accuracy retrieval with multiple nested ranker
2006, cited by this paper
What Makes a Good Answer? The Role of Context in Question Answering
2003, cited by this paper
Overview of the TREC 2001 Question Answering Track
2001, cited by this paper
The TREC-8 Question Answering Track Evaluation
2000, cited by this paper
Okapi at TREC-4
1995, cited by this paper
Okapi at TREC-3
1994, influential reference

CITED BY

A high precision and speed question answering system about the post-COVID-19
2025, cites this paper
ViQA-COVID: COVID-19 Machine Reading Comprehension Dataset for Vietnamese
2025, cites this paper
MedQuery: A Graph-Driven Medical Literature-Enhanced Query Answering System
2025, cites this paper
A Comprehensive Evaluation of Embedding Models and LLMs for IR and QA Across English and Italian
2025, cites this paper
A Brief Review on Benchmarking for Large Language Models Evaluation in Healthcare
2025, cites this paper
Algorithmic Behaviors Across Regions: A Geolocation Audit of YouTube Search for COVID-19 Misinformation between the United States and South Africa
2024, cites this paper
OQA: A question-answering dataset on orthodontic literature
2024, cites this paper
Transformer-Based Question Answering Model for the Biomedical Domain
2023, cites this paper
Leveraging Deep Learning for Abstractive Code Summarization of Unofficial Documentation
2023, cites this paper
Towards Mitigating Hallucination in Large Language Models via Self-Reflection
2023, cites this paper
Constructing a disease database and using natural language processing to capture and standardize free text clinical information
2023, cites this paper
Entity and relation extraction from clinical case reports of COVID-19: a natural language processing approach
2023, cites this paper
Analyzing COVID-19 Medical Papers Using Artificial Intelligence: Insights for Researchers and Medical Professionals
2022, cites this paper
The Role of Natural Language Processing during the COVID-19 Pandemic: Health Applications, Opportunities, and Challenges
2022, influential citation
COVID-19-Related Scientific Literature Exploration: Short Survey and Comparative Study
2022, cites this paper
New Frontiers of Scientific Text Mining: Tasks, Data, and Tools
2022, cites this paper
Applications of natural language processing in ophthalmology: present and future
2022, cites this paper
Automatic question answering for multiple stakeholders, the epidemic question answering dataset
2022, cites this paper
Contextual embedding and model weighting by fusing domain knowledge on biomedical question answering
2022, cites this paper
A COVID-19 Search Engine (CO-SE) with Transformer-based architecture
2022, influential citation
CoQUAD: a COVID-19 question answering dataset system, facilitating research, benchmarking, and practice
2022, influential citation
Establishing Strong Baselines for TripClick Health Retrieval
2022, cites this paper
A Comparative Study on Transfer Learning and Distance Metrics in Semantic Clustering over the COVID-19 Tweets
2021, cites this paper
Deep Learning applications for COVID-19
2021, cites this paper
Biomedical Question Answering: A Comprehensive Review
2021, influential citation
Revealing Opinions for COVID-19 Questions Using a Context Retriever, Opinion Aggregator, and Question-Answering Model: Model Development Study
2021, cites this paper
COVID-19 information retrieval with deep-learning based semantic search, question answering, and abstractive summarization
2021, cites this paper
Hidden Backdoors in Human-Centric Language Models
2021, cites this paper
COPER: a Query-Adaptable Semantics-based Search Engine for Persian COVID-19 Articles
2021, cites this paper
Multilingual COVID-QA: Learning towards Global Information Sharing via Web Question Answering in Multiple Languages
2021, cites this paper
Meta-research on COVID-19: An overview of the early trends
2021, cites this paper
KAAPA: Knowledge Aware Answers from PDF Analysis
2021, cites this paper
COBERT: COVID-19 Question Answering System Using BERT
2021, cites this paper
Synthetic Target Domain Supervision for Open Retrieval QA
2021, cites this paper
Text Data Augmentation for Deep Learning
2021, cites this paper
Machine Learning and Artificial Intelligence for the Prediction of Host–Pathogen Interactions: A Viral Case
2021, cites this paper
Biomedical Question Answering: A Survey of Approaches and Challenges
2021, influential citation
Pre-trained Language Models in Biomedical Domain: A Survey from Multiscale Perspective
2021, cites this paper
Pre-trained Language Models in Biomedical Domain: A Systematic Survey
2021, cites this paper
Open-Domain Question-Answering for COVID-19 and Other Emergent Domains
2021, cites this paper
COVIDRead: A Large-scale Question Answering Dataset on COVID-19
2021, cites this paper
Data science approaches to confronting the COVID-19 pandemic: a narrative review
2021, cites this paper
Question Answering Systems for Covid-19
2021, cites this paper
Multi-Domain Multilingual Question Answering
2021, influential citation
SLEDGE: A Simple Yet Effective Baseline for Coronavirus Scientific Knowledge Search
2020, cites this paper
CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization
2020, cites this paper
CAiRE-COVID: A Question Answering and Multi-Document Summarization System for COVID-19 Research
2020, cites this paper
Text mining approaches for dealing with the rapidly expanding literature on COVID-19
2020, influential citation
End-to-End QA on COVID-19: Domain Adaptation with Synthetic Training
2020, cites this paper
A Multilingual Reading Comprehension System for more than 100 Languages
2020, influential citation
Collecting Verified COVID-19 Question Answer Pairs
2020, cites this paper
TopiQAL: Topic-aware Question Answering using Scalable Domain-specific Supercomputers
2020, cites this paper
Towards building a Robust Industry-scale Question Answering System
2020, influential citation
COVID-19 Candidate Treatments, a Data Analytics Approach
2020, cites this paper
A Smart Virtual Assistant Answering Questions About COVID-19
2020, cites this paper
Artificial Intelligence (AI) in Action: Addressing the COVID-19 Pandemic with Natural Language Processing (NLP)
2020, cites this paper
Scientific Claim Verification with VerT5erini
2020, cites this paper
A comparative analysis of system features used in the TREC-COVID information retrieval challenge
2020, cites this paper
AWS CORD-19 Search: A Neural Search Engine for COVID-19 Literature
2020, cites this paper
Can questions summarize a corpus? Using question generation for characterizing COVID-19 research
2020, cites this paper
UCD-CS at W-NUT 2020 Shared Task-3: A Text to Text Approach for COVID-19 Event Extraction on Social Media
2020, cites this paper
orgFAQ: A New Dataset and Analysis on Organizational FAQs and User Questions
2020, cites this paper
CAiRE-COVID: A Question Answering and Query-focused Multi-Document Summarization System for COVID-19 Scholarly Information Management
2020, influential citation
COVID-19: A Semantic-Based Pipeline for Recommending Biomedical Entities
2020, cites this paper
Study of Different Deep Learning Approach with Explainable AI for Screening Patients with COVID-19 Symptoms: Using CT Scan and Chest X-ray Image Dataset
2020, cites this paper
Repurposing TREC-COVID Annotations to Answer the Key Questions of CORD-19
2020, cites this paper
COVID-19 Symptoms Detection Based on NasNetMobile with Explainable AI Using Various Imaging Modalities
2020, cites this paper
AWS CORD19-Search: A Scientific Literature Search Engine for COVID-19
2020, cites this paper
End-to-End AI-Based Point-of-Care Diagnosis System for Classifying Respiratory Illnesses and Early Detection of COVID-19: A Theoretical Framework
2020, cites this paper
Coronavirus Knowledge Graph: A Case Study
2020, cites this paper
S_Covid: An Engine to Explore COVID-19 Scientific Literature
year unknown, cites this paper