Rapidly Bootstrapping a Question Answering Dataset for COVID-19

Raphael Tang, Rodrigo Nogueira, Edwin Zhang, Nikhil Gupta, Phuong Cam, Kyunghyun Cho, Jimmy J. Lin

Published 2020 in arXiv.org

ABSTRACT

We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge. To our knowledge, this is the first publicly available resource of its type, and it is intended as a stopgap measure for guiding research until more substantial evaluation resources become available. While this dataset, comprising 124 question-article pairs as of the present version 0.1 release, does not have sufficient examples for supervised machine learning, we believe that it can be helpful for evaluating the zero-shot or transfer capabilities of existing models on topics specifically related to COVID-19. This paper describes our methodology for constructing the dataset and presents the effectiveness of a number of baselines, including term-based techniques and various transformer-based models. The dataset is available at this http URL

PUBLICATION RECORD

  • Publication year

    2020

  • Venue

    arXiv.org

  • Publication date

    2020-04-23

  • Fields of study

    Medicine, Computer Science

  • Source metadata

    Semantic Scholar
