QuestionBank: Creating a Corpus of Parse-Annotated Questions

John Judge, Aoife Cahill, Josef van Genabith

Published 2006 in Annual Meeting of the Association for Computational Linguistics

ABSTRACT

This paper describes the development of QuestionBank, a corpus of 4000 parse-annotated questions for (i) use in training parsers employed in QA, and (ii) evaluation of question parsing. We present a series of experiments to investigate the effectiveness of QuestionBank as both an exclusive and supplementary training resource for a state-of-the-art parser in parsing both question and non-question test sets. We introduce a new method for recovering empty nodes and their antecedents (capturing long distance dependencies) from parser output in CFG trees using LFG f-structure reentrancies. Our main findings are (i) using QuestionBank training data improves parser performance to 89.75% labelled bracketing f-score, an increase of almost 11% over the baseline; (ii) back-testing experiments on non-question data (Penn-II WSJ Section 23) show that the retrained parser does not suffer a performance drop on non-question material; (iii) ablation experiments show that the size of training material provided by QuestionBank is sufficient to achieve optimal results; (iv) our method for recovering empty nodes captures long distance dependencies in questions from the ATIS corpus with high precision (96.82%) and low recall (39.38%). In summary, QuestionBank provides a useful new resource in parser-based QA research.
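
The abstract reports results as labelled bracketing f-score and as precision and recall. As a minimal sketch of how such scores are commonly computed (an illustration, not the authors' evaluation code), the Python below matches labelled brackets as a multiset and takes the harmonic mean of precision and recall. The function name `labelled_bracket_scores`, the `(label, start, end)` bracket encoding, and the toy brackets are assumptions made for this example and do not come from QuestionBank. The last line combines the precision and recall figures quoted in the abstract; the resulting value is derived here and is not reported in the source.

```python
from collections import Counter


def labelled_bracket_scores(gold, predicted):
    """Return (precision, recall, f_score) over labelled brackets.

    Each bracket is a (label, start, end) tuple. Brackets are compared as
    multisets, so a predicted bracket matches at most as many times as it
    occurs in the gold tree.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection keeps the minimum count of each shared bracket.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical toy brackets for a four-word question (not QuestionBank data).
gold = [("SBARQ", 0, 4), ("WHNP", 0, 2), ("SQ", 2, 4), ("VP", 2, 4), ("NP", 3, 4)]
predicted = [("SBARQ", 0, 4), ("WHNP", 0, 2), ("S", 2, 4), ("VP", 2, 4), ("NP", 3, 4)]
precision, recall, f_score = labelled_bracket_scores(gold, predicted)

# Harmonic mean of the precision and recall quoted in the abstract for
# empty-node recovery (derived for illustration, not a reported figure).
ldd_f = 2 * 96.82 * 39.38 / (96.82 + 39.38)
```

In the toy example one of five brackets carries the wrong label, so precision, recall, and f-score all come out at 0.8. For the abstract's figures the harmonic mean is about 55.99, which shows how strongly the low recall pulls the combined score down despite the high precision.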

PUBLICATION RECORD

Publication year: 2006
Venue: Annual Meeting of the Association for Computational Linguistics
Publication date: 2006-07-17
Fields of study: Computer Science
DOI: 10.3115/1220175.1220238

REFERENCES

Strong domain variation and treebank-induced LFG resources (2005, cited by this paper)
Object-Extraction and Question-Parsing using CCG (2004, cited by this paper)
Long-Distance Dependency Resolution in Automatically Acquired Wide-Coverage PCFG-Based LFG Approximations (2004, influential reference)
Head-Driven Statistical Models for Natural Language Parsing (2003, influential reference)
Design of a multi-lingual, parallel-processing statistical parsing engine (2002, cited by this paper)
Learning Question Classifiers (2002, cited by this paper)
A Simple Pattern-matching Algorithm for Recovering Empty Nodes and their Antecedents (2002, cited by this paper)
Corpus Variation and Parser Performance (2001, influential reference)
Building a Large Annotated Corpus of English: The Penn Treebank (1993, cited by this paper)
The ATIS Spoken Language Systems Pilot Corpus (1990, influential reference)

CITED BY

Textflows: an open science NLP evaluation approach (2024, cites this paper)
Sneaking Syntax into Transformer Language Models with Tree Regularization (2024, cites this paper)
Reducing tail entity hallucinations with dependency edge prediction in text to text transfer transformer based auto-generated questions (2024, cites this paper)
ICON: A Linguistically-Motivated Large-Scale Benchmark Indonesian Constituency Treebank (2023, cites this paper)
DinG – a corpus of transcriptions of real-life, oral, spontaneous multi-party dialogues between French-speaking players of Catan (2023, cites this paper)
TwittIrish: A Universal Dependencies Treebank of Tweets in Modern Irish (2022, cites this paper)
A survey on syntactic processing techniques (2022, cites this paper)
Square One Bias in NLP: Towards a Multi-Dimensional Exploration of the Research Manifold (2022, cites this paper)
Domain-Aware Dependency Parsing for Questions (2021, cites this paper)
Skeleton parsing for complex question answering over knowledge bases (2021, cites this paper)
Bootstrapping Dependency Treebank of Urdu Noisy Text (2021, cites this paper)
ELIT: Emory Language and Information Toolkit (2021, influential citation)
An Approach to Inference-Driven Dialogue Management within a Social Chatbot (2021, influential citation)
A Systematic Analysis of Morphological Content in BERT Models for Multiple Languages (2020, cites this paper)
A Survey of the Usages of Deep Learning for Natural Language Processing (2020, cites this paper)
Question Answering over Knowledge Bases by Leveraging Semantic Parsing and Neuro-Symbolic Reasoning (2020, cites this paper)
基于抽象语义表示的汉语疑问句的标注与分析 (Chinese Interrogative Sentences Annotation and Analysis Based on the Abstract Meaning Representation) (2020, cites this paper)
Fluent Response Generation for Conversational Question Answering (2020, cites this paper)
Text Genre and Training Data Size in Human-like Parsing (2019, cites this paper)
Automatic Generation of High Quality CCGbanks for Parser Domain Adaptation (2019, cites this paper)
Natural Language Data Management and Interfaces (2018, cites this paper)
Sequential Parsing with In-order Tree Traversals (2018, influential citation)
A Survey of the Usages of Deep Learning in Natural Language Processing (2018, influential citation)
Extending a Parser to Distant Domains Using a Few Dozen Partially Annotated Examples (2018, cites this paper)
Improving Domain Independent Question Parsing with Synthetic Treebanks (2018, influential citation)
Dependency Parsing and Dialogue Systems: an investigation of dependency parsing for commercial application (2017, influential citation)
Deep Dependency Graph Conversion in English (2017, cites this paper)
Verb-Particle Constructions in Questions (2017, cites this paper)
Parsing Universal Dependencies without training (2017, cites this paper)
Tour d'Horizon du French QuestionBank : Construire un Corpus Arboré de Questions pour le Français (2017, influential citation)
Toward Solving Penn Treebank Parsing (2017, influential citation)
A bootstrapping method for development of Treebank (2017, cites this paper)
Morpho-syntactically Annotated Amharic Treebank (2016, cites this paper)
Detecting Differential Item Functioning and Differential Test Functioning on Math School Final-exam (2016, cites this paper)
Globally Normalized Transition-Based Neural Networks (2016, influential citation)
Generating Questions and Multiple-Choice Answers using Semantic Analysis of Texts (2016, cites this paper)
Hard Time Parsing Questions: Building a QuestionBank for French (2016, influential citation)
Irish dependency treebanking and parsing (2016, influential citation)
Tree-based Convolution for Sentence Modeling (2015, influential citation)
Syntactic Parse Fusion (2015, cites this paper)
Edge-Linear First-Order Dependency Parsing with Undirected Minimum Spanning Tree Inference (2015, cites this paper)
Dependency-based Convolutional Neural Networks for Sentence Embedding (2015, influential citation)
Improved Transition-Based Parsing and Tagging with Neural Networks (2015, cites this paper)
Parsing Paraphrases with Joint Inference (2015, cites this paper)
Structured Training for Neural Network Transition-Based Parsing (2015, cites this paper)
Bilexical Dependencies as an Intermedium for Data-Driven and HPSG-Based Parsing (2015, cites this paper)
The role of syntax and semantics in machine translation and quality estimation of machine-translated user-generated content (2015, cites this paper)
Evaluating Parsers with Dependency Constraints (2015, cites this paper)
Error Analysis in Open-Domain Question Answering Systems (2015, cites this paper)
Use of Syntax in Question Answering Tasks (2014, cites this paper)
Message Passing for Soft Constraint Dual Decomposition (2014, cites this paper)
Treebank Parsing and Knowledge of Language 1 (2014, cites this paper)
Answering Natural Language Questions with Intui3 (2014, cites this paper)
Decomposing Consumer Health Questions (2014, cites this paper)
Using subtitles to deal with Out-of-Domain interactions (2014, cites this paper)
Grammar as a Foreign Language (2014, influential citation)
Graph-Based Semi-Supervised Learning (2014, cites this paper)
An Out-of-Domain Test Suite for Dependency Parsing of German (2014, cites this paper)
Feature-driven Question Answering With Natural Language Alignment (2014, cites this paper)
JUST.ASK, a QA system that learns to answer new questions from previous interactions (2014, cites this paper)
Robust French syntax analysis: reconciling statistical methods and linguistic knowledge in the Talismane toolkit. (Analyse syntaxique robuste du français : concilier méthodes statistiques et connaissances linguistiques dans l'outil Talismane) (2013, cites this paper)
Treebank Parsing and Knowledge of Language (2013, cites this paper)
Learning to answer questions (2013, cites this paper)
A Dataset of Syntactic-Ngrams over Time from a Very Large Corpus of English Books (2013, cites this paper)
Large-scale deep linguistic processing IN COLLABORATION WITH: Analyse Linguistique Profonde A Grande Echelle (ALPAGE) (2013, cites this paper)
Just.Ask - a Multi-pronged Approach to Question Answering (2013, cites this paper)
Semi-Supervised Semantic Tagging of Conversational Understanding using Markov Topic Regression (2013, influential citation)
Source-Side Classifier Preordering for Machine Translation (2013, cites this paper)
Automatic Keyword Extraction from Single-Sentence Natural Language Queries (2012, cites this paper)
Syntactic Annotations for the Google Books NGram Corpus (2012, influential citation)
A Common Evaluation Setting for Just.Ask, Open Ephyra and Aranea QA systems (2012, cites this paper)
Accurate Unbounded Dependency Recovery using Generalized Categorial Grammars (2012, cites this paper)
Question Generation based on Lexico-Syntactic Patterns Learned from the Web (2012, cites this paper)
Task-specific Word-Clustering for Part-of-Speech Tagging (2012, cites this paper)
Intégration de ressources lexicales riches dans un analyseur syntaxique probabiliste. (Integration of lexical resources in a probabilistic parser) (2012, cites this paper)
Irish Treebanking and Parsing: A Preliminary Evaluation (2012, cites this paper)
Parsing Any Domain English text to CoNLL dependencies (2012, influential citation)
Parallel Syntactic Annotation in CReST (2012, cites this paper)
Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints (2012, influential citation)
Active Learning and the Irish Treebank (2012, cites this paper)
Using Search-Logs to Improve Query Tagging (2012, cites this paper)
A Dynamic Oracle for Arc-Eager Dependency Parsing (2012, cites this paper)
50th Annual Meeting of the Association for Computational Linguistics Proceedings of the Conference Volume 2: Short Papers (2012, cites this paper)
On Stochastic Tree Distances and Their Training via Expectation-Maximisation (2012, cites this paper)
From News to Comment: Resources and Benchmarks for Parsing the Language of Web 2.0 (2011, cites this paper)
Exploring Difficulties in Parsing Imperatives and Questions (2011, cites this paper)
Bootstrapping Multiple-Choice Tests with The-Mentor (2011, cites this paper)
The Uppsala-FBK systems at WMT 2011 (2011, cites this paper)
EMNLP 2011 Conference on Empirical Methods in Natural Language Processing Proceedings of the UCNLG+Eval: Language Generation and Evaluation Workshop (2011, cites this paper)
Training a Parser for Machine Translation Reordering (2011, influential citation)
A minimally supervised approach for question generation: what can we learn from a single seed? (2011, cites this paper)
Improving dependency label accuracy using statistical post-editing: A cross-framework study (2011, cites this paper)
Exploring linguistically-rich patterns for question generation (2011, cites this paper)
Learning Dependency-Based Compositional Semantics (2011, cites this paper)
A Semi-Automatic, Iterative Method for Creating a Domain-Specific Treebank (2011, cites this paper)
Training dependency parsers by jointly optimizing multiple objectives (2011, influential citation)
Enhanced Question Classification with Optimal Combination of Features (2011, cites this paper)
Training Structured Prediction Models with Extrinsic Loss Functions (2011, cites this paper)
Parsing Natural Language Queries for Life Science Knowledge (2011, cites this paper)