Does syntax help discourse segmentation? Not so much

Chloé Braud, Ophélie Lacroix, Anders Søgaard

Published 2017 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT

Discourse segmentation is the first step in building discourse parsers. Most work on discourse segmentation does not scale to real-world discourse parsing across languages, for two reasons: (i) models rely on constituent trees, and (ii) experiments have relied on gold standard identification of sentence and token boundaries. We therefore investigate to what extent constituents can be replaced with universal dependencies, or left out completely, as well as how state-of-the-art segmenters fare in the absence of sentence boundaries. Our results show that dependency information is less useful than expected, but we provide a fully scalable, robust model that only relies on part-of-speech information, and show that it performs well across languages in the absence of any gold-standard annotation.

PUBLICATION RECORD

Publication date
2017-09-01
Fields of study
Linguistics, Computer Science
Identifiers
DOI 10.18653/v1/D17-1258
Source metadata
Semantic Scholar

REFERENCES

DyNet: The Dynamic Neural Network Toolkit
2017, cited by this paper
Cross-lingual and cross-domain discourse segmentation of entire documents
2017, influential reference
UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing
2016, cited by this paper
Transition-Based Dependency Parsing Exploiting Supertags
2016, cited by this paper
Discourse Segmentation of German Texts
2015, cited by this paper
CODRA: A Novel Discriminative Framework for Rhetorical Analysis
2015, influential reference
Adaptation of Discourse Parsing Models for the Portuguese Language
2015, cited by this paper
A Linear-Time Bottom-Up Discourse Parser with Constraints and Post-Editing
2014, cited by this paper
Recursive Deep Models for Discourse Parsing
2014, cited by this paper
Potsdam Commentary Corpus 2.0: Annotation for Discourse Research
2014, influential reference
Representation Learning for Text-level Discourse Parsing
2014, influential reference
Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
2013, cited by this paper
A Reranking Model for Discourse Segmentation using Subtree Features
2012, influential reference
DiSeg 1.0: The first system for Spanish discourse segmentation
2012, cited by this paper
Complex Sentences as Leaky Units in Discourse Parsing
2011, cited by this paper
CSTNews - A Discourse-Annotated Corpus for Single and Multi-Document Summarization of News Texts in Brazilian Portuguese
2011, cited by this paper
On the Development of the RST Spanish Treebank
2011, cited by this paper
Learning Recursive Segments for Discourse Parsing
2010, cited by this paper
A Sequential Model for Discourse Segmentation
2010, cited by this paper
Syntax-based Discourse Segmentation of Dutch Text
2010, cited by this paper
DiSeg: Un segmentador discursivo automático para el español
2010, cited by this paper
A Syntactic and Lexical-Based Discourse Segmenter
2009, cited by this paper
An effective Discourse Parser that uses Rich Linguistic Information
2009, cited by this paper
On the Development and Evaluation of a Brazilian Portuguese Discourse Parser
2008, cited by this paper
The utility of parse-derived features for automatic discourse segmentation
2007, influential reference
Automatic Discourse Segmentation using Neural Networks
2007, cited by this paper
Logics of Conversation
2005, cited by this paper
Discourse Chunking and its Application to Sentence Compression
2005, cited by this paper
Generating Discourse Structures for Written Text
2004, cited by this paper
The Potsdam Commentary Corpus
2004, cited by this paper
Sentence Level Discourse Parsing using Syntactic and Lexical Information
2003, cited by this paper
Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory
2001, influential reference
Discourse Tagging Reference Manual
2001, cited by this paper
Long Short-Term Memory
1997, cited by this paper
Building a Large Annotated Corpus of English: The Penn Treebank
1993, influential reference
Rhetorical Structure Theory: Toward a functional theory of text organization
1988, cited by this paper

CITED BY

DisCuT and DiscReT: MELODI at DISRPT 2025 Multilingual discourse segmentation, connective tagging and relation classification
2025, cites this paper
Where Frameworks (Dis)agree: A Study of Discourse Segmentation
2025, cites this paper
Experimenting with Discourse Segmentation of Taiwan Southern Min Spontaneous Speech
2024, cites this paper
Using and comparing Rhetorical Structure Theory parsers with rst-workbench
2021, cites this paper
Context, Structure and Syntax-aware RST Discourse Parsing
2021, cites this paper
DisCoDisCo at the DISRPT2021 Shared Task: A System for Discourse Segmentation, Classification, and Connective Detection
2021, influential citation
A Transformer Based Approach towards Identification of Discourse Unit Segments and Connectives
2021, cites this paper
Multi-lingual Discourse Segmentation and Connective Identification: MELODI at Disrpt2021
2021, cites this paper
Syntax-Guided Sequence to Sequence Modeling for Discourse Segmentation
2020, cites this paper
Joint Learning of Syntactic Features Helps Discourse Segmentation
2020, influential citation
Chinese and English Elementary Discourse Units Segmentation based on Bi-LSTM-CRF Model
2020, cites this paper
Comparing PTB and UD information for PDTB discourse connective identification
2020, cites this paper
From News to Medical: Cross-domain Discourse Segmentation
2019, cites this paper
ToNy: Contextual embeddings for accurate multilingual discourse segmentation of full documents
2019, influential citation
Segmenting a French Meeting Corpus into Elementary Discourse Units
year unknown, cites this paper