Topic Segmentation and Labeling in Asynchronous Conversations

Published 2013 in Journal of Artificial Intelligence Research

ABSTRACT

Topic segmentation and labeling is often considered a prerequisite for higher-level conversation analysis and has been shown to be useful in many Natural Language Processing (NLP) applications. We present two new corpora of email and blog conversations annotated with topics, and evaluate annotator reliability for the segmentation and labeling tasks in these asynchronous conversations. We propose a complete computational framework for topic segmentation and labeling in asynchronous conversations. Our approach extends state-of-the-art methods by considering the fine-grained structure of an asynchronous conversation, along with other conversational features, and by applying recent graph-based methods for NLP. For topic segmentation, we propose two novel unsupervised models that exploit the fine-grained conversational structure, and a novel graph-theoretic supervised model that combines lexical, conversational, and topic features. For topic labeling, we propose two novel unsupervised random walk models that capture conversation-specific clues from two different sources, respectively: the leading sentences and the fine-grained conversational structure. Empirical evaluation shows that the segmentation and labeling performed by our best models outperform the state of the art and are highly correlated with human annotations.
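
For illustration only: the abstract mentions random walk models for topic labeling but gives no equations. The sketch below is not the paper's model. It is a generic PageRank-style random walk over a word co-occurrence graph, showing how such a walk could rank candidate label words within one topic segment. The function names, the window size, the damping factor, and the toy segment are all assumptions made for this example.

```python
from collections import defaultdict


def build_cooccurrence_graph(sentences, window=2):
    """Undirected weighted graph: words are nodes, and an edge weight counts
    how often two distinct words occur within `window` positions (assumed)."""
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    graph[word][other] += 1.0
                    graph[other][word] += 1.0
    return graph


def random_walk_scores(graph, damping=0.85, iterations=50):
    """Power iteration for a PageRank-style random walk with uniform restart."""
    nodes = sorted(graph)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    # Every node was created together with an edge, so each total is positive.
    out_weight = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        new_scores = {}
        for node in nodes:
            # Probability mass arriving from each neighbor, split by edge weight.
            incoming = sum(
                scores[nbr] * graph[nbr][node] / out_weight[nbr]
                for nbr in graph[node]
            )
            new_scores[node] = (1.0 - damping) / n + damping * incoming
        scores = new_scores
    return scores


# Hypothetical tokenized sentences from one topic segment (toy data).
segment = [
    ["server", "backup", "schedule"],
    ["backup", "schedule", "weekend"],
    ["backup", "server", "failed"],
]
scores = random_walk_scores(build_cooccurrence_graph(segment))
top = max(scores, key=scores.get)  # "backup" ranks first in this toy segment
```

In this toy segment the most connected word receives the highest score, which is the basic intuition behind using a random walk to surface label candidates. A conversation-aware variant would presumably bias the walk toward particular sources of evidence, such as leading sentences, but that weighting is not specified in the abstract and is not attempted here.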

PUBLICATION RECORD

Publication year: 2013
Venue: Journal of Artificial Intelligence Research
Publication date: 2013-05-01
Fields of study: Computer Science
Identifiers: DOI 10.1613/jair.3940; arXiv 1402.0586
Source metadata: Semantic Scholar

CITED BY

Unsupervised Topic Shift Detection in Chats
2025 (cites this paper)
Identifying Small Talk in Natural Conversations
2025 (cites this paper)
Semantic Source Code Segmentation using Small and Large Language Models
2025 (cites this paper)
EpiCache: Episodic KV Cache Management for Long Conversational Question Answering
2025 (cites this paper)
Visualization of Unstructured Sports Data - An Example of Cricket Short Text Commentary
2024 (cites this paper)
Topic Segmentation of Semi-Structured and Unstructured Conversational Datasets using Language Models
2023 (cites this paper)
Diversity-Aware Coherence Loss for Improving Neural Topic Models
2023 (cites this paper)
Multi-turn Dialogue Comprehension from a Topic-aware Perspective
2023 (cites this paper)
Comparing neural sentence encoders for topic segmentation across domains: not your typical text similarity task
2023 (cites this paper)
A semi-supervised framework for concept-based hierarchical document clustering
2023 (cites this paper)
Learning functional sections in medical conversations: iterative pseudo-labeling and human-in-the-loop approach
2022 (cites this paper)
Visualisation of hierarchical multivariate data: Categorisation and case study on hate speech
2022 (cites this paper)
BehanceCC: A ChitChat Detection Dataset For Livestreaming Video Transcripts
2022 (cites this paper)
Using Conceptual Recurrence and Consistency Metrics for Topic Segmentation in Debate
2022 (cites this paper)
Topic Shift Detection for Mixed Initiative Response
2021 (cites this paper)
CommunityPulse: Facilitating Community Input Analysis by Surfacing Hidden Insights, Reflections, and Priorities
2021 (cites this paper)
Coarse-to-fine
2021 (cites this paper)
A Two-Level Semi-supervised Clustering Technique for News Articles
2021 (cites this paper)
Bag-of-Concepts representation for document classification based on automatic knowledge acquisition from probabilistic knowledge base
2020 (cites this paper)
Efficient Evaluation of Task Oriented Dialogue Systems
2020 (cites this paper)
A Large-Scale Corpus of E-mail Conversations with Standard and Two-Level Dialogue Act Annotations
2020 (cites this paper)
Topic-Aware Multi-turn Dialogue Modeling
2020 (cites this paper)
Investigating and Supporting Sensemaking within Online Health Communities
2019 (cites this paper)
BeamSeg: A Joint Model for Multi-Document Segmentation and Topic Identification
2019 (cites this paper)
Visual Exploration of Topic Controversy in Online Conversations
2019 (cites this paper)
A semantic approach for topic-based polarity detection: a case study in the Spanish language
2019 (cites this paper)
Interactive topic hierarchy revision for exploring a collection of online conversations
2019 (cites this paper)
Drift in Online Social Media
2018 (cites this paper)
Burst Your Bubble! An Intelligent System for Improving Awareness of Diverse Social Opinions
2018 (cites this paper)
Bringing Back Structure to Free Text Email Conversations with Recurrent Neural Networks
2018 (cites this paper)
Coherence Modeling of Asynchronous Conversations: A Neural Entity Grid Approach
2018 (cites this paper)
A Weakly Supervised Method for Topic Segmentation and Labeling in Goal-oriented Dialogues via Reinforcement Learning
2018 (cites this paper)
Modeling Speech Acts in Asynchronous Conversations: A Neural-CRF Approach
2018 (cites this paper)
MUSED: A multimedia multi-document dataset for topic segmentation
2018 (cites this paper)
Multi-author document decomposition based on authorship
2018 (cites this paper)
Text analytics, health analytics
2017 (cites this paper)
Research Statement – Shafiq Joty
2017 (cites this paper)
Information Bottleneck Inspired Method For Chat Text Segmentation
2017 (cites this paper)
Thread Reconstruction in Conversational Data using Neural Coherence Models
2017 (cites this paper)
Generating and Evaluating Summaries for Partial Email Threads: Conversational Bayesian Surprise and Silver Standards
2017 (cites this paper)
A Park or A Highway: Overcoming Tensions in Designing for Socio-emotional and Informational Needs in Online Health Communities
2017 (cites this paper)
An Intelligent Interface for Organizing Online Opinions on Controversial Topics
2017 (cites this paper)
Opinion Summarization and Visualization
2017 (cites this paper)
Multimedia Summary Generation from Online Conversations: Current Approaches and Future Directions
2017 (cites this paper)
MultiConVis: A Visual Text Analytics System for Exploring a Collection of Online Conversations
2016 (cites this paper)
Dialogue Session Segmentation by Embedding-Enhanced TextTiling
2016 (cites this paper)
Structuration automatique de documents audio
2016 (cites this paper)
Interactive Topic Modeling for Exploring Asynchronous Online Conversations
2016 (influential citation)
Automatic label generation for news comment clusters
2016 (cites this paper)
Title assignment for automatic topic segments in TV broadcast news
2016 (cites this paper)
Visual Text Analytics for Online Conversations: Design, Evaluation, and Applications
2016 (cites this paper)
Visual Text Analytics for Asynchronous Online Conversations
2015 (influential citation)
Conversation Trees: A Grammar Model for Topic Structure in Forums
2015 (cites this paper)
ConVisIT: Interactive Topic Modeling for Exploring Asynchronous Online Conversations
2015 (influential citation)
CODRA: A Novel Discriminative Framework for Rhetorical Analysis
2015 (cites this paper)
Linear Discourse Segmentation of Multi-Party Meetings Based on Local and Global Information
2015 (cites this paper)
Extractive summarization of multi-party meetings through discourse segmentation
2015 (cites this paper)
Abstractive Summarization of Spoken and Written Conversations Based on Phrasal Queries
2014 (cites this paper)
23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue
2014 (cites this paper)
Development of a visualization system of discussion processes in a mass of e-mails
2014 (cites this paper)
Revealing Resources in Strategic Contexts
2014 (cites this paper)
ConVis: A Visual Text Analytic System for Exploring Blog Conversations
2014 (influential citation)
Exploiting the Human Computational Effort Dedicated to Message Reply Formatting for Training Discursive Email Segmenters
2014 (cites this paper)
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop
2014 (cites this paper)
Detecting Disagreement in Conversations using Pseudo-Monologic Rhetorical Structure
2014 (influential citation)
Language independent analysis and classification of discussion threads in Coursera MOOC forums
2014 (cites this paper)
Disentangling utterances and recovering coherent multi party distinct conversations
2014 (cites this paper)
Interactive Exploration of Asynchronous Conversations: Applying a User-centered Approach to Design a Visual Text Analytic System
2014 (influential citation)
A Lightly Supervised Learning Method for Forum Posts Dialogue Act Classification
2014 (cites this paper)
Statement of research interests
2013 (cites this paper)