Exploring Content Models for Multi-Document Summarization

A. Haghighi,Lucy Vanderwende

Published 2009 in North American Chapter of the Association for Computational Linguistics

ABSTRACT

We present an exploration of generative probabilistic models for multi-document summarization. Beginning with a simple word frequency based model (Nenkova and Vanderwende, 2005), we construct a sequence of models each injecting more structure into the representation of document set content and exhibiting ROUGE gains along the way. Our final model, HierSum, utilizes a hierarchical LDA-style model (Blei et al., 2004) to represent content specificity as a hierarchy of topic vocabulary distributions. At the task of producing generic DUC-style summaries, HierSum yields state-of-the-art ROUGE performance and in pairwise user evaluation strongly outperforms Toutanova et al. (2007)'s state-of-the-art discriminative system. We also explore HierSum's capacity to produce multiple 'topical summaries' in order to facilitate content discovery and navigation.

PUBLICATION RECORD

  • Publication year

    2009

  • Venue

    North American Chapter of the Association for Computational Linguistics

  • Publication date

    2009-05-31

  • Fields of study

    Computer Science

  • Identifiers
  • External record

    Open on Semantic Scholar

  • Source metadata

    Semantic Scholar

CITATION MAP

EXTRACTION MAP

CLAIMS

  • No claims are published for this paper.

CONCEPTS

  • No concepts are published for this paper.

REFERENCES

Showing 1-23 of 23 references · Page 1 of 1

CITED BY

Showing 1-100 of 573 citing papers · Page 1 of 6