Unsupervised Text Segmentation Based on Native Language Characteristics

S. Malmasi, M. Dras, Mark Johnson, Lan Du, Magdalena Wolska

Published 2017 in Annual Meeting of the Association for Computational Linguistics

ABSTRACT

Most work on segmenting text does so on the basis of topic changes, but it can be of interest to segment by other, stylistically expressed characteristics such as change of authorship or native language. We propose a Bayesian unsupervised text segmentation approach to the latter. While baseline models achieve essentially random segmentation on our task, indicating its difficulty, a Bayesian model that incorporates appropriately compact language models and alternating asymmetric priors can achieve scores on the standard metrics around halfway to perfect segmentation.

PUBLICATION RECORD

  • Publication year

    2017

  • Venue

    Annual Meeting of the Association for Computational Linguistics

  • Publication date

    2017-07-01

  • Fields of study

    Linguistics, Computer Science

  • Source metadata

    Semantic Scholar
