Predicting Prosodic Prominence from Text with Pre-trained Contextualized Word Representations

Aarne Talman, Antti Suni, H. Çelikkanat, Sofoklis Kakouros, J. Tiedemann, M. Vainio

Published 2019 in Nordic Conference of Computational Linguistics

ABSTRACT

In this paper, we introduce a new natural language processing dataset and benchmark for predicting prosodic prominence from written text. To our knowledge, this will be the largest publicly available dataset with prosodic labels. We describe the dataset construction and the resulting benchmark dataset in detail, and we train a number of different models, ranging from feature-based classifiers to neural network systems, for the prediction of discretized prosodic prominence. We show that pre-trained contextualized word representations from BERT outperform the other models even with less than 10% of the training data. Finally, we discuss the dataset in light of the results and point to future research and plans for further improving both the dataset and the methods of predicting prosodic prominence from text. The dataset and the code for the models will be made publicly available.
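
The task framing in the abstract, predicting a discretized prominence class for each word, can be illustrated with a minimal sketch. This is not the authors' code: the cut points in `THRESHOLDS`, the resulting three classes, the helper names, and the sample scores are all assumptions made for illustration, since the abstract does not state them. The sketch bins a continuous per-word prominence score into a class index and computes a majority-class baseline of the kind a trained model would need to beat.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cut points: the abstract does not say how prominence is discretized.
THRESHOLDS = (0.5, 2.0)


def discretize(score, thresholds=THRESHOLDS):
    """Map a continuous prominence score to a class index in 0..len(thresholds)."""
    return bisect_right(thresholds, score)


def label_tokens(tokens, scores, thresholds=THRESHOLDS):
    """Pair each token with its discretized prominence class."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    return [(tok, discretize(s, thresholds)) for tok, s in zip(tokens, scores)]


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    if not labels:
        return 0.0
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Made-up example sentence and scores, for illustration only.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
scores = [0.1, 2.4, 0.9, 0.0, 0.2, 1.7]

labelled = label_tokens(tokens, scores)
classes = [c for _, c in labelled]
baseline = majority_baseline_accuracy(classes)
```

Under this framing, any sequence labelling model (a feature-based classifier or a fine-tuned contextual encoder) would consume `tokens` and be scored against `classes`; the baseline gives a floor for interpreting reported accuracy.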

PUBLICATION RECORD

  • Publication date

    2019-08-06

  • Fields of study

    Linguistics, Computer Science, Psychology

  • Source metadata

    Semantic Scholar
