Predicting Prosodic Prominence from Text with Pre-trained Contextualized Word Representations

Aarne Talman, Antti Suni, H. Çelikkanat, Sofoklis Kakouros, J. Tiedemann, M. Vainio

Published 2019 in Nordic Conference of Computational Linguistics

ABSTRACT

In this paper, we introduce a new natural language processing dataset and benchmark for predicting prosodic prominence from written text. To our knowledge, this will be the largest publicly available dataset with prosodic labels. We describe the dataset construction and the resulting benchmark dataset in detail, and we train a number of different models, ranging from feature-based classifiers to neural network systems, to predict discretized prosodic prominence. We show that pre-trained contextualized word representations from BERT outperform the other models even when trained on less than 10% of the training data. Finally, we discuss the dataset in light of the results and point to future research and plans for further improving both the dataset and the methods of predicting prosodic prominence from text. The dataset and the code for the models will be made publicly available.
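The abstract frames the task as predicting a discretized prominence class for each word, while BERT operates on subword tokens rather than whole words. The sketch below illustrates that framing under stated assumptions; it is not the authors' code. It assumes three ordinal classes derived from continuous word-level prominence scores using hypothetical cut points (0.5 and 1.5 are placeholders, not values from the paper). It also follows a common token-classification convention that the paper may or may not use: only the first subword of each word carries the word's label, and every other token is marked with -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The `word_ids` input mimics the list a Hugging Face fast tokenizer returns from `word_ids()`. The function names `discretize` and `align_to_subwords` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence

# Label value skipped by PyTorch's CrossEntropyLoss under its default ignore_index.
IGNORE_INDEX = -100


def discretize(scores: Sequence[float], cut_points: Sequence[float]) -> List[int]:
    """Map continuous word prominence scores to ordinal classes 0..len(cut_points).

    cut_points must be sorted ascending. A score below cut_points[0] becomes
    class 0, a score in [cut_points[0], cut_points[1]) becomes class 1, and so on.
    """
    return [bisect_right(cut_points, s) for s in scores]


def align_to_subwords(
    word_ids: Sequence[Optional[int]], word_labels: Sequence[int]
) -> List[int]:
    """Spread word-level labels over subword tokens.

    word_ids[i] is the index of the word that token i came from, or None for
    special tokens such as [CLS] and [SEP]. Only the first subword of each word
    keeps the word's label; special tokens and continuation subwords receive
    IGNORE_INDEX so they are excluded from the loss and from evaluation.
    """
    aligned: List[int] = []
    previous: Optional[int] = None
    for wid in word_ids:
        if wid is None or wid == previous:
            aligned.append(IGNORE_INDEX)
        else:
            aligned.append(word_labels[wid])
        previous = wid
    return aligned


if __name__ == "__main__":
    # Hypothetical example: three words with invented continuous prominence scores.
    words = ["the", "unbelievable", "story"]
    labels = discretize([0.1, 1.9, 0.8], cut_points=[0.5, 1.5])
    # Invented tokenization: [CLS] the un ##bel ##ievable story [SEP]
    word_ids = [None, 0, 1, 1, 1, 2, None]
    print(labels)                                # [0, 2, 1]
    print(align_to_subwords(word_ids, labels))   # [-100, 0, 2, -100, -100, 1, -100]
```

Labelling only the first subword keeps exactly one prediction per word, so a subword model's output can be scored word by word and compared directly with word-level baselines such as feature-based classifiers.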

PUBLICATION RECORD

Publication year
2019
Venue
Nordic Conference of Computational Linguistics
Publication date
2019-08-06
Fields of study
Linguistics, Computer Science, Psychology
Identifiers
arXiv 1908.02262
Source metadata
Semantic Scholar

REFERENCES

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
2019 · influential reference
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
2019 · influential reference
Effects of Word Embeddings on Neural Network-based Pitch Accent Detection
2018 · cited by this paper
Hierarchical representation and estimation of prosody using continuous wavelet transform
2017 · influential reference
Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi
2017 · cited by this paper
Using continuous lexical embeddings to improve symbolic-prosody prediction in a text-to-speech front-end
2016 · cited by this paper
3PRO - An unsupervised method for the automatic detection of sentence prominence in speech
2016 · cited by this paper
Analyzing the Contribution of Top-Down Lexical and Bottom-Up Acoustic Cues in the Detection of Sentence Prominence
2016 · cited by this paper
Simple Semi-Supervised POS Tagging
2015 · cited by this paper
Librispeech: An ASR corpus based on public domain audio books
2015 · influential reference
GloVe: Global Vectors for Word Representation
2014 · influential reference
Efficient Higher-Order CRFs for Morphological Tagging
2013 · cited by this paper
Efficient Estimation of Word Representations in Vector Space
2013 · influential reference
Unsupervised learning for text-to-speech synthesis
2013 · cited by this paper
Experimental and theoretical advances in prosody: A review
2010 · cited by this paper
Automatic detection and classification of prosodic events
2009 · cited by this paper
Automatic Prosodic Labeling with Conditional Random Fields and Rich Acoustic Features
2008 · cited by this paper
Probabilistic Relations between Words: Evidence from Reduction in Lexical Production
2008 · cited by this paper
Naïve listeners' prominence and boundary perception
2008 · cited by this paper
To Memorize or to Predict: Prominence labeling in Conversational Speech
2007 · influential reference
An Acoustic Measure for Word Prominence in Spontaneous Speech
2007 · cited by this paper
Pitch accent prediction: effects of genre and speaker
2005 · cited by this paper
Intertranscriber reliability of prosodic labeling on telephone conversation using ToBI
2004 · influential reference
Using Conditional Random Fields to Predict Pitch Accents in Conversational Speech
2004 · cited by this paper
Learning to Predict Pitch Accents and Prosodic Boundaries in Dutch
2003 · cited by this paper
The perception of prosodic prominence
2000 · cited by this paper
Long Short-Term Memory
1997 · cited by this paper
A Probabilistic Model of Lexical and Syntactic Access and Disambiguation
1996 · cited by this paper
Pitch Accent in Context: Predicting Intonational Prominence from Text
1993 · cited by this paper
Accent Is Predictable (If You're a Mind-Reader)
1972 · cited by this paper
Sentence Stress and Syntactic Transformations
1971 · cited by this paper
The Sound Pattern of English
1968 · cited by this paper
Some Acoustic Correlates of Word Stress in American English
1959 · cited by this paper

CITED BY

Spontaneous Speech Variables for Evaluating LLMs Cognitive Plausibility
2025 · cites this paper
The time scale of redundancy between prosody and linguistic context
2025 · influential citation
The Role of Prosody in Spoken Question Answering
2025 · cites this paper
Can LLMs Understand the Implication of Emphasized Sentences in Dialogue?
2024 · cites this paper
NLPP: A Natural Language Prosodic Prominence Dataset Assisted by ChatGPT
2024 · cites this paper
Hierarchical Intonation Modelling for Speech Synthesis using Legendre Polynomial Coefficients
2024 · cites this paper
What does BERT learn about prosody?
2023 · influential citation
Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody
2023 · cites this paper
The DeepZen Speech Synthesis System for Blizzard Challenge 2023
2023 · cites this paper
Predicting children's perceived reading proficiency with prosody modeling
2023 · influential citation
Crowdsourced and Automatic Speech Prominence Estimation
2023 · cites this paper
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes
2023 · cites this paper
Multi-granularity Semantic and Acoustic Stress Prediction for Expressive TTS
2023 · cites this paper
Quantifying the redundancy between prosody and text
2023 · cites this paper
A Generative Modelling-based Approach for Text-to-Speech Synthesis in Romanian Language
2023 · cites this paper
Exploring the Utility of Automatically Generated References for Assessing L2 Prosody
2023 · influential citation
BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model
2022 · cites this paper
Towards Integration of Embodiment Features for Prosodic Prominence Prediction from Text
2022 · influential citation
A Mandarin Prosodic Boundary Prediction Model Based on Multi-Source Semi-Supervision
2022 · cites this paper
Polyphone Disambiguation and Accent Prediction Using Pre-Trained Language Models in Japanese TTS Front-End
2022 · cites this paper
Automatic Prosody Annotation with Pre-Trained Text-Speech Model
2022 · cites this paper
Metrical Tagging in the Wild: Building and Annotating Poetry Corpora with Rhythmic Features
2021 · influential citation
A Universal Bert-Based Front-End Model for Mandarin Text-To-Speech Synthesis
2021 · cites this paper
Superposition of Functional Contours Based Prosodic Feature Extraction for Speech Processing
2021 · cites this paper
Prosody-Enhanced Mandarin Text-to-Speech System
2021 · cites this paper
A Survey on Neural Speech Synthesis
2021 · cites this paper
Deep Learning for Prominence Detection In Children’s Read Speech
2021 · cites this paper
Unified Mandarin TTS Front-end Based on Distilled BERT Model
2020 · cites this paper
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis
2020 · cites this paper
Prosodic Prominence and Boundaries in Sequence-to-Sequence Speech Synthesis
2020 · cites this paper
Disambiguating Speech Intention via Audio-Text Co-attention Framework: A Case of Prosody-semantics Interface
2019 · cites this paper