Although a variety of transformers have been proposed for symbolic music generation in recent years, there is still little comprehensive study on how specific design choices affect the quality of the generated music. In this work, we systematically compare different datasets, model architectures, model sizes, and training strategies for the task of symbolic piano music generation. To support model development and evaluation, we examine a range of quantitative metrics and analyze how well they correlate with human judgment collected through listening studies. Our best-performing model, a 950M-parameter transformer trained on 80K MIDI files from diverse genres, produces outputs that are often rated as human-composed in a Turing-style listening survey.
Generating Piano Music with Transformers: A Comparative Study of Scale, Data, and Metrics
Jonathan Lehmkuhl,Ábel Ilyés-Kun,Nico Bremes,Cemhan Kaan Özaltan,Frederik Muthers,Jiayi Yuan
Published 2025 in arXiv.org
ABSTRACT
PUBLICATION RECORD
- Publication year
2025
- Venue
arXiv.org
- Publication date
2025-11-10
- Fields of study
Computer Science, Engineering
- Identifiers
- External record
- Source metadata
Semantic Scholar
CITATION MAP
EXTRACTION MAP
CLAIMS
- No claims are published for this paper.
CONCEPTS
- No concepts are published for this paper.
REFERENCES
Showing 1-19 of 19 references · Page 1 of 1
CITED BY
- No citing papers are available for this paper.
Showing 0-0 of 0 citing papers · Page 1 of 1