Scaling Vision Transformers to 22 Billion Parameters
Mostafa Dehghani, J. Djolonga, Basil Mustafa, Piotr Padlewski, J. Heek, J. Gilmer, A. Steiner, Mathilde Caron, Robert Geirhos, Ibrahim M. Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, C. Riquelme, M. Minderer, J. Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, F. Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Collier, A. Gritsenko, Vighnesh Birodkar, C. Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, N. Houlsby
Published 2023 in International Conference on Machine Learning
ABSTRACT
The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and video modelling, but these have not yet been successfully scaled to nearly the same degree; the largest dense ViT contains 4B parameters (Chen et al., 2022). We present a recipe for highly efficient and stable training of a 22B-parameter ViT (ViT-22B) and perform a wide variety of experiments on the resulting model. When evaluated on downstream tasks (often with a lightweight linear model on frozen features), ViT-22B demonstrates increasing performance with scale. We further observe other interesting benefits of scale, including an improved tradeoff between fairness and performance, state-of-the-art alignment to human visual perception in terms of shape/texture bias, and improved robustness. ViT-22B demonstrates the potential for "LLM-like" scaling in vision, and provides key steps towards getting there.
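The "lightweight linear model on frozen features" evaluation mentioned in the abstract is the standard linear-probe protocol: the pre-trained backbone is kept frozen and only a linear classifier is trained on its output embeddings. The snippet below is a minimal sketch of that protocol, not the authors' implementation; the synthetic feature arrays, the 1024-dimensional embedding size, and the use of scikit-learn are illustrative assumptions.

```python
# Minimal sketch of linear probing on frozen features (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for embeddings produced by a frozen vision backbone
# (e.g. pooled ViT features); in practice these would come from a
# forward pass with gradients disabled.
train_features = rng.normal(size=(1000, 1024))   # (num_images, feature_dim)
train_labels = rng.integers(0, 10, size=1000)    # toy 10-class task
test_features = rng.normal(size=(200, 1024))
test_labels = rng.integers(0, 10, size=200)

# Only this "lightweight linear model" is trained; the backbone stays frozen.
probe = LogisticRegression(max_iter=1000)
probe.fit(train_features, train_labels)

print("linear-probe accuracy:", probe.score(test_features, test_labels))
```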
PUBLICATION RECORD
- Publication year: 2023
- Venue: International Conference on Machine Learning
- Publication date: 2023-02-10
- Fields of study: Computer Science
- Identifiers
- External record
- Source metadata: Semantic Scholar