Transformer protein language models are unsupervised structure learners
Roshan Rao, Joshua Meier, Tom Sercu, S. Ovchinnikov, Alexander Rives
Published 2020 in bioRxiv
ABSTRACT
Unsupervised contact prediction is central to uncovering physical, structural, and functional constraints for protein structure determination and design. For decades, the predominant approach has been to infer evolutionary constraints from a set of related sequences. In the past year, protein language models have emerged as a potential alternative, but performance has fallen short of state-of-the-art approaches in bioinformatics. In this paper we demonstrate that Transformer attention maps learn contacts from the unsupervised language modeling objective. We find that the highest-capacity models trained to date already outperform a state-of-the-art unsupervised contact prediction pipeline, suggesting these pipelines can be replaced with a single forward pass of an end-to-end model.
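The abstract above describes extracting residue–residue contacts from a Transformer's attention maps. A minimal sketch of that idea is shown below: the paper itself fits a small learned combination over attention heads, whereas this illustration simply averages the heads, then symmetrizes and applies the standard average product correction (APC) used in coevolution-based contact prediction. The function names and head-averaging choice are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def apc(scores: np.ndarray) -> np.ndarray:
    """Average product correction: subtract the expected background
    coupling (row mean * column mean / total) from each entry."""
    row = scores.sum(axis=0, keepdims=True)   # shape (1, L)
    col = scores.sum(axis=1, keepdims=True)   # shape (L, 1)
    return scores - (col * row) / scores.sum()

def attention_to_contacts(attn: np.ndarray) -> np.ndarray:
    """Turn attention maps into a contact-score matrix.

    attn: array of shape (num_heads, L, L) holding attention weights
    for one sequence of length L (a stand-in for the model's maps).
    """
    s = attn.mean(axis=0)      # collapse heads (the paper learns weights instead)
    s = (s + s.T) / 2.0        # contacts are symmetric, so symmetrize
    return apc(s)              # remove background coupling signal
```

For a symmetric input the APC term is itself symmetric, so the returned matrix can be read directly as pairwise contact scores, with the top-ranked off-diagonal entries taken as predicted contacts.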
PUBLICATION RECORD
- Publication year: 2020
- Venue: bioRxiv
- Publication date: 2020-12-15
- Fields of study: Biology, Computer Science
- Source metadata: Semantic Scholar