Recent breakthroughs in protein structure prediction have produced an unprecedented surge in high-quality 3D models, highlighting the need for efficient computational methods to manage and analyze this wealth of structural data. In this work, we comprehensively examine the structural clusters obtained from three sources: the AlphaFold Protein Structure Database (AFDB), a high-quality subset of ESMAtlas, and the Microbiome Immunity Project (MIP). We create a single cohesive low-dimensional representation of the resulting protein space. Our results show that, while each database occupies distinct regions within the protein structure space, they collectively exhibit significant overlap in their functional profiles. High-level biological functions tend to cluster in particular regions, revealing a shared functional landscape despite the diverse sources of data. By integrating data from diverse sources into this representation, localizing functional annotations within it, and providing an open-access web server for exploration, this work offers insights for future research on protein sequence-structure-function relationships and enables biological questions to be asked about taxonomic assignments, environmental factors, or functional specificity. The approach is generalizable to other or future datasets, enabling discovery beyond the findings presented here.
Large protein databases reveal structural complementarity and functional locality
P. Szczerbiak, Lukasz M. Szydlowski, Witold Wydmański, P. Renfrew, Julia Koehler Leman, Tomasz Kościółek
Published 2025 in bioRxiv
PUBLICATION RECORD
- Publication year: 2025
- Venue: bioRxiv
- Publication date: 2025-05-26
- Fields of study: Biology, Medicine, Computer Science
- Source metadata: Semantic Scholar, PubMed