Crystallization is pivotal in the chemical and pharmaceutical industries, affecting particle stability and drug release. Crystal size distribution (CSD), a critical attribute of the final dosage form, is determined by the molecular structure of the crystallizing entity. Due to molecular diversity, establishing a clear relationship between molecular structure and CSD is challenging. This study unveils CrystalFormer, a novel framework that bridges this gap. By utilizing machine‐learned molecular fingerprints derived from an encoder‐based transformer trained on a dataset of 1.8 billion molecules, CrystalFormer introduces a “universal chemical language” to represent molecules in a latent space. These fingerprints enable the prediction of thermodynamic and kinetic properties using neural networks and probabilistic regression models. Integrating these predictions with first‐principles models such as population balance equations allows the CSD to be determined with confidence bounds. The results show good prediction accuracy for thermodynamic and kinetic parameters, with errors below 8% for paracetamol and salicylic acid.
Predicting both thermodynamic and kinetic properties of crystallizing molecules via transformer‐based language model
Silabrata Pahari, C. Lee, Niranjan Sitapure, J. Kwon
Published 2025 in AIChE Journal
PUBLICATION RECORD
- Publication year
2025
- Venue
AIChE Journal
- Publication date
2025-07-10
- Source metadata
Semantic Scholar