Forecasting Host Cells for Recombinant Protein Expression

Hung Le

Published 2026 in Biochemistry & Molecular Biology

ABSTRACT

Selection of an appropriate host cell is a critical determinant of success in recombinant protein expression. In practice, host choice is still largely guided by individual experience, ad hoc consultation of the literature, and intuitive decision-making, often resulting in suboptimal expression outcomes and costly cycles of experimental trial and error. Despite several decades of accumulated empirical knowledge in the field, there is currently no systematic, evidence-based framework for forecasting host cell suitability from protein sequence and structural characteristics. The purpose of this study was to develop predictive models that enable rational selection of host cells for recombinant protein expression based on intrinsic protein features. To achieve this, we leveraged collective experimental experience embedded in publicly available structural data. Protein entries from the Protein Data Bank were curated and analyzed, and logistic regression approaches were applied to relate expression outcomes to a range of protein attributes, including structural parameters, stability indices, predicted subcellular localization, and post-translational modification requirements. Using these variables, we constructed and validated statistical models capable of forecasting expression preferences across four commonly used host systems: <i>Escherichia coli</i>, insect cells, mammalian cells, and yeast. Model performance was assessed using internal validation procedures, demonstrating that distinct combinations of protein features are associated with differential expression success among host types. In conclusion, this work provides an evidence-based and quantitative framework for predicting suitable host cells for recombinant protein expression. By translating accumulated empirical knowledge into practical predictive tools, the proposed models reduce reliance on subjective judgment and trial-and-error experimentation. To facilitate broad adoption, the models, together with user guidance, have been implemented in a publicly accessible web server, offering a practical resource to improve experimental efficiency and success rates in protein expression studies.

PUBLICATION RECORD

CITATION MAP

EXTRACTION MAP

CLAIMS

  • No claims are published for this paper.

CONCEPTS

  • No concepts are published for this paper.

REFERENCES

Showing 1-26 of 26 references · Page 1 of 1

CITED BY

  • No citing papers are available for this paper.

Showing 0-0 of 0 citing papers · Page 1 of 1