Habitable exoplanet - a statistical search for life

Prasenjit Banerjee, A. Chattopadhyay

Published 2025 in Frontiers in Astronomy and Space Sciences

ABSTRACT

The identification of habitable exoplanets is an important challenge in modern space science, requiring the combination of planetary and stellar parameters to assess conditions that support life. Using a dataset of 5867 exoplanets from the NASA Exoplanet Archive (as of April 3, 2025), we apply Random Forest and eXtreme Gradient Boosting (XGBoost) to classify planets as habitable or non-habitable based on 32 continuous parameters, including orbital semi-major axis, planetary radius, mass, density, and stellar properties. Habitability is defined through physics-based criteria rooted in the presence of liquid water, stable climates, and Earth-like characteristics, using seven key parameters: planetary radius, density, orbital eccentricity, mass, stellar effective temperature, luminosity, and orbital semi-major axis. To limit the effect of multicollinearity on the classification, we compute the Variance Inflation Factor (VIF) and select the parameters with VIF < 5: planetary orbital period, semi-major axis, density, eccentricity, and inclination; and stellar effective temperature, radius, mass, metallicity, age, density, and total proper motion. Although all seven defining parameters are used for labeling, only those with low VIF (orbital semi-major axis and eccentricity, planetary density, and stellar effective temperature) are retained for modeling, supplemented by the additional low-VIF parameters. Class imbalance is addressed using the Random Over-Sampling Examples (ROSE) technique, with both over- and under-sampling, to create a balanced dataset. The models achieve classification accuracies of 99.99% for Random Forest and 99.93% for XGBoost on the test set, with high sensitivity and specificity. We analyze the distributions of the key defining parameters, which are skewed, as is typical of exoplanet populations.
Parameter uncertainties are incorporated through Monte Carlo perturbations to assess prediction stability, showing minimal impact on overall accuracy but possible biases in borderline cases. We consider the intersection of the habitable exoplanets identified by the seven defining parameters and those identified by the twelve low-VIF parameters, which confirms consistent classification and makes the habitability assessment more reliable. Our findings highlight the potential of machine learning techniques to prioritize exoplanet targets for future observations, providing a fast and interpretable approach to habitability assessment.
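The VIF screening step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing parameter j on all the other parameters, and it assumes an iterative rule that drops the highest-VIF column until every remaining column is below the threshold of 5. The abstract states only the threshold, so the iterative rule, the function names vif and select_low_vif, and the synthetic columns a, b, c are all hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X, shape (n_samples, n_features).

    Assumes no column is constant (otherwise the total sum of squares is zero).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on the remaining columns plus an intercept.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

def select_low_vif(X, names, threshold=5.0):
    """Drop the highest-VIF column repeatedly until all VIFs are below threshold.

    The iterative rule is an assumption made for this sketch.
    """
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] < threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names

# Synthetic illustration only: column c is a near-copy of column a,
# so one of the two should be removed; b is independent and should stay.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.01 * rng.normal(size=500)
X_demo = np.column_stack([a, b, c])
kept = select_low_vif(X_demo, ["a", "b", "c"])
print(kept)
```

On real archive data the same functions would be applied to the table of continuous parameters after handling missing values, which this sketch does not address.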
