Machine Learning of Predictive Models on Unbalanced Data on Hazardous Asteroids

Abstract

Gorlatov D.V.

Incoming article date: 21.03.2023

A data set on asteroids potentially hazardous to the Earth is analyzed. Preliminary analysis and data processing are performed on the basis of descriptive statistics. The correlations between parameters are used to select the features on which the models are trained. Machine learning models then classify the asteroids in the database as hazardous or non-hazardous. Logistic regression, k-nearest neighbors, decision tree, and other methods are used. The best method is selected by cross-validation, and its optimal hyperparameters are then determined. The quality of the classifier is evaluated by recall and its standard deviation, as well as by the confusion matrix and the mean absolute percentage error (MAPE). The results of the analysis and modeling in Python are presented, demonstrating the high predictive accuracy of the resulting model.

Keywords: machine learning, predictive model, data analysis, imbalanced data, logistic regression, k-nearest neighbors, decision tree, random forest, support vector machine, cross-validation
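
The workflow summarized in the abstract (compare several classifiers by cross-validated recall, tune the best one, then inspect the confusion matrix) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the article's actual code: scikit-learn is assumed as the library, the asteroid table is replaced by a synthetic imbalanced data set from `make_classification`, and the candidate settings, the `class_weight="balanced"` option, and the hyperparameter grids are hypothetical choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score, train_test_split)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in for the asteroid data, about 10% "hazardous".
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Stratified folds keep the rare class represented in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Hypothetical candidate models; distance- and gradient-based ones are scaled.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000)),
    "k_nearest_neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier()),
    "decision_tree": DecisionTreeClassifier(
        class_weight="balanced", random_state=0),
}

# Hypothetical search grids, one per candidate.
param_grids = {
    "logistic_regression": {"logisticregression__C": [0.1, 1.0, 10.0]},
    "k_nearest_neighbors": {"kneighborsclassifier__n_neighbors": [3, 5, 11]},
    "decision_tree": {"max_depth": [3, 5, None]},
}

# Step 1: mean and standard deviation of recall across folds for each model.
scores = {}
for name, model in candidates.items():
    recalls = cross_val_score(model, X_train, y_train, cv=cv, scoring="recall")
    scores[name] = (recalls.mean(), recalls.std())
    print(f"{name}: recall = {recalls.mean():.3f} +/- {recalls.std():.3f}")

# Step 2: tune the model with the highest mean recall.
best_name = max(scores, key=lambda n: scores[n][0])
search = GridSearchCV(candidates[best_name], param_grids[best_name],
                      cv=cv, scoring="recall")
search.fit(X_train, y_train)

# Step 3: evaluate the tuned model on the held-out test set.
y_pred = search.predict(X_test)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted
test_recall = recall_score(y_test, y_pred)
print("best model:", best_name, search.best_params_)
print("confusion matrix:\n", cm)
print(f"test recall: {test_recall:.3f}")
```

Recall is used as the selection metric here because, on imbalanced data, plain accuracy can look high even when most hazardous objects are missed; the standard deviation across folds indicates how stable that recall is.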