This article addresses the problem of feature selection when training machine learning (ML) models to identify fake (phishing) websites. As a solution, a set of key metrics is proposed: efficiency, reliability, fault tolerance, and retrieval speed. Efficiency measures the impact of a feature on prediction accuracy. Reliability measures how well a feature distinguishes phishing websites from legitimate ones. The fault tolerance score measures the empirical probability that a feature is valid and fulfilled. Retrieval speed is the feature extraction time expressed on a logarithmic scale. This approach allows features to be ranked into categories and then selected for training machine learning models, depending on the specific domain and constraints. In this article, 82 features were measured, and six fully connected neural networks were trained to evaluate the effectiveness of the metrics. Experiments have shown that the proposed approach can increase model accuracy by 1-3% and precision by 0.03, and can significantly reduce the overall extraction time, thereby improving the response rate.
Keywords: feature evaluation method, machine learning model, identification of phishing websites, metric, efficiency, reliability, fault tolerance, retrieval speed