
Abstract

Accurately predicting continuous values remains a significant challenge, especially when the data are imbalanced and an appropriate model must be chosen. Regression techniques are widely used for this purpose in data mining and machine learning. However, traditional algorithms struggle to achieve high accuracy because they handle complex data and imbalanced distributions poorly. This study addresses these gaps by proposing a new framework that evaluates multiple regression models on the Boston House Pricing Dataset (BHD). The examined models are simple linear, multiple linear, polynomial, Lasso, Ridge, Random Forest, Keras, and Gradient Boosting regression. The models are compared using the R-squared score (R2), Mean Squared Error (MSE), and Mean Absolute Error (MAE). Initial results show that the Random Forest and Ridge regressors achieve high R2 scores of 89.9 and 88.3, respectively. The Gradient Boosting model gives the best result, with an R2 of 92, an MSE of 0.72, and an MAE of 2.00. To further improve the accuracy of the best model, this research applies two techniques: re-sampling and hyper-parameter optimization with RandomizedSearchCV. Together, these improve the R2 score to 93.2, with an MSE of 0.015 and an MAE of 0.82. These findings demonstrate a significant improvement in model performance and suggest potential for practical application in real-world scenarios.
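
As a minimal illustration of the three evaluation metrics named in the abstract, the sketch below computes MSE, MAE, and R2 from their standard definitions in plain Python. The function names and the toy values are hypothetical and are not taken from the study. R2 is returned as a fraction here; reading the reported scores as percentages is an assumption.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Squared Error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only (not data from the study).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

print(f"MSE = {mse(y_true, y_pred):.3f}")
print(f"MAE = {mae(y_true, y_pred):.3f}")
print(f"R2  = {r2(y_true, y_pred):.3f}")
```

Lower MSE and MAE and a higher R2 indicate a better fit, which is the basis on which the models above are ranked.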
