Predicting Investment Risk Levels Using XGBoost Algorithm
In this project, I predicted investment risk levels using the XGBoost algorithm. A distinguishing choice was keeping outliers in the dataset and relying on XGBoost's robustness to handle them rather than removing them. This post walks through the dataset, the model-building process, and the evaluation of the predictive model.
Dataset Overview
The dataset contains 14 financial and economic features relevant to assessing investment risk:
- Capital adequacy ratio (%)
- GDP per capita (USD)
- Gross External Debt (% of GDP)
- Growth of consumer price (%)
- Growth of population (%)
- Growth of Real GDP (%)
- Growth of Real GDP per capita (%)
- Loan-deposit ratio (%)
- Net External Debt (% of GDP)
- Nominal GDP (USD bn)
- Non-performing loans (% of gross loans)
- Percentage of gross domestic investment to GDP (%)
- Percentage of gross domestic saving to GDP (%)
- Unemployment rate (% labour force)
The target variable, "Risk Level," indicates whether a country's investment risk is low or high. These features, together with the target, were used to train the XGBoost model to predict the investment risk level.
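As a minimal sketch of how such a dataset might be prepared, the snippet below builds a tiny illustrative frame and encodes the binary target. The row values and the "Low"/"High" label spellings are assumptions, not values from the actual dataset:

```python
import pandas as pd

# Two illustrative rows; column names follow the feature list above,
# but the values and label spellings are invented for demonstration
df = pd.DataFrame({
    "Nominal GDP (USD bn)": [450.2, 21.4],
    "Unemployment rate (% labour force)": [5.1, 12.8],
    "Risk Level": ["Low", "High"],
})

# Encode the binary target as 0 = low risk, 1 = high risk for the classifier
df["Risk Level"] = df["Risk Level"].map({"Low": 0, "High": 1})
```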
Addressing Missing Values and Outliers
To handle missing values, I used the MICE (Multiple Imputation by Chained Equations) method. MICE imputes each incomplete feature conditionally on the others, iterating until the estimates stabilize, so the relationships between variables are preserved and imputation does not distort the underlying data patterns. This produced a complete, more reliable dataset for training the XGBoost model without requiring the removal of outliers.
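One common way to apply a MICE-style imputation in Python is scikit-learn's `IterativeImputer`, which models each feature with missing values as a function of the others. The toy frame below (two of the dataset's features with invented values and gaps) is an assumption for illustration:

```python
import numpy as np
import pandas as pd
# IterativeImputer is still experimental; this import enables it
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical toy frame: two features from the dataset with missing entries
df = pd.DataFrame({
    "gdp_per_capita_usd": [1200.0, np.nan, 45000.0, 8000.0, np.nan],
    "unemployment_rate_pct": [7.5, 4.2, np.nan, 9.1, 5.0],
})

# Each incomplete feature is regressed on the others and filled in,
# iterating until the imputations stabilize (the chained-equations idea)
imputer = IterativeImputer(max_iter=10, random_state=42)
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
```

Observed values pass through unchanged; only the gaps are filled, which is what keeps the inter-variable relationships intact.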
Data Scaling for Enhanced Robustness
To further reduce sensitivity to extreme values, I applied Robust Scaling to the features. This technique centers each feature on its median and scales it by the interquartile range (IQR), so extreme values influence the transform far less than with mean/standard-deviation scaling, and the outlier information itself is preserved rather than discarded.
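The effect of median/IQR scaling can be seen with scikit-learn's `RobustScaler` on a single invented column containing one extreme value:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Toy column with one extreme outlier (values are illustrative)
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# RobustScaler subtracts the median (3.0) and divides by the IQR
# (75th - 25th percentile = 4.0 - 2.0 = 2.0), so the single outlier
# does not dominate the centering or the scale
scaler = RobustScaler()
x_scaled = scaler.fit_transform(x)
```

Here the median value maps to 0.0 and the typical values land near the unit range, while a mean/std scaler would have been pulled heavily toward the outlier.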
Model Building with XGBoost
In the modeling phase, I used XGBoost to build the predictive model. Training on a training split and evaluating on a held-out testing split ensured that the reported performance reflects generalization to unseen data rather than memorization. XGBoost's gradient-boosted trees captured the nonlinear patterns and interactions in the features, leading to accurate predictions of investment risk levels.
Model Evaluation and Conclusion
Upon evaluation, the XGBoost model achieved an accuracy of 100%, correctly classifying every sample in both classes, with correspondingly perfect sensitivity, specificity, and precision. However, given the class imbalance in the dataset, further testing on larger datasets is warranted to gauge the model's generalizability.
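The caution about imbalance can be made concrete: on a small, skewed test set, headline accuracy alone says little, which is why per-class metrics matter. The labels below are invented to illustrate the point, not the project's actual predictions:

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical imbalanced test set: 8 low-risk, 2 high-risk samples
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])  # perfect predictions

perfect_acc = accuracy_score(y_true, y_pred)

# A naive model that always predicts "low risk" still scores 80% here,
# so per-class precision/recall are more informative than accuracy alone
y_naive = np.zeros_like(y_true)
naive_acc = accuracy_score(y_true, y_naive)

print(classification_report(y_true, y_pred, target_names=["low", "high"]))
```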
In conclusion, this project demonstrates the effectiveness of XGBoost for predicting investment risk levels while handling outliers robustly. For detailed insights and access to the dataset, refer to the following links: