Volume 68 | Issue 2 | Year 2022 | Article Id. IJMTT-V68I2P504 | DOI : https://doi.org/10.14445/22315373/IJMTT-V68I2P504
The presence of insignificant predictors in a model causes estimation bias and reduces prediction precision. Collinearity among predictors is a common problem that renders the design matrix unstable, leading to unreliable OLS coefficient estimates. Non-regularized multiple linear regression is unsatisfactory for prediction because including all variables reduces bias but increases variance; for interpretation, it is also necessary to identify the important predictors that strongly influence the response variable. This study implements the Bayesian Stochastic Search Variable Selection (B-SSVS) algorithm for multiple linear regression, incorporating a correlation factor in the prior specification to address correlation among predictors, which degrades the performance of the Markov chain Monte Carlo (MCMC) Gibbs sampling process. Variable selection performance is then compared with the classical penalized methods Elastic Net and Least Absolute Shrinkage and Selection Operator (Lasso) using simulated data. B-SSVS with a correlation factor prior showed good performance, mixing, and convergence properties based on the diagnostic tests, and it performed better in variable selection than the Elastic Net and Lasso shrinkage methods. Elastic Net outperformed Lasso in detecting the true predictors and had a lower cross-validation mean squared error.
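To make the stochastic search step concrete, the following is a minimal sketch of the conditional inclusion probability that a Gibbs sampler evaluates for one coefficient under the standard two-component normal mixture prior of George and McCulloch (1993). It is not the correlation factor prior studied in this article. The function name and the default values of `tau`, `c`, and `prior_p` are illustrative assumptions, and the indicators are assumed to be independent a priori.

```python
import math


def inclusion_probability(beta_j, tau=0.1, c=10.0, prior_p=0.5):
    """Return P(gamma_j = 1 | beta_j) for a spike-and-slab normal mixture.

    Assumed prior (illustrative values, not taken from the article):
      spike: beta_j ~ N(0, tau^2)       when gamma_j = 0
      slab:  beta_j ~ N(0, (c*tau)^2)   when gamma_j = 1
      gamma_j ~ Bernoulli(prior_p), independent across j
    """
    # Log posterior odds of slab versus spike; working on the log scale
    # avoids 0/0 when both normal densities underflow for large |beta_j|.
    log_odds = (
        math.log(prior_p / (1.0 - prior_p))
        - math.log(c)
        + 0.5 * (beta_j / tau) ** 2 * (1.0 - 1.0 / c ** 2)
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    for b in (0.0, 0.2, 1.0):
        print(b, inclusion_probability(b))
```

With these assumed defaults, a coefficient at exactly zero gets an inclusion probability of 1/11 (about 0.09), and the probability rises toward 1 as the coefficient moves away from zero. In a full sampler, each indicator would be redrawn from a Bernoulli distribution with this probability at every iteration.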
Christabel Nyanchama Bisonga, Oscar Owino Ngesa, Martine Odhiambo Oleche, "A Comparative Study of Bayesian Stochastic Search Variable Selection Approach in Multiple Linear Regression," International Journal of Mathematics Trends and Technology (IJMTT), vol. 68, no. 2, pp. 19-27, 2022. Crossref, https://doi.org/10.14445/22315373/IJMTT-V68I2P504