Analysis and data processing systems

Print ISSN: 2782-2001          Online ISSN: 2782-215X

Automatic selection of leading indicators for regional labor market forecasting

Issue No 4 (77) October - December 2019
Authors:

Timofeeva Anastasia Yu.
Abstract

Constructing forecast models based on leading indicators requires selecting the most informative variables from a set of candidate predictors. This problem can be solved by embedded methods, such as LASSO regression, or by filter methods, such as correlation-based feature selection. This paper compares the performance of these methods with alternative approaches to time series analysis (ARIMA, the Holt-Winters model, and exponential smoothing). To this end, an algorithm for constructing forecast models with automatic selection of leading indicators is proposed. For the empirical study, candidate predictors suitable for regional labor market forecasting were selected from official statistics. They describe quantities such as the money supply, the balance sheet structure of credit institutions, and a price index. Pseudo-out-of-sample forecasts of several indicators characterizing the registered labor market of the Novosibirsk Region were computed for the period from 2015 to 2018. Direct multi-step forecasts were computed for horizons of 6 months. A stabilized modification of the LASSO, the percentile-lasso, gave no advantage in terms of mean absolute forecast error. In most cases, the best results were obtained with LASSO regression in which the regularization parameter was chosen by the one-standard-error rule based on block cross-validation with 10 randomly selected blocks. Automatic selection of leading indicators reduced forecasting errors compared with the alternative methods. Thus, the proposed algorithm is appropriate for regional labor market forecasting.
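
The ingredients named in the abstract (direct multi-step forecasting, LASSO regression, and the one-standard-error rule over block cross-validation) can be illustrated with a minimal sketch. This is not the author's implementation. The function names `direct_design`, `lasso_fit`, and `one_se_lambda` are hypothetical, the squared-error cross-validation loss is an assumption, and forming the 10 blocks as contiguous segments with random cut points is only one possible reading of "blocks selected at random".

```python
import random
import statistics

def direct_design(predictors, target, h):
    """Direct h-step setup: pair predictors at time t with the target at t + h."""
    return predictors[:len(predictors) - h], target[h:]

def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_fit(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - Xb for the starting point b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta

def one_se_lambda(X, y, lambdas, k=10, seed=0):
    """Largest lambda whose CV error is within one standard error of the minimum."""
    n = len(y)
    rng = random.Random(seed)
    # Assumption: k contiguous blocks separated by k - 1 random cut points.
    cuts = [0] + sorted(rng.sample(range(1, n), k - 1)) + [n]
    blocks = [range(cuts[b], cuts[b + 1]) for b in range(k)]
    means, ses = [], []
    for lam in lambdas:
        fold_err = []
        for block in blocks:
            held = set(block)
            train = [i for i in range(n) if i not in held]
            beta = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
            errs = [(y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in block]
            fold_err.append(sum(errs) / len(errs))
        means.append(statistics.mean(fold_err))
        ses.append(statistics.stdev(fold_err) / k ** 0.5)
    best = min(range(len(lambdas)), key=means.__getitem__)
    threshold = means[best] + ses[best]
    return max(lam for lam, m in zip(lambdas, means) if m <= threshold)
```

In this sketch, predictors whose fitted coefficient is exactly zero at the chosen penalty are the ones dropped. The one-standard-error rule deliberately prefers a larger penalty than the cross-validation minimum, which yields a sparser set of leading indicators.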


Keywords: variable selection, LASSO regression, percentile-lasso, correlation-based feature selection, leading indicators, labor market, forecasting, ARIMA, Holt-Winters model, STL algorithm

Acknowledgements. Funding

The research was carried out with the financial support of the RFBR / RGNF, grant no. 17-32-01087 A2.

For citation:

Timofeeva A.Yu. Avtomaticheskii podbor operezhayushchikh indikatorov dlya prognozirovaniya sostoyaniya regional'nogo rynka truda [Automatic selection of leading indicators for regional labor market forecasting]. Nauchnyi vestnik Novosibirskogo gosudarstvennogo tekhnicheskogo universiteta = Science Bulletin of the Novosibirsk State Technical University, 2019, no. 4 (77), pp. 85–98. DOI: 10.17212/1814-1196-2019-4-85-98.