Abstract
The paper studies the problem of recovering a regression dependence by the least-squares support vector machine (LS-SVM) method, which uses a quadratic loss function and belongs to the class of kernel techniques. To set a number of internal parameters of the LS-SVM algorithm, the problem of obtaining a test sample is discussed. Various model selection criteria based on partitioning the sample into training and test parts are presented. For linear parametric regression models, partitioning the sample into training and test parts by the D-optimal experiment planning method is considered in detail. It is proposed to use this method of obtaining test samples for the LS-SVM method as well. A sequential algorithm for selecting the training and test parts of the sample of observations, as applied to the LS-SVM method, is presented. To verify the efficiency of the proposed partitioning method, a computational experiment was conducted. The accuracy of the LS-SVM solution was improved by selecting the scale of the Gaussian kernel function; this kernel parameter was chosen by minimizing the prediction error on the test part of the sample. The accuracy of the final solution was assessed by the mean square error. The computational experiment was carried out on simulated data: the data-generating model was a nonlinear dependence on the input factor, and the noise variance (noise level) was specified as a percentage of the signal power. Two ways of partitioning the sample into training and test parts were compared, namely random partitioning and partitioning by the D-optimal experiment planning method. The cross-validation criterion was also used to select the LS-SVM algorithm parameters. The results of the computational experiments are given in tables and figures.
Based on the results of the computational experiments, it is concluded that the results obtained with a randomly selected test sample are unstable and depend strongly on the particular partitioning, whereas the results obtained with a test sample formed by D-optimal partitioning are much more stable.
Keywords: regression, LS-SVM method, quadratic loss function, test sample, training sample, optimal experiment planning, D-optimal plan, regularity criterion, cross-validation criterion, regularization coefficient, kernel function, mean square error