Analysis of Education Systems Performance in European Countries by Means of PCA-DEA

Many countries have focused on the improvement of education system performance. Small number of studies consider system of a country as unit of assessment where indicators represent all levels of education system. In the paper, we propose the methodology for the performance analysis of education systems as a whole hybridizing Data Envelopment Analysis and Principal Component Analysis. Its applicability is illustrated by the analysis of the data collected for 29 European countries. In the analysis we used publicly available data from EUROSTAT and OECD which European Commission uses for the performance monitoring of education in European Union. No prior assumptions were made or expert judgements included. We demonstrated good performance of the method on limited data set. The proposed methodology of hybrid Data Envelopment Analysis and Principal Component Analysis allows researchers analyse education systems quantitatively. The recommendations for improvements and assessment of real world education systems should be based on the analysis of a sufficiently large data set comprehensively representing the considered education systems.


Introduction
Performance assessment of education systems plays a vital role worldwide. The European Commission monitors the performance of education systems in Member States according the strategic framework "Education and Training 2020 1 " (ET2020). However, the importance of performance of education systems are still underestimated. The evaluation of schools is a widely researched topic, but limited number of papers analyse the education sector as a whole (not only school level analysis). Also there are not so many papers whose focus on multi-country comparisons (Witte and López-Torres, 2017). This topic is especially urgent because of structural reforms of education systems in the new EU countries. Enlargement of the EU and accession of new member states in [2004][2005][2006][2007][2008][2009][2010][2011][2012][2013] was the beginning of the development of education systems in the new EU countries along the similar lines like in the Western countries. Post-communist countries in Europe after the structural reforms do not composed one type of education system and moved in different directions (Beblavy et al., 2011(Beblavy et al., , Želvys et al., 2017. Taking into account that, there is no one way to improve the education systems in all EU countries, it is important to estimate the performance of each country and to provide guidelines that particular country should follow to improve her performance of education system. As pointed out (Silva and Camanho, 2017) measuring performance in absolute terms is often less valuable than making comparisons with other countries, and provide examples of good education practices that under-performance countries should follow to improve their performance of education systems. Data Envelopment Analysis (DEA) have been proved as appropriate for the analysis of different sectors of the education systems. It is natural to investigate the applicability of DEA methodology to the analysis of an education system of a country as a whole.
DEA proposed by Charnes, Rhodes and Cooper (Charnes et al., 1979) is a widely used technique to analyse relative performance of systems in a large variety of fields. The original DEA model allows total exibility of the weights, i.e. each decision making unit (DMU) maximizes its efficiency score, given the inputs consumed and the outputs attained. The exibility of the weights' selection is strength and weakness of a DEA, as it allows some indicators to be assigned a zero weight. Due to full exibility, many DMUs will be able to achieve the maximum DEA efficiency score (Liu et al., 2006). DEA loses discriminatory power when the number of indicators increase compared with the number of the DMUs. Also the large dimension of the data set and correlations among indicators reduce the discrimination power of DEA and introduces bias (Nunamaker, 1985, Dyson et al., 2001. The first attempt to restrict the exibility of the weights and improve the discriminatory power of the DEA model was made by Thompson et al. (Thompson et al., 1990). They improved the discrimination between the DMUs' efficiency scores by defining ranges of acceptable weights. Assurance regions and Cone-ratio constraints are often used to restrict the weights and improve the discrimination power of the DEA. These techniques requires prior information, which is often difficult to attain (Adler and Golany, 2001). For the increase of discriminatory power of DEA, Torgersen et al. proposed slack-adjusted efficiency measure for the ranking of efficient units (Torgersen et al., 1996). The two-stage method did not use any prior information.
Other way dealing with discrimination issue, the dimension reduction of indicators before running DEA. Principal components analysis (PCA) could be employed where original number of indicators would be replaced by smaller number of principal components with a minimal loss of information (Adler and Golany, 2001). The idea to combine PCA and DEA was developed independently by Ueda and Hoshiai (Ueda and Hoshiai, 1997) and Golany, 2001, Adler andGolany, 2002). They showed that PCA can improve discriminatory power in DEA, which often fails when 247 there are too many of indicators in relation to the number of DMUs, and give more reliable efficiency measurement in small samples.
For the selection of an appropriate DEA method, we perform a preprocessing of the available data. Since the number of indicators selected is relatively large with respect to the number of countries analysed, involvement of PCA seems rational. The analysis of the Pareto frontier of the data set encouraged the application of convexity assumptions based methods. Since the analysis is oriented to aid structural reforms, the proportional improvement of indicators is not a concern. Thus PCA and the Additive Model based DEA were hybridised and investigated.
The objective of the study is to analyse performance of education systems by means of hybrid PCA and DEA approach. We quantitatively analyse performance of education systems in 29 European countries. For the analysis we use publicly available data for year 2013, 2014 and 2015. No prior information or expert judgement were used in the analysis.
The rest of the paper is organised as follows: we present the data analysed first, next we describe methodology of PCA-DEA, then we provide numerical example and discussion with conclusions finalise the paper.

Available Data
As described above, European Commission has The Education and Training Monitor initiative for monitoring and fostering performance of education systems in EU. Seven key indicators are selected for the monitoring and benchmarks reached by 2020 are set (ET2020 2 ). In the analysis we use 6 out of 7 key indicators, as learning mobility indicator still waits for appropriate compilation of cross-national data (Flisi et al., 2014). We added two additional indicators for the reflection of a wider range of learning activities (the higher-achievement in reading, maths, science and the minimum necessary qualifications to actively participate in social and economic life). In the further text  detones the th indicator for th country. The titles of variables and detailed description are the following:  • • ·1 : Early leavers from education and training (the percentage of the population aged 18-24 with at most lower secondary education and who were not in further education or training during the last four weeks preceding the survey);  • • ·2 : Tertiary education attainment (the share of the population aged 30-34 years who have successfully completed tertiary-level education);  • • ·3 : Early childhood education and care (the share of the population aged 4 to the age when the compulsory education starts who are participating in early education);  • • ·4 : Employment rate of recent graduates (the share of employed graduates (20-34 years) having left education and training 1-3 years before the reference year);  • • · 5 : Adult participation in lifelong learning (participation rate of adults (25-64 years) in education and training in last 4 weeks);  • • ·6 : Low achievement in reading, maths and science (the percentage of the PISA (The Programme for International Student Assessment) average score in reading, mathematics and science below Level 2);  • • ·7 : Top achievement in reading, maths and science (the percentage of the PISA average score in reading, mathematics and science at Level 5 or 6);  • • ·8 : Upper secondary or tertiary education attainment (the percentage of people at aged 25-64 who have successfully completed at least upper secondary education).
All variables were extracted from EUROSTAT and OECD (The Organisation for Economic Co-operation and Development) databases for 29 European countries over 2013-2015 year. European Commission monitors performance of 28 European countries (EU-28), we used 26 out of 28 countries, because Cyprus and Malta have insufficiently data set, also we added three European countries (Iceland, Norway and Switzerland), that are not in European Union. Missing data were replaced with the most recent year available. The indicators were adjusted as the profit type the larger the better, so that higher values of all indicators correspond to better performance. The values of indicators  ·1 and  ·6 were converted using the complement to 100 percent. The data used in the study is provided in Appendix Tables 7, 8 and 9.
Education performance can be analysed as the degree to which an education system achieve desired goals and effects. In the context of education systems in European countries, goals and effects might be represented in terms of education systems achievement according to the framework ET2020, an education system that contributes to greater levels of these achievement is considered more effective than another education system.
From Table 1 we see that on average all variables except  ·6 and  ·7 indicate better performance of European countries in 2015 compare to 2013. An average the target of  ·1 is already reached in 2014 however at country level there are some countries those have not reached the target. The performance of education systems should be improved with respect to  ·2 - ·6 variables in almost all countries. The correlation analysis for selected variables revealed that some of variables are medium to high correlated (see Appendix Table 10) which indicates that PCA application is plausible. Next we describe hybrid PCA and DEA approach which was employed to quantify the performance of each country in the analysis. The selected indicators will serve as output indicators only in DEA. Table 1 Descriptive statistics of analysed indicators for 29 European countries with ET2020 targets

Analysis of Data by Means of PCA-DEA
Because of different historical obstacles, economical and political conditions, the achievements of different countries with respect of the considered criteria are quite different. The comparison of these achievements is a problem of multicriteria evaluation.
In the present section we will analyse data about education systems by means of DEA. For the convenience of references, here the terminology of DEA will be used. Let , are criteria representing the efficiency of -th DMU. These criteria represent the performance of the education system of the corresponding country. The values of  for the considered years are given in Appendix Tables 7, 8 and 9. As explained in the previous section performance is an increasing function of all criteria. The crudest evaluation is the partition of the set , •= 1•••••29, into subsets of dominating (Pareto optimal) and dominated vectors. We have applied a standard algorithm to find Pareto optimal vectors for the data analysed. The Pareto optimal vectors are defined by the shorthand of names of the corresponding countries in Table 2 for each year.
The obtained results show that only a third of DMUs are not Pareto-optimal. In the case when Pareto optimal points belong to the convex hull of the data set, the application of a standard DEA technique, e.g. CCR (Cooper et al., 2006), defines the unit efficiency of all Pareto optimal DMUs. In our case, this happens as shown by the results of a numerical experiment. The results of the experiment, although itself not very interesting, encourages the application of methods based on convexity assumptions. Therefore, the convex hull based frontier can be preferred against the free disposal hull frontier. For the discussion on potential disadvantages of the weighted sum aggregation of criteria in the case of non-convex feasible objective region we refer to (Pardalos et al., 2017).
For the discrimination of the Pareto optimal DMUs additional assumptions should be made with respect to the involved criteria. Several DEA methods have been proposed where the different treatment of criteria is implied implicitly by formulating restrictions for weights in the DEA models. This idea is implemented in the assurance region (Thompson et al., 1986) and cone-ratio methods (Charnes et al., 1990). However, in the considered problem we do not have rational argument to substantiate a magnitude of restrictions.
An alternative option is to aggregate criteria potentially reducing the scatter of data in the criteria space. Principal component analysis is widely used for reducing dimensionality of data in various applications. The hybridization of PCA with DEA has been proposed quite recently (Ueda and Hoshiai, 1997), and yet thoroughly investigated and applied (Adler and Golany, 2002, Adler and Berechman, 2001, Adler and Golany, 2001, Adler and Yazhemsky, 2010, Azadeh et al., 2007, Põldaru et al., 2014, Chen et al., 2016, Annapoorni and Prakash, 2017, Jothimani et al., 2017. This hybrid method seems attractive since a priori does not require any assumptions about relations between the considered criteria. We have applied an algorithm of PCA to the data in the three considered years. Since the first four components explain about 90% of variance (see Table 3) they are considered as representing the data sufficiently well. As we see in Fig. 1 according to the first and the second principal coordinates all Pareto optimal countries (marked as diamond) except Croatia come into one group.

251
Further we apply DEA for the projection of the original data to the subspace defined by four eigenvectors of the covariance matrix (computed by means of a PCA algorithm). For the details we refer to (Adler and Golany, 2001) where an excellent description of the algorithm is presented. The four dimensional projections of data are denoted by •, •= 1•••••29. The attention should be paid to the unfavorable results of PCA where some elements of the transformed data sets are negative. Recall that the data with negative elements is not appropriate in the context of the original formulation of the DEA problems. However, alternative DEA models have been developed which are translation invariant (Banker et al., 1984, Ali andSeiford, 1990). We will use the so called Additive Model which maintains translation invariance in the analysis of entirely output data (Ali and Seiford, 1990). Let us note, that the Additive Model is also used in (Adler and Golany, 2001).
Since we consider entirely the output data, the Additive Model is defined as the following problem of linear programming ets are negative. Recall that the data with negative elements is not iate in the context of the original formulation of the DEA problems. er, alternative DEA models have been developed which are translation nt [Banker et al., 1984, Ali andSeiford, 1990]. We will use the so called ve Model which maintains translation invariance in the analysis of entirely data [Ali and Seiford, 1990]. Let us note, that the Additive Model is also [Adler and Golany, 2001]. ce we consider entirely the output data, the Additive Model is defined as lowing problem of linear programming e solution of (1) z i is equal to the L 1 (city block) distance from the vector he efficiency frontier, i.e. to the Pareto optimal segment of the convex p of the data vectors. Thus, the equality z i = 0 is valid for the efficient . Correspondingly, the non-zero distance is a measure of inefficiency. The of z i for all DMUs are summarized in Table 4 and together with slacks h of four constraints of the linear programming problem (1) are provided endix Tables 11, 12, 13. e distance fromỸ i to the efficiency frontier can be used as a criterion for g the inefficient DMUs. The non-zero slacks show the potential improve- The solution of (1)  is equal to the 1 (city block) distance from the vector   to the efficiency frontier, i.e. to the Pareto optimal segment of the convex envelop of the data vectors. Thus, the equality  = 0 is valid for the efficient DMUs. Correspondingly, the non-zero distance is a measure of inefficiency. The values of  for all DMUs are summarized in Table 4 and together with slacks for each of four constraints of the linear programming problem (1) are provided in Appendix Tables 11, 12, 13.
The distance from • to the efficiency frontier can be used as a criterion for ranking the inefficient DMUs. The non-zero slacks show the potential improvement quantities along directions of principal components. However, these potential efficiency improvements does not have proper interpretation. To obtain estimates of potential efficiency improvements with respect of original criteria the slacks can be expressed in terms of original data using loadings of principal components. However, this problem has no unambiguous solution. For example, the PCA coordinates of vectors in the original space are obtained by linear projecting where a "shadow effect" can emerge. The mutual distances of images can be better preserved using non-linear projecting methods, e.g. Multidimensional Scaling (MDS) (Borg andGroenen, 1997, Žilinskas andŽilinskas, 2006). The attractive idea of hybridization of DEA and MDS, however is very new and still not mature.
Therefore we have applied an alternative option to express desired improvements for non-efficient DMUs. Let data sets are negative. Recall that the data with negative elements is not apprpriate in the context of the original formulation of the DEA problems. However, alternative DEA models have been developed which are translation invariant [Banker et al., 1984, Ali andSeiford, 1990]. We will use the so called Additive Model which maintains translation invariance in the analysis of entirely output data [Ali and Seiford, 1990]. Let us note, that the Additive Model is also used in [Adler and Golany, 2001].
Since we consider entirely the output data, the Additive Model is defined as the following problem of linear programming The solution of (1) z i is equal to the L 1 (city block) distance from the vector Y i to the efficiency frontier, i.e. to the Pareto optimal segment of the convex envelop of the data vectors. Thus, the equality z i = 0 is valid for the efficient DMUs. Correspondingly, the non-zero distance is a measure of inefficiency. The values of z i for all DMUs are summarized in Table 4 and together with slacks for each of four constraints of the linear programming problem (1) are provided in Appendix Tables 11, 12, 13.
The distance fromỸ i to the efficiency frontier can be used as a criterion for ranking the inefficient DMUs. The non-zero slacks show the potential improvement quantities along directions of principal components. However, these potential efficiency improvements does not have proper interpretation. To obtain estimates of potential efficiency improvements with respect of original criteria the slacks can be expressed in terms of original data using loadings of principal components. However, this problem has no unambiguous solution. For example, the PCA coordinates of vectors in the original space are obtained by linear projecting where a "shadow effect" can emerge. The mutual distances of of images can be better preserved using non-linear projecting methods, e.g. Multidimensional Scaling (MDS) [Borg andGroenen, 1997,Žilinskas andŽilinskas, 2006]. The attractive idea of hybridization of DEA and MDS, however is very new and still not mature.
Therefore we have applied an alternative option to express desired improvements for non-efficient DMUs. Let Λ i denote the optimal vector of weights corresponding to the solution of linear programming problem (1) withỸ i as left hand side of the constraints where z i > 0. Since the i-th DMU is inefficient its improvement seems desirable. A reasonable target to achieve can be formulated as where λ i j , j = 1, . . . , 29, are components of Λ i .  denote the optimal vector of weights corresponding to the solution of linear programming problem (1) with • as left hand side of the constraints where  •0. Since the -th DMU is inefficient its improvement seems desirable. A reasonable target to achieve can be formulated as (2) Ỹ 1 ,Ỹ 2 , . . . ,Ỹ 29 ), e = (1, . . . , 1) T . tion of (1) z i is equal to the L 1 (city block) distance from the vector ciency frontier, i.e. to the Pareto optimal segment of the convex he data vectors. Thus, the equality z i = 0 is valid for the efficient respondingly, the non-zero distance is a measure of inefficiency. The for all DMUs are summarized in Table 4 and together with slacks our constraints of the linear programming problem (1) are provided Tables 11, 12, 13. ance fromỸ i to the efficiency frontier can be used as a criterion for inefficient DMUs. The non-zero slacks show the potential improveties along directions of principal components. However, these poency improvements does not have proper interpretation. To obtain potential efficiency improvements with respect of original criteria n be expressed in terms of original data using loadings of principal . However, this problem has no unambiguous solution. For example, rdinates of vectors in the original space are obtained by linear proe a "shadow effect" can emerge. The mutual distances of of images r preserved using non-linear projecting methods, e.g. Multidimeng (MDS) [Borg andGroenen, 1997,Žilinskas andŽilinskas, 2006]. ve idea of hybridization of DEA and MDS, however is very new and ure. e we have applied an alternative option to express desired improveon-efficient DMUs. Let Λ i denote the optimal vector of weights g to the solution of linear programming problem (1) withỸ i as left the constraints where z i > 0. Since the i-th DMU is inefficient its t seems desirable. A reasonable target to achieve can be formulated (1) z i is equal to the L 1 (city block) distance from the vector frontier, i.e. to the Pareto optimal segment of the convex vectors. Thus, the equality z i = 0 is valid for the efficient dingly, the non-zero distance is a measure of inefficiency. The DMUs are summarized in Table 4 and together with slacks straints of the linear programming problem (1) are provided s 11, 12, 13. omỸ i to the efficiency frontier can be used as a criterion for ent DMUs. The non-zero slacks show the potential improveong directions of principal components. However, these poprovements does not have proper interpretation. To obtain tial efficiency improvements with respect of original criteria xpressed in terms of original data using loadings of principal ver, this problem has no unambiguous solution. For example, es of vectors in the original space are obtained by linear proadow effect" can emerge. The mutual distances of of images rved using non-linear projecting methods, e.g. Multidimen-S) [Borg andGroenen, 1997,Žilinskas andŽilinskas, 2006]. of hybridization of DEA and MDS, however is very new and ave applied an alternative option to express desired improveient DMUs. Let Λ i denote the optimal vector of weights e solution of linear programming problem (1) withỸ i as left nstraints where z i > 0. Since the i-th DMU is inefficient its s desirable. A reasonable target to achieve can be formulated . . , 29, are components of Λ i .

8
, •= 1•••••29, are components of data sets are negative. Recall that the data with negative elements is not apprpriate in the context of the original formulation of the DEA problems. However, alternative DEA models have been developed which are translation invariant [Banker et al., 1984, Ali andSeiford, 1990]. We will use the so called Additive Model which maintains translation invariance in the analysis of entirely output data [Ali and Seiford, 1990]. Let us note, that the Additive Model is also used in [Adler and Golany, 2001].
Since we consider entirely the output data, the Additive Model is defined as the following problem of linear programming . . . ,Ỹ 29 ), e = (1, . . . , 1) T . The solution of (1) z i is equal to the L 1 (city block) distance from the vector Y i to the efficiency frontier, i.e. to the Pareto optimal segment of the convex envelop of the data vectors. Thus, the equality z i = 0 is valid for the efficient DMUs. Correspondingly, the non-zero distance is a measure of inefficiency. The values of z i for all DMUs are summarized in Table 4 and together with slacks for each of four constraints of the linear programming problem (1) are provided in Appendix Tables 11, 12, 13.
The distance fromỸ i to the efficiency frontier can be used as a criterion for ranking the inefficient DMUs. The non-zero slacks show the potential improvement quantities along directions of principal components. However, these potential efficiency improvements does not have proper interpretation. To obtain estimates of potential efficiency improvements with respect of original criteria the slacks can be expressed in terms of original data using loadings of principal components. However, this problem has no unambiguous solution. For example, the PCA coordinates of vectors in the original space are obtained by linear projecting where a "shadow effect" can emerge. The mutual distances of of images can be better preserved using non-linear projecting methods, e.g. Multidimensional Scaling (MDS) [Borg andGroenen, 1997,Žilinskas andŽilinskas, 2006]. The attractive idea of hybridization of DEA and MDS, however is very new and still not mature.
Therefore we have applied an alternative option to express desired improvements for non-efficient DMUs. Let Λ i denote the optimal vector of weights corresponding to the solution of linear programming problem (1) withỸ i as left .

Numerical Example
The methodology described in the previous section is illustrated by its application in formulating targets to the inefficient DMUs. For a numerical example, let consider Bulgarian (BLG) and Latvia (LVA). We propose to define a peer by (2) where can be better preserved using non-linear projecting methods, e.g. M sional Scaling (MDS) [Borg and Groenen, 1997,Žilinskas andŽilin The attractive idea of hybridization of DEA and MDS, however is ve still not mature. Therefore we have applied an alternative option to express desire ments for non-efficient DMUs. Let Λ i denote the optimal vector corresponding to the solution of linear programming problem (1) wi hand side of the constraints where z i > 0. Since the i-th DMU is in improvement seems desirable. A reasonable target to achieve can be as where λ i j , j = 1, . . . , 29, are components of Λ i . 8 are obtained as the solution of (1). The non zero jecting where a "shadow effect" can emerge. The mutual distances of of images can be better preserved using non-linear projecting methods, e.g. Multidimensional Scaling (MDS) [Borg andGroenen, 1997,Žilinskas andŽilinskas, 2006]. The attractive idea of hybridization of DEA and MDS, however is very new and still not mature.
Therefore we have applied an alternative option to express desired improvements for non-efficient DMUs. Let Λ i denote the optimal vector of weights corresponding to the solution of linear programming problem (1) withỸ i as left hand side of the constraints where z i > 0. Since the i-th DMU is inefficient its improvement seems desirable. A reasonable target to achieve can be formulated as where λ i j , j = 1, . . . , 29, are components of Λ i .

8
defines -th DMU as a peer for -th DMU. For Table 4, Bulgaria is one of the countries with the largest distance to the efficiency frontier (• = 724 of the year 2015). The peers (defined by the non-zero weights) are Denmark (weight is equal to 0.8896) and Ireland (weight is equal to 0.1104). The targets for Bulgaria are presented in Table 5.
Only one indicator ( ·8 ) is already reached in 2015, so the performance of education system in Bulgaria should be improved with respect to  ·1 - ·7 indicators. The largest potential for the improvement of the education system in Bulgaria can be reached by improving indicators  ·5 and  ·7 . These indicators are much worse than in peers Denmark and Ireland.
Similarly for Latvia, peers are Denmark (weight is equal to 0.2864) and Ireland (weight is equal to 0.7137). The targets for Bulgaria are presented in Table 6.
Three indicators ( ·3 ,  ·4 and  ·8 ) out of eight are already reached in 2015 and the largest potential for improvement education system in Latvia can be reached by improving the same indicators ( ·5 and  ·7 ) like in Bulgaria.

Discussion and Conclusions
In the analysis of sectors of education systems the most frequently radial DEA models and CCR modifications are applied. Radial efficiency measure and proportional improvement of efficiency are natural for the units with fixed structure operating in identical conditions. Education systems of different countries do not correspond to Table 5 Observed data and formulated targets for Bulgaria (BGR) Observed ( Table 6 Observed data and formulated targets for Latvia (LVA) Observed (2015)  such a characterisation. Improvement of education systems means mainly improvement in it's structure: a reform means not only increasing values of indicators but also changing ratios between their values. Therefore BCC type model seems the most appropriate for education systems analysis. Our experiments show the applicability of PCA with Additivity Model base DEA. The proposed methodology of PCA-DEA allows researchers analyse education systems quantitatively. We demonstrated good performance of the proposed methodology on limited data set of ET2020 indicators. The assessment of the performance and recommendations for the improvements of education systems could be based on the analysis conducted using sufficiently large data set which would comprehensively represent the considered education system of the country as a whole.
A. Jakaitiene is professor and chief researcher at Vilnius university. She defended her PhD thesis in physical sciences (Informatics 09P) in 2001. A. Jakaitiene has experience of statistical analysis in social, physical and biomedical areas from working in Lithuanian and EU institutions. As of 2005, her research paper topics could be split in three fields: 1. modelling and forecasting of Lithuanian, European Union and global economic indicators; 2. analysing biomedical data, with special interest to modelling of large-scale genetic data; 3. conducting secondary analysis of international large-scale assessment databases in education. She is a board member of Lithuanian Statistical Society, a member of Lithuanian Mathematical Society, International Epidemiology Association and International Biometric Society.  Table 7 Indicators for year 2013