Score Calculation in Informatics Contests Using Multiple Criteria Decision Methods

The Lithuanian Informatics Olympiad is a problem solving contest for high school students. The work of each contestant is evaluated in terms of several criteria, where each criterion is measured according to its own scale (the same scale for each contestant). Several jury members are involved in the evaluation. This paper analyses the problem of how to calculate the aggregated score for a whole submission in this situation. The methodology chosen for solving this problem is Multiple Criteria Decision Analysis (MCDA). The outcome of this paper is a score aggregation method, developed using MCDA approaches, proposed to be applied in LitIO.


Introduction
The Lithuanian Informatics Olympiad (LitIO) is a state supported algorithmic problem solving competition for students in secondary education. The contestants are given algorithmic tasks and have to solve them in four or five hour contest sessions. They have to design and implement an algorithm in order to solve each task. A task may also require the contestant to submit the reasoning behind the algorithm design or a set of test cases. The material submitted for evaluation by a contestant is called a submission. After a submission has been evaluated in terms of separate criteria, where each criterion is measured according to its own scale (the same scale for each contestant), the aggregated score has to be calculated so that the submission can be ranked with respect to the other submissions for the same task. Measuring the distance between contestants is also important.
This paper is a continuation of the paper Improving the Evaluation Model for the Lithuanian Informatics Olympiads (Skūpienė, 2010), where we elaborated the components of a submission and the list of evaluation criteria. Each criterion consists of a description, the form of evaluation (manual, automatic) and the scale (boolean, ordinal, ratio).
In this paper we focus on score aggregation once the submissions have been evaluated in terms of the separate criteria. We approach the problem from the point of view of Multiple Criteria Decision Analysis (MCDA). The structure of the paper is the following. First we will explain the main ingredients of MCDA and show that submission ranking in LitIO falls under the category of problems that can be solved using MCDA techniques. Then we will present the requirements that emerge from the specifics of the problem, and finally we focus on the choice of MCDA algorithms suitable for score aggregation in LitIO.

Submission Ranking in LitIO as an MCDA Problem
The field of Multiple Criteria Decision Analysis (MCDA) is defined as a collection of formal approaches which seek to take into account multiple criteria in order to help decision makers to explore different decision alternatives (Belton and Stewart, 2003).
Evaluation in LitIO as such can be treated as an MCDA problem, and the work presented in Skūpienė (2010) corresponds to the first stage of the MCDA process, i.e., problem structuring. The outcome of problem structuring is an explicit list of criteria and alternatives. The task explored in this paper is the ranking of submissions once the submissions have been evaluated in terms of separate criteria. Note that, in practice, the overall ranking has to be based on the scores of several tasks. However, in this paper we limit our research to determining a score for one task only.
Three major roles can be identified in MCDA: the decision maker, the decision analyst and the stakeholder (Val, 2002).
The scientific part of LitIO is managed by the scientific committee. The scientific committee is responsible for all the scientific decisions, i.e., approving the syllabus of the contest, designing tasks and tests, approving the evaluation procedure, performing the evaluation, approving rankings and declaring winners. In 2010 the scientific committee of LitIO consisted of 13 members (Scientific Committee of Lithuanian Informatics Olympiads, 2010). The scientific committee is the only decision maker in this context. The role of decision analyst is played by the author of this paper.
The most important stakeholders are students in secondary education from all over Lithuania who are interested in programming and algorithmics, as well as the community of informatics teachers. This community of stakeholders is affected directly by each decision or change in the evaluation scheme. The scientific committee of LitIO is also a stakeholder, because possible changes in the evaluation scheme might change their working procedures and the time spent on task design and evaluation.
A model of the relationships between the different roles in the decision analysis process for the problem under consideration is presented in Fig. 1.
Several ways to classify MCDA problems have been suggested (Roy, 1996; Belton and Stewart, 2003). The submission ranking problem belongs to the ranking problematique, as the final outcome of the evaluation is a ranked list of contestants based on which the awards will be distributed.
Based on another type of classification, the submission ranking problem is a repeated problem; therefore the focus of this research is on constructing a ranking procedure which could be applied annually in LitIO.
It is a group decision making problem, because the role of decision maker is played by the members of the LitIO scientific committee, and the opinions of all the members who are involved in the evaluation of the submissions for a particular task have to be taken into account.

Decision Matrix
Once the list of alternatives and criteria for an MCDA problem is determined, the following step is to construct a decision matrix. In this section we present the decision matrix constructed for the submission ranking problem. Let A = {A_1, A_2, ..., A_n} be a finite set of alternatives (i.e., submissions), C = {C_1, C_2, ..., C_m} a finite set of criteria (i.e., evaluation criteria), and P = {P_1, P_2, ..., P_q} a finite set of decision makers (i.e., evaluators).
Assume that all the decision makers are involved in defining relative weights and in determining the performance of each alternative in terms of each attribute.
Let x^k_ij denote the performance of alternative A_i in terms of criterion C_j as assessed by decision maker P_k. Note that if the criterion is measured subjectively, i.e., the decision makers assess the performance of alternative i in terms of criterion j manually, then the value of x^k_ij is linguistic. Otherwise the value of x^k_ij is numeric and x^1_ij = x^2_ij = ... = x^q_ij. Then the submission ranking problem can be expressed by the following decision matrices, one per decision maker:

D^k = [x^k_ij], i = 1, 2, ..., n, j = 1, 2, ..., m, where k = 1, 2, ..., q.
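The structure described above can be sketched in code as follows; the dimensions and the data are purely hypothetical and serve only to illustrate the shape of the decision matrices.

```python
# Hypothetical decision matrices for n = 3 submissions, m = 2 criteria
# and q = 2 evaluators. Criterion 0 is assessed manually (linguistic
# values, possibly differing between evaluators); criterion 1 is
# measured automatically (numeric and identical for every evaluator).
n, m, q = 3, 2, 2

# x[k][i][j]: assessment of alternative i on criterion j by evaluator k.
x = [
    [["good", 0.8], ["fair", 0.5], ["very good", 1.0]],  # evaluator 1
    [["fair", 0.8], ["good", 0.5], ["good", 1.0]],       # evaluator 2
]

# For the automatically measured criterion, x^1_ij = ... = x^q_ij holds:
for i in range(n):
    assert len({x[k][i][1] for k in range(q)}) == 1
```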
Note that classical MCDA algorithms assume a single decision maker, i.e., they assume that q = 1.

Application of Fuzzy Numbers for Quantifying Linguistic Variables
Some of the proposed LitIO evaluation criteria are measured manually using linguistic variables. Linguistic variables are variables whose values are linguistic terms rather than numbers. They are used to express the results of subjective qualitative evaluation. Linguistic variables were introduced and described by Zadeh (1975a, b, c). Triangular and trapezoidal fuzzy numbers are used for quantifying linguistic variables.
Next we briefly present the related concepts of fuzzy logic, based on Lee (2005) and Triantaphyllou (2000).
Fuzzy set. A fuzzy set is any set that allows its members to have different grades of membership in the interval [0, 1], i.e., for any subset Ã of a universe X it is possible to define a membership function μ_Ã : X → [0, 1] of a fuzzy set. A crisp set is a special case of a fuzzy set; to distinguish between crisp and fuzzy sets we will use the Ã notation for fuzzy sets.
Operations on fuzzy sets (Zadeh, 1965; Lee, 2005; Triantaphyllou, 2000) are defined pointwise on the membership functions: union μ_{Ã∪B̃}(x) = max(μ_Ã(x), μ_B̃(x)), intersection μ_{Ã∩B̃}(x) = min(μ_Ã(x), μ_B̃(x)), and complement μ_{Ã^c}(x) = 1 − μ_Ã(x).

Fuzzy number. A fuzzy set is called a fuzzy number if it is convex and normalised, and its membership function is defined on R and is piecewise continuous.
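The standard Zadeh operations can be expressed pointwise on membership grades; a minimal sketch (function names are ours):

```python
# Pointwise Zadeh operations on membership grades mu_a, mu_b in [0, 1].
def fuzzy_union(mu_a: float, mu_b: float) -> float:
    return max(mu_a, mu_b)

def fuzzy_intersection(mu_a: float, mu_b: float) -> float:
    return min(mu_a, mu_b)

def fuzzy_complement(mu_a: float) -> float:
    return 1.0 - mu_a
```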
Trapezoidal fuzzy number. A trapezoidal fuzzy number is a fuzzy number represented by four points as Ã = (a_1, a_2, a_3, a_4), and this representation is interpreted in the following way:

μ_Ã(x) = 0 for x < a_1; (x − a_1)/(a_2 − a_1) for a_1 ≤ x < a_2; 1 for a_2 ≤ x ≤ a_3; (a_4 − x)/(a_4 − a_3) for a_3 < x ≤ a_4; 0 for x > a_4. (2)

When a_2 = a_3, the trapezoidal number coincides with a triangular fuzzy number. Many conversion scales for transforming linguistic terms into fuzzy numbers have been created. Chen et al. (1992) proposed eight commonly used conversion scales with different numbers of linguistic terms. An example of a nine-item scale that is fairly standard in fuzzy set theory is presented in Table 1 and Fig. 2 (Sule, 2001). The choice of a concrete scale from the available ones is intuitive and left to the responsibility of the decision maker. Note that the same linguistic term can have different crisp values in different conversion scales.
In this paper we will not suggest concrete scales, because the scales are chosen intuitively and we believe that the jury members also have to be involved in the decision. Only after piloting the evaluation scheme might it be possible to make a final decision about the scales.
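For illustration, the trapezoidal membership function (2) and a small hypothetical conversion scale (the concrete numbers are ours and are not a proposal; as argued above, the actual scales should be chosen with the jury) could be encoded as follows:

```python
def trapezoidal_membership(x, a1, a2, a3, a4):
    """Membership function of the trapezoidal fuzzy number (a1, a2, a3, a4)."""
    if x < a1 or x > a4:
        return 0.0
    if x < a2:
        return (x - a1) / (a2 - a1)
    if x <= a3:
        return 1.0
    return (a4 - x) / (a4 - a3)

# A hypothetical three-term conversion scale on [0, 1] (illustrative only).
# With a2 = a3 these are triangular fuzzy numbers.
scale = {
    "low":    (0.0, 0.1, 0.1, 0.3),
    "medium": (0.3, 0.5, 0.5, 0.7),
    "high":   (0.7, 0.9, 0.9, 1.0),
}
```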

Submission Ranking Problem Constraints
The decision context of our problem is rather specific. The problem belongs to the ranking problematique category and is a group decision making problem. Moreover, the chosen method will have to be applied in an educational informatics contest setting. Therefore it is highly important that the approach be accepted by the informatics contest community. Belton and Stewart (2003) emphasise that the ability to explain the chosen approach to audiences of varied backgrounds is an important factor in the choice of an MCDA approach. The score aggregation contains parts which are revealed to the contestants, but it also contains hidden parts. For example, the scores assigned by individual jury members during manual evaluation are not revealed to the contestants; only the aggregated score is. We emphasise that the parts of the scoring function which are revealed to the contestants must be easily understandable and transparent, while more complicated techniques could be applied to the hidden parts.
It must be noted that our problem is a repeated problem. This means that the process of ranking submissions will have to be repeated each time a LitIO contest session takes place, though on different submissions and possibly different contestants. Therefore it is very important that the stakeholders accept and understand the method.
Even though the problem is described as the ranking problematique, it is not enough to present a ranking to the contestants. The jury (during the medal allocation procedure) and the contestants are interested not only in the positions in the ranking table, but also in the score differences among a group of contestants.
It is commonly accepted in LitIO that a score aggregation function mapping the performances for separate attributes (groups of criteria) into real numbers is defined and announced to the contestants in advance.
Therefore we will focus on MCDA approaches which foresee defining a score aggregation function and partial score functions inducing a ratio scale, where the ranking is made after the aggregated scores for each alternative have been calculated.

Choice of MCDA Approach
Many different MCDA approaches have been presented and categorised (Triantaphyllou, 2000; Kahraman, 2008; Chen et al., 1992; Carlsson and Fullér, 1996; Belton and Stewart, 2003). Instead of focusing on separate MCDA methods, we will first look at the major families of MCDA methods. Belton and Stewart (2003) distinguish three major families of MCDA approaches:
• Value measurement theory (Keeney and Raiffa, 1976). The main idea of this approach is to construct a value function which associates each alternative with a real number in order to produce a ranking of the alternatives. The main idea of this theory corresponds to the intentions and reasoning behind our problem. Therefore we will include it for further consideration.
• Satisficing (or goal programming) (Simon, 1976). Instead of creating one value function, this approach operates on partial value functions. By a partial value function we understand a value function which maps the performance of alternatives in terms of a certain criterion to a real number. The main idea of the approach is that the most important criterion is identified and an acceptable level for it is determined. Then alternatives are eliminated until all the remaining alternatives achieve the acceptable level. At this point the second most important criterion, together with its satisfactory level, is identified. The alternatives which do not reach the satisfactory level of the second criterion are eliminated, and so on. This approach is not suitable for our problem, as it does not assume score aggregation at all.
• Outranking (Roy, 1996). Outranking methods also operate with partial value functions and involve pair-wise comparisons of alternatives. An alternative is dominated by another alternative if the other alternative performs better in terms of one or more criteria and equals it on the remaining criteria. The concept of outranking is introduced: the outranking relationship of two alternatives describes that, even though the two alternatives do not dominate each other mathematically, the decision maker accepts the risk of regarding one alternative as almost surely better than the other.
We consider this last approach also unacceptable in our situation, because it again deals with preferences in terms of separate criteria and does not foresee score aggregation using a single value function. The concept of outranking, i.e., allowing the decision maker to take the risk of considering one alternative better than the other, is not acceptable in a contest community where scoring is an extremely sensitive issue.
Out of the three major MCDA families, only one foresees the construction of a value function, which is required in the submission ranking problem as well. Therefore, in what follows, we focus on the algorithms of value measurement theory.
Besides the main families of MCDA approaches, fuzzy logic is often considered for application to MCDA problems. Fuzzy logic is used in group decision making, which is our case. However, fuzzy logic is not a separate methodology, but a tool that can be applied within other MCDA approaches, including the ones described above. Therefore we assume that fuzzy logic might be applicable to this problem, and we will look at the concepts of fuzzy logic as well.

Choice of Value Measurement Theory Method
Value measurement theory was largely initiated by Keeney and Raiffa (1976). More on it can be found in Roberts (1979) and French (1988).
The main idea of this theory is that a real number ("value") is associated with each alternative in order to produce a ranking of the alternatives. The value function is defined as a function assigning a non-negative number to each alternative, indicating the desirability (or preference) of the alternative.
The value function has to satisfy the following requirements: an alternative A_i1 is preferred to alternative A_i2 (A_i1 ≻ A_i2) if and only if V(A_i1) > V(A_i2); the alternatives are indifferent if and only if V(A_i1) = V(A_i2). Note that the value function must induce a complete order.
The value measurement approach introduces partial value functions v_j(A_i). They are constructed for the separate criteria, and each partial value function holds the essential feature of a value function (i.e., induces a complete order) in terms of its own criterion.
Several value measurement theory algorithms have been developed; the most popular ones are the Weighted Sum Model and the Weighted Product Model. We would also assign the TOPSIS algorithm (presented later) to the same category of algorithms. Note that those algorithms are constructed for single decision maker problems.
The Weighted Sum Model (WSM) is the most commonly used method for single decision maker problems (Triantaphyllou, 2000). It can be described by the following formula:

V(A_i) = Σ_j w_j v_j(A_i), where i = 1, 2, ..., n and j = 1, 2, ..., m,

and w_j is the weight of criterion C_j. One of the reasons for the wide acceptance of this model is its simplicity, i.e., it can be easily explained by the decision makers to audiences of varied backgrounds (Belton and Stewart, 2003).
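A minimal sketch of WSM scoring and ranking, with hypothetical weights and partial values (function names are ours):

```python
def wsm_scores(value_matrix, weights):
    """V(A_i) = sum_j w_j * v_j(A_i) for every alternative (row)."""
    return [sum(w * v for w, v in zip(weights, row)) for row in value_matrix]

def wsm_ranking(value_matrix, weights):
    """Indices of the alternatives, best first, by descending WSM score."""
    scores = wsm_scores(value_matrix, weights)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical example: two criteria with weights 0.6 and 0.4.
scores = wsm_scores([[1.0, 0.0], [0.5, 0.5]], [0.6, 0.4])  # -> [0.6, 0.5]
```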
Note that the requirement of preferential independence has to be satisfied for the WSM model to be applicable (Belton and Stewart, 2003). Suppose that two alternatives A_i1 and A_i2 differ only on a set of criteria R ⊂ C (R is a proper subset of C) and the values of the partial functions are equal on all other criteria. Then it must be possible to decide the relationship of A_i1 and A_i2 (i.e., A_i2 ≻ A_i1, A_i1 ≻ A_i2 or A_i1 ∼ A_i2) knowing their performances on the criteria from R only, i.e., irrespective of their performances on all the other criteria.
However, among the criteria there are several dependent ones, e.g., the quality of programming style is related either to the performance of the algorithm-code complex or to the effort put into solving the task (Skūpienė, 2010). Thus preferential independence of the criteria is violated.
We suggest that this does not preclude applying WSM for score aggregation in LitIO. WSM can still be applied for aggregating those criteria that are preferentially independent. Special functions have to be introduced for aggregating the scores of dependent criteria.
Another requirement is to use the same scale of measurement for all the criteria. The performances of submissions are measured using different scales. However, we intend to unify the scales by constructing the corresponding partial value functions.
WSM can thus potentially be applied for score aggregation in LitIO, provided the above mentioned conditions are observed.
Arguments have been put forward that preferences are often perceived in ratio scale terms, and therefore a product is more natural than a sum (Lootsma, 1997; Triantaphyllou, 2000); this is the idea behind the Weighted Product Model (WPM). The consequence of trading the additive approach for a multiplicative one is that the partial value functions have to satisfy ratio scale properties instead of interval scale properties (Triantaphyllou, 2000). The WSM presented above uses partial value functions which can address this issue.
Simplicity of the approach is a high priority in the choice of a score aggregation algorithm. We conclude that the WSM algorithm is more suitable than WPM, as it is simpler and more understandable to a wide audience, while otherwise the two seem equivalent in terms of the problem under consideration.
TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) (Triantaphyllou, 2000; Saghafian and Hejazi, 2005). We did not find TOPSIS explicitly assigned to the value measurement theory approaches, nor to any other specific family of MCDA approaches. However, as it involves calculating the value of a closeness coefficient and ranking based on the values of that coefficient, we suggest that it is appropriate to consider it here.
TOPSIS introduces two hypothetical solutions, the positive ideal solution and the negative ideal solution. The positive ideal solution is composed of the best performance values of the concrete decision matrix in terms of each criterion:

A* = (max_i v_i1, max_i v_i2, ..., max_i v_im).

The negative ideal solution is composed of the worst performance values in terms of each criterion:

A− = (min_i v_i1, min_i v_i2, ..., min_i v_im).

For each alternative, the Euclidean distances from the positive ideal solution and from the negative ideal solution are calculated:

S_i* = sqrt(Σ_j (v_ij − v*_j)^2), S_i− = sqrt(Σ_j (v_ij − v−_j)^2).

Finally, the relative closeness coefficient to the positive ideal solution is calculated:

C_i = S_i− / (S_i* + S_i−).

The alternatives are ranked based on the value of the relative closeness coefficient of each alternative. From a mathematical point of view this method is interesting and appealing; however, it loses out to WSM due to the simplicity of the latter.
Moreover, in TOPSIS the score of one submission depends upon the quality of the other submissions. There have been cases where such an approach was applied in large informatics contests. However, in LitIO contestants also compete in small groups, and there are cases where just a few (i.e., fewer than 10) submissions per task are presented. If the score depends upon the other submissions in such a case, then it might become too biased.
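For completeness, a sketch of the TOPSIS closeness computation described above, assuming benefit criteria whose partial values are already on a common scale (the function name is ours):

```python
import math

def topsis_closeness(value_matrix, weights):
    """Relative closeness C_i = S_i- / (S_i* + S_i-) for each alternative.

    value_matrix[i][j]: partial value of alternative i on benefit
    criterion j, already on a common scale; weights: criterion weights.
    Assumes the alternatives do not all coincide (otherwise S* + S- = 0).
    """
    m = len(weights)
    weighted = [[w * v for w, v in zip(weights, row)] for row in value_matrix]
    ideal_pos = [max(row[j] for row in weighted) for j in range(m)]
    ideal_neg = [min(row[j] for row in weighted) for j in range(m)]
    closeness = []
    for row in weighted:
        s_pos = math.sqrt(sum((x - p) ** 2 for x, p in zip(row, ideal_pos)))
        s_neg = math.sqrt(sum((x - n) ** 2 for x, n in zip(row, ideal_neg)))
        closeness.append(s_neg / (s_pos + s_neg))
    return closeness
```

Note that the closeness of each alternative changes whenever another alternative is added or removed, which is exactly the dependence on the other submissions discussed above.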
After looking at several methods associated with value measurement theory, we came to the conclusion that, as simplicity and the ability of a wide audience to accept the evaluation scheme play a significant role in the choice of approach, the WSM approach suits the LitIO evaluation problem best, though certain requirements have to be observed. We did not find any evidence that other methods would be more suitable than WSM.
We have thus decided on the score aggregation method; however, the classical WSM assumes a single decision maker, and therefore we have to look for an extension of WSM applicable to group decision problems.

Group Decision Making
Group Decision Making (GDM) can be defined as a decision making process based on the opinions of several individuals. The goal of GDM is to arrive at a solution satisfactory for the group, rather than at the best solution, which rarely exists (Lu et al., 2007). Various methods for group decision making are available, from mathematical to psychological and social ones.
Among the MCDA approaches explicitly meant for solving group decision making problems, there are techniques which involve negotiation theory, working with group dynamics, etc. References can be found in Carlsson and Fullér (1996) and Lu et al. (2007). Such group dynamics have been experienced in LitIO many times. Investigating their suitability for the LitIO evaluation problem would require substantial input from other sciences, in particular management and psychology. For example, most meetings are conducted on-line (as members of the scientific committee are associated with different universities in different cities and even countries), some members are reluctant to discuss issues on-line, less experienced members tend to vote like more experienced ones, etc. These aspects would have to be investigated if the above mentioned direction were taken.
Our choice is to focus on mathematical group decision making methods which assume eliciting concrete information from decision makers and using it in a mathematical algorithm, but do not require interaction and negotiation between decision makers.
There are different ways to implement group decision making; many references can be found in Rao (2007) and Lu et al. (2007). Many common GDM methods (e.g., authority rule, majority rule, negative minority rule) are not suitable, because they are intended for the choice problematique (i.e., determining the best alternative), not for ranking problematique problems. Lu et al. (2007) distinguish three factors which influence GDM:

The weights of the decision makers. Among the decision makers there might be those who play more important roles in the decision making. In this case the decision makers should be assigned different weights, and this should be reflected in the group decision making process.

Weights of criteria.
The decision makers may have different views, attitudes and experience, and therefore propose different weights for the criteria.

Preferences of decision makers for alternatives.
If the performance of an alternative is evaluated subjectively, then different decision makers, having different understanding and different experience, may evaluate the performance of the same alternative in different ways.
It is common in GDM that the weight of a decision maker, the suggested weights for the evaluation criteria, and the performances of alternatives suggested by the decision makers are expressed by linguistic terms, since linguistic terms reflect the uncertainty, inaccuracy and fuzziness of the decision makers' judgements (Lu et al., 2007). We also assume that the information provided by each decision maker is consistent and non-conflicting.

Score Aggregation Method for Submission Ranking
We have already concluded that the WSM approach suits the submission ranking problem best. We were looking for an extension of WSM to GDM that would allow fuzzy numbers in the decision matrix, but would use crisp numbers for the partial scores of the attributes and for the final ranking, i.e., whose public parts would be acceptable to the community of LitIO.
Many fuzzy GDM algorithms (e.g., the intelligent FMCGDM method (Lu et al., 2007) or the one described in Sule (2001)) assume aggregating fuzzy numbers and only then deriving the final ranking. A systematic and critical study of the existing fuzzy MCDA methods arrived at the conclusion that the majority of currently existing fuzzy MCDA approaches involve complicated calculations, require all the elements of the decision matrix to be presented in fuzzy format (though some of them might be crisp), and are not suitable for solving problems with more than ten alternatives associated with more than ten criteria (Rao, 2007; Chen et al., 1992).
The method presented by Chen et al. (1992) is considered to be one which avoids the above mentioned problems (Rao, 2007; Zhang, 2004). It consists of the following phases:
• linguistic terms (if such are used) are converted to fuzzy numbers;
• the fuzzy numbers are converted into crisp scores;
• classical MCDA approaches, which assume crisp values, are applied.
Now we have to find a classical GDM method which assumes a crisp matrix. Such a method is the group decision support algorithm suggested by Csáki et al. (1995).
Therefore we combine the group decision support algorithm with the approach of Chen et al. (1992), and thus obtain a GDM algorithm suitable for the LitIO evaluation problem. Below we use the notation introduced in Section 3.
The linguistic terms are converted to fuzzy numbers as described in Section 4. The crisp score of a fuzzy number Ã is calculated in the following way. First, two functions μ_max(x) and μ_min(x) are defined:

μ_max(x) = x for 0 ≤ x ≤ 1, and 0 otherwise; μ_min(x) = 1 − x for 0 ≤ x ≤ 1, and 0 otherwise.

Then the right and the left scores of Ã are defined using these two functions:

μ_R(Ã) = Sup_x min(μ_Ã(x), μ_max(x)), μ_L(Ã) = Sup_x min(μ_Ã(x), μ_min(x)).

Here Sup stands for the least upper bound. The total crisp score of the fuzzy number Ã is defined as:

μ_T(Ã) = (μ_R(Ã) + 1 − μ_L(Ã)) / 2.

Only the aggregated crisp score for each criterion is announced. Thus, if fuzzy techniques are used to aggregate the scores of several jury members, they remain behind the curtains and do not become a source of discussions and doubts for the contestants.
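A sketch of the fuzzy-to-crisp conversion above for trapezoidal fuzzy numbers supported on [0, 1]; the suprema are approximated numerically on a dense grid, which is adequate for piecewise-linear membership functions (the function name and grid approach are ours):

```python
def crisp_score(a1, a2, a3, a4, steps=10001):
    """Total crisp score of a trapezoidal fuzzy number supported on [0, 1].

    The suprema in the right/left scores are approximated by a dense
    grid search over [0, 1].
    """
    def mu(x):
        if x < a1 or x > a4:
            return 0.0
        if x < a2:
            return (x - a1) / (a2 - a1)
        if x <= a3:
            return 1.0
        return (a4 - x) / (a4 - a3)

    grid = [i / (steps - 1) for i in range(steps)]
    mu_right = max(min(mu(x), x) for x in grid)       # against mu_max(x) = x
    mu_left = max(min(mu(x), 1.0 - x) for x in grid)  # against mu_min(x) = 1 - x
    return (mu_right + 1.0 - mu_left) / 2.0
```

As expected, fuzzy numbers lying further to the right of [0, 1] receive higher crisp scores.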
The final step is to apply the group decision support algorithm (Csáki et al., 1995) to the crisp decision matrix.
First, the aggregated group weight of each criterion is calculated:

w_j = Σ_k λ_k w_j^k, j = 1, 2, ..., m,

where w_j^k is the weight proposed for criterion C_j by decision maker P_k, and λ_k is the weight of decision maker P_k (with the λ_k summing to one). The values of the partial value functions of the performance of each alternative in terms of each criterion are calculated in a similar way:

v_j(A_i) = Σ_k λ_k v_j^k(A_i),

where v_j^k(A_i) is the crisp partial value assigned to alternative A_i on criterion C_j according to decision maker P_k. The total aggregated value of each alternative is then calculated as:

V(A_i) = Σ_j w_j v_j(A_i).

Based on the calculated values, the ranking of the alternatives is performed.
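The aggregation steps above can be sketched as follows, under the assumption (ours) that the group aggregation is a weighted arithmetic mean over the decision makers; names and data are illustrative only:

```python
def group_wsm(partial_values, criterion_weights, dm_weights):
    """Aggregate crisp partial values over decision makers, then apply WSM.

    partial_values[k][i][j]: crisp partial value v_j^k(A_i);
    criterion_weights[k][j]: weight w_j^k proposed by decision maker k;
    dm_weights[k]: weight lambda_k of decision maker k (summing to 1).
    Returns the total values V(A_i) and the ranking (best first).
    """
    q, n = len(dm_weights), len(partial_values[0])
    m = len(criterion_weights[0])

    # Aggregated group weight of each criterion: w_j = sum_k lambda_k * w_j^k.
    w = [sum(dm_weights[k] * criterion_weights[k][j] for k in range(q))
         for j in range(m)]

    # Aggregated partial value: v_j(A_i) = sum_k lambda_k * v_j^k(A_i).
    v = [[sum(dm_weights[k] * partial_values[k][i][j] for k in range(q))
          for j in range(m)] for i in range(n)]

    # Total value of each alternative: V(A_i) = sum_j w_j * v_j(A_i).
    totals = [sum(w[j] * v[i][j] for j in range(m)) for i in range(n)]
    ranking = sorted(range(n), key=lambda i: totals[i], reverse=True)
    return totals, ranking
```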

Conclusions
In this paper we proposed a score aggregation method, obtained by combining the group decision support algorithm with fuzzy score conversion, to be applied during the evaluation process of the Lithuanian Informatics Olympiad. The method takes into account linguistic values (the outcome of manual evaluation) and multiple decision makers (the members of the scientific committee).
Even though MCDA theory is acceptable from the scientific point of view, many difficulties arise in its practical application, because the stakeholders feel reluctant and sensitive about the application of complicated formulas to sensitive issues (in this case, score aggregation).
The most important requirements for the score aggregation method were the understandability and acceptability, to a wide audience, of its public parts (i.e., those disclosed to the contestants). Another important requirement was the use of a value function. As a result of these requirements, we spent our time looking for a suitable method that would fulfil all the problem specific requirements, rather than analysing several equally possible options. The paper reveals how we arrived at the suggested score aggregation method.
The next step would be piloting the suggested score aggregation scheme and investigating how it will be accepted in LitIO.
J. Skūpienė is a junior research fellow in the Informatics Methodology Department of the Institute of Mathematics and Informatics, Vilnius University. She has been a member of the Scientific Committee of the National Olympiads in Informatics since 1994 and a Lithuanian team leader at the International Olympiads in Informatics since 1996. For a few years she was the director of studies of the Young Programmers' School by Correspondence. Since 2004 she has been the coordinator of the Informatics section of the National Academy of Students. She is the author/co-author of three books on algorithmic problems for informatics contests. Her research interests include informatics and algorithmic contests, and teaching informatics and computer science in secondary education.

Fig. 1. Model of the relationships between the different roles in decision analysis in the analysed problem.

Table 1
Weights of a trapezoidal distribution of a linguistic scale