Effects, Experiences and Feedback from Studies of a Program Visualization Tool

. Program visualization (PV) is potentially a useful method for teaching programming basics to novice programmers. However, there are very few studies on the effects of PV. We have developed a PV tool called ViLLE at the University of Turku. In this paper, multiple studies on the effects of the tool are presented. In addition, new qualitative data about students’ feedback of using the tool is presented. Both, the results of our studies and the feedback indicate that ViLLE can be used effectively in teaching basic programming concepts to novice programmers.


Introduction
Programming is one of the main objectives in computer science studies.However, according to several studies (see, e.g., McCracken et al. (2001) and Ala-Mutka (2005)) students have significant problems in learning the very basics of programming.Because of scarce teaching resources and large group sizes students often get inadequate personal instruction.Thus, there is clearly a need for instructional tools that support students' independent learning.
Visualization systems use various graphical means in concretizing abstract programming and algorithmic problems.According to Wiggins (1998), the purpose of visualization is to help the user understand what a program does, why it does it, how it works, and what the end result is.Hence, visualization systems can supposedly help students to understand programs better, thus improving the learning results.There are several studies on the effects of different algorithm visualization systems (see, e.g., Hundhausen et al. (2002) and Laakso et al. (2008a)), but very few on program visualization.
We have developed a program visualization tool called ViLLE at the University of Turku, Finland.The purpose of developing the tool was to find out if program visualization can indeed help students in learning to program.ViLLE has been tested with different kind of setups taking student competence levels and backgrounds into account.In this paper we describe the tool itself for both in teacher's and student's point of view and present the results of our studies on the effects of ViLLE so far.Moreover, we present new quantitative results, based on the student feedback about using the tool.Finally, the future of ViLLE is discussed.

ViLLE
ViLLE is a program visualization tool, developed at the University of Turku.Its main purpose is to illustrate the changes in the programs states during the execution with various graphical and textual means.ViLLE supports multiple programming languages and has built-in editors for defining and editing syntaxes, examples and questions.With the export function the defined examples can be published as an independent collection, distributable in web or other media.By using the TRAKLA server (see Malmi et al. (2004)), ViLLE's automatically assessed exercises can be easily integrated as a part of a programming course.

Teacher's Point of View
From a teacher's point of view, the suitability for different kinds of courses can be seen as the number one feature of ViLLE.Programming languages, examples and questions are all customizable with the built-in editors.Hence, it should be relatively easy to integrate ViLLE into almost any programming course.Moreover, the flexibility gives teachers a chance to fulfil their own teaching philosophy, without a need to adjust to tool's limitations.

Example Editor
Examples are divided into categories.Teacher can create new examples and edit the existing ones by using the built-in example editor (see Fig. 1).The examples are written in Java, and ViLLE automatically translates the program code to all defined languages.Moreover, the visualizations of examples are automatically generated, as well as explanations on each program line.The list of supported Java features is limited, mainly, because all the programs should be easily translatable to other defined languages, and because the tool is directed for novice programmers.Furthermore, the visualization of more complex features (e.g., GUI components) can get quite tricky.

Question Editor
To further engage the learners to visualization, teacher can create questions about the example programs.Currently multiple choice questions and graphical array questions are supported.The questions are attached to program execution with a built-in question editor (see Fig. 2), and are automatically launched when desired point of program is reached.The TRAKLA2 server can be used with ViLLE to keep the score of answered questions and correct answers.

Syntax Editor
Teacher can add or edit syntaxes with the built-in syntax editor (Fig. 3).The editable syntax is presented on the right hand side of the window, and the correspondent Java syntax on the left hand side.Additionally, an explanation of the edited syntax line is presented in the bottom window.All defined syntaxes must have matching lines with the Java syntax to ensure the flow of execution and the consistency of questions.

Student's Point of View
ViLLE's key features from the student's point of view can be divided into following categories: Visualizing the program execution: The execution of the example program is visualized line by line (see Fig. 4).Currently and previously executed lines are highlighted, and the variable values, program line explanation and the output of the program are presented in their own frames.All subprograms, and their return values, are presented in the call order in their own frames in the call stack.Moreover, all global variables (namely arrays) are presented graphically in their own area.

Language independency:
The visualization of the execution works similarly, regardless of the chosen programming language.Programming language can be changed anytime during the visualization.In addition, ViLLE has a parallel mode (see Fig. 5) where execution can be viewed in two different languages simultaneously.
Visualization controls: Controls are flexible -the user can move one step at a time, both forwards and backwards in the execution of a program.In addition, the program can be executed continuously in adjustable speed.The execution slider at the bottom of the window can be used to move directly to a desired state of execution.The slider also has a secondary function: the number of steps can be used to determine the complexity of the program, and with suitable examples to compare the complexities of algorithms.
Interaction: Besides answering questions (see Fig. 6) the students can edit the program code in the visualization view.The changes in program code can be visualized instantly.However, since the editing must be done in Java, the feature can't naturally be utilized in all courses.

Automatic Assessment of Exercises
Examples made with ViLLE can be transferred to a collection in a web (see Fig. 7) by using the TRAKLA2 server.The server keeps score on students' logins and submissions, and makes it possible to set opening and expiration dates for example rounds.Individual examples can be retaken an unlimited number of times.Additionally, the teacher can set the minimum number of points required for each round to pass the course.Web-based ViLLE exercises are nowadays in use in most universities in Finland, and the student feedback (see Subsection 3.4) has been mostly positive: it seems that ViLLE has a positive role in enhancing the reading and tracing skills of novice programmers.

The Effectiveness of ViLLE
The effectiveness of ViLLE was studied at the University of Turku, Finland, in a course called "Introduction to information technology".There were 72 students participating in the study.We tried to find the answer to two research questions: 1) "Is ViLLE useful in learning basic programming concepts?"and 2) "Is there any difference in learning results when earlier programming experience is taken into account?".The null-hypotheses were that ViLLE is not useful in learning to program, and that the effect is same for both novices and more experienced students.No programming was taught before the study, taking place in the third week of the course.However, a lecture was arranged before the study where the programming language and the tool used were introduced.A link  to ViLLE was added to course homepage in the second week so that the students could familiarize themselves with the tool before the study.
The students were randomly divided into two groups: there were 32 (N = 32) students in the treatment group and 40 (N = 40) in the control group.At the beginning of the two hour session students first took a pre-test that lasted for 15 minutes.The pretest included three program reading exercises where students were asked to write down the output of each program.After the pre-test, students rehearsed programming concepts presented in the test with a web-based tutorial for 45 minutes.The treatment group could also visualize the program code examples presented in the tutorial with the ViLLE tool.After the tutorial session students had 30 minutes to answer to a post-test.The post-test had the same three questions as the pre-test with two more demanding questions, in which the students had to write a program implementing a given task, and write down the output of a recursive program.
Pre-and post-test exercises were analyzed in the scale of zero to ten.Thus, the maximum points of the pre-and post-test were 30 and 50.Table 1 presents the pre-test scores for both groups, including point averages, standard deviations (in parentheses) and pvalues calculated with a two-tailed t-test.
No statistically significant differences were found in any of the questions.In absolute scale the treatment group performed better in questions Q2 and Q3, and the control group in question Q1.
The scores from the post-test are presented in Table 2. Corresponding questions from the pre-test are shown in square brackets beside the post-test question names.Total points averages are presented both for the shared questions (questions that were the same in the pre-and post-test) and for all questions.Additionally, the point difference between the shared questions in the pre-and post-test is shown.
The comparison between the shared questions shows that the control group performed better in question PQ1 and the treatment group in questions PQ2 and PQ3.One reason for the smaller difference in the scores of the treatment group in PQ1 is the relatively high scores they got from the pre-test.In any case, the differences are so small that the null-hypothesis can not be rejected.
Table 3 displays a comparison between the pre-and post-test scores inside the groups.
As seen in the table, both groups performed statistically significantly better in the post-test than in the pre-test (p-value < 0.01).Based on this, it seems that it is possible to efficiently study basic programming concepts independently in these kinds of tutorial sessions.
The other research problem was the effect of earlier programming experience on learning results.When the results were analyzed, both groups were divided into two: students with no earlier programming experience (NPE) and students with some earlier programming experience (SPE).The pre-test scores with earlier programming experience taken into account are presented in Table 4.
The figures in the table show that students with some earlier programming experience got statistically significantly better scores in both groups.The corresponding results from the post-test are presented in Table 5.
As we can see from the table, statistically significant difference between NPE and SPE remained in the post-test in the control group.However, the difference vanished in the treatment group (p-values 0.212 and 0.151).Thus, it seems that ViLLE is especially useful for novice programmers.Because of the relatively short exposure to the tool, the result can be seen as significant.Cronbach alpha-values calculated for pre-and post-test questions (pre-test α = 0.667 and post-test α = 0.831) indicate high reliability.This study is presented in more detail in (Rajala et al., 2008).The study presented in Subsection 3.1 was extended so that a third group was formed which used ViLLE in a lower level of the engagement taxonomy.Hence, there were three groups: no viewing (N = 40), viewing (N = 65) and responding (N = 32).The questions were removed from the version of ViLLE used by the viewing group.The purpose of the setup was to confirm the hypothesis of Naps et al. (2002), which states that visualizations can have an effect on learning only if they are used in engagement level three or higher.
Statistical differences between the groups in the pre-and post-test were calculated with one-way ANOVA-test, and are presented in Table 6.
As seen in the table, there were no statistically significant differences between the groups.Students in the viewing group also improved their results statistically significantly during the session (the p-value of pre-and post-test difference inside the group < 0.01).
The differences in learning results between the novices and more experienced programmers inside the groups were examined next.The results are shown in Table 7.
There was a statistically significant difference in all groups between the NPE and SPE.As discovered in the previous section, the difference in the responding group vanished in the post-test.However, the difference remained in the viewing group (p-value < 0.001).Thus we can conclude that for the ViLLE to be useful, it should be used in an engagement level higher than viewing.To confirm the result, a post-hoc Student-Newman-Keuls test  too complex for high school students).Students who hadn't used ViLLE before belonged to control group (N = 17) while students with prior experience on ViLLE formed the treatment group (N = 7).Though the groups were quite small, results were statistically significant.
To confirm the equality of the groups, participants' earlier math and CS grades were analyzed.The averages and standard deviations are presented in Table 10.
There were no statistically significant differences in grades between the groups.The absolute differences are less than one point.
Pre-and post-test scores are presented in Table 11.
There was no statistically significant difference between the groups in the pre-test scores, but there is a difference in the post-test.The statistically significant difference can be found both in the shared (same questions in pre-and post-test) questions and all questions.Based on the results we can conclude that prior knowledge of ViLLE clearly improves learning results.Furthermore the results support our earlier findings that ViLLE is useful in learning basic programming skills: both groups got statistically significantly better results from the post-test in comparison to the pre-test.The results are presented in more detail in (Laakso et al., 2008b).

Student Feedback
In addition to quantitative tests we wanted to find out what students think about the tool.The opinions were gathered from students participating in a course called 'Introduction to Information Technology' at the University of Turku.ViLLE was an integral part of the course as all the assignments were done with it.114 students answered to the questionnaire consisting of three parts: general questions about the tool, how useful the system is when learning new programming concepts, and opinions about the features of the tool.
In the first section six statements about system were presented, and students were asked to evaluate those in a scale of one to seven (1: completely disagree, 7: completely agree).Based on the answers, the students seem to think that the tool is quite suitable for teaching programming (average of all answers 5.64), that it is fairly easy to use (avg.5.49) and that it helps in understanding the basic programming concepts (avg.5.41).
In the second section the students were asked to evaluate the usefulness of ViLLE in different areas related to programming (see Table 12).Based on the answers, the students found ViLLE as an useful tool for teaching all the basic concepts -arrays were the only concept which got an average less than five (avg.4.73).Based on students' comments, this is probably because of the usability issues in answering the array questions.
In the third section the students were asked to evaluate the usefulness of different features of ViLLE (see Table 13).All features -except the visualization of programs in different languages -got an average of five or higher.The features the students found most useful were the visualization of variable states and the automatic assessment of exercises (averages 5.90 and 5.80).The worse average in "visualization in multiple languages" is probably due to a fact, that the students didn't use the feature.For that matter, the feature can be found most useful when the students already know a programming language, and a different language is taught.Additionally, the students were asked to evaluate the number of ViLLE exercises in the course and the number of questions in the exercises in the scale of one to seven (1too few, 7 -too many).Based on the answers, the amounts were quite suitable (exercises avg. 4.48 and questions avg. 4.13).
Moreover, the students had a possibility to give additional comments about the tool.Most of the feedback was quite positive.However, some criticism and thoughts about improving the tool were also given.Some of the positive things mentioned were: -"In my opinion, ViLLE exercises were more effective than lectures."-"I found ViLLE really useful: because of the ViLLE exercises, I didn't practically need the lecture handouts at all to learn programming."-"With ViLLE I was able to pick up programming far better than with lectures."-"There were a lot of different exercises and it was easy to do them wherever and whenever I felt like it."The negative comments were mostly related to the functionality of the user interface: -"ViLLE is great when it functions properly; however, the GUI needs some improvement: e.g., handling arrays is illogical and difficult."-"It would be handy if you could see the execution history when answering the questions:" -"All the essential things won't fit on the screen simultaneously.Why do you need to login to ViLLE?" -"It's irritating that you can't scroll the window when the question appears."Moreover, some of the students would like to have more conventional teaching, instead or alongside ViLLE exercises: -"I'd rather attend the traditional computer lab sessions, where there would be some kind of help available on request."-"ViLLE was a good tool for studying; however, more training in small groups would be highly appreciated."-"ViLLE is ok for learning the basics, but personally I learned more during the lectures."All in all, the students seemed to have quite positive image about ViLLE's usefulness, though some improvements were also wished for.In conclusion, the students would prefer a course, where ViLLE exercises are integrated with more traditional teaching methods.The reported problems related to the user interface will be taken into account in the further development of the tool -e.g., the answering to array questions should nowadays work more logically than in the version, which the opinions were gathered about.

The Future of the Tool
ViLLE is nowadays in use in basic programming courses in most universities in Finland, and in addition, it will be mobilized at least in Australia during this spring.In future, we plan to develop the tool further based on the user opinions and experiences, and further research the effects of the tool when used in programming teaching.Our goal is to add more features to support the higher levels of engagement taxonomy.Moreover, exercise templates are developed to enable the randomization of the exercise parameters (i.e., variable types and values etc.); this makes redoing the exercises more meaningful.Additionally, different kinds of exercise types are under development, including code sorting exercises, where program code lines are randomly shuffled and the students are supposed to sort them in an order which implements the given task or algorithm.

Conclusions
Based on the presented results, experiences and feedback we can conclude the following: -By using ViLLE the novices can improve their learning results significantly, even when the tool is used quite briefly. . .-. . .but only if the tool is used in the higher level of engagement.The mere viewing of visualization doesn't seem to have the same effect.This supports the hypothesis presented by Naps et al. (2002).-Moreover, to ensure the learning results, the students must be familiarized with the tool beforehand, hence reducing the cognitive load of learning to use the tool.-Most of the students think that ViLLE is beneficial when learning the basic programming skills.However, some of the students seem to think that the best way to use such tool is to integrate it with the more conventional forms of teaching.-ViLLE gives students a chance to learn and practice the very basics of programming independently.These basics can't normally be covered on the lectures as thoroughly as required by some students.-The focus of ViLLE is in program code reading and comprehension skills.However, Lopez et al. (2008) state that the tracing skills correlate with students performance on code writing tasks.-From a teacher's point of view, ViLLE's key feature is the flexibility, and most importantly the support for almost any imperative programming language.
All in all the experiences seem quite encouraging so far: the results and the students' feedback both confirm the assumption that ViLLE can be used effectively in teaching the basic programming skills to novices.As we all know -unless we have already forgotten -the first steps in programming really are quite difficult.Therefore there seems to be a demand for such system, both now and in the future.

Table 4
The effect of earlier programming experience on pre-test scores

Table 6
Statistical differences between the groups