SCORE – A Model for the Self-Assessment of Creativity Skills in the Context of Computing Education in K-12

In today's society, creativity plays a key role, which emphasizes the importance of its development in K-12 education. Computing education may be an alternative for students to extend their creativity by solving problems and creating computational artifacts. Yet there is little systematic evidence to support this claim, partly due to the lack of assessment models. This article presents SCORE, a model for the assessment of creativity in the context of computing education in K-12. Based on a mapping study, the model and a self-assessment questionnaire are systematically developed. The evaluation, based on 76 responses from K-12 students, indicates a high internal reliability (Cronbach's alpha = 0.961). It also confirms the validity of the instrument, suggesting only the exclusion of 3 items that do not seem to measure the concept. As such, the model represents a first step toward the systematic improvement of teaching creativity as part of computing education.


Introduction
In our globalized world, creativity plays a key role in all areas. Together with critical thinking and problem-solving, it is considered one of the main 21st-century skills (Voogt and Roblin, 2012). Consequently, creativity also plays an important role in K-12 education. Many curricula around the world mention creativity explicitly as a desired outcome (P21, 2020; Voogt and Roblin, 2012).
Creativity can be understood and defined in different ways depending on the context (Mellini et al., 2010). It can depend on culture, personal knowledge, and idiosyncratic skills, so different communities may have different notions of creativity (Amabile, 1982; Said-Metwaly et al., 2017). From the viewpoint of cognitive psychology (Matlin, 2014), creativity is related to the problem-solving field. It is generally defined as the capacity to generate new and useful ideas and solutions, that is, ones that are novel, appropriate, functional, correct, and valuable (Walia, 2019). Guilford (1950) characterizes creativity in terms of:
• Fluency: the ability to generate many ideas, which frees creativity.
• Flexibility: the ability to analyze a situation from a different angle, by combining different places, people, directions, and periods.
• Originality: the ability to generate unique or unusual products.
• Elaboration: the ability to attend to details, and to embellish and complete something creative.
Divergent thinking can be considered a type of creative thinking. Although the two are not the same, both lead to original ideas and solutions (Runco and Acar, 2012). In the context of 21st-century skills, Binkley et al. (2011) also consider creativity as the ability to create valid new ideas effectively. It involves being open to new ideas, diverse perspectives, and feedback, as well as understanding failure as a learning opportunity.
To represent the multifaceted nature of creativity, it is often classified into 4P's (Rhodes, 1961): Person, Process, Product, and Press.
• The Person strand involves aspects such as personality, traits, and attitudes, and addresses research questions about how to identify a creative person.
• The Process strand focuses on the thinking, motivation, and communication processes that lead to tangible results.
• The Product strand represents these tangible results of the creative process.
• The Press strand relates to whether the environment is favorable to people's creativity.
Each of the P's can be analyzed on its own or together with the others to provide a holistic insight.
Focusing on the assessment of creativity in the context of computing education, we emphasize Person aspects, which refer to the individual performing the creative act. This includes the personality, traits, and attitudes of the creative individual, such as creative self-concept, intrinsic motivation, and independence of judgment, as well as the individual's creative potential (Gruszka and Tang, 2017). When Process aspects are also included, a creative person is expected to:
• be sensitive to problems;
• have mental flexibility;
• think divergently;
• be able to redefine existing objects and concepts.
There are many ways to integrate the teaching of creativity into K-12, and one alternative is computing education. Computing education has become important because young people need to learn not only how to use Information Technology but also how to create new computational artifacts (CSTA, 2016). Teaching computing covers core concepts, such as algorithms and programming, as well as practices. It therefore has the potential to give students opportunities to extend their creativity by solving problems and creating computational artifacts (Yadav and Cooper, 2017; Romero et al., 2017). Computing education is already part of the K-12 curriculum in several countries. It is also offered through extracurricular initiatives to popularize computing competencies (Webb et al., 2017; Heintz et al., 2016; Hubwieser et al., 2015). Computing education is considered important for developing cognitive skills such as creative thinking (Scherer et al., 2019). However, there is little evidence confirming this contribution.
Most assessments of the impact of computing education in K-12 analyze the learning of specific skills, such as programming and/or computational thinking (Grover and Pea, 2013). They do not evaluate the development of other 21st-century skills. In computing education specifically, only very few approaches exist to assess creativity, for example, by analyzing programming artifacts within the educational context (Bennett et al., 2013; Manske and Hoppe, 2014).
One reason for this lack of assessment may be the complexity of the theoretical characterization of the creativity construct, which makes it difficult to assess. Several general creativity assessment models exist, including diverse tests, inventories, and judgments of the products created (Nakano, 2020). Among these, the Torrance Tests of Creative Thinking (TTCT) (Torrance, 1966) are among the best known. They associate the cognitive characteristics proposed by Guilford (1956) with emotional characteristics such as expression of emotion, fantasy, and movement. Other instruments include elements such as divergent thinking, analytical thinking, mental flexibility, associative thinking, tolerance for ambiguity, imagination, and inventive capacity (Nakano, 2020). However, there are no assessment models for this skill that focus on the Person in the context of computing education. Bolden et al. (2019) and Snyder et al. (2019) demonstrate the assessment of creativity across disciplines in K-12, but neither targets computing education specifically.
Therefore, the purpose of this article is to present the development of the SCORE (aSsessing COmputing cREativity) model for the assessment of creativity in computing education settings in K-12 adopting a self-assessment instrument that can be used to measure the impact of teaching computing. The model is evaluated in terms of the reliability and validity of the measurement instrument based on a case study conducted in Brazilian schools. The results of this study represent an initial step in order to provide support for the assessment of the impact of computing education in K-12 aiming at the development of creativity.

Related Work
As a result of a systematic mapping of the literature, we found only ten models aimed at the assessment of creativity in the educational context as shown in Table 1.
The majority of the models are based on well-founded and accepted research (Torrance and Goff, 1989; Guilford, 1967; Sternberg, 1985). Most models focus on evaluating higher education students, mainly in Computer Science and Engineering courses. Applications of three models were also found in Psychology and Educational Sciences courses. Many other models were found but were excluded because they were not situated in an educational context. Most of the models in an educational context target higher education. Only McKlin et al. (2018) and Soroa et al. (2015) also address the high school level. No approach for earlier educational stages was found.
In general, the models vary considerably in the factors of creativity they assess, which points to the lack of a standardized way to assess creativity. The most frequently considered factors are originality, fluency, flexibility, and curiosity (Fig. 1). There is a global effort to carry out creativity assessments, whether in K-12 or higher education. Yet, given the variety of sources and definitions used to develop the models, there still does not seem to be an agreement on how to evaluate this skill.
Most models use a Likert scale to answer the assessment questionnaire, followed by three models that use an ordinal scale and one model that uses a multiple-choice answer. They also differ considerably in the number of items in the questionnaire, ranging from 18 to 60 items as detailed in Table 2.
The majority of the models have been systematically developed based on previous work and/or theoretical models. With only two exceptions (Oihus et al., 2013; Romero et al., 2017), the models present the evaluation of the proposed measurement instrument in detail. On the other hand, one study only partially assessed its data collection instrument, and two others did not provide the data. Four of the studies analyzed both reliability and validity, while others focused exclusively on reliability. The results of these evaluations are consistent. Most report a Cronbach's alpha above 0.70, and three models report one above 0.90, indicating acceptable to excellent internal consistency of their items (McKlin et al., 2018; Runco et al., 2001; Kaufman, 2012).
Table 1 Existing assessment questionnaires of creativity in educational contexts
Reference | Name
(Auzmendi et al., 1996) | CT - Abedi-Schumacher Creativity Test
(Hass and Burke, 2016) | -
(Kaufman, 2012) | K-DOCS - Kaufman Domains of Creativity Scale
(McKlin et al., 2018) | Student Engagement Survey
(Oihus et al., 2013) | TestMyCreativity
(Romero et al., 2017) | Assessment Scale of Creative Collaboration
(Runco et al., 2001) | RIBS - Runco Ideational Behavior Scale
(Shell et al., 2013) | ECCI-i - Epstein Creativity Competencies Inventory for Individuals
(Soroa et al., 2015) | EDICOS - Emotion/motivation-related Divergent and Convergent thinking styles Scale
(Susnea and Vasiliu, 2016) | IACEST - Indirect Assessment of Creativity through the Estimation of Stereotypical Thinking
These results reveal some generic models for assessing creativity. However, none is available for computing education in K-12, especially at the elementary and middle school levels.

Research Methodology
To develop the SCORE model, we adopted a multi-method research approach. First, we elicited the state of the art through a systematic mapping study following Petersen et al. (2008), identifying existing approaches for the self-assessment of creativity in an educational context. Based on this literature review, we developed the SCORE model following the scale development procedure proposed by DeVellis (2016) and the questionnaire design guide by Kasunic (2005).
Adopting the Goal/Question/Metric (GQM) approach (Basili et al., 1994), we defined the assessment objective and systematically decomposed it into factors to be measured. The factors support the development of the measurement instrument (questionnaire). They were derived from a mapping study of their concepts following the procedure proposed by Budgen et al. (2008). The measurement of the factors is operationalized by decomposing them into measurement instrument items, which are based on other questionnaires found in the literature. We analyzed the pool of items in terms of similarity and redundancy, then customized and unified the selected items. To standardize them, all items were refined and transformed into positive statements. The response format was chosen from formats typically used in practice, following the scale development guide proposed by DeVellis (2016).
Face validity (Trochim and Donnelly, 2018) was analyzed by an expert panel. The panel was a multidisciplinary group of senior researchers with backgrounds in computing and/or statistics, together with representatives of the target audience. The review analyzed the clarity, relevance, consistency, and completeness of the SCORE measurement instrument items. Based on the suggestions of the experts and the young people, the wording and text formatting of the preliminary version were revised.
Table 2 Characteristics of the existing assessment questionnaires
Reference | Quantity of items | Scale type
(Auzmendi et al., 1996) | 60 | Multiple choice
(Hass and Burke, 2016) | 46 | 4-point Likert scale
(Kaufman, 2012) | 50 | 5-point Likert scale
(McKlin et al., 2018) | 18 | 5-point Likert scale
(Oihus et al., 2013) | 31 | 10-point Likert scale, multiple-choice, open questions
(Romero et al., 2017) | -- | 5-point ordinal scale
(Runco et al., 2001) | 23 | 5-point Likert scale
(Shell et al., 2013) | 28 | 5-point ordinal scale
(Soroa et al., 2015) | 30 | 6-point Likert scale
(Susnea and Vasiliu, 2016) | 20 | 5-point Likert scale
To evaluate the reliability and validity of the SCORE measurement instrument, we then conducted a case study following Yin (2009) and Wohlin et al. (2012). The self-assessment questionnaire was applied in a one-shot post-test-only design, without any treatment. We pooled the data collected at each school into a single sample for analysis. Reliability and construct validity were analyzed following the definitions of Trochim and Donnelly (2018) and the scale development guide of DeVellis (2016).
• Reliability: internal consistency was measured with Cronbach's alpha coefficient (Cronbach, 1951).
• Construct validity: analyzed using exploratory factor analysis and evidence of convergent and discriminant validity, obtained from the correlations between items (DeVellis, 2016; Trochim and Donnelly, 2018).
• Number of factors: a factor analysis determined how many factors underlie the SCORE items, following the process proposed by Brown (2006).
The results of the statistical analysis were interpreted by researchers in the context of computing education to identify the reliability and validity of the SCORE measurement instrument, as well as to propose improvements to the SCORE measurement instrument.

Development of the SCORE Model
The objective of the SCORE (aSsessing COmputing cREativity) model is to evaluate the creativity skills of students in the context of computing education in K-12 from the student's perception. Based on the creativity definition and general assessment models in the literature, we decomposed the abstract concept of creativity into a set of factors as presented in Table 3.
The target audience is students from elementary to high school. The model can be applied in different ways, depending on the type of study and the chosen research design:
• non-experimental studies with one-shot post-test designs, applied on specific occasions and/or after a treatment;
• non-experimental studies with one-shot pre-test/post-test designs, applied before and after a treatment;
• (quasi-)experimental studies involving control groups.
To measure the degree of the skills defined in Table 3, we developed a self-assessment questionnaire as the data collection instrument. We chose this kind of assessment because it is quick to administer and easy to score (Kaufman, 2019).
Its limitations stem from the subjectivity of the answers. Respondents may give answers they consider socially desirable, or untrue or exaggerated answers to appear better. In addition, many people do not accurately perceive their own creative skills, underestimating or overestimating them, and their answers may be shaped by a personal concept of creativity.
Yet the credibility of creativity self-assessment depends on how it is used, and it can closely approximate consolidated performance-based tests (Kaufman, 2019). Self-assessment can also estimate how something affects the way people feel about their own creativity. In many cases, it is the best available measure of personal beliefs and insights about creativity itself. Thus, although there is no consensus in the literature, there is evidence that self-assessment can produce reliable, valid, and useful data (Ross, 2006), especially when reliable and valid measurement instruments are used (Sitzmann et al., 2010). As a compromise, we developed a statistically validated measurement instrument to increase the validity and reliability of the self-assessment data (DeVellis, 2016; Kasunic, 2005).
The questionnaire items are based on the literature. Items for skills not covered by any of the models from the literature review are based on complementary references and/or our practical experience. The items were carefully worded for the target audience. We chose a 4-point Likert scale as the response format. This scale is typically used when respondents should take a position on the item, whatever that position may be (Losby and Wetmore, 2012).
… | To be able to overcome boundaries of accepted conventions and to not be afraid to make mistakes. | Hass and Burke, 2016; Oihus et al., 2013; Shell et al., 2013
Originality | To be able to produce unique or unusual ideas. | Auzmendi et al., 1996; McKlin et al., 2018; Runco et al., 2001
Fluency | To be able to generate many ideas to evaluate, research, and choose different solutions to a problem. | Auzmendi et al., 1996; Runco et al., 2001; McKlin et al., 2018
Flexibility | To be able to produce ideas that show a diversity of possibilities, through different points of view or domains of thought. | Auzmendi et al., 1996; Oihus et al., 2013; McKlin et al., 2018; Runco et al., 2001
Elaboration | To take care of details, beautifying, and completing something creative to make something real, understandable, or aesthetically pleasing. | Auzmendi et al., 1996
… | … | … (Auzmendi et al., 1996); I enjoy discovering new things (Rahimi et al., 2011)
7 | I am a curious person about how things work. | I am very curious (Susnea and Vasiliu, 2016); Is inquisitive at an early age; is inquisitive (Hass and Burke, 2016); I am a curious person (Martins-Pacheco et al., 2020); I am curious about the unknown (Rahimi et al., 2011)
8 | I can complete several things during the day. | Is productive (Hass and Burke, 2016)
9 | I question beliefs, customs, and traditions, for example, not to go under the stairs to avoid bad luck. | Questions societal norms, truisms, and assumptions (Hass and Burke, 2016)
Knowledge and skills expansion
10 | I like to learn new things. | It is important to me to continue my education throughout my life (Shell et al., 2013); I regularly read magazines or other material in a wide variety of subject areas; I often read books on topics outside my specialty (Shell et al., 2013)
11 | I am not afraid to learn new things. | I'm not afraid to learn new things (Shell et al., 2013)
12 | With the knowledge I have, I am able to solve a new problem. | I can adapt my previous skills to suit an unfamiliar task (Rahimi et al., 2011)
13 | I like to participate in extracurricular activities to learn new things (field research, lectures, courses). | I sometimes take courses on topics about which I know nothing at all (Shell et al., 2013)
14 | I go online several times to learn new things. | I regularly surf the Internet to expand my knowledge (Shell et al., 2013)
15 | I like to discuss matters by giving my opinion. | Debating a controversial topic from my own perspective (Kaufman, 2012)
… | … | … (Petty, 1997); I accept errors and therefore, I accept my mistakes and those of others (Romero et al., 2017)

Connection
19 | I can discover relationships between the use of computers and their impact on society. | I can discover different links and relationships (obvious and not so obvious) when I look at different information sources; I can find the connection between items (Fields and Bisshof, 2013)
20 | I can understand and interpret the type of problem to be solved, for example, how to do a math exercise. | Has the ability to understand and interpret his or her own environment (Hass and Burke, 2016)
… | … | Do you like going to the lab to do experiments? (Auzmendi et al., 1996); I want to develop my own game (Petty, 1997)
32 | I try to solve a problem on my own before asking someone. | When you face a class of problems that you are not used to, what do you do? (Auzmendi et al., 1996)
34 | I already did something using the computer that I never thought was possible. | I produced something in computing that I never thought was possible (McKlin et al., 2018)
35 | I think it is important to think about things in many different ways. | It is important to be able to think of bizarre and wild possibilities (Runco et al., 2001)
36 | I imagine many things that do not yet exist. | I invent/imagine a lot of things that not yet exist (Martins-Pacheco et al., 2020); is imaginative (Hass and Burke, 2016)
37 | I like to modify computer programs from programs that other people have shared. | --
38 | I have ideas on how to make new games and how to improve them. | I have ideas about new inventions or about how to improve things (Runco et al., 2001); I am considering how I can further improve my computer game (Petty, 1997)
Fluency
39 | I can imagine different solutions to solve a problem (for example, how to get to school faster). | Coming up with a new way to think about an old debate (Kaufman, 2012); Has the ability to change direction and use another procedure (Hass and Burke, 2016); I am able to solve a problem in different ways (Martins-Pacheco et al., 2020); I can simultaneously propose a variety of solutions to a specific problem (Fields and Bisshof, 2013); I look for different solutions to a computing problem (McKlin et al., 2018)
40 | I find it easy to write a story for a game. | Can you express your ideas well when you write? Do you find it easy to write narratives or stories? (Auzmendi et al., 1996); I find it easy to develop a strategy for a project (Rahimi et al., 2011)
41 | I can write a computer program. | Writing a ten-line poem would be easier for me (Auzmendi et al., 1996)
42 | When I grow up, I would like to work with something that involves thinking about several new ideas. | Would you like a job where you often have to think of new ideas? (Auzmendi et al., 1996)
43 | I can think of a list of things that require little money but can improve my school. | If you were invited to a city hall meeting to discuss problems in your city, would it be difficult to think of a list of problems?; Would it be difficult for you to help a school with limited resources to find new and interesting ideas for sports and games? (Auzmendi et al., 1996)
44 | I am able to explain a computer program to colleagues. | If you are with a group of friends and they asked you to talk to them about your experience for an hour, how do you think you would do that? (Auzmendi et al., 1996)
… | … | I am good at combining ideas in ways that others have not tried (Runco et al., 2001); I don't reject ideas with initial faults but find ways to make them work (Rahimi et al., 2011)
47 | I can think of new ways to use a pan. | Are you able to find different uses for things, that is, uses that are uncommon for them? (Auzmendi et al., 1996)
48 | I like to work on creating new things instead of doing repetitive exercises. | I like the kind of work that requires the creation and use of many new ideas (Auzmendi et al., 1996)
49 | I can find the materials I need to develop an idea. | I am resourceful and can find the materials I need (Rahimi et al., 2011)
50 | If a certain resource is not available, I try to find a solution with other available resources. | A valuable solution that responds to the situation constraints. An efficient solution that required a limited number of resources (Romero et al., 2017)
Elaboration
51 | I care about the details when I do something. | How much do you care about details when you do something? (Auzmendi et al., 1996); I care about detail and work well done (Romero et al., 2017)

52 | I pay attention to the colors and fonts used on the screen of a mobile application. | When you are interested in something, how much attention do you pay to details? (Auzmendi et al., 1996)
53 | After using an interesting mobile application, I like to talk to someone about it. | After watching a movie that impressed me, I think a lot about what happened in the movie and talk about it with someone (Auzmendi et al., 1996)
54 | When I'm interested in something, I pay attention to every detail. | How concerned are you with details when you do something?; When you are interested in something, how much attention do you pay to details? (Auzmendi et al., 1996); I consider important to examine the details of a complex problem (Soroa et al., 2015)
55 | When I do homework, I like to make it beautiful and decorated. | Has an appreciation for art, music, and so forth; has good taste (Hass and Burke, 2016)
56 | I like to make the screens of games or mobile applications that I create beautiful. | --
An expert panel reviewed a preliminary version of the questionnaire. The multidisciplinary panel consisted of 9 researchers with backgrounds in computing, education, design, and/or microelectronics, and 3 representatives of the target audience (young people aged 11 to 15 years). The participants reviewed each item for relevance and understandability, and the questionnaire as a whole for completeness and consistency. Based on this feedback:
• several items were reworded to improve understanding by the target audience;
• a few items were removed;
• others were split into separate items for a better representation.
The result is the 56-item questionnaire presented in Table 4.
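The sketch below shows how answers to the 56-item questionnaire could be turned into per-factor scores. Several parts are assumptions because the text does not specify them:
• the item-to-factor ranges are hypothetical, inferred from the item numbering and the factor names used in the evaluation;
• the English labels of the 4-point scale and their coding as 1 to 4 are assumed;
• the per-factor mean is one plausible scoring choice, not a rule stated above.

```python
# Hypothetical item-to-factor ranges, inferred from the item numbering and the
# factor names used in the evaluation; check them against Table 4 before use.
FACTOR_ITEMS = {
    "creative personality and curiosity": range(1, 10),
    "knowledge and skills expansion": range(10, 19),
    "connection": range(19, 23),
    "boldness": range(23, 31),
    "originality": range(31, 39),
    "fluency": range(39, 46),
    "flexibility": range(46, 51),
    "elaboration": range(51, 57),
}

# Assumed wording of the 4-point Likert scale, coded 1..4 (not given in the text).
LIKERT_4 = {"strongly disagree": 1, "disagree": 2, "agree": 3, "strongly agree": 4}


def factor_scores(answers):
    """Mean coded answer per factor; answers maps item number -> scale label.

    Unanswered items are skipped, and factors without any answer are omitted.
    """
    scores = {}
    for factor, items in FACTOR_ITEMS.items():
        values = [LIKERT_4[answers[i]] for i in items if i in answers]
        if values:
            scores[factor] = sum(values) / len(values)
    return scores


if __name__ == "__main__":
    print(factor_scores({1: "agree", 2: "strongly agree", 51: "disagree"}))
```

For example, `factor_scores({1: "agree", 2: "strongly agree"})` returns `{"creative personality and curiosity": 3.5}`. A summed score per factor would work just as well for the analyses that follow, since those analyses operate on individual items.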

Evaluation of the SCORE Model
To evaluate the reliability and validity of the measuring instrument of the SCORE model, we conducted a case study.

Definition of the Evaluation
The purpose of the evaluation was to assess the reliability and validity of the self-assessment questionnaire as a measurement instrument. To this end, the following questions are analyzed:
• Is there evidence of internal consistency in the measuring instrument?
• Is there evidence of convergent and discriminant validity in the measuring instrument?
• How do the underlying factors influence the responses to the items of the measuring instrument?
Data were collected by applying the questionnaire in a case study in K-12, without any specific treatment. Due to the pandemic, students answered a Brazilian Portuguese version of the questionnaire that was available only online. The study was approved by the Ethics Committee of the Federal University of Santa Catarina.

Execution of the Evaluation
The questionnaire was applied from March to April 2020. A total of 76 K-12 students aged 8 to 17 years participated in the research in six schools in the south of Brazil (Table 5).

Is there Evidence of Internal Consistency in the Measuring Instrument?
Internal consistency indicates whether all parts of an instrument measure the same characteristic, which can be analyzed using Cronbach's alpha coefficient (Cronbach, 1951). Cronbach's alpha coefficient indicates the degree to which a set of items measures a single factor. Cronbach's alpha values between 0.7 and 0.8 are acceptable, between 0.8 and 0.9 are good, and greater than 0.9 are excellent indicating an internal consistency of the instrument (Cronbach, 1951). The analysis of the questionnaire's reliability shows a Cronbach's alpha coefficient α = 0.961, indicating an excellent internal consistency of the items. We also analyzed Cronbach's alpha for each item if excluded, expecting that no item exclusion would cause an increase in Cronbach's alpha (Table 6). These results also show that none of the items affects the internal consistency of the assessment instrument, and, therefore, there is no indication for the exclusion of any of the items.
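The reliability check above can be sketched in plain Python. For k items, Cronbach's alpha is k/(k-1) * (1 - sum of item variances / variance of the respondents' total scores). The alpha-if-item-deleted analysis recomputes alpha with each item left out in turn. The function names and the row-per-respondent data layout are illustrative assumptions, not the tooling used in the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of k item scores per respondent (k >= 2)."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(rows):
    """Alpha recomputed with each item left out in turn (requires k >= 3)."""
    k = len(rows[0])
    return [cronbach_alpha([row[:i] + row[i + 1:] for row in rows]) for i in range(k)]


def interpret_alpha(alpha):
    """Bands as stated in the text: 0.7-0.8 acceptable, 0.8-0.9 good, > 0.9 excellent."""
    if alpha > 0.9:
        return "excellent"
    if alpha >= 0.8:
        return "good"
    if alpha >= 0.7:
        return "acceptable"
    return "below acceptable"
```

In this sketch, an item whose removal raises the value above the overall alpha would be a candidate for exclusion. This mirrors the alpha-if-item-deleted check reported above.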

Is there Evidence of Convergent and Discriminant Validity?
To obtain evidence of convergent and discriminant validity of the instrument, the correlations of the items were calculated (DeVellis, 2016). Convergent validity shows whether the items that should be related are related, while discriminant validity, on the other hand, shows whether the items that should not be related are not related.
We therefore computed Spearman's nonparametric correlation coefficient for each pair of items, resulting in a correlation matrix (Daniel, 1990). The coefficients were interpreted using Cohen's criteria (Cohen, 1988):
• above 0.29: moderate correlation (marked in green), considered satisfactory;
• above 0.50: high correlation (marked in blue);
• negative: divergent correlation (shown in red), which suggests that different factors are being measured.
The items of the factor "Creative personality and curiosity" show moderate and high correlations, with one item showing negative correlations. This item, "IT9: I question beliefs, customs, and traditions, for example, not passing under the stairs to be unlucky", correlates significantly only with item IT1. It also correlates divergently with items IT2, IT6, and IT8, which indicates that it seems to measure another factor. "IT7: I am a curious person about how things work" correlates well with almost all other items, except for "IT5: I can think of new ways to help people" (Table 7).
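The item-correlation analysis can be sketched as follows. Spearman's coefficient is computed as Pearson's correlation of the tie-averaged ranks, which handles the many tied answers on a Likert scale. Each pair is then labeled with the thresholds stated above. The helper names are hypothetical, and the significance testing mentioned in the analysis is omitted.

```python
def average_ranks(values):
    """Ranks 1..n; tied values (common with Likert answers) share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5  # undefined if an item has constant answers


def spearman(x, y):
    """Spearman's rho as Pearson's r of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def classify(rho):
    """Labels following the thresholds in the text (Cohen, 1988)."""
    if rho < 0:
        return "divergent"
    if rho > 0.50:
        return "high"
    if rho > 0.29:
        return "moderate"
    return "low"


def correlation_report(items):
    """items: dict item id -> answers, all in the same respondent order."""
    ids = list(items)
    report = {}
    for pos, a in enumerate(ids):
        for b in ids[pos + 1:]:
            rho = spearman(items[a], items[b])
            report[(a, b)] = (rho, classify(rho))
    return report
```

A call such as `correlation_report({"IT1": [...], "IT2": [...], "IT9": [...]})` returns one `(rho, label)` pair per item pair. This is the information that Tables 7 to 14 convey through color coding.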
The items of the factor "knowledge and skills expansion" show moderate and high correlations, indicating that they measure the same factor. Some items correlate well with all other items, such as "IT15: I like to discuss subjects giving my opinion". Only item "IT16: I know how to take advantage of praise and criticism when redoing school work" correlates divergently with item IT15 (Table 8).
Table 7 The correlation coefficient of creative personality and curiosity
The items of the factor "connection" also show good validity. Only item "IT20: I can understand and interpret the type of problem to be solved" has a low correlation with item "IT19: I can discover relationships between the use of computers and their impact on society" (Table 9).
The factor "boldness" demonstrates a low correlation between its items. The items "IT27: When I find a very difficult problem, I have the courage to try to solve it" and "IT28: I am not afraid of failing" when compared to "IT23: I like to do things the way I want", even demonstrate a divergent correlation, indicating that they seem not to measure the same factor. "IT30: I am not ashamed to talk about my ideas", presents a moderate correlation only with item IT23, as illustrated in Table 10.
In general, the items of the factor "originality" show a moderate correlation. The item least correlated is "IT35: I think it is important to think about things in several different ways", demonstrating a moderate correlation only with the item "IT32: I try to solve a problem myself before asking someone", as shown in Table 11.
The factor "fluency" shows good results regarding its validity. Most of the item pairs have a moderate to high correlation, especially item "IT45: I have ideas for mobile applications that I could develop", demonstrating a high correlation with the items IT41, IT43, and IT44. Only item "IT41: I can write a computer program", does not have a significant correlation with the other items, as shown in Table 12.
The factor "flexibility" also shows good results, with all items having at least some moderate correlations. However, item "IT49: I can find the materials I need to develop an idea" shows a low, almost zero, correlation with item "IT48: I like to work creating new things instead of doing repetitive exercises", as shown in Table 13.
Table 9 The correlation coefficient of connection
The factor "elaboration" shows moderate to high correlations among its items. Item "IT51: I care about the details when I do something" is the least correlated with the other items, as illustrated in Table 14.
In general, the analysis shows moderate to high correlations between the items of most factors, such as "originality" and "fluency", which indicates good internal correlation. Only the factors "creative personality and curiosity", "knowledge and skills expansion", and "boldness" had items with divergent correlations. Yet, most items show moderate to high correlations not only with the other items of the same factor but also with items of other factors. Examples include item "IT12: With the knowledge I have, I am able to solve a new problem" and item "IT13: I like to participate in extracurricular activities to learn new things (field research, lectures, courses)", which are highly correlated with almost every other item. This also indicates the cohesion of the measurement instrument as a whole, which ultimately aims to measure one concept, creativity.
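The pairwise screening described above can be illustrated with a minimal Python sketch. It assumes Pearson's r (this section does not state which coefficient was used; Spearman's rho would be a plausible alternative for Likert data), responses stored per item with students in the same order, and the 0.29 cutoff used in the Discussion. The names `pearson` and `flag_weak_pairs` are hypothetical, and reading "divergent" as a negative coefficient is our assumption.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long lists of item scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Raises ZeroDivisionError if an item has no variance (all answers equal).
    return sxy / sqrt(sxx * syy)


def flag_weak_pairs(responses, items, cutoff=0.29):
    """List the item pairs of one factor whose correlation is below the cutoff.

    responses: dict mapping an item id to its scores, one per student,
    with students in the same order for every item.
    Assumption: a negative r is labelled 'divergent', other weak pairs 'low'.
    """
    flagged = []
    for a, b in combinations(items, 2):
        r = pearson(responses[a], responses[b])
        if r < cutoff:
            flagged.append((a, b, round(r, 3), "divergent" if r < 0 else "low"))
    return flagged


# Hypothetical toy data (five students), not taken from the study.
toy = {"IT1": [1, 2, 3, 4, 5], "IT2": [2, 3, 3, 5, 5], "IT3": [5, 4, 3, 2, 1]}
print(flag_weak_pairs(toy, ["IT1", "IT2", "IT3"]))
# → [('IT1', 'IT3', -1.0, 'divergent'), ('IT2', 'IT3', -0.943, 'divergent')]
```

Running the function once per factor reproduces the kind of per-table inspection reported above; a rank-based coefficient could be swapped in without changing `flag_weak_pairs`.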

How do the Underlying Factors Influence the Responses of the Items of the Measuring Instrument?
A factor analysis was performed to confirm the number of factors underlying the 56 items of the instrument. To determine the number of factors to retain, we used the Cattell scree test, one of the most widely used techniques (Raîche et al., 2013). The test plots the eigenvalues of the factors in decreasing order against the factor number; the point where the steep decline levels off, called the "elbow", indicates the number of significant factors (Cattell, 1966). As illustrated in Fig. 2, the most significant change in the curve occurs between two and four factors, well below the 8 factors initially proposed. However, a sample of n = 76 is still considered small for a factor analysis with several factors (Comrey and Lee, 1992). Thus, considering that the first factor clearly stands out, indicating a predominant dimension, we decided to perform a factor analysis with one factor.
According to Comrey and Lee (1992), factor loadings of 0.3 or higher are considered acceptable; values below this cutoff may indicate that an item is not measuring the factor and needs to be revised. The greater the factor loading of an item, the more strongly it correlates with the factor. In general, most items presented a good factor loading (> 0.6), as presented in Table 15. Only three items, IT9, IT23, and IT30, presented factor loadings below the cutoff of 0.3. Reconsidering their correspondence to the specific application domain, and also taking into consideration the results of the correlation analysis, we therefore suggest excluding them. Item IT9 could be understood to be more related to critical thinking than to creativity. Item IT23 may not be formulated in a way that is understood correctly and may erroneously be interpreted as describing someone who is inconsiderate and does only what he or she wants. Item IT30 may also be more related to the trait of an outgoing personality than to "boldness" as part of the creativity trait.
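The scree test and the one-factor loadings can be sketched in Python without external libraries. This is a minimal sketch under stated assumptions: it takes an inter-item correlation matrix as input, obtains its eigenvalues with the cyclic Jacobi method (the values plotted in a scree plot such as Fig. 2), and approximates the one-factor loadings by the first principal component, that is, the first eigenvector scaled by the square root of its eigenvalue. The section does not state which extraction method was used; principal component loadings are a stand-in and tend to be somewhat higher than those of, for example, principal axis factoring. All function names are hypothetical.

```python
from math import sqrt


def jacobi_eigen(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + sqrt(theta * theta + 1))
                c = 1 / sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def scree_and_loadings(corr):
    """Eigenvalues in decreasing order (scree) and first-component loadings."""
    vals, vecs = jacobi_eigen(corr)
    first = max(range(len(vals)), key=lambda i: vals[i])
    v1 = [vecs[k][first] for k in range(len(corr))]
    sign = 1.0 if sum(v1) >= 0 else -1.0  # eigenvector sign is arbitrary
    loadings = [sign * x * sqrt(vals[first]) for x in v1]
    return sorted(vals, reverse=True), loadings


def low_loading_items(items, loadings, cutoff=0.3):
    """Items below the 0.3 cutoff attributed to Comrey and Lee (1992)."""
    return [item for item, l in zip(items, loadings) if l < cutoff]


# Hypothetical 3-item correlation matrix, not taken from the study.
corr = [[1.0, 0.7, 0.1], [0.7, 1.0, 0.1], [0.1, 0.1, 1.0]]
scree, loadings = scree_and_loadings(corr)
print([round(x, 3) for x in scree])  # → [1.727, 0.973, 0.3]
print(low_loading_items(["IT_a", "IT_b", "IT_c"], loadings))  # → ['IT_c']
```

For a 56-item matrix the same functions apply unchanged; in practice a numerical library would be preferable for speed, but the pure-Python version keeps the steps visible.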

Discussion
The results of the analysis show that, in addition to the exclusion of the three items identified in the factor analysis, no further reformulation of the questionnaire is necessary.
The correlation matrix indicates that most items have moderate to high correlations, particularly within the factors "fluency" and "elaboration", with almost all items showing a correlation above 0.29. Only a few exceptions show a divergent correlation, indicating that they do not measure the same factor. The item with the most divergent correlation is "IT9: I question beliefs, customs, and traditions, for example, not to go under the stairs so as not to be unlucky" of the factor "creative personality and curiosity". Also considering its low factor loading (below 0.3), we suggest its exclusion from the questionnaire.
The factor "boldness" has several items with correlations below 0.29, which is also reflected in the low factor loadings of "IT23: I like to do things the way I want to" and "IT30: I am not ashamed to talk about my ideas". Therefore, we also suggest excluding these two items.
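The combined criterion applied in this Discussion, a low or divergent inter-item correlation together with a factor loading below 0.3, can be expressed as a small filter. This is a hypothetical sketch: the function name, the input formats, and the choice to require both criteria at once are our assumptions, and the final decision described above also weighed the content of each item, which code cannot capture.

```python
def exclusion_candidates(loadings, weak_pairs, loading_cutoff=0.3):
    """Items that load below the cutoff and also appear in a weak or divergent pair.

    loadings: dict mapping item id -> loading on the single factor.
    weak_pairs: iterable of (item_a, item_b, ...) tuples, e.g. pairs with r < 0.29.
    """
    in_weak_pair = {item for pair in weak_pairs for item in pair[:2]}
    return sorted(item for item, l in loadings.items()
                  if l < loading_cutoff and item in in_weak_pair)


# Hypothetical values for illustration only, not taken from the study.
loadings = {"IT_a": 0.72, "IT_b": 0.21, "IT_c": 0.27}
weak = [("IT_a", "IT_b", -0.12, "divergent")]
print(exclusion_candidates(loadings, weak))  # → ['IT_b']
```

Applied to the 56-item instrument, removing the three items IT9, IT23, and IT30 leaves 56 − 3 = 53 items.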
The factor analysis was performed with only one factor, due to the small sample size and the scree plot showing a predominant first factor. Regarding its content, the SCORE model covers the factors most commonly used in related assessment models and, unlike these models, also adds items related to creativity in computing.
Despite the small sample, the analysis indicates that all items except the three to be excluded contribute to the measurement of the concept of creativity. Based on the results of the evaluation, we thus propose excluding these three items, resulting in a 53-item questionnaire.

Threats to Validity
As with any research, this study has limitations that may pose threats to its validity. Some threats are related to the design of the study. To mitigate them, we defined and documented a systematic research method. The SCORE model was defined by decomposing the evaluation objective, and the measuring instrument was developed following scale and questionnaire development methods.
Another threat concerns the quality of the data pooled into a single sample, in terms of data standardization (response format) and adequacy. As our study is limited exclusively to assessments that used the SCORE model, this risk is minimized, since the same data collection instrument was used in all applications. Another issue concerns pooling data from different contexts. To mitigate this threat, all case studies were conducted in similar contexts.
A limitation of our study concerns the assessment of creativity. As we adopted a non-experimental research design (case study), only a post-test self-assessment was applied to evaluate the students' perceived skills. No pre-test was applied; therefore, it was not possible to accurately determine any skill differences resulting from computing education. However, although there is no consensus regarding self-assessment, there is evidence that it provides reliable, valid, and useful information for this type of study (Sitzmann et al., 2010), particularly when a systematic, reliable, and valid assessment model is used.
A threat to external validity is related to the sample size and diversity of the data used for the evaluation. Regarding sample size, our evaluation used data collected from an application involving 76 students from six different schools. Although this sample is small for a factor analysis with several factors, it is considered satisfactory in terms of statistical significance, allowing reasonable results to be generated (Wohlin et al., 2012).
In terms of reliability, one threat concerns the extent to which the data and the analysis depend on the specific researchers. To mitigate this threat, we systematically documented the evaluation of the SCORE model, clearly defining the study objective, the data collection process, and the statistical methods used for data analysis. Another issue concerns the choice of appropriate statistical tests for data analysis. To minimize this threat, we performed the statistical evaluation following the guide for the construction of measurement scales proposed by DeVellis (2016), which is aligned with procedures for evaluating the internal consistency and construct validity of a measurement instrument (Trochim and Donnelly, 2018).
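Internal consistency is reported in this article as Cronbach's alpha, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items and the total score is the sum of a student's answers. The following minimal Python sketch implements this standard formula; the function name and the item-by-student list layout are our assumptions, and it is not the authors' analysis spreadsheet.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of items, each a list of scores per student.

    All items must list the students in the same order. Sample variances are
    used throughout; the ratio is the same with population variances.
    Raises ZeroDivisionError if all students have the same total score.
    """
    k = len(scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_var_sum = sum(variance(item) for item in scores)
    totals = [sum(student) for student in zip(*scores)]  # total score per student
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical answers of three students to two items, not from the study.
print(round(cronbach_alpha([[1, 2, 3], [2, 1, 3]]), 3))  # → 0.667
```

Because alpha grows with the number of items, a value as high as the one reported for a 56-item instrument is expected to partly reflect its length as well as the inter-item correlations.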

Conclusions
This article presents a model for the self-assessment of creativity in the context of computing education in K-12. Unlike other models, SCORE covers all factors defined by prominent frameworks and also adds items related to computing proficiency, reflecting the specific context of computing education. The evaluation of the SCORE model, based on a total of 76 responses, indicates high internal reliability (Cronbach's alpha = 0.961). Results regarding its validity also show that most items demonstrate moderate to high correlations. Furthermore, the results of a factor analysis with a single factor (chosen due to the small sample size) suggest the exclusion of three items, resulting in a 53-item questionnaire. We plan to continue the evaluation in future case studies, extending the application of the assessment model, as we believe that SCORE is an important instrument to promote the development of creativity, also in the Brazilian education context. To contribute in this respect, the instrument and the analysis spreadsheet in English and Brazilian Portuguese are available online: https://www.computacaonaescola.ufsc.br/en/score/.