Teaching Machine Learning in School: A Systematic Mapping of the State of the Art

Although Machine Learning (ML) is integrated today into various aspects of our lives, few understand the technology behind it. This presents new challenges to extend computing education early to ML concepts, helping students to understand its potential and limits. Thus, in order to obtain an overview of the state of the art on teaching Machine Learning concepts in elementary to high school, we carried out a systematic mapping study. We identified 30 instructional units mostly focusing on ML basics and neural networks. Considering the complexity of ML concepts, several instructional units cover only the most accessible processes, such as data management, or present model learning and testing on an abstract level, black-boxing some of the underlying ML processes. Results demonstrate that teaching ML in school can increase understanding and interest in this knowledge area as well as contextualize ML concepts through their societal impact.


Introduction
Artificial Intelligence (AI) has become part of our everyday life deeply impacting our society. For many countries, it has also become a major strategy to promote national competitiveness (Hiner, 2017). And, as the growth of lucrative AI career opportunities far outpaces the number of interested and capable job seekers, there is a growing need for AI-literate workers (Forbes, 2019).
Although the existence of AI is well known, hardly anybody understands the technology behind it (Evangelista et al., 2018). This lack of understanding also causes a misplaced fear about automation and AI, overshadowing its potential positive impact on society. Therefore, it is important to popularize a basic understanding of AI technologies (Touretzky et al., 2019a). This presents new challenges to computing education, providing students starting at an early age with an understanding of AI concepts to become not just consumers of AI, but creators of intelligent solutions (Touretzky et al., 2019b; Kandlhofer et al., 2016). Access to basic AI literacy can also reduce the danger of social or economic exclusion of certain groups of people, especially women and minorities. Furthermore, AI literacy may encourage more students to consider STEM careers and provide a solid preparation for higher education and their future career.
While there are many programs today that focus on coding and robotics, K-12 education still needs to embrace the teaching of AI concepts. According to AI4K12 (Touretzky et al., 2019c), this should cover five big ideas for a K-12 audience: perception, representation and reasoning, learning, natural interaction, and societal impact. Within this context, an important knowledge area is Machine Learning (ML) (Wollowski et al., 2016;Touretzky et al., 2019a). Machine Learning is the application of AI that provides systems the ability to automatically learn and improve from experience without being explicitly programmed (Royal Society, 2017). It powers a huge range of applications, from speech recognition systems to intelligent assistants, self-driving cars, healthcare, etc.
Teaching fundamental AI (including Machine Learning) concepts and techniques has traditionally been done only in higher education (Torrey, 2012;McGovern et al., 2011). And, although computing education is beginning to be included in K-12 education worldwide, these computing programs rarely cover AI content on this educational stage (Hubwieser et al., 2015). However, in recent years several initiatives and projects pursuing the mission of K-12 AI education have emerged. In this context, the AI for K-12 Initiative (Touretzky et al., 2019c) started to develop guidelines for K-12 AI education. The guidelines are based on a set of big ideas, including teaching computers to learn from data, the challenges involved in making AI agents interact naturally with humans, and the positive and negative effects of AI on society. New AI courses, tools, and tutorials are being launched for teaching AI in schools, in the USA, China, the UK, and elsewhere.
Yet, these efforts seem to be scattered, making it difficult to obtain an overview of existing instructional units, as existing reviews on teaching computing focus mostly on computational thinking (Lye and Koh, 2014; Grover and Pea, 2013; Heintz et al., 2016; Google, 2016), or related knowledge areas such as Software Engineering (da Cruz Pinheiro et al., 2018). Literature providing an overview of how to teach AI/ML in K-12 is basically nonexistent, as surveys on practices and teaching of AI focus on higher education only (Wollowski et al., 2016).
Thus, in order to analyze the question of whether and which instructional units are currently available for teaching Machine Learning in K-12, we conduct a systematic mapping study. The main contribution of this article is the mapping and synthesis of the characteristics of instructional units (IUs) for ML education from elementary to high school, regarding their content, context and the analysis of how they were developed and evaluated. Our results also show that it is possible and beneficial to introduce ML education in K-12. The overview can help instructors to select and/or curriculum developers to develop instructional units and we hope that the discussion can further foster the inclusion of ML education in K-12.

Artificial Intelligence Education in K-12
Although there have been some historical AI teaching initiatives in schools from the 1970s (Papert & Solomon, 1971;Kahn, 1977) and, even specifically involving neural networks, in the 1990s (Bemley, 1999), there has been a rapid expansion of computing education in K-12 worldwide over the last few years. Standardization of what K-12 students should know about computing has been supported by the development of several curriculum guidelines, such as the CSTA K-12 Computer Science Framework (CSTA, 2017). Many instructional units, software tools, and resources have been developed to make computing accessible for young students ranging from one hour of code programming exercises (code.org) to courses allowing them to learn core computing concepts while creating meaningful artifacts that have direct impact on their lives and their communities (Tissenbaum et al., 2019).
At the same time, AI has had an increasing impact on society. And although some countries, such as China, have mandated that all high school students learn about artificial intelligence (Jing, 2018), AI education for K-12 students is still not well defined. Existing computing curriculum guidelines such as the CSTA K-12 Computer Science Framework (CSTA, 2017) commonly cite AI only very briefly at the high school level.
In this context, the AI for K-12 Working Group (AI4K12), a joint initiative of the Association for the Advancement of Artificial Intelligence (AAAI) and the Computer Science Teachers Association (CSTA), aims at developing guidelines for teaching K-12 students about artificial intelligence. To frame these guidelines, "big ideas" in AI that every student should know are defined (Touretzky et al., 2019a):
1. Perception: Computers perceive the world using sensors. Students should understand that machine perception of spoken language or visual imagery requires extensive domain knowledge.
2. Representation and Reasoning: Agents maintain models/representations of the world and use them for reasoning. Students should understand the concept of representation and understand that computers construct representations using data, and these representations can be manipulated by applying reasoning algorithms that derive new information from what is already known.
3. Learning: Computers can learn from data. Students should understand that machine learning is a kind of statistical inference that finds patterns in data.
4. Natural Interaction: Making agents interact naturally with humans is a substantial challenge for AI developers. Students should understand that while computers can understand natural language to a limited extent, at present they lack the general reasoning and conversational capabilities of even a child.
5. Societal Impact: AI applications can impact society in both positive and negative ways. Students should be able to identify ways that AI is contributing to their lives and understand that the ethical construction of AI systems requires attention to the issues of transparency and fairness.
Thus, while AI is "the science and engineering of making intelligent machines that have the ability to achieve goals as humans do", Machine Learning (ML) is a subfield of AI that gives computers the ability to learn without being explicitly programmed (Mitchell, 1997). ML algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task. In accordance with AI4K12, Machine Learning concepts to be covered in K-12 education should include (Touretzky et al., 2019c): Limitations of machine learning.
And although there are currently significant efforts underway to address the need for AI curriculum guidelines (ISTE, 2018; AI4ALL, 2018), unlike the general subject of computing, when it comes to AI, there is still little guidance available for teaching at the K-12 level.

Machine Learning
Machine Learning is the training of a model from data that generalizes a decision against a performance measure (Mitchell, 1997).
ML algorithms can be classified into several broad categories by their learning style (Goodfellow et al., 2016). In supervised learning, the algorithm builds a mathematical model from a set of data that contains both the inputs and the desired outputs. Classification algorithms and regression algorithms are types of supervised learning. In semi-supervised learning, a combination of labeled data and unlabelled data is used in order to make better predictions for new data points than by using the labeled data alone. In unsupervised learning, the algorithm builds a mathematical model from a set of data that contains only inputs and no desired output labels. Unsupervised learning algorithms are used to find structure/patterns in the data, like grouping or clustering the data points into categories. Reinforcement learning algorithms are given feedback in the form of positive or negative reinforcement in a dynamic environment and are used, e.g., in autonomous vehicles.
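The difference between supervised and unsupervised learning can be illustrated with a minimal sketch. The toy data (fruit weights in grams) and the function names are hypothetical and chosen only for illustration; nearest-neighbor classification and two-cluster k-means stand in here for the broader algorithm families and are not taken from any of the reviewed instructional units.

```python
def nearest_neighbor_predict(train, x):
    """Supervised: return the label of the closest labeled training example."""
    closest = min(train, key=lambda pair: abs(pair[0] - x))
    return closest[1]


def two_means(values, iterations=10):
    """Unsupervised: group unlabeled 1-D values into two clusters (k-means, k=2).

    Assumes the values contain at least two distinct numbers.
    """
    c0, c1 = min(values), max(values)  # initial cluster centers
    for _ in range(iterations):
        g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        c0 = sum(g0) / len(g0)  # move each center to the mean of its group
        c1 = sum(g1) / len(g1)
    return sorted(g0), sorted(g1)


# Hypothetical toy data: inputs with desired outputs (labels) ...
labeled = [(110, "apple"), (120, "apple"), (130, "apple"),
           (900, "melon"), (1000, "melon")]
# ... and the same inputs without any labels.
unlabeled = [110, 120, 130, 900, 1000]

print(nearest_neighbor_predict(labeled, 950))
print(two_means(unlabeled))
```

The supervised function needs the labels to answer with a category name, whereas the clustering function only discovers that the values fall into two groups without knowing what the groups mean.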
Building ML applications is an iterative process that involves a sequence of steps, which typically include (Amazon, 2019):
1. Requirements analysis. During this stage, the main objective of the model and its target features are specified. This also includes the characterization of the inputs and expected outputs, specifying the problem.
2. Data management. During data collection, available datasets are identified and/or data is collected. This may also include the selection of available generic datasets (e.g., ImageNet for object detection), as well as specialized datasets for transfer learning. The type of data depends on the machine learning task (e.g., images, sound, text, etc.). They also vary greatly in terms of the number of instances, ranging from a few hundred to more than a billion instances. The data is prepared by validating and cleaning the data and can also be preprocessed, transforming the raw data. Data sets may be labeled in supervised learning by augmenting each piece of unlabeled data with meaningful tags manually assigned by users. The data set is typically split into a training set to train the model, a validation set to select the best candidate from all models and a test set to perform an unbiased performance evaluation of the chosen model on unseen data (Ripley, 2008).
3. Feature engineering. Often, the raw data (input variables) and answer (target) are not represented in a way that can be used to train a machine learning model. Therefore, feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. This may include feature transformation, feature generation, and selecting features from large pools of features, among others.
4. Model learning. Then a model is built or more typically chosen from well-known models that have been proven effective in comparable problems or domains (e.g., ModelZoo, 2019) by feeding the features/data to the learning algorithm. The quality of the model(s) is evaluated in order to understand how to iteratively improve its performance (e.g., in terms of high accuracy, lower error) by testing the model against previously unseen data (Tharwat, 2019). Hyperparameters, such as the number of training steps, learning rate, initialization values, and distribution, etc. are fine-tuned in order to improve performance.
5. Model evaluation. The quality of the model is evaluated in order to test the model, providing a better approximation of how the model will perform in the real world, e.g., by analyzing the correspondence between the results of the model and human labeling.
6. Model deployment. During the production/deployment phase, the model is deployed into a production environment to apply it to new incoming events in real-time.
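The data split and the roles of the three subsets described in the steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the synthetic dataset, the 60/20/20 split, and the two simple threshold "models" (`fit_midpoint`, `fit_min_positive`) are hypothetical and serve only to show where the training, validation, and test sets enter the process.

```python
import random


def split_dataset(examples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the examples and split them into training, validation and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def accuracy(threshold, examples):
    """Fraction of examples for which the prediction (x >= threshold) equals the label."""
    correct = sum((x >= threshold) == label for x, label in examples)
    return correct / len(examples)


def fit_midpoint(train):
    """Candidate model A: threshold halfway between the two class means."""
    pos = [x for x, label in train if label]
    neg = [x for x, label in train if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def fit_min_positive(train):
    """Candidate model B: threshold at the smallest positive training example."""
    return min(x for x, label in train if label)


# Hypothetical synthetic data: one numeric feature, label True when x >= 10.
data = [(x, x >= 10) for x in range(20)]
train, val, test = split_dataset(data)

# Model learning: fit each candidate on the training set only.
candidates = {"midpoint": fit_midpoint(train),
              "min_positive": fit_min_positive(train)}
# Model selection: pick the candidate that performs best on the validation set.
best_name = max(candidates, key=lambda name: accuracy(candidates[name], val))
# Model evaluation: report performance on the held-out test set.
test_accuracy = accuracy(candidates[best_name], test)
print(best_name, test_accuracy)
```

Keeping the test set out of both fitting and selection is what makes the final accuracy an estimate of performance on unseen data.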
There are a number of programming languages that are popular for machine learning. Among them, Python is the most popular language followed by Java, R, and C++ (Tricon Infotech, 2019). Especially in the context of K-12 computing education, block-based programming languages are used (Weintrop, 2019). These environments improve learnability for novices by favoring recognition over recall; reducing cognitive load by chunking computational patterns into blocks; and using direct manipulation of blocks to prevent errors and enhance understanding of program structure (Bau et al., 2017). Several of these block-based programming environments also provide extensions for the development of machine learning solutions, such as for App Inventor, Scratch or SNAP!.

Development and Evaluation of Instructional Units
An instructional unit is a set of classes (courses, workshops, etc.) designed to teach certain learning objectives to a specific target audience. It consists of a set of instructional materials for both teachers and students designed to provide learning opportunities in a specific context (Hill et al., 2005).
Instructional units are typically developed in a systematic way using instructional design (Branch, 2009), in order to make the acquisition of competencies more efficient, effective, and appealing. Instructional design defines an iterative process of planning learning objectives, selecting instructional strategies, selecting or creating instructional material, and applying and evaluating instructional units. During the analysis phase, the learning needs are identified. As part of the analysis, the goals and objectives of the instructional unit are determined and the target audience is characterized. Other influencing factors, such as human and technical resources, infrastructure, cost and time, are also analyzed. During the design phase, the learning objectives of the instructional unit are specified. The content to be addressed is defined and sequenced, and the instructional methods to be used are defined. Instructional methods may include lectures, demonstrations, exercises, problem-solving activities (labs), online interactive tutorials, serious games, unplugged activities, etc. It is also defined how the students' learning will be assessed. During the development phase, the material that will be used during the instructional unit is selected and/or created in accordance with the defined instructional strategies. This step may also involve the selection and/or development of tools to support the instructional unit such as code analyzers. The implementation phase covers the preparation of the learning environment, the training of the instructors and the application of the IU in the classroom.
An essential step in the instructional design process is the evaluation of the instructional unit in order to assess its quality and whether it allows the students to achieve the defined objectives (Branch, 2009). This evaluation is typically performed through an empirical study (Wohlin et al., 2012), ranging from non-experimental studies (such as case studies) to experiments (Shadish et al., 2002). Several types of data collection instruments can be used, such as observation, questionnaires, interviews, or the artifacts created by the students themselves as well as test results (Branch, 2009). According to the objective of the evaluation and the nature of the data collected, different methods of qualitative or quantitative analysis can be used (Freedman et al., 2007). The analyzed data are then interpreted, answering the analysis questions in order to achieve the evaluation goal.

Definition and Execution of the Systematic Mapping Study
To elicit the state of the art and practice on whether and how Machine Learning education is addressed from elementary to high school, we conducted a systematic mapping study following the procedure proposed by Petersen et al. (2008).

Definition of the Review Protocol
The research question is: What instructional units exist for teaching Machine Learning concepts in the context of elementary to high school (and what are their characteristics)? This research question is refined in the following analysis questions: AQ1. Which IUs exist? AQ2. Which Machine Learning concepts are taught in the IUs? AQ3. What are the instructional characteristics of the IUs? AQ4. How were the IUs developed and how was the quality of the IUs evaluated?
Inclusion/exclusion criteria. We considered any instructional unit (course, activity, tutorial) that focuses on teaching computing including ML concepts in elementary to high school published between 2009 and 2019. Instructional units that focus on teaching ML in higher education and/or instructional units for computing teaching without addressing ML concepts were excluded. We also excluded publications such as blogs, videos, or tools that do not provide an instructional unit.
Quality Criteria. We considered only articles or material for which substantial information regarding the teaching of ML concepts, indicating, for example, lesson content, instructional material, etc., was freely available.

Data source. We examined all published English-language articles or material that are available on the Web via the most important digital libraries and databases in this field (including ACM Digital Library, IEEEXplore, Scopus) with free access through the CAPES Portal. To increase coverage, we also used Google, which indexes a large set of data across several different sources (Haddaway et al., 2015), as in this emergent area several instructional units have not been published as scientific articles. Observing also the research focus at the MIT Media Lab in this area, we also searched for publications of this research group. We have also included secondary literature that has been discovered based on the primary literature found in order to obtain more detailed information.
Definition of the search string. The search string was composed of concepts related to the research question, including also synonyms, as indicated in Table 1. From these keywords, the search string was calibrated and adapted according to the specific syntax of the data source as presented in Table 2: (teach* OR education OR course OR MOOC OR learn*) AND ("machine learning" OR "data science" OR "artificial intelligence" OR "deep learning") AND ("k-12" OR school* OR kids OR children OR teen*)
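The logic of the search string (three groups of alternatives joined by AND, with `*` as a wildcard suffix) can be sketched as a simple text filter. This is only an approximation under stated assumptions: the regular expressions and the function name `matches_search_string` are hypothetical, and the actual matching rules differ between digital libraries.

```python
import re

# Each group of the search string as a regular expression; `*` becomes `\w*`.
TEACHING = r"\b(?:teach\w*|education|course|mooc|learn\w*)\b"
TOPIC = r"(?:machine learning|data science|artificial intelligence|deep learning)"
AUDIENCE = r"(?:\bk-12\b|\bschool\w*|\bkids\b|\bchildren\b|\bteen\w*)"


def matches_search_string(text):
    """Return True when the text satisfies all three AND-ed groups."""
    lowered = text.lower()
    return all(re.search(pattern, lowered)
               for pattern in (TEACHING, TOPIC, AUDIENCE))


# Hypothetical titles used only to exercise the filter.
titles = [
    "Teaching machine learning in high school",
    "Learning analytics with artificial intelligence for personalized learning in schools",
    "Deep learning for protein structure prediction",
]
for title in titles:
    print(matches_search_string(title), title)
```

The second hypothetical title satisfies all three groups although it concerns the use of AI for education rather than the teaching of ML, which is one reason such a string can return many results that must be screened manually.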

Search Execution
The search was carried out in October 2019 by the first author and revised by the co-authors (Table 3). Several searches returned a large number of results even after a calibration of the search string. This is due to the fact that articles describing how to use AI techniques for education, such as learning analytics for personalized learning, correspond to the same search terms. Therefore, maintaining the search string, we limited the analysis to only the most relevant results.
In the first analysis stage, we quickly reviewed titles and abstracts to identify papers that matched the inclusion criteria, resulting in 98 potentially relevant artifacts. In the second stage, the materials were fully read to check their relevance with respect to our inclusion/exclusion criteria. Many articles were excluded due to their focus on using AI for education, or their focus on "deep learning" as a cognitive activity in the learning process. We also excluded artifacts related to other educational stages (pre-school or higher education) (Williams et al., 2019a; Williams et al., 2019b; Park et al., 2019; Bennett, 2017; Estevez et al., 2019) and the ones covering AI, but not machine learning (e.g., CSUnplugged, 2015; AI4ALL, 2019; Ali et al., 2019; Parsons and Sklar, 2004; MIT, 2019). Furthermore, we excluded material only consisting of videos explaining ML (CS4fn, 2019), tools (Agassi et al., 2019; Makeblock, 2019) or demos (such as Google Teachable Machine (Google, 2017)). We also excluded articles that do not provide substantial information on the instructional unit on Machine Learning (e.g., Kandlhofer et al., 2016). Duplicates were eliminated and articles describing the same instructional unit were unified. As a result, 30 instructional units were considered relevant, as shown in Table 4.

Which Instructional Units Exist?
As a result of the research, a total of 30 instructional units covering the teaching of Machine Learning in elementary to high school were identified (Table 4). Some instructional units focus exclusively on Machine Learning, whereas others approach ML concepts as part of a more comprehensive AI and/or programming/software engineering course. This shows that so far very few IUs approach Machine Learning education in K-12. Most of them are also very recent, due to the increasing importance of AI/ML as well as the increasing trend of computing education in K-12 worldwide (Fig. 1).

Which Machine Learning Competencies are Taught in the IUs?
The IUs teach competencies varying from presenting what ML is to specific ML techniques as well as the impacts of ML. Among the topics most frequently approached by the IUs are artificial neural networks and an introduction to what learning is (Fig. 2). Several IUs also present other ML algorithms such as decision trees and/or instance-based algorithms, typically using unplugged activities. A few IUs also approach the topic of social implications and ethical concerns.
The majority of the IUs focus on supervised learning algorithms (Fig. 3); only very few approach other types of learning.
And, although several IUs approach the topic of neural networks, they typically present this content in an abstract way and/or through practical applications. We also observed that the degree of abstraction of the ML concepts varies between the IUs. Whereas some IUs only teach a general understanding of ML mechanisms and its applications, most IUs cover one or more ML algorithms typically by presenting an example, demonstration or hands-on activity in order to provide a deeper understanding.
A general strength observed in the encountered IUs is their strong focus on demonstrating the application of ML in practice, typically presenting various application examples in order to gain the attention of the students (Fig. 4). This includes mainly the demonstration of the application of ML for classification in computer vision tasks, such as facial or gesture recognition (Hitron et al., 2019) for diverse domains, including recycling, biology, etc. Several IUs present various application domains (e.g., Zhu, 2019) including also sentiment analysis for examples of tweets, conversational AI (e.g., creating Alexa skills (Van Brummelen and Abelson, 2018)), robotics or games (e.g., Zhu, 2019). Some units also integrate ML into robotics activities, such as creating a self-learning lawn bowling robot (Ho and Scadding, 2019) or running toy cars on a physical track (Narahara and Kobayashi, 2018).
The IUs also vary largely in terms of the levels of learning they are designed to achieve in accordance with Bloom's Taxonomy (Bloom et al., 1956). Several instructional units focus exclusively on lower learning levels (remembering and understanding), whereas some IUs also approach the level of synthesis, taking students to create their own ML model. On this level, various IUs adopt a computational action approach (Tissenbaum et al., 2019) aiming at the development of an ML solution for a problem in the community (AI Family Challenge, 2019; Apps For Good, 2019b). Few IUs approach the highest level of learning, evaluation, by making judgments based on evidence of different ML models or techniques and/or how training data influences learning.
Observing the complexity of ML concepts, several IUs cover only the most accessible processes, such as data management (e.g., Mobasher et al., 2019; Srikant and Aggarwal, 2017). On the other hand, a considerable number of IUs also cover model learning and testing, yet at very different levels of depth. Most of these IUs present several ML concepts only on an abstract level, black-boxing some of the underlying ML processes. In these cases, the model learning process may be approached by only executing a pre-defined model learning process without any need for further interaction (e.g., ML4Kids, 2019). Very few IUs systematically introduce ML performance measures, such as a correctness table or confidence graph, and accuracy is often presented in a more superficial way. Only a small number of IUs also include the deployment of the created ML models, for example as part of games or mobile applications.
Different ML frameworks or tools are used at this educational stage, aiming at abstracting several stages and the complexity of ML models (Fig. 6). For example, ML4Kids (2019) provides an abstract interface permitting young people to easily train a neural network. On the other hand, several IUs directly use general ML frameworks such as TensorFlow and Jupyter Notebooks that are not specifically developed for this educational stage.
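As a minimal sketch of what such performance measures compute, the following example derives accuracy and a table of correct and incorrect predictions per class. It assumes that a "correctness table" is comparable to a confusion table; the labels and predictions are hypothetical and are not taken from any of the reviewed IUs.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that equal the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical results of an image classifier on six test images.
true_labels = ["cat", "cat", "cat", "dog", "dog", "dog"]
predicted = ["cat", "cat", "dog", "dog", "dog", "cat"]

table = confusion_counts(true_labels, predicted)
for (truth, guess), count in sorted(table.items()):
    print(f"true={truth} predicted={guess}: {count}")
print("accuracy:", accuracy(true_labels, predicted))
```

The table shows which classes are confused with each other, information that a single accuracy value hides.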
As typically used in computing education in K-12, IUs on ML also adopt predominantly block-based programming languages such as Scratch (6 IUs), Snap! (1 IU) or App Inventor (5 IUs). Six IUs also directly use Python.
Hands-on activities of the IUs mostly work with image data for classification tasks. These vary from paper images in unplugged activities to digital images ranging from Disney princesses and faces to chocolate chip cookies.  Some units focusing on sports-related themes also use time series of images/acceleration graphs for the classification of gestures. For example, by applying ML to sports, students collect data from their own bodies using wearable sensors playing softball (Zimmermann-Niefield et al., 2019a). Several IUs adopting a computational action approach (Tissenbaum et al., 2019) in open-ended project-based activities leave the type of image used open depending on students' choice of the application domain. Other IUs also use datasets based on texts (e.g. tweets), audio clips, genes, etc. During the IU presented by Sakulkueakulsuk et al. (2018), students collect data on features (skin color, texture, etc.) of mango fruits.

What are the Instructional Characteristics of the IUs?
As the teaching of ML competencies is currently not typically included in computing education, the majority of the IUs are proposed as extracurricular activities, workshops, courses, summer camps, challenges or individual activities. Only MIT (2019)
Most of the IUs are focused on teaching ML in high school (Fig. 9). Also, the fact that several IUs are available for the elementary and/or middle school level indicates that the insertion of ML education can be beneficial already at these earlier educational stages.
Very few IUs focus on specific groups of students such as girls (Vachovsky et al., 2016), underrepresented groups in computing by targeting economically disadvantaged, African American, Hispanic, and female students (Mobasher et al., 2019) or specifically at further education (Apps for Good, 2019a) (Apps for Good, 2019b). The AI Family Challenge (2019) is designed for families, teaching AI not only to the children but also to other family members.
The duration of the IUs varies largely from short and focused activities (45 minutes) to long-term courses of 100 hours, yet with the majority being rather short units of a few lessons. Several initiatives also offer instructional units of different durations, such as a one-day taster workshop (Apps for Good, 2019a) as well as a 12-session course (Apps for Good, 2019b).
With respect to the instructional methods, there is a strong predominance of active learning approaches aiming at the achievement of learning objectives on the application level. These range from well-defined tasks for which an expected solution exists to ill-defined problems without a previously known solution, which aim at a higher cognitive level by leading the students to create their own practical solution.
We also encountered a considerable number of IUs using unplugged activities adopting diverse materials for activities teaching mostly data management (partly supported by spreadsheet tools) or decision tree algorithms (e.g., Curiositymachine, 2019). Other activities also explore how biology and specifically animal brains can be the inspiration for a new way to program computers using paper cards (CS4FN, 2011). Another unplugged example is "Be the machine" (Fryden curriculum, 2019), a team role-playing game that teaches how ML works, in which each member of the team assumes a different role to manually train an ML model.
Although focusing more on active learning, several IUs also include other direct instructional methods such as lectures, videos, and demonstrations, especially in the initial part of the IU as well as for the foundations of neural nets (Fig. 10). Examples include the Digit Classifier Tool, Drawing Completion Tool, Teachable Machine, and Tensorflow Playground. Interactive methods such as challenges and discussions were also used. Apps for Good (2019a) also uses case studies to achieve an understanding of ML. Vachovsky et al. (2016) and Mobasher et al. (2019) also included invited talks with professionals from IT companies and/or field trips in order to amplify the students' perspective on ML.
According to this variety of instructional methods, several types of instructional material are adopted (Fig. 11). Instructional videos, tutorials, etc. are specific to IUs designed as online courses. Several IUs also use worksheets to record the students' experiences. However, in general, we observed a lack of information regarding the instructional material, their availability and license, which makes it difficult for others to use them. With only one exception the materials are available in one language only (predominantly in English), which may also limit a broader adoption of IU in other countries that require instructional material in the native language at this educational stage.
The majority of the IUs do not cover the assessment of the students' learning. Only AI Family Challenge (2019) and AIinSchools (2019) propose a rubric/assessment sheet for a performance-based assessment analyzing artifacts created by the students. Sakulkueakulsuk et al. (2018) allocate scores based on the accuracy of the ML models developed. As an alternative, AI Family Challenge (2019) and Elements of AI (2019) also adopt quizzes or exercises for the students' assessment.

How Were the IUs Developed and Evaluated?
To achieve effective learning outcomes, IUs need to be developed systematically following instructional design models. However, we observed a general lack of information in relation to the way the IUs were developed. Very few publications mention any information on this issue. Most IUs were evaluated by means of a case study (Fig. 12). In these studies, the evaluation was systematically defined and, during and after the treatment (teaching ML), data was collected in relation to the objective of the evaluation. Only one study adopted a more rigorous research design. Hitron et al. (2019) conducted an experiment comparing the students' understanding in three conditions: learning activity uncovering Data Labeling only, Evaluation only, or both. Two IUs indicate a more informal way of evaluation (ReadyAI, 2019; Sperling and Lickerman, 2012), without detailed definition. In addition, no information on evaluation was found for a considerable number of IUs.
Most studies evaluate more than one quality factor (Fig. 13). Learning is the most evaluated quality factor. This shows that, in fact, the main concern is the learning effect provided by the IUs. Several studies also assess the degree of interest in a STEM/computing career motivated by the IU. Besides evaluating the impact of the IUs, several evaluations also included the measurement of feedback on the IU itself as well as the observed strengths and weaknesses.
Fig. 11. Types of instructional material used.
Data regarding the evaluation is collected in several ways (Fig. 14). Most of the data is collected via questionnaires at the end of the IU. A few studies also extract data from the performance-based assessment of artifacts created by students during the IU, or from tests, interviews or observations.
Given the less rigorous research designs adopted, most studies perform only qualitative data analyses and/or descriptive quantitative analyses. Only three studies report the usage of statistical tests (Cognimates.me, 2019; Vachovsky et al., 2016; Hitron et al., 2019). Evaluations were performed with samples ranging from 9 to 7500+ participants, but the majority used rather small samples of fewer than 50 participants.
In general, we observed a lack of information on how the IUs were developed and evaluated, indicating the need for a more systematic adoption of methods for the development of such instructional units.

Discussion
Considering the recentness of ML, we were surprised to already encounter 30 instructional units aiming at teaching ML concepts in schools. As most of these were developed in 2019, we also expect this number to increase further in the near future.
That these IUs mostly focus on beginners at any educational stage from elementary to high school also indicates a recognition of the value of exposing students to ML concepts early, rather than only in high school as typically indicated by general computing curriculum guidelines.
As this is an emergent topic, most of the IUs are proposed as extracurricular units ranging from 1-hour taster workshops to semester-long courses. By providing diverse instructional materials for free, they also facilitate their application in practice. Several IUs also provide customized frameworks and tools in order to teach ML at this educational stage using, e.g., block-based programming environments. However, as most IUs are so far only available in English, this may hinder their direct application in other countries. Another issue is an almost complete lack of information on the assessment of the students' learning, which is important as feedback to the learner and instructor in order to guide the learning process. The IUs teach competencies varying from presenting what ML is to specific ML techniques as well as the impacts of ML. However, we observed that several IUs present ML concepts only on an abstract level, black-boxing some of the underlying ML processes even as part of hands-on activities in order to reduce complexity. In some cases, this high level of black-boxing may limit the students' ability to explore and construct mental models of ML (Hmelo and Guzdial, 1996), as also pointed out by Hitron et al. (2019). Therefore, adopting non-black-boxed processes may be imperative to acquire an effective understanding of ML. On the other hand, considering the complexity of ML, it is also important not to overwhelm novice learners (Resnick et al., 2000). It will thus be important to identify a balance between black-boxed and uncovered processes as well as a learning sequence based on the complexity of the concepts. As some ML concepts seem more accessible than others, it seems important to analyze their difficulty using statistical methods such as Item Response Theory (DeMars, 2010) in order to systematically guide the scaffolding process.
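As one illustration of how the difficulty of ML concepts might be quantified with Item Response Theory, the following is a minimal Python sketch of the one-parameter logistic (Rasch) model, in which the probability of a correct response depends on the difference between a learner's ability and an item's difficulty. The function names, the item labels and the response data are hypothetical and are not taken from any of the reviewed IUs. The difficulty estimate shown (the negative logit of the proportion of correct responses) is only a crude first approximation, not a full IRT calibration.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def rough_item_difficulty(responses: list[int]) -> float:
    """Crude difficulty estimate: negative logit of the proportion correct.

    `responses` holds 1 (correct) or 0 (incorrect) per student. This is a
    simplifying assumption for illustration, not a proper IRT estimation.
    """
    if not responses:
        raise ValueError("at least one response is required")
    p = sum(responses) / len(responses)
    if p <= 0.0 or p >= 1.0:
        raise ValueError("need both correct and incorrect responses")
    return -math.log(p / (1.0 - p))


# Invented responses of four students to two hypothetical assessment items.
hypothetical_responses = {
    "item_a": [1, 1, 1, 0],  # mostly answered correctly
    "item_b": [1, 0, 0, 0],  # mostly answered incorrectly
}

difficulties = {
    name: rough_item_difficulty(answers)
    for name, answers in hypothetical_responses.items()
}

# Order items from easiest to hardest as a possible teaching sequence.
ranked = sorted(difficulties, key=difficulties.get)
```

Under these assumptions, items with a lower estimated difficulty would be candidates for earlier placement in a learning sequence, while harder items might call for more scaffolding. A real analysis would fit the model to a sufficiently large sample with dedicated IRT software.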
A general strength observed in the encountered IUs is their strong focus on demonstrating the application of ML in practice, typically presenting various application examples in order to gain the attention of the students. Furthermore, several IUs also cover how to apply ML concepts to practical problems, with respect to the most diverse tasks from the context of the students, ranging from the classification of Disney princesses to the feature extraction of mango fruits for classification. However, only a few IUs go so far as to guide the students to develop their own ML solution for a problem in the community, adopting a computational action approach (Tissenbaum et al., 2019).
In addition, it is possible to observe a concern with the social aspects involved in the application of AI concepts during the practical activities. Some studies lead the student to reflect on the usage of AI in today's society (Elements of AI, 2019; Tang, 2019). Others address moral issues and the impact of AI on humans (AIinSchools, 2019; Apps for Good, 2019a; ReadyAI, 2019; Touretzky et al., 2019c). Some studies also focus on the democratization of Machine Learning/Artificial Intelligence teaching, in order to impact society not only through the content but also through the approach used, seeking to involve minorities (Mobasher et al., 2019; Vachovsky et al., 2016; Van Brummelen, 2019).
Another issue we observed is the lack of support for the training of instructors in order to prepare them adequately for the application of the IUs in the classroom. Apart from a few IUs providing lesson plans and guides, no further training is provided as part of the IUs. Taking into account that today there is a lack of K-12 teachers with a computing background, most computing education is applied in a multidisciplinary way by teachers trained in other disciplines. Therefore, the motivation and training of in-service teachers become essential for a larger-scale adoption of ML education. This includes not only computing and ML knowledge but also knowledge of relevant pedagogical and technological content.
In general, we observed a lack of systematic presentation of the IUs and the way they were developed and evaluated. As many have not been published as scientific articles, no further information on their impact is available, which leaves the achievement of the learning goals questionable. However, considering the recentness of this topic and the large increase of IUs just this year, we expect more rigorous studies soon. The systematic development of such IUs will also be further supported by the development of curriculum guidelines currently underway.
Threats to validity. Some threats may affect the validity of our mapping study. We, therefore, identified potential threats and applied mitigation strategies in order to minimize their impact. Systematic mappings may suffer from the common bias that positive outcomes are more likely to be published than negative ones. However, we consider that the findings of the articles, whether positive or negative, have only a minor influence on this systematic mapping since we sought to characterize the approaches rather than analyze their impacts on learning.
Another risk is the omission of relevant studies. In order to mitigate this risk, we carefully constructed the search string to be as inclusive as possible, considering not only core concepts but also synonyms. Furthermore, considering the recentness of the topic, we also searched for any IU available online, not only considering scientific articles, in order to reduce the risk of excluding existing IUs. On the other hand, our observation that most IUs are available in one language only (predominantly English) may be due to the fact that our search, using an English search string, only returned IUs available in English.
Threats to the selection of relevant IUs and data extraction were mitigated by providing a detailed definition of inclusion/exclusion and quality criteria. We defined and documented a rigid protocol for the study selection and all authors performed the selection together, discussing the selection until consensus was reached. Data extraction was hindered in some cases, as the relevant information was often not presented explicitly and, therefore, in some cases had to be inferred. However, this inference was made by the first two authors and carefully reviewed by the third author.

Conclusion
In this article, we present the state of the art and practice of teaching Machine Learning in elementary to high school. We have identified 30 IUs mainly focused on beginners for any of these educational stages. The rapid increase of IUs developed this year indicates the importance of this topic. As this is an emergent topic, most of the IUs are proposed as extracurricular units ranging from 1-hour taster workshops to semester-long courses. The IUs teach competencies varying from presenting what ML is to specific ML techniques as well as the impacts of ML, with an emphasis on artificial neural networks. Given the complexity of ML concepts, several IUs cover only the most accessible processes, such as data management, or cover model learning and testing on an abstract level, black-boxing some of the underlying ML processes. The IUs provide diverse instructional materials available for free as well as customized frameworks and tools in order to teach ML at this educational level, using, e.g., block-based programming environments as well as Python and general ML frameworks. As a result of our study, we thus expect to contribute to the mapping of these emergent IUs, facilitating the teaching of ML in practice. However, given the lack of teacher training and of information on the development and evaluation of these IUs, it also becomes obvious that there is a need for further research in this area.