Prior programming knowledge of students has a major impact on introductory programming courses. Those with prior experience often seem to breeze through the course. Those without prior experience see others breeze through the course and disengage from the material or drop out. The purpose of this study is to demonstrate that novice student programming behavior can be modeled as a Markov process. The resulting transition matrix can then be used in machine learning algorithms to create clusters of similarly behaving students. We describe in detail the state machine used in the Markov process and how to compute the transition matrix. We compute the transition matrix for 665 students and cluster them using the k-means clustering algorithm. We choose the number of cluster to be three based on analysis of the dataset. We show that the created clusters have statistically different means for student prior knowledge in programming, when measured on a Likert scale of 1-5.