We investigate whether Q-learning, a well-known machine learning algorithm, can be applied in the domain of a Virtual Learning Environment (VLE). In this domain it is important that an algorithm learns its optimal values quickly, that is, in a small number of iterations. The agent plays the role of the teacher in the VLE. Over time it moves through different states and decides which action to take to move from the current state to the next. Some actions are more efficient than others. The sequence of transitions ends in a final (goal) state, the one that provides the agent with the largest possible benefit. The best course of action is to reach the goal state with the maximum available return. This paper introduces a way of defining a rewards matrix that achieves the maximum tolerance to changes in the discounted reward value. It also proposes a way of applying Q-learning that yields a teaching policy that maps the situation in the learning environment.
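To make the setting concrete, the following is a minimal sketch of tabular Q-learning driven by a rewards matrix. It is an illustration under stated assumptions, not the rewards matrix or the procedure proposed in this paper. The four-state chain, the reward values, the discount factor of 0.8, and the names `R`, `GOAL`, `GAMMA`, `train`, and `greedy_path` are all hypothetical. The sketch also assumes deterministic transitions and a learning rate of 1, so the update reduces to Q(s, a) = R(s, a) + gamma * max Q(s', a').

```python
import random

# Hypothetical rewards matrix: R[s][a] is the reward for moving from state s
# to state a. None marks a transition that is not allowed.
R = [
    [None, 0,    None, None],
    [0,    None, 0,    None],
    [None, 0,    None, 100],
    [None, None, None, None],  # goal state is terminal in this sketch
]
GOAL = 3
GAMMA = 0.8  # discount factor (assumed value)


def allowed_actions(R, s):
    """Return the states reachable from s in one step."""
    return [a for a in range(len(R)) if R[s][a] is not None]


def train(R, goal, gamma, episodes=500, seed=0):
    """Learn a Q-table by random exploration until the goal is reached."""
    rng = random.Random(seed)
    n = len(R)
    Q = [[0.0] * n for _ in range(n)]
    for _ in range(episodes):
        s = rng.randrange(n)  # each episode starts in a random state
        while s != goal:
            a = rng.choice(allowed_actions(R, s))
            nxt = a  # taking action a means moving to state a
            best_next = max(
                (Q[nxt][b] for b in allowed_actions(R, nxt)), default=0.0
            )
            # Deterministic update with learning rate 1 (assumption).
            Q[s][a] = R[s][a] + gamma * best_next
            s = nxt
    return Q


def greedy_path(Q, R, start, goal):
    """Follow the highest-valued allowed action from start to goal."""
    path = [start]
    s = start
    while s != goal and len(path) <= len(R):
        s = max(allowed_actions(R, s), key=lambda a: Q[s][a])
        path.append(s)
    return path
```

Calling `train(R, GOAL, GAMMA)` and then `greedy_path(Q, R, 0, GOAL)` extracts the policy the agent has learned, as the sequence of states visited from state 0 to the goal. The number of episodes needed before that path stops changing is one way to express learning time in terms of the iteration count.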