MACHINE LEARNING TYPES
You get a promotion for performing excellently at work, an award for academic performance, or tax credits for charity donations. Somehow, humans have been able to translate even the most basic things in life into mathematical equations and formulas. This idea of being rewarded for good behaviour is the idea behind reinforcement learning.
Reinforcement learning stepped into the limelight back in the 1960s, when researchers were looking for a way to design a controller for a dynamic system. A dynamic system is one whose current state determines its next/future state, guided by a set of rules. Any phenomenon (physical or statistical) whose value changes over time can be regarded as a dynamic system.
How, then, can a dynamic system be controlled? This leads us to the concept of a feedback loop, just as in traditional engineering. To control the outcome of a process, a feedback mechanism is placed at the output to capture output values and send them back to the controller at the input. The controller then increases or decreases the input values depending on the type of feedback it receives (negative or positive feedback).
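To make that concrete, here is a minimal sketch of such a feedback loop in Python. The system dynamics, the gain and the function names are made up purely for illustration:

```python
# A minimal sketch of a feedback loop, assuming a made-up one-dimensional
# dynamic system; the controller accumulates the error fed back from the output.
def run_feedback_loop(setpoint, steps=50, gain=0.5):
    output = 0.0     # current value of the dynamic system
    control = 0.0    # input value supplied by the controller
    for _ in range(steps):
        error = setpoint - output              # feedback: desired minus actual
        control += gain * error                # raise or lower the input
        output = 0.8 * output + 0.2 * control  # toy system dynamics
    return output

print(run_feedback_loop(setpoint=10.0))  # oscillates, then settles near 10.0
```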
"As Neural Networks adjust weights and biases from one neuron to the other, reinforcement learning algorithms iteratively modifies the input parameters to satisfy a set output."
This is the foundational backbone of reinforcement learning. What makes it more interesting is the concept of punishment and reward, measured by the difference between the system's output and the desired output.
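In the simplest case, that measurement can be turned into a reward directly: the closer the output lands to the desired value, the less the agent is punished. A tiny illustrative sketch:

```python
# A reward that punishes the agent in proportion to how far the
# system's output landed from the desired output.
def reward(actual: float, desired: float) -> float:
    return -abs(actual - desired)  # 0 is best; more negative is worse

print(reward(actual=9.5, desired=10.0))  # -0.5 (close, small penalty)
print(reward(actual=3.0, desired=10.0))  # -7.0 (far off, big penalty)
```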
There are seven key concepts you need to understand: goal, agent, environment, actions, rewards, episodes and policies.
When you hop into the back of a self-driving car, your goal is to roll up the windows, turn on the air conditioning and watch your favourite Netflix series, reclined comfortably in the back seat. Or maybe not.
Anyway, the goal of a self-driving car is to move from point A to point B safely, with minimal or zero human input. This involves multiple actions such as staying in the proper lane, accelerating and decelerating at the right pace, keeping a certain time-based distance from the vehicle in front to avoid a rear-end collision, signalling turns and a host of other actions (more on actions later in this article).
Every single action the vehicle takes is a step to achieve a particular goal.
An agent is the entity being trained. In our self-driving scenario, the car itself is the agent. It is the duty of the agent to carry out actions by following policies that define each action.
The environment is the world the agent interacts with. Again with our analogy, the road, crosswalks, traffic signs etc. are all part of the environment.
How does the agent know about the environment?
Well, it identifies its position in the environment through what is called a state. The state of an agent at any point in time refers to the portion of the environment known to the agent. Just as humans can determine where they are using their sense of smell, sight or touch, an agent can use sensors such as cameras or GPS to determine where it is in the environment.
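As a rough sketch, the car's state could be bundled into a small structure of sensor readings like the one below. The field names are hypothetical, made up for illustration:

```python
from dataclasses import dataclass

# A hypothetical state for the self-driving car: one snapshot of
# what its sensors currently report about the environment.
@dataclass(frozen=True)
class CarState:
    gps_position: tuple[float, float]  # latitude, longitude from GPS
    speed_kmh: float                   # from the speedometer
    distance_to_lead_car_m: float      # from a front-facing camera/radar
    light_is_red: bool                 # from a camera reading the signal

state = CarState((6.5244, 3.3792), 42.0, 18.5, False)
```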
An agent performs "actions" in the environment. Once the agent knows the state of the environment, it can proceed to perform actions such as moving back and forth or activating something.
Actions that can be performed by the car (agent) include braking, steering, turning the indicators on or off, changing lanes, accelerating and much more.
When the agent does well, it is given a reward. Think of it as a sort of feedback relayed to the agent about how the actions it has performed contributed towards achieving the ultimate goal. In practical scenarios, the reward is numerical, not cookies. Say the car accelerates when passing a slower vehicle: it could be given a +50 reward. On the other hand, if the car runs a red light, it could be given a -80 reward (or punishment). The better the agent performs, the higher the numeric value it achieves.
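Here is a sketch of how such numeric rewards might be assigned in code; the events and values are illustrative, not taken from any real self-driving system:

```python
# Hypothetical reward table for the self-driving agent: good behaviour
# earns positive numbers, dangerous behaviour earns negative ones.
REWARDS = {
    "overtook_slower_vehicle": +50,
    "stayed_in_lane": +5,
    "ran_red_light": -80,
    "hard_brake": -10,
}

def reward_for(event: str) -> int:
    return REWARDS.get(event, 0)  # neutral events earn nothing

print(reward_for("overtook_slower_vehicle"))  # 50
print(reward_for("ran_red_light"))            # -80
```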
Rewards have a big impact on the future actions of an agent. An agent will do all it takes to earn a higher-value reward after each action.
Reinforcement learning continues through many such cycles, and each cycle is referred to as an episode. A single episode is a full cycle in which an agent identifies its position in the environment, performs an action and receives feedback in the form of a reward, which informs its future actions.
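That state-action-reward cycle is exactly the loop you write when training an agent. Below is a self-contained sketch of one episode in a made-up toy environment, with a random placeholder policy standing in for a trained one:

```python
import random

# A toy environment: the agent starts at position 0 and tries to reach 10.
class ToyEnvironment:
    def reset(self) -> int:
        self.position = 0
        return self.position                # initial state

    def step(self, action: int):
        self.position += action             # apply the chosen action
        done = self.position >= 10          # reaching the goal ends the episode
        reward = 10 if done else -1         # -1 per step encourages speed
        return self.position, reward, done  # new state, feedback, done flag

env = ToyEnvironment()
state, done, total_reward = env.reset(), False, 0

# One full episode: observe the state, act, receive a reward, repeat.
while not done:
    action = random.choice([0, 1, 2])       # placeholder policy
    state, reward, done = env.step(action)
    total_reward += reward

print(f"episode finished at state {state} with return {total_reward}")
```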
A policy, just as the name implies, is simply a course of action. For every given state of an agent, the policy defines the agent's next course of action. The two major policy types to note are deterministic and stochastic policies.
A deterministic policy is typically used when an agent fully understands the environment and repeatedly performs a particular action in a given state. Here, there is a direct mapping between the agent's state and the action it performs.
A stochastic policy, on the other hand, is probability-based: for a given state, the action taken is sampled from a probability distribution over the possible actions.
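The difference shows up clearly in code. In the sketch below (with made-up states and actions), the deterministic policy always returns the same action for a state, while the stochastic one samples from a probability distribution:

```python
import random

# Deterministic policy: the same state always yields the same action.
def deterministic_policy(state: str) -> str:
    table = {"red_light": "brake", "green_light": "accelerate"}
    return table[state]

# Stochastic policy: the action is drawn from a probability distribution.
def stochastic_policy(state: str) -> str:
    if state == "slow_car_ahead":
        # Overtake 70% of the time, stay behind 30% of the time.
        return random.choices(["overtake", "follow"], weights=[0.7, 0.3])[0]
    return "cruise"

print(deterministic_policy("red_light"))    # always "brake"
print(stochastic_policy("slow_car_ahead"))  # "overtake" or "follow"
```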
Reinforcement learning has seen a wide range of applications in robotic manipulation, gaming, natural language processing, healthcare, self-driving cars and other notable areas. With these seven key concepts, you have the basic foundation to take a deeper dive into what reinforcement learning entails and how to build your next model using this approach.