Building your First Model on AWS DeepRacer

Nickodemus R.
6 min read · Nov 14, 2020
Photo by Til Jentzsch on Unsplash

Gentlemen, start your engines!

We saw an overview of reinforcement learning and AWS DeepRacer in the first post of this series. Then, in the last post, we looked at one of the basic frameworks for reinforcement learning, the Markov Decision Process (MDP). Today, we will learn how to build our first reinforcement learning model using the AWS DeepRacer console.

Reinforcement Learning Refresher

Reinforcement learning is a class of machine learning algorithms where we build a model (an agent) that learns by itself, from its interactions with the environment, to reach some goal. There are lots of examples of reinforcement learning agents, such as:

  • AlphaGo, the agent that plays the game of Go and has defeated professional Go players
  • The agent made by OpenAI that was able to win Dota 2 matches against professional teams
  • DeepMind's research on simulated humanoid walkers
  • Self-driving cars, which we will talk more about in this article
  • and many more to come …

Do you want to build reinforcement learning for a self-driving car? Sounds interesting, right? But do you have all the resources: a car for experimentation, sensors, the technical skill to hack the car and install your agent on it? Maybe you have all of them, like this guy, who built his own self-driving car experiment. Even if you have all the resources, don't forget about safety, regulations, etc., so you would probably also need your own track to test the car. We may want to start learning reinforcement learning and see how it applies to self-driving cars, but that is a lot to prepare just to learn a reinforcement learning algorithm, which is hard enough by itself. Thankfully, AWS has developed a learning environment called AWS DeepRacer. It offers a fully autonomous 1/18th-scale race car, different types of race tracks, and some cool competitions to enter, like racing against an F1 driver. There are 3 types of race that we can do in AWS DeepRacer:

  1. Time trial: complete the laps (usually 3) as fast as possible
  2. Object avoidance: complete the lap while the car avoids objects on the road
  3. Head-to-head racing: race against another car on the same track

As a note, AWS DeepRacer has a physical car that can run on a real track, but also a virtual car that runs on our computer. In this article I will focus on the virtual car.

AWS DeepRacer Competition, source: [https://aws.amazon.com/deepracer/]
Virtual simulation, image by author

AWS DeepRacer at a Glance

In AWS DeepRacer, our agent is the car and the environment is the track simulator that we choose. The agent then tries to collect the maximum cumulative reward in order to complete the race.
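To make that concrete, here is a tiny, self-contained toy sketch of the agent-environment loop: the agent acts, the environment responds, and the reward accumulates over an episode. The numbers and names here are made up for illustration; this is not AWS code.

```python
import random

# Toy environment: the "track" is 10 steps long, and the car earns +1 reward
# for every step it stays on track. Purely illustrative, not the AWS simulator.
def step(position, action):
    off_track = action == "steer_badly" and random.random() < 0.3
    reward = 0.0 if off_track else 1.0
    done = off_track or position + 1 >= 10
    return position + 1, reward, done

position, total_reward, done = 0, 0.0, False
while not done:
    # An untrained agent acts randomly; training would bias it toward
    # actions that maximize the cumulative reward.
    action = random.choice(["steer_well", "steer_badly"])
    position, reward, done = step(position, action)
    total_reward += reward

print("Cumulative reward for this episode:", total_reward)
```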

Create The Car

First things first, we need to build our car; the car will be the agent. When we look at our garage, there will be a predefined car. We can use it right away, or, if we like, we can create a new car by clicking the 'Build new vehicle' button. We then need to define the sensors that the car will use.

  1. Camera: a single camera on the front; good for time trial races
  2. Stereo camera: just as we have two eyes to see, we mimic this on the car with a left and a right camera (stereo); good for object avoidance
  3. LIDAR (light detection and ranging) sensor: a 360-degree laser detector; good for head-to-head competition, since we need to see not only what is in front of the car but also the other cars around it.
image by author
image by author

After we choose our sensors, we need to create an action space: the set of possible actions that the agent can take in response to the environment. In general, reinforcement learning actions can be either discrete or continuous. AWS DeepRacer, however, currently uses discrete actions. So, unlike a normal car with a continuous range of speeds, say from 0 to 300 km/h, the DeepRacer agent will only move according to the rules we specify. What we can adjust is the maximum speed, the speed granularity, the maximum steering angle, and the steering angle granularity. Based on these settings, the console then generates the action list for our car, roughly as sketched in the snippet after the images below.

Defining action space, image by author
Action list example, image by author
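To see what a discrete action space means in practice, here is a hypothetical helper that mimics how the console combines those four settings into an action list. The function name and the exact spacing of the values are my own assumptions, not the AWS implementation.

```python
from itertools import product

def build_action_space(max_speed, speed_granularity, max_steering, steering_granularity):
    """Hypothetical sketch: enumerate a discrete action list from the four settings."""
    # Speeds step up evenly to the maximum, e.g. 1.0, 2.0, 3.0 m/s
    speeds = [max_speed * (i + 1) / speed_granularity for i in range(speed_granularity)]
    # Steering angles are symmetric around 0, e.g. -30, 0, +30 degrees
    if steering_granularity == 1:
        angles = [0.0]
    else:
        angles = [-max_steering + 2 * max_steering * i / (steering_granularity - 1)
                  for i in range(steering_granularity)]
    return [{"steering_angle": angle, "speed": speed}
            for angle, speed in product(angles, speeds)]

# 3 speed levels x 3 steering angles = 9 discrete actions
for i, action in enumerate(build_action_space(3.0, 3, 30.0, 3)):
    print(i, action)
```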

Create the Model

Create model option, image by author

After we have the car, we are ready to create our reinforcement learning model. We need to choose the environment simulation, which is the track we will use to train our model; there are lots of different tracks to choose from. After we pick the track, we choose which race we want to train the model for. As mentioned above, there are three types of race: time trial, object avoidance, and head-to-head racing.

Some of the available tracks, image by author
Choosing the race type, image by author

From the last article, we learned that the goal of our agent is to get the maximum expected cumulative reward. So our next step is to define the reward function, which we write in Python. We can give rewards based on speed, whether or not the car is on track, and so on. The code below shows a reward function for following the center line of the track: the agent gets more reward the closer it stays to the center line during the race. There are also some templates that we can use as our reward function.

Example of reward function, image by author
Reward function template, image by author
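For readers who prefer text to a screenshot, here is a sketch of that centerline reward, along the lines of the AWS "follow the center line" sample template. The `params` dictionary, with keys such as `track_width` and `distance_from_center`, is passed in by the DeepRacer environment.

```python
def reward_function(params):
    '''Reward the agent for staying close to the center line of the track.'''
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Markers at increasing distances from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # The closer the car is to the center line, the higher the reward
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely off track or about to crash

    return float(reward)
```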

After we define our reward function, we can adjust some hyperparameters, such as:

  • Gradient descent batch size: the subset of experience samples used in one training iteration
  • Number of epochs: the number of passes through the training data when updating the neural network weights and biases
  • Learning rate: controls how much a gradient descent update changes the network weights
  • Entropy: a degree of randomness; higher entropy means more exploration as opposed to exploitation
  • Discount factor: determines how much future rewards are discounted (a small sketch after the image below illustrates this)
  • Loss type: which loss function to use, Huber or mean squared error. Typically, Huber loss helps achieve convergence, while mean squared error can make training faster.
  • Number of experience episodes between each policy-updating iteration: how many episodes the model collects before updating its policy.
Hyperparameter tuning, image by author
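To build intuition for the discount factor, here is a tiny sketch with made-up reward values showing how a smaller gamma shrinks the contribution of future rewards:

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, with each future step scaled down by gamma."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Made-up per-step rewards from one short episode
rewards = [1.0, 1.0, 1.0, 1.0, 1.0]

print(discounted_return(rewards, 0.999))  # ~4.99: far-sighted, future rewards still matter
print(discounted_return(rewards, 0.5))    # ~1.94: short-sighted, mostly immediate reward
```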

Compete in a race

Finally, we train our model. After training and evaluation are finished, we can submit the model to the virtual race. As part of the JML x AWS DeepRacer Bootcamp, we have a community race for bootcamp participants. Using some default values, I got 6th position in the race. As the sessions continue, hopefully we can learn more ways to improve the model's performance and get better results.

That's a wrap for this article. I hope you are now able to build your own AWS DeepRacer model. This month, there is a virtual race taking place, so maybe you can put what you have learned from this article into practice, enter the race, and perhaps we will see each other on the circuit.

AWS DeepRacer League Championship


Nickodemus R.

Data science enthusiast who is curious about technology and how it's implemented in real life.