
Reinforcement Learning Environment for a 6DoF Robotic Arm in Gazebo and ROS2 

This repository provides a custom RL environment built around a robotic arm. The environment consists of a custom state space, action space, and reward function, plus a visual target point. The goal is for you to use this environment to test your own RL algorithms. All the code and configuration files can be found and downloaded in my GitHub repository.

  • GitHub Repository (LINK)

[Figure 1]

1  Outcomes after this section

  • Build a custom RL environment

  • Generate actions to move the robotic arm

  • Simulate your robotic arm in Gazebo

  • Visualize your robotic arm in RViz 2

2  Prerequisites

In order to succeed with this tutorial, please make sure you have installed the following:

  • Ubuntu 20.04

  • Full installation of ROS2 Foxy

  • ros2_control and gazebo_ros2_control packages (instructions in section 4 of our previous tutorial, LINK)

3  About Reinforcement Learning

Reinforcement learning problems are formally modelled as a Markov Decision Process (MDP). A finite MDP is described by a five-tuple ⟨S, A, T, R, γ⟩, where S is a finite set of states, A is a finite set of possible actions, T is a transition function (i.e., the probability that action a ∈ A in state s ∈ S will lead to state s′), R is a reward function (i.e., the reward received after transitioning from state s to state s′), and finally γ ∈ [0, 1] denotes a discount factor. The goal in an MDP is to find, for each state, the best action to take in order to maximize future rewards. In other words, the aim is to find the optimal policy π*, which specifies which action must be taken in each state in order to maximize the expected cumulative reward.
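To make this concrete, here is a minimal, self-contained sketch (not part of this repository; the states, transition probabilities, and rewards below are invented purely for illustration) that runs value iteration on a toy two-state MDP and recovers the optimal policy π*:

# Value iteration on a toy MDP <S, A, T, R, gamma>.
# All numbers below are made up for illustration only.
S = [0, 1]          # two states
A = [0, 1]          # two actions
gamma = 0.9         # discount factor

# T[s][a] = list of (next_state, probability)
T = {
    0: {0: [(0, 0.8), (1, 0.2)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)],           1: [(1, 0.9), (0, 0.1)]},
}
# R[s][a] = immediate reward for taking action a in state s
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 0.0, 1: 2.0},
}

V = {s: 0.0 for s in S}   # state values, initialized to zero

# Repeated Bellman backups until the values stop changing
for _ in range(1000):
    delta = 0.0
    V_new = {}
    for s in S:
        V_new[s] = max(
            R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a])
            for a in A
        )
        delta = max(delta, abs(V_new[s] - V[s]))
    V = V_new
    if delta < 1e-8:
        break

# Greedy policy pi*: the best action in each state under V*
pi = {
    s: max(A, key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a]))
    for s in S
}
print("V* =", V, "pi* =", pi)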

4  Run the Environment

Assuming you come from our previous tutorial (LINK), installed all the necessary packages, cloned our repository, and successfully built the packages in your workspace, we are now going to run our custom RL environment so you can see what you get with this repository.

After building this repository, open a new terminal and write:

cd ros2_ws

. install/setup.bash

ros2 launch my_environment_pkg my_environment.launch.py


If the installation was correct and you did not get any errors, Gazebo will open and you should see your robotic arm along with a green sphere (the target point). The launch file loads and starts the controllers, loads and spawns the robot and the sphere, and finally loads all the necessary configurations. You can find the launch file here (LINK).
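For reference, a launch file for this kind of setup typically looks like the sketch below. This is only an illustration, not the repository's actual file: the entity and controller names are assumptions, and the robot_state_publisher, the target sphere, and the configuration loading are omitted for brevity, so check the real launch file (LINK) for the details.

# Sketch of a typical Gazebo + ros2_control launch file (Foxy).
import os
from ament_index_python.packages import get_package_share_directory
from launch import LaunchDescription
from launch.actions import IncludeLaunchDescription
from launch.launch_description_sources import PythonLaunchDescriptionSource
from launch_ros.actions import Node


def generate_launch_description():
    # Start Gazebo (provided by the gazebo_ros package)
    gazebo = IncludeLaunchDescription(
        PythonLaunchDescriptionSource(
            os.path.join(get_package_share_directory('gazebo_ros'),
                         'launch', 'gazebo.launch.py')))

    # Spawn the robot from the robot_description topic (entity name assumed)
    spawn_robot = Node(
        package='gazebo_ros', executable='spawn_entity.py',
        arguments=['-topic', 'robot_description', '-entity', 'robot_arm'])

    # Load and start a controller (the executable is spawner.py on Foxy;
    # the controller name here is an assumption)
    joint_controller = Node(
        package='controller_manager', executable='spawner.py',
        arguments=['joint_trajectory_controller'])

    return LaunchDescription([gazebo, spawn_robot, joint_controller])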


It is time to activate our RL environment, so open a new terminal and run the following commands.


cd ros2_ws

. install/setup.bash

ros2 run my_environment_pkg run_environment


You will see the robot start to move (taking random actions) until a certain number of steps has been completed. After that, the environment resets: the robotic arm returns to its home position, and the sphere moves to a new location. This repeats until the number of episodes concludes.


You can see and modify the number of steps and episodes, and inspect the reward values, states, and actions taken, in the Python file run_environment.py (LINK). In this file, you can add your RL algorithm and train the robot to perform specific actions according to your needs.
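As a rough illustration, the loop in run_environment.py follows the classic episode/step pattern sketched below. The class name, import path, and method signatures here are assumptions made for the sketch, so refer to the actual file (LINK) before plugging in your algorithm.

# Sketch of the episode/step loop; names below are assumptions, not the repo's API.
import random

from my_environment_pkg.main_environment import MyEnvironment  # hypothetical import

NUM_EPISODES = 100       # number of episodes to run
STEPS_PER_EPISODE = 50   # steps before the environment resets

env = MyEnvironment()
for episode in range(NUM_EPISODES):
    state = env.reset()  # arm returns home, sphere moves to a new location
    for step in range(STEPS_PER_EPISODE):
        # Random action: one target position per joint (radians).
        # Replace this with your RL algorithm's action selection.
        action = [random.uniform(-3.14, 3.14) for _ in range(6)]
        next_state, reward, done = env.step(action)
        print(f"episode={episode} step={step} reward={reward}")
        if done:         # the end-effector reached the green sphere
            break
        state = next_state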


5  Characteristics of the Environment

Each of the components that make up this RL environment can be modified or adapted to your needs. You can change the environment in main_environment.py (LINK).

The current version is composed of:

  • state-space = [Robot End-Effector Position (x, y, z), Joint States (Joint1, Joint2, Joint3, Joint4, Joint5, Joint6), Target Position (x, y, z)]

  • action-space = [position Joint1, position Joint2, position Joint3, position Joint4, position Joint5, position Joint6]

  • Reward = the agent (robot) receives a reward of -1 for each action step, but if it reaches the goal (the green sphere), the reward is 100 (see the sketch after this list)
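For example, the reward above can be written as a small function like the sketch below. The distance threshold that counts as reaching the goal is an assumption made here for illustration, so check main_environment.py (LINK) for the actual value used.

import math

GOAL_THRESHOLD = 0.05  # assumed distance (meters) that counts as reaching the goal

def compute_reward(end_effector_pos, target_pos):
    """Return (reward, done) for the current step.

    -1 for every action step; 100 (and episode done) when the
    end-effector is within GOAL_THRESHOLD of the green sphere.
    """
    distance = math.dist(end_effector_pos, target_pos)
    if distance < GOAL_THRESHOLD:
        return 100.0, True
    return -1.0, False

# Example: end-effector about 1.4 cm from the target -> goal reached
print(compute_reward((0.40, 0.10, 0.30), (0.41, 0.10, 0.31)))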


Comments?

Any questions or comments about this post will always be welcome. If you have something to say, please do not hesitate to contact me.

