Robot Localization Using AprilTags

Introduction

In this blog post, I will explain how I implemented a robot localization system using AprilTags. By leveraging a series of roto-translations, we can determine the robot's position relative to the world frame.

The transformation sequence used is:

world2robot = world2apriltag * apriltag2apriltagopticalframe * apriltagopticalframe2cameraopticalframe * cameraopticalframe2camera * camera2robot

This approach allows us to accurately estimate the robot's pose by detecting AprilTags in the environment and using known transformations between different coordinate frames.
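
As a rough sketch of that chain (each transform below is a placeholder 4x4 homogeneous NumPy matrix; in the real system they come from the known tag poses, the optical-frame convention, solvePnP and the camera mounting):

import numpy as np

# Placeholder transforms: in the real system each of these is a 4x4 homogeneous
# matrix built from the known tag pose in the world, the optical-frame
# convention, the solvePnP result and the camera mounting position
# (identities are used here only so the sketch runs on its own).
world2apriltag = np.eye(4)
apriltag2apriltagopticalframe = np.eye(4)
apriltagopticalframe2cameraopticalframe = np.eye(4)
cameraopticalframe2camera = np.eye(4)
camera2robot = np.eye(4)

# The chain above, expressed as a product of homogeneous matrices.
world2robot = (
    world2apriltag
    @ apriltag2apriltagopticalframe
    @ apriltagopticalframe2cameraopticalframe
    @ cameraopticalframe2camera
    @ camera2robot
)

# The robot position in world coordinates is the translation part of the result.
x, y, z = world2robot[:3, 3]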

Coordinate Systems

Understanding the coordinate frames is crucial:

  • Standard frames: X-axis points forward, Y-axis to the left, and Z-axis upwards.
  • Optical frames: X-axis points to the right, Y-axis downward, and Z-axis forward.

This distinction matters because libraries that process image data typically operate in the optical-frame convention, so proper transformations are required to align the poses correctly.

Deep Dive

The localization process follows these steps:

  1. Capture an image using the robot’s camera.
  2. Detect AprilTags within the image using the detector provided by the pyapriltags library.
  3. Estimate the pose of each detected AprilTag using the OpenCV function "solvePnP", which provides the rotation and translation vectors (steps 2–4 are sketched in code after this list).
  4. Convert the estimated pose into a homogeneous transformation matrix.
  5. Apply the sequence of transformations to compute the final robot pose in world coordinates.
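
A minimal sketch of steps 2–4, assuming known camera intrinsics and tag size (the values of K, DIST and TAG_SIZE below are placeholders, and the tag-corner ordering must match what the detector returns):

import cv2
import numpy as np
from pyapriltags import Detector

# Placeholder calibration: replace with your camera intrinsics and tag size.
TAG_SIZE = 0.20                      # tag edge length in meters
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
DIST = np.zeros(5)                   # assuming a rectified image

detector = Detector(families="tag36h11")

def estimate_tag_poses(bgr_image):
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    poses = []
    for det in detector.detect(gray):
        # 3D corners of the tag in its own frame (z = 0 plane); the ordering
        # must match the corner order returned by the detector.
        s = TAG_SIZE / 2.0
        object_points = np.array([[-s,  s, 0], [ s,  s, 0],
                                  [ s, -s, 0], [-s, -s, 0]], dtype=np.float32)
        image_points = det.corners.astype(np.float32)
        ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, DIST)
        if ok:
            poses.append((det.tag_id, rvec, tvec))
    return poses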

When no AprilTags are detected, we fall back on the robot's odometry. Since odometry accumulates error over time, we do not use its absolute readings directly. Instead, we take only the odometry increments measured during the iterations in which no AprilTags are detected and add them to the last known pose obtained from the AprilTags.
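
A minimal sketch of that fallback, assuming both the odometry readings and the last tag-based estimate are available as 4x4 homogeneous matrices (the function name is illustrative):

import numpy as np

def update_without_tags(last_tag_pose, odom_prev, odom_now):
    # last_tag_pose, odom_prev and odom_now are 4x4 homogeneous matrices.
    # Only the relative motion between the two odometry readings is used,
    # never the absolute (drifting) odometry pose.
    delta = np.linalg.inv(odom_prev) @ odom_now
    return last_tag_pose @ delta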

Finally, when multiple AprilTags are detected, I use only the one closest to the camera, since the farther away a tag is, the larger the error in its estimated pose. You can find more information about this here.
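
Selecting the closest tag could look like this, assuming each pose keeps the tvec returned by solvePnP, whose norm is the camera-to-tag distance:

import numpy as np

def closest_detection(poses):
    # poses: list of (tag_id, rvec, tvec) tuples, as in the earlier sketch;
    # the norm of tvec is the distance from the camera to the tag.
    return min(poses, key=lambda pose: np.linalg.norm(pose[2]))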

The matrices

An example of a roto-translation matrix would be world2tag:
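
The matrix below is reconstructed from the description that follows, assuming a rotation of θ about the world z-axis combined with a translation:

\text{world2tag} =
\begin{bmatrix}
\cos\theta & -\sin\theta & 0 & t_x \\
\sin\theta & \cos\theta & 0 & t_y \\
0 & 0 & 1 & t_z \\
0 & 0 & 0 & 1
\end{bmatrix}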

where t_x, t_y, t_z represent the coordinates of the tag in the world, and θ represents the angle with respect to the z-axis, indicating its orientation. With this, the first matrix is fully defined.

The roto-translation matrix that converts world coordinates to optical coordinates is composed of the multiplication of two transformations:

  1. A first rotation of -90 degrees around the z-axis.
  2. A second rotation of -90 degrees around the x-axis.

The resulting transformation is what allows us to convert from world coordinates to "optical" coordinates, and its inverse is used to transform from optical coordinates back to world coordinates.
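
A small NumPy sketch of that composition (applying the second rotation about the new x-axis, i.e. intrinsic order, is an assumption consistent with the usual optical-frame convention):

import numpy as np

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

# -90 degrees about z followed by -90 degrees about the (new) x axis.
standard2optical = np.eye(4)
standard2optical[:3, :3] = rot_z(-np.pi / 2) @ rot_x(-np.pi / 2)

# The inverse converts optical coordinates back to standard coordinates.
optical2standard = np.linalg.inv(standard2optical)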

The transformation matrix from the camera to the tag is obtained using solvePnP from OpenCV, which provides the rotation and translation of the tag with respect to the camera. However, what we actually need is the transformation from the tag to the camera, so we simply invert this matrix.
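
A sketch of that conversion into a homogeneous matrix and its inversion, assuming the rvec and tvec returned by solvePnP:

import cv2
import numpy as np

def pnp_to_homogeneous(rvec, tvec):
    # cv2.Rodrigues turns the solvePnP rotation vector into a 3x3 matrix.
    rotation, _ = cv2.Rodrigues(rvec)
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = tvec.reshape(3)
    return T

# Usage (rvec, tvec come from solvePnP); the inverse gives the transform
# in the opposite direction, as described above:
#   inverted = np.linalg.inv(pnp_to_homogeneous(rvec, tvec))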

The camera2robot transformation matrix is obtained by applying a simple translation, considering the position of the camera relative to the robot's base. This can be easily extracted from the .sdf file that defines it.
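
If the camera is only offset (not rotated) with respect to the robot's base, this reduces to a translation-only matrix; the offsets below are placeholders to be replaced with the values from the .sdf:

import numpy as np

# Placeholder offsets in meters; take the real values from the camera <pose>
# in the robot's .sdf file. Whether this matrix or its inverse is needed
# depends on the direction convention used in the chain.
CAMERA_OFFSET = np.array([0.10, 0.0, 0.20])

camera2robot = np.eye(4)
camera2robot[:3, 3] = CAMERA_OFFSET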

Another way to obtain this roto-translation matrix is by using the ROS2 transform system, allowing us to verify that our matrix is correct. This would be the transformation from the robot to the camera, so we need to compute the inverse for our intended use.
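
One way to cross-check it in ROS2 is a tf2 lookup; the sketch below assumes the frame names "base_link" and "camera_link", which should be replaced by the ones in your robot description:

import rclpy
from rclpy.node import Node
from rclpy.time import Time
from tf2_ros.buffer import Buffer
from tf2_ros.transform_listener import TransformListener

class CameraTfCheck(Node):
    def __init__(self):
        super().__init__("camera_tf_check")
        self.tf_buffer = Buffer()
        self.tf_listener = TransformListener(self.tf_buffer, self)
        self.timer = self.create_timer(1.0, self.lookup)

    def lookup(self):
        try:
            # robot -> camera transform; invert it to obtain camera -> robot.
            t = self.tf_buffer.lookup_transform("base_link", "camera_link", Time())
            self.get_logger().info(f"translation: {t.transform.translation}")
        except Exception as exc:
            self.get_logger().warn(f"transform not available yet: {exc}")

def main():
    rclpy.init()
    rclpy.spin(CameraTfCheck())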


Finally, the robot's position (x, y, z) in world coordinates is obtained from the last column of the world2robot roto-translation matrix. Its orientation (yaw), however, is more involved to extract and requires the following formulas:

\text{pitch} = \operatorname{atan2}\left(-\text{world2robot}[2,0],\ \sqrt{\text{world2robot}[0,0]^2 + \text{world2robot}[1,0]^2}\right)
\text{yaw} = \operatorname{atan2}\left(\frac{\text{world2robot}[1,0]}{\cos(\text{pitch})},\ \frac{\text{world2robot}[0,0]}{\cos(\text{pitch})}\right)
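
In code, this can be computed with NumPy's arctan2 (world2robot being the final 4x4 matrix):

import numpy as np

def yaw_from_world2robot(world2robot):
    # Recover pitch first, then use it to normalize the yaw terms.
    pitch = np.arctan2(-world2robot[2, 0],
                       np.hypot(world2robot[0, 0], world2robot[1, 0]))
    return np.arctan2(world2robot[1, 0] / np.cos(pitch),
                      world2robot[0, 0] / np.cos(pitch))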

Demo



Conclusion

Using AprilTags and a sequence of roto-translations, we can achieve accurate robot localization, making it a robust and cost-effective method for estimating positions with high precision. This approach is particularly useful in environments where GPS is unavailable, such as indoor settings or areas with significant signal interference.

AprilTag-based localization is widely applicable in robotics, including autonomous navigation in warehouses, precise docking of robots to charging stations, and augmented reality applications for tracking real-world objects. By leveraging well-defined transformations, accurate pose estimations can be obtained and further enhanced with sensor fusion techniques.

Future improvements could involve integrating filtering techniques like Kalman filters or particle filters to improve accuracy and smooth localization results, ensuring more reliable positioning in dynamic environments.
