Monte Carlo Visual Localization: localizing a robot with particles

1. Introduction

One of the key challenges in mobile robotics is localization: knowing exactly where the robot is within its environment. In this project, I tackled this problem using a particle filter, combining odometry and computer vision.

The particle filter allows the robot, starting from a random distribution of possible positions, to gradually adjust its estimation until it converges to its true location on the map.

2. What is a particle filter?

A particle filter (or Monte Carlo Localization) is a probabilistic method that represents the robot's belief about its position using a cloud of particles. Each particle is a hypothesis about where the robot could be (position and orientation) and has a weight that indicates the likelihood of that hypothesis.
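As a minimal illustration of this representation, a particle can be modeled as a pose hypothesis plus a weight. The sketch below is written in Python with illustrative field names and map bounds, not the exact structure used in my implementation:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Particle:
    """One hypothesis about the robot's pose, with its associated weight."""
    x: float          # position on the map (e.g., in pixels)
    y: float
    theta: float      # orientation in radians
    weight: float = 1.0

# The belief is simply a cloud of such hypotheses (the bounds are illustrative).
rng = np.random.default_rng()
belief = [Particle(x=rng.uniform(0, 400),
                   y=rng.uniform(0, 400),
                   theta=rng.uniform(-np.pi, np.pi))
          for _ in range(1000)]
```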

The algorithm runs in a loop through three main phases:

  1. Particle motion: particles move according to the robot's estimated movement (via odometry).

  2. Weight update: each particle is evaluated by comparing the robot's sensory perception (vision) with what would be expected at that particle’s location.

  3. Resampling: new particles are generated, favoring those with higher weights.

Through this cycle, particles gradually converge toward the robot's true position.

In the following image, each phase of the algorithm can be observed in detail (image obtained from an external source).
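To make the structure of this loop concrete, below is a small self-contained toy sketch in Python. The map, motion, and observation model are synthetic stand-ins (a random grayscale "ceiling", a fixed motion step, a single-cell intensity comparison), not the actual setup of this project:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a synthetic grayscale "ceiling map" and a robot whose true pose
# we try to recover from single-cell intensity readings.
ceiling_map = rng.random((200, 200))
true_pose = np.array([60.0, 80.0, 0.3])  # x, y (map cells), theta (radians)

def map_intensity(poses):
    """Ceiling intensity under one or more poses (rows are [x, y, theta])."""
    cols = np.clip(poses[..., 0].astype(int), 0, 199)
    rows = np.clip(poses[..., 1].astype(int), 0, 199)
    return ceiling_map[rows, cols]

N = 1000
particles = np.column_stack([rng.uniform(0, 200, N),           # x
                             rng.uniform(0, 200, N),           # y
                             rng.uniform(-np.pi, np.pi, N)])   # theta

for _ in range(60):
    # 1. Particle motion: apply the robot's relative motion, plus noise.
    delta = np.array([1.0, 0.5, 0.0])
    true_pose = true_pose + delta
    particles = particles + delta + rng.normal(0, [0.5, 0.5, 0.01], (N, 3))

    # 2. Weight update: compare what the robot observes with what each
    #    particle would observe at its hypothesized position.
    z_robot = map_intensity(true_pose)
    weights = np.exp(-np.abs(map_intensity(particles) - z_robot))
    weights /= weights.sum()

    # 3. Resampling: draw the next generation in proportion to the weights.
    particles = particles[rng.choice(N, size=N, p=weights)]

print("estimated pose:", particles.mean(axis=0))
print("true pose:     ", true_pose)
```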

3. My approach and design choices

3.1. Particle initialization

Particles were randomly distributed across the entire map, including inaccessible areas (black regions). Although it would be possible to filter them so that they only initialize in valid zones (white regions), I deliberately chose not to filter them. This allowed me to visually observe the convergence process of the particle filter, as particles in impossible zones gradually disappear during resampling.

This design decision helped me better appreciate how the algorithm corrects erroneous hypotheses over multiple iterations.
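As a reference, a possible sketch of this initialization in Python (assuming the map is stored as an image array; names and bounds are illustrative) could be:

```python
import numpy as np

def initialize_particles(map_image, n_particles, rng=np.random.default_rng()):
    """Spread particles uniformly over the whole map image, valid or not."""
    height, width = map_image.shape[:2]
    x = rng.uniform(0, width, n_particles)            # column, in pixels
    y = rng.uniform(0, height, n_particles)           # row, in pixels
    theta = rng.uniform(-np.pi, np.pi, n_particles)   # heading, in radians
    weights = np.full(n_particles, 1.0 / n_particles)
    return np.column_stack([x, y, theta]), weights

# Restricting particles to free space would just mean keeping only those
# whose underlying map pixel is white (e.g., intensity above some threshold).
```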

3.2. Particle motion

Particles move based on the robot’s odometry, which estimates how far and in which direction the robot has moved between two time steps. Even though odometry is inherently noisy, I relied on the fact that the resampling process and the generation of new particles around the best hypotheses would compensate for this noise and allow the system to converge. It's also worth mentioning that I don't rely on the full odometry history, as the error increases over time. Instead, I only use the relative motion between two consecutive time steps.

The motion model simply translates and rotates the particles following the robot’s movement, keeping the cloud of particles aligned with the robot’s estimated path. Additionally, to convert from real-world coordinates to map coordinates, I relied on the functions provided by the exercise. However, I also took several measurements based on the robot's odometry and the number of pixels it had moved in the map. Using least squares, I computed the conversion factor between meters and pixels, concluding that 1 pixel in the map corresponds to 0.01 meters according to my calculations.
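A possible Python sketch of this motion step is shown below. It applies the relative odometry between two consecutive time steps and converts meters to map pixels using the 0.01 m/pixel factor computed above; the noise levels, the variable names, and the choice of advancing each particle along its own heading are illustrative assumptions rather than the exact model of my implementation:

```python
import numpy as np

METERS_PER_PIXEL = 0.01  # conversion factor estimated via least squares

def move_particles(particles, odom_prev, odom_now, rng=np.random.default_rng()):
    """Move every particle by the relative odometry between two time steps.

    particles: (N, 3) array of [x_px, y_px, theta];
    odom_prev, odom_now: (x_m, y_m, theta) robot odometry at t-1 and t.
    """
    # Only the motion between t-1 and t is used, never the accumulated
    # odometry, so its error does not grow over time.
    dx = odom_now[0] - odom_prev[0]
    dy = odom_now[1] - odom_prev[1]
    dtheta = odom_now[2] - odom_prev[2]
    forward = np.hypot(dx, dy)                 # distance traveled, in meters

    particles = particles.copy()
    particles[:, 0] += (forward / METERS_PER_PIXEL) * np.cos(particles[:, 2])
    particles[:, 1] += (forward / METERS_PER_PIXEL) * np.sin(particles[:, 2])
    particles[:, 2] += dtheta

    # A little noise prevents the cloud from collapsing onto a single path.
    particles[:, :2] += rng.normal(0.0, 2.0, (len(particles), 2))   # pixels
    particles[:, 2] += rng.normal(0.0, 0.02, len(particles))        # radians
    return particles
```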

3.3. Observation model: visual comparison

The observation model in my implementation relies on the texture of the ceiling. The robot’s camera points upward, capturing an image of the ceiling’s texture, which is then compared against a color map of the environment.

For each iteration, I used the central pixel of the robot’s camera image (representing what is directly above the robot) and compared it to the ceiling texture at each particle's position. The smaller the difference between these two pixels, the higher the weight assigned to that particle.

I couldn’t directly compare the robot’s image with the map’s texture because I didn’t know the exact scale relationship between the map’s pixels and the robot’s camera image. For this reason, I simply decided to use the central pixel of the robot's camera image as the observation, following the approach used in Dellaert et al., 1999 ("Using the CONDENSATION algorithm for robust, vision-based mobile robot localization").

To compute the weight of each particle, I used an exponential function that maps color differences into a semi-probability space:

$$w_i = \frac{e^{-\lVert z_{robot} - z_{particle} \rVert}}{e^{0}}$$

Where:

  • $w_i$ is the weight of particle $i$.

  • $z_{robot}$ is the color of the robot's central camera pixel (normalized between 0 and 1).

  • $z_{particle}$ is the color of the ceiling texture at the particle's location (also normalized).

  • $\lVert \cdot \rVert$ is the Euclidean distance between the two RGB color vectors.

This method transforms color differences into a meaningful weight that represents the likelihood of each particle.

Note: The denominator was used as a normalizer, and it is not exactly 1 due to approximation errors and other issues related to using NumPy for these functions. Also, at first I was using the Euclidean distance, but since the gradients in the map's color transitions are quite smooth, I decided to switch to the fourth norm to penalize small variations more strongly. I will discuss this issue in more detail in the reflections and future improvements section.
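Putting the formula and the note together, a possible Python sketch of the weight computation (with `order=2` for the Euclidean distance of the formula and `order=4` for the harsher fourth norm; the names are illustrative) is:

```python
import numpy as np

def compute_weights(z_robot, z_particles, order=2):
    """Turn color differences into particle weights.

    z_robot: RGB of the camera's central pixel, normalized to [0, 1], shape (3,).
    z_particles: RGB of the ceiling texture at each particle's position, shape (N, 3).
    """
    distances = np.linalg.norm(z_particles - z_robot, ord=order, axis=1)
    # e^{-||z_robot - z_particle||} divided by e^0 (which is simply 1.0).
    return np.exp(-distances) / np.exp(0.0)
```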

3.4. Resampling: hybrid method (roulette + low-variance inspired)

For the resampling step, I used a hybrid method that combines roulette wheel selection with a low-variance inspired approach. While selection is still proportional to particle weights (as in roulette), the method introduces a dynamic threshold ($\beta$) and an iterative selection process that resembles low-variance resampling, reducing the variance of the resampling step.

The core formula of the method is:

$$\beta_i = 2\,\max(w)\,\text{rand}_i$$

where:

  • $\max(w)$ is the maximum particle weight.

  • $\text{rand}_i$ is a random value for each iteration.

For each particle:

  1. A random index is selected.

  2. Its weight is compared against $\beta$.

  3. If the weight is lower than $\beta$, $\beta$ is reduced by that particle's weight, and the index moves circularly to the next particle.

  4. New particles are generated from the selected ones, adding random noise to maintain exploration.

This hybrid resampling method allows me to favor high-weight particles while ensuring diversity in the population, achieving faster convergence than the standard roulette wheel alone and preventing premature particle collapse.
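A Python sketch of this hybrid resampling, written as a standard resampling wheel (which is how I read the steps above; the noise scales and names are illustrative, not the values tuned in my implementation), could look like this:

```python
import numpy as np

def resample(particles, weights, rng=np.random.default_rng(),
             pos_noise=2.0, angle_noise=0.05):
    """Hybrid roulette / low-variance-inspired resampling with added noise.

    particles: (N, 3) array of [x, y, theta]; weights: (N,) array.
    """
    n = len(particles)
    new_particles = np.empty_like(particles)

    index = rng.integers(n)                   # 1. start from a random index
    beta = 0.0
    w_max = weights.max()
    for i in range(n):
        beta += 2.0 * w_max * rng.random()    # beta_i = 2 * max(w) * rand_i
        # 2-3. While the current weight is below beta, "spend" that weight
        # and move circularly to the next particle.
        while weights[index] < beta:
            beta -= weights[index]
            index = (index + 1) % n
        # 4. Clone the selected particle and add noise to keep exploring.
        new_particles[i] = particles[index]
        new_particles[i, :2] += rng.normal(0.0, pos_noise, 2)
        new_particles[i, 2] += rng.normal(0.0, angle_noise)

    return new_particles
```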

In the following video, you can see a demonstration of the implemented particle filter in action:


Note: It is interesting to observe how, at first, the particles gradually converge towards areas where the robot is most likely to be, based on the observations it receives. Although it may seem that they have reached a single, precise position (min 0:24), a sufficiently informative observation is still needed for one particle to stand out clearly and for the estimation to converge definitively (transition from the red zone to the yellow one, min 0:54). One interesting behavior observed in the video is that, after the particle filter has correctly converged to the robot’s position and orientation (min 0:54), it may begin to diverge again. This is reflected in the particles spreading out and the estimated mean fluctuating significantly. However, after a few iterations, a new observation allows the algorithm to recover and converge once more (min 1:08).

4. Results and observations

During the experiment:

  • Particles started spread across the entire map, including impossible zones (black pixels on the map).

  • As the robot moved and gathered visual observations, the particles adjusted their weights: those in low-probability zones disappeared, while the newly generated ones began to concentrate in areas with high similarity to the robot’s observations.

  • After several iterations, the robot obtained an observation that allowed the particle filter to discriminate between the previously formed high-probability regions, leading it to converge onto a single region that accurately represented the robot’s true position.

This behavior clearly demonstrates how the particle filter can correct initial errors and sensor uncertainties, converging progressively to a robust position estimate.

5. Reflections and future improvements

  • Filter out inaccessible areas: although I chose not to filter particles initially, this could be implemented to improve efficiency.

  • Enhance the observation model: currently, I only use the central pixel; extending this to multiple pixels or using color histograms could improve robustness.

  • Try alternative resampling methods: while the hybrid low-variance approach works well, it would be interesting to compare it with systematic or stratified resampling methods.

  • Incorporate an observation model that also considers the robot’s orientation. For instance, integrating a LiDAR sensor could enhance the estimation by providing more accurate data about the robot’s surroundings and heading.

  • Use a more robust observation model than color. While color can help differentiate between clearly distinct regions, comparing similar colors becomes more challenging.

Although I experimented with the HSV color space, it resulted in worse performance compared to RGB. This is likely due to the circular nature of HSV, where the red hue appears at both the lowest and highest ends of the scale, making color comparisons less reliable.

6. Conclusion

This project was a great opportunity to deepen my understanding of probabilistic localization and particle filters. By combining odometry and visual data, the robot can estimate its position on the map in a robust way, despite sensor noise and uncertainties.
