Mapping and 3D Representations

Before starting, note that much of this material is taken from the lectures of the one and only Cyrill Stachniss. Link to the lecture.

Point Clouds

A point cloud is one of the most fundamental ways to represent a 3D environment. It is essentially a large collection of points where each point has an (x,y,z) coordinate.

These points are often generated by sensors like LIDAR, which measures distance (ρ), azimuth angle (θ), and elevation angle (ϕ). These raw spherical coordinates are then converted into Cartesian coordinates (X,Y,Z) to form the point cloud using the following sensor model:

Pasted image 20250919121618.png

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = g(\rho, \theta, \phi) = \begin{bmatrix} \rho \cos(\theta)\cos(\phi) \\ \rho \sin(\theta)\cos(\phi) \\ \rho \sin(\phi) \end{bmatrix}$$
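This sensor model can be sketched in NumPy (the function and argument names are my own, assuming $\theta$ is measured in the x-y plane and $\phi$ above it):

```python
import numpy as np

def spherical_to_cartesian(rho, theta, phi):
    """Convert LIDAR range/azimuth/elevation to Cartesian XYZ.

    rho:   range (distance along the beam)
    theta: azimuth angle (rotation in the x-y plane)
    phi:   elevation angle (above the x-y plane)
    Accepts scalars or NumPy arrays of equal shape.
    """
    x = rho * np.cos(theta) * np.cos(phi)
    y = rho * np.sin(theta) * np.cos(phi)
    z = rho * np.sin(phi)
    # Stack the coordinates into (..., 3) points.
    return np.stack([x, y, z], axis=-1)
```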

Operations on Point Clouds

The basic operations of translation and rotation can be combined. A common sequence to transform a point cloud $P_s$ from a source frame $\{s\}$ to a target frame $\{t\}$ is: 1) translate, 2) rotate, and 3) scale.

For each point $p_s$ in the cloud, the transformed point $p_t$ is given by:

$$p_t = S_{ts} \, R_{ts} \, (p_s - t_{ts})$$

Where:

- $t_{ts}$ is the translation vector between the frame origins,
- $R_{ts}$ is the rotation matrix from $\{s\}$ to $\{t\}$,
- $S_{ts}$ is the (usually diagonal) scaling matrix.
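A minimal NumPy sketch of this translate-rotate-scale transform, assuming row-wise (n, 3) point clouds and the matrix names above:

```python
import numpy as np

def transform_points(P_s, R, t, S):
    """Apply translate -> rotate -> scale to an (n, 3) point cloud.

    R: (3, 3) rotation matrix, t: (3,) translation, S: (3, 3) scaling.
    Each row p of P_s becomes S @ R @ (p - t).
    """
    return (S @ R @ (P_s - t).T).T
```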

Feature Extraction from Point Clouds

A raw point cloud is just a collection of coordinates.
To make it useful, we often need to extract higher-level features from it, like planes, spheres, or cylinders. This process of identifying geometric primitives is a form of segmentation.

Point Cloud Accumulation

A single LIDAR scan only provides a point cloud from one perspective. To build a complete map, multiple point clouds taken from different robot poses must be combined into a single, global reference frame $\{G\}$. This process is called accumulation. (Usually we take a predefined global frame, or simply the frame of the first scan.)

If we have a point cloud $P_L$ measured in the local LIDAR frame $\{L\}$, and we know the pose (position and orientation) of the LIDAR frame with respect to the global frame, we can transform the point cloud into global coordinates, $P_G$. This is done using a homogeneous transformation matrix ${}^{G}_{L}T$.

For each point $p_i^L$ in the local point cloud $P_L$ (written in homogeneous coordinates), the corresponding global point $p_i^G$ is calculated as:

$$p_i^G = {}^{G}_{L}T \, p_i^L$$

By applying this transformation to all points, we can merge multiple scans to create a dense, large-scale map of the environment.
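Accumulating one scan into the global frame can be sketched as follows, assuming a (4, 4) homogeneous pose matrix (here called `T_GL`, a name of my own):

```python
import numpy as np

def accumulate_scan(P_L, T_GL):
    """Transform a local (n, 3) scan into the global frame.

    T_GL: (4, 4) homogeneous pose of the LIDAR frame {L} in the
    global frame {G}. Points are lifted to homogeneous coordinates,
    transformed, then projected back to (n, 3).
    """
    n = P_L.shape[0]
    P_h = np.hstack([P_L, np.ones((n, 1))])  # (n, 4) homogeneous points
    return (T_GL @ P_h.T).T[:, :3]
```

Merging a map is then just concatenating the transformed scans, e.g. `np.vstack([accumulate_scan(P, T) for P, T in scans])`.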

Plane Fitting for Road Detection

A common task for an autonomous vehicle is to identify the road surface from a LIDAR point cloud. We can model the road as a simple plane and then fit this model to the point cloud data.

The equation of a plane in 3D is:

$$z = a + bx + cy$$

Our goal is to find the parameters $x = [a, b, c]^T$ that best fit the measured points. (Think of this as a linear regression problem in 3D.)

For a set of $n$ points $(x_j, y_j, z_j)$ from the point cloud, we can define the measurement error for each point as the vertical distance between the measured $z_j$ and the value predicted by our plane model:

$$e_j = z_{\text{predicted}} - z_{\text{measured}} = (a + b x_j + c y_j) - z_j$$

We can stack all n of these error equations into a single matrix form:

$$e = Ax - b$$

Where:

$$e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}, \quad A = \begin{bmatrix} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & y_n \end{bmatrix}, \quad x = \begin{bmatrix} a \\ b \\ c \end{bmatrix}, \quad b = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix}$$

Now, to find the best-fit parameters $\hat{x}$, we use the method of least squares to minimize the sum of the squared errors. The cost function is:

$$L_{LS}(x) = e^T e = (Ax - b)^T (Ax - b)$$

To find the minimum, we take the partial derivative with respect to $x$ and set it to zero:

$$\frac{\partial L_{LS}(x)}{\partial x} = 2 A^T A \hat{x} - 2 A^T b = 0$$

This gives us the normal equations:

$$A^T A \hat{x} = A^T b$$

We can then solve for the optimal parameters $\hat{x}$ using an efficient numerical solver (the preferred approach in practice) or by explicitly computing the pseudo-inverse:

$$\hat{x} = (A^T A)^{-1} A^T b$$

This gives us the parameters $[a, b, c]$ of the plane that best represents the road surface in the point cloud.
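The whole derivation condenses to a few lines of NumPy; `np.linalg.lstsq` solves the same least-squares problem without explicitly forming $(A^T A)^{-1}$, which is numerically safer (the function name is my own):

```python
import numpy as np

def fit_road_plane(points):
    """Least-squares fit of z = a + b*x + c*y to an (n, 3) point cloud.

    Builds A and b exactly as in the derivation, then solves
    min ||Ax - b|| with np.linalg.lstsq. Returns [a, b, c].
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    A = np.column_stack([np.ones_like(x), x, y])  # rows [1, x_j, y_j]
    params, *_ = np.linalg.lstsq(A, z, rcond=None)
    return params
```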

Voxel Grids

While point clouds are a direct representation of sensor data, they can be inefficient for tasks like collision checking.
Voxel grids address this by discretizing 3D space into a grid of volumetric elements, or voxels.
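As a small illustration, the set of occupied voxels can be computed by integer-dividing point coordinates by the voxel size (a sketch with assumed names):

```python
import numpy as np

def voxelize(points, voxel_size):
    """Return the set of occupied voxel indices for an (n, 3) cloud.

    Each point maps to the integer (i, j, k) index of the voxel
    containing it; a set collapses duplicates, so collision checks
    become O(1) membership tests.
    """
    idx = np.floor(points / voxel_size).astype(int)
    return set(map(tuple, idx))
```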

2.5D Maps / Height Maps

A common simplification of a full 3D voxel grid is a 2.5D map, often called a height map. Instead of storing a full 3D grid, a 2.5D map is a 2D grid where each cell stores a single height value, typically representing the highest point of an object within that cell's column.

Random Link: 2.5-D-Scene-Representation
A simple way to generate this is to average all the scan points that fall into a given (x,y) grid cell.

Pasted image 20250919145531.png

This representation is very memory efficient and provides constant-time access, but it's non-probabilistic and cannot distinguish between free and unknown space.
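The averaging scheme above can be sketched as follows (grid layout and naming are assumptions; a max-height variant would use `np.maximum.at` instead of accumulating sums):

```python
import numpy as np

def height_map(points, cell_size, grid_shape):
    """Build a 2.5D height map by averaging point heights per (x, y) cell.

    points: (n, 3) cloud; cells are indexed by floor(x / cell_size),
    floor(y / cell_size). Returns a grid of mean z values, with NaN
    where a cell received no points (note: NaN here means "no data",
    not "free space" -- the representation cannot tell them apart).
    """
    rows, cols = grid_shape
    ix = np.floor(points[:, 0] / cell_size).astype(int)
    iy = np.floor(points[:, 1] / cell_size).astype(int)
    keep = (ix >= 0) & (ix < rows) & (iy >= 0) & (iy < cols)
    ix, iy, z = ix[keep], iy[keep], points[keep, 2]

    total = np.zeros(grid_shape)
    count = np.zeros(grid_shape)
    np.add.at(total, (ix, iy), z)   # unbuffered accumulation per cell
    np.add.at(count, (ix, iy), 1)
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)
```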

Elevation Maps

An elevation map is a more sophisticated, probabilistic version of a height map. Instead of just storing an average height, each cell stores a probabilistic estimate of the height, often updated using a Kalman filter. This allows the map to represent uncertainty, which typically increases with the measured distance from the sensor.

Multi-Level Surface (MLS) Maps

To overcome the single-level limitation of elevation maps, Multi-Level Surface (MLS) maps can be used. An MLS map allows each 2D cell to store multiple surface "patches". This enables the representation of complex vertical structures like bridges and underpasses.

Each patch in a cell consists of:

- the mean height of the surface,
- the variance of that height estimate, and
- the depth of the patch (its vertical extent), which distinguishes thin horizontal surfaces from vertical structures.

TODO: Add more here, conversion from LIDAR to MLS .

Octrees (OctoMap)

While voxel grids are powerful, their memory usage is a significant drawback, especially in 3D where most cells are empty.
The Octree is a hierarchical data structure that efficiently addresses this problem by only allocating memory for volumes as needed.

An Octree works by recursively subdividing 3D space. A root node represents the entire volume. If this volume is not homogeneous (i.e., not entirely occupied, free, or unknown), it is subdivided into eight child nodes ("octants"), each representing a sub-volume. This process continues until a minimum voxel size is reached or a node's volume is homogeneous.
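A toy sketch of this recursive subdivision; the `classify` callback is a stand-in for real occupancy checks (an assumed interface, not OctoMap's actual API):

```python
class OctreeNode:
    """Octree over a cube: subdivide mixed volumes until min_size.

    classify(center, size) must return 'free', 'occupied', or
    'mixed' for the cube of the given size at the given center.
    Homogeneous nodes stay leaves, so memory is only spent where
    the space is actually heterogeneous.
    """
    def __init__(self, center, size, classify, min_size):
        self.center, self.size = center, size
        self.state = classify(center, size)
        self.children = []
        if self.state == "mixed" and size > min_size:
            half = size / 2.0
            # Eight octants, offset +/- a quarter of the parent size.
            for dx in (-half / 2, half / 2):
                for dy in (-half / 2, half / 2):
                    for dz in (-half / 2, half / 2):
                        c = (center[0] + dx, center[1] + dy, center[2] + dz)
                        self.children.append(
                            OctreeNode(c, half, classify, min_size))
```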

TODO: Probabilistic map updates, Kalman Filter pre-context and Bayes pre-context for adding more content.

Signed Distance Functions (SDF)

A Signed Distance Function (SDF) is a way to represent geometry implicitly. An SDF is a function $f(p): \mathbb{R}^3 \to \mathbb{R}$ that, for any point $p$ in space, returns the shortest distance to the surface, with the sign indicating whether $p$ lies inside (negative) or outside (positive) the geometry.

For example, the SDF for a sphere of radius r centered at the origin is:

$$f(p) = \lVert p \rVert - r$$

A key property is that the gradient of the SDF, $\nabla f(p)$, is a unit vector pointing in the direction of increasing distance, i.e., directly away from the closest point on the surface. This is extremely useful for optimization-based algorithms in motion planning and reconstruction.
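The sphere SDF and its unit-gradient property can be checked numerically (a small sketch; the finite-difference gradient helper is my own addition):

```python
import numpy as np

def sphere_sdf(p, r):
    """Signed distance to a sphere of radius r at the origin:
    negative inside, zero on the surface, positive outside."""
    return np.linalg.norm(p) - r

def sdf_gradient(f, p, eps=1e-6):
    """Numerical gradient of an SDF via central differences.

    For a true distance field this is (approximately) a unit
    vector pointing away from the closest surface point.
    """
    p = np.asarray(p, dtype=float)
    g = np.zeros(3)
    for i in range(3):
        d = np.zeros(3)
        d[i] = eps
        g[i] = (f(p + d) - f(p - d)) / (2 * eps)
    return g
```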

Occupancy Grids

An occupancy grid is a specific type of voxel grid where each cell $m_i$ stores the probability $P(m_i)$ that the corresponding region of space is occupied. This is powerful because it explicitly models uncertainty and distinguishes between occupied, free, and unknown space.

The map is updated sequentially using a Bayesian filter. Given a new measurement $z_t$, we want to update the probability of a cell $m_i$ being occupied. Using Bayes' rule, the update is:

$$P(m_i \mid z_{1:t}) = \frac{P(z_t \mid m_i, z_{1:t-1}) \, P(m_i \mid z_{1:t-1})}{P(z_t \mid z_{1:t-1})}$$

This can be computationally expensive. A more efficient way to handle this is to use the log-odds representation. The odds of a cell being occupied are $\frac{P(m_i)}{1 - P(m_i)}$, and the log-odds are:

$$l(m_i) = \log \frac{P(m_i)}{1 - P(m_i)}$$

The update then becomes a simple addition:

$$l_t(m_i) = l_{t-1}(m_i) + l_{\text{inverse\_sensor\_model}}$$

Here, $l_{\text{inverse\_sensor\_model}}$ is a pre-calculated log-odds value based on the sensor model. A positive value increases the belief of occupancy (e.g., for the cell where the beam ends), and a negative value increases the belief that the cell is free (for cells the beam passes through). This avoids costly multiplications and makes the map update very fast.
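The log-odds update for a single cell can be sketched like this; the inverse-sensor-model probabilities (0.7 for a hit, 0.4 for a pass-through) are illustrative values, not from a calibrated sensor:

```python
import numpy as np

def logodds(p):
    """Log-odds of a probability p."""
    return np.log(p / (1.0 - p))

def update_cell(l_prev, hit, l_occ=logodds(0.7), l_free=logodds(0.4)):
    """One log-odds occupancy update for a single cell.

    l_occ is positive (beam ended in the cell), l_free is negative
    (beam passed through), so the update is a simple addition.
    """
    return l_prev + (l_occ if hit else l_free)

def to_probability(l):
    """Recover P(m_i) from its log-odds value."""
    return 1.0 - 1.0 / (1.0 + np.exp(l))
```

Starting from $l = 0$ (i.e., $P(m_i) = 0.5$, unknown), repeated hits drive the cell's probability toward 1 and repeated pass-throughs drive it toward 0.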

Probabilistic Map Update (Deeper Dive)

The occupancy of each voxel in a grid, $m_i$, is updated using a recursive binary Bayes filter. The belief $Bel(m_i)$ is updated based on a new measurement $z_t$.

The update rule in log-odds form is efficient and avoids numerical instability near probabilities of 0 or 1.

$$l_t(m_i) = l_{t-1}(m_i) + l_{\text{inverse\_sensor\_model}}$$