Mobile Robotics Mid-Semester Q&A Compilation
Q1: Warm-up Table
Question
Fill in the following table by indicating the quantities that are known, to be estimated, or unknown, and the type of measurements that are needed.
Solution Table
Problem | Structure (Scene Geometry) | Motion (Camera Parameters) | Measurements |
---|---|---|---|
F-matrix estimation | Unknown | To be Estimated | 2D-2D features |
Camera calibration | Known | To be Estimated | 2D-3D features |
Triangulation | To be Estimated | Known | 2D-2D features |
Stereo rectification | Unknown | Known & To be Estimated | Unknown |
PnP (Perspective-n-Point) | Known | To be Estimated | 2D-3D features |
Bundle adjustment | To be Estimated | To be Estimated | 2D-3D features |
Q2: Transformations
Question (i)
Derive the expression for the inverse $T^{-1}$ of a homogeneous transformation matrix $T = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$.
Solution (i)
We know that the transformation from a frame to itself is the identity matrix. Therefore, the product of a transform and its inverse must be the identity matrix:

$$T\,T^{-1} = I$$

Writing the unknown inverse in block form as $T^{-1} = \begin{bmatrix} R' & \mathbf{t}' \\ \mathbf{0}^T & 1 \end{bmatrix}$ and expanding this with the block matrix form gives:

$$\begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} R' & \mathbf{t}' \\ \mathbf{0}^T & 1 \end{bmatrix} = \begin{bmatrix} RR' & R\mathbf{t}' + \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} = \begin{bmatrix} I & \mathbf{0} \\ \mathbf{0}^T & 1 \end{bmatrix}$$

From the block matrix multiplication, we get two key equations:

- For the rotation part: $RR' = I$. This implies that $R' = R^{-1}$. Since rotation matrices are orthogonal, the inverse is equal to the transpose, so $R' = R^T$.
- For the translation part: $R\mathbf{t}' + \mathbf{t} = \mathbf{0}$.

Solving the second equation for the unknown translation vector gives $\mathbf{t}' = -R^T\mathbf{t}$.

Combining these results, the final expression for the inverse transformation is:

$$T^{-1} = \begin{bmatrix} R^T & -R^T\mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$$
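As a quick numerical sanity check of the inverse formula $T^{-1} = \begin{bmatrix} R^T & -R^T\mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$, here is a minimal numpy sketch using a made-up example pose (a 30° yaw and an arbitrary translation):

```python
import numpy as np

# Hypothetical example pose: 30-degree rotation about z, translation (1, 2, 3)
a = np.deg2rad(30.0)
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a),  np.cos(a), 0.0],
              [0.0,        0.0,       1.0]])
t = np.array([1.0, 2.0, 3.0])

T = np.eye(4)
T[:3, :3] = R
T[:3, 3] = t

# Closed-form inverse: [R^T, -R^T t; 0, 1]
T_inv = np.eye(4)
T_inv[:3, :3] = R.T
T_inv[:3, 3] = -R.T @ t

print(np.allclose(T @ T_inv, np.eye(4)))  # True
```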
Question (ii)
Consider the figure below, where {W} is the world frame and {C} is the camera frame. The Z-axis of {W} comes out of the plane, while the Y-axis of {C} goes into the plane. Find (a) the rotation matrix ${}^{W}R_{C}$, (b) the YXZ-Euler angles that produce this orientation, and (c) the full transformation ${}^{W}T_{C}$.
Solution (ii)
(a) Find the Rotation Matrix
To find the rotation matrix ${}^{W}R_{C}$ that describes the orientation of frame {C} with respect to frame {W}, we express the unit axes of {C} in {W}'s coordinates.

From the diagram:

- The X-axis of {C} ($\hat{x}_C$) points along the Y-axis of {W} ($\hat{y}_W$). So, $\hat{x}_C = (0, 1, 0)^T$.
- The Y-axis of {C} ($\hat{y}_C$) points into the plane, which is along the negative Z-axis of {W} ($-\hat{z}_W$). So, $\hat{y}_C = (0, 0, -1)^T$.
- The Z-axis of {C} ($\hat{z}_C$) points along the negative X-axis of {W} ($-\hat{x}_W$). So, $\hat{z}_C = (-1, 0, 0)^T$.

Stacking these as column vectors gives the rotation matrix:

$${}^{W}R_{C} = \begin{bmatrix} 0 & 0 & -1 \\ 1 & 0 & 0 \\ 0 & -1 & 0 \end{bmatrix}$$
(b) Find the YXZ-Euler Angles

The YXZ-Euler angle representation corresponds to the matrix product

$${}^{W}R_{C} = R_Y(\theta_1)\, R_X(\theta_2)\, R_Z(\theta_3)$$

Multiplying out this product and matching entries against the matrix found in (a): the $(2,3)$ entry gives $-\sin\theta_2 = 0$, the $(1,3)$ entry gives $\sin\theta_1\cos\theta_2 = -1$, and the $(2,1)$ and $(2,2)$ entries give $\cos\theta_2\sin\theta_3 = 1$ and $\cos\theta_2\cos\theta_3 = 0$.

So, the YXZ-Euler angles are $\theta_1 = -90°$ (about Y), $\theta_2 = 0°$ (about X), and $\theta_3 = 90°$ (about Z).
(c) Find the Transformation ${}^{W}T_{C}$

The origin of frame {C} is located at the position ${}^{W}\mathbf{t}_{C}$ read off the figure (the coordinates of {C}'s origin expressed in {W}).

Finally, we construct the full transformation matrix by combining the rotation from (a) with this translation:

$${}^{W}T_{C} = \begin{bmatrix} {}^{W}R_{C} & {}^{W}\mathbf{t}_{C} \\ \mathbf{0}^T & 1 \end{bmatrix}$$
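A quick numerical check of parts (a) and (b), assuming the axis directions read off the figure (X of {C} along Y of {W}, Y of {C} along $-Z_W$, Z of {C} along $-X_W$):

```python
import numpy as np

def Ry(th):
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def Rx(th):
    c, s = np.cos(th), np.sin(th)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def Rz(th):
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# Columns are the axes of {C} expressed in {W}
R_WC = np.array([[0,  0, -1],
                 [1,  0,  0],
                 [0, -1,  0]], dtype=float)

# Valid rotation: orthogonal with determinant +1
print(np.allclose(R_WC @ R_WC.T, np.eye(3)), np.isclose(np.linalg.det(R_WC), 1.0))

# YXZ-Euler angles (-90, 0, 90) degrees reproduce it
R_euler = Ry(np.deg2rad(-90)) @ Rx(0.0) @ Rz(np.deg2rad(90))
print(np.allclose(R_euler, R_WC))  # True
```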
Q3: Single-View Geometry and Reconstruction
Question 3.1
Given a camera matrix P, detail how you can obtain the camera center C and the rotation matrix R without knowing the intrinsic parameter matrix K.
Solution 3.1
The camera matrix P is a $3\times 4$ matrix that can be written as $P = K[R \mid \mathbf{t}] = [KR \mid K\mathbf{t}] = [M \mid \mathbf{p}_4]$, where $M$ is its left $3\times 3$ submatrix.

- Finding the Camera Center: The camera center C is the point in 3D space that projects to the null-space of P. That is, $P\tilde{C} = \mathbf{0}$. Since P is $3\times 4$ with rank 3, its null-space is one-dimensional, so C is uniquely determined up to a scale factor, which is fixed by the homogeneous normalization $\tilde{C} = (C, 1)^T$. It can be computed as the right singular vector of P associated with its smallest singular value.

- Finding the Rotation Matrix: Without knowing K, we cannot read off R directly. However, the left $3\times 3$ submatrix of P, $M = KR$, is the product of an upper-triangular matrix (K) and an orthogonal matrix (R). We can therefore apply an RQ decomposition to $M$ (equivalently, a QR decomposition of $M^{-1}$): if $M = \tilde{K}\tilde{R}$, where $\tilde{K}$ is upper-triangular and $\tilde{R}$ is orthogonal, then after fixing the diagonal of $\tilde{K}$ to be positive the factorization is unique, and we can identify $K = \tilde{K}$ and $R = \tilde{R}$.

Alternatively, we can use the property that

$$MM^T = (KR)(KR)^T = KRR^TK^T = KK^T$$

This equation relates the known matrix $MM^T$ to K alone; factoring $MM^T$ into an upper-triangular matrix times its transpose (a Cholesky-style factorization) recovers K, after which $R = K^{-1}M$.
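A minimal numpy sketch of this recovery: the RQ decomposition is implemented via a flipped QR, and the camera parameters below are made-up example values used only to verify the round trip:

```python
import numpy as np

def rq(M):
    # RQ decomposition via QR of the row/column-reversed matrix
    J = np.flipud(np.eye(3))            # exchange (reversal) matrix
    Q_, R_ = np.linalg.qr((J @ M).T)
    K = J @ R_.T @ J                    # upper-triangular factor
    R = J @ Q_.T                        # orthogonal factor
    S = np.diag(np.sign(np.diag(K)))    # force a positive diagonal on K
    return K @ S, S @ R

# Synthetic camera with known (hypothetical) K, R, t
K_true = np.array([[800.0, 0.0, 320.0],
                   [0.0, 820.0, 240.0],
                   [0.0, 0.0, 1.0]])
a = 0.3
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([[0.5], [-0.2], [1.0]])
P_mat = K_true @ np.hstack([R_true, t_true])

M = P_mat[:, :3]
K_est, R_est = rq(M)
K_est /= K_est[2, 2]                    # normalize the overall scale

# Camera center: right null-vector of P, via SVD
C = np.linalg.svd(P_mat)[2][-1]
C = C[:3] / C[3]
```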
Question 3.2
State and justify the cases when the 3D reconstruction obtained from two views is (a) Unambiguous (b) Up to an unknown scaling factor (c) Up to an unknown projective transformation.
Solution 3.2
(a) Unambiguous (Euclidean) Reconstruction:

Reconstruction is unambiguous (up to a choice of world frame) when the intrinsic matrices of both cameras ($K_1$, $K_2$) are known and the relative pose $(R, \mathbf{t})$ between them, including the true length of the baseline $\|\mathbf{t}\|$, is also known, as in a calibrated stereo rig. With full calibration and a known baseline, triangulation recovers the true Euclidean structure of the scene.

(b) Up to an Unknown Scaling Factor (Metric Reconstruction):

Reconstruction is determined only up to a global scale factor when the intrinsic matrices ($K_1$, $K_2$) are known but the relative motion must be estimated from image correspondences via the Essential matrix. The Essential matrix constrains the direction of $\mathbf{t}$ but not its magnitude: scaling the scene and the baseline by the same factor produces identical images, so the overall scale is unobservable.

(c) Up to an Unknown Projective Transformation:

Reconstruction is only determined up to a projective transformation when both the intrinsic ($K_1$, $K_2$) and extrinsic parameters are unknown, so only the Fundamental matrix can be estimated from 2D-2D correspondences. For any invertible $4\times 4$ matrix $H$, the cameras $(P_1 H^{-1}, P_2 H^{-1})$ together with the points $H\tilde{X}$ produce exactly the same image measurements, so uncalibrated two-view geometry fixes the reconstruction only up to such an $H$.
Q4: Essential Matrix
Question
Two cameras fixate on a point P in 3D space such that their optical axes intersect at this point. Show that the $(3,3)$ element of the Essential matrix relating the two views is zero.
Solution
When a camera fixates on a point, its optical axis passes through that point. The optical axis intersects the image plane at the principal point. If both cameras fixate on the same 3D point X, it means that X is imaged onto the principal point of each camera.
In normalized image coordinates, the principal point is at the origin, so the fixated point is imaged at $\mathbf{x} = \mathbf{x}' = (0, 0, 1)^T$ in both views.

The epipolar constraint is defined by the Essential matrix E as:

$$\mathbf{x}'^T E\, \mathbf{x} = 0$$

Substituting the coordinates of the principal points into this equation:

$$\begin{bmatrix} 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} e_{11} & e_{12} & e_{13} \\ e_{21} & e_{22} & e_{23} \\ e_{31} & e_{32} & e_{33} \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = 0$$

Performing the matrix multiplication:

$$\begin{bmatrix} e_{31} & e_{32} & e_{33} \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = 0$$

This simplifies directly to:

$$e_{33} = 0$$
Thus, the (3,3) element of the Essential matrix must be zero.
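This can be verified numerically with a made-up fixating configuration: place camera 1 at the origin looking along $+z$ at a point P on its axis, and build camera 2 with a look-at rotation so its optical axis also passes through P; the Essential matrix $E = [\mathbf{t}]_\times R$ then has a vanishing $(3,3)$ element:

```python
import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

# Hypothetical setup: camera 1 at the origin looking along +z,
# camera 2 at C2, rotated so its optical axis also passes through P.
P = np.array([0.0, 0.0, 5.0])    # fixated 3D point
C2 = np.array([3.0, 0.0, 0.0])

z2 = (P - C2) / np.linalg.norm(P - C2)   # camera-2 optical axis (look-at)
x2 = np.cross([0.0, 1.0, 0.0], z2)
x2 /= np.linalg.norm(x2)
y2 = np.cross(z2, x2)
R_c2w = np.stack([x2, y2, z2], axis=1)   # camera-2 axes in world coordinates

# World-to-camera-2 pose (camera-1 frame coincides with the world frame)
R = R_c2w.T
t = -R @ C2

E = skew(t) @ R                          # Essential matrix
print(abs(E[2, 2]) < 1e-12)              # the (3,3) element vanishes
```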
Q5: Homography
Question
Suppose a camera with intrinsic matrix K rotates about its optical centre by a rotation matrix R. (a) Show that its two views are related by a homography H such that $\mathbf{x}_2 \sim H\mathbf{x}_1$ with $H = KRK^{-1}$. (b) What homography relates the first view to the view obtained after the camera rotates by R a second time?
Solution
(a) Derivation of the Homography

Let a point in 3D space be $\mathbf{X}$, expressed in the camera's coordinate frame at the first view. Its projection in the first image is:

$$\lambda_1 \mathbf{x}_1 = K\mathbf{X}$$

Since the camera only rotates, the translation is zero. The 3D point expressed in the rotated camera frame is $\mathbf{X}' = R\mathbf{X}$.

The projection of the same 3D point in the second image is therefore:

$$\lambda_2 \mathbf{x}_2 = K\mathbf{X}' = KR\mathbf{X}$$

Now, we substitute the expression for $\mathbf{X}$ from the first projection, $\mathbf{X} = \lambda_1 K^{-1}\mathbf{x}_1$:

$$\lambda_2 \mathbf{x}_2 = \lambda_1 KRK^{-1}\mathbf{x}_1$$

Since equality in homogeneous coordinates holds up to scale, dropping the scalar depth constants $\lambda_1, \lambda_2$ gives:

$$\mathbf{x}_2 \sim KRK^{-1}\mathbf{x}_1$$

We define the homography matrix as:

$$H = KRK^{-1}$$

Thus, the two image points are related by the homography $\mathbf{x}_2 \sim H\mathbf{x}_1$.
(b) Homography for a Second Rotation

If the camera is rotated again by the same rotation R (so the total rotation corresponds to twice the original angle about the same axis), the point in the third camera frame is $\mathbf{X}'' = R\mathbf{X}' = R^2\mathbf{X}$.

Substituting the expression for $\mathbf{X}$ as before, the first and third views are related by $\mathbf{x}_3 \sim KR^2K^{-1}\mathbf{x}_1$.

The total rotation from the first view to the third view is $R^2$, and since $K^{-1}K = I$:

$$H_{13} = KR^2K^{-1} = (KRK^{-1})(KRK^{-1}) = H^2$$
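A numpy sketch verifying both facts, $H = KRK^{-1}$ and $H_{13} = H^2$, with made-up intrinsics and a small rotation about the y-axis:

```python
import numpy as np

# Made-up intrinsics and a 10-degree rotation about the y-axis
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
a = np.deg2rad(10.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])

H = K @ R @ np.linalg.inv(K)

# Project a 3D point before and after the rotation
X = np.array([0.5, -0.3, 4.0])
x1 = K @ X
x1 /= x1[2]
x2 = K @ (R @ X)
x2 /= x2[2]

Hx1 = H @ x1
Hx1 /= Hx1[2]
print(np.allclose(Hx1, x2))        # the homography maps view 1 into view 2

# Rotating by R again: the total homography is H squared
H2 = K @ (R @ R) @ np.linalg.inv(K)
print(np.allclose(H2, H @ H))      # True
```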
Q6: Dense Visual Odometry
Question
The photometric error for Dense-VO is given as

$$E(R, \mathbf{t}) = \sum_{\mathbf{x} \in \Omega} \Big( I_2\big(\omega(\mathbf{x}, D_1(\mathbf{x}), R, \mathbf{t})\big) - I_1(\mathbf{x}) \Big)^2$$

(a) Derive the warping function $\omega$. (b) What is the nature of this error function, and how is it minimized?
Solution
(a) The Warping Function

The function $\omega$ maps a pixel $\mathbf{x} = (u, v)^T$ in the first image, with known depth $D_1(\mathbf{x})$, to its corresponding location in the second image. It is built in three steps:

- Back-projection to 3D: First, we lift the 2D pixel $\tilde{\mathbf{x}} = (u, v, 1)^T$ (in homogeneous coordinates) back into a 3D point in the first camera's coordinate frame. This is done by inverting the pinhole camera projection equation, using the known intrinsic matrix K and depth $D_1(\mathbf{x})$:

$$\mathbf{P} = D_1(\mathbf{x})\, K^{-1}\tilde{\mathbf{x}}$$

- Transform to Second Camera Frame: Next, we transform this 3D point into the coordinate system of the second camera using the rigid body motion $(R, \mathbf{t})$:

$$\mathbf{P}' = R\mathbf{P} + \mathbf{t}$$

- Re-projection to 2D: Finally, we project the transformed 3D point back onto the image plane of the second camera using the intrinsic matrix K, followed by the perspective division $\pi$:

$$\mathbf{x}' = \pi(K\mathbf{P}')$$

Combining these steps gives the full mathematical expression for the warping function:

$$\omega(\mathbf{x}, D_1(\mathbf{x}), R, \mathbf{t}) = \pi\!\left(K\big(R\, D_1(\mathbf{x})\, K^{-1}\tilde{\mathbf{x}} + \mathbf{t}\big)\right)$$
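The three steps (back-project, transform, re-project) can be sketched directly in numpy; the intrinsics, depth, and motion below are made-up example values:

```python
import numpy as np

def pi(p):
    """Perspective division: (X, Y, Z) -> (X/Z, Y/Z)."""
    return p[:2] / p[2]

def warp(x, depth, K, R, t):
    """Warp a pixel from image 1 into image 2 under a pinhole model."""
    x_h = np.array([x[0], x[1], 1.0])
    P = depth * (np.linalg.inv(K) @ x_h)   # back-project to 3D
    P2 = R @ P + t                          # move into camera-2 frame
    return pi(K @ P2)                       # re-project

# Made-up example values
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                   # identity rotation
t = np.array([0.1, 0.0, 0.0])   # small sideways motion

x2 = warp((320.0, 240.0), 2.0, K, R, t)
# A pixel at the principal point with depth 2 m shifts by f * t_x / Z = 25 px
```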
(b) Nature of the Photometric Error and its Solution
The photometric error is the difference in intensity values between a pixel in the first image and its corresponding warped location in the second image. This error function is non-linear and non-convex with respect to the motion parameters R and t.
Because it is a non-linear optimization problem, we cannot solve for the best camera motion $(R, \mathbf{t})$ in closed form. Instead, the error is minimized iteratively: starting from an initial guess, the residuals are linearized around the current estimate and an update is computed with a scheme such as Gauss-Newton or Levenberg-Marquardt, repeating until convergence. Because the error surface is non-convex, a good initialization (e.g., the previous frame's motion, often combined with a coarse-to-fine image pyramid) is important to avoid poor local minima.
Q7: Bundle Adjustment
Question
What is Bundle Adjustment? Describe its objective function mathematically. Explain the structure of the Jacobian matrix used in its optimization.
Solution
Bundle Adjustment (BA) is a large-scale non-linear optimization problem that is fundamental to 3D reconstruction tasks like Structure from Motion (SfM) and SLAM. Its primary purpose is to simultaneously refine the 3D coordinates of all map points and the parameters of all camera poses (position and orientation) to achieve a globally consistent and optimal reconstruction. The name comes from the "bundles" of light rays that project from each 3D point to each camera center, which are adjusted to minimize error.
The Objective Function
The goal of Bundle Adjustment is to minimize the total reprojection error. This error is the sum of squared distances between the observed 2D feature locations in the images and the predicted 2D projections of the 3D map points based on the current camera pose estimates.
Mathematically, if we have $n$ 3D points observed across $m$ cameras, the objective is:

$$\min_{\{C_j\},\, \{X_i\}} \sum_{i=1}^{n} \sum_{j=1}^{m} v_{ij} \left\| \mathbf{x}_{ij} - \pi(C_j, X_i) \right\|^2$$

Where:

- $C_j$ represents the parameters of the $j$-th camera (rotation $R_j$ and translation $\mathbf{t}_j$).
- $X_i$ represents the coordinates of the $i$-th 3D point.
- $\mathbf{x}_{ij}$ is the observed 2D location of point $i$ in image $j$.
- $\pi(C_j, X_i)$ is the projection function that maps the 3D point $X_i$ into image $j$ using camera parameters $C_j$.
- $v_{ij}$ is a binary variable that is 1 if point $i$ is visible in camera $j$, and 0 otherwise.
The Jacobian Structure
This non-linear least squares problem is solved iteratively using methods like Levenberg-Marquardt. This requires computing the Jacobian matrix of the residuals. The state vector is partitioned into two sets of variables: the camera parameters and the 3D point coordinates.
The Jacobian matrix has a very specific sparse block structure. A single residual term $\mathbf{r}_{ij} = \mathbf{x}_{ij} - \pi(C_j, X_i)$ depends only on one camera and one point, so its row of the Jacobian is nonzero only in the columns of $C_j$ and $X_i$:

$$J_{ij} = \begin{bmatrix} \dfrac{\partial \mathbf{r}_{ij}}{\partial C_j} & \dfrac{\partial \mathbf{r}_{ij}}{\partial X_i} \end{bmatrix}$$

This results in a Jacobian matrix $J = [A \mid B]$ with the following structure:

- The matrix $A$ (derivatives with respect to camera parameters) is block-sparse: each residual row has exactly one nonzero camera block, so the camera part of the normal equations, $A^T A$, is block-diagonal.
- The matrix $B$ (derivatives with respect to point parameters) is likewise sparse, with one nonzero point block per residual row, making $B^T B$ block-diagonal as well.
This sparse structure is critical because it can be exploited by specialized solvers (like the Schur complement trick) to solve the large linear system required at each iteration much more efficiently than a dense solver could.
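The Schur complement trick mentioned above eliminates the point variables first and solves a much smaller system in the camera variables. A toy numpy sketch on a random damped normal-equation system (2 cameras × 6 parameters, 4 points × 3 parameters; the dimensions and data are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
nc, npt = 2 * 6, 4 * 3              # camera block size, point block size

# Toy damped normal equations H dx = b, with H = [[U, W], [W^T, V]]
J = rng.standard_normal((40, nc + npt))
H = J.T @ J + 1e-3 * np.eye(nc + npt)   # damping keeps H invertible
b = rng.standard_normal(nc + npt)

U = H[:nc, :nc]
W = H[:nc, nc:]
V = H[nc:, nc:]
b_c, b_p = b[:nc], b[nc:]

# Schur complement: eliminate the point block first
V_inv = np.linalg.inv(V)            # block-diagonal in real BA, hence cheap
S = U - W @ V_inv @ W.T
dx_c = np.linalg.solve(S, b_c - W @ V_inv @ b_p)
dx_p = V_inv @ (b_p - W.T @ dx_c)

# Matches a direct dense solve of the full system
dx = np.linalg.solve(H, b)
print(np.allclose(np.concatenate([dx_c, dx_p]), dx))  # True
```

In real Bundle Adjustment, $V$ consists of independent $3\times 3$ blocks (one per point), so inverting it costs only $O(n)$, which is where the speedup over a dense solve comes from.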
Q8: EKF-SLAM
Question
Describe the key steps of the EKF-SLAM algorithm. What are the main challenges or limitations of this approach?
Solution
Extended Kalman Filter (EKF) SLAM is a classic probabilistic approach to solving the SLAM problem. It maintains the state estimate as a single multivariate Gaussian distribution, represented by a mean vector $\mu$ (stacking the robot pose and the positions of all mapped landmarks) and a covariance matrix $\Sigma$.
The algorithm operates in a two-step prediction-correction cycle:
- Prediction Step: When the robot moves, a motion model $g$ is used to predict its new state. This step propagates the robot's uncertainty, causing the covariance to grow.
  - Predict Mean: $\bar{\mu}_t = g(u_t, \mu_{t-1})$
  - Predict Covariance: $\bar{\Sigma}_t = G_t \Sigma_{t-1} G_t^T + R_t$

  Here, $u_t$ is the control input, $G_t$ is the Jacobian of the motion model with respect to the state, and $R_t$ is the motion noise covariance.
- Correction (Update) Step: When the robot observes a landmark, the measurement $z_t$ is used to correct the state estimate. This step reduces uncertainty.
  - Compute Kalman Gain: $K_t = \bar{\Sigma}_t H_t^T \left( H_t \bar{\Sigma}_t H_t^T + Q_t \right)^{-1}$
  - Update Mean: $\mu_t = \bar{\mu}_t + K_t\left(z_t - h(\bar{\mu}_t)\right)$
  - Update Covariance: $\Sigma_t = (I - K_t H_t)\,\bar{\Sigma}_t$

  Here, $h$ is the observation model, $H_t$ is its Jacobian with respect to the state, and $Q_t$ is the measurement noise covariance. The term $z_t - h(\bar{\mu}_t)$ is the innovation, or measurement residual.
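The prediction-correction cycle can be sketched with a linear motion and observation model, in which case the Jacobians $G_t$ and $H_t$ are constant matrices. This is a minimal illustrative Kalman filter on a made-up 1D tracking problem, not a full EKF-SLAM implementation:

```python
import numpy as np

# 1D robot state [position, velocity]; constant-velocity motion model
G = np.array([[1.0, 1.0], [0.0, 1.0]])   # motion Jacobian (here: the model itself)
H = np.array([[1.0, 0.0]])               # we observe position only
R_noise = 0.01 * np.eye(2)               # motion noise covariance
Q_noise = np.array([[0.25]])             # measurement noise covariance

mu = np.array([0.0, 1.0])
Sigma = np.eye(2)

def predict(mu, Sigma):
    mu_bar = G @ mu
    Sigma_bar = G @ Sigma @ G.T + R_noise
    return mu_bar, Sigma_bar

def correct(mu_bar, Sigma_bar, z):
    K = Sigma_bar @ H.T @ np.linalg.inv(H @ Sigma_bar @ H.T + Q_noise)
    mu = mu_bar + (K @ (z - H @ mu_bar)).ravel()
    Sigma = (np.eye(2) - K @ H) @ Sigma_bar
    return mu, Sigma

for z in [1.1, 2.0, 2.9]:                # noisy position readings
    mu, Sigma = predict(mu, Sigma)
    mu, Sigma = correct(mu, Sigma, np.array([z]))
# Prediction grows Sigma; each correction shrinks it again
```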
Challenges and Limitations of EKF-SLAM
- Computational Complexity: The covariance matrix $\Sigma$ is of size $(3 + 2N) \times (3 + 2N)$ for a 2D robot with N point landmarks. Updating this matrix at each correction step requires $O(N^2)$ computation. This quadratic complexity makes the standard EKF-SLAM algorithm computationally infeasible for large-scale maps with many landmarks.

- Data Association: The algorithm assumes that correspondences between observations and landmarks are known. In practice, determining which landmark an observation corresponds to (the data association problem) is very difficult. An incorrect association can be catastrophic, leading to a corrupted map and a lost robot.

- Linearization Errors: The EKF relies on a first-order Taylor series approximation (linearization) of the typically non-linear motion and observation models. If the models are highly non-linear or the uncertainty is large, this linearization can introduce significant errors, potentially causing the filter to diverge and become inconsistent.

- Gaussian Assumption: EKF represents the state with a single, unimodal Gaussian distribution. It cannot represent multi-modal beliefs, which can arise in ambiguous situations (e.g., seeing two identical-looking corridors). If the true belief is not Gaussian, the EKF's estimate can be a poor approximation.
Q9: Image Formation
Question
Derive the pinhole camera model. Show the full relationship between a 3D world coordinate and a 2D pixel coordinate, explaining the role of the intrinsic and extrinsic parameter matrices.
Solution
The pinhole camera model is the simplest mathematical model for describing how a 3D scene is projected onto a 2D image plane.
Ideal Pinhole Model and Similar Triangles
Imagine a camera as an ideal box with a tiny aperture (a pinhole) on the front and an image plane at the back, at distance $f$ (the focal length) behind the pinhole. A light ray from a 3D point $\mathbf{P} = (X, Y, Z)^T$ in the camera frame passes through the pinhole and strikes the image plane. (For convenience, the image plane is usually modeled in front of the pinhole, which removes the image inversion.)

Using similar triangles, we can derive the ideal image coordinates $(x, y)$ of the projection:

$$x = f\frac{X}{Z}, \qquad y = f\frac{Y}{Z}$$
From Ideal to Pixel Coordinates: The Intrinsic Matrix
The ideal coordinates $(x, y)$ are in metric units on the image plane. To convert them to pixel coordinates $(u, v)$, two adjustments are needed:

- Scaling by pixel size: The metric coordinates are scaled by the number of pixels per unit length along each axis; combined with the focal length $f$, this yields the focal lengths $f_x$ and $f_y$ expressed in pixels.
- Offsetting by the principal point: The origin of the pixel coordinate system is usually at the top-left corner, while the optical axis intersects the image plane at the principal point $(c_x, c_y)$.

This relationship can be expressed in matrix form using homogeneous coordinates:

$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = K\,\mathbf{P}_C$$

The $3\times 3$ matrix K is the intrinsic parameter matrix; it depends only on the camera's internal characteristics (a skew term is sometimes added in the $(1,2)$ entry for non-rectangular pixels).
From World to Camera Coordinates: The Extrinsic Matrix
A 3D point is usually defined in a world coordinate system, not the camera's. We need to transform the world point $\mathbf{P}_W$ into the camera frame before projecting it:

$$\mathbf{P}_C = R\,\mathbf{P}_W + \mathbf{t}$$

This can be written in homogeneous coordinates using a $4\times 4$ matrix built from the extrinsic parameters $(R, \mathbf{t})$:

$$\begin{bmatrix} \mathbf{P}_C \\ 1 \end{bmatrix} = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} \mathbf{P}_W \\ 1 \end{bmatrix}$$
The Full Camera Projection Matrix
Combining all steps, we can write a single equation that maps a 3D point in the world frame directly to a 2D point in the image frame.

Let $\tilde{\mathbf{x}} = (u, v, 1)^T$ and $\tilde{\mathbf{X}}_W = (X, Y, Z, 1)^T$. Then:

$$\lambda \tilde{\mathbf{x}} = K [R \mid \mathbf{t}]\, \tilde{\mathbf{X}}_W = P\, \tilde{\mathbf{X}}_W$$

Here, $P = K[R \mid \mathbf{t}]$ is the $3\times 4$ camera projection matrix: the intrinsic matrix K captures the camera's internal geometry, while the extrinsic matrix $[R \mid \mathbf{t}]$ captures its pose in the world.
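The full pipeline, world point to pixel, is a few lines of numpy; the intrinsics and the test point below are made-up example values:

```python
import numpy as np

def project(X_world, K, R, t):
    """Project a 3D world point to pixel coordinates with a pinhole camera."""
    P_c = R @ X_world + t          # world -> camera frame (extrinsics)
    x = K @ P_c                    # camera frame -> homogeneous pixels (intrinsics)
    return x[:2] / x[2]            # perspective division

# Made-up example: camera at the world origin, f_x = f_y = 500
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)

uv = project(np.array([1.0, 0.5, 5.0]), K, R, t)
# u = 500 * 1.0/5.0 + 320 = 420, v = 500 * 0.5/5.0 + 240 = 290
```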
Q10: Triangulation
Question
What is triangulation? Given two camera matrices $P_1$ and $P_2$ and a pair of corresponding image points $\mathbf{x}_1 \leftrightarrow \mathbf{x}_2$, describe a linear method for computing the 3D point $\mathbf{X}$.
Solution
Triangulation is the geometric process of determining the 3D position of a point by observing it from two or more different viewpoints with known positions and orientations. Given a pair of corresponding 2D points in two images, we can back-project them as rays into 3D space. The intersection of these rays reveals the location of the 3D point.
Mathematical Procedure
Let the two camera projection matrices be $P_1$ and $P_2$, with corresponding homogeneous image points $\mathbf{x}_1 = (u_1, v_1, 1)^T$ and $\mathbf{x}_2 = (u_2, v_2, 1)^T$, where for each camera

$$\lambda_1 \mathbf{x}_1 = P_1 \tilde{\mathbf{X}}, \qquad \lambda_2 \mathbf{x}_2 = P_2 \tilde{\mathbf{X}}$$

Since $\mathbf{x}$ and $P\tilde{\mathbf{X}}$ are parallel vectors, their cross product is zero: $\mathbf{x} \times (P\tilde{\mathbf{X}}) = \mathbf{0}$. Let $\mathbf{p}_1^T, \mathbf{p}_2^T, \mathbf{p}_3^T$ denote the rows of $P$. Writing out the cross product for one camera yields:

$$u\,(\mathbf{p}_3^T\tilde{\mathbf{X}}) - \mathbf{p}_1^T\tilde{\mathbf{X}} = 0, \qquad v\,(\mathbf{p}_3^T\tilde{\mathbf{X}}) - \mathbf{p}_2^T\tilde{\mathbf{X}} = 0$$

Only two of the three cross-product equations are linearly independent. We select two from the first camera and two from the second camera to form a system of four linear equations in the four components of the homogeneous 3D point $\tilde{\mathbf{X}}$:

$$A\tilde{\mathbf{X}} = \mathbf{0}$$

where A is a $4\times 4$ matrix built from the rows above.

In practice, due to noise in the image point locations, the back-projected rays will not perfectly intersect. Therefore, the system $A\tilde{\mathbf{X}} = \mathbf{0}$ is solved in the least-squares sense subject to $\|\tilde{\mathbf{X}}\| = 1$: $\tilde{\mathbf{X}}$ is the right singular vector of A corresponding to its smallest singular value, computed via SVD. The Euclidean point is then recovered by dividing by the fourth homogeneous coordinate.
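This linear (DLT) procedure fits in a short numpy function; the two cameras and the test point below are synthetic example values used to check that the true point is recovered:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views."""
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]            # right singular vector of the smallest singular value
    return X[:3] / X[3]

# Synthetic check: two made-up cameras observing a known point
K = np.diag([500.0, 500.0, 1.0])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # 1 m baseline

X_true = np.array([0.3, -0.2, 4.0])

def proj(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

X_est = triangulate(P1, P2, proj(P1, X_true), proj(P2, X_true))
print(np.allclose(X_est, X_true))  # True
```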
Q11: Recovering Rotation from Homography
Question
If a homography H is known to be induced by a pure camera rotation R, and the camera's intrinsic matrix K is also known, how can the rotation R be recovered?
Solution
When a camera undergoes a pure rotation R about its optical center, with no translation, the relationship between a point $\mathbf{x}_1$ in the first view and the corresponding point $\mathbf{x}_2$ in the second view is $\mathbf{x}_2 \sim H\mathbf{x}_1$, where:

$$H = KRK^{-1}$$

This homography matrix $H$ can be estimated directly from four or more point correspondences between the two images.

Our goal is to solve this equation for R, given H and K.
Derivation:

- Start with the known relationship: $H = KRK^{-1}$
- Pre-multiply both sides by the inverse of the intrinsic matrix, $K^{-1}$: $K^{-1}H = K^{-1}KRK^{-1}$
- Since matrix multiplication is associative, we can regroup the terms on the right side: $K^{-1}H = (K^{-1}K)RK^{-1}$
- The product of a matrix and its inverse is the identity matrix, $K^{-1}K = I$: $K^{-1}H = RK^{-1}$
- Now, to isolate R, post-multiply both sides by the intrinsic matrix, $K$: $K^{-1}HK = RK^{-1}K$
- Again, using associativity and the property of the inverse: $K^{-1}HK = R$

This gives us the final expression for recovering the rotation matrix:

$$R = K^{-1}HK$$
Intuition: This result shows that the homography matrix $H$ is the rotation $R$ conjugated by the intrinsic matrix: $H$ first maps pixel coordinates to normalized ray directions ($K^{-1}$), rotates those rays ($R$), and maps them back to pixels ($K$); undoing the conjugation recovers the rotation. In practice, an estimated $H$ is only known up to scale, so $K^{-1}HK$ must be rescaled (and, with noisy data, projected onto the nearest rotation matrix) before use.
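A short numerical sketch with made-up K and R; the cube-root normalization handles the unknown scale of an estimated homography, since $\det(sR) = s^3$:

```python
import numpy as np

K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
a = np.deg2rad(15.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])

H = 2.7 * (K @ R_true @ np.linalg.inv(K))   # arbitrary unknown scale

M = np.linalg.inv(K) @ H @ K                # = s * R
M /= np.cbrt(np.linalg.det(M))              # det(sR) = s^3, so this removes s
print(np.allclose(M, R_true))               # True
```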
Q12: Properties of the Fundamental Matrix
Question
What is the relationship between the Fundamental Matrix F and the epipoles $\mathbf{e}$ and $\mathbf{e}'$ of a two-view configuration? Derive both relationships.
Solution
The Fundamental Matrix $F$ is the $3\times 3$, rank-2 matrix that encodes the epipolar geometry between two uncalibrated views. Its two null-spaces are spanned by the epipoles: $F\mathbf{e} = \mathbf{0}$ and $F^T\mathbf{e}' = \mathbf{0}$.
Relationship 1: The Left Null-Space

The epipole in the second image, $\mathbf{e}'$, is the left null-vector of F:

$$\mathbf{e}'^T F = \mathbf{0}^T \quad (\text{equivalently, } F^T\mathbf{e}' = \mathbf{0})$$

Intuition and Derivation:

The epipolar line in the second image corresponding to a point $\mathbf{x}$ in the first image is $\mathbf{l}' = F\mathbf{x}$.

The epipole $\mathbf{e}'$ is the image, in the second view, of the first camera's center; it lies on every epipolar line $\mathbf{l}'$.

The condition for a point to lie on a line in homogeneous coordinates is that their dot product is zero. Thus, for any $\mathbf{x}$:

$$\mathbf{e}'^T \mathbf{l}' = 0$$

Substituting the expression for $\mathbf{l}'$:

$$\mathbf{e}'^T (F\mathbf{x}) = 0$$

Using associativity of matrix multiplication:

$$(\mathbf{e}'^T F)\,\mathbf{x} = 0$$

This equation must hold true for all possible image points $\mathbf{x}$, which is only possible if the row vector multiplying $\mathbf{x}$ is identically zero:

$$\mathbf{e}'^T F = \mathbf{0}^T$$

This proves that $\mathbf{e}'$ spans the left null-space of F.
Relationship 2: The Right Null-Space

The epipole in the first image, $\mathbf{e}$, is the right null-vector of F:

$$F\mathbf{e} = \mathbf{0}$$

Intuition and Derivation:

Similarly, the epipolar line in the first image corresponding to a point $\mathbf{x}'$ in the second image is $\mathbf{l} = F^T\mathbf{x}'$.

The epipole $\mathbf{e}$ lies on every such epipolar line, so for any $\mathbf{x}'$:

$$\mathbf{e}^T \mathbf{l} = \mathbf{e}^T F^T \mathbf{x}' = 0$$

Using the property of the transpose, $\mathbf{e}^T F^T = (F\mathbf{e})^T$:

$$(F\mathbf{e})^T \mathbf{x}' = 0$$

This equation must hold for all possible image points $\mathbf{x}'$, which forces:

$$F\mathbf{e} = \mathbf{0}$$

This proves that $\mathbf{e}$ spans the right null-space of F.
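Both null-space relations can be checked numerically. With identity intrinsics the Fundamental matrix coincides with the Essential matrix $[\mathbf{t}]_\times R$; the pose below is a made-up example, and the epipoles are extracted as the singular vectors for the (numerically) zero singular value:

```python
import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

# Made-up two-view geometry; with K1 = K2 = I, F coincides with E
a = np.deg2rad(20.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([1.0, 0.2, 0.1])
F = skew(t) @ R

# Right null-vector (epipole e) and left null-vector (epipole e') via SVD
U, S, Vt = np.linalg.svd(F)
e = Vt[-1]          # right singular vector of the zero singular value
e_prime = U[:, -1]  # left singular vector of the same

print(np.allclose(F @ e, 0, atol=1e-12), np.allclose(F.T @ e_prime, 0, atol=1e-12))
# F has rank 2: its smallest singular value is (numerically) zero
```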
Q13: Lie Algebra for Robotics Optimization
Question
What are the Lie Group SE(3) and the Lie Algebra $\mathfrak{se}(3)$? Explain their role in optimization problems such as SLAM and Bundle Adjustment.
Solution
Lie theory provides the mathematical foundation for representing and optimizing rigid body motions in a way that correctly respects their geometric structure.
The Lie Group SE(3): The Space of Poses
SE(3) is the Special Euclidean Group in 3D. It is the set of all $4\times 4$ homogeneous transformation matrices of the form

$$T = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}, \qquad R \in SO(3),\ \mathbf{t} \in \mathbb{R}^3$$
It is called a group because it satisfies the group axioms (closure under multiplication, identity element, inverse for every element, associativity). More importantly, it is a smooth manifold. This means it is a space that locally looks like a standard Euclidean space, but has a global curved structure. This curvature is the reason why standard vector addition is not a valid operation for poses; the sum of two transformation matrices is not, in general, a valid transformation matrix.
The Lie Algebra $\mathfrak{se}(3)$: The Space of Velocities

The Lie Algebra $\mathfrak{se}(3)$ is the tangent space of SE(3) at the identity. It can be thought of as the space of instantaneous rigid-body velocities.

An element of $\mathfrak{se}(3)$ is represented by a 6D twist vector

$$\boldsymbol{\xi} = \begin{bmatrix} \boldsymbol{\rho} \\ \boldsymbol{\phi} \end{bmatrix} \in \mathbb{R}^6$$

where $\boldsymbol{\rho}$ is the translational component and $\boldsymbol{\phi}$ the rotational component.

This vector can be mapped to a $4\times 4$ matrix by the hat operator:

$$\boldsymbol{\xi}^\wedge = \begin{bmatrix} \boldsymbol{\phi}^\wedge & \boldsymbol{\rho} \\ \mathbf{0}^T & 0 \end{bmatrix}$$

where $\boldsymbol{\phi}^\wedge$ is the $3\times 3$ skew-symmetric matrix of $\boldsymbol{\phi}$.
Connecting the Spaces: Exponential and Logarithmic Maps
- The Exponential Map, $\exp: \mathfrak{se}(3) \to SE(3)$, is the function that converts an element of the Lie algebra (a twist vector $\boldsymbol{\xi}$) into a finite transformation on the Lie group (a pose matrix $T = \exp(\boldsymbol{\xi}^\wedge)$). Intuitively, it is equivalent to integrating a constant velocity for one unit of time to obtain a finite displacement.

- The Logarithmic Map, $\log: SE(3) \to \mathfrak{se}(3)$, is the inverse operation. It takes a transformation matrix $T$ and returns the unique (for rotation angles below $\pi$) twist vector that would generate that transformation in one unit of time.
Role in SLAM Optimization
In optimization problems like SLAM or Bundle Adjustment, we need to iteratively update an estimate of a camera pose, $T_{est}$. Poses cannot be updated by simple vector addition, because SE(3) is not a vector space; the Lie algebra provides the correct local parameterization:

- The optimization algorithm (e.g., Gauss-Newton) computes an update step. This update step, $\delta\boldsymbol{\xi} \in \mathbb{R}^6$, is calculated in the local tangent space, which is a well-behaved 6D Euclidean vector space.
- This small update vector is then mapped from the Lie algebra back onto the manifold using the exponential map to become a small transformation matrix, $\exp(\delta\boldsymbol{\xi}^\wedge)$.
- This small transformation is then applied multiplicatively to the current pose estimate to get the new, refined pose:

$$T_{new} = \exp(\delta\boldsymbol{\xi}^\wedge)\, T_{est}$$

This procedure guarantees that the updated pose $T_{new}$ remains a valid element of SE(3) at every iteration, with no re-orthogonalization of the rotation required.
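A minimal numpy sketch of the exponential map, using the standard closed form for SE(3) (twist ordered as translational part $\boldsymbol{\rho}$ then rotational part $\boldsymbol{\phi}$), checked against a truncated power series of the matrix exponential:

```python
import numpy as np

def hat3(v):
    """3x3 skew-symmetric matrix of a 3-vector."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def se3_exp(xi):
    """Closed-form exponential map; xi = (rho, phi), rho translational."""
    rho, phi = xi[:3], xi[3:]
    th = np.linalg.norm(phi)
    Phi = hat3(phi)
    if th < 1e-10:
        R, V = np.eye(3), np.eye(3)
    else:
        # Rodrigues' formula and the corresponding left Jacobian V
        R = np.eye(3) + np.sin(th) / th * Phi + (1 - np.cos(th)) / th**2 * Phi @ Phi
        V = (np.eye(3) + (1 - np.cos(th)) / th**2 * Phi
             + (th - np.sin(th)) / th**3 * Phi @ Phi)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ rho
    return T

# Check against a truncated power series of the matrix exponential
xi = np.array([0.1, -0.2, 0.3, 0.4, 0.5, -0.6])
A = np.zeros((4, 4))
A[:3, :3] = hat3(xi[3:])
A[:3, 3] = xi[:3]
series, term = np.eye(4), np.eye(4)
for k in range(1, 30):
    term = term @ A / k
    series = series + term
print(np.allclose(se3_exp(xi), series))  # True
```

Note that the rotation block of the result is a proper rotation matrix by construction, which is exactly the guarantee that makes multiplicative updates on SE(3) safe.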