Depth and Camera Pose
- Depth (d): The distance of a 3D point from the camera, measured along the optical axis (its z-coordinate in the camera frame).
- Camera Pose (T): The position and orientation of the camera in the world (usually represented as a 4×4 transformation matrix, combining rotation and translation).
Projection (π) and Backprojection (π⁻¹)
- Projection (π): Maps a 3D point (in camera coordinates) to 2D pixel coordinates.
- Backprojection (π⁻¹): Given a pixel (u, v) and its depth d, reconstructs the 3D point in camera coordinates.
Step 1: Backprojection (π⁻¹)
Given a pixel (u, v) in image Iᵢ and its depth d, we compute its 3D coordinates in the camera frame of Iᵢ:
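- For a pinhole camera:
$$X_i = \pi^{-1}(u, v, d) = d \begin{pmatrix} (u - c_x)/f_x \\ (v - c_y)/f_y \\ 1 \end{pmatrix}$$
where $f_x, f_y$ are the focal lengths and $(c_x, c_y)$ is the principal point.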
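A minimal NumPy sketch of pinhole backprojection, assuming depth is measured along the optical axis. The function name `backproject` and the sample values are illustrative, not taken from the text above.

```python
import numpy as np

def backproject(u, v, d, fx, fy, cx, cy):
    """Lift pixel (u, v) with depth d to a 3D point in the camera frame."""
    x = (u - cx) / fx * d   # horizontal offset from the principal point, scaled by depth
    y = (v - cy) / fy * d   # vertical offset from the principal point, scaled by depth
    return np.array([x, y, d], dtype=float)

# Hypothetical pixel 100 px to the right of the principal point, 2 m away
X_i = backproject(u=420.0, v=320.0, d=2.0, fx=500.0, fy=500.0, cx=320.0, cy=320.0)
print(X_i)
```

A pixel at the principal point always backprojects onto the optical axis, i.e. to (0, 0, d).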
Step 2: Transform from Frame i to Frame j (Tᵢⱼ)
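The 3D point $X_i$ from Step 1 is moved into the camera frame of Iⱼ using the relative pose:
$$T_{ij} = T_{jw}^{-1} \, T_{iw}$$
where:
- $T_{iw}$ = camera-to-world transform for frame i,
- $T_{jw}$ = camera-to-world transform for frame j,
- $T_{ij}$ = relative transform from frame i to frame j.
The transformed point, written in homogeneous coordinates, is:
$$\begin{pmatrix} X_j \\ 1 \end{pmatrix} = T_{ij} \begin{pmatrix} X_i \\ 1 \end{pmatrix}$$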
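The frame change can be sketched in NumPy as follows, assuming both poses are 4×4 camera-to-world matrices so that the relative transform is inv(T_jw) @ T_iw. The helper names `relative_transform` and `transform_point`, and the 0.5 m forward offset, are hypothetical.

```python
import numpy as np

def relative_transform(T_iw, T_jw):
    """Return T_ij = inv(T_jw) @ T_iw for 4x4 camera-to-world poses."""
    return np.linalg.inv(T_jw) @ T_iw

def transform_point(T, X):
    """Apply a 4x4 rigid transform to a 3D point via homogeneous coordinates."""
    X_h = np.append(X, 1.0)   # (x, y, z) -> (x, y, z, 1)
    return (T @ X_h)[:3]

T_iw = np.eye(4)                 # camera i at the world origin
T_jw = np.eye(4)
T_jw[:3, 3] = [0.0, 0.0, 0.5]    # assumption: camera j sits 0.5 m ahead of camera i

T_ij = relative_transform(T_iw, T_jw)
X_j = transform_point(T_ij, np.array([0.2, 0.0, 2.0]))
print(X_j)
```

Because camera j is closer to the point, its z-coordinate in frame j shrinks by the forward offset while x and y are unchanged.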
Step 3: Projection (π)
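Project the transformed point $X_j = (x_j, y_j, z_j)$ onto the image plane of Iⱼ to obtain the pixel $(u', v')$.
For a pinhole camera:
$$u' = f_x \frac{x_j}{z_j} + c_x, \qquad v' = f_y \frac{y_j}{z_j} + c_y$$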
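A short sketch of the pinhole projection in NumPy. The function name `project` and the sample point are illustrative; the check for points behind the camera is an added assumption, since the text does not say how such points should be handled.

```python
import numpy as np

def project(X, fx, fy, cx, cy):
    """Project a camera-frame 3D point to pixel coordinates (pinhole model)."""
    x, y, z = X
    if z <= 0:
        # Assumed policy: reject points at or behind the camera center
        raise ValueError("point is not in front of the camera (z <= 0)")
    return np.array([fx * x / z + cx, fy * y / z + cy])

uv = project(np.array([0.2, -0.1, 2.0]), fx=500.0, fy=500.0, cx=320.0, cy=320.0)
print(uv)
```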
Step 4: Compute Optical Flow
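The flow vector for pixel (u, v) is the displacement between its original location and its projected location $(u', v')$ from Step 3:
$$\text{flow}(u, v) = (u' - u, \; v' - v)$$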
Intuition
- Static Scene Assumption: Only the camera moves; the world is static.
- Flow Depends on Depth: When the camera translates, closer points (small d) induce larger flow (more motion in the image), while distant points (large d) induce smaller flow. Flow caused by pure rotation does not depend on depth.
- Flow Depends on Camera Motion: The direction and magnitude of flow depend on how the camera moves (rotation vs. translation).
Example
Given:
- Two images Iᵢ and Iⱼ.
- Camera intrinsics: fx = fy = 500, cx = cy = 320.
- Camera poses:
  - Tᵢw = identity (camera i is at world origin).
  - Tⱼw = translation of (0.1, 0, 0) (camera j moves right by 0.1 m).
- Pixel (u, v) = (320, 320) (assumed to be the center of the image).
- Depth d = 1m.
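Step 1: Backprojection
$$X_i = 1 \cdot \begin{pmatrix} (320 - 320)/500 \\ (320 - 320)/500 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$
Step 2: Transform to Frame j
Here $T_{ij} = T_{jw}^{-1} T_{iw}$ is a translation by (-0.1, 0, 0), so:
$$X_j = \begin{pmatrix} 0 - 0.1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -0.1 \\ 0 \\ 1 \end{pmatrix}$$
Step 3: Projection
$$u' = 500 \cdot \frac{-0.1}{1} + 320 = 270, \qquad v' = 500 \cdot \frac{0}{1} + 320 = 320$$
Step 4: Flow Vector
$$\text{flow} = (270 - 320, \; 320 - 320) = (-50, 0)$$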
Interpretation:
- The pixel moved left by 50 pixels because the camera moved right.
- If depth d were larger (e.g., 10m), the flow would be smaller (-5, 0).
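The worked example can be checked numerically with a small NumPy sketch that chains the four steps. It assumes camera-to-world poses and a pinhole camera; the function name `induced_flow` and the tuple layout of `K` are illustrative choices.

```python
import numpy as np

def induced_flow(u, v, d, K, T_iw, T_jw):
    """Flow of pixel (u, v) with depth d between frames i and j (static scene)."""
    fx, fy, cx, cy = K
    # Step 1: backproject into camera frame i (homogeneous coordinates)
    X_i = np.array([(u - cx) / fx * d, (v - cy) / fy * d, d, 1.0])
    # Step 2: move into camera frame j; both poses are camera-to-world
    X_j = np.linalg.inv(T_jw) @ T_iw @ X_i
    # Step 3: project into image j
    u2 = fx * X_j[0] / X_j[2] + cx
    v2 = fy * X_j[1] / X_j[2] + cy
    # Step 4: flow is the pixel displacement
    return np.array([u2 - u, v2 - v])

K = (500.0, 500.0, 320.0, 320.0)   # fx, fy, cx, cy from the example
T_iw = np.eye(4)                   # camera i at the world origin
T_jw = np.eye(4)
T_jw[0, 3] = 0.1                   # camera j moved right by 0.1 m

flow_near = induced_flow(320.0, 320.0, 1.0, K, T_iw, T_jw)    # depth 1 m
flow_far = induced_flow(320.0, 320.0, 10.0, K, T_iw, T_jw)    # depth 10 m
print(flow_near, flow_far)
```

Up to floating-point rounding, the two results are (-50, 0) and (-5, 0), matching the values in the interpretation.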
Dense Flow Field
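Instead of computing flow for one pixel, we compute it for every pixel (u, v) of an h × w grid, given a dense depth map d. This gives a flow field:
$$F(u, v) = \pi\big(T_{ij} \, \pi^{-1}(u, v, d(u, v))\big) - (u, v), \qquad F \in \mathbb{R}^{h \times w \times 2}$$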
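A vectorized NumPy sketch of the dense case, assuming a pinhole camera, a floating-point depth map with strictly positive values, and a precomputed 4×4 relative pose. The name `dense_flow` is hypothetical, and pixels that end up behind camera j are not masked here.

```python
import numpy as np

def dense_flow(depth, K, T_ij):
    """Flow field of shape (h, w, 2) induced by a depth map and relative pose T_ij."""
    fx, fy, cx, cy = K
    h, w = depth.shape
    # Pixel grid: u indexes columns, v indexes rows
    u, v = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))
    # Backproject every pixel to a homogeneous 3D point in frame i: shape (h, w, 4)
    X_i = np.stack(
        [(u - cx) / fx * depth, (v - cy) / fy * depth, depth, np.ones_like(depth)],
        axis=-1,
    )
    # Transform all points into frame j (row-vector form of T_ij @ X)
    X_j = X_i @ T_ij.T
    # Project into image j
    u2 = fx * X_j[..., 0] / X_j[..., 2] + cx
    v2 = fy * X_j[..., 1] / X_j[..., 2] + cy
    return np.stack([u2 - u, v2 - v], axis=-1)

K = (500.0, 500.0, 320.0, 320.0)
T_ij = np.eye(4)
T_ij[0, 3] = -0.1                 # frame j lies 0.1 m to the right of frame i
depth = np.full((4, 6), 2.0)      # toy depth map: flat wall 2 m away
flow = dense_flow(depth, K, T_ij)
print(flow.shape)
```

For this fronto-parallel wall and sideways translation, every pixel gets the same horizontal flow of -500 · 0.1 / 2 = -25 pixels.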
Why is this useful?
- For visual odometry: Estimate camera motion by comparing predicted flow (from depth and pose) with observed flow.
- For depth estimation: If camera motion is known, we can estimate depth from flow.
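As a small illustration of the depth-estimation use case, consider the special case of a camera that translates by tx along its own x-axis without rotating. The pipeline above then reduces to flow_x = -fx · tx / d, which can be inverted for depth. The sketch below assumes exactly that setup; the function name is hypothetical, and general camera motion needs a full triangulation instead.

```python
import numpy as np

def depth_from_horizontal_flow(flow_x, fx, tx):
    """Depth for a camera translated by tx along its x-axis, with no rotation.

    Inverts flow_x = -fx * tx / d. Pixels with zero flow have no finite depth
    under this model, so callers should mask them beforehand.
    """
    flow_x = np.asarray(flow_x, dtype=float)
    return -fx * tx / flow_x

# Horizontal flows from the worked example (camera moved right by 0.1 m)
depths = depth_from_horizontal_flow([-50.0, -5.0], fx=500.0, tx=0.1)
print(depths)
```

The recovered depths are 1 m and 10 m, the two depths used in the example.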