Depth and Camera Pose
- Depth (d): The distance of a 3D point from the camera, i.e. its Z coordinate in the camera frame.
 - Camera Pose (T): The position and orientation of the camera in the world, usually represented as a 4×4 transformation matrix combining rotation and translation (see the sketch below).
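For concreteness, here is a minimal numpy sketch of how such a pose matrix might be assembled; the helper name make_pose and the example translation are illustrative, not taken from the text.

```python
import numpy as np

def make_pose(R, t):
    """Assemble a 4x4 camera-to-world transform from a 3x3 rotation R
    and a 3-vector translation t (illustrative helper)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Example: a camera 0.1 m to the right of the world origin, no rotation.
T_jw = make_pose(np.eye(3), np.array([0.1, 0.0, 0.0]))
```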
 
Projection (π) and Backprojection (π⁻¹)
- Projection (π): Maps a 3D point (in camera coordinates) to 2D pixel coordinates.
 - Backprojection (π⁻¹): Given a pixel (u, v) and its depth d, reconstructs the 3D point in camera coordinates.
 
Step 1: Backprojection (π⁻¹)
Given a pixel (u, v) in image Iᵢ and its depth d, we compute its 3D coordinates in the camera frame of Iᵢ:
- For a pinhole camera:
  X = (u - cx) · d / fx,  Y = (v - cy) · d / fy,  Z = d
  where fx, fy are the focal lengths and (cx, cy) is the principal point.
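A minimal numpy sketch of this step, assuming the intrinsics fx, fy, cx, cy above; the function name backproject is illustrative.

```python
import numpy as np

def backproject(u, v, d, fx, fy, cx, cy):
    """Lift pixel (u, v) with depth d to a 3D point in the camera
    frame of image I_i, using the pinhole model above."""
    X = (u - cx) * d / fx
    Y = (v - cy) * d / fy
    Z = d
    return np.array([X, Y, Z])
```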
Step 2: Transform from Frame i to Frame j (Tᵢⱼ)
The 3D point Pᵢ = (X, Y, Z) is mapped into the camera frame of image Iⱼ using the relative pose:
Tᵢⱼ = Tⱼw⁻¹ · Tᵢw
where:
- Tᵢw = camera-to-world transform for frame i,
 - Tⱼw = camera-to-world transform for frame j,
 - Tᵢⱼ = relative transform from frame i to frame j.
The transformed point (in homogeneous coordinates) is:
Pⱼ = Tᵢⱼ · Pᵢ
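A sketch of this step under the same conventions (camera-to-world poses stored as 4×4 numpy matrices); the name transform_to_frame_j is illustrative.

```python
import numpy as np

def transform_to_frame_j(P_i, T_iw, T_jw):
    """Map a 3D point from camera frame i to camera frame j, given the
    camera-to-world poses of both frames as 4x4 matrices."""
    T_ij = np.linalg.inv(T_jw) @ T_iw      # relative transform i -> j
    P_i_h = np.append(P_i, 1.0)            # homogeneous coordinates
    return (T_ij @ P_i_h)[:3]
```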
Step 3: Projection (π)
Project the transformed point Pⱼ = (X', Y', Z') into image Iⱼ.
For a pinhole camera:
u' = fx · X' / Z' + cx,  v' = fy · Y' / Z' + cy
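A matching numpy sketch, again assuming the same intrinsics; the name project is illustrative.

```python
import numpy as np

def project(P, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates to pixel coordinates
    with the pinhole model above."""
    X, Y, Z = P
    u = fx * X / Z + cx
    v = fy * Y / Z + cy
    return np.array([u, v])
```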
Step 4: Compute Optical Flow
The flow vector for pixel (u, v) is the displacement between the projected and the original pixel coordinates:
F(u, v) = (u' - u, v' - v)
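Chaining the three sketches above gives the flow for a single pixel; this snippet assumes the illustrative helpers backproject, transform_to_frame_j, and project defined earlier.

```python
import numpy as np

def flow_for_pixel(u, v, d, T_iw, T_jw, fx, fy, cx, cy):
    """Backproject, transform to frame j, project, then subtract the
    original pixel coordinates to get the flow vector."""
    P_i = backproject(u, v, d, fx, fy, cx, cy)
    P_j = transform_to_frame_j(P_i, T_iw, T_jw)
    u2, v2 = project(P_j, fx, fy, cx, cy)
    return np.array([u2 - u, v2 - v])
```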
Intuition
- Static Scene Assumption: Only the camera moves; the world is static.
 - Flow Depends on Depth: Closer points (small d) induce larger flow (more motion in the image), while distant points (large d) induce smaller flow (see the quick check after this list).
 - Flow Depends on Camera Motion: The direction and magnitude of flow depend on how the camera moves (rotation vs. translation).
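As a quick check of the depth dependence: for a pure sideways translation tx and no rotation, the steps above reduce to Δu = -fx · tx / d and Δv = 0, so the flow shrinks as the depth grows. The values fx = 500 and tx = 0.1 below are simply the ones used in the example that follows.

```python
fx, tx = 500.0, 0.1
for d in (0.5, 1.0, 2.0, 10.0):
    print(d, -fx * tx / d)   # flow halves as depth doubles: -100, -50, -25, -5
```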
 
Example
Given:
- Two images Iᵢ and Iⱼ of a static scene.
 - Camera intrinsics: fx = fy = 500, cx = cy = 320.
 - Camera poses:
- Tᵢw = identity (camera i is at world origin).
 - Tⱼw = translation of (0.1, 0, 0) (camera j moves right by 0.1m).
 
 - Pixel (u, v) = (320, 320) (the principal point, assumed here to be the image center).
 - Depth d = 1m.
 
Step 1: Backprojection
Pᵢ = ((320 - 320) · 1 / 500, (320 - 320) · 1 / 500, 1) = (0, 0, 1)
Step 2: Transform to Frame j
Pⱼ = Tⱼw⁻¹ · Tᵢw · Pᵢ = (0 - 0.1, 0, 1) = (-0.1, 0, 1)
Step 3: Projection
u' = 500 · (-0.1) / 1 + 320 = 270,  v' = 500 · 0 / 1 + 320 = 320
Step 4: Flow Vector
F(320, 320) = (270 - 320, 320 - 320) = (-50, 0)
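These numbers can be reproduced with the illustrative helpers sketched in Steps 1-4 (the snippet assumes flow_for_pixel and the functions it calls are already defined):

```python
import numpy as np

fx = fy = 500.0
cx = cy = 320.0
T_iw = np.eye(4)                      # camera i at the world origin
T_jw = np.eye(4)
T_jw[:3, 3] = [0.1, 0.0, 0.0]         # camera j shifted 0.1 m to the right

print(flow_for_pixel(320.0, 320.0, 1.0, T_iw, T_jw, fx, fy, cx, cy))   # [-50.   0.]
```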
Interpretation:
- The pixel moved left by 50 pixels because the camera moved right.
 - If depth d were larger (e.g., 10m), the flow would be smaller (-5, 0).
 
Dense Flow Field
Instead of computing flow for one pixel, we compute it for all pixels (u, v) in an h × w grid, given a dense depth map d. This gives a flow field F of shape (h, w, 2): one 2D flow vector per pixel.
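A vectorized sketch of the dense version, assuming a depth map of shape (h, w) and the same pinhole intrinsics; the name dense_flow is illustrative.

```python
import numpy as np

def dense_flow(depth, T_iw, T_jw, fx, fy, cx, cy):
    """Compute an (h, w, 2) flow field induced by camera motion,
    given a dense depth map for image I_i."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))        # pixel grid

    # Step 1: backproject every pixel into camera frame i.
    X = (u - cx) * depth / fx
    Y = (v - cy) * depth / fy
    P_i = np.stack([X, Y, depth, np.ones_like(depth)], axis=-1)   # (h, w, 4)

    # Step 2: transform all points into camera frame j.
    T_ij = np.linalg.inv(T_jw) @ T_iw
    P_j = P_i @ T_ij.T                                    # (h, w, 4)

    # Steps 3-4: project into image j and subtract the pixel grid.
    u2 = fx * P_j[..., 0] / P_j[..., 2] + cx
    v2 = fy * P_j[..., 1] / P_j[..., 2] + cy
    return np.stack([u2 - u, v2 - v], axis=-1)            # (h, w, 2)
```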
Why is this useful?
- For visual odometry: Estimate camera motion by comparing predicted flow (from depth and pose) with observed flow.
 - For depth estimation: If camera motion is known, we can estimate depth from flow.