Created: December 20, 2023
Modified: December 26, 2024
A camera's intrinsics matrix K projects points from the camera's 3D reference space onto the image frame. Its inverse K^{-1} can map pixel coordinates back into 3D space, though only up to an unknown depth.
K = \begin{bmatrix} f_x & 0 & o_x \\ 0 & f_y & o_y \\ 0 & 0 & 1 \end{bmatrix}
where f_x and f_y represent the camera's focal length in pixels^[If pixels as a unit of focal length seems unusual, we'll address this shortly.], and o_x and o_y are the offsets in pixels from the top-left corner of the image to the image center (the principal point).
An astute observer may realize that what we've called "focal length" has an x and y component. This may seem strange given that focal length f as commonly understood is a scalar value in millimeters representing the distance from the center of a camera's lens to the focal point F. However, the model that we're using generalizes to non-square pixels. If we say that s_x and s_y are the number of pixels per millimeter of a physical camera sensor in their respective dimensions, then f_x = f \cdot s_x and f_y = f \cdot s_y.
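As a quick sketch of the pixels-per-millimeter relationship (the sensor numbers here are made up for illustration):

```python
# Hypothetical sensor: a 50 mm lens with 200 px/mm horizontally
# and 100 px/mm vertically, i.e. non-square pixels.
f = 50.0      # focal length in mm
s_x = 200.0   # pixels per mm of sensor, x direction
s_y = 100.0   # pixels per mm of sensor, y direction

f_x = f * s_x  # focal length in pixels, x direction
f_y = f * s_y  # focal length in pixels, y direction
```

With square pixels (s_x = s_y), the two focal lengths coincide and the familiar scalar f suffices.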
For a point p where
p = \begin{bmatrix} x \\ y \\ z \end{bmatrix}
we can get a point p' by multiplying by K.
p' = K p
To get the image (u,v) coordinate we care about, we divide p' by its third component (the perspective divide).
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \frac{p'}{p'_3} = \begin{bmatrix} p'_1\ /\ p'_3 \\ p'_2\ /\ p'_3 \\ p'_3\ /\ p'_3 \end{bmatrix}
In practice, you may have to round or truncate p'_1\ /\ p'_3 and p'_2\ /\ p'_3 to get integer pixel values.
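The full projection pipeline can be sketched in a few lines of numpy; the focal lengths, offsets, and test point below are made-up values for illustration:

```python
import numpy as np

# Illustrative intrinsics matrix K.
K = np.array([
    [1000.0,    0.0, 320.0],   # f_x,  0,   o_x
    [   0.0, 1000.0, 240.0],   #  0,  f_y,  o_y
    [   0.0,    0.0,   1.0],
])

p = np.array([0.5, -0.25, 2.0])  # point in the camera's 3D space (x, y, z)

p_prime = K @ p                  # p' = K p
uv1 = p_prime / p_prime[2]       # perspective divide by p'_3
u, v = int(uv1[0]), int(uv1[1])  # truncate to integer pixel coordinates
```

Note that p'_3 is just z, so the divide is what bakes perspective (farther points project closer to the image center) into the model.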
If we know K^{-1}, we can map from pixel space back toward 3D space. Note, however, that the perspective divide discarded depth: a pixel (u,v) corresponds to a ray of points, not a single point, so multiplying by K^{-1} recovers the point only up to scale. If the depth z is known, the original point is

z \cdot K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}
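A sketch of the back-projection (intrinsics, pixel, and depth values are made up for illustration); since projection divides by z, recovering the full 3D point requires knowing the depth:

```python
import numpy as np

# Illustrative intrinsics matrix K and its inverse.
K = np.array([
    [1000.0,    0.0, 320.0],
    [   0.0, 1000.0, 240.0],
    [   0.0,    0.0,   1.0],
])
K_inv = np.linalg.inv(K)

u, v = 570.0, 115.0   # pixel coordinates
z = 2.0               # known depth; without it we only recover a ray

ray = K_inv @ np.array([u, v, 1.0])  # point at depth 1: (x/z, y/z, 1)
point = z * ray                      # scale by depth to recover (x, y, z)
```

In practice the depth comes from elsewhere, e.g. a depth sensor or stereo matching; K^{-1} alone only tells you which ray the pixel lies on.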
TODO