Camera Intrinsics and Extrinsics

Created: December 20, 2023

Modified: December 26, 2024

Intrinsics Matrix

A camera's intrinsics matrix K projects points from the camera's 3D reference space onto the image frame. Its inverse K^{-1} maps a pixel back to a ray in 3D space; recovering the exact 3D point additionally requires knowing its depth.

K = \begin{bmatrix} f_x & 0 & o_x \\ 0 & f_y & o_y \\ 0 & 0 & 1 \end{bmatrix}

where f_x and f_y represent the camera's focal length in pixels^[If pixels as a unit of focal length seems unusual, the next section addresses this directly.], and o_x and o_y are the offsets in pixels from the top-left corner of the image to the principal point, which typically lies at or near the center of the image frame.

Focal Length

An astute observer may realize that what we've called "focal length" has an x and y component. This may seem strange given that focal length f as commonly understood is a scalar value in millimeters representing the distance from the center of a camera's lens to the focal point F. However, the model that we're using generalizes to non-square pixels. If we say that s_x and s_y are the number of pixels per millimeter of a physical camera sensor in their respective dimensions, then f_x = f \cdot s_x and f_y = f \cdot s_y.
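As a quick illustration of the relationship above, here is a minimal sketch that converts a focal length in millimeters to f_x and f_y. The function name and all numeric values (a 4 mm lens, an 8 mm by 6 mm sensor, an 800 by 480 image) are made up for the example and do not describe any particular camera. It also assumes the pixel densities s_x and s_y can be taken as image resolution divided by physical sensor size.

```python
def focal_length_in_pixels(f_mm, sensor_w_mm, sensor_h_mm, image_w_px, image_h_px):
    """Return (f_x, f_y) in pixels for a focal length given in millimeters."""
    # Pixels per millimeter along each sensor dimension (assumed uniform).
    s_x = image_w_px / sensor_w_mm
    s_y = image_h_px / sensor_h_mm
    # f_x = f * s_x and f_y = f * s_y, as in the text.
    return f_mm * s_x, f_mm * s_y


# Hypothetical camera with non-square pixels: 100 px/mm wide, 80 px/mm tall.
f_x, f_y = focal_length_in_pixels(4.0, 8.0, 6.0, 800, 480)
print(f_x, f_y)
```

With square pixels, s_x and s_y are equal and the two focal lengths coincide.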

Transforming from \mathbb{R}^3 to \mathbb{R}^2

For a point p where

p = \begin{bmatrix} x \\ y \\ z \end{bmatrix}

we can get a point p' by multiplying by K.

p' = K p

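To get the image (u,v) coordinate we care about, we have to normalize p' by dividing it by its third component p'_3, which equals the depth z because the last row of K is (0, 0, 1).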

\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \frac{p'}{p'_3} = \begin{bmatrix} p'_1\ /\ p'_3 \\ p'_2\ /\ p'_3 \\ p'_3\ /\ p'_3 \end{bmatrix}

In practice, you may have to round or truncate p'_1\ /\ p'_3 and p'_2\ /\ p'_3 to get integer pixel indices; which of the two is appropriate depends on the pixel-center convention in use.
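The projection steps above can be sketched in a few lines of NumPy. The function name `project`, the intrinsics values, and the sample point are all hypothetical, and the sketch assumes an ideal pinhole camera with no lens distortion. Flooring to get a pixel index is one possible convention, not the only one.

```python
import numpy as np


def project(K, p):
    """Project a 3D point p (camera frame) to continuous (u, v) pixel coordinates."""
    p_prime = K @ p  # p' = K p
    if p_prime[2] <= 0:
        # Points at or behind the camera plane have no valid projection.
        raise ValueError("point must have positive depth")
    # Divide by the third component to obtain (u, v).
    return p_prime[:2] / p_prime[2]


# Hypothetical intrinsics: f_x = f_y = 500 px, principal point at (320, 240).
K = np.array([
    [500.0, 0.0, 320.0],
    [0.0, 500.0, 240.0],
    [0.0, 0.0, 1.0],
])

p = np.array([0.5, -0.25, 2.0])  # example point, 2 units in front of the camera
uv = project(K, p)

# One possible convention for integer pixel indices: floor the coordinates.
pixel = np.floor(uv).astype(int)
print(uv, pixel)
```

A point on the optical axis, such as (0, 0, z) for any positive z, lands exactly on (o_x, o_y).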

Transforming from \mathbb{R}^2 to \mathbb{R}^3

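If we know K^{-1}, we can map from pixel space back to 3D space, but only up to scale: projection discards depth, so a pixel (u,v) corresponds to a whole ray of 3D points. Multiplying by K^{-1} yields the point on that ray with unit depth, and scaling by the depth z, when it is known (for example, from a depth sensor), recovers the point in the camera's 3D reference space.

z K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}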
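Here is a minimal NumPy sketch of the reverse mapping. Since dividing by p'_3 during projection throws the depth away, the sketch assumes the depth z of the pixel is supplied separately. The function name `back_project` and all values are hypothetical, and an ideal pinhole camera without distortion is assumed.

```python
import numpy as np


def back_project(K, u, v, depth):
    """Recover the camera-frame 3D point seen at pixel (u, v) with known depth."""
    # K^{-1} [u, v, 1]^T is the point on the pixel's ray with z = 1.
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Scaling by the depth moves along the ray to the actual point.
    return depth * ray


# Hypothetical intrinsics: f_x = f_y = 500 px, principal point at (320, 240).
K = np.array([
    [500.0, 0.0, 320.0],
    [0.0, 500.0, 240.0],
    [0.0, 0.0, 1.0],
])

point = back_project(K, 445.0, 177.5, 2.0)
print(point)

# Round trip: projecting the recovered point should give back the pixel.
reprojected = K @ point
reprojected = reprojected[:2] / reprojected[2]
```

Calling `back_project` with a depth of 1 returns the unit-depth point on the ray, which is useful when only the viewing direction is needed.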

Extrinsics Matrix

TODO