Depth without nonlinearity

Depth without nonlinearity is one matrix

Three networks on the same rings. Linear depth adds nothing; ReLU depth does.

Noise

0.15

Steps

1 linear layer

draws one straight line

accuracy—

5 linear layers

still just a straight line

accuracy—

5 layers + ReLU

a kink between each layer

accuracy—

Proof: W₅·W₄·W₃·W₂·W₁ = one 2→1 matrix

Multiplying the five weight matrices gives a single affine map. Running every point through it reproduces the 5-layer model exactly, because it is the same function.

collapsed weights (w₁, w₂ · bias)

—

max |logit difference|

—

Warming up…