Why activations matter

Two networks see the same noisy rings. The only difference: one of them has a hidden layer with a ReLU activation.

Panel 1, Linear only: can only draw a straight line (accuracy readout).
Panel 2, 1 hidden layer + ReLU: can bend around the ring (accuracy readout).
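The comparison can be reproduced offline with a minimal NumPy sketch. Everything specific here is an assumption rather than the demo's actual setup: the ring radii (0.5 and 1.5), the noise level, the hidden width of 16, the learning rate, the step count, and the helper names `make_rings` and `train_and_score`. Accuracy is measured on the training points themselves.

```python
import numpy as np

def make_rings(n_per_class=200, noise=0.1, seed=0):
    """Two noisy concentric rings (assumed radii): class 0 at 0.5, class 1 at 1.5."""
    rng = np.random.default_rng(seed)
    radii = np.repeat([0.5, 1.5], n_per_class) + rng.normal(0.0, noise, 2 * n_per_class)
    angles = rng.uniform(0.0, 2.0 * np.pi, 2 * n_per_class)
    X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
    y = np.repeat([0.0, 1.0], n_per_class)
    return X, y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_and_score(X, y, hidden=0, steps=5000, lr=0.1, seed=0):
    """Full-batch gradient descent on cross-entropy; returns training accuracy.

    hidden=0 trains a purely linear classifier;
    hidden>0 inserts one hidden ReLU layer of that width.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if hidden > 0:
        W1 = rng.normal(0.0, 1.0, (d, hidden))
        b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, hidden if hidden > 0 else d)
    b2 = 0.0

    for _ in range(steps):
        if hidden > 0:
            pre = X @ W1 + b1
            H = np.maximum(pre, 0.0)          # ReLU
        else:
            H = X                             # no hidden layer: features are the inputs
        g = (sigmoid(H @ w2 + b2) - y) / n    # gradient of mean loss w.r.t. logits
        if hidden > 0:
            g_pre = np.outer(g, w2) * (pre > 0)   # backprop through ReLU
            W1 -= lr * (X.T @ g_pre)
            b1 -= lr * g_pre.sum(axis=0)
        w2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()

    H = np.maximum(X @ W1 + b1, 0.0) if hidden > 0 else X
    pred = (H @ w2 + b2) > 0
    return float(np.mean(pred == (y > 0.5)))

X, y = make_rings()
linear_acc = train_and_score(X, y, hidden=0)
relu_acc = train_and_score(X, y, hidden=16)
print(f"linear only:           accuracy = {linear_acc:.2f}")
print(f"1 hidden layer + ReLU: accuracy = {relu_acc:.2f}")
```

The linear model's decision boundary is a single straight line, so on concentric rings it cannot separate the classes no matter how long it trains. The hidden ReLU units each contribute one hinge, and together they can form a closed, roughly polygonal boundary between the rings.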