Image recognition is just classification, great idea for a next post on how classification works :) But in a nutshell: pixels are numbers, they flow forward through the neural network, and the last layer converts the outputs to probabilities. The loss function then compares those predicted probabilities with the true ones. Let's say we have 2 categories, cat/dog, and the output is cat 0.6, dog 0.4. If the image really is a cat, the probabilities should have been cat 1.0, dog 0.0, and the loss measures how far the prediction is from that. As simple as that :)
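To make the cat/dog example concrete, here is a minimal sketch in plain Python. It assumes the last layer is a softmax and the loss is cross-entropy, which is a common pairing for classifiers but not something stated above. The raw scores (`logits`) are made up and chosen only so that the probabilities come out as 0.6 / 0.4.

```python
import math

def softmax(logits):
    """Turn raw network outputs into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    """Compare predicted probabilities with the true (one-hot) ones."""
    return -sum(t * math.log(p) for p, t in zip(probs, target) if t > 0)

labels = ["cat", "dog"]

# Hypothetical raw scores from the last layer (assumption for illustration).
logits = [math.log(1.5), 0.0]

probs = softmax(logits)   # approximately cat 0.6, dog 0.4
target = [1.0, 0.0]       # the image really is a cat

loss = cross_entropy(probs, target)

print(dict(zip(labels, (round(p, 2) for p in probs))))
print(round(loss, 4))
```

With these numbers the loss is -ln(0.6), roughly 0.51. A perfect prediction (cat 1.0, dog 0.0) would give a loss of 0, and training nudges the weights to push the loss in that direction.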