The names of the matrices don’t matter; you can call them O = W x I for Output = Weights x Input if that works better for you. But many neural network layers are made of several matrix operations. Take the LSTM layer (https://en.wikipedia.org/wiki/Long_short-term_memory), for instance: there are around 12 matrix operations in a single layer. What’s important to know is that backpropagation is performed at each operation: the error at the output of each operation is backpropagated, using the Jacobian method, to all the matrices that participated in creating that output. So if O = W x I + B (where B is the bias matrix), once you have the error at O, you backpropagate it to W, I and B. After that, if any of these matrices is itself the result of an operation on other matrices (say I = C x D), you recursively backpropagate the error from I to C and D. Hope it’s clearer now, Cheers!
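To make the recursion concrete, here is a minimal numpy sketch of that chain. It assumes (for illustration only) a scalar loss L = sum(O), so the error arriving at O is a matrix of ones; the shapes and variable names are arbitrary choices, not anything prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward pass: I is itself the result of an operation, I = C x D
C = rng.standard_normal((4, 3))
D = rng.standard_normal((3, 5))
I = C @ D                      # shape (4, 5)

W = rng.standard_normal((2, 4))
B = rng.standard_normal((2, 5))
O = W @ I + B                  # shape (2, 5)

# Assume the loss is L = sum(O); the error at O is then all ones
dO = np.ones_like(O)

# Backpropagate through O = W x I + B: each participant gets a gradient
dW = dO @ I.T                  # error routed to W
dI = W.T @ dO                  # error routed to I
dB = dO                        # error routed to B (bias: identity Jacobian)

# I came from I = C x D, so recurse one level further
dC = dI @ D.T
dD = C.T @ dI
```

A quick finite-difference check (nudge one entry of W and watch how L moves) confirms that dW matches the numerical gradient, which is a handy way to sanity-check any hand-derived backward pass.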

Interested in artificial intelligence, machine learning, neural networks, data science, blockchain, technology, astronomy. Co-founder of Datathings, Luxembourg