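The [line](https://github.com/adeveloperdiary/blog/blob/6442b3af0d462608c5c7c8e6533e1fa920fe559a/Backpropagation_Algorithm_using_Softmax/main.py#L91) that calculates the cost of the Softmax output is not quite correct. `A` has shape `(n_categories, batch_size)`, and the one-hot labels `Y` have shape `(batch_size, n_categories)` (which is why `A` is transposed). The cross-entropy cost should therefore sum over the categories for each example and then average over the batch:

$$J = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K} y_k^{(i)} \log a_k^{(i)}$$

where $m$ is the batch size and $K$ is the number of categories. So I think the correct version should be `np.mean(-np.sum(Y * np.log(A.T), axis=1))`.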
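Below is a minimal sketch that checks this with random data. It assumes `A` is `(n_categories, batch_size)` and the one-hot `Y` is `(batch_size, n_categories)`, which is the layout that `A.T` implies. The names `cross_entropy_cost`, `eps`, `n_categories` and `batch_size` are hypothetical and used only for illustration. The sketch compares the per-example-sum-then-mean cost against a plain mean over every entry, and against picking out the log-probability of the true class directly:

```python
import numpy as np

def softmax(Z):
    # Column-wise softmax: Z has shape (n_categories, batch_size).
    expZ = np.exp(Z - Z.max(axis=0, keepdims=True))
    return expZ / expZ.sum(axis=0, keepdims=True)

def cross_entropy_cost(Y, A, eps=1e-8):
    # Y: one-hot labels, shape (batch_size, n_categories)
    # A: softmax output, shape (n_categories, batch_size)
    # eps (an assumption here) guards against log(0).
    # Sum over categories (axis=1 after transposing A), then average over the batch.
    return np.mean(-np.sum(Y * np.log(A.T + eps), axis=1))

rng = np.random.default_rng(0)
n_categories, batch_size = 10, 4
A = softmax(rng.normal(size=(n_categories, batch_size)))
labels = rng.integers(0, n_categories, size=batch_size)
Y = np.eye(n_categories)[labels]  # shape (batch_size, n_categories)

cost = cross_entropy_cost(Y, A)

# A plain mean over every entry divides by batch_size * n_categories
# instead of batch_size, so it is smaller by a factor of n_categories.
mean_over_all = -np.mean(Y * np.log(A.T + 1e-8))

# Same cost computed by indexing the true-class probability of each example.
direct = -np.mean(np.log(A[labels, np.arange(batch_size)] + 1e-8))

print(np.isclose(cost, mean_over_all * n_categories))  # True
print(np.isclose(cost, direct))  # True
```

Summing over `axis=1` works here because `Y` and `A.T` are both `(batch_size, n_categories)`. If `Y` were stored as `(n_categories, batch_size)` like `A`, the equivalent expression would be `np.mean(-np.sum(Y * np.log(A), axis=0))`, with no transpose.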