Thank you for sharing the code. I have a conceptual question: how can I manually compute the gradients of a loss function in NumPy (i.e., without automatic differentiation such as PyTorch's `requires_grad`) and then optimize the parameters with gradient descent?
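To make the question concrete, here is a minimal sketch of what I mean, using a hypothetical linear-regression example with an MSE loss where the gradients are derived by hand (the model, data, and learning rate are placeholders, not from the shared code):

```python
import numpy as np

# Hypothetical toy data: y = 3x + 2 plus a little noise
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + 2.0 + 0.1 * rng.normal(size=100)

# Parameters of the model y_hat = w * x + b
w, b = 0.0, 0.0
lr = 0.1  # learning rate (arbitrary choice for this sketch)

for step in range(500):
    y_hat = w * x + b
    err = y_hat - y
    # MSE loss L = mean(err^2); gradients derived by hand:
    #   dL/dw = 2 * mean(err * x)
    #   dL/db = 2 * mean(err)
    grad_w = 2.0 * np.mean(err * x)
    grad_b = 2.0 * np.mean(err)
    # Gradient-descent update
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # should approach the true values 3 and 2
```

Is this hand-derivation-plus-update-loop pattern the standard way to do it, or is there a more idiomatic NumPy approach for more complex losses?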