3/24/2023

Cross entropy loss

Cross-entropy loss is used for classification machine learning models. In PyTorch it is constructed as criterion = nn.CrossEntropyLoss(). Often, as the machine learning model is being trained, the average value of this loss is printed on the screen. But it is not always obvious how well the model is doing from looking at this value. The formula of cross entropy in Python is

    def cross_entropy(p):
        return -np.log(p)

[Figure: cross-entropy loss at different probabilities for the correct class]

I think that it's important to understand softmax and cross-entropy, at least from a practical point of view. Once you have a grasp on these two concepts, it should be clear how they may be "correctly" used in the context of ML.

Cross Entropy H(p, q)

Cross-entropy is a function that compares two probability distributions. From a practical standpoint it's probably not worth getting into the formal motivation of cross-entropy, though if you're interested I would recommend Elements of Information Theory by Cover and Thomas as an introductory text. This concept is introduced pretty early on (chapter 2, I believe). This is the intro text I used in grad school and I thought it did a very good job (granted, I had a wonderful instructor as well).

The key thing to pay attention to is that cross-entropy is a function that takes, as input, two probability distributions, q and p, and returns a value that is minimal when q and p are equal. Here q represents an estimated distribution and p represents a true distribution. In the context of ML classification we know the actual label of the training data, so the true/target distribution, p, has a probability of 1 for the true label and 0 elsewhere, i.e. it is one-hot. On the other hand, the estimated distribution (the output of a model), q, generally contains some uncertainty, so the probability of any class in q will be between 0 and 1. By training a system to minimize cross-entropy, we are telling the system that we want it to make the estimated distribution as close as it can to the true distribution. Therefore, the class that your model thinks is most likely is the class corresponding to the highest value in q.

Softmax

Again, there are some complicated statistical ways to interpret softmax that we won't discuss here. The key thing from a practical standpoint is that softmax is a function that takes a list of unbounded values as input and outputs a valid probability mass function with the relative ordering maintained. It's important to stress the second point about relative ordering: it implies that the maximum element in the input to softmax corresponds to the maximum element in the output of softmax.

Consider a softmax-activated model trained to minimize cross-entropy. In this case, prior to softmax, the model's goal is to produce the highest value possible for the correct label and the lowest values possible for the incorrect labels.

The definition of CrossEntropyLoss in PyTorch is a combination of softmax and cross-entropy. Specifically:

    CrossEntropyLoss(x, y) := H(one_hot(y), softmax(x))

Note that one_hot is a function that takes an index y and expands it into a one-hot vector. Equivalently, you can formulate CrossEntropyLoss as a combination of LogSoftmax and negative log-likelihood loss (i.e. NLLLoss in PyTorch):

    CrossEntropyLoss(x, y) := NLLLoss(LogSoftmax(x), y)

Due to the exponentiation in softmax, there are some computational "tricks" that make directly using CrossEntropyLoss more stable (more accurate, less likely to produce NaN) than computing it in stages.

Conclusion

Based on the above discussion, the answers to your questions are:

1. How to train a "standard" classification network in the best way?

As described above: use a final linear layer (no softmax activation) and train with CrossEntropyLoss.

2. If the network has a final linear layer, how to infer the probabilities per class?

Apply softmax to the output of the network to infer the probabilities per class. If the goal is just to find the relative ordering or the highest probability class, then just apply argsort or argmax to the output directly (since softmax maintains relative ordering).

3. If the network has a final softmax layer, how to train the network (which loss, and how)?

Generally, you don't want to train a network that produces softmaxed outputs, for the stability reasons mentioned above. That said, if you absolutely needed to for some reason, you would take the log of the outputs and provide them to NLLLoss:

    criterion = nn.NLLLoss()
    ...
    X = model(data)  # assuming the output of the model is softmax activated
    loss = criterion(torch.log(X), target)

This is mathematically equivalent to using CrossEntropyLoss with a model that does not use softmax activation.
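To get a feel for what the loss value means at different probabilities for the correct class, here is a minimal NumPy sketch using the same cross_entropy helper as above (the specific probabilities are just illustrative examples):

```python
import numpy as np

# Same helper as in the article: loss for a single example,
# where p is the predicted probability of the correct class.
def cross_entropy(p):
    return -np.log(p)

# The loss is near 0 when the model is confident and correct,
# and grows without bound as p -> 0.
for p in [0.9, 0.7, 0.5, 0.1, 0.01]:
    print(f"p = {p:<5}: loss = {cross_entropy(p):.3f}")
```

Note that -log(0.5) is about 0.693, so for a balanced two-class problem an average loss near 0.693 means the model is doing no better than chance.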
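The two practical properties of softmax described above (it outputs a valid probability mass function, and it maintains relative ordering) can be checked in a few lines of NumPy. This is an illustrative single-vector sketch, not PyTorch's implementation:

```python
import numpy as np

def softmax(x):
    # Shift by the max for numerical stability; softmax is
    # invariant to adding a constant to every input.
    z = np.exp(x - np.max(x))
    return z / z.sum()

logits = np.array([2.0, -1.0, 0.5])  # unbounded scores
probs = softmax(logits)

print(probs)                                # all entries in (0, 1)
print(probs.sum())                          # sums to 1: a valid PMF
print(np.argmax(logits), np.argmax(probs))  # same index: ordering maintained
```

The last line is why, at inference time, argmax on the raw linear outputs gives the same predicted class as argmax on the softmax probabilities.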
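The equivalence CrossEntropyLoss(x, y) = NLLLoss(LogSoftmax(x), y) = H(one_hot(y), softmax(x)) can be verified numerically. The following NumPy sketch uses simplified single-example functions whose names merely mirror the PyTorch modules; it is not PyTorch's actual implementation:

```python
import numpy as np

def log_softmax(x):
    # log(softmax(x)) computed stably via the log-sum-exp trick,
    # avoiding overflow in the exponentiation
    x = x - np.max(x)
    return x - np.log(np.sum(np.exp(x)))

def nll_loss(log_probs, y):
    # negative log-likelihood of the correct class index y
    return -log_probs[y]

def cross_entropy_loss(x, y):
    # CrossEntropyLoss(x, y) := NLLLoss(LogSoftmax(x), y)
    return nll_loss(log_softmax(x), y)

logits = np.array([1.0, 3.0, 0.2])
y = 1

# Equivalent "staged" computation: H(one_hot(y), softmax(x))
probs = np.exp(log_softmax(logits))
one_hot = np.eye(len(logits))[y]
staged = -np.sum(one_hot * np.log(probs))

print(np.isclose(cross_entropy_loss(logits, y), staged))  # True
```

The staged version exponentiates and then takes a log again, which is exactly the kind of round trip that loses precision; this is the practical reason for preferring CrossEntropyLoss on raw logits.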