binary_cross_entropy
.binary_cross_entropy(
labels, logits, name = 'binary_cross_entropy'
)
Binary Cross Entropy
Measures the probability error in discrete binary classification tasks in which each class is independent and not mutually exclusive.
On Entropy and Cross-Entropy
Entropy refers to the number of bits required to transmit a randomly selected event from a probability distribution. A skewed distribution has a low entropy, whereas a distribution where events have equal probability has a larger entropy.
The entropy of a random variable with a set of discrete states x \in X and their probability P(x) can be computed as:

H(X) = -\sum_{x \in X} P(x) \log P(x)
Cross-entropy builds upon this idea to compute the number of bits required to represent or transmit an average event from one distribution compared to another distribution. If we consider a target distribution P and an approximation of the target distribution Q, the cross-entropy of Q from P is the number of additional bits needed to represent an event using Q instead of P:

H(P, Q) = -\sum_{x \in X} P(x) \log Q(x)
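As a quick numeric illustration of these two quantities (not part of the library API), the sketch below computes the entropy of a made-up target distribution P and the cross-entropy of a made-up approximation Q, using base-2 logarithms so the results are in bits:

```python
import tensorflow as tf

# made-up target distribution P and approximation Q over 3 events
p = tf.constant([0.5, 0.25, 0.25])
q = tf.constant([0.4, 0.4, 0.2])

def log2(t):
    # base-2 logarithm so the quantities below are measured in bits
    return tf.math.log(t) / tf.math.log(2.0)

entropy_p = -tf.reduce_sum(p * log2(p))       # H(P) = 1.5 bits
cross_entropy = -tf.reduce_sum(p * log2(q))   # H(P, Q) ~ 1.57 bits >= H(P)
extra_bits = cross_entropy - entropy_p        # the additional bits, i.e. D_KL(P || Q)
```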
Warning
This is to be used on the logits of a model, not on the predicted labels. See also tf.nn.sigmoid_cross_entropy_with_logits from TensorFlow.
Args
- labels (Tensor) : empiric probability values (labels that occurred for a given sample)
- logits (Tensor) : unscaled log probabilities used to predict the labels with sigmoid(logits)
- name (str) : op name
Returns
- tensor (Tensor) : binary (sigmoid) cross-entropy loss.
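A minimal usage sketch with made-up tensors: it uses the TensorFlow primitive that computes the quantity described above (sigmoid cross-entropy applied to logits) rather than the library call itself.

```python
import tensorflow as tf

# made-up batch: 4 independent binary labels and the model's raw outputs (logits)
labels = tf.constant([[1.0], [0.0], [1.0], [0.0]])
logits = tf.constant([[2.3], [-1.2], [0.4], [0.8]])   # note: NOT sigmoid probabilities

# sigmoid cross-entropy on the logits, one loss value per element
loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
mean_loss = tf.reduce_mean(loss)   # reduce to a scalar if needed
```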
categorical_cross_entropy
.categorical_cross_entropy(
labels, logits, axis = -1, name = 'categorical_cross_entropy'
)
Categorical Cross Entropy
Measures the probability error in discrete classification tasks in which the classes are mutually exclusive.
Warning
This is to be used on the logits of a model, not on the predicted labels. Do not call this loss with the output of softmax. See also tf.nn.softmax_cross_entropy_with_logits from TensorFlow.
Args
- labels (Tensor) : empiric probability distribution. Each row labels[i] must be a valid probability distribution (integrates to 1)
- logits (Tensor) : unscaled log probabilities used to predict the labels with softmax(logits)
- axis (int) : the class dimension. Defaults to -1, the last dimension.
- name (str) : op name
Returns
- tensor (Tensor) : categorical (softmax) cross-entropy loss.
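A minimal usage sketch with made-up tensors: as above, it uses the TensorFlow primitive that computes the documented quantity (softmax cross-entropy over the class axis), not the library call itself.

```python
import tensorflow as tf

# made-up batch of 2 samples over 3 mutually exclusive classes
labels = tf.constant([[1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0]])     # each row is a valid probability distribution
logits = tf.constant([[2.0, 0.5, -1.0],
                      [0.1, 0.2, 1.5]])     # unscaled log probabilities (no softmax applied)

# softmax cross-entropy over the last (class) axis, one loss value per sample
loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits, axis=-1)
```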
mse
.mse(
target, predicted
)
Mean Squared Error (MSE) Loss
Measures the average of the squares of the errors, that is, the difference between an estimator and what is estimated. This is a risk function, corresponding to the expected value of the quadratic loss:

MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
Info
MSE is sensitive to outliers, and given several examples with the same input feature values, the optimal prediction is their mean target value. This should be compared with the Mean Absolute Error, where the optimal prediction is the median. MSE is thus a good choice if you believe that your target data, conditioned on the input, is normally distributed around a mean value, and when it is important to penalize outliers.
Args
- predicted (Tensor) : estimated target values
- target (Tensor) : ground truth, correct values
Returns
- tensor (Tensor) : mean squared error value
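For reference, a small sketch of the quantity itself with made-up values, written directly from the formula above rather than through the library:

```python
import tensorflow as tf

target = tf.constant([1.0, 2.0, 3.0])      # ground truth
predicted = tf.constant([1.1, 1.9, 3.5])   # model estimates

# mean of the squared differences, as in the formula above
mse_value = tf.reduce_mean(tf.square(target - predicted))   # = 0.09
```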
kld
.kld(
target, predicted
)
Kullback–Leibler Divergence Loss
Kullback–Leibler divergence (also called relative entropy) is a measure of how one probability distribution is different from a second, reference probability distribution.
It is the expectation of the logarithmic difference between the probabilities P and Q, where the expectation is taken using the probabilities P:

D_{KL}(P \| Q) = \sum_{x \in X} P(x) \log \frac{P(x)}{Q(x)}
Args
- target (Tensor) : target probability distribution
- predicted (Tensor) : distribution predicted by the model
Returns
- kld (Tensor) : KL divergence between the target and predicted distributions
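For reference, a small sketch computing the definition above directly on made-up distributions (natural logarithm, so the result is in nats), not through the library:

```python
import tensorflow as tf

# made-up discrete distributions over 3 states
target = tf.constant([0.7, 0.2, 0.1])      # P, the reference distribution
predicted = tf.constant([0.5, 0.3, 0.2])   # Q, the model's distribution

# D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)), as in the definition above
kld_value = tf.reduce_sum(target * tf.math.log(target / predicted))   # ~0.085 nats
```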
sinkhorn_loss
.sinkhorn_loss(
target, predicted, epsilon, n_iter, cost_fn = None
)
Sinkhorn Loss
Alias:
* tx.metrics.sinkhorn
Info
Optimal Transport (OT) provides a framework from which one can define a more powerful geometry to compare probability distributions. This power comes, however, with a heavy computational price. The cost of computing OT distances scales at least in O(d^3 log(d)) when comparing two histograms of dimension d. The Sinkhorn algorithm alleviates this problem by solving a regularized OT problem in linear time.
Given two measures with n points each, with locations x and y, this returns an approximation of the Optimal Transport (OT) cost with regularization parameter epsilon; n_iter is the maximum number of steps in the Sinkhorn loop.
Args
- predicted (Tensor) : model distribution
- target (Tensor) : ground truth, empirical distribution
- epsilon (float) : regularization term (>0)
- n_iter (int) : number of sinkhorn iterations
- cost_fn (Callable) : function that returns the cost matrix between y_pred and y_true, defaults to |x_i-y_j|^p.
Returns
- cost (Tensor) : sinkhorn cost of moving the mass from the model distribution y_pred to the empirical distribution y_true.
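To make the algorithm concrete, here is a minimal, illustrative Sinkhorn sketch in plain TensorFlow on two made-up point clouds; it is not the library's implementation and uses a fixed squared-distance cost and uniform weights. For very small epsilon a log-domain (stabilized) version is needed to avoid underflow.

```python
import tensorflow as tf

def sinkhorn_cost(x, y, epsilon=1.0, n_iter=100):
    """Illustrative entropic-regularized OT cost between two point clouds (sketch only)."""
    # pairwise squared-distance cost matrix: C[i, j] = |x_i - y_j|^2
    C = tf.reduce_sum(tf.square(x[:, None, :] - y[None, :, :]), axis=-1)
    n = tf.cast(tf.shape(C)[0], tf.float32)
    m = tf.cast(tf.shape(C)[1], tf.float32)
    a = tf.ones([tf.shape(C)[0]]) / n             # uniform weights on the x points
    b = tf.ones([tf.shape(C)[1]]) / m             # uniform weights on the y points
    K = tf.exp(-C / epsilon)                      # Gibbs kernel
    u, v = tf.ones_like(a), tf.ones_like(b)
    for _ in range(n_iter):                       # Sinkhorn fixed-point iterations
        u = a / tf.linalg.matvec(K, v)                    # u = a / (K v)
        v = b / tf.linalg.matvec(K, u, transpose_a=True)  # v = b / (K^T u)
    P = u[:, None] * K * v[None, :]               # approximate transport plan
    return tf.reduce_sum(P * C)                   # regularized OT cost

# made-up 2-D point clouds with 5 points each
x = tf.random.normal([5, 2])
y = tf.random.normal([5, 2]) + 1.0
cost = sinkhorn_cost(x, y, epsilon=1.0, n_iter=100)
```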
sparsemax_loss
.sparsemax_loss(
logits, labels, name = 'sparsemax_loss'
)
Sparsemax Loss
A loss function for the sparsemax activation function. This is similar to tf.nn.softmax, but able to output sparse probabilities.
Info
Applicable to multi-label classification problems and attention-based neural networks (e.g. for natural language inference)
Args
- labels (Tensor) : the target dense labels (one-hot encoded)
- logits (Tensor) : unnormalized log probabilities
- name (str) : op name
Returns
- loss (Tensor) : sparsemax loss
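A usage sketch with made-up tensors. The call follows the documented signature, but the import path is an assumption; adjust it to wherever the library actually exposes the function.

```python
import tensorflow as tf
# assumption: the function is exposed at the package top level; adjust to the real module path
from tensorx import sparsemax_loss

# made-up one-hot labels for 2 samples over 4 classes, plus raw (unnormalized) logits
labels = tf.constant([[0.0, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0, 1.0]])
logits = tf.constant([[0.5, 2.0, -1.0, 0.1],
                      [1.2, 0.3, 0.3, 2.5]])

loss = sparsemax_loss(logits=logits, labels=labels)
```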
binary_hinge
.binary_hinge(
labels, logits
)
Binary Hinge Loss
Measures the classification error for maximum-margin classification. Margin classifiers like Support Vector Machines (SVM) maximise the distance between the closest examples and the decision boundary separating the binary classes. The hinge loss is defined as:

\ell(y) = \max(0, 1 - t \cdot y)

where t is the intended output (the labels, valued -1 or 1) and y is the raw output of the classification decision function (the logits), not the predicted class label.
Args
- labels (Tensor) : tensor with values -1 or 1. Binary (0 or 1) labels are converted to -1 or 1.
- logits (Tensor) : unscaled log probabilities.
Returns
- tensor (Tensor) : hinge loss float tensor
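The definition above can be written out directly; a small sketch with made-up labels in {-1, 1} and raw logits, not the library call:

```python
import tensorflow as tf

# made-up labels in {-1, 1} and raw outputs of the decision function (logits)
labels = tf.constant([1.0, -1.0, 1.0, -1.0])
logits = tf.constant([0.8, -2.0, -0.3, 0.4])

# hinge loss max(0, 1 - t * y) per sample, as in the definition above
per_sample = tf.nn.relu(1.0 - labels * logits)   # [0.2, 0.0, 1.3, 1.4]
loss = tf.reduce_mean(per_sample)
```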