loss

binary_cross_entropy

.binary_cross_entropy(
   labels, logits, name = 'binary_cross_entropy'
)

Binary Cross Entropy

Measures the probability error in discrete binary classification tasks in which each class is independent and not mutually exclusive.

On Entropy and Cross-Entropy

Entropy is the average number of bits required to transmit a randomly selected event from a probability distribution. A skewed distribution has low entropy, whereas a distribution in which all events are equally probable has higher entropy.

The entropy of a random variable $X$ with discrete states $x \in X$ and probabilities $P(x)$ can be computed as:

$H(X) = -\sum_{x \in X} P(x) \log(P(x))$

Cross-entropy builds upon this idea to compute the average number of bits required to represent or transmit an event from one distribution using a code optimized for another distribution. If we consider a target distribution $P$ and an approximation of the target distribution $Q$, the cross-entropy of $Q$ from $P$ is the average number of bits needed to represent an event drawn from $P$ using a code based on $Q$ instead of $P$:

$H(P, Q) = -\sum_{x \in X} P(x) \log(Q(x))$
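
As an illustrative sketch (plain Python, not part of this library), the two formulas above can be evaluated directly. The helper names `entropy` and `cross_entropy` are hypothetical, and base-2 logarithms are assumed so that the results are expressed in bits.

```python
import math

def entropy(p):
    """H(P) = -sum_x P(x) * log2(P(x)); zero-probability events contribute 0."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def cross_entropy(p, q):
    """H(P, Q) = -sum_x P(x) * log2(Q(x)); assumes Q(x) > 0 wherever P(x) > 0."""
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

uniform = [0.5, 0.5]  # equally probable events
skewed = [0.9, 0.1]   # one event dominates

print(entropy(uniform))                # 1 bit
print(entropy(skewed))                 # lower than the uniform case
print(cross_entropy(skewed, uniform))  # never smaller than entropy(skewed)
```

The cross-entropy equals the entropy only when both distributions are identical; otherwise it is strictly larger.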

Warning

Apply this to the logits of a model, not to the predicted probabilities or labels. See also the corresponding sigmoid cross-entropy operation in TensorFlow.

Args

labels (Tensor) : empirical probability values (labels that occurred for a given sample)
logits (Tensor) : unscaled log probabilities used to predict the labels with sigmoid(logits)
name (str) : op name

Returns

tensor (Tensor) : binary (sigmoid) cross-entropy loss.
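
The sketch below illustrates why the loss takes logits rather than probabilities. It is a plain-Python reference, not this library's implementation, and the function names are hypothetical. It uses a common numerically stable rearrangement, `max(x, 0) - x * z + log(1 + exp(-|x|))`, which is algebraically equal to the naive sigmoid form but does not break for large logits.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_naive(label, logit):
    """Direct form: -[z * log(sigmoid(x)) + (1 - z) * log(1 - sigmoid(x))]."""
    p = sigmoid(logit)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))

def bce_from_logits(label, logit):
    """Stable form that never takes the log of a value rounded to 0."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

# Both forms agree for moderate logits.
print(bce_naive(1.0, 2.0), bce_from_logits(1.0, 2.0))

# For a large logit the naive form would evaluate log(0); the stable form does not.
print(bce_from_logits(0.0, 1000.0))
```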

categorical_cross_entropy

.categorical_cross_entropy(
   labels, logits, axis = -1, name = 'categorical_cross_entropy'
)

Categorical Cross Entropy

Measures the probability error in discrete classification tasks in which the classes are mutually exclusive.

Warning

Apply this to the logits of a model, not to the predicted labels. Do not call this loss with the output of softmax. See also the corresponding softmax cross-entropy operation in TensorFlow.

Args

labels (Tensor) : empirical probability distribution. Each row labels[i] must be a valid probability distribution (sum to 1).
logits (Tensor) : unscaled log probabilities used to predict the labels with softmax(logits)
axis (int) : the class dimension. Defaults to -1, the last dimension.
name (str) : op name

Returns

tensor (Tensor) : categorical (softmax) cross-entropy loss.
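
A minimal plain-Python sketch of the same idea for a single example follows. It is not this library's implementation, and the helper names are hypothetical. The softmax is folded into the loss through a log-softmax with the usual max-subtraction trick, which is why the function expects logits and not softmax outputs.

```python
import math

def log_softmax(logits):
    """log(softmax(x)) computed stably by subtracting the largest logit."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def categorical_cross_entropy_from_logits(labels, logits):
    """-sum_c labels[c] * log(softmax(logits)[c]) for one example."""
    return -sum(p * lq for p, lq in zip(labels, log_softmax(logits)))

one_hot = [1.0, 0.0, 0.0]

# Uninformative logits: the loss is log(number of classes).
print(categorical_cross_entropy_from_logits(one_hot, [0.0, 0.0, 0.0]))

# A confident, correct prediction with a very large logit stays finite.
print(categorical_cross_entropy_from_logits(one_hot, [1000.0, 0.0, 0.0]))
```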

mse

.mse(
   target, predicted
)

Mean Squared Error (MSE) Loss

Measures the average of the squares of the errors, that is, the differences between the estimated values and the true values. This is a risk function, corresponding to the expected value of the quadratic loss:

$MSE = \frac{1}{N}\sum_{i=1}^{N}(y_i-\hat{y}_i)^2$

Info

MSE is sensitive to outliers. Given several examples with the same input feature values, the optimal prediction is their mean target value. Compare this with Mean Absolute Error, where the optimal prediction is the median. MSE is thus a good choice if you believe that your target data, conditioned on the input, is normally distributed around a mean value, and when it is important to penalize outliers.

Args

predicted (Tensor) : estimated target values
target (Tensor) : ground truth, correct values

Returns

tensor (Tensor) : mean squared error value
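
The mean-versus-median remark in the Info note can be checked numerically. The sketch below is plain Python with hypothetical helper names (`mse`, `mae`), not this library's implementation. It compares two constant predictions for targets that contain one outlier.

```python
import statistics

def mse(target, predicted):
    """Mean of squared differences."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(target, predicted)) / len(target)

def mae(target, predicted):
    """Mean of absolute differences."""
    return sum(abs(y - y_hat) for y, y_hat in zip(target, predicted)) / len(target)

targets = [1.0, 2.0, 3.0, 10.0]  # 10.0 acts as an outlier

mean_pred = [statistics.mean(targets)] * len(targets)      # constant prediction 4.0
median_pred = [statistics.median(targets)] * len(targets)  # constant prediction 2.5

print(mse(targets, mean_pred), mse(targets, median_pred))  # 12.5 14.75
print(mae(targets, mean_pred), mae(targets, median_pred))  # 3.0 2.5
```

Predicting the mean gives the lower MSE, while predicting the median gives the lower MAE, which is the trade-off described above.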

kld

.kld(
   target, predicted
)

Kullback–Leibler Divergence Loss

Kullback–Leibler divergence (also called relative entropy) is a measure of how one probability distribution is different from a second, reference probability distribution.

$D_{KL}(P \| Q) = - \sum_{x \in X} P(x) \log\left(\frac{Q(x)}{P(x)}\right)$

It is the expectation of the logarithmic difference between the probabilities $P$ and $Q$, where the expectation is taken using the probabilities $P$.

Args

target (Tensor) : target probability distribution
predicted (Tensor) : distribution predicted by the model

Returns

kld (Tensor) : KL divergence between the target and predicted distributions
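
As a small illustration of the definition above, the following plain-Python sketch (hypothetical helper name `kl_divergence`, natural logarithm assumed, not this library's implementation) shows that the divergence is zero for identical distributions and is not symmetric in its arguments.

```python
import math

def kl_divergence(target, predicted):
    """D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)), skipping terms with P(x) = 0.

    Assumes predicted[x] > 0 wherever target[x] > 0.
    """
    return sum(p * math.log(p / q) for p, q in zip(target, predicted) if p > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]

print(kl_divergence(p, p))  # 0.0, identical distributions
print(kl_divergence(p, q))  # positive
print(kl_divergence(q, p))  # positive, but a different value
```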

sinkhorn_loss

.sinkhorn_loss(
   target, predicted, epsilon, n_iter, cost_fn = None
)

Sinkhorn Loss

Alias: tx.metrics.sinkhorn

Info

Optimal Transport (OT) provides a framework from which one can define a more powerful geometry to compare probability distributions. This power comes, however, with a heavy computational price: the cost of computing OT distances scales at least in $O(d^3 \log(d))$ when comparing two histograms of dimension $d$. The Sinkhorn algorithm alleviates this problem by solving a regularized OT problem, which can be computed far more cheaply with iterative matrix scaling.

Given two measures with n points each, with locations x and y, this outputs an approximation of the Optimal Transport (OT) cost with regularization parameter epsilon. n_iter is the maximum number of steps in the Sinkhorn loop.

Args

predicted (Tensor) : model distribution
target (Tensor) : ground truth, empirical distribution
epsilon (float) : regularization term, must be > 0
n_iter (int) : number of Sinkhorn iterations
cost_fn (Callable) : function that returns the cost matrix between predicted and target, defaults to $|x_i-y_j|^p$ .

Returns

cost (Tensor) : Sinkhorn cost of moving the mass from the model distribution (predicted) to the empirical distribution (target).
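
To make the iteration concrete, here is a minimal plain-Python sketch of Sinkhorn scaling. It is not this library's implementation. It assumes one-dimensional point locations, uniform weights on both point sets, and an explicit exponent `p` for the $|x_i-y_j|^p$ cost. The name `sinkhorn_cost` and its signature are hypothetical.

```python
import math

def sinkhorn_cost(x, y, epsilon, n_iter, p=2):
    """Approximate regularized OT cost between two 1-D point sets with uniform weights."""
    n, m = len(x), len(y)
    a = [1.0 / n] * n  # weights of the first measure
    b = [1.0 / m] * m  # weights of the second measure
    cost = [[abs(xi - yj) ** p for yj in y] for xi in x]
    # Gibbs kernel K = exp(-C / epsilon)
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # rescale rows so the plan's row sums match a
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        # rescale columns so the plan's column sums match b
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # transport plan P = diag(u) K diag(v); returned cost is sum_ij P_ij * C_ij
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))

same = sinkhorn_cost([0.0, 1.0], [0.0, 1.0], epsilon=0.1, n_iter=100)
shifted = sinkhorn_cost([0.0, 1.0], [1.0, 2.0], epsilon=0.1, n_iter=100)
print(same, shifted)  # near zero for identical points, close to 1 for the unit shift
```

A smaller epsilon brings the result closer to the unregularized OT cost but typically needs more iterations, and very small values can underflow in this naive form.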

sparsemax_loss

.sparsemax_loss(
   logits, labels, name = 'sparsemax_loss'
)

Sparsemax Loss

A loss function for the sparsemax activation function. This is similar to tf.nn.softmax, but able to output sparse probabilities.

Info

Applicable to multi-label classification problems and attention-based neural networks (e.g. for natural language inference)

References

From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification

Args

labels (Tensor) : the target dense labels (one hot encoded)
logits (Tensor) : unnormalized log probabilities
name (str) : op name

Returns

loss (Tensor) : sparsemax loss
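
The sketch below follows the definitions in the referenced paper for a single example. It is plain Python and not this library's implementation, and the helper names are hypothetical. `sparsemax` projects the logits onto the probability simplex using a threshold `tau`, and the loss is zero exactly when that projection already equals the one-hot label.

```python
def sparsemax(logits):
    """Return (probabilities, tau), where probabilities = max(z - tau, 0)."""
    z_sorted = sorted(logits, reverse=True)
    cumsum, tau = 0.0, 0.0
    for k, zk in enumerate(z_sorted, start=1):
        cumsum += zk
        if 1.0 + k * zk > cumsum:     # the k-th largest logit is in the support
            tau = (cumsum - 1.0) / k  # threshold for a support of size k
    return [max(z - tau, 0.0) for z in logits], tau

def sparsemax_loss_sketch(logits, labels):
    """-q.z + 0.5 * sum_{j in support}(z_j^2 - tau^2) + 0.5 * ||q||^2."""
    probs, tau = sparsemax(logits)
    support_term = sum(0.5 * (z * z - tau * tau)
                       for z, prob in zip(logits, probs) if prob > 0)
    dot = sum(q * z for q, z in zip(labels, logits))
    return -dot + support_term + 0.5 * sum(q * q for q in labels)

label = [1.0, 0.0, 0.0]
print(sparsemax([3.0, 0.0, 0.0])[0])                    # sparse output: [1.0, 0.0, 0.0]
print(sparsemax_loss_sketch([3.0, 0.0, 0.0], label))    # 0.0
print(sparsemax_loss_sketch([0.0, 0.0, 0.0], label))    # positive
```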

binary_hinge

.binary_hinge(
   labels, logits
)

Binary Hinge Loss

Measures the classification error for maximum-margin classification. Margin classifiers like Support Vector Machines (SVM) maximise the distance between the closest examples and the decision boundary separating the binary classes. The hinge loss is defined as:

$\ell(y) = \max(0, 1-t \cdot y),$

where $t$ is the intended output (labels) and $y$ is the output logit from the classification decision function, not the predicted class label.

Args

labels (Tensor) : tensor with values -1 or 1. Binary (0 or 1) labels are converted to -1 or 1.
logits (Tensor) : unscaled log probabilities.

Returns

tensor (Tensor) : hinge loss float tensor
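
A minimal plain-Python sketch of the formula above follows. It is not this library's implementation: the name `binary_hinge_sketch` is hypothetical, and returning one loss value per example with no reduction is an assumption. Labels given as 0 or 1 are mapped to -1 or 1 before the margin is computed.

```python
def binary_hinge_sketch(labels, logits):
    """Per-example hinge loss max(0, 1 - t * y)."""
    losses = []
    for t, y in zip(labels, logits):
        t_signed = 1.0 if t > 0 else -1.0  # maps 0 to -1, keeps -1 and 1
        losses.append(max(0.0, 1.0 - t_signed * y))
    return losses

labels = [1, 0, 1, -1]
logits = [2.0, -0.5, 0.3, 1.0]
print(binary_hinge_sketch(labels, logits))
```

The first example is correct with a margin above 1 and incurs no loss. The second and third are correct but inside the margin, so they are still penalized. The last is misclassified and receives the largest loss.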