
binary_cross_entropy

source

.binary_cross_entropy(
   labels, logits, name = 'binary_cross_entropy'
)

Binary Cross Entropy

Measures the probability error in discrete binary classification tasks in which each class is independent and not mutually exclusive.

On Entropy and Cross-Entropy

Entropy refers to the average number of bits required to transmit a randomly selected event from a probability distribution. A skewed distribution has low entropy, whereas a distribution where all events are equally probable has the highest entropy.

The entropy of a random variable with a set of discrete states x \in X and their probabilities P(x) can be computed as:

H(X) = -\sum_{x \in X} P(x) \log(P(x))

Cross-entropy builds upon this idea to compute the average number of bits required to represent or transmit an event from one distribution using a coding scheme optimized for another distribution. If we consider a target distribution P and an approximation of the target distribution Q, the cross-entropy of Q from P is the average number of bits needed to represent an event using Q instead of P:

H(P, Q) = -\sum_{x \in X} P(x) \log(Q(x))
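
To make the two formulas concrete, here is a small numerical sketch (plain NumPy, not part of this library) that computes the entropy of a distribution P and the cross-entropy H(P, Q) against an approximation Q, using log base 2 so the results are in bits:

    import numpy as np

    # target distribution P and an approximation Q over three discrete states
    p = np.array([0.10, 0.40, 0.50])
    q = np.array([0.80, 0.15, 0.05])

    # entropy of P: H(P) = -sum_x P(x) log2 P(x)
    entropy_p = -np.sum(p * np.log2(p))

    # cross-entropy of Q from P: H(P, Q) = -sum_x P(x) log2 Q(x)
    cross_entropy = -np.sum(p * np.log2(q))

    print(entropy_p)      # ~1.36 bits
    print(cross_entropy)  # ~3.29 bits, larger because Q is a poor approximation of P

Loss implementations typically use the natural logarithm instead of log base 2, which only changes the unit (nats instead of bits).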

Warning

This is to be used on the logits of a model, not on the predicted labels. See also the corresponding TensorFlow function, tf.nn.sigmoid_cross_entropy_with_logits.

Args

  • labels (Tensor) : empirical probability values (labels observed for a given sample)
  • logits (Tensor) : unscaled log probabilities used to predict the labels with sigmoid(logits)
  • name (str) : op name

Returns

  • tensor (Tensor) : binary (sigmoid) cross-entropy loss.
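
As an illustration of the warning above, the following sketch uses plain TensorFlow (tf.nn.sigmoid_cross_entropy_with_logits) to show a sigmoid cross-entropy computed from logits; the exact call signature of this library's binary_cross_entropy may differ:

    import tensorflow as tf

    # multi-label targets (classes are independent) and raw model outputs (logits)
    labels = tf.constant([[1.0, 0.0, 1.0],
                          [0.0, 1.0, 0.0]])
    logits = tf.constant([[2.0, -1.0, 0.5],
                          [-0.5, 1.5, -2.0]])

    # the cross-entropy is computed directly from the logits in a numerically
    # stable form; applying tf.sigmoid to the logits first would be a mistake
    loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)

    print(loss.numpy())  # one loss value per label entry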

categorical_cross_entropy

source

.categorical_cross_entropy(
   labels, logits, axis = -1, name = 'categorical_cross_entropy'
)

Categorical Cross Entropy

Measures the probability error in discrete classification tasks in which the classes are mutually exclusive.

Warning

This is to be used on the logits of a model, not on the predicted labels. Do not call this loss with the output of softmax. See also the corresponding TensorFlow function, tf.nn.softmax_cross_entropy_with_logits.

Args

  • labels (Tensor) : empirical probability distribution. Each row labels[i] must be a valid probability distribution (sums to 1)
  • logits (Tensor) : unscaled log probabilities used to predict the labels with softmax(logits)
  • axis (int) : The class dimension. Defaulted to -1 which is the last dimension.
  • name (str) : op name

Returns

  • tensor (Tensor) : categorical (softmax) cross-entropy loss.
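
The sketch below, again in plain TensorFlow rather than this library's API, contrasts the correct call on logits with the mistake of passing softmax outputs:

    import tensorflow as tf

    # one-hot (mutually exclusive) targets and unscaled logits for three classes
    labels = tf.constant([[0.0, 1.0, 0.0],
                          [1.0, 0.0, 0.0]])
    logits = tf.constant([[1.0, 3.0, 0.2],
                          [2.5, 0.1, 0.3]])

    # correct: the cross-entropy is computed directly from the logits
    loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels,
                                                   logits=logits, axis=-1)

    # incorrect: passing softmax outputs as logits applies softmax twice
    wrong = tf.nn.softmax_cross_entropy_with_logits(labels=labels,
                                                    logits=tf.nn.softmax(logits),
                                                    axis=-1)

    print(loss.numpy())   # one loss value per row
    print(wrong.numpy())  # systematically different (and wrong) values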

mse

source

.mse(
   target, predicted
)

Mean Squared Error (MSE) Loss

Measures the average of the squares of the errors - the difference between an estimator and what is estimated. This is a risk function, corresponding to the expected value of the quadratic loss:

MSE = \frac{1}{N}\sum^{N}_{i=1}(y_i-\hat{y}_i)^2

Info

MSE is sensitive to outliers, and given several examples with the same input feature values, the optimal prediction is their mean target value. This should be compared with the Mean Absolute Error, for which the optimal prediction is the median. MSE is thus a good choice if you believe that your target data, conditioned on the input, is normally distributed around a mean value, and when it is important to penalize outliers.

Args

  • predicted (Tensor) : estimated target values
  • target (Tensor) : ground truth, correct values

Returns

  • tensor (Tensor) : mean squared error value
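
A minimal NumPy sketch of the formula above (for illustration only, not the library implementation):

    import numpy as np

    target = np.array([3.0, -0.5, 2.0, 7.0])
    predicted = np.array([2.5, 0.0, 2.0, 8.0])

    # MSE = (1/N) * sum_i (y_i - y_hat_i)^2
    mse = np.mean((target - predicted) ** 2)
    print(mse)  # 0.375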

kld

source

.kld(
   target, predicted
)

Kullback–Leibler Divergence Loss

Kullback–Leibler divergence (also called relative entropy) is a measure of how one probability distribution is different from a second, reference probability distribution.

D_{KL}(P || Q) = -\sum_{x \in X} P(x) \log\left(\frac{Q(x)}{P(x)}\right)

It is the expectation of the logarithmic difference between the probabilities P and Q, where the expectation is taken using the probabilities P.

Args

  • target (Tensor) : target probability distribution
  • predicted (Tensor) : distribution predicted by the model

Returns

  • kld (Tensor) : KL divergence between the target and predicted distributions
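
A minimal NumPy sketch of the formula (for illustration only); note that D_KL(P || Q) equals the cross-entropy H(P, Q) minus the entropy H(P):

    import numpy as np

    # target distribution P and model distribution Q (both sum to 1)
    p = np.array([0.10, 0.40, 0.50])
    q = np.array([0.80, 0.15, 0.05])

    # D_KL(P || Q) = -sum_x P(x) * log(Q(x) / P(x)) = sum_x P(x) * log(P(x) / Q(x))
    kld = np.sum(p * np.log(p / q))
    print(kld)  # ~1.34 nats; 0 only when P and Q are identical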

sinkhorn_loss

source

.sinkhorn_loss(
   target, predicted, epsilon, n_iter, cost_fn = None
)

Sinkhorn Loss

Alias: tx.metrics.sinkhorn

Info

Optimal Transport (OT) provides a framework from which one can define a more powerful geometry to compare probability distributions. This power comes, however, with a heavy computational price: the cost of computing OT distances scales at least in O(d^3 log(d)) when comparing two histograms of dimension d. The Sinkhorn algorithm alleviates this problem by solving a regularized OT problem in linear time.

Given two measures with n points each, with locations x and y, this outputs an approximation of the Optimal Transport (OT) cost with regularization parameter epsilon; n_iter is the maximum number of steps in the Sinkhorn loop.

Args

  • predicted (Tensor) : model distribution
  • target (Tensor) : ground truth, empirical distribution
  • epsilon (float) : regularization term >0
  • n_iter (int) : number of sinkhorn iterations
  • cost_fn (Callable) : function that returns the cost matrix between y_pred and y_true, defaults to |x_i-y_j|^p.

Returns

  • cost (Tensor) : sinkhorn cost of moving the mass from the model distribution y_pred to the empirical distribution y_true.
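
For intuition, the following is a minimal NumPy sketch of entropy-regularized OT with plain Sinkhorn iterations; it assumes uniform weights over the sample points and a squared-Euclidean cost, and it is not this library's implementation (which may use a different, e.g. log-domain, formulation):

    import numpy as np

    def sinkhorn_cost(x, y, epsilon=1.0, n_iter=100):
        """Entropy-regularized OT cost between two uniform point clouds (sketch)."""
        n, m = x.shape[0], y.shape[0]
        a = np.full(n, 1.0 / n)  # uniform weights on the predicted points
        b = np.full(m, 1.0 / m)  # uniform weights on the target points

        # pairwise cost matrix C[i, j] = |x_i - y_j|^2
        C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
        K = np.exp(-C / epsilon)  # Gibbs kernel

        # Sinkhorn fixed-point iterations on the scaling vectors u and v
        # (log-domain updates are preferred in practice for small epsilon)
        u = np.ones(n)
        v = np.ones(m)
        for _ in range(n_iter):
            u = a / (K @ v)
            v = b / (K.T @ u)

        # transport plan P = diag(u) K diag(v); regularized OT cost = <P, C>
        P = u[:, None] * K * v[None, :]
        return np.sum(P * C)

    x = np.random.randn(16, 2)        # e.g. model samples
    y = np.random.randn(16, 2) + 1.0  # e.g. empirical samples
    print(sinkhorn_cost(x, y, epsilon=1.0, n_iter=200))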

sparsemax_loss

source

.sparsemax_loss(
   logits, labels, name = 'sparsemax_loss'
)

Sparsemax Loss

A loss function for the sparsemax activation function, which is similar to tf.nn.softmax but able to output sparse probabilities.

Info

Applicable to multi-label classification problems and attention-based neural networks (e.g. for natural language inference)

Args

  • labels (Tensor) : the target dense labels (one hot encoded)
  • logits (Tensor) : unnormalized log probabilities
  • name (str) : op name

Returns

  • loss (Tensor) : sparsemax loss
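
For intuition, the sketch below implements the sparsemax projection and the corresponding loss from Martins & Astudillo (2016) in NumPy for a single example; it is illustrative only and not this library's implementation:

    import numpy as np

    def sparsemax(z):
        """Sparsemax projection of a 1-D logit vector onto the simplex (sketch)."""
        z_sorted = np.sort(z)[::-1]
        k = np.arange(1, z.size + 1)
        cumsum = np.cumsum(z_sorted)
        support = 1.0 + k * z_sorted > cumsum  # candidate support in sorted order
        k_z = k[support][-1]                   # size of the support
        tau = (cumsum[k_z - 1] - 1.0) / k_z    # threshold
        return np.maximum(z - tau, 0.0), tau

    def sparsemax_loss(z, q):
        """Sparsemax loss for logits z and a target distribution q (sketch)."""
        p, tau = sparsemax(z)
        support = p > 0.0
        return (-np.dot(q, z)
                + 0.5 * np.sum(z[support] ** 2 - tau ** 2)
                + 0.5 * np.dot(q, q))

    logits = np.array([2.0, 1.5, -0.5])
    label = np.array([1.0, 0.0, 0.0])     # one-hot target
    print(sparsemax_loss(logits, label))  # 0.0625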

binary_hinge

source

.binary_hinge(
   labels, logits
)

Binary Hinge Loss

Measures the classification error for maximum-margin classification. Margin classifiers like Support Vector Machines (SVM) maximise the distance between the closest examples and the decision boundary separating the binary classes. The hinge loss is defined as:

\ell(y) = \max(0, 1-t \cdot y),

where t is the intended output (the labels) and y is the raw output of the classification decision function (the logits), not the predicted class label.

Args

  • labels (Tensor) : tensor with values -1 or 1. Binary (0 or 1) labels are converted to -1 or 1.
  • logits (Tensor) : unscaled log probabilities.

Returns

  • tensor (Tensor) : hinge loss float tensor
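
A minimal NumPy sketch of the hinge loss formula above (for illustration only):

    import numpy as np

    # labels in {-1, +1}; 0/1 labels would first be mapped to -1/+1
    labels = np.array([1.0, -1.0, 1.0, -1.0])
    # raw outputs of the decision function (logits), not predicted class labels
    logits = np.array([0.8, -0.3, -0.2, 2.0])

    # hinge loss: mean over samples of max(0, 1 - t * y)
    loss = np.mean(np.maximum(0.0, 1.0 - labels * logits))
    print(loss)  # 1.275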