### binary_cross_entropy

```
.binary_cross_entropy(
labels, logits, name = 'binary_cross_entropy'
)
```

Binary Cross Entropy

Measures the probability error in discrete binary classification tasks in which each class is independent and not mutually exclusive.

On Entropy and Cross-Entropy

Entropy refers to the number of bits required to transmit a randomly selected event from a probability distribution. A skewed distribution has a low entropy, whereas a distribution where events have equal probability has a larger entropy.

The entropy of a random variable with a set of discrete states $x \in X$ and their probability $P(x)$ can be computed as:

$$H(X) = -\sum_{x \in X} P(x) \log P(x)$$

Cross-entropy builds upon this idea to compute the number of bits required to represent or transmit an average event from one distribution compared to another. If we consider a target distribution $P$ and an approximation of the target distribution $Q$, the cross-entropy of $Q$ from $P$ is the number of additional bits needed to represent an event using $Q$ instead of $P$:

$$H(P, Q) = -\sum_{x \in X} P(x) \log Q(x)$$
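
As a concrete example of these two quantities, the sketch below computes them directly with NumPy (illustrative only, not part of this API):

```python
import numpy as np

# entropy of a fair coin: 1 bit
p = np.array([0.5, 0.5])
entropy = -np.sum(p * np.log2(p))        # 1.0

# cross-entropy of encoding P with a skewed approximation Q
q = np.array([0.9, 0.1])
cross_entropy = -np.sum(p * np.log2(q))  # ~1.74 bits, always >= H(P)
```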

Warning

This is to be used on the **logits** of a model, not on the predicted labels.
See also `tf.nn.sigmoid_cross_entropy_with_logits` from TensorFlow.

**Args**

* **labels** (`Tensor`): empiric probability values (labels that occurred for a given sample)
* **logits** (`Tensor`): unscaled log probabilities used to predict the labels with `sigmoid(logits)`
* **name** (`str`): op name

**Returns**

* **tensor** (`Tensor`): binary (sigmoid) cross-entropy loss
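
A minimal usage sketch, assuming the function is exposed on a top-level `tensorx` import (the import path is an assumption):

```python
import tensorflow as tf
import tensorx as tx  # hypothetical import path

# two samples, three independent binary labels each
labels = tf.constant([[1., 0., 1.],
                      [0., 1., 0.]])
logits = tf.constant([[2.0, -1.0, 0.5],
                      [-0.5, 1.5, -2.0]])

loss = tx.binary_cross_entropy(labels, logits)

# for reference, TensorFlow's own sigmoid cross-entropy on the same inputs
ref = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
```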

### categorical_cross_entropy

```
.categorical_cross_entropy(
labels, logits, axis = -1, name = 'categorical_cross_entropy'
)
```

Categorical Cross Entropy

Measures the probability error in discrete classification tasks in which the classes are mutually exclusive.

Warning

This is to be used on the **logits** of a model, not on the predicted labels. Do not call this loss with the output of `softmax`.
See also `tf.nn.softmax_cross_entropy_with_logits` from TensorFlow.

**Args**

* **labels** (`Tensor`): empiric probability distribution. Each row `labels[i]` must be a valid probability distribution (integrate to 1)
* **logits** (`Tensor`): unscaled log probabilities used to predict the labels with `softmax(logits)`
* **axis** (`int`): the class dimension. Defaults to -1, which is the last dimension
* **name** (`str`): op name

**Returns**

* **tensor** (`Tensor`): categorical (softmax) cross-entropy loss
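
A minimal usage sketch (the `tensorx` import path is an assumption):

```python
import tensorflow as tf
import tensorx as tx  # hypothetical import path

# one-hot targets over 3 mutually exclusive classes
labels = tf.constant([[0., 1., 0.],
                      [1., 0., 0.]])
logits = tf.constant([[1.0, 3.0, 0.2],
                      [2.5, -1.0, 0.3]])

loss = tx.categorical_cross_entropy(labels, logits)

# for reference, TensorFlow's softmax cross-entropy on the same inputs
ref = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
```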

### mse

```
.mse(
target, predicted
)
```

Mean Squared Error (MSE) Loss

Measures the average of the squares of the errors, that is, the difference between an estimator and what is estimated. This is a risk function, corresponding to the expected value of the quadratic loss:

$$\operatorname{MSE} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

Info

MSE is sensitive to outliers, and given several examples with the same input feature values, the optimal prediction will be their mean target value. This should be compared with *Mean Absolute Error*, where the optimal prediction is the median. MSE is thus a good choice if you believe that your target data, conditioned on the input, is normally distributed around a mean value, and when it is important to penalize outliers.

**Args**

* **predicted** (`Tensor`): estimated target values
* **target** (`Tensor`): ground truth, correct values

**Returns**

* **tensor** (`Tensor`): mean squared error value
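
A minimal usage sketch (the `tensorx` import path is an assumption):

```python
import tensorflow as tf
import tensorx as tx  # hypothetical import path

target = tf.constant([1.0, 2.0, 3.0])
predicted = tf.constant([1.5, 1.5, 3.5])

loss = tx.mse(target, predicted)
# mean of squared errors: (0.25 + 0.25 + 0.25) / 3 = 0.25
```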

### kld

```
.kld(
target, predicted
)
```

Kullback–Leibler Divergence Loss

Kullback–Leibler divergence (also called relative entropy) is a measure of how one probability distribution differs from a second, reference probability distribution.

It is the expectation of the logarithmic difference between the probabilities $P$ and $Q$, where the expectation is taken using the probabilities $P$:

$$D_{KL}(P \parallel Q) = \sum_{x \in X} P(x) \log \frac{P(x)}{Q(x)}$$

**Args**

* **target** (`Tensor`): target probability distribution
* **predicted** (`Tensor`): distribution predicted by the model

**Returns**

* **kld** (`Tensor`): KL divergence between the target and predicted distributions
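
A minimal usage sketch with two explicit distributions (the import path is an assumption):

```python
import tensorflow as tf
import tensorx as tx  # hypothetical import path

# two discrete distributions over three outcomes (rows sum to 1)
target = tf.constant([[0.5, 0.25, 0.25]])
predicted = tf.constant([[0.4, 0.4, 0.2]])

# D_KL(P || Q) is 0 only when the two distributions match
loss = tx.kld(target, predicted)
```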

### sinkhorn_loss

```
.sinkhorn_loss(
target, predicted, epsilon, n_iter, cost_fn = None
)
```

Sinkhorn Loss

Alias:
* `tx.metrics.sinkhorn`

Info

Optimal Transport (OT) provides a framework from which one can define a more powerful geometry to compare probability distributions. This power comes, however, with a heavy computational price: the cost of computing OT distances scales at least in $O(d^3 \log(d))$ when comparing two histograms of dimension $d$. The Sinkhorn algorithm alleviates this problem by solving a regularized OT problem in linear time.

Given two measures with $n$ points each, with locations $x$ and $y$, this outputs an approximation of the Optimal Transport (OT) cost with regularization parameter `epsilon`; `n_iter` is the maximum number of steps in the Sinkhorn loop.

**Args**

* **predicted** (`Tensor`): model distribution
* **target** (`Tensor`): ground truth, empirical distribution
* **epsilon** (`float`): regularization term > 0
* **n_iter** (`int`): number of Sinkhorn iterations
* **cost_fn** (`Callable`): function that returns the cost matrix between `predicted` and `target`; defaults to $|x_i - y_j|^p$

**Returns**

* **cost** (`Tensor`): Sinkhorn cost of moving the mass from the model distribution `predicted` to the empirical distribution `target`
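
A minimal sketch, assuming the inputs are batches of sample locations and the hypothetical import path below; the exact shape conventions should be checked against the implementation:

```python
import tensorflow as tf
import tensorx as tx  # hypothetical import path

# two small empirical measures, three points each in 1-D
target = tf.constant([[0.0], [1.0], [2.0]])
predicted = tf.constant([[0.5], [1.5], [2.5]])

# epsilon > 0 controls the entropic regularization: larger values
# converge faster but give a blurrier approximation of the OT cost
cost = tx.sinkhorn_loss(target, predicted, epsilon=0.01, n_iter=100)
```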

### sparsemax_loss

```
.sparsemax_loss(
logits, labels, name = 'sparsemax_loss'
)
```

Sparsemax Loss

A loss function for the sparsemax activation function. This is similar to `tf.nn.softmax`, but able to output sparse probabilities.

Info

Applicable to multi-label classification problems and attention-based neural networks (e.g. for natural language inference)

**Args**

* **labels** (`Tensor`): the target dense labels (one-hot encoded)
* **logits** (`Tensor`): unnormalized log probabilities
* **name** (`str`): op name

**Returns**

* **loss** (`Tensor`): sparsemax loss
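
A minimal usage sketch (the `tensorx` import path is an assumption); note that, per the signature above, `logits` come before `labels`:

```python
import tensorflow as tf
import tensorx as tx  # hypothetical import path

# one-hot targets for a single sample over 3 classes
labels = tf.constant([[0., 1., 0.]])
logits = tf.constant([[0.1, 2.0, -1.0]])

loss = tx.sparsemax_loss(logits, labels)
```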

### binary_hinge

```
.binary_hinge(
labels, logits
)
```

Binary Hinge Loss

Measures the classification error for maximum-margin classification. Margin classifiers like Support Vector Machines (SVMs) maximise the distance between the closest examples and the decision boundary separating the binary classes. The hinge loss is defined as:

$$\ell(y) = \max(0, 1 - t \cdot y)$$

where $t$ is the intended output (labels) and $y$ is the output of the classification decision function (the logits), not the predicted class label.

**Args**

* **labels** (`Tensor`): tensor with values -1 or 1. Binary (0 or 1) labels are converted to -1 or 1
* **logits** (`Tensor`): unscaled log probabilities

**Returns**

* **tensor** (`Tensor`): hinge loss float tensor
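
A minimal usage sketch, again assuming the hypothetical `tensorx` import path:

```python
import tensorflow as tf
import tensorx as tx  # hypothetical import path

# binary {0, 1} labels are converted to {-1, 1} internally
labels = tf.constant([1., 0., 1.])
logits = tf.constant([0.8, -0.5, -0.3])

# per-sample hinge: max(0, 1 - t * y), with t in {-1, 1}
loss = tx.binary_hinge(labels, logits)
```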