KLDivLoss

class torch.nn.KLDivLoss(size_average=None, reduce=None, reduction='mean', log_target=False)

The Kullback-Leibler divergence loss.

For tensors of the same shape $y_{\text{pred}},\ y_{\text{true}}$, where $y_{\text{pred}}$ is the input and $y_{\text{true}}$ is the target, we define the pointwise KL divergence as:

$$L(y_{\text{pred}},\ y_{\text{true}}) = y_{\text{true}} \cdot \log \frac{y_{\text{true}}}{y_{\text{pred}}} = y_{\text{true}} \cdot (\log y_{\text{true}} - \log y_{\text{pred}})$$

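To avoid underflow issues when computing this quantity, this loss expects the argument input to be given in log-space. If log_target is True, the argument target must also be given in log-space.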
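The underflow concern can be illustrated with a small sketch. It assumes float32 tensors and a hand-picked pair of logits (an illustrative choice, not part of the reference text). Taking softmax and then log loses the tiny probability entirely, while log_softmax keeps its logarithm finite.

```python
import torch
import torch.nn.functional as F

# Hand-picked logits (illustrative): the second class is extremely unlikely.
logits = torch.tensor([[0.0, -200.0]])
target = torch.tensor([[0.5, 0.5]])

# Naive route: exp(-200) underflows to 0 in float32, so its log is -inf.
naive_log_probs = F.softmax(logits, dim=1).log()

# Log-space route: the log-probability is computed directly and stays finite.
stable_log_probs = F.log_softmax(logits, dim=1)

naive_loss = F.kl_div(naive_log_probs, target, reduction="batchmean")
stable_loss = F.kl_div(stable_log_probs, target, reduction="batchmean")

print(naive_log_probs, naive_loss)    # contains -inf, so the loss is inf
print(stable_log_probs, stable_loss)  # all values finite
```

This is why the examples on this page build input with F.log_softmax rather than applying log to a softmax output.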

To summarise, this function is roughly equivalent to computing

if not log_target: # default
    loss_pointwise = target * (target.log() - input)
else:
    loss_pointwise = target.exp() * (target - input)

and then reducing this result according to the argument reduction:

if reduction == "mean":  # default
    loss = loss_pointwise.mean()
elif reduction == "batchmean":  # mathematically correct
    loss = loss_pointwise.sum() / input.size(0)
elif reduction == "sum":
    loss = loss_pointwise.sum()
else:  # reduction == "none"
    loss = loss_pointwise
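As a sanity check, the two snippets above can be wrapped in a function and compared against nn.KLDivLoss. The helper name kl_div_reference is hypothetical and exists only for this sketch; the random inputs and the tolerance are likewise illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def kl_div_reference(input, target, reduction="mean", log_target=False):
    """Hypothetical helper that follows the two snippets above step by step."""
    if not log_target:
        loss_pointwise = target * (target.log() - input)
    else:
        loss_pointwise = target.exp() * (target - input)
    if reduction == "mean":
        return loss_pointwise.mean()
    if reduction == "batchmean":
        return loss_pointwise.sum() / input.size(0)
    if reduction == "sum":
        return loss_pointwise.sum()
    return loss_pointwise  # reduction == "none"

torch.manual_seed(0)
log_q = F.log_softmax(torch.randn(3, 5), dim=1)  # input, in log-space
p = F.softmax(torch.rand(3, 5), dim=1)           # target, as probabilities

all_match = True
for reduction in ("mean", "batchmean", "sum", "none"):
    for log_target in (False, True):
        # Pass the target in log-space only when log_target is set.
        tgt = p.log() if log_target else p
        expected = kl_div_reference(log_q, tgt, reduction, log_target)
        actual = nn.KLDivLoss(reduction=reduction, log_target=log_target)(log_q, tgt)
        all_match = all_match and torch.allclose(actual, expected, atol=1e-6)

print(all_match)
```

The comparison uses a tolerance because the text only claims rough equivalence; the library may evaluate the same expression in a numerically different order.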

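Note

As with all the other losses in PyTorch, this function expects the first argument, input, to be the output of the model (e.g. a neural network) and the second, target, to be the observations in the dataset. This differs from the standard mathematical notation $KL(P\ ||\ Q)$, where $P$ denotes the distribution of the observations and $Q$ denotes the model.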
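The argument order can be made concrete with a two-outcome sketch. The distributions p and q below are illustrative values chosen for this example; the point is only that $KL(P\ ||\ Q)$ corresponds to passing log Q first and P second.

```python
import math
import torch
import torch.nn as nn

# Illustrative distributions (not from the reference text).
p = torch.tensor([[0.5, 0.5]])    # P: observations, passed as target
q = torch.tensor([[0.25, 0.75]])  # Q: model, passed as input in log-space

kl_loss = nn.KLDivLoss(reduction="batchmean")
kl_p_q = kl_loss(q.log(), p)      # KL(P || Q): the model comes first

# Direct evaluation of sum_i P_i * log(P_i / Q_i) for comparison.
by_hand = 0.5 * math.log(0.5 / 0.25) + 0.5 * math.log(0.5 / 0.75)

# Swapping the roles computes KL(Q || P), which is a different number.
kl_q_p = kl_loss(p.log(), q)

print(kl_p_q.item(), by_hand, kl_q_p.item())
```

Because KL divergence is not symmetric, accidentally swapping input and target changes the loss value rather than raising an error.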

Warning

reduction="mean" does not return the true KL divergence value; use reduction="batchmean", which matches the mathematical definition.
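A short sketch of the difference, assuming a 2-D input of shape (batch, classes) with illustrative sizes: "mean" divides the summed loss by every element, so it comes out smaller than "batchmean" by a factor equal to the number of classes.

```python
import warnings
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
batch_size, num_classes = 3, 5  # illustrative sizes
log_q = F.log_softmax(torch.randn(batch_size, num_classes), dim=1)
p = F.softmax(torch.rand(batch_size, num_classes), dim=1)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # "mean" may emit a warning for this loss
    loss_mean = nn.KLDivLoss(reduction="mean")(log_q, p)
loss_batchmean = nn.KLDivLoss(reduction="batchmean")(log_q, p)

# Per-sample KL divergences, computed directly and averaged over the batch.
per_sample_kl = (p * (p.log() - log_q)).sum(dim=1)

print(loss_mean.item(), loss_batchmean.item(), per_sample_kl.mean().item())
```

Only the "batchmean" value agrees with the average of the per-sample divergences.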

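Parameters

size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (str, optional) – Specifies the reduction to apply to the output: "none", "batchmean", "sum" or "mean". Default: "mean"
log_target (bool, optional) – Specifies whether target is in log-space. Default: False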

Shape:

Input: $(*)$, where $*$ means any number of dimensions.
Target: $(*)$, same shape as the input.
Output: scalar by default. If reduction is 'none', then $(*)$, same shape as the input.
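The shape rules can be checked with a brief sketch. The 3-D shape below is an arbitrary illustrative choice, and normalising over the last dimension is an assumption of this example rather than a requirement of the loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Arbitrary illustrative shape with more than two dimensions.
input = F.log_softmax(torch.randn(2, 3, 4), dim=-1)
target = F.softmax(torch.rand(2, 3, 4), dim=-1)

scalar_out = nn.KLDivLoss(reduction="batchmean")(input, target)
full_out = nn.KLDivLoss(reduction="none")(input, target)

print(scalar_out.shape)  # torch.Size([])
print(full_out.shape)    # torch.Size([2, 3, 4])
```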

Examples:

>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> kl_loss = nn.KLDivLoss(reduction="batchmean")
>>> # input should be a distribution in the log space
>>> input = F.log_softmax(torch.randn(3, 5, requires_grad=True), dim=1)
>>> # Sample a batch of distributions. Usually this would come from the dataset
>>> target = F.softmax(torch.rand(3, 5), dim=1)
>>> output = kl_loss(input, target)

>>> kl_loss = nn.KLDivLoss(reduction="batchmean", log_target=True)
>>> log_target = F.log_softmax(torch.rand(3, 5), dim=1)
>>> output = kl_loss(input, log_target)
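For use in training, a minimal sketch of backpropagating through the loss follows. The tensor named logits is a stand-in for a model's raw output and is an assumption of this example; keeping it as a named leaf tensor makes its gradient available after backward().

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5, requires_grad=True)  # stands in for model output
target = F.softmax(torch.rand(3, 5), dim=1)

kl_loss = nn.KLDivLoss(reduction="batchmean")
output = kl_loss(F.log_softmax(logits, dim=1), target)

# Gradients flow back through log_softmax to the leaf tensor.
output.backward()

print(logits.grad.shape)  # torch.Size([3, 5])
```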