Forward-Mode Automatic Differentiation (Beta) - PyTorch Tutorials Chinese Documentation

This tutorial demonstrates how to use forward-mode automatic differentiation (AD) to compute directional derivatives (or, equivalently, Jacobian-vector products).

The tutorial below uses some APIs that are only available in versions >= 1.11 (or nightly builds).

Also note that forward-mode AD is currently in beta. The API is subject to change, and operator coverage is still incomplete.

Basic Usage

Unlike reverse-mode AD, forward-mode AD computes gradients eagerly, alongside the forward pass. To compute a directional derivative with forward-mode AD, we run the forward pass as usual, except that we first associate the input with another tensor representing the direction of the directional derivative (or, equivalently, the v in a Jacobian-vector product). When an input, which we call the "primal", is associated with a "direction" tensor, which we call the "tangent", the resulting tensor object is called a "dual tensor" because of its connection to dual numbers [0].

During the forward pass, if any input tensor is a dual tensor, extra computation is performed to propagate this "sensitivity" of the function.

    import torch
    import torch.autograd.forward_ad as fwAD

    primal = torch.randn(10, 10)
    tangent = torch.randn(10, 10)

    def fn(x, y):
        return x ** 2 + y ** 2

    # All forward AD computation must be performed in the context of
    # a ``dual_level`` context. All dual tensors created in such a context
    # will have their tangents destroyed upon exit. This is to ensure that
    # if the output or intermediate results of this computation are reused
    # in a future forward AD computation, their tangents (which are associated
    # with this computation) won't be confused with tangents from the later
    # computation.
    with fwAD.dual_level():
        # To create a dual tensor we associate a tensor, which we call the
        # primal with another tensor of the same size, which we call the tangent.
        # If the layout of the tangent is different from that of the primal,
        # the values of the tangent are copied into a new tensor with the same
        # metadata as the primal. Otherwise, the tangent itself is used as-is.
        #
        # It is also important to note that the dual tensor created by
        # ``make_dual`` is a view of the primal.
        dual_input = fwAD.make_dual(primal, tangent)
        assert fwAD.unpack_dual(dual_input).tangent is tangent

        # To demonstrate the case where the copy of the tangent happens,
        # we pass in a tangent with a layout different from that of the primal
        dual_input_alt = fwAD.make_dual(primal, tangent.T)
        assert fwAD.unpack_dual(dual_input_alt).tangent is not tangent

        # Tensors that do not have an associated tangent are automatically
        # considered to have a zero-filled tangent of the same shape.
        plain_tensor = torch.randn(10, 10)
        dual_output = fn(dual_input, plain_tensor)

        # Unpacking the dual returns a ``namedtuple`` with ``primal`` and ``tangent``
        # as attributes
        jvp = fwAD.unpack_dual(dual_output).tangent

    # Outside the ``dual_level`` context the tangent has been destroyed.
    assert fwAD.unpack_dual(dual_output).tangent is None

Usage with Modules

To use nn.Module with forward AD, replace the parameters of the model with dual tensors before performing the forward pass. At the time of writing, it is not possible to create a dual tensor nn.Parameter. As a workaround, the dual tensors must be registered as non-parameter attributes of the module.

    import torch.nn as nn

    model = nn.Linear(5, 5)
    input = torch.randn(16, 5)

    params = {name: p for name, p in model.named_parameters()}
    tangents = {name: torch.rand_like(p) for name, p in params.items()}

    with fwAD.dual_level():
        for name, p in params.items():
            # Remove the registered parameter, then attach the dual tensor
            # as a plain (non-parameter) attribute under the same name.
            delattr(model, name)
            setattr(model, name, fwAD.make_dual(p, tangents[name]))

        out = model(input)
        jvp = fwAD.unpack_dual(out).tangent

Using the Functional Module API (Beta)

Another way to use nn.Module with forward AD is the functional Module API (also known as the stateless Module API).

    from torch.func import functional_call

    # We need a fresh module because the functional call requires
    # the model to have parameters registered.
    model = nn.Linear(5, 5)

    dual_params = {}
    with fwAD.dual_level():
        for name, p in params.items():
            # Using the same ``tangents`` from the above section
            dual_params[name] = fwAD.make_dual(p, tangents[name])
        out = functional_call(model, dual_params, input)
        jvp2 = fwAD.unpack_dual(out).tangent

    # Check our results
    assert torch.allclose(jvp, jvp2)

Custom autograd Function

Custom Functions also support forward-mode AD. To create a custom Function that supports forward-mode AD, register the jvp() static method. A custom Function may support both forward and backward AD, but this is not mandatory. See the documentation for more information.

    class Fn(torch.autograd.Function):
        @staticmethod
        def forward(ctx, foo):
            result = torch.exp(foo)
            # Tensors stored in ``ctx`` can be used in the subsequent forward grad
            # computation.
            ctx.result = result
            return result

        @staticmethod
        def jvp(ctx, gI):
            gO = gI * ctx.result
            # If the tensor stored in ``ctx`` will not also be used in the backward pass,
            # one can manually free it using ``del``
            del ctx.result
            return gO

    fn = Fn.apply

    primal = torch.randn(10, 10, dtype=torch.double, requires_grad=True)
    tangent = torch.randn(10, 10)

    with fwAD.dual_level():
        dual_input = fwAD.make_dual(primal, tangent)
        dual_output = fn(dual_input)
        jvp = fwAD.unpack_dual(dual_output).tangent

    # It is important to use ``autograd.gradcheck`` to verify that your
    # custom autograd Function computes the gradients correctly. By default,
    # ``gradcheck`` only checks the backward-mode (reverse-mode) AD gradients. Specify
    # ``check_forward_ad=True`` to also check forward grads. If you did not
    # implement the backward formula for your function, you can also tell ``gradcheck``
    # to skip the tests that require backward-mode AD by specifying
    # ``check_backward_ad=False``, ``check_undefined_grad=False``, and
    # ``check_batched_grad=False``.
    torch.autograd.gradcheck(Fn.apply, (primal,), check_forward_ad=True,
                             check_backward_ad=False, check_undefined_grad=False,
                             check_batched_grad=False)

Output:

    True

Functional API (Beta)

We also offer a higher-level functional API in functorch for computing Jacobian-vector products. Depending on your use case, you may find it simpler to use.

The benefit of the functional API is that you do not need to understand or use the lower-level dual tensor API, and that you can compose it with other functorch transforms (such as vmap). The downside is that it offers less control.

Note that the remainder of this tutorial requires functorch (https://github.com/pytorch/functorch) to run. See that link for installation instructions.

    import functorch as ft

    primal0 = torch.randn(10, 10)
    tangent0 = torch.randn(10, 10)
    primal1 = torch.randn(10, 10)
    tangent1 = torch.randn(10, 10)

    def fn(x, y):
        return x ** 2 + y ** 2

    # Here is a basic example to compute the JVP of the above function.
    # The ``jvp(func, primals, tangents)`` returns ``func(*primals)`` as well as the
    # computed Jacobian-vector product (JVP). Each primal must be associated with a
    # tangent of the same shape.
    primal_out, tangent_out = ft.jvp(fn, (primal0, primal1), (tangent0, tangent1))

    # ``functorch.jvp`` requires every primal to be associated with a tangent.
    # If we only want to associate certain inputs to ``fn`` with tangents,
    # then we'll need to create a new function that captures inputs without tangents:
    primal = torch.randn(10, 10)
    tangent = torch.randn(10, 10)
    y = torch.randn(10, 10)

    import functools
    new_fn = functools.partial(fn, y=y)
    primal_out, tangent_out = ft.jvp(new_fn, (primal,), (tangent,))

Using the Functional API with Modules

To use nn.Module with functorch.jvp to compute Jacobian-vector products with respect to the model parameters, we need to reformulate the nn.Module as a function that accepts both the model parameters and the inputs to the module.

    model = nn.Linear(5, 5)
    input = torch.randn(16, 5)
    tangents = tuple([torch.rand_like(p) for p in model.parameters()])

    # Given a ``torch.nn.Module``, ``ft.make_functional_with_buffers`` extracts the state
    # (``params`` and buffers) and returns a functional version of the model that
    # can be invoked like a function.
    # That is, the returned ``func`` can be invoked like
    # ``func(params, buffers, input)``.
    # ``ft.make_functional_with_buffers`` is analogous to the ``nn.Module`` stateless API
    # that you saw previously and we're working on consolidating the two.
    func, params, buffers = ft.make_functional_with_buffers(model)

    # Because ``jvp`` requires every input to be associated with a tangent, we need to
    # create a new function that, when given the parameters, produces the output
    def func_params_only(params):
        return func(params, buffers, input)

    model_output, jvp_out = ft.jvp(func_params_only, (params,), (tangents,))

[0] https://en.wikipedia.org/wiki/Dual_number
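The functorch examples above can also be written against torch.func, the in-tree successor to functorch. The following is a minimal sketch, assuming PyTorch 2.0 or later; the names model_fn, x, and expected are illustrative and not part of the tutorial. It computes the JVP of a linear layer with respect to its parameters using torch.func.jvp and functional_call, then compares the result with the closed-form derivative of a linear layer.

```python
import torch
import torch.nn as nn
from torch.func import functional_call, jvp

torch.manual_seed(0)

# Double precision keeps the comparison below free of float32 rounding noise.
model = nn.Linear(5, 5).double()
x = torch.randn(16, 5, dtype=torch.double)

# Parameters as a dict of plain tensors, plus one tangent per parameter.
params = {name: p.detach() for name, p in model.named_parameters()}
tangents = {name: torch.rand_like(p) for name, p in params.items()}


def model_fn(params):
    # Run the module with the supplied parameters instead of its own.
    return functional_call(model, params, (x,))


# ``primals`` and ``tangents`` must share the same structure (here: one dict each).
model_output, jvp_out = jvp(model_fn, (params,), (tangents,))

# For y = x @ W.T + b, the JVP with respect to (W, b) is x @ dW.T + db.
expected = x @ tangents["weight"].T + tangents["bias"]
assert torch.allclose(jvp_out, expected)
```

Passing the parameters as a dict avoids the separate make_functional_with_buffers step, since functional_call accepts the module together with a name-to-tensor mapping.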

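To make the connection to dual numbers [0] concrete, here is a small pure-Python sketch of scalar dual-number arithmetic. The Dual class and the _lift helper are hypothetical illustrations, not PyTorch APIs. The class carries a primal and a tangent through addition and multiplication, the same way a dual tensor carries its tangent through the forward pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Hypothetical scalar dual number: primal + tangent * eps, with eps**2 == 0."""
    primal: float
    tangent: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (a + a' eps)(b + b' eps) = ab + (a'b + ab') eps
        return Dual(self.primal * other.primal,
                    self.tangent * other.primal + self.primal * other.tangent)

    __rmul__ = __mul__


def _lift(value):
    """Treat plain numbers as duals with a zero tangent."""
    return value if isinstance(value, Dual) else Dual(float(value))


def fn(x, y):
    return x * x + y * y


x = Dual(3.0, 1.0)  # primal 3.0, tangent 1.0: differentiate along x
y = 4.0             # no tangent, so it is treated as zero
out = fn(x, y)

# out.primal is 25.0 and out.tangent is 6.0 (= 2 * 3.0 * 1.0)
print(out)
```

As in the tensor case, an input without a tangent contributes nothing to the derivative, and the tangent of the output is obtained in the same pass that computes its value.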