Lowering a Model as a Delegate

Audience: ML engineers who are interested in applying delegates to accelerate their programs at runtime.

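Backend delegation is the entry point through which a backend processes and executes PyTorch programs. It lets programs benefit from the performance and efficiency of specialized backends and hardware, while still giving PyTorch users an experience close to that of the PyTorch runtime. A backend delegate is usually provided either by ExecuTorch or by a vendor. To use a delegate in your program, call the standard entry point to_backend.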

Frontend Interfaces

There are three flows for delegating a program to a backend:

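1. Lower the whole module to a backend. This is useful for testing backends and the preprocessing stage.
2. Lower the whole module to a backend and compose it with another module. This is useful for reusing lowered modules exported from other flows.
3. Lower parts of a module according to a partitioner. This is useful for models that contain both lowerable and non-lowerable nodes, and it is the most streamlined flow.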

Flow 1: Lowering the whole module

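This flow starts from a traced graph module in Edge Dialect representation. To lower it, we call the following function, which returns a LoweredBackendModule (more documentation on this function can be found in the Export API reference).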

# defined in backend_api.py
def to_backend(
    backend_id: str,
    edge_program: ExportedProgram,
    compile_spec: List[CompileSpec],
) -> LoweredBackendModule:

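Inside this function, the backend's preprocess() function is called. It produces a compiled binary blob that is emitted into the flatbuffer binary. The lowered module can be captured directly, or placed back into a parent module and captured from there. Finally, the captured module is serialized into the flatbuffer model, which the runtime can load.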
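To make the sequence above concrete, here is a minimal pure-Python sketch of the idea: look up a backend by id, run its preprocess step, and wrap the resulting blob in a lowered-module object. Every name in it (ToyLoweredModule, TOY_BACKENDS, toy_preprocess, toy_to_backend) is hypothetical and exists only for illustration. It is not the ExecuTorch implementation, and the "graph" is assumed to be just a list of operator names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# All names in this sketch are hypothetical; none of them are ExecuTorch APIs.


@dataclass
class ToyLoweredModule:
    backend_id: str
    processed_bytes: bytes   # blob produced by the backend's preprocess step
    compile_specs: List[str]


# Maps a backend id to its ahead-of-time preprocess function.
TOY_BACKENDS: Dict[str, Callable[[List[str], List[str]], bytes]] = {}


def toy_preprocess(ops: List[str], compile_specs: List[str]) -> bytes:
    # A real backend would compile the graph here; this toy version
    # only encodes the operator names into bytes.
    return ";".join(ops).encode("utf-8")


TOY_BACKENDS["ToyBackend"] = toy_preprocess


def toy_to_backend(
    backend_id: str, ops: List[str], compile_specs: List[str]
) -> ToyLoweredModule:
    if backend_id not in TOY_BACKENDS:
        raise KeyError(f"backend {backend_id!r} is not registered")
    blob = TOY_BACKENDS[backend_id](ops, compile_specs)
    return ToyLoweredModule(backend_id, blob, list(compile_specs))


lowered = toy_to_backend("ToyBackend", ["sin"], [])
print(lowered.backend_id, lowered.processed_bytes)
```

The point of the sketch is the division of labor: the caller only names a backend and passes compile specs, while the backend alone decides what goes into the blob.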

An example of this flow:

from executorch.exir.backend.backend_api import to_backend
import executorch.exir as exir
import torch
from torch.export import export
from executorch.exir import to_edge

# The submodule runs in a specific backend. In this example, the `BackendWithCompilerDemo` backend
class LowerableSubModel(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return torch.sin(x)

# Convert the lowerable module to Edge IR Representation
to_be_lowered = LowerableSubModel()
example_input = (torch.ones(1), )
to_be_lowered_exir_submodule = to_edge(export(to_be_lowered, example_input))

# Import the backend implementation
from executorch.exir.backend.test.backend_with_compiler_demo import (
    BackendWithCompilerDemo,
)
lowered_module = to_backend('BackendWithCompilerDemo', to_be_lowered_exir_submodule.exported_program(), [])

We can serialize the program to the flatbuffer format by directly running:

# Save the flatbuffer to a local file
save_path = "delegate.pte"
with open(save_path, "wb") as f:
    f.write(lowered_module.buffer())

Flow 2: Lowering the whole module and composing

Alternatively, after Flow 1, we can compose this lowered module with another module:

# This submodule runs in executor runtime
class NonLowerableSubModel(torch.nn.Module):
    def __init__(self, bias):
        super().__init__()
        self.bias = bias

    def forward(self, a, b):
        return torch.add(torch.add(a, b), self.bias)


# The composite module, including the lowered part and the non-lowered part
class CompositeModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.non_lowerable = NonLowerableSubModel(torch.ones(1) * 0.3)
        self.lowerable = lowered_module

    def forward(self, x):
        a = self.lowerable(x)
        b = self.lowerable(a)
        ret = self.non_lowerable(a, b)
        return a, b, ret

composite_model = CompositeModel()
model_inputs = (torch.ones(1), )
exec_prog = to_edge(export(composite_model, model_inputs)).to_executorch()

# Save the flatbuffer to a local file
save_path = "delegate.pte"
with open(save_path, "wb") as f:
    f.write(exec_prog.buffer)

Flow 3: Partitioning

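The third flow also starts from a traced graph module in Edge Dialect representation. To lower only certain nodes in this graph module, we can use the overloaded to_backend function.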

def to_backend(
    edge_program: ExportedProgram,
    partitioner: Partitioner,
) -> ExportedProgram:

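This function takes a Partitioner, which adds a tag to every node that is meant to be lowered. The partitioner returns a partition_tags dictionary that maps each tag to a backend name and the module compile specs. The tagged nodes are then partitioned and lowered to their mapped backends using the Flow 1 process. The available helper partitioners are documented separately. The resulting lowered modules are inserted into the top-level module and serialized.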
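The tagging step described above can be sketched in plain Python. This is a simplified illustration and not the ExecuTorch Partitioner API: the graph is assumed to be a flat list of operator names, and SUPPORTED_OPS, BACKEND_ID, and toy_partition are hypothetical names. Each run of consecutive supported operators gets its own tag, and partition_tags maps every tag to a backend name and a list of compile specs.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical names for illustration only; not the ExecuTorch Partitioner API.
SUPPORTED_OPS = {"add", "mul"}
BACKEND_ID = "ToyBackend"


def toy_partition(
    ops: List[str],
) -> Tuple[List[Optional[str]], Dict[str, Tuple[str, List[str]]]]:
    """Tag each run of consecutive supported ops with its own partition tag."""
    node_tags: List[Optional[str]] = []
    partition_tags: Dict[str, Tuple[str, List[str]]] = {}
    current_tag: Optional[str] = None
    for op in ops:
        if op in SUPPORTED_OPS:
            if current_tag is None:
                # Start a new partition and record which backend handles it.
                current_tag = f"tag{len(partition_tags)}"
                partition_tags[current_tag] = (BACKEND_ID, [])
            node_tags.append(current_tag)
        else:
            # An unsupported op stays untagged and closes the current partition.
            current_tag = None
            node_tags.append(None)
    return node_tags, partition_tags


ops = ["add", "mul", "sub", "div", "mul", "add"]
node_tags, partition_tags = toy_partition(ops)
print(node_tags)
print(partition_tags)
```

Here the two add/mul runs end up in separate partitions because the untagged sub and div nodes sit between them. A real partitioner works on graph dependencies rather than list order, so its grouping may differ.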

An example of this flow:

import executorch.exir as exir
from executorch.exir.backend.backend_api import to_backend
from executorch.exir.backend.test.op_partitioner_demo import AddMulPartitionerDemo
from executorch.exir.program import (
    EdgeProgramManager,
    to_edge,
)
from torch.export import export
import torch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x, y):
        x = x + y
        x = x * y
        x = x - y
        x = x / y
        x = x * y
        x = x + y
        return x

model = Model()
model_inputs = (torch.randn(1, 3), torch.randn(1, 3))

core_aten_ep = export(model, model_inputs)
edge: EdgeProgramManager = to_edge(core_aten_ep)
edge = edge.to_backend(AddMulPartitionerDemo())
exec_prog = edge.to_executorch()

# Save the flatbuffer to a local file
save_path = "delegate.pte"
with open(save_path, "wb") as f:
    f.write(exec_prog.buffer)

Runtime

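Once the program contains delegates, the backend must be registered before the model can run on it. Depending on how the delegate is implemented, the backend is registered either as part of global variable initialization or explicitly inside the main function.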

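If the backend is registered during global variable initialization, it is registered as long as it is statically linked. Users only need to include the library as a dependency.
If the vendor provides an API for registering the backend, users need to include the library as a dependency and call the vendor-provided API in the main function to register the backend explicitly.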