准备FX

torch.ao.quantization.quantize_fx.prepare_fx(model, qconfig_mapping, example_inputs, prepare_custom_config=None, _equalization_config=None, backend_config=None)[源代码]

为模型的训练后的量化做好准备

参数
  • model (*) – 基于 torch.nn.Module 的模型

  • qconfig_mapping (*) – 用于配置模型量化方式的 QConfigMapping 对象,详情请参见 QConfigMapping

  • example_inputs (*) – 模型前向函数的示例输入,以元组形式提供位置参数(关键字参数也可作为位置参数传递)

  • prepare_custom_config (*) – 用于量化工具的自定义配置。更多详情请参见 PrepareCustomConfig

  • _equalization_config (*) – 用于配置如何对模型进行均衡化的设置

  • backend_config (*) – 指定后端中操作符的量化方式,包括如何观察操作符、支持的融合模式、如何插入量化和去量化的操作符以及支持的数据类型等。详情请参阅BackendConfig

返回值

一个带有观察器(通过 qconfig_mapping 配置)的 GraphModule,准备进行校准

返回类型

GraphModule

示例:

import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx

class Submodule(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(5, 5)
    def forward(self, x):
        x = self.linear(x)
        return x

class M(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(5, 5)
        self.sub = Submodule()

    def forward(self, x):
        x = self.linear(x)
        x = self.sub(x) + x
        return x

# initialize a floating point model
float_model = M().eval()

# define calibration function
def calibrate(model, data_loader):
    model.eval()
    with torch.no_grad():
        for image, target in data_loader:
            model(image)

# qconfig is the configuration for how we insert observers for a particular
# operator
# qconfig = get_default_qconfig("fbgemm")
# Example of customizing qconfig:
# qconfig = torch.ao.quantization.QConfig(
#    activation=MinMaxObserver.with_args(dtype=torch.qint8),
#    weight=MinMaxObserver.with_args(dtype=torch.qint8))
# `activation` and `weight` are constructors of observer module

# qconfig_mapping is a collection of quantization configurations, user can
# set the qconfig for each operator (torch op calls, functional calls, module calls)
# in the model through qconfig_mapping
# the following call will get the qconfig_mapping that works best for models
# that target "fbgemm" backend
qconfig_mapping = get_default_qconfig_mapping("fbgemm")

# We can customize qconfig_mapping in different ways.
# e.g. set the global qconfig, which means we will use the same qconfig for
# all operators in the model, this can be overwritten by other settings
# qconfig_mapping = QConfigMapping().set_global(qconfig)
# e.g. quantize the linear submodule with a specific qconfig
# qconfig_mapping = QConfigMapping().set_module_name("linear", qconfig)
# e.g. quantize all nn.Linear modules with a specific qconfig
# qconfig_mapping = QConfigMapping().set_object_type(torch.nn.Linear, qconfig)
# for a more complete list, please see the docstring for :class:`torch.ao.quantization.QConfigMapping`
# argument

# example_inputs is a tuple of inputs, that is used to infer the type of the
# outputs in the model
# currently it's not used, but please make sure model(*example_inputs) runs
example_inputs = (torch.randn(1, 3, 224, 224),)

# TODO: add backend_config after we split the backend_config for fbgemm and qnnpack
# e.g. backend_config = get_default_backend_config("fbgemm")
# `prepare_fx` inserts observers in the model based on qconfig_mapping and
# backend_config. If the configuration for an operator in qconfig_mapping
# is supported in the backend_config (meaning it's supported by the target
# hardware), we'll insert observer modules according to the qconfig_mapping
# otherwise the configuration in qconfig_mapping will be ignored
#
# Example:
# in qconfig_mapping, user sets linear module to be quantized with quint8 for
# activation and qint8 for weight:
# qconfig = torch.ao.quantization.QConfig(
#     observer=MinMaxObserver.with_args(dtype=torch.quint8),
#     weight=MinMaxObserver.with-args(dtype=torch.qint8))
# Note: current qconfig api does not support setting output observer, but
# we may extend this to support these more fine grained control in the
# future
#
# qconfig_mapping = QConfigMapping().set_object_type(torch.nn.Linear, qconfig)
# in backend config, linear module also supports in this configuration:
# weighted_int8_dtype_config = DTypeConfig(
#   input_dtype=torch.quint8,
#   output_dtype=torch.quint8,
#   weight_dtype=torch.qint8,
#   bias_type=torch.float)

# linear_pattern_config = BackendPatternConfig(torch.nn.Linear) \
#    .set_observation_type(ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT) \
#    .add_dtype_config(weighted_int8_dtype_config) \
#    ...

# backend_config = BackendConfig().set_backend_pattern_config(linear_pattern_config)
# `prepare_fx` will check that the setting requested by suer in qconfig_mapping
# is supported by the backend_config and insert observers and fake quant modules
# in the model
prepared_model = prepare_fx(float_model, qconfig_mapping, example_inputs)
# Run calibration
calibrate(prepared_model, sample_inference_data)
本页目录