Compass Optimizer (OPT for short) is part of the Zhouyi Compass Neural Network Compiler (Python package name: AIPUBuilder) toolchain. It is mainly responsible for further optimizing the intermediate representation (IR) of floating-point models converted by the Compass Unified Parser (through quantization, computational graph optimization, and other passes) into quantized or mixed-precision model IR suitable for execution on the Zhouyi NPU hardware platform. For more about the Compass IR specification and the Compass SDK, see https://aijishu.com/a/1060000000215443.

In addition to this summary, see tutorial.pdf for a detailed introduction.

Main features

The main features of OPT are as follows:

  • Supports multiple quantization schemes: per-tensor and per-channel quantization, asymmetric and symmetric quantization

  • Supports mixed-precision quantization: e.g. mixed 8-bit/16-bit quantization, quantizing some layers while keeping others in floating point, and automatically searching for quantization precision

  • Supports per-layer configuration of quantization-related parameters through a JSON configuration file

  • Supports a variety of commonly used quantization calibration schemes

  • Supports tiling of common operators

  • Adapts to the full range of Zhouyi hardware platforms, refining the quantized implementation of each operator and optimizing the computational graph structure accordingly

Quick start

Installation guide

You can compile an AIPUBuilder that includes OPT through Compass_Integration; for instructions on using AIPUBuilder, please refer to the instructions in MiniPkg.

Additionally, OPT can be run standalone. As long as the following dependencies are met, you can directly execute the AIPUBuilder/Optimizer/tools/optimizer_main.py file to run OPT.

Install dependencies

  • Python3 >= 3.8.5
  • NumPy >= 1.22.3
  • NetworkX >= 2.8
  • torch >= 1.11.1
  • torchvision >= 0.12.0

Run OPT

OPT takes a configuration file as its input; you can run OPT as follows:


export PYTHONPATH=./:$PYTHONPATH
python3 ./AIPUBuilder/Optimizer/tools/optimizer_main.py --cfg ./opt.cfg

Configuration file format

All options must be placed inside the Common section:

  • graph: the file path of the Float IR graph definition

  • bin: the file path of the Float IR weights

  • model_name: the name of this model

  • dataset: the name of the plugin class used to read the dataset for this model (you can run optimizer_main.py --plugin separately to view all implemented dataset plugin classes)

  • calibration_data: the file path of the dataset used for calibration during quantization

  • calibration_batch_size: the batch_size used for calibration

  • metric: the name of the plugin class used to compute this model's performance metric (you can run optimizer_main.py --plugin separately to view all implemented metric plugin classes); if you do not need to compute metrics, this option can be omitted

  • data: if metric is set, the file path of the validation dataset to use

  • label: if metric is set, the file path of the annotations corresponding to the validation dataset

  • metric_batch_size: if metric is set, the batch_size used for the corresponding forward passes

  • quantize_method_for_weight: the method for quantizing model weights, one of:


    • per_tensor_symmetric_restricted_range

    • per_tensor_symmetric_full_range

    • per_channel_symmetric_restricted_range

    • per_channel_symmetric_full_range


    The default is per_tensor_symmetric_restricted_range

  • quantize_method_for_activation: the method for quantizing the model's activation responses. The default is per_tensor_symmetric_full_range

  • weight_bits: the bit width of quantized model weights; the default is 8

  • bias_bits: the bit width of quantized model biases; the default is 32

  • activation_bits: the bit width of quantized model activations; the default is 8

  • lut_items_in_bits: the size of the lookup table (LUT) used when quantizing certain mathematical functions (such as sigmoid, tanh, etc.), expressed as a power of two: the table holds 2^lut_items_in_bits entries. The default is 8, i.e. 256 entries. When activation_bits changes, you need to adjust this setting accordingly to balance performance and accuracy (a larger LUT gives higher accuracy but consumes more resources)

  • output_dir: the directory to which the output IR and other files are written

More configurable options and their meanings can be viewed by running optimizer_main.py --field.
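As a concrete reference, here is a minimal opt.cfg sketch assembled from the options above (assuming the INI-style Common section mentioned earlier); the paths, model name, and plugin class names are placeholders to be replaced with your own:


[Common]
graph = ./my_model.txt
bin = ./my_model.bin
model_name = my_model
dataset = NumpyDataset
calibration_data = ./calibration_data.npy
calibration_batch_size = 16
metric = TopKMetric
data = ./validation_data.npy
label = ./validation_label.npy
metric_batch_size = 16
quantize_method_for_weight = per_channel_symmetric_restricted_range
quantize_method_for_activation = per_tensor_symmetric_full_range
weight_bits = 8
bias_bits = 32
activation_bits = 8
output_dir = ./opt_output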

Run tests

Model testing

A typical model test case is given in AIPUBuilder/Optimizer/test/model_test/squeezenet; enter this directory and execute sh ./run.sh. More complete model use cases can be found in the Zhouyi Model Zoo.

Operator testing

Testing a single operator can be regarded as a special kind of model test: after constructing a test IR (and its input data) according to the IR definition of that single operator, the model-test process can be reused. A typical operator test case is given in AIPUBuilder/Optimizer/test/op_test; enter this directory and execute sh ./run.sh.

OPT processing flow and design concept

The main processing flow of OPT is shown in the figure below:

[Figure: OPT processing flow]

  1. Read the Float IR generated by the Parser and construct the internal unified graph representation g.
  2. Perform a forward pass on g with all-zero inputs to check correctness.
  3. Perform a round of pre-quantization computational graph optimization on g.
  4. Perform forward passes over the given calibration dataset and collect statistics for each tensor in the graph.
  5. Perform a round of quantization-related computational graph optimization on g.
  6. Apply the quantization transformation to the corresponding layers according to the given configuration, generating a new graph representation qg.
  7. Perform post-quantization computational graph optimization on qg.
  8. Perform a forward pass on qg with all-zero inputs to check correctness.
  9. Output the optimized quantized or mixed-precision IR.
  10. Finally, depending on the configuration, decide whether to dump intermediate tensors and whether to compute the model's performance metrics on the given validation dataset.

OPT as a whole adopts a design that separates the scheduling framework from the concrete implementations: the implementation of each operator, as well as the model's input data provisioning (dataset) and output data processing (metric), are integrated into the overall flow as plugins scheduled by the OptMaster class. This makes it convenient for users to do secondary development, extend support for new operators, or update the implementations of existing operators.

Development guidelines

Core data structures

The following is an overview of the OPT core data structures:

[Figure: OPT core data structures (UML)]

  • Dtype defines the basic data types that may appear in the IR
  • PyTensor is the basic class for expressing tensors in OPT; its actual data storage and computation are carried out through torch.Tensor
  • PyNode represents a layer node in the model; the connections between layers are expressed by shared edges (i.e., the PyTensor instances stored in inputs and outputs)
  • PyGraph represents the entire model structure; it stores all layer nodes and maintains its topology through an internal networkx.DiGraph instance. QuantizeGraph, a subclass of PyGraph, is what the OptMaster class actually uses
  • OptMaster controls the entire execution flow of OPT and dynamically instantiates the model's input data (dataset) class and output data processing (metric) class according to the configuration file (see the sketch after this list)
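To make these relationships concrete, here is a small illustrative sketch of walking a graph and printing a summary. It is a sketch only: it assumes the layer nodes of a PyGraph instance g are reachable through g.nodes (an assumed attribute name), and otherwise uses only members described in this document and in the operator plugin example below:


from AIPUBuilder.Optimizer.framework import *
from AIPUBuilder.Optimizer.logger import *

def dump_graph_summary(g):
    #g: a PyGraph (or QuantizeGraph) instance; g.nodes is assumed to hold its PyNode layer nodes
    for node in g.nodes:
        OPT_INFO('layer_type=%s, layer_name=%s' % (str(node.type), node.name))
        #inputs and outputs hold the PyTensor edges shared between layers
        for t in node.inputs + node.outputs:
            #the actual data of a PyTensor lives in its betensor, a torch.Tensor
            OPT_INFO('    tensor shape: %s' % str(tuple(t.betensor.shape)))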

Developing various plugins

The most common OPT development paradigm is to add Operator, Dataset, and Metric plugins to support your own models, so the development of these plugins is introduced in detail here. The extension or modification of other OPT functionality (quantization methods, calibration algorithms, graph optimization algorithms, etc.) is omitted.

Naming conventions

It is recommended to use the following prefixes to distinguish them:

  • aipubt_opt_op_ for the optimizer operator plugin.
  • aipubt_opt_dataset_ for the dataset plugin of the optimizer.
  • aipubt_opt_metric_ for the metric plugin of the optimizer.

Search path

The search order for plugin files is as follows:

  1. The path specified by the environment variable AIPUPLUGIN_PATH, which is set in the usual way: export AIPUPLUGIN_PATH=/home/user/aipubuilder_plugins/:$AIPUPLUGIN_PATH
  2. The plugin directory under the current path, i.e. ./plugin/.

Operator plugin writing

An operator plugin needs to implement and register two interfaces:

  • Use op_register(OpType, version) to register a forward function forward_function_name(self, *args).
  • Use quant_register(OpType, version) to register a quantize function quantize_function_name(self, *args).

Here, OpType is the built-in enumeration class of operator types. If you want to replace the implementation of a built-in operator, you can directly pass OpType.layer_type_name when registering and set the version to 1.0 or higher (the version number of the built-in operators is 1.0). If you want to implement a new operator, call register_optype('new_layer_type_name') globally before registering; once this function has registered the name with OpType, OpType.new_layer_type_name can be used normally. version indicates the version number of the plugin: when there are multiple plugins of the same type with the same name, the one with the higher version number is the one actually called (note that the forward function and the quantize function are registered separately, so when you want to replace the implementation of an operator as a whole, you must ensure that both your forward function and your quantize function carry the higher version number). self points to a PyNode instance (corresponding to a certain layer in the IR); the use of its important members is illustrated in the following code example:


from AIPUBuilder.Optimizer.framework import *
from AIPUBuilder.Optimizer.utils import *
import torch

register_optype('DummyOP')

@op_register(OpType.DummyOP, '1024')
def dummy_forward(self, *args):
    #self.inputs and self.outputs are lists of the PyTensors of this layer
    #PyTensor.betensor is the actual backend tensor variable and is an instance of torch.Tensor
    inp = self.inputs[0]
    out = self.outputs[0]
    #self.constants is an ordered-dictionary for storing constant tensors, such as weights and biases
    #suggest to use self.get_constant to safely visit it
    w = self.constants['weights'] if 'weights' in self.constants else 0

    #'OPT_DEBUG, OPT_INFO, OPT_WARN, OPT_ERROR, OPT_FATAL' are basic log APIs, and only OPT_FATAL will abort execution
    OPT_INFO('layer_type=%s, layer_name=%s' % (str(self.type), self.name))

    if self.name in ['name_of_layer_x', 'name_of_layer_y'] :
        print('you can set a breakpoint here for debug usage')
    #self.attrs is an ordered-dictionary for storing intermediate parameters, which are not written to the IR
    #suggest to use self.get_attrs to safely get an attribute
    if self.get_attrs('layer_id') in ['2', '4', '8'] :
        print('you can also set breakpoint here in this way for debug usage')

    #self.current_batch_size indicates the current batch_size the dataloader offers
    dummy_var = inp.betensor + self.current_batch_size
    #self.quantized is a flag maintained by the optimizer framework that indicates whether this is a quant_forward or a normal_forward
    if self.quantized :
        #self.params is an ordered-dictionary for storing the necessary parameters
        #suggest to use self.get_param to safely get a parameter
        if self.get_param('whether_plus_one') :
            dummy_var += 1
    else :
        if self.get_param('whether_minus_one') :
            dummy_var -= 1
    out.betensor = inp.betensor if True else dummy_var

    #self.placeholders is a list where you can store temporary PyTensors for whatever you like
    if len(self.placeholders) < 1 :
        #you can use PyTensor(tensor_name) to construct an empty PyTensor,
        #or use PyTensor(tensor_name, numpy_array) to construct and initialize a PyTensor
        #dtype2nptype is a utility function in AIPUBuilder.Optimizer.utils, where you can access many other utility functions
        #Dtype defines the data types the NN compiler supports
        ph0 = PyTensor(self.name+"/inner_temp_vars", (inp.betensor+1).cpu().numpy().astype(dtype2nptype(Dtype.FP32)))
        self.placeholders.append(ph0)
    else :
        #if ph0 has already been put into placeholders, we only need to update its value each time dummy_forward is called
        self.placeholders[0].betensor = inp.betensor + 1

@quant_register(OpType.DummyOP, '1024')
def dummy_quantize(self, *args):
    inp = self.inputs[0]
    out = self.outputs[0]
    #PyTensor.scale is the linear quantization scale
    out.scale = inp.scale
    #PyTensor.zerop is the linear quantization zero point
    out.zerop = inp.zerop
    #PyTensor.qbits is the quantization bit width
    out.qbits = inp.qbits
    #PyTensor.dtype is the quantization Dtype information
    out.dtype = inp.dtype
    #PyTensor.qinvariant indicates whether the tensor is quantization invariant (like index values), and if it's True, the scale = 1.0, zerop=0
    out.qinvariant = inp.qinvariant
    #PyTensor.qmin and PyTensor.qmax are the clamp boundaries when tensor is quantized
    out.qmin = inp.qmin
    out.qmax = inp.qmax

    ph0 = self.placeholders[0]
    ph0.qinvariant = False
    #q_bits_weight, q_bits_bias, q_bits_activation in self.attrs are used to carry the quantization bit-width information from the per-layer opt_config file
    ph0.qbits = self.get_attrs('q_bits_activation')
    #q_mode_weight, q_mode_bias, q_mode_activation in self.attrs are used to carry the quantization mode (per-tensor or per-channel, symmetric or asymmetric) information from the per-layer opt_config file
    q_mode_activation = self.get_attrs('q_mode_activation')
    #get_linear_quant_params_from_tensor is a utility function in AIPUBuilder.Optimizer.utils and you can access many other utility functions here
    ph0.scale, ph0.zerop, ph0.qmin, ph0.qmax, ph0.dtype = get_linear_quant_params_from_tensor(ph0, q_mode_activation, ph0.qbits, is_signed = True)

    #you can set simple parameters in self.params, which will be written to the IR when the model is serialized.
    self.params['whether_plus_one'] = True
    self.params['whether_minus_one'] = False
    #you can set complicated parameters like lookup tables in self.constants, which will also be written to the IR when the model is serialized
    self.constants['lut'] = PyTensor(self.name+"/lut", (torch.zeros(256)).cpu().numpy().astype(dtype2nptype(Dtype.UINT16)))

It should be added that after the optimizer initially reads the Float IR, it performs a normal forward pass. This guarantees that before the quantize function of each operator is called, its forward function has been called at least once (there is no guarantee of the reverse, i.e. that the quantize function has been called before the forward function). Therefore, values such as placeholders or attrs correctly set in the forward function can be read reliably in the quantize function, but not necessarily vice versa. For more detailed and practical examples, please refer to the built-in operators in the AIPUBuilder/Optimizer/ops directory.

Dataset plugin writing

The Dataset plugin inherits directly from the torch.utils.data.Dataset class; three public interfaces need to be provided: __init__, __len__, and __getitem__. See the following NumpyDataset class for a concrete example:


from AIPUBuilder.Optimizer.framework import *
from AIPUBuilder.Optimizer.logger import *
from torch.utils.data import Dataset
import numpy as np

@register_plugin(PluginType.Dataset, '1.0')
class NumpyDataset(Dataset):
    #when used as calibration dataset, label_file can be omitted.
    def __init__(self, data_file, label_file=None):
        self.data = None
        self.label = None
        try:
            self.data = np.load(data_file, mmap_mode='c')
        except Exception as e:
            OPT_FATAL('the data of NumpyDataset plugin should be Numpy.ndarray and allow_pickle=False.')
        if label_file is not None:
            try:
                self.label = np.load(label_file, mmap_mode='c')
            except ValueError:
                self.label = np.load(label_file, allow_pickle=True)
    def __len__(self):
        return len(self.data)
    def __getitem__(self, idx):
        #Assume that all preprocessing has been done before saving to the npy file.
        #If the graph has single input tensor,
        #the data part sample[0] will be passed to the input node as is,
        #if the graph has multiple input tensors,
        #the data part sample[0][i] should be consistent with input_tensors[i] in IR.
        #If the graph has multiple output tensors,
        #the label part sample[1][i] should be consistent with output_tensors[i] in IR.
        sample = [[self.data[idx]], float("-inf")]
        if self.label is not None:
            sample[1] = self.label[idx]
        return sample
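To drive OPT with this plugin, the cfg file described earlier would reference it by class name; a hypothetical pairing (all paths are placeholders):


dataset = NumpyDataset
calibration_data = ./calibration_data.npy
data = ./validation_data.npy
label = ./validation_label.npy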

The core of the Dataset plugin is to inform OPT of the size of the dataset through the __len__ interface and, through the __getitem__ interface, to return sample[0] as the input for each forward pass of the model (only the user fully knows the input data specification the model requires), while sample[1], the ground-truth label information, is passed through by OPT to the corresponding Metric plugin (only the user fully knows the meaning of the model's outputs and labels). What needs special explanation is:

  • The first parameter passed in during registration indicates that the plugin category is PluginType.Dataset. The second parameter is the version number: when there are multiple plugins of the same type with the same name, the one with the higher version number is actually called.
  • When the Dataset plugin is instantiated, two parameters are passed in (specify their values in the relevant fields of the cfg file): data_file and label_file. Each of these can be either the path of a file that actually stores the data or labels, or the path of a plain text file that indirectly records a series of other file paths (the parsing logic is entirely up to the plugin writer).
  • All data preprocessing can be done in advance and stored on disk (this is the recommended approach), so that at run time only deserialized reading is performed, speeding up the forward passes; alternatively, preprocessing can be done inside the __getitem__ function (for example, if the stored data is in NHWC format but the model requires NCHW, permute accordingly after reading and then return; if the stored label indices start from 0 but the model requires them to start from 1, apply the corresponding offset after reading and then return; if the stored data has not been normalized but the model requires normalized input, normalize after reading and then return).
  • When a model has multiple inputs or outputs, the sample returned by __getitem__ must follow the input and output order defined in the Float IR: the data list in sample[0] must be in the same order as input_tensors in the IR, and the label list in sample[1] must be consistent with the order of output_tensors in the IR (if the metric plugin being used has other requirements for the label data, follow the metric plugin's requirements instead). An illustrative sketch follows this list.
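Below is a hypothetical __getitem__ for a model with two input tensors, where the stored images are NHWC but the IR expects NCHW, and the stored labels start from 0 while the model expects them to start from 1; all member names other than __getitem__ are made up for illustration:


import numpy as np

def __getitem__(self, idx):
    #per-sample layout HWC -> CHW, matching an NCHW input_tensors[0]
    image = np.transpose(self.images[idx], (2, 0, 1))
    #second model input, consistent with input_tensors[1], passed through as stored
    extra = self.extras[idx]
    #sample[0] follows the order of input_tensors in the Float IR
    #stored labels starting from 0 are offset to start from 1
    sample = [[image, extra], self.labels[idx] + 1]
    return sample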

Metric plugin writing

The Metric plugin needs to inherit from the OptBaseMetric class, be registered with @register_plugin(PluginType.Metric, version) (version indicates the version number; when there are plugins of the same type with the same name, the one with the higher version number is called first), and implement the __init__, __call__, reset, compute, and report interfaces. What each interface means and how to write it is explained with the following code example:


from AIPUBuilder.Optimizer.framework import *
from AIPUBuilder.Optimizer.logger import *
import torch

@register_plugin(PluginType.Metric, '1.0')
class TopKMetric(OptBaseMetric):
    #you can pass any string parameters from cfg file, and parse it to what you really want
    #e.g. you can set 'metric = TopKMetric,TopKMetric(5),TopKMetric(10)' in cfg file to enable
    #calculate top1, top5 and top10 accuracy together
    def __init__(self, K='1'):
        self.correct = 0
        self.total = 0
        self.K = int(K)
    #will be called after every batch iteration; pred is the model's output_tensors (in the same order as in the IR),
    #target is the sample[1] generated by the dataset plugin,
    #during quantize_forward, pred will be dequantized before the metric is called
    def __call__(self, pred, target):
        _, pt = torch.topk(pred[0].reshape([pred[0].shape[0], -1]), self.K, dim=-1)    #NHWC
        for i in range(target.numel()):
            if target[i] in pt[i]:
                self.correct += 1
        self.total += target.numel()
    #will be called before every epoch iteration to reset the initial state
    def reset(self):
        self.correct = 0
        self.total = 0
    #will be called after every epoch iteration to get the final metric score
    def compute(self):
        try:
            acc = float(self.correct) / float(self.total)
            return acc
        except ZeroDivisionError:
            OPT_ERROR('zeroDivisionError: Topk acc total label = 0')
            return float("-inf")
    #will be called when outputting the metric report as a string
    def report(self):
        return "top-%d accuracy is %f" % (self.K, self.compute())

What needs special explanation is:

  • The Metric plugin supports passing construction parameters from the cfg file, but they are limited to the string type; when writing the plugin, you need to convert the string parameters to the target types yourself.
  • The order of the model outputs passed to the Metric plugin is consistent with the order of output_tensors in the IR, and the target passed in is sample[1] from the Dataset plugin. The correspondence between pred (automatically dequantized in advance during quant_forward) and target, as well as the metric computation logic, are entirely controlled by the user.

Code style

OPT uses autopep8 to check code style. AIPUBuilder/Optimizer/scripts contains an installation script that enables the automatic checking mechanism; please make sure autopep8 is installed and can be invoked normally in the current development environment.

Local testing

Before submitting code, it is strongly recommended to run some local tests. The most straightforward and efficient test cases can be obtained by sampling from the Zhouyi Model Zoo and modifying them for reuse. If you modify an existing function or add a new one, remember to enable the corresponding functionality in the configuration files of the corresponding test cases. If you modify or add an operator, remember to cover that operator in the sampled or constructed test cases.
