Compass Optimizer (OPT for short) is part of the Zhouyi Compass Neural Network Compiler (Python package: AIPUBuilder) tool chain. It is mainly responsible for further optimizing (through quantization, computational graph optimization, and so on) the floating-point model intermediate representation (IR) produced by the Compass Unified Parser into a quantized or mixed-precision model IR suitable for execution on the Zhouyi NPU hardware platform. For more about the Compass IR specification and the Compass SDK, see https://aijishu.com/a/1060000000215443.
In addition to this summary, see tutorial.pdf for a detailed introduction.
## Main features

The main features of OPT are as follows:

- Supports multiple quantization methods: per-tensor quantization, per-channel quantization, asymmetric quantization, and symmetric quantization
- Supports mixed-precision quantization: for example, mixed 8-bit/16-bit quantization, quantizing some layers while running others in floating point, and automatically searching for quantization precision
- Supports per-layer configuration of quantization-related parameters through a JSON configuration file
- Supports a variety of commonly used quantization calibration schemes
- Supports tiling of common operators
- Adapts to the full series of Zhouyi hardware platforms, refining the quantized implementation of each operator and optimizing the computational graph structure
## Quick start

### Installation guide

You can build an AIPUBuilder package that includes OPT through Compass_Integration; refer to the instructions in MiniPkg for how to use AIPUBuilder.
In addition, OPT can also run independently: as long as the following dependencies are met, you can directly execute the AIPUBuilder/Optimizer/tools/optimizer_main.py file to run OPT.
### Install dependencies
- Python3 >= 3.8.5
- NumPy >= 1.22.3
- NetworkX >= 2.8
- torch >= 1.11.1
- torchvision >= 0.12.0
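For example (one possible approach; adjust to your environment), the Python dependencies above can be installed with pip:

```bash
pip3 install "numpy>=1.22.3" "networkx>=2.8" "torch>=1.11.1" "torchvision>=0.12.0"
```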
### Run OPT

OPT is driven by a configuration file; you can run OPT as follows:

```bash
export PYTHONPATH=./:$PYTHONPATH
python3 ./AIPUBuilder/Optimizer/tools/optimizer_main.py --cfg ./opt.cfg
```
### Configuration file format

All options must be in the `Common` section:

- `graph`: the path of the input Float IR definition file
- `bin`: the path of the input Float IR weight file
- `model_name`: the name of this model
- `dataset`: the name of the plugin class used to read the dataset for this model (you can run `optimizer_main.py --plugin` separately to list all implemented dataset plugin classes)
- `calibration_data`: the path of the dataset file used for calibration
- `calibration_batch_size`: the `batch_size` used for calibration
- `metric`: the name of the plugin class used to compute this model's performance metric (you can run `optimizer_main.py --plugin` separately to list all implemented metric plugin classes); if you do not need to compute metrics, this option can be omitted
- `data`: if `metric` is set, the path of the validation dataset file to use
- `label`: if `metric` is set, the path of the label file corresponding to the validation dataset
- `metric_batch_size`: if `metric` is set, the `batch_size` of the corresponding forward computation
- `quantize_method_for_weight`: the method for quantizing model weights, including:
  - `per_tensor_symmetric_restricted_range`
  - `per_tensor_symmetric_full_range`
  - `per_channel_symmetric_restricted_range`
  - `per_channel_symmetric_full_range`

  The default is `per_tensor_symmetric_restricted_range`.
- `quantize_method_for_activation`: the method for quantizing model activations. The default is `per_tensor_symmetric_full_range`.
- `weight_bits`: the bit width for quantizing the model's weight parameters; the default is 8
- `bias_bits`: the bit width for quantizing the model's bias parameters; the default is 32
- `activation_bits`: the bit width for quantizing the model's activations; the default is 8
- `lut_items_in_bits`: the size of the lookup table (2^`lut_items_in_bits` entries) used in the quantized computation of some mathematical functions (such as sigmoid, tanh, etc.); the default is 8, i.e. 256 entries. When `activation_bits` changes, you need to adjust this setting accordingly to balance performance and accuracy (a larger LUT gives higher accuracy but consumes more resources)
- `output_dir`: the directory for outputting the IR and other files
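Putting these options together, a minimal illustrative `opt.cfg` might look like this (all paths are hypothetical; `NumpyDataset` and `TopKMetric` refer to the example plugins shown later in this document):

```ini
[Common]
graph = ./my_model_float_ir.txt
bin = ./my_model_float_ir.bin
model_name = my_model
dataset = NumpyDataset
calibration_data = ./calibration_data.npy
calibration_batch_size = 16
metric = TopKMetric
data = ./validation_data.npy
label = ./validation_label.npy
metric_batch_size = 16
weight_bits = 8
bias_bits = 32
activation_bits = 8
output_dir = ./opt_output
```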
For more configurable options and their meanings, run `optimizer_main.py --field`.
## Run tests

### Model test

A typical model test case is provided in AIPUBuilder/Optimizer/test/model_test/squeezenet; enter this directory and execute `sh ./run.sh`. More complete model use cases can be found in Zhouyi Model Zoo.

### Operator test

Testing a single operator can be regarded as a special model test: after constructing a test IR (and its input data) according to the IR definition of that operator, the model test process can be reused. A typical operator test case is provided in AIPUBuilder/Optimizer/test/op_test; enter this directory and execute `sh ./run.sh`.
## OPT processing flow and design philosophy

The main processing flow of OPT is as follows:

- Read the Float IR generated by the Parser and construct an internal unified graph representation g.
- Perform a forward pass on g with all-zero inputs to check correctness.
- Perform a round of pre-quantization computational graph optimization on g.
- Perform forward passes on the given calibration dataset and collect statistics for each tensor in the graph.
- Perform a round of quantization-related computational graph optimization on g.
- Apply quantization transformations to the corresponding layers according to the given configuration, generating a new graph representation qg.
- Perform post-quantization computational graph optimization on qg.
- Perform a forward pass on qg with all-zero inputs to check correctness.
- Output the optimized quantized or mixed-precision IR.
- Finally, according to the configuration, decide whether to dump intermediate tensors and whether to compute the model performance metric on the given validation dataset.
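The flow above can be condensed into the following pseudocode; all function names here are hypothetical and only mirror the listed steps, not OPT's actual internals:

```python
# Hypothetical pseudocode mirroring the steps above; not actual OPT internals.
def optimize(cfg):
    g = read_float_ir(cfg.graph, cfg.bin)               # build unified graph representation g
    forward_with_zeros(g)                               # all-zero forward to check correctness
    optimize_graph_pre_quantization(g)                  # pre-quantization graph optimization
    collect_tensor_statistics(g, cfg.calibration_data)  # calibration forward passes
    optimize_graph_for_quantization(g)                  # quantization-related graph optimization
    qg = quantize_layers(g, cfg)                        # per-layer quantization -> new graph qg
    optimize_graph_post_quantization(qg)                # post-quantization graph optimization
    forward_with_zeros(qg)                              # all-zero forward to check correctness
    serialize_ir(qg, cfg.output_dir)                    # emit quantized / mixed-precision IR
```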
OPT adopts a design that separates the scheduling framework from concrete implementations: the implementation of each operator, as well as the model's input data provisioning (dataset) and output data processing (metric), are integrated into the overall flow as plugins scheduled by the OptMaster class. This makes it convenient for users to do further development, extend support for new operators, or update the implementation of existing operators.
## Development guidelines

### Core data structures

The following is an overview of OPT's core data structures:

- `Dtype`: defines the basic data types that may appear in the IR.
- `PyTensor`: the basic class for representing tensors in OPT; its actual data storage and computation are carried out through torch.Tensor.
- `PyNode`: represents a layer node in the model; connections between layers are expressed through shared edges (i.e. the PyTensor instances stored in `inputs` and `outputs`).
- `PyGraph`: represents the whole model structure and stores all layer nodes; its topology is maintained through an internal networkx.DiGraph instance.
- `QuantizeGraph`: a subclass of `PyGraph`, which is what `OptMaster` actually uses.
- `OptMaster`: controls the entire execution flow of OPT and dynamically instantiates the model's input data (dataset) and output data processing (metric) classes according to the configuration file.
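As a quick orientation, here is a minimal sketch based on the constructors and members shown in the operator plugin example later in this document (the exact import location of `PyTensor` is an assumption):

```python
import numpy as np
from AIPUBuilder.Optimizer.framework import PyTensor  # assumed import path

# construct and initialize a PyTensor from a numpy array
t = PyTensor('demo/activation', np.ones((2, 3), dtype=np.float32))
# the actual storage and computation backend is a torch.Tensor
print(type(t.betensor))
# quantization metadata (scale, zerop, qbits, dtype, qmin, qmax, ...) also
# lives on the PyTensor, as the operator plugin example below shows
```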
### Developing plugins

The most common OPT development paradigm is to add Operator, Dataset, and Metric plugins to support your own models, so plugin development is introduced in detail here. Extending or modifying other OPT functionality (quantization methods, calibration algorithms, graph optimization algorithms, etc.) is not covered.
#### Naming conventions

It is recommended to use the following prefixes to distinguish plugin files:

- aipubt_opt_op_ for the optimizer operator plugin.
- aipubt_opt_dataset_ for the dataset plugin of the optimizer.
- aipubt_opt_metric_ for the metric plugin of the optimizer.
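For example, a plugin directory following these conventions might look like this (file names are hypothetical):

```
plugin/
├── aipubt_opt_op_customop.py
├── aipubt_opt_dataset_mydataset.py
└── aipubt_opt_metric_mymetric.py
```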
#### Search path

Plugin files are searched in the following order:

- The paths specified by the environment variable AIPUPLUGIN_PATH, which is set as follows:
  `export AIPUPLUGIN_PATH=/home/user/aipubuilder_plugins/:$AIPUPLUGIN_PATH`
- The plugin directory under the current path, i.e. `./plugin/`.
#### Operator plugin writing

An operator plugin needs to implement and register two interfaces:

- Use `op_register(OpType, version)` to register a forward function `forward_function_name(self, *args)`.
- Use `quant_register(OpType, version)` to register a quantize function `quantize_function_name(self, *args)`.

Here OpType is the built-in operator type enumeration class. If you want to replace the implementation of a built-in operator, pass the corresponding OpType.layer_type_name directly when registering and set the version to 1.0 or higher (built-in operators have version 1.0). If you want to implement a new operator, call `register_optype('new_layer_type_name')` globally before registering; once this function has registered the name to OpType, OpType.new_layer_type_name can be used normally. version is the plugin's version number: when multiple plugins of the same type share the same name, the one with the higher version number is actually called (note that the forward function and the quantize function are registered separately, so when you want to replace an operator's implementation as a whole, make sure both your forward function and your quantize function have the higher version numbers). self points to a PyNode instance (corresponding to a layer in the IR); the use of its important members is illustrated in the following code example:
```python
from AIPUBuilder.Optimizer.framework import *
from AIPUBuilder.Optimizer.utils import *
import torch

register_optype('DummyOP')

@op_register(OpType.DummyOP, '1024')
def dummy_forward(self, *args):
    #self.inputs and self.outputs are lists of the PyTensors of this layer
    #PyTensor.betensor is the real backend tensor variable and is an instance of torch.Tensor
    inp = self.inputs[0]
    out = self.outputs[0]
    #self.constants is an ordered dictionary for storing constant tensors, such as weights and biases
    #suggest using self.get_constant to safely access it
    w = self.constants['weights'] if 'weights' in self.constants else 0
    #OPT_DEBUG, OPT_INFO, OPT_WARN, OPT_ERROR, OPT_FATAL are the basic log APIs; only OPT_FATAL aborts execution
    OPT_INFO('layer_type=%s, layer_name=%s' % (str(self.type), self.name))
    if self.name in ['name_of_layer_x', 'name_of_layer_y']:
        print('you can set a breakpoint here for debug usage')
    #self.attrs is an ordered dictionary for storing intermediate parameters, which are not written to the IR
    #suggest using self.get_attrs to safely get an attribute
    if self.get_attrs('layer_id') in ['2', '4', '8']:
        print('you can also set a breakpoint here in this way for debug usage')
    #self.current_batch_size indicates the current batch_size the dataloader offers
    dummy_var = inp.betensor + self.current_batch_size
    #self.quantized is a flag maintained by the optimizer framework that indicates whether this is a quant_forward or a normal_forward
    if self.quantized:
        #self.params is an ordered dictionary for storing the necessary parameters
        #suggest using self.get_param to safely get a parameter
        if self.get_param('whether_plus_one'):
            dummy_var += 1
    else:
        if self.get_param('whether_minus_one'):
            dummy_var -= 1
    out.betensor = inp.betensor if True else dummy_var
    #self.placeholders is a list where you can store temporary PyTensors for whatever you like
    if len(self.placeholders) < 1:
        #you can use PyTensor(tensor_name) to construct an empty PyTensor,
        #or use PyTensor(tensor_name, numpy_array) to construct and initialize a PyTensor
        #dtype2nptype is a utility function in AIPUBuilder.Optimizer.utils, where many other utility functions live
        #Dtype defines the data types the NN compiler supports
        ph0 = PyTensor(self.name+"/inner_temp_vars", (inp.betensor+1).cpu().numpy().astype(dtype2nptype(Dtype.FP32)))
        self.placeholders.append(ph0)
    else:
        #if ph0 has already been put into placeholders, we only need to update its value each time dummy_forward is called
        self.placeholders[0].betensor = inp.betensor + 1

@quant_register(OpType.DummyOP, '1024')
def dummy_quantize(self, *args):
    inp = self.inputs[0]
    out = self.outputs[0]
    #PyTensor.scale is the linear quantization scale
    out.scale = inp.scale
    #PyTensor.zerop is the linear quantization zero point
    out.zerop = inp.zerop
    #PyTensor.qbits is the quantization bit width
    out.qbits = inp.qbits
    #PyTensor.dtype is the quantization Dtype information
    out.dtype = inp.dtype
    #PyTensor.qinvariant indicates whether the tensor is quantization invariant (like index values); if True, scale = 1.0 and zerop = 0
    out.qinvariant = inp.qinvariant
    #PyTensor.qmin and PyTensor.qmax are the clamp boundaries when the tensor is quantized
    out.qmin = inp.qmin
    out.qmax = inp.qmax
    ph0 = self.placeholders[0]
    ph0.qinvariant = False
    #q_bits_weight, q_bits_bias, q_bits_activation in self.attrs carry the quantization bit-width information from the per-layer opt_config file
    ph0.qbits = self.get_attrs('q_bits_activation')
    #q_mode_weight, q_mode_bias, q_mode_activation in self.attrs carry the quantization mode
    #(per-tensor or per-channel, symmetric or asymmetric) information from the per-layer opt_config file
    q_mode_activation = self.get_attrs('q_mode_activation')
    #get_linear_quant_params_from_tensor is a utility function in AIPUBuilder.Optimizer.utils, where many other utility functions live
    ph0.scale, ph0.zerop, ph0.qmin, ph0.qmax, ph0.dtype = get_linear_quant_params_from_tensor(ph0, q_mode_activation, ph0.qbits, is_signed=True)
    #you can set simple parameters in self.params, which will be written to the IR when the model is serialized
    self.params['whether_plus_one'] = True
    self.params['whether_minus_one'] = False
    #you can set complicated parameters like lookup tables in self.constants, which will also be written to the IR when the model is serialized
    self.constants['lut'] = PyTensor(self.name+"/lut", (torch.zeros(256)).cpu().numpy().astype(dtype2nptype(Dtype.UINT16)))
```
It should be added that after OPT initially reads the Float IR, it performs a normal forward pass, which guarantees that before the quantize function of each operator is called, its forward function has been called at least once (there is no guarantee that the quantize function has been called before the forward function is called). Therefore, placeholders or attrs values correctly set in the forward function can safely be read in the quantize function, but not necessarily vice versa. For more detailed and practical examples, please refer to the built-in operators in the AIPUBuilder/Optimizer/ops directory.
#### Dataset plugin writing

A Dataset plugin inherits directly from the torch.utils.data.Dataset class and needs to provide three public interfaces: `__init__`, `__len__`, and `__getitem__`. See the following NumpyDataset class for a concrete example:
```python
from AIPUBuilder.Optimizer.framework import *
from AIPUBuilder.Optimizer.logger import *
from torch.utils.data import Dataset
import numpy as np

@register_plugin(PluginType.Dataset, '1.0')
class NumpyDataset(Dataset):
    #when used as calibration dataset, label_file can be omitted.
    def __init__(self, data_file, label_file=None):
        self.data = None
        self.label = None
        try:
            self.data = np.load(data_file, mmap_mode='c')
        except Exception as e:
            OPT_FATAL('the data of NumpyDataset plugin should be Numpy.ndarray and allow_pickle=False.')
        if label_file is not None:
            try:
                self.label = np.load(label_file, mmap_mode='c')
            except ValueError:
                self.label = np.load(label_file, allow_pickle=True)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        #Assume that all preprocesses have been done before save to npy file.
        #If the graph has single input tensor,
        #the data part sample[0] will be passed to the input node as is,
        #if the graph has multiple input tensors,
        #the data part sample[0][i] should be consistent with input_tensors[i] in IR.
        #If the graph has multiple output tensors,
        #the label part sample[1][i] should be consistent with output_tensors[i] in IR.
        sample = [[self.data[idx]], float("-inf")]
        if self.label is not None:
            sample[1] = self.label[idx]
        return sample
```
The core of a Dataset plugin is to inform OPT of the size of the dataset through the `__len__` interface and, through the `sample[0]` returned by the `__getitem__` interface, to supply the input of each forward pass of the model (only the user fully knows the input data specification the model requires), while `sample[1]` is passed through by OPT as the ground-truth label information to the corresponding Metric plugin (only the user fully knows the meaning of the model's outputs and labels). Some points deserve special explanation:

- The first parameter passed at registration indicates the plugin category, `PluginType.Dataset`; the second parameter is the version number: when multiple plugins of the same type share the same name, the one with the higher version number is actually called.
- When the Dataset plugin is instantiated, two parameters are passed in (specify their values in the relevant fields of the cfg file): `data_file` and `label_file`. Each of these can either be the path of a file that actually stores the data or labels, or the path of a plain-text file that indirectly records a series of other file paths (the parsing process is entirely up to the plugin writer).
- All data preprocessing can be done in advance and stored on disk (recommended), so that only deserialized reading is performed at run time to speed up the forward computation; alternatively, it can be performed inside the `__getitem__` function (for example, if the stored data is in NHWC format but the model requires NCHW, permute after reading and then return; if the stored label indices start from 0 but the model requires them to start from 1, apply the offset after reading and then return; if the stored data has not been normalized but the model requires normalized input, normalize after reading and then return).
- When a model has multiple inputs or outputs, the `sample` returned by `__getitem__` must follow the order of inputs and outputs defined in the float IR: the data list in `sample[0]` must match the order of `input_tensors` in the IR, and the label list in `sample[1]` must match the order of `output_tensors` in the IR (if the metric plugin in use has other requirements for the label data, follow the metric plugin's requirements). A short sketch follows this list.
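For example, here is a hedged sketch of `__getitem__` inside such a Dataset plugin for a hypothetical two-input model whose images are stored as NHWC while the IR expects NCHW (`self.images`, `self.extras`, and `self.labels` are illustrative member names):

```python
import numpy as np

def __getitem__(self, idx):
    # stored as NHWC but the model's first input expects NCHW: permute before returning
    img = np.transpose(self.images[idx], (2, 0, 1))
    extra = self.extras[idx]
    # sample[0] follows input_tensors order; sample[1] follows output_tensors order
    return [[img, extra], self.labels[idx]]
```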
#### Metric plugin writing

A Metric plugin needs to inherit from the OptBaseMetric class, be registered with `@register_plugin(PluginType.Metric, version)` (version is the version number; when plugins of the same type share the same name, the one with the higher version number is called first), and implement the `__init__`, `__call__`, `reset`, `compute`, and `report` interfaces. The meaning of each interface and how to write it are explained in the following code example:
```python
from AIPUBuilder.Optimizer.framework import *
from AIPUBuilder.Optimizer.logger import *
import torch

@register_plugin(PluginType.Metric, '1.0')
class TopKMetric(OptBaseMetric):
    #you can pass any string parameters from cfg file, and parse it to what you really want
    #e.g. you can set 'metric = TopKMetric,TopKMetric(5),TopKMetric(10)' in cfg file to enable
    #calculate top1, top5 and top10 accuracy together
    def __init__(self, K='1'):
        self.correct = 0
        self.total = 0
        self.K = int(K)

    #will be called after every batch iteration, the pred is model's output_tensors (the same order in IR),
    #the target is the sample[1] generated by dataset plugin,
    #during quantize_forward the pred will be dequantized before calling metric
    def __call__(self, pred, target):
        _, pt = torch.topk(pred[0].reshape([pred[0].shape[0], -1]), self.K, dim=-1)  #NHWC
        for i in range(target.numel()):
            if target[i] in pt[i]:
                self.correct += 1
        self.total += target.numel()

    #will be called before every epoch iteration to reset the initial state
    def reset(self):
        self.correct = 0
        self.total = 0

    #will be called after every epoch iteration to get the final metric score
    def compute(self):
        try:
            acc = float(self.correct) / float(self.total)
            return acc
        except ZeroDivisionError:
            OPT_ERROR('zeroDivisionError: Topk acc total label = 0')
            return float("-inf")

    #will be called when outputing a string format metric report
    def report(self):
        return "top-%d accuracy is %f" % (self.K, self.compute())
```
Some points deserve special explanation:

- The Metric plugin supports passing construction parameters from the cfg file, but only as strings; when writing the plugin, you need to convert the string parameters to the target types yourself (see the sketch after this list).
- The model outputs are passed to the metric plugin in the same order as output_tensors in the IR, and the target passed in is the `sample[1]` produced by the Dataset plugin; the correspondence between pred (automatically dequantized in advance during quant_forward) and target, as well as the metric computation logic, are entirely controlled by the user.
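As an illustration of the first point, here is a hypothetical metric whose constructor converts its string arguments (the class name and parameters are made up):

```python
from AIPUBuilder.Optimizer.framework import *

@register_plugin(PluginType.Metric, '1.0')
class ThresholdAccuracyMetric(OptBaseMetric):
    # the cfg file would contain, e.g., 'metric = ThresholdAccuracyMetric(0.5, 10)'
    # and both arguments arrive as strings
    def __init__(self, threshold='0.5', num_classes='10'):
        self.threshold = float(threshold)    # convert to the target type yourself
        self.num_classes = int(num_classes)
    # __call__, reset, compute and report are omitted here for brevity
```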
### Code style

OPT uses autopep8 to check code style. There is an installation script in AIPUBuilder/Optimizer/scripts that enables the automatic checking mechanism; please make sure autopep8 is installed and can be invoked normally in your development environment.
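For reference, autopep8 can be installed and invoked like this (the exact options used by the project's own script may differ):

```bash
pip3 install autopep8
autopep8 --in-place --recursive ./AIPUBuilder/Optimizer
```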
### Local test

Before submitting code, it is strongly recommended to run some local tests. The most straightforward and efficient test cases can be sampled from Zhouyi Model Zoo and adapted for reuse. If you modify an existing feature or add a new one, remember to enable it in the configuration file of the corresponding test case; if you modify or add an operator, remember to include that operator in the sampled or constructed test cases.