Releases · microsoft/Olive
Olive-ai 0.3.3
Quick fix for v0.3.2
- Vitis AI quantization support for ONNX Runtime 1.16.1
- Add optional attention mask for text-generation task
Olive-ai 0.3.2
Examples
The following examples are added
- DirectML SDXL refiner #487
- Open Llama arc #582
- Enable Intel® Neural Compressor 4-bit weight-only quantization #614
- Add NCHW GroupNorm fusion to DirectML's SD examples #617
Passes (optimization techniques)
- QLoRA pass for torch model fine-tuning
- Intel® Neural Compressor 4-bit weight-only quantization
- OnnxModelOptimizer
  - Inserts a `Cast` operation for cases where the `ArgMax` input isn't supported on the device
  - Fuses consecutive `Reshape` operations when the latter results in flattening
Engine
- Summarize pass run history in a table (install `tabulate` for a better preview)
- Support tuning and evaluating models across the different execution providers managed by Olive-ai
Model
- Add model_loading_args, load_model and load_model_config to HFConfig.
- Add adapter_path to PyTorchModel
- Introduce model_attributes, which can be used to simplify the user's input for transformer_optimization
- Add AML curated model support
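To illustrate how the new model fields fit together, here is a minimal sketch of an input-model config; the nesting (`type`/`config`) and the example values are assumptions patterned on typical Olive examples, not taken from this release.

```python
import json

# Hypothetical input-model section combining the new HFConfig fields and adapter_path.
# Field nesting and values are illustrative assumptions, not an official schema.
input_model = {
    "type": "PyTorchModel",
    "config": {
        "hf_config": {
            "model_name": "facebook/opt-125m",                 # assumed example model
            "task": "text-generation",
            "model_loading_args": {"torch_dtype": "float16"},  # new in this release
        },
        "adapter_path": "path/to/lora_adapter",  # new: load a fine-tuned adapter
    },
}

with open("model_config.json", "w") as f:
    json.dump({"input_model": input_model}, f, indent=2)
```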
Dataset
- Auto-insertion of the input model's data config (if it is a PyTorch model with hf_config.dataset) into pass configs is removed. Use `input_model_data_config` if you want to use the input model's data config.
- Support a second type of dataset, called `pair`, for `text-generation` tasks
- Support converting an Olive dataset to a huggingface `datasets.Dataset`
Known Issues
- #571 Whisper gpu does not consume gpu resources
- #573 Distinguish pass instance with name not cls name
Dependencies:
- Support onnxruntime 1.16.1
- Drop Python 3.7 support. Python >= 3.8 is now required to run Olive-ai optimizations.
Olive-ai 0.3.1
Examples
The following examples are added
- Red Pajama Optimization with Optimum
- Stable Diffusion XL Optimization with DirectML
- GPT-J Optimization Using Intel® Neural Compressor
- BERT example using Intel Neural Compressor SmoothQuant
- Whisper example using Intel Neural Compressor
- Open LLaMA workflow example
Passes (optimization techniques)
- Introduce TorchTRTConversion
- Introduce SparseGPT pass for one-shot model pruning of large GPT-like models using the algorithm proposed in https://arxiv.org/abs/2301.00774.
Systems
- Add AzureML SKU support for AMLSystem
Evaluator
- Add metric_func config to custom metrics (see the sketch after this list). Olive runs inference for the custom evaluation function, so users don't need to run inference themselves.
- Add RawDataContainer:
  SNPE evaluation and quantization now accept generic dataloaders such as a torch DataLoader
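As a rough sketch of the custom-metric flow described above: Olive runs inference and passes the results to the user-supplied function, which only computes the score. The config keys around `metric_func` and the callback signature are assumptions for illustration.

```python
import numpy as np

# Hypothetical metric function: receives model outputs and targets from Olive,
# so it does not need to run inference itself (signature is an assumption).
def top1_accuracy(model_output, targets):
    preds = np.asarray(model_output).argmax(axis=-1)
    return float((preds == np.asarray(targets)).mean())

# Hypothetical custom-metric config pointing at the function above.
custom_metric = {
    "name": "accuracy_custom",
    "type": "custom",
    "sub_types": [{"name": "top1"}],
    "user_config": {"metric_func": "top1_accuracy"},  # key named in this release note
}
```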
Metrics
- Add Perplexity metric for text-generation task
Engine
- Provide an interface that lets users define multiple pass flows to run in the same Olive workflow
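A minimal sketch of what declaring several pass flows in one workflow might look like; the `pass_flows` key and the pass names are illustrative assumptions.

```python
# Hypothetical workflow fragment: two alternative pass flows tuned in one run.
workflow = {
    "passes": {
        "conversion": {"type": "OnnxConversion"},
        "transformers_opt": {"type": "OrtTransformersOptimization"},
        "quantization": {"type": "OnnxQuantization"},
    },
    # Assumed key: each inner list is one flow; Olive runs and compares them.
    "pass_flows": [
        ["conversion", "transformers_opt"],
        ["conversion", "quantization"],
    ],
}
```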
Olive-ai 0.2.1
Examples
The following examples are added
General
- Enable hardware accelerator support for Olive. It introduces the new config fields `accelerators` in `systems` (for example, `CPU`, `GPU`, etc.) and `execution_providers` in `engine` (for example, `CPUExecutionProvider`, `CUDAExecutionProvider`, etc.), as sketched below.
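A hedged sketch of how these fields might appear in a workflow config; the surrounding structure (`LocalSystem`, `host`, `target`) is assumed for illustration.

```python
# Hypothetical config fragment wiring up accelerators and execution providers.
workflow = {
    "systems": {
        "local_system": {
            "type": "LocalSystem",
            "config": {"accelerators": ["GPU"]},  # new: accelerators in systems
        }
    },
    "engine": {
        # New: execution providers to tune and evaluate against.
        "execution_providers": ["CUDAExecutionProvider", "CPUExecutionProvider"],
        "host": "local_system",
        "target": "local_system",
    },
}
```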
Evaluator
- Support for evaluating distributed ONNX models
Metrics
- Extend metrics' `sub_type` to accept a list input to gather the results in one evaluation job when possible, and add `sub_type_for_rank` for use in the sort/search strategy, etc.
Olive-ai 0.2.0
Examples
The following examples are added
- ResNet Optimization with Vitis-AI Quantization for CPU
- SqueezeNet Optimization with DirectML for GPU
- Stable Diffusion Optimization with DirectML for GPU
- MobileNet Optimization with QDQ Quantization for Qualcomm NPU
- Whisper Optimization for CPU
- BERT Optimization with Intel® Neural Compressor PTQ for CPU
General
- Simplify the data loading experience by adding transformers data config support. For transformer models, users can use `hf_config.dataset` to leverage online huggingface datasets.
- Ease the process of setting up the environment: users can run `olive.workflows.run --config config.json --setup` to install the packages required by the passes.
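As a loose sketch of the two points above, a workflow config could pull its data from a huggingface dataset and then be set up from the command line; the dataset fields (`data_name`, `subset`, `split`) and the example model are assumptions.

```python
import json

# Hypothetical workflow config using hf_config.dataset for data loading.
config = {
    "input_model": {
        "type": "PyTorchModel",
        "config": {
            "hf_config": {
                "model_name": "bert-base-uncased",  # assumed example model
                "task": "text-classification",
                "dataset": {                        # leverages online HF datasets
                    "data_name": "glue",
                    "subset": "mrpc",
                    "split": "validation",
                },
            }
        },
    },
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)

# Then install the packages required by the passes, as described above:
#   olive.workflows.run --config config.json --setup
```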
Passes (optimization techniques)
- Integrate Intel® Neural Compressor into Olive: introduce new passes IncStaticQuantization, IncDynamicQuantization, and IncQuantization.
- Integrate Vitis-AI into Olive: introduce new pass VitisAIQuantization.
- Introduce OnnxFloatToFloat16: converts a model to float16. It is based on onnxconverter-common.convert_float_to_float16.
- Introduce OrtMixedPrecision: converts model to mixed precision to retain a certain level of accuracy.
- Introduce AppendPrePostProcessingOps: adds pre/post-processing nodes to the input model.
- Introduce InsertBeamSearch: chains two model components (for example, encoder and decoder) together by inserting a beam search op between them.
- Support external data for all ONNX passes.
- Enable transformer optimization fusion options in workflow file.
- Expose extra_options in ONNX quantization passes.
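To make the last point concrete, a hedged sketch of forwarding `extra_options` through an ONNX quantization pass; the pass name and option keys mirror onnxruntime's quantization tooling and are illustrative.

```python
# Hypothetical pass config: extra_options is forwarded to onnxruntime quantization.
quantization_pass = {
    "type": "OnnxQuantization",      # illustrative ONNX quantization pass name
    "config": {
        "quant_mode": "static",      # assumed option for this sketch
        "extra_options": {
            # Keys follow onnxruntime.quantization extra_options (assumed here).
            "WeightSymmetric": True,
            "ActivationSymmetric": False,
        },
    },
}
```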
Models
- Introduce DistributedOnnxModel to support distributed inferencing
- Introduce CompositeOnnxModel to represent models with encoder and decoder subcomponents as individual OnnxModels.
- Add io_config to PytorchModel, including input_names, input_shapes, output_names and dynamic_axes (see the sketch after this list)
- Add MLFlow model loader
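For example, the `io_config` fields listed above might be filled in like this for a transformer model; the particular names and shapes are assumptions for illustration.

```python
# Hypothetical io_config for a PyTorch transformer model (values are illustrative).
# String keys in dynamic_axes keep the dict JSON-serializable for a workflow file.
io_config = {
    "input_names": ["input_ids", "attention_mask"],
    "input_shapes": [[1, 128], [1, 128]],
    "output_names": ["logits"],
    "dynamic_axes": {
        "input_ids": {"0": "batch_size", "1": "seq_len"},
        "attention_mask": {"0": "batch_size", "1": "seq_len"},
        "logits": {"0": "batch_size"},
    },
}
```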
Systems
- Introduce PythonEnvironmentSystem: a python environment on the host machine. This system allows users to evaluate models using onnxruntime or packages installed in a different python environment.
Evaluator
- Remove target from the evaluator config.
- Introduce dummy dataloader for latency evaluation.
Metrics
- Introduce priority_rank: users need to specify "priority_rank": rank_num for each metric when there are multiple metrics. Olive uses the metrics' priority ranks to determine the best model.
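A small sketch of two metrics ranked with priority_rank; apart from `priority_rank` itself, the metric fields are assumptions for illustration.

```python
# Hypothetical metrics list: the best model is chosen by accuracy first (rank 1),
# then by latency (rank 2).
metrics = [
    {
        "name": "accuracy",
        "type": "accuracy",
        "priority_rank": 1,
        "sub_types": [{"name": "accuracy_score"}],
    },
    {
        "name": "latency",
        "type": "latency",
        "priority_rank": 2,
        "sub_types": [{"name": "avg"}],
    },
]
```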
Engine
- Introduce Olive Footprint: generates JSON report files, including footprints.json and Pareto frontier footprints, and dumps the frontier to HTML/image.
- Introduce Packaging Olive artifacts: packages CandidateModels, SampleCode and ONNXRuntimePackages in the output_dir folder if configured in the Engine Configuration.
- Introduce log_severity_level.
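Putting the engine additions together, a rough sketch of an engine section; the `packaging_config` shape is an assumption about how artifact packaging is requested, and `log_severity_level` is assumed to follow the usual 0 (verbose) to 4 (fatal) scale.

```python
# Hypothetical engine section combining footprint output, artifact packaging,
# and logging verbosity (field names and values are illustrative assumptions).
engine = {
    "output_dir": "outputs",      # footprints.json and Pareto frontier files land here
    "packaging_config": {         # assumed: packages CandidateModels, SampleCode and
        "type": "Zipfile",        # ONNXRuntimePackages into output_dir
        "name": "OutputModels",
    },
    "log_severity_level": 0,      # assumed mapping: 0 = verbose ... 4 = fatal
}
```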
Olive-ai 0.1.0
This is the pre-release of the next version of Olive as a hardware-aware model optimization solution. It mainly includes:
- A unified optimization framework based on modular design. Details
- More integrated optimizations, including ONNX Runtime transformer optimization, ONNX post-training quantization with accuracy tuning, PyTorch quantization-aware training, the OpenVINO toolkit and the SNPE toolkit. Details
- Easy-to-use interface for contributors to plug in new optimization innovations. Details
- Flexible model evaluation through both local devices and Azure Machine Learning. Details
olive1
To archive the old version of Olive