Adds guide for writing custom quantization compatible layers #2195
Conversation
Summary of Changes

Hello @JyotinderSingh, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a new, detailed guide to the Keras documentation, demonstrating how to build custom layers that integrate with Keras's post-training quantization (PTQ) framework. It outlines the necessary API hooks and serialization mechanisms, enabling developers to create efficient, quantized custom layers for INT8 and INT4 precision.
Code Review
This pull request adds a new guide on writing quantization-compatible layers in Keras. The guide is provided in three formats: a Jupyter notebook, a markdown file, and a Python script. The guide is well-structured and covers the main aspects of making a custom layer compatible with post-training quantization.
I've found a few issues in the code snippets and explanations that should be addressed to improve correctness and clarity for the reader. My main findings are:

- A bug in an intermediate code snippet for the `quantize` method that would cause a runtime error.
- A misleading statement in the "Practical tips" section.
- A missing explanation of how the quantized call method (`_int8_call`) is invoked by the framework.

I've added detailed comments with suggestions for each of these issues in all three file formats to ensure consistency.
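For readers skimming the thread, here is a minimal sketch of the hook pattern under discussion. The method names (`quantize`, `_int8_build`, `_int8_call`) are the ones the guide and this review refer to; the layer itself, the absmax recipe, and the private tracker helpers are illustrative assumptions rather than the guide's exact code.

```python
import numpy as np
import keras
from keras import ops


class QuantizableDense(keras.layers.Layer):
    """Hypothetical dense layer wired into the PTQ hooks named above."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Ordinary float path.
        self._kernel = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="glorot_uniform",
            name="kernel",
        )

    def call(self, inputs):
        return ops.matmul(inputs, self._kernel)

    def _int8_build(self, kernel_shape):
        # Quantized replacements for the float kernel. Both variables
        # are frozen (trainable=False): PTQ does not update them further.
        self._kernel = self.add_weight(
            shape=kernel_shape,
            dtype="int8",
            initializer="zeros",
            trainable=False,
            name="kernel",
        )
        self.kernel_scale = self.add_weight(
            shape=(self.units,),
            initializer="ones",
            trainable=False,
            name="kernel_scale",
        )

    def _int8_call(self, inputs):
        # Dequantize on the fly: matmul against the int8 kernel, then
        # rescale per output channel back to float.
        x = ops.matmul(inputs, ops.cast(self._kernel, inputs.dtype))
        return x * self.kernel_scale

    def quantize(self, mode, **kwargs):
        if mode != "int8":
            raise NotImplementedError(f"Unsupported quantization mode: {mode}")
        # Per-output-channel absmax quantization (one common PTQ recipe).
        kernel = ops.convert_to_numpy(self._kernel)
        scale = np.maximum(np.abs(kernel).max(axis=0), 1e-7) / 127.0
        q_kernel = np.clip(np.round(kernel / scale), -127, 127).astype("int8")

        # Swap the float kernel for the int8 pair. The underscored tracker
        # helpers are private Keras internals that the built-in layers use
        # for the same bookkeeping (an assumption here, not a public API).
        self._tracker.unlock()
        self._untrack_variable(self._kernel)
        self._int8_build(kernel.shape)
        self._tracker.lock()
        self._kernel.assign(q_kernel)
        self.kernel_scale.assign(scale.astype("float32"))

        # Switching to a quantized dtype policy is what makes Keras route
        # __call__ to `_int8_call` instead of `call` from now on.
        self.dtype_policy = keras.dtype_policies.get(
            f"int8_from_{self.dtype_policy.name}"
        )
```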
> 1. The INT8 variables should be `trainable=False` since PTQ does not involve further training.
What happens if you make them trainable and do `model.fit()`? Do you get an error?

Doing `trainable=False` is a bit of a double-edged sword. If you do train, what will happen is that it will only train non-quantized layers, which could be what you want, but maybe not. Maybe an error is better, so that you know you can't train.
Returning an error might not be the right thing to do, since that might break the fine-tuning (for non-quantized layers) and QLoRA (for quantized layers) flow. I've modified the comment to be clearer that it's the quantized variables that should no longer be modified, and not the rest of the model.
Thanks for catching this.
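To make the trade-off concrete: with the quantized variables registered as `trainable=False`, `model.fit()` runs without error and simply skips them, so gradient updates only reach the remaining float layers. A hypothetical check, reusing the `QuantizableDense` sketch above:

```python
import numpy as np
import keras

# Reusing the hypothetical QuantizableDense sketch from earlier in this review.
model = keras.Sequential(
    [
        keras.Input(shape=(16,)),
        QuantizableDense(32),   # will be quantized and frozen
        keras.layers.Dense(4),  # stays float and trainable
    ]
)
model.layers[0].quantize("int8")

# The int8 kernel and its scale are non-trainable, so fit() updates only
# the float Dense layer instead of raising an error.
print([v.name for v in model.trainable_weights])      # Dense kernel/bias only
print([v.name for v in model.non_trainable_weights])  # kernel, kernel_scale

model.compile(optimizer="adam", loss="mse")
model.fit(np.random.rand(8, 16), np.random.rand(8, 4), epochs=1, verbose=0)
```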
> - Save and load quantization metadata with the model.
>
> In this guide, we'll implement a simple layer that supports INT8 PTQ. The same patterns generalize to INT4 quantization and FP8 mixed-precision training.
Later in the guide, you explain how to do INT4, and in particular you mention `_int4_build` and `_int4_call`. However, you never talk about FP8 again, and in particular, the names `_float8_build` and `_float8_call` are not mentioned. I think you should at least mention those two method names. Initially I thought a table format might be good to show FP8 / INT8 / INT4, but I'll leave it up to you.
FP8 is a mixed-precision training technique that happens to do a quantization step in the middle. It involves more sophisticated variable history tracking and fake-quant injection, which might be irrelevant to the particular audience of this guide.
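A note for other readers: each mode pairs a build hook with a call hook under a single naming convention. The shim below is only a sketch of that convention, with hypothetical helper names; Keras performs the equivalent routing internally based on the layer's dtype policy.

```python
# Hypothetical dispatch shim illustrating the per-mode hook naming
# convention discussed in this thread; not the framework's actual code.
QUANTIZATION_MODES = ("int8", "int4", "float8")


def quantized_build(layer, mode, *args, **kwargs):
    if mode not in QUANTIZATION_MODES:
        raise ValueError(f"Unknown quantization mode: {mode!r}")
    # e.g. mode="float8" resolves to layer._float8_build(...)
    return getattr(layer, f"_{mode}_build")(*args, **kwargs)


def quantized_call(layer, mode, *args, **kwargs):
    if mode not in QUANTIZATION_MODES:
        raise ValueError(f"Unknown quantization mode: {mode!r}")
    # e.g. mode="int4" resolves to layer._int4_call(...)
    return getattr(layer, f"_{mode}_call")(*args, **kwargs)
```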
Force-pushed from 9ab01fa to 629ede3.
Thanks for the guide!