Conversation

JyotinderSingh (Collaborator)

No description provided.

@gemini-code-assist (bot, Contributor) commented:

Summary of Changes

Hello @JyotinderSingh, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new, detailed guide to the Keras documentation, demonstrating how to build custom layers that seamlessly integrate with Keras's post-training quantization (PTQ) framework. It outlines the necessary API hooks and serialization mechanisms, empowering developers to create efficient, quantized custom layers for INT8 and INT4 precision.

Highlights

  • New Guide for Quantization-Compatible Layers: A new comprehensive guide has been added to the Keras documentation, detailing how to create custom layers that support post-training quantization (PTQ).
  • Core Hooks Implementation: The guide explains the implementation of essential methods for quantization compatibility, including quantize(), _int8_build(), and _int8_call(), using a SimpleScale layer as a practical example (see the sketch after this list).
  • Serialization Support: It covers how to add serialization support for quantized layers by implementing save_own_variables() and load_own_variables(), and modifying the build() method to handle different quantization modes.
  • INT4 Quantization & Practical Tips: The guide provides insights into extending support for INT4 quantization and offers a section with practical tips for developing PTQ-friendly layers, covering aspects like build-time vs. call-time responsibilities, metadata recording, and performance hygiene.
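
To make the highlights concrete, here is a minimal sketch of what such a layer might look like. It is an illustrative reconstruction rather than the guide's actual code: the SimpleScale name and the hook names (quantize(), _int8_build(), _int8_call(), save_own_variables(), load_own_variables()) come from the summary above, while the abs-max quantization scheme, the variable names scale_int8 and dequant_scale, and the dtype-policy update are assumptions modeled on keras.layers.Dense.

```python
import numpy as np
import keras
from keras import ops


class SimpleScale(keras.layers.Layer):
    """Scales inputs by a learned per-feature vector (hypothetical example)."""

    def build(self, input_shape):
        if self.quantization_mode == "int8":
            # Loading an already-quantized model: create the quantized
            # variables directly and skip the float weight entirely.
            self._int8_build(input_shape)
        else:
            self.scale = self.add_weight(
                name="scale", shape=(input_shape[-1],), initializer="ones"
            )

    def call(self, inputs):
        # Float path; Keras routes to `_int8_call` once quantized.
        return inputs * self.scale

    # --- PTQ hooks ---------------------------------------------------
    def _int8_build(self, input_shape):
        # Quantized variables are non-trainable: PTQ runs after training.
        self.scale_int8 = self.add_weight(
            name="scale_int8",
            shape=(input_shape[-1],),
            dtype="int8",
            trainable=False,
        )
        self.dequant_scale = self.add_weight(
            name="dequant_scale", shape=(), dtype="float32", trainable=False
        )

    def _int8_call(self, inputs):
        # Dequantize on the fly: int8 values times a float32 scale.
        w = ops.cast(self.scale_int8, "float32") * self.dequant_scale
        return inputs * w

    def quantize(self, mode, **kwargs):
        if mode != "int8":
            raise NotImplementedError(f"Unsupported quantization mode: {mode}")
        # Abs-max scheme: map [-max|w|, max|w|] onto [-127, 127].
        w = ops.convert_to_numpy(self.scale)
        dequant = max(float(np.abs(w).max()), 1e-8) / 127.0
        q = np.clip(np.round(w / dequant), -127, 127).astype("int8")
        # Swap the float variable for the quantized pair. `_untrack_variable`
        # is a private Keras 3 helper; depending on the Keras version you may
        # also need to unlock the layer's variable tracker here.
        self._untrack_variable(self.scale)
        self._int8_build((w.shape[-1],))
        self.scale_int8.assign(q)
        self.dequant_scale.assign(dequant)
        # Record the new mode so the framework dispatches to `_int8_call`
        # and serialization knows which variables to expect.
        self.dtype_policy = f"int8_from_{self.dtype_policy.name}"

    # --- Serialization -----------------------------------------------
    def save_own_variables(self, store):
        if self.quantization_mode == "int8":
            store["scale_int8"] = self.scale_int8
            store["dequant_scale"] = self.dequant_scale
        else:
            store["scale"] = self.scale

    def load_own_variables(self, store):
        if self.quantization_mode == "int8":
            self.scale_int8.assign(store["scale_int8"])
            self.dequant_scale.assign(store["dequant_scale"])
        else:
            self.scale.assign(store["scale"])
```

The INT4 variant mentioned in the last bullet follows the same shape: _int4_build and _int4_call mirror their INT8 counterparts, typically with two 4-bit values packed per int8 byte.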
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature              | Command             | Description                                                                                     |
| -------------------- | ------------------- | ----------------------------------------------------------------------------------------------- |
| Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.                        |
| Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.                             |
| Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.  |
| Help                 | /gemini help        | Displays a list of available commands.                                                           |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@gemini-code-assist (bot) left a comment:

Code Review

This pull request adds a new guide on writing quantization-compatible layers in Keras. The guide is provided in three formats: a Jupyter notebook, a markdown file, and a Python script. The guide is well-structured and covers the main aspects of making a custom layer compatible with post-training quantization.

I've found a few issues in the code snippets and explanations that should be addressed to improve correctness and clarity for the reader. My main findings are:

  • A bug in an intermediate code snippet for the quantize method that would cause a runtime error.
  • A misleading statement in the "Practical tips" section.
  • A missing explanation of how the quantized call method (_int8_call) is invoked by the framework (sketched just after this comment).

I've added detailed comments with suggestions for each of these issues in all three file formats to ensure consistency.
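
On the third point, the dispatch is worth spelling out for readers of this thread. The following is a simplified, illustrative sketch of how the Keras 3 Layer base class routes calls once a layer is quantized (the real implementation in keras/src/layers/layer.py handles dtype conversion, masks, and more):

```python
# Illustrative sketch only; not the actual Keras source.
class LayerDispatchSketch:
    quantization_mode = None  # set via the dtype policy after quantize()

    def __call__(self, *args, **kwargs):
        if self.quantization_mode is not None:
            # A quantized layer bypasses the float `call()` entirely.
            return self.quantized_call(*args, **kwargs)
        return self.call(*args, **kwargs)

    def quantized_call(self, *args, **kwargs):
        # Mode-specific routing: implementing `_int8_call` is enough;
        # user code never needs to invoke it directly.
        if self.quantization_mode == "int8":
            return self._int8_call(*args, **kwargs)
        if self.quantization_mode == "int4":
            return self._int4_call(*args, **kwargs)
        raise NotImplementedError(
            f"No quantized call for mode {self.quantization_mode!r}"
        )
```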

Comment on lines 163 to 165

1. The INT8 variables should be `trainable=False` since PTQ does not involve
further training.
Contributor:

What happens if you make them trainable and run model.fit()? Do you get an error?

Setting trainable=False is a bit of a double-edged sword. If you do train, only the non-quantized layers will be updated, which could be what you want, but maybe not. Maybe an error is better, so that you know you can't train.

JyotinderSingh (Collaborator, Author) replied:

Returning an error might not be the right thing to do, since that could break the fine-tuning flow (for non-quantized layers) and the QLoRA flow (for quantized layers). I've modified the comment to be clearer that it's the quantized variables that should no longer be modified, not the rest of the model.

Thanks for catching this.
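
To illustrate the resolution: with only the quantized variables marked non-trainable, fit() silently skips them while still updating everything else. A quick check, assuming the hypothetical SimpleScale sketch from earlier in this thread:

```python
import numpy as np
import keras

# A float Dense layer followed by the custom layer.
model = keras.Sequential([keras.layers.Dense(8), SimpleScale()])
model.build((None, 4))

# Quantize just the custom layer via its `quantize()` hook.
model.layers[1].quantize("int8")

# The int8 variables drop out of the trainable set...
print([v.path for v in model.trainable_variables])

# ...so fit() updates only the float Dense layer; no error is raised,
# which keeps fine-tuning and QLoRA-style flows working.
model.compile(optimizer="adam", loss="mse")
x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 8).astype("float32")
model.fit(x, y, epochs=1, verbose=0)
```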

Quoted from the guide:

- Save and load quantization metadata with the model.

In this guide, we'll implement a simple layer that supports INT8 PTQ. The same
patterns generalize to INT4 quantization and FP8 mixed-precision training.
Contributor:

Later in the guide, you explain how to do INT4, and in particular you mention _int4_build and _int4_call.

However, you never talk about FP8 anymore, and in particular the names _float8_build and _float8_call are not mentioned. I think you should at least mention those two method names. Initially I thought a table format might be good to show FP8 / INT8 / INT4, but I'll leave it up to you.

JyotinderSingh (Collaborator, Author) replied:

FP8 is a mixed-precision training technique that happens to do a quantization step in the middle. It involves more sophisticated variable history tracking and fake-quant injection, which might be irrelevant to the particular audience of this guide.
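
For reference, the per-mode hook names mentioned in this thread line up as follows (the _float8_* names are taken from the review comment above, not from the guide itself):

| Quantization mode               | Build hook     | Call hook     |
| ------------------------------- | -------------- | ------------- |
| INT8 (PTQ)                      | _int8_build    | _int8_call    |
| INT4 (PTQ)                      | _int4_build    | _int4_call    |
| FP8 (mixed-precision training)  | _float8_build  | _float8_call  |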

@JyotinderSingh force-pushed the custom-quantization-layers branch from 9ab01fa to 629ede3 on October 17, 2025 at 20:45.

@hertschuh (Contributor) left a comment:


Thanks for the guide!

@hertschuh merged commit 1ee6842 into keras-team:master on Oct 17, 2025 (2 of 3 checks passed).