96 commits
ed97271
ImageText detector preprocessor for Differential Binarization model
mehtamansi29 Feb 12, 2025
d97f362
db_utils functions and testfile
mehtamansi29 Mar 11, 2025
de3aaae
Diffbin utils function and test file
mehtamansi29 Mar 11, 2025
9a3cf2a
diffbin utils function and testfile
mehtamansi29 Apr 8, 2025
93ad1ba
diffbin preprocessing function
mehtamansi29 May 12, 2025
7268535
diffbin postprocessing function
mehtamansi29 May 14, 2025
f1c3734
diffbin postprocessing function_1
mehtamansi29 May 14, 2025
d3c74c9
diffbin postprocessing function_2
mehtamansi29 May 14, 2025
aafef9e
diffbin postprocessing function_3
mehtamansi29 May 14, 2025
352a089
Merge branch 'keras-team:master' into diffbin
mehtamansi29 May 20, 2025
d94a2e6
diffbin preocessing and db_utils completed
mehtamansi29 May 20, 2025
0028b90
Merge branch 'keras-team:master' into diffbin
mehtamansi29 May 26, 2025
d4724d9
diffbin_backbone model creation and backboone test for diffbin segmen…
mehtamansi29 May 26, 2025
3c75f47
Merge branch 'keras-team:master' into diffbin
mehtamansi29 Jun 2, 2025
d41dc34
modifited diffbin _textdetector
mehtamansi29 Jun 2, 2025
4b602c4
Updates image_text_detector preprocessor
mehtamansi29 Jun 3, 2025
ee2dced
Updates image_text_detector preprocessor with ignores argument
mehtamansi29 Jun 3, 2025
736b0c9
Updates image_text_detector preprocessor,db_utils and formatting with…
mehtamansi29 Jun 4, 2025
fcfed6a
Updates image_text_detector_1
mehtamansi29 Jun 4, 2025
98e2fbc
Updates image_text_detector_1
mehtamansi29 Jun 4, 2025
5fcaefc
Updates image_text_detector_3
mehtamansi29 Jun 4, 2025
19c4e79
Updates image_text_detector_3
mehtamansi29 Jun 4, 2025
a5516dc
Updates image_text_detector_4
mehtamansi29 Jun 4, 2025
b46db73
Updates image_text_detector_5
mehtamansi29 Jun 4, 2025
8c42e56
Updates image_text_detector_6
mehtamansi29 Jun 4, 2025
6b528a2
Updates image_text_detector_7
mehtamansi29 Jun 4, 2025
df67b6c
annotation size
mehtamansi29 Jun 5, 2025
34cc866
Merge branch 'keras-team:master' into diffbin
mehtamansi29 Jun 5, 2025
876f1af
fill poly keras chages
mehtamansi29 Jun 5, 2025
9a4a3d6
fill poly keras changes revert
mehtamansi29 Jun 5, 2025
4bbbbb8
diffbin_imagetextdetector import changes
mehtamansi29 Jun 5, 2025
5acaaca
diffbin_imagetextdetector changes
mehtamansi29 Jun 5, 2025
9b7d7c4
diffbin_imagetextdetector and precommit changes
mehtamansi29 Jun 6, 2025
38eab50
diffbin_imagetextdetector and precommit changes
mehtamansi29 Jun 6, 2025
9865bc0
diffbin_textdetector_1
mehtamansi29 Jun 9, 2025
e57d280
diffbin_textdetector_2
mehtamansi29 Jun 9, 2025
1e85236
diffbin_textdetector_3
mehtamansi29 Jun 10, 2025
5007488
diffbin_textdetector_4
mehtamansi29 Jun 10, 2025
55dd899
Merge branch 'keras-team:master' into diffbin
mehtamansi29 Jun 16, 2025
39ae6c3
diffbin_backbon_image_shape
mehtamansi29 Jun 16, 2025
dafbaac
diffbin_backbon_image_preprocessor
mehtamansi29 Jun 30, 2025
bacee3a
diffbin textdetector update
mehtamansi29 Jul 1, 2025
0bf423e
diffbin textdetector update_1
mehtamansi29 Jul 1, 2025
623f6c6
new loss function added
mehtamansi29 Jul 2, 2025
c103746
loss updated in diffbin_textdetector
mehtamansi29 Jul 2, 2025
a5118ae
loss updated in diffbin_textdetector_1
mehtamansi29 Jul 2, 2025
3db380f
loss updated in diffbin_textdetector_2
mehtamansi29 Jul 2, 2025
7520308
diffbi_text_detector_update_1
mehtamansi29 Jul 2, 2025
5b7e11a
Update DB loss function
mehtamansi29 Jul 2, 2025
885c8e2
Update DB loss function_1
mehtamansi29 Jul 2, 2025
be8e5f3
Update DB loss function_2
mehtamansi29 Jul 2, 2025
f77a4f0
Update DB loss function_3
mehtamansi29 Jul 2, 2025
20d9ed3
Update DB loss function_4
mehtamansi29 Jul 2, 2025
872fe4d
Update DB loss function_5
mehtamansi29 Jul 2, 2025
e30d41e
Update DB loss function_6
mehtamansi29 Jul 2, 2025
5e52568
Update DB loss function_7
mehtamansi29 Jul 2, 2025
0a797a9
Update DB loss function_8
mehtamansi29 Jul 2, 2025
9e01b53
Update DB loss function_9
mehtamansi29 Jul 2, 2025
c64af71
Update DB loss function_8
mehtamansi29 Jul 8, 2025
78ce606
Update diffbin loss function and test file for loss function_1
mehtamansi29 Jul 14, 2025
ff56956
Update diffbin loss function_1
mehtamansi29 Jul 14, 2025
22dc3bf
Update diffbin loss function_2
mehtamansi29 Jul 14, 2025
6c666a2
Update diffbin loss function_3
mehtamansi29 Jul 14, 2025
6a823d3
Update diffbin loss function_4
mehtamansi29 Jul 14, 2025
80b3c56
Gemini Suggested changes
mehtamansi29 Jul 21, 2025
020d629
Resolved conflict
mehtamansi29 Jul 22, 2025
bb76db9
Fix PaliGemmaCausalLM example. (#2302)
hertschuh Jun 17, 2025
e53aeb0
Routine HF sync (#2303)
divyashreepathihalli Jun 17, 2025
025371f
incorrect condition on self.sliding_window_size (#2289)
laxmareddyp Jun 18, 2025
d22c615
Bump the python group with 2 updates (#2282)
dependabot[bot] Jun 18, 2025
2b21c6c
Modify TransformerEncoder masking documentation (#2297)
sonali-kumari1 Jun 18, 2025
94b40e5
Fix Gemma3InterleaveEmbeddings JAX inference error by ensuring indice…
pctablet505 Jun 19, 2025
b3d18cc
update preset versions (#2307)
laxmareddyp Jun 23, 2025
4c5bcfb
Fix Mistral conversion script (#2306)
laxmareddyp Jun 23, 2025
e39b128
Bump the python group with 6 updates (#2317)
dependabot[bot] Jul 10, 2025
c3deb47
Qwen3 causal lm (#2311)
kanpuriyanawab Jul 10, 2025
c91ca35
Update JAX GPU version (#2319)
sachinprasadhs Jul 10, 2025
c99e86d
support flash-attn at torch backend (#2257)
pass-lin Jul 11, 2025
02f3561
Add HGNetV2 to KerasHub (#2293)
harshaljanjani Jul 11, 2025
98df372
Qwen3 presets register (#2325)
laxmareddyp Jul 11, 2025
3eeba26
diffbin_imagetextdetector and precommit changes
mehtamansi29 Jun 6, 2025
445f537
Update diffbin loss function and test file for loss function_1
mehtamansi29 Jul 14, 2025
afd6251
Resolved conflict
mehtamansi29 Jul 22, 2025
848abd0
Revert "Gemini Suggested changes"
mehtamansi29 Jul 22, 2025
5820ccc
Resolving Conflicts_1
mehtamansi29 Jul 22, 2025
5546ac8
resolving conflict___1
mehtamansi29 Jul 23, 2025
f099d84
resolving conflict___2
mehtamansi29 Jul 23, 2025
9593170
resolving conflict___3
mehtamansi29 Jul 23, 2025
fbb0eed
Merge remote-tracking branch 'upstream/master' into diffbin
mehtamansi29 Jul 25, 2025
8d9baf4
Merge branch 'keras-team:master' into diffbin
mehtamansi29 Jul 28, 2025
56e4e50
Resolve backend failing testcases
mehtamansi29 Jul 29, 2025
46778b4
Merge branch 'keras-team:master' into diffbin
mehtamansi29 Aug 11, 2025
25177e1
Merge branch 'keras-team:master' into diffbin
mehtamansi29 Sep 11, 2025
30f9d2d
Resolve failing testcases
mehtamansi29 Sep 11, 2025
5bb119e
Merge remote-tracking branch 'upstream/master' into diffbin
mehtamansi29 Sep 11, 2025
e85e4ae
Merge branch 'diffbin' of https://github.com/mehtamansi29/keras-hub i…
mehtamansi29 Sep 15, 2025
6 changes: 6 additions & 0 deletions keras_hub/api/models/__init__.py
@@ -166,6 +166,12 @@
from keras_hub.src.models.densenet.densenet_image_classifier_preprocessor import (
DenseNetImageClassifierPreprocessor as DenseNetImageClassifierPreprocessor,
)
from keras_hub.src.models.diffbin.diffbin_backbone import (
DiffBinBackbone as DiffBinBackbone,
)
from keras_hub.src.models.diffbin.diffbin_textdetector import (
DiffBinTextDetector as DiffBinImageTextDetector,
)
from keras_hub.src.models.dinov2.dinov2_backbone import (
DINOV2Backbone as DINOV2Backbone,
)
Empty file.
344 changes: 344 additions & 0 deletions keras_hub/src/models/diffbin/diffbin_backbone.py
@@ -0,0 +1,344 @@
import keras
from keras import layers

from keras_hub.src.api_export import keras_hub_export
from keras_hub.src.models.backbone import Backbone


@keras_hub_export("keras_hub.models.DiffBinBackbone")
class DiffBinBackbone(Backbone):
"""Differentiable Binarization architecture for scene text detection.

This class implements the Differentiable Binarization architecture for
detecting text in natural images, described in
[Real-time Scene Text Detection with Differentiable Binarization](
https://arxiv.org/abs/1911.08947).

The backbone architecture in this class contains the feature pyramid
network and model heads.

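Args:
image_encoder: A `keras_hub.models.ResNetBackbone` instance. It must
expose `pyramid_outputs` with the keys `"P2"`, `"P3"` and `"P4"`.
fpn_channels: int. The number of channels of the lateral layers in the
feature pyramid network. Each merged feature map has
`fpn_channels // 4` channels. Defaults to `256`.
head_kernel_list: list of ints. The kernel sizes of the three layers in
the probability map and threshold map heads. Defaults to
`[3, 2, 2]`.
image_shape: tuple. The shape of the input images, excluding the batch
dimension. If `None`, or if any dimension is `None`,
`(640, 640, 3)` is used. Defaults to `(None, None, 3)`.
dtype: `None` or str or `keras.mixed_precision.DTypePolicy`. The dtype
to use for the model's computations and weights.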
"""

def __init__(
self,
image_encoder,
fpn_channels=256,
head_kernel_list=None,
image_shape=(None, None, 3),
dtype=None,
**kwargs,
):
if head_kernel_list is None:
head_kernel_list = [3, 2, 2]
if image_shape is None or None in image_shape:
image_shape = (640, 640, 3)

if not isinstance(image_encoder, keras.Model):
raise ValueError(
"Argument `image_encoder` must be a `keras.Model` instance. "
f"Received: image_encoder={image_encoder} "
f"(of type {type(image_encoder)})."
)

enc_input_shape = None
if getattr(image_encoder, "inputs", None):
try:
raw_shape = image_encoder.inputs[0].shape.as_list()[1:]
except Exception:
raw_shape = getattr(image_encoder, "input_shape", None)

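# Use the encoder's shape only when every dimension is static.
# Dropping `None` entries would shift the remaining dimensions,
# e.g. (None, None, 3) would become (3,).
if raw_shape and None not in raw_shape:
enc_input_shape = tuple(raw_shape)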

if enc_input_shape is None:
enc_input_shape = (*image_shape[:2], 3)

image_data_format = keras.config.image_data_format()

if image_data_format == "channels_first":
inputs = keras.layers.Input(
shape=(3, *enc_input_shape[:2]), # (C, H, W)
name="inputs",
)
else:
inputs = keras.layers.Input(
shape=(*enc_input_shape[:2], 3), # (H, W, C)
name="inputs",
)
fpn_model = keras.Model(
inputs=image_encoder.inputs,
outputs=image_encoder.pyramid_outputs,
dtype=dtype,
)
try:
encoder_input_shape = image_encoder.inputs[0].shape.as_list()[1:]
except Exception:
encoder_input_shape = getattr(image_encoder, "input_shape", None)

encoder_channels_last = False
if encoder_input_shape:
encoder_channels_last = encoder_input_shape[-1] == 3

current_channels_last = image_data_format == "channels_last"

if encoder_input_shape and (
encoder_channels_last != current_channels_last
):
if current_channels_last:
preproc = layers.Permute(
(3, 1, 2), name="permute_to_channels_first"
)(inputs)
else:
preproc = layers.Permute(
(2, 3, 1), name="permute_to_channels_last"
)(inputs)

raw_fpn_output = fpn_model(preproc)

converted_fpn_output = {}
for k, v in raw_fpn_output.items():
if encoder_channels_last and not current_channels_last:
converted_fpn_output[k] = layers.Permute(
(3, 1, 2), name=f"permute_{k}_to_channels_first"
)(v)
elif not encoder_channels_last and current_channels_last:
converted_fpn_output[k] = layers.Permute(
(2, 3, 1), name=f"permute_{k}_to_channels_last"
)(v)
else:
converted_fpn_output[k] = v
fpn_output = converted_fpn_output
else:
fpn_output = fpn_model(inputs)
x = diffbin_fpn_model(
fpn_output, out_channels=fpn_channels, dtype=dtype
)

probability_maps = diffbin_head(
x,
in_channels=fpn_channels,
kernel_list=head_kernel_list,
name="head_prob",
dtype=dtype,
)
threshold_maps = diffbin_head(
x,
in_channels=fpn_channels,
kernel_list=head_kernel_list,
name="head_thresh",
dtype=dtype,
)

outputs = {
"probability_maps": probability_maps,
"threshold_maps": threshold_maps,
}

super().__init__(inputs=inputs, outputs=outputs, dtype=dtype, **kwargs)

# === Config ===
self.image_encoder = image_encoder
self.fpn_channels = fpn_channels
self.head_kernel_list = head_kernel_list
self.image_shape = image_shape

def get_config(self):
config = super().get_config()
config.update(
{
"fpn_channels": self.fpn_channels,
"head_kernel_list": self.head_kernel_list,
# Saved so that `from_config` restores the same input shape.
"image_shape": self.image_shape,
# Use keras.saving.serialize_keras_object for custom Models
"image_encoder": keras.saving.serialize_keras_object(
self.image_encoder
),
}
)
return config

@classmethod
def from_config(cls, config):
config = config.copy()
resnet_config = config.pop("image_encoder")
image_encoder = keras.saving.deserialize_keras_object(resnet_config)
config["image_encoder"] = image_encoder
return cls(**config)
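
Reviewer note: `get_config` and `from_config` above follow the usual pattern of serializing a nested model into the config and rebuilding it before calling the constructor. The sketch below shows that round trip with plain Python stand-ins. `ToyEncoder` and `ToyBackbone` are hypothetical classes invented for illustration and do not use the Keras serialization API. The point it tries to make is that every constructor argument has to appear in the config for the restored object to match the original.

```python
class ToyEncoder:
    """Hypothetical stand-in for the nested image encoder."""

    def __init__(self, depth=50):
        self.depth = depth

    def get_config(self):
        return {"depth": self.depth}

    @classmethod
    def from_config(cls, config):
        return cls(**config)


class ToyBackbone:
    """Hypothetical stand-in for a backbone that wraps an encoder."""

    def __init__(self, image_encoder, fpn_channels=256, image_shape=(640, 640, 3)):
        self.image_encoder = image_encoder
        self.fpn_channels = fpn_channels
        self.image_shape = tuple(image_shape)

    def get_config(self):
        # Every constructor argument is written out, including the
        # nested encoder, which is stored as its own config dict.
        return {
            "image_encoder": self.image_encoder.get_config(),
            "fpn_channels": self.fpn_channels,
            "image_shape": list(self.image_shape),
        }

    @classmethod
    def from_config(cls, config):
        config = dict(config)  # do not mutate the caller's dict
        config["image_encoder"] = ToyEncoder.from_config(
            config.pop("image_encoder")
        )
        return cls(**config)


original = ToyBackbone(ToyEncoder(depth=18), fpn_channels=128, image_shape=(320, 320, 3))
restored = ToyBackbone.from_config(original.get_config())
print(restored.get_config() == original.get_config())  # True
```

If an argument such as `image_shape` were left out of `get_config`, the restored object would silently fall back to the constructor default.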


def diffbin_fpn_model(inputs, out_channels, dtype=None):
# Lateral connections: pointwise (1x1) convolutions that project the
# encoder's P2-P4 pyramid outputs to `out_channels`.
image_data_format = keras.config.image_data_format()
channel_axis = -1 if image_data_format == "channels_last" else 1
p2, p3, p4 = inputs["P2"], inputs["P3"], inputs["P4"]

lateral_p2 = layers.Conv2D(
out_channels,
kernel_size=1,
use_bias=False,
name="neck_lateral_p2",
dtype=dtype,
data_format=image_data_format,
)(p2)
lateral_p3 = layers.Conv2D(
out_channels,
kernel_size=1,
use_bias=False,
name="neck_lateral_p3",
dtype=dtype,
data_format=image_data_format,
)(p3)
lateral_p4 = layers.Conv2D(
out_channels,
kernel_size=1,
use_bias=False,
name="neck_lateral_p4",
dtype=dtype,
data_format=image_data_format,
)(p4)

# top-down fusion
topdown_p4 = lateral_p4
topdown_p3 = layers.Add()(
[
resize_like(topdown_p4, lateral_p3, image_data_format, dtype),
lateral_p3,
]
)
topdown_p2 = layers.Add(name="neck_topdown_p2")(
[
resize_like(topdown_p3, lateral_p2, image_data_format, dtype),
lateral_p2,
]
)

# construct merged feature maps for each pyramid level
featuremap_p4 = layers.Conv2D(
out_channels // 4,
kernel_size=3,
padding="same",
use_bias=False,
name="neck_featuremap_p4",
dtype=dtype,
data_format=image_data_format,
)(topdown_p4)
featuremap_p3 = layers.Conv2D(
out_channels // 4,
kernel_size=3,
padding="same",
use_bias=False,
name="neck_featuremap_p3",
dtype=dtype,
data_format=image_data_format,
)(topdown_p3)
featuremap_p2 = layers.Conv2D(
out_channels // 4,
kernel_size=3,
padding="same",
use_bias=False,
name="neck_featuremap_p2",
dtype=dtype,
data_format=image_data_format,
)(topdown_p2)

final_p4 = resize_like(
featuremap_p4, featuremap_p2, image_data_format, dtype
)
final_p3 = resize_like(
featuremap_p3, featuremap_p2, image_data_format, dtype
)
final_p2 = featuremap_p2
featuremap = layers.Concatenate(axis=channel_axis, dtype=dtype)(
[final_p4, final_p3, final_p2]
)
return featuremap


def diffbin_head(inputs, in_channels, kernel_list, name, dtype):
image_data_format = keras.config.image_data_format()

channel_axis = -1 if image_data_format == "channels_last" else 1

x = layers.Conv2D(
in_channels // 4,
kernel_size=kernel_list[0],
padding="same",
use_bias=False,
name=f"{name}_conv0_weights",
dtype=dtype,
data_format=image_data_format,
)(inputs)
x = layers.BatchNormalization(
beta_initializer=keras.initializers.Constant(1e-4),
gamma_initializer=keras.initializers.Constant(1.0),
name=f"{name}_conv0_bn",
dtype=dtype,
axis=channel_axis,
)(x)
x = layers.ReLU(name=f"{name}_conv0_relu")(x)
x = layers.Conv2DTranspose(
in_channels // 4,
kernel_size=kernel_list[1],
strides=2,
padding="same",
bias_initializer=keras.initializers.RandomUniform(
minval=-1.0 / (in_channels // 4 * 1.0) ** 0.5,
maxval=1.0 / (in_channels // 4 * 1.0) ** 0.5,
),
name=f"{name}_conv1_weights",
dtype=dtype,
data_format=image_data_format,
)(x)
x = layers.BatchNormalization(
beta_initializer=keras.initializers.Constant(1e-4),
gamma_initializer=keras.initializers.Constant(1.0),
name=f"{name}_conv1_bn",
dtype=dtype,
axis=channel_axis,
)(x)
x = layers.ReLU(name=f"{name}_conv1_relu")(x)
x = layers.Conv2DTranspose(
1,
kernel_size=kernel_list[2],
strides=2,
padding="same",
activation="sigmoid",
bias_initializer=keras.initializers.RandomUniform(
minval=-1.0 / (in_channels // 4 * 1.0) ** 0.5,
maxval=1.0 / (in_channels // 4 * 1.0) ** 0.5,
),
name=f"{name}_conv2_weights",
dtype=dtype,
data_format=image_data_format,
)(x)
if keras.config.image_data_format() == "channels_first":
x = layers.Permute((2, 3, 1), name=f"{name}_permute_output")(x)
return x


def resize_like(x, target, data_format, dtype=None):
# Prefer static shape if available
th, tw = (
target.shape[1:3]
if data_format == "channels_last"
else target.shape[2:4]
)
if th is None or tw is None:
# Fallback to dynamic symbolic shape
if data_format == "channels_last":
th, tw = keras.ops.shape(target)[1], keras.ops.shape(target)[2]
else:
th, tw = keras.ops.shape(target)[2], keras.ops.shape(target)[3]
return layers.Resizing(
th, tw, interpolation="nearest", data_format=data_format, dtype=dtype
)(x)
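
Reviewer note: as a rough sanity check on the shapes in this file, the sketch below traces tensor shapes through `diffbin_fpn_model` and one `diffbin_head` in `channels_last` layout. It assumes the encoder's `P2`, `P3` and `P4` outputs have strides 4, 8 and 16, which is an assumption about the ResNet encoder and is not stated in this diff. `trace_diffbin_shapes` is a hypothetical helper written for illustration, not part of the PR.

```python
def trace_diffbin_shapes(height, width, fpn_channels=256, strides=(4, 8, 16)):
    """Return the (H, W, C) shapes of the neck output and one head output."""
    # Spatial sizes of P2, P3, P4 under the assumed encoder strides.
    pyramid = [(height // s, width // s) for s in strides]

    # Each level is reduced to fpn_channels // 4 channels, resized to the
    # P2 resolution, and concatenated along the channel axis.
    p2_h, p2_w = pyramid[0]
    neck = (p2_h, p2_w, len(strides) * (fpn_channels // 4))

    # The head applies a stride-1 convolution, then two stride-2 transposed
    # convolutions with "same" padding, so each spatial dimension grows 4x.
    head = (p2_h * 2 * 2, p2_w * 2 * 2, 1)
    return neck, head


neck_shape, head_shape = trace_diffbin_shapes(640, 640)
print(neck_shape)  # (160, 160, 192)
print(head_shape)  # (640, 640, 1)
```

Under these assumptions the probability and threshold maps come out at the input resolution, and the concatenated neck output has `3 * (fpn_channels // 4)` channels because only three pyramid levels are merged.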