PARSeq Model #2089

Draft · wants to merge 46 commits into master

Conversation

@sineeli (Collaborator) commented Feb 10, 2025:

No description provided.

@abheesht17 (Collaborator) commented:

@sineeli - which parts of the PR are ready for review? Asking because it's still marked as draft

@sineeli (Collaborator, Author) commented Feb 20, 2025:

Sure, @abheesht17.

The preprocessing and tokenizer parts are good for reviewing first, as they are the primary steps; a rough usage sketch follows the file list:

  1. keras_hub/src/models/parseq/parseq_tokenizer.py
  2. keras_hub/src/models/text_recognition_preprocessor.py
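
A rough sketch of how the tokenizer might be exercised, assuming a default constructor and a direct call on a string like other KerasHub tokenizers; none of this is confirmed as the PR's final API:

from keras_hub.src.models.parseq.parseq_tokenizer import PARSeqTokenizer

# Hypothetical usage; the no-argument constructor is an assumption.
tokenizer = PARSeqTokenizer()
token_ids = tokenizer("HELLO")  # string label -> int32 token ids
chars = tokenizer.id_to_char.lookup(token_ids)  # reverse lookup, ids -> chars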

@abheesht17 (Collaborator) left a comment:

Thanks for the PR! Left some comments on the tokeniser. Will take a look at the text recognition preprocessor soon.

Sorry for the delay in reviewing.

@keras_hub_export(
    [
        "keras_hub.models.PARSeqTokenizer",
    ]
)
class PARSeqTokenizer(tokenizer.Tokenizer):
@abheesht17 (Collaborator):

Please add a doc-string here, with examples. Makes it easier to review when we have examples :P

@abheesht17 (Collaborator):

Let's add unit tests as well

@sineeli (Collaborator, Author):

Yes, will add them
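
For context, a minimal sketch of the kind of unit test this could start from; the test name, the callable usage, and the dtype expectation are assumptions, not the PR's actual tests:

import tensorflow as tf

from keras_hub.src.models.parseq.parseq_tokenizer import PARSeqTokenizer


class PARSeqTokenizerTest(tf.test.TestCase):
    def test_tokenize_returns_int32_ids(self):
        tokenizer = PARSeqTokenizer()
        # Assumed callable on a scalar string, like other KerasHub tokenizers.
        ids = tokenizer("AB")
        self.assertEqual(ids.dtype, tf.int32)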

Comment on lines 64 to 81
# Forward lookup table: character -> token id. Unknown characters fall
# back to id 0.
self.char_to_id = tf.lookup.StaticHashTable(
    initializer=tf.lookup.KeyValueTensorInitializer(
        keys=list(self._stoi.keys()),
        values=list(self._stoi.values()),
        key_dtype=tf.string,
        value_dtype=tf.int32,
    ),
    default_value=0,
)
# Reverse lookup table: token id -> character. Unknown ids fall back to
# the pad token.
self.id_to_char = tf.lookup.StaticHashTable(
    initializer=tf.lookup.KeyValueTensorInitializer(
        keys=list(self._stoi.values()),
        values=list(self._stoi.keys()),
        key_dtype=tf.int32,
        value_dtype=tf.string,
    ),
    default_value=self.pad_token,
)
@abheesht17 (Collaborator):

The defaults don't match: EOS is the 0th token, and pad is the (len(vocabulary) - 1)th token.

@sineeli (Collaborator, Author):

I noticed the same thing in the original code, but it seems they use EOS -> 0 and BOS -> len(vocabulary); when padding, they place BOS first and EOS at the end. A small layout sketch follows.
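
A toy illustration of that layout, following the upstream PARSeq convention of [E] (EOS) first, then the charset, then [B] (BOS) and [P] (PAD); the three-character charset is a stand-in for the real vocabulary:

charset = ["A", "B", "C"]
itos = ["[E]"] + charset + ["[B]", "[P]"]  # id -> token
stoi = {tok: i for i, tok in enumerate(itos)}  # token -> id

eos_id = stoi["[E]"]  # 0
bos_id = stoi["[B]"]  # len(charset) + 1
pad_id = stoi["[P]"]  # len(charset) + 2, the last id

# A padded label "AB" then reads: [B] A B [E] [P] ...
padded = [bos_id, stoi["A"], stoi["B"], eos_id, pad_id]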

    ),
    default_value=0,
)
self.id_to_char = tf.lookup.StaticHashTable(
@abheesht17 (Collaborator):

Do we need this? We aren't using it anywhere

@sineeli (Collaborator, Author):

But it will be helpful in case a user wants to bulk-convert token ids back to characters; a sketch of that use follows.
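
For illustration, a self-contained sketch of that bulk decode path with a toy vocabulary; only the StaticHashTable pattern mirrors the code under review:

import tensorflow as tf

# Toy reverse table: token id -> character.
id_to_char = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant([1, 2, 3], dtype=tf.int32),
        values=tf.constant(["A", "B", "C"]),
    ),
    default_value="[P]",  # unknown ids decode to the pad token
)

ids = tf.constant([[1, 2], [3, 1]], dtype=tf.int32)
chars = id_to_char.lookup(ids)  # one call decodes the whole batch
print(tf.strings.reduce_join(chars, axis=-1))  # [b'AB' b'CA']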

# Uppercase the label, strip characters outside the supported charset,
# then clip it to at most `max_label_length` characters.
label = tf.strings.upper(label)
label = tf.strings.regex_replace(label, self.unsupported_regex, "")
label = tf.strings.substr(label, 0, self.max_label_length)
@abheesht17 (Collaborator):

Why are we truncating the input to 25 characters?

@sineeli (Collaborator, Author):

While preparing the dataset, the original preprocessing simply drops any datapoint whose label is longer than 25 characters. Instead, I truncate the label, so we can still add the start and end tokens and keep the datapoint; a sketch contrasting the two behaviours follows the reference below.

Ref: https://github.com/baudm/parseq/blob/1902db043c029a7e03a3818c616c06600af574be/strhub/data/dataset.py#L112
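
A toy sketch contrasting the two behaviours, assuming max_label_length = 25 as in the linked code; neither snippet is the PR's actual implementation:

import tensorflow as tf

max_label_length = 25
labels = tf.constant(["SHORT", "X" * 30])

# Upstream behaviour: drop datapoints whose label is too long.
kept = tf.boolean_mask(labels, tf.strings.length(labels) <= max_label_length)

# This PR's behaviour: keep the datapoint but clip the label.
truncated = tf.strings.substr(labels, 0, max_label_length)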

sineeli added 3 commits on March 3, 2025 and 22 more on March 25, 2025.
@sachinprasadhs added the WIP label (work in progress, not ready yet for review) on Apr 11, 2025.