Problems regarding @orama/tokenizers #883

Open
fuma-nama opened this issue Feb 9, 2025 · 1 comment

Comments

@fuma-nama

Describe the bug

I was trying to upgrade from Orama v2 to v3, and the tokenizers module has a few problems that block me from upgrading.

  1. package.json points to invalid paths:
    ./build/tokenizer-mandarin/tokenizer.mjs doesn't exist, only ./build/tokenizer-mandarin/tokenizer.js exists
    ./build/tokenizer-mandarin/tokenizer.d.ts doesn't exist, only ./build/tokenizer-mandarin/tokenizer.ts exists
{
  "exports": {
    "./japanese": {
      "types": "./build/tokenizer-japanese/tokenizer.d.ts",
      "import": "./build/tokenizer-japanese/tokenizer.mjs",
      "require": "./build/tokenizer-japanese/tokenizer.js"
    },
    "./mandarin": {
      "types": "./build/tokenizer-mandarin/tokenizer.d.ts",
      "import": "./build/tokenizer-mandarin/tokenizer.mjs",
      "require": "./build/tokenizer-mandarin/tokenizer.js"
    }
  }
}
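
One way to confirm which targets are missing is to walk the "exports" map and test each path on disk. This is a minimal sketch: `findMissingExportTargets` is a hypothetical helper (not part of Orama), and it assumes every entry is a flat object of condition-to-path pairs, as in the excerpt above.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: lists every "exports" target that does not exist on disk.
// `exists` is injectable so the check can be exercised without touching the filesystem.
function findMissingExportTargets(exportsMap, packageDir, exists = fs.existsSync) {
  const missing = [];
  for (const [subpath, conditions] of Object.entries(exportsMap)) {
    // Assumption: each entry is a flat { condition: target } object.
    for (const [condition, target] of Object.entries(conditions)) {
      if (!exists(path.join(packageDir, target))) {
        missing.push({ subpath, condition, target });
      }
    }
  }
  return missing;
}

// Usage against an installed copy (the directory layout here is an assumption):
// const pkgDir = path.join(process.cwd(), 'node_modules', '@orama', 'tokenizers');
// const pkg = JSON.parse(fs.readFileSync(path.join(pkgDir, 'package.json'), 'utf8'));
// console.log(findMissingExportTargets(pkg.exports, pkgDir));
```

Running the commented usage lines inside a project that has @orama/tokenizers installed should list the "types" and "import" targets if the files are missing as described.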
  2. The tokenizer's tokenize() method returns a promise (it's an async function), but the callers don't treat it as one. There are several calls that don't await the result.

    This results in errors like:

file:///Users/xred/dev/orama-test/node_modules/.pnpm/@orama+orama@3.0.6/node_modules/@orama/orama/dist/esm/components/index.js:130
                for (const token of tokens) {
                                    ^

TypeError: tokens is not iterable
    at <anonymous> (/Users/xred/dev/orama-test/node_modules/.pnpm/@orama+orama@3.0.6/node_modules/@orama/orama/src/components/index.ts:239:29)
    at Object.insert (/Users/xred/dev/orama-test/node_modules/.pnpm/@orama+orama@3.0.6/node_modules/@orama/orama/src/components/index.ts:279:12)
    at indexAndSortDocumentSync (/Users/xred/dev/orama-test/node_modules/.pnpm/@orama+orama@3.0.6/node_modules/@orama/orama/src/methods/insert.ts:243:17)
    at innerInsertSync (/Users/xred/dev/orama-test/node_modules/.pnpm/@orama+orama@3.0.6/node_modules/@orama/orama/src/methods/insert.ts:130:3)
    at insert (/Users/xred/dev/orama-test/node_modules/.pnpm/@orama+orama@3.0.6/node_modules/@orama/orama/src/methods/insert.ts:37:10)
    at <anonymous> (/Users/xred/dev/orama-test/src/index.ts:13:7)

Node.js v22.13.1
 ELIFECYCLE  Command failed with exit code 1.

If you patch the package and log the tokens variable:

function insertScalarBuilder(implementation, index, prop, internalId, language, tokenizer, docsCount, options) {
    return (value) => {
        const { type, node } = index.indexes[prop];
        switch (type) {
            case 'Bool': {
                node[value ? 'true' : 'false'].add(internalId);
                break;
            }
            case 'AVL': {
                const avlRebalanceThreshold = options?.avlRebalanceThreshold ?? 1;
                node.insert(value, internalId, avlRebalanceThreshold);
                break;
            }
            case 'Radix': {
                const tokens = tokenizer.tokenize(value, language, prop, false);
                implementation.insertDocumentScoreParameters(index, prop, internalId, tokens, docsCount);
                console.log(tokens) // added this
                for (const token of tokens) {
                    implementation.insertTokenScoreParameters(index, prop, internalId, tokens, token);
                    node.insert(token, internalId);
                }
                break;
            }
            case 'Flat': {
                node.insert(value, internalId);
                break;
            }
            case 'BKD': {
                node.insert(value, [internalId]);
                break;
            }
        }
    };
}

It's indeed a promise, but there's no await in the code.
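
The failure mode can be reproduced in isolation. The sketch below uses a stand-in async tokenizer (an assumption for illustration, not the real @orama/tokenizers code) to show that iterating the un-awaited result throws a TypeError, while awaiting it first works. Whether the right fix in Orama is to await in the insert path or to make the tokenizer synchronous is up to the maintainers.

```javascript
// Stand-in for an async tokenizer; NOT the real @orama/tokenizers implementation.
const asyncTokenizer = {
  async tokenize(text) {
    return text.split(/\s+/).filter(Boolean);
  },
};

// Mirrors the synchronous call pattern in the patched code above:
// the Promise returned by tokenize() is iterated directly.
function collectTokensSync(tokenizer, value) {
  const tokens = tokenizer.tokenize(value); // a Promise, not an array
  const seen = [];
  for (const token of tokens) { // throws TypeError: a Promise is not iterable
    seen.push(token);
  }
  return seen;
}

// One possible fix: await the result before iterating.
async function collectTokens(tokenizer, value) {
  const tokens = await tokenizer.tokenize(value);
  const seen = [];
  for (const token of tokens) {
    seen.push(token);
  }
  return seen;
}

try {
  collectTokensSync(asyncTokenizer, 'hello world');
} catch (err) {
  console.log('sync path failed:', err instanceof TypeError);
}

collectTokens(asyncTokenizer, 'hello world').then((tokens) => {
  console.log('awaited path tokens:', tokens);
});
```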

To Reproduce

  1. Clone https://github.com/fuma-nama/orama-test
  2. Run pnpm dev
  3. You should see the errors shown above

Expected behavior

No error should be thrown in my example above.

Environment Info

Node: v22.13.1
Orama: 3.0.6

Affected areas

Search

Additional context

No response

@micheleriva
Member

On it, thanks for spotting this!
