Add Qwen3-0.6B language model support to pythainlp.lm and improve type annotations #1217
Conversation
@copilot move test to tests/extra
@copilot There are changes in the dev branch to fix the errors that occurred in Python 3.13. Merge the changes and run the tests again.
@copilot apply changes based on the comments in this thread
1. Add `import torch` in the `generate()` and `chat()` methods (`pythainlp/lm/qwen3.py`)
   - `torch.inference_mode()` requires torch to be imported in method scope (see the sketch below)
   - Previously torch was only imported in the `load_model()` method
   - Both `generate()` (line 147) and `chat()` (line 235) now import torch locally
2. Change the `AutoTokenizer` type annotation to `PreTrainedTokenizerBase` (`pythainlp/phayathaibert/core.py:221`)
   - `AutoTokenizer.from_pretrained()` returns a `PreTrainedTokenizerBase` instance
   - Matches the type annotations on lines 320 and 365 for consistency
   - More accurate type annotation (`AutoTokenizer` is a factory class, not instantiated directly)

These changes fix import scope issues and improve type annotation accuracy.

Co-authored-by: bact <128572+bact@users.noreply.github.com>
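Roughly, the pattern from both points looks like this (a sketch, not the actual `qwen3.py` code; the method bodies and the `max_new_tokens` parameter are illustrative):

```python
from transformers import AutoTokenizer, PreTrainedTokenizerBase


class Qwen3:
    def load_model(self, model_name: str = "Qwen/Qwen3-0.6B") -> None:
        # Point 2: AutoTokenizer is a factory class, so from_pretrained()
        # returns a PreTrainedTokenizerBase instance; annotate accordingly.
        self.tokenizer: PreTrainedTokenizerBase = AutoTokenizer.from_pretrained(model_name)
        # ... model loading elided; assume self.model is set here ...

    def generate(self, prompt: str, max_new_tokens: int = 256) -> str:
        # Point 1: torch must be bound in this scope for
        # torch.inference_mode(); a local import keeps module import light.
        import torch

        with torch.inference_mode():  # no-grad context for inference
            inputs = self.tokenizer(prompt, return_tensors="pt")
            output_ids = self.model.generate(**inputs, max_new_tokens=max_new_tokens)
        return self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
```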
Removed the `AutoTokenizer` import from transformers.
@copilot The dev branch is updated. It reorganized the grouping inside the test suites. Test cases with heavy imports should be in the "noauto" suite. Please rearrange the Qwen tests into a suitable test suite.
1. Sync with the dev branch (54 commits merged)
   - Test suite reorganization: noauto tests are now grouped by dependency ecosystem
   - Added noauto_torch, noauto_tensorflow, noauto_onnx, noauto_cython, and noauto_network suites
   - Added the braille module and comprehensive translation improvements
2. Move Qwen3 tests to the appropriate suite
   - Moved from `tests/extra/testx_lm.py` to `tests/noauto_torch/testn_lm_torch.py`
   - Changed the test class suffix from `TestCaseX` to `TestCaseN` (noauto convention)
   - Registered in the `tests/noauto_torch/__init__.py` test_packages list
   - Removed from the `tests/extra/__init__.py` test_packages list

Reason: Qwen3 uses torch and transformers (the PyTorch ecosystem), which are heavy dependencies (~2-3 GB). Per the updated test suite organization, tests with heavy ML framework dependencies belong in the noauto_torch suite, not the extra suite.

Co-authored-by: bact <128572+bact@users.noreply.github.com>
Moved Qwen3 tests from `tests/extra/testx_lm.py` to `tests/noauto_torch/testn_lm_torch.py`. The tests now follow the PyTorch ecosystem grouping in the noauto suite, alongside other PyTorch-based tests (spell, tag, tokenize). Test class renamed from `LMTestCaseX` to `LMTestCaseN`.

Commit: 91303d3
@copilot apply changes based on the comments in this thread
1. Add a type annotation to `__all__` in `pythainlp/lm/__init__.py`
   - Changed from `__all__ = [...]` to `__all__: list[str] = [...]`
   - Matches the codebase convention (e.g., `pythainlp/chat/__init__.py`)
2. Remove the try/except wrapper around the Qwen3 import in `__init__.py`
   - The wrapper was ineffective since `qwen3.py` uses lazy imports
   - Users would get a raw `ModuleNotFoundError` instead of a helpful message
   - Better to handle missing dependencies in the methods that actually need them
3. Add dependency checking in the Qwen3 methods (`qwen3.py`; see the sketch below)
   - `load_model()`: wrap the torch/transformers imports in try/except
   - `generate()`: wrap the torch import in try/except
   - `chat()`: wrap the torch import in try/except
   - All raise `ImportError` with a helpful message: "Install them with: pip install 'pythainlp[qwen3]'"
   - Catches both `ImportError` and `ModuleNotFoundError` for robustness

These changes ensure users get clear, actionable error messages when trying to use Qwen3 without installing the optional dependencies, rather than cryptic import errors.

Co-authored-by: bact <128572+bact@users.noreply.github.com>
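The error-handling shape from point 3, sketched (the method body and exact error text here are illustrative):

```python
def load_model(self, device_map: str = "auto") -> None:
    try:
        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer
    except (ImportError, ModuleNotFoundError) as exc:
        # ModuleNotFoundError already subclasses ImportError; naming both
        # mirrors the "for robustness" note in the commit message.
        raise ImportError(
            "Qwen3 requires the torch and transformers packages. "
            "Install them with: pip install 'pythainlp[qwen3]'"
        ) from exc
    # ... proceed to load the model and tokenizer ...
```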
Applied all changes from the review:
Commit: 432c2ab


What do these changes do
Adds support for Alibaba Cloud's Qwen3-0.6B language model to the `pythainlp.lm` module, providing a lightweight, Apache 2.0 licensed model option for Thai NLP tasks. Additionally, replaces generic `Any` type annotations with precise types from the transformers library across multiple model classes, for improved type safety and IDE support.

What was wrong

- Model classes used generic `Any` type annotations, providing no type information for static analysis
- Inconsistent `__all__` type annotations across modules

How this fixes it
New Qwen3 wrapper class (`pythainlp/lm/qwen3.py`), sketched below:

- `load_model()`: loads Qwen3-0.6B from HuggingFace with a configurable device and dtype; catches `ImportError`/`ModuleNotFoundError` for missing torch/transformers and provides install instructions; supports a `device_map` parameter for device placement (consistent with WangChanGLM)
- `generate()`: basic text generation from prompts, with input validation and dependency checking
- `chat()`: chat-based generation with message history support, input validation, and dependency checking
- Model and tokenizer attributes typed as `Optional[PreTrainedModel]` and `Optional[PreTrainedTokenizerBase]`
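Condensed, the public shape of the wrapper looks roughly like this (signatures inferred from this summary, not copied from the source):

```python
from typing import Optional

from transformers import PreTrainedModel, PreTrainedTokenizerBase


class Qwen3:
    def __init__(self) -> None:
        # None until load_model() is called, hence the Optional types
        self.model: Optional[PreTrainedModel] = None
        self.tokenizer: Optional[PreTrainedTokenizerBase] = None

    def load_model(self, device_map: str = "auto") -> None:
        """Load Qwen3-0.6B from HuggingFace; checks torch/transformers first."""

    def generate(self, prompt: str) -> str:
        """Generate text from a single prompt (validates input first)."""

    def chat(self, messages: list[dict]) -> str:
        """Generate a reply from a chat-style message history."""
```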
Type annotation improvements across the codebase:

- Uses `PreTrainedModel` and `PreTrainedTokenizerBase` from transformers
- Added the `__all__: list[str]` type annotation for consistency with other modules
- Uses the `WangChanGLM` type instead of `Any`; removed redundant cast operations
- Replaced `Any` with `Pipeline`, `PreTrainedTokenizerBase`, `AutoTokenizer`, `AutoModelForMaskedLM`, and `AutoModelForTokenClassification`
- Imports are placed under `TYPE_CHECKING` guards to avoid runtime overhead (see the sketch below)
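The `TYPE_CHECKING` pattern keeps transformers out of the runtime import path; a minimal illustration (`describe()` is a hypothetical helper, not part of the PR):

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers; transformers is never
    # imported when this module actually runs.
    from transformers import PreTrainedModel, PreTrainedTokenizerBase


def describe(model: PreTrainedModel, tokenizer: PreTrainedTokenizerBase) -> str:
    # With postponed evaluation of annotations (PEP 563), the hints above
    # stay as strings at runtime, so no real transformers import is needed.
    return f"{type(model).__name__} with {type(tokenizer).__name__}"
```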
Module integration:

- Qwen3 is exported from `pythainlp.lm` without a try/except wrapper (dependency checking is done in the methods)
- Added a `qwen3` optional dependency group (`torch>=1.9.0`, `transformers>=4.22.1`) with correct minimum versions, as sketched below
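The extras group might be declared along these lines in the packaging metadata (a sketch; the project's actual setup layout may differ):

```python
# setup.py (excerpt, illustrative)
extras_require = {
    "qwen3": [
        "torch>=1.9.0",
        "transformers>=4.22.1",
    ],
}
```

With this in place, `pip install 'pythainlp[qwen3]'` pulls in both heavy dependencies only for users who want the Qwen3 model.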
Tests:

- Tests live in `tests/noauto_torch/testn_lm_torch.py`, following the updated test suite conventions
- The test file uses the `testn_` prefix and the test class (`LMTestCaseN`) uses the `TestCaseN` suffix, per noauto naming conventions (see the skeleton below)
- Registered in the `tests/noauto_torch/__init__.py` test_packages list
- Placed in the `noauto_torch` suite for PyTorch-based tests with heavy dependencies (~2-3 GB)
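A skeleton of how such a test might look (the assertion body is illustrative; only the file name and class naming follow the conventions above):

```python
# tests/noauto_torch/testn_lm_torch.py -- skeleton; the module is also
# listed in test_packages in tests/noauto_torch/__init__.py
import unittest


class LMTestCaseN(unittest.TestCase):
    def test_qwen3_generate(self) -> None:
        from pythainlp.lm import Qwen3

        lm = Qwen3()
        lm.load_model()
        self.assertIsInstance(lm.generate("สวัสดี"), str)
```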
Code quality improvements:

- Added type annotations to `__all__` declarations

Usage example:
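A minimal sketch of the intended usage, based on the API described above (the prompts and keyword arguments are illustrative):

```python
from pythainlp.lm import Qwen3

lm = Qwen3()
lm.load_model(device_map="auto")  # downloads Qwen3-0.6B from HuggingFace

# Plain text generation from a prompt
print(lm.generate("ประเทศไทยมีกี่จังหวัด"))

# Chat-based generation with message history
history = [{"role": "user", "content": "สวัสดีครับ แนะนำตัวหน่อย"}]
print(lm.chat(history))
```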
Your checklist for this pull request