
Add Qwen3-0.6B language model support to pythainlp.lm and improve type annotations #1217

Open

Copilot wants to merge 38 commits into dev from copilot/add-qwen3-0-6b-model

Conversation

Copilot AI (Contributor) commented Jan 19, 2026

What do these changes do

Adds support for Alibaba Cloud's Qwen3-0.6B language model to the pythainlp.lm module, providing a lightweight Apache 2.0 licensed model option for Thai NLP tasks. Additionally, replaces generic Any type annotations with precise types from the transformers library across multiple model classes for improved type safety and IDE support.

What was wrong

  • The PyThaiNLP library lacked support for small, efficient language models suitable for resource-constrained tasks
  • Model and tokenizer instance variables used Any type annotations, providing no type information for static analysis
  • Inconsistent type annotations across different model classes (WangChanGLM, ChatBotModel, PhayaThaiBERT)
  • Torch import at module level created unnecessary runtime dependencies
  • Missing dependency checking led to unclear error messages when optional dependencies were not installed
  • Inconsistent __all__ type annotations across modules

How this fixes it

New Qwen3 wrapper class (pythainlp/lm/qwen3.py):

  • load_model(): Loads Qwen3-0.6B from HuggingFace with configurable device and dtype
    • Early CUDA availability check before model loading to fail fast and save resources
    • Comprehensive error handling for network errors, invalid paths, and memory issues
    • Dependency checking with helpful error messages: catches ImportError/ModuleNotFoundError for missing torch/transformers and provides install instructions
    • Tokenizer cleanup on model load failure to prevent inconsistent state
    • Uses device_map parameter for device placement (consistent with WangChanGLM)
    • Lazy torch import: torch is only imported when methods are called, not at module level (see the sketch after this list)
    • Informative error messages preserving original exception details
  • generate(): Basic text generation from prompts with input validation and dependency checking
  • chat(): Chat-based generation with message history support, input validation, and dependency checking
  • Complete type annotations using Optional[PreTrainedModel] and Optional[PreTrainedTokenizerBase]
  • Proper documentation for sampling parameters behavior
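
A condensed sketch of the load_model() flow described above (simplified; exact signatures and messages in the actual wrapper may differ):

class Qwen3:
    def __init__(self) -> None:
        self.model = None
        self.tokenizer = None

    def load_model(self, device: str = "cpu", torch_dtype=None) -> None:
        # Lazy imports: the heavy dependencies load only when this method runs.
        try:
            import torch
            from transformers import AutoModelForCausalLM, AutoTokenizer
        except (ImportError, ModuleNotFoundError) as exc:
            raise ImportError(
                "torch and transformers are required for Qwen3. "
                "Install them with: pip install 'pythainlp[qwen3]'"
            ) from exc

        # Fail fast: check CUDA availability before downloading any weights.
        if device.startswith("cuda") and not torch.cuda.is_available():
            raise RuntimeError("CUDA device requested but CUDA is not available.")

        self.tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
        try:
            # device_map handles placement, consistent with WangChanGLM.
            self.model = AutoModelForCausalLM.from_pretrained(
                "Qwen/Qwen3-0.6B", device_map=device, torch_dtype=torch_dtype
            )
        except Exception:
            # Drop the tokenizer so a failed load leaves no half-built state.
            self.tokenizer = None
            raise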

Type annotation improvements across codebase:

  • pythainlp/lm/qwen3.py: Uses PreTrainedModel and PreTrainedTokenizerBase from transformers
  • pythainlp/lm/__init__.py: Added __all__: list[str] type annotation for consistency with other modules
  • pythainlp/chat/core.py: Uses WangChanGLM type instead of Any, removed redundant cast operations
  • pythainlp/phayathaibert/core.py: Uses Pipeline, PreTrainedTokenizerBase, AutoTokenizer, AutoModelForMaskedLM, and AutoModelForTokenClassification
  • All transformers imports use TYPE_CHECKING guards to avoid runtime overhead (illustrated below)
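
The TYPE_CHECKING pattern from the last bullet, in miniature: the guard makes transformers types visible to static analyzers without importing the library at runtime:

from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Seen by mypy/IDEs only; never executed, so no runtime import cost.
    from transformers import PreTrainedModel, PreTrainedTokenizerBase


class Qwen3:
    def __init__(self) -> None:
        # String (forward) references resolve at type-check time.
        self.model: Optional["PreTrainedModel"] = None
        self.tokenizer: Optional["PreTrainedTokenizerBase"] = None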

Module integration:

  • Clean import from pythainlp.lm without try/except wrapper (dependency checking done in methods)
  • Helpful error messages when dependencies missing, with install instructions
  • Added qwen3 optional dependency group (torch>=1.9.0, transformers>=4.22.1) with correct minimum versions (sketched below)
  • Lazy importing pattern - torch only loaded when methods are called, not at module import
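
A sketch of how such an optional dependency group is typically declared, assuming setup.py-style extras_require (the project's actual packaging configuration may differ):

# setup.py (sketch)
extras_require = {
    "qwen3": [
        "torch>=1.9.0",
        "transformers>=4.22.1",
    ],
}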

Tests:

  • Tests added in tests/noauto_torch/testn_lm_torch.py following updated test suite conventions
  • Test file uses testn_ prefix and class uses LMTestCaseN suffix per noauto naming conventions
  • Registered in tests/noauto_torch/__init__.py test_packages list
  • Tests gracefully skip when optional dependencies are not installed (see the skip pattern sketched after this list)
  • Comprehensive test coverage including input validation edge cases
  • Properly categorized in noauto_torch suite for PyTorch-based tests with heavy dependencies (~2-3 GB)
  • Copyright headers follow project convention (2016-2026)
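
A minimal sketch of the graceful-skip pattern, under the assumption that input validation raises ValueError on empty prompts (the test method name is illustrative):

import unittest

try:
    import torch  # noqa: F401
    import transformers  # noqa: F401
    HAS_TORCH_DEPS = True
except (ImportError, ModuleNotFoundError):
    HAS_TORCH_DEPS = False


@unittest.skipUnless(HAS_TORCH_DEPS, "torch/transformers not installed")
class LMTestCaseN(unittest.TestCase):
    def test_generate_rejects_empty_prompt(self):
        from pythainlp.lm import Qwen3

        model = Qwen3()
        # Assumes input validation rejects empty prompts with ValueError.
        with self.assertRaises(ValueError):
            model.generate("")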

Code quality improvements:

  • Removed redundant type checks (device parameter already typed as str)
  • Added documentation explaining sampling parameter behavior with greedy decoding
  • Preserved multi-line message content in chat fallback template
  • Added safety documentation for tensor operations
  • Complete type annotation coverage including __all__ declarations
  • Lazy torch imports reduce module-level dependencies
  • Clear dependency error messages guide users to install required packages

Usage example:

# Install with: pip install pythainlp[qwen3]
from pythainlp.lm import Qwen3
import torch

model = Qwen3()
model.load_model(device="cpu", torch_dtype=torch.bfloat16)

# Basic generation
result = model.generate("สวัสดี")
print(result)

# Chat format
messages = [{"role": "user", "content": "สวัสดีครับ"}]
response = model.chat(messages)
print(response)

Your checklist for this pull request

  • Passed code styles and structures
  • Passed code linting checks and unit tests
Original prompt

This section details the original issue to resolve

<issue_title>Add Qwen/Qwen3-0.6B to pythainlp.lm</issue_title>
<issue_description>I found Qwen/Qwen3-0.6B. It is good enough for small tasks. I think we can add Qwen/Qwen3-0.6B to pythainlp.lm.

Not using

Model: https://huggingface.co/Qwen/Qwen3-0.6B

Apache license 2.0</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 4 commits January 19, 2026 14:56
wannaphong (Member) commented:

@copilot move test to tests/extra

Copilot AI changed the title from "[WIP] Add Qwen/Qwen3-0.6B to pythainlp.lm" to "Add Qwen3-0.6B language model support to pythainlp.lm" Jan 19, 2026
Copilot AI requested a review from wannaphong January 19, 2026 15:06
Copilot AI (Contributor, Author) commented Jan 19, 2026

@copilot move test to tests/extra

Done. Moved Qwen3 tests to tests/extra/testx_lm.py and reverted tests/core/test_lm.py to its original state. Commit: 2018813

Copilot AI requested a review from bact January 19, 2026 15:52
bact (Member) commented Jan 19, 2026

@copilot There are changes in the dev branch to fix the errors that occurred in Python 3.13. Merge the changes and run the tests again.

Copilot AI (Contributor, Author) commented Jan 20, 2026

@copilot There are changes in the dev branch to fix the errors that occurred in Python 3.13. Merge the changes and run the tests again.

Merged dev branch with Python 3.13 fixes. All tests pass. Merge commit: 0d104dd

coveralls commented Jan 20, 2026

Coverage Status

coverage: 64.837% (-0.5%) from 65.355% when pulling f529a9b on copilot/add-qwen3-0-6b-model into d55e615 on dev.

bact marked this pull request as ready for review January 20, 2026 08:00
bact added the enhancement label Jan 20, 2026
Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

bact (Member) commented Feb 6, 2026

@copilot apply changes based on the comments in this thread

1. Add 'import torch' in generate() and chat() methods (pythainlp/lm/qwen3.py)
   - torch.inference_mode() requires torch to be imported in method scope
   - Previously, torch was only imported in the load_model() method
   - Both generate() (line 147) and chat() (line 235) now import torch locally (see the sketch below)

2. Change AutoTokenizer type annotation to PreTrainedTokenizerBase (pythainlp/phayathaibert/core.py:221)
   - AutoTokenizer.from_pretrained() returns PreTrainedTokenizerBase instance
   - Matches type annotations on lines 320 and 365 for consistency
   - More accurate type annotation (AutoTokenizer is a factory class, not instantiated directly)

These changes fix import scope issues and improve type annotation accuracy.
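
A sketch of what item 1 amounts to inside generate(); the max_new_tokens argument is illustrative, and self.model/self.tokenizer are assumed to have been set by load_model():

def generate(self, prompt: str, max_new_tokens: int = 256) -> str:
    # Method-scope import: torch.inference_mode() below would otherwise
    # raise NameError, since torch is never imported at module level.
    try:
        import torch
    except (ImportError, ModuleNotFoundError) as exc:
        raise ImportError(
            "Install the dependencies with: pip install 'pythainlp[qwen3]'"
        ) from exc

    inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
    with torch.inference_mode():  # no autograd bookkeeping during inference
        output_ids = self.model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return self.tokenizer.decode(new_tokens, skip_special_tokens=True)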

Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.

Removed AutoTokenizer import from transformers.
bact (Member) commented Feb 6, 2026

@copilot dev branch is updated. It reorganized the grouping inside test suites.

Test cases with heavy imports should be in the "noauto" suite.
The "noauto" suite is further subgrouped by dependency family/ecosystem (these tend to cluster around machine learning frameworks).

Please rearrange tests for Qwen into suitable test suite.

Copilot AI and others added 2 commits February 6, 2026 11:05
…ranch

1. Sync with dev branch (54 commits merged)
   - Test suite reorganization: noauto tests now grouped by dependency ecosystem
   - Added noauto_torch, noauto_tensorflow, noauto_onnx, noauto_cython, noauto_network suites
   - Added braille module and comprehensive translation improvements

2. Move Qwen3 tests to appropriate suite
   - Moved from tests/extra/testx_lm.py to tests/noauto_torch/testn_lm_torch.py
   - Changed test class suffix from TestCaseX to TestCaseN (noauto convention)
   - Registered in tests/noauto_torch/__init__.py test_packages list (sketched below)
   - Removed from tests/extra/__init__.py test_packages list

Reason: Qwen3 uses torch and transformers (PyTorch ecosystem) which are heavy
dependencies (~2-3 GB). Per the updated test suite organization, tests with
heavy ML framework dependencies belong in noauto_torch suite, not extra suite.
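
The registration in item 2 presumably looks something like this (a hypothetical sketch; entries other than the Qwen3 module are illustrative):

# tests/noauto_torch/__init__.py (sketch)
test_packages: list[str] = [
    "tests.noauto_torch.testn_lm_torch",
    # ...other PyTorch-ecosystem test modules (spell, tag, tokenize, ...)
]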

Copilot AI (Contributor, Author) commented Feb 6, 2026

@copilot dev branch is updated. It reorganized the grouping inside test suites.

Test cases with heavy imports should be in the "noauto" suite.
...

Moved Qwen3 tests from tests/extra/testx_lm.py to tests/noauto_torch/testn_lm_torch.py.

The tests now follow the PyTorch ecosystem grouping in the noauto suite, alongside other PyTorch-based tests (spell, tag, tokenize). Test class renamed from LMTestCaseX to LMTestCaseN per noauto naming convention.

Commit: 91303d3

Copilot AI changed the title from "Add Qwen3-0.6B language model support to pythainlp.lm" to "Add Qwen3-0.6B language model support to pythainlp.lm and improve type annotations" Feb 6, 2026
bact requested a review from Copilot February 6, 2026 11:16
Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

bact (Member) commented Feb 6, 2026

@copilot apply changes based on the comments in this thread

1. Add type annotation to __all__ in pythainlp/lm/__init__.py
   - Changed from __all__ = [...] to __all__: list[str] = [...]
   - Matches codebase convention (e.g., pythainlp/chat/__init__.py)

2. Remove try/except wrapper around Qwen3 import in __init__.py
   - The wrapper was ineffective since qwen3.py uses lazy imports
   - Users would get raw ModuleNotFoundError instead of helpful message
   - Better to handle missing deps in the methods that actually need them

3. Add dependency checking in Qwen3 methods (qwen3.py)
   - load_model(): Wrap torch/transformers imports in try/except
   - generate(): Wrap torch import in try/except
   - chat(): Wrap torch import in try/except
   - All raise ImportError with helpful message: "Install them with: pip install 'pythainlp[qwen3]'"
   - Catches both ImportError and ModuleNotFoundError for robustness

These changes ensure users get clear, actionable error messages when trying to use
Qwen3 without installing the optional dependencies, rather than cryptic import errors
(see the sketch below).
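
Items 1 and 2 together leave pythainlp/lm/__init__.py looking roughly like this (a sketch; the module may export more names):

# pythainlp/lm/__init__.py (sketch)
__all__: list[str] = ["Qwen3"]

# A plain import, no try/except: qwen3.py defers its torch/transformers
# imports to method bodies, so importing the class itself cannot fail
# when the optional dependencies are missing.
from pythainlp.lm.qwen3 import Qwen3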

Copilot AI (Contributor, Author) commented Feb 6, 2026

@copilot apply changes based on the comments in this thread

Applied all changes from the review:

  1. Added list[str] type annotation to __all__
  2. Removed ineffective try/except wrapper (qwen3 imports successfully without deps due to lazy loading)
  3. Added proper dependency checking with helpful error messages in load_model(), generate(), and chat() methods

Commit: 432c2ab

Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.

sonarqubecloud bot commented Feb 6, 2026

Quality Gate failed

Failed conditions:
17.5% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud


Labels

enhancement (enhance functionalities)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Qwen/Qwen3-0.6B to pythainlp.lm

4 participants