Commit 0ec6e9b

Partial sync of codebase
1 parent 9f7f69d commit 0ec6e9b

16 files changed: +296 -139 lines changed

.github/workflows/build_wheels.yml

+5-5
@@ -15,13 +15,13 @@ jobs:
       matrix:
         # cibuildwheel builds linux wheels inside a manylinux container
         # it also takes care of procuring the correct python version for us
-        os: [ubuntu-latest, windows-latest, macos-13]
-        python-version: [38, 39, 310, 311, 312]
+        os: [ubuntu-latest, windows-latest, macos-latest]
+        python-version: [39, 310, 311, 312, 313]
 
     steps:
       - uses: actions/checkout@v4
 
-      - uses: pypa/cibuildwheel@v2.18.0
+      - uses: pypa/cibuildwheel@v2.21.2
         env:
           CIBW_BUILD: "cp${{ matrix.python-version}}-*"
 
@@ -37,7 +37,7 @@ jobs:
       fail-fast: false
       matrix:
         os: [ubuntu-latest]
-        python-version: [38, 39, 310, 311, 312]
+        python-version: [39, 310, 311, 312, 313]
 
     steps:
       - uses: actions/checkout@v4
@@ -48,7 +48,7 @@ jobs:
           platforms: arm64
 
       - name: Build wheels
-        uses: pypa/cibuildwheel@v2.18.0
+        uses: pypa/cibuildwheel@v2.21.2
         env:
           CIBW_BUILD: "cp${{ matrix.python-version}}-*"
           CIBW_ARCHS: aarch64
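
As a rough illustration of what the matrix change means for the `CIBW_BUILD` value, the Python sketch below expands the `cp${{ matrix.python-version}}-*` template for the old and new `python-version` lists. The helper name `cibw_build_selectors` is hypothetical and only mirrors the string template in the workflow; it is not part of cibuildwheel or GitHub Actions.

```python
# Hypothetical helper: mirrors the workflow's "cp${{ matrix.python-version}}-*" template.
def cibw_build_selectors(python_versions):
    """Return one CIBW_BUILD selector string per matrix entry."""
    return [f"cp{version}-*" for version in python_versions]


# Matrix values copied from the removed and added lines of the diff.
old_matrix = [38, 39, 310, 311, 312]
new_matrix = [39, 310, 311, 312, 313]

old_selectors = cibw_build_selectors(old_matrix)
new_selectors = cibw_build_selectors(new_matrix)

# Selectors that disappear from, or appear in, the build matrix with this commit.
dropped = sorted(set(old_selectors) - set(new_selectors))
added = sorted(set(new_selectors) - set(old_selectors))

print("dropped:", dropped)
print("added:", added)
```

Under these assumptions, the only selector that goes away is the CPython 3.8 one and the only new one is CPython 3.13, which lines up with the `requires-python` bump elsewhere in the commit.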

CHANGELOG.md

+28-2
@@ -2,11 +2,26 @@
 
 This is the changelog for the open source version of tiktoken.
 
+## [v0.8.0]
+
+- Support for `o1-` and `chatgpt-4o-` models
+- Build wheels for Python 3.13
+- Add possessive quantifiers to limit backtracking in regular expressions, thanks to @l0rinc!
+- Provide a better error message and type for invalid token decode
+- Permit tuples in type hints
+- Better error message for passing invalid input to `get_encoding`
+- Better error messages during plugin loading
+- Add a `__version__` attribute
+- Update versions of `pyo3`, `regex`, `fancy-regex`
+- Drop support for Python 3.8
+
 ## [v0.7.0]
+
 - Support for `gpt-4o`
 - Performance improvements
 
 ## [v0.6.0]
+
 - Optimise regular expressions for a 20% performance improvement, thanks to @paplorinc!
 - Add `text-embedding-3-*` models to `encoding_for_model`
 - Check content hash for downloaded files
@@ -16,14 +31,17 @@ This is the changelog for the open source version of tiktoken.
 Thank you to @paplorinc, @mdwelsh, @Praneet460!
 
 ## [v0.5.2]
+
 - Build wheels for Python 3.12
 - Update version of PyO3 to allow multiple imports
 - Avoid permission errors when using default cache logic
 
 ## [v0.5.1]
+
 - Add `encoding_name_for_model`, undo some renames to variables that are implementation details
 
 ## [v0.5.0]
+
 - Add `tiktoken._educational` submodule to better document how byte pair encoding works
 - Ensure `encoding_for_model` knows about several new models
 - Add `decode_with_offets`
@@ -32,23 +50,28 @@ Thank you to @paplorinc, @mdwelsh, @Praneet460!
 - Update versions of dependencies
 
 ## [v0.4.0]
+
 - Add `decode_batch` and `decode_bytes_batch`
 - Improve error messages and handling
 
 ## [v0.3.3]
+
 - `tiktoken` will now make a best effort attempt to replace surrogate pairs with the corresponding
-Unicode character and will replace lone surrogates with the Unicode replacement character.
+Unicode character and will replace lone surrogates with the Unicode replacement character.
 
 ## [v0.3.2]
+
 - Add encoding for GPT-4
 
 ## [v0.3.1]
+
 - Build aarch64 wheels
 - Make `blobfile` an optional dependency
 
 Thank you to @messense for the environment variable that makes cargo not OOM under emulation!
 
 ## [v0.3.0]
+
 - Improve performance by 5-20%; thank you to @nistath!
 - Add `gpt-3.5-turbo` models to `encoding_for_model`
 - Add prefix matching to `encoding_for_model` to better support future model versions
@@ -57,16 +80,19 @@ Thank you to @messense for the environment variable that makes cargo not OOM und
 - Add packaging metadata
 
 ## [v0.2.0]
-- Add ``tiktoken.encoding_for_model`` to get the encoding for a specific model
+
+- Add `tiktoken.encoding_for_model` to get the encoding for a specific model
 - Improve portability of caching logic
 
 Thank you to @fritzo, @arvid220u, @khanhvu207, @henriktorget for various small corrections
 
 ## [v0.1.2]
+
 - Avoid use of `blobfile` for public files
 - Add support for Python 3.8
 - Add py.typed
 - Improve the public tests
 
 ## [v0.1.1]
+
 - Initial release
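
The changelog mentions prefix matching in `encoding_for_model` (v0.3.0) and new `o1-` and `chatgpt-4o-` prefixes (v0.8.0). A minimal, self-contained sketch of how such a prefix lookup could work is shown below. The table contents and the function name `encoding_name_for_model_sketch` are assumptions made for illustration; this is not tiktoken's actual implementation.

```python
# Hypothetical lookup tables for illustration only; the real tables live inside tiktoken.
MODEL_TO_ENCODING = {
    "gpt-4o": "o200k_base",
}
MODEL_PREFIX_TO_ENCODING = {
    "o1-": "o200k_base",          # assumed mapping for the new `o1-` prefix
    "chatgpt-4o-": "o200k_base",  # assumed mapping for the new `chatgpt-4o-` prefix
}


def encoding_name_for_model_sketch(model_name):
    """Resolve a model name by exact match first, then by known prefix."""
    if model_name in MODEL_TO_ENCODING:
        return MODEL_TO_ENCODING[model_name]
    for prefix, encoding_name in MODEL_PREFIX_TO_ENCODING.items():
        if model_name.startswith(prefix):
            return encoding_name
    raise KeyError(f"Could not map {model_name!r} to an encoding")


print(encoding_name_for_model_sketch("o1-preview"))
print(encoding_name_for_model_sketch("chatgpt-4o-latest"))
```

The point of matching on a prefix rather than a full name is that a future dated or suffixed model name can resolve without a new release, which is the motivation the v0.3.0 entry gives.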

Cargo.toml

+2-2
@@ -1,6 +1,6 @@
 [package]
 name = "tiktoken"
-version = "0.7.0"
+version = "0.8.0"
 edition = "2021"
 rust-version = "1.57.0"
 
@@ -9,7 +9,7 @@ name = "_tiktoken"
 crate-type = ["cdylib"]
 
 [dependencies]
-pyo3 = { version = "0.20.0", features = ["extension-module"] }
+pyo3 = { version = "0.22.2", default-features = false, features = ["extension-module", "macros"] }
 
 # tiktoken dependencies
 fancy-regex = "0.13.0"

README.md

+1
@@ -128,3 +128,4 @@ setup(
 
 Then simply `pip install ./my_tiktoken_extension` and you should be able to use your
 custom encodings! Make sure **not** to use an editable install.
+

pyproject.toml

+3-3
@@ -1,13 +1,13 @@
 [project]
 name = "tiktoken"
-version = "0.7.0"
+version = "0.8.0"
 description = "tiktoken is a fast BPE tokeniser for use with OpenAI's models"
 readme = "README.md"
 license = {file = "LICENSE"}
 authors = [{name = "Shantanu Jain"}, {email = "[email protected]"}]
 dependencies = ["regex>=2022.1.18", "requests>=2.26.0"]
 optional-dependencies = {blobfile = ["blobfile>=2"]}
-requires-python = ">=3.8"
+requires-python = ">=3.9"
 
 [project.urls]
 homepage = "https://github.com/openai/tiktoken"
@@ -24,7 +24,7 @@ build-verbosity = 1
 
 linux.before-all = "curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y"
 linux.environment = { PATH = "$PATH:$HOME/.cargo/bin" }
-macos.before-all = "rustup target add aarch64-apple-darwin"
+macos.before-all = "rustup target add aarch64-apple-darwin x86_64-apple-darwin"
 
 skip = [
     "*-manylinux_i686",
