Commit f678d10

Merge pull request #12 from jquast/0.1.5
Release 0.1.5 for @philipc's PR #11
2 parents f2ae915 + 426d748 commit f678d10

File tree

15 files changed

+517
-483
lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -12,3 +12,4 @@ docs/_build
1212
htmlcov
1313
.coveralls.yml
1414
data
15+
.DS_Store

.travis.yml

Lines changed: 4 additions & 0 deletions
@@ -1,4 +1,5 @@
11
language: python
2+
sudo: false
23

34
env:
45
- TOXENV=py26
@@ -19,6 +20,9 @@ install:
1920

2021
script:
2122
- tox -e $TOXENV
23+
- if [[ $TOXENV == "py34" ]]; then
24+
tox -esa;
25+
fi
2226

2327
after_success:
2428
- if [[ $TOXENV == "py34" ]]; then

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
The MIT License (MIT)
2+
3+
Copyright (c) 2014 Jeff Quast <[email protected]>
4+
5+
Permission is hereby granted, free of charge, to any person obtaining a copy
6+
of this software and associated documentation files (the "Software"), to deal
7+
in the Software without restriction, including without limitation the rights
8+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9+
copies of the Software, and to permit persons to whom the Software is
10+
furnished to do so, subject to the following conditions:
11+
12+
The above copyright notice and this permission notice shall be included in all
13+
copies or substantial portions of the Software.
14+
15+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21+
SOFTWARE.

README.rst

Lines changed: 71 additions & 130 deletions
@@ -26,58 +26,31 @@
2626
Introduction
2727
============
2828

29-
This API is mainly for Terminal Emulator implementors -- any python program
30-
that attempts to determine the printable width of a string on a Terminal. It
31-
is implemented in python (no C library calls) and has no 3rd-party dependencies.
32-
33-
It is certainly possible to use your Operating System's ``wcwidth(3)`` and
34-
``wcswidth(3)`` calls if it is POSIX-conforming, but this would not be possible
35-
on non-POSIX platforms, such as Windows, or for alternative Python
36-
implementations, such as jython. It is also commonly many releases older
37-
than the most current Unicode Standard release files, which this project
38-
aims to track.
39-
40-
The most current release of this API is based from Unicode Standard release
41-
*7.0.0*, dated *2014-02-28, 23:15:00 GMT [KW, LI]* for table generated by
42-
file ``EastAsianWidth-7.0.0.txt`` and *2014-02-07, 18:42:08 GMT [MD]* for
43-
``DerivedCombiningClass-7.0.0.txt``.
29+
This API is mainly for Terminal Emulator implementors, or those writing
30+
programs that expect to be interpreted by a terminal emulator and wish to
31+
determine the printable width of a string on a Terminal.
4432

45-
Installation
46-
------------
33+
Usually, the length of the string is equivalent to the number of cells
34+
it occupies, except that there are also some categories of characters
35+
which occupy 2 or even 0 cells. POSIX-conforming systems provide
36+
``wcwidth(3)`` and ``wcswidth(3)``, whose interface this module mirrors
37+
precisely.
4738

48-
The stable version of this package is maintained on pypi, install using pip::
39+
This library aims to be forward-looking, portable, and most correct. The most
40+
current release of this API is based on the Unicode Standard release files:
4941

50-
pip install wcwidth
42+
``EastAsianWidth-8.0.0.txt``
43+
*2015-02-10, 21:00:00 GMT [KW, LI]*
5144

52-
Problem
53-
-------
54-
55-
You may have noticed some characters especially Chinese, Japanese, and
56-
Korean (collectively known as the *CJK Unified Ideographs*) consume more
57-
than 1 terminal cell. If you ask for the length of the string, ``u'コンニチハ'``
58-
(Japanese: Hello), it is correctly determined to be a length of **5** using
59-
the ``len()`` built-in.
60-
61-
However, if you were to print this to a Terminal Emulator, such as xterm,
62-
urxvt, Terminal.app, PuTTY, or iTerm2, it would consume **10** *cells* (columns).
63-
This causes problems for many of the text-alignment functions, such as ``rjust()``.
64-
On an 80-wide terminal, the following would wrap along the margin, instead
65-
of displaying it right-aligned as desired::
66-
67-
>>> text = u'コンニチハ'
68-
>>> print(text.rjust(80))
69-
コン
70-
ニチハ
45+
``DerivedGeneralCategory-8.0.0.txt``
46+
*2015-02-13, 13:47:11 GMT [MD]*
7147

72-
Solution
73-
--------
48+
Installation
49+
------------
7450

75-
This API allows one to determine the printable length of these strings,
76-
that the length of ``wcwidth(u'コ')`` is reported as ``2``, and
77-
``wcswidth(u'コンニチハ')`` as ``10``.
51+
The stable version of this package is maintained on PyPI; install using pip::
7852

79-
This allows one to determine the printable effects of displaying *CJK*
80-
characters on a terminal emulator.
53+
pip install wcwidth
8154

8255
wcwidth, wcswidth
8356
-----------------
@@ -89,39 +62,45 @@ To Display ``u'コンニチハ'`` right-adjusted on screen of 80 columns::
8962
>>> from wcwidth import wcswidth
9063
>>> text = u'コンニチハ'
9164
>>> print(u' ' * (80 - wcswidth(text)) + text)
92-
コンニチハ
9365

66+
Return Values
67+
-------------
68+
69+
``-1``
70+
Indeterminate (not printable).
9471

95-
Values
96-
------
72+
``0``
73+
Does not advance the cursor, such as NULL or Combining.
9774

98-
A general overview of return values:
75+
``2``
76+
Characters of category East Asian Wide (W) or East Asian
77+
Full-width (F) which are displayed using two terminal cells.
9978

100-
- ``-1``: indeterminate (see Todo_).
101-
- ``0``: do not advance the cursor, such as NULL.
102-
- ``2``: East_Asian_Width property values W and F (Wide and Full-width).
103-
- ``1``: all others.
79+
``1``
80+
All others.
10481

10582
``wcswidth()`` simply returns the sum of all values along a string, or
106-
``-1`` if it has occurred for any value returned by ``wcwidth()``. A more
107-
exacting list of conditions and return values may be found in the docstring
108-
for ``wcwidth()``.
83+
``-1`` in total if any part of the string results in -1. A more exact
84+
list of conditions and return values may be found in the docstring::
85+
86+
$ pydoc wcwidth
10987

110-
Discrepacies
111-
------------
11288

113-
There may be discrepancies with the determined printable width of of characters
114-
by *wcwidth* and the results of any given terminal emulator -- most commonly,
115-
emulators are using your Operating System's ``wcwidth(3)`` implementation which
116-
is often based on tables much older than the most current Unicode Specification.
117-
Python's determination of non-zero combining_ characters may also be based on an
118-
older specification.
89+
Discrepancies
90+
-------------
11991

120-
You may determine an exacting list of these discrepancies using files
121-
`wcwidth-libc-comparator.py`_ and `wcwidth-combining-comparator.py`_
92+
This library does its best to return the most appropriate return value for a
93+
very particular terminal user interface where a monospaced fixed-cell
94+
rendering is expected. As the POSIX Terminal programming interfaces do not
95+
provide any means to determine the unicode support level, we can only do our
96+
best to return the *correct* result for the given codepoint, and not what any
97+
particular terminal emulator does.
12298

123-
.. _`wcwidth-libc-comparator.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-libc-comparator.py
124-
.. _`wcwidth-combining-comparator.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-combining-comparator.py
99+
Python's determination of non-zero combining_ characters may also be based on
100+
an older specification.
101+
102+
You may determine an exacting list of these discrepancies using the project
103+
files `wcwidth-libc-comparator.py <https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-libc-comparator.py>`_ and `wcwidth-combining-comparator.py <https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-combining-comparator.py>`_.
125104

126105

127106
==========
@@ -140,22 +119,20 @@ Updating Tables
140119
The command ``python setup.py update`` will fetch the following resources:
141120

142121
- http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt
143-
- http://www.unicode.org/Public/UNIDATA/extracted/DerivedCombiningClass.txt
122+
- http://www.unicode.org/Public/UNIDATA/extracted/DerivedGeneralCategory.txt
144123

145-
And generate the table files `wcwidth/table_wide.py`_ and `wcwidth/table_comb.py`_.
124+
And generates the table files:
146125

147-
.. _`wcwidth/table_wide.py`: https://github.com/jquast/wcwidth/tree/master/wcwidth/table_wide.py
148-
.. _`wcwidth/table_comb.py`: https://github.com/jquast/wcwidth/tree/master/wcwidth/table_comb.py
126+
- `wcwidth/table_wide.py <https://github.com/jquast/wcwidth/tree/master/wcwidth/table_wide.py>`_
127+
- `wcwidth/table_zero.py <https://github.com/jquast/wcwidth/tree/master/wcwidth/table_zero.py>`_
149128

150129
wcwidth.c
151130
---------
152131

153132
This code was originally derived directly from C code of the same name,
154-
whose latest version is available at: `wcwidth.c`_ And is authored by
155-
Markus Kuhn -- 2007-05-26 (Unicode 5.0)
156-
157-
.. _`wcwidth.c`: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
158-
133+
whose latest version is available at
134+
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c and is authored by Markus Kuhn,
135+
2007-05-26 (Unicode 5.0).
159136

160137
Examples
161138
--------
@@ -167,85 +144,49 @@ This library is used in:
167144
- `jonathanslenders/python-prompt-toolkit`_, a Library for building powerful
168145
interactive command lines in Python.
169146

170-
Additional tools for displaying and testing wcwidth is found in the ``bin/``
171-
folder of this project (github link: `wcwidth/bin`_). They are not distributed
172-
as a script or part of the module.
147+
Additional tools for displaying and testing wcwidth are found in the `bin/
148+
<https://github.com/jquast/wcwidth/tree/master/bin>`_ folder of this project. They are not
149+
distributed as a script or part of the module.
173150

174151
.. _`jquast/blessed`: https://github.com/jquast/blessed
175152
.. _`jonathanslenders/python-prompt-toolkit`: https://github.com/jonathanslenders/python-prompt-toolkit
176-
.. _`wcwidth/bin`: https://github.com/jquast/wcwidth/tree/master/bin
177-
178-
Todo
179-
----
180-
181-
Though some of the most common ("zero-width") `combining`_ characters
182-
are understood by wcswidth, there are still many edge cases that need
183-
to be covered, especially certain kinds of sequences such as those
184-
containing Control-Sequence-Inducer (CSI).
185-
186-
187-
License
188-
-------
189153

190-
The original license is as follows::
191-
192-
Permission to use, copy, modify, and distribute this software
193-
for any purpose and without fee is hereby granted. The author
194-
disclaims all warranties with regard to this software.
195-
196-
No specific licensing is specified, and Mr. Kuhn resides in the UK which allows
197-
some protection from Copyrighting. As this derivative is based on US Soil,
198-
an OSI-approved license that appears most-alike has been chosen, the MIT license::
199-
200-
The MIT License (MIT)
201-
202-
Copyright (c) 2014 <[email protected]>
203-
204-
Permission is hereby granted, free of charge, to any person obtaining a copy
205-
of this software and associated documentation files (the "Software"), to deal
206-
in the Software without restriction, including without limitation the rights
207-
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
208-
copies of the Software, and to permit persons to whom the Software is
209-
furnished to do so, subject to the following conditions:
210-
211-
The above copyright notice and this permission notice shall be included in
212-
all copies or substantial portions of the Software.
213-
214-
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
215-
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
216-
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
217-
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
218-
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
219-
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
220-
THE SOFTWARE.
221154

222155
Changes
223156
-------
224157

225-
0.1.4
158+
0.1.5 *2015-09-13 Alpha*
159+
* **Bugfix**:
160+
Resolution of "combining character width": characters that
161+
previously returned -1 now often (correctly) return 0.
162+
Resolved by `Philip Craig`_ via `PR #11`_.
163+
164+
0.1.4 *2014-11-20 Pre-Alpha*
226165
* **Feature**: ``wcswidth()`` now determines printable length
227166
for (most) combining characters. The developer's tool
228167
`bin/wcwidth-browser.py`_ is improved to display combining_
229168
characters when provided the ``--combining`` option
230169
(`Thomas Ballinger`_ and `Leta Montopoli`_ `PR #5`_).
231170
* added static analysis (prospector_) to testing framework.
232171

233-
0.1.3
172+
0.1.3 *2014-10-29 Pre-Alpha*
234173
* **Bugfix**: 2nd parameter of wcswidth was not honored.
235-
(`Thomas Ballinger`_, `PR #4`).
174+
(`Thomas Ballinger`_, `PR #4`_).
236175

237-
0.1.2
176+
0.1.2 *2014-10-28 Pre-Alpha*
238177
* **Updated** tables to Unicode Specification 7.0.0.
239-
(`Thomas Ballinger`_, `PR #3`).
178+
(`Thomas Ballinger`_, `PR #3`_).
240179

241-
0.1.1
180+
0.1.1 *2014-05-14 Pre-Alpha*
242181
* Initial release to pypi, Based on Unicode Specification 6.3.0
243182

244183
.. _`prospector`: https://github.com/landscapeio/prospector
245184
.. _`combining`: https://en.wikipedia.org/wiki/Combining_character
246185
.. _`bin/wcwidth-browser.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-browser.py
247186
.. _`Thomas Ballinger`: https://github.com/thomasballinger
248187
.. _`Leta Montopoli`: https://github.com/lmontopo
188+
.. _`Philip Craig`: https://github.com/philipc
249189
.. _`PR #3`: https://github.com/jquast/wcwidth/pull/3
250190
.. _`PR #4`: https://github.com/jquast/wcwidth/pull/4
251191
.. _`PR #5`: https://github.com/jquast/wcwidth/pull/5
192+
.. _`PR #11`: https://github.com/jquast/wcwidth/pull/11
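
The return values described in the README hunk above can be approximated with the standard library alone. The following is a minimal sketch, not the package's implementation: `approx_wcwidth`, `approx_wcswidth`, and `rjust_cells` are hypothetical names, the zero-width rule (general categories Mn and Me) is an assumption, and results depend on the Unicode version bundled with the running Python 3 interpreter rather than on the project's generated tables.

```python
import unicodedata


def approx_wcwidth(ch):
    """Approximate the cell width of one character (assumed rules)."""
    if ch == "\x00":
        return 0   # NULL does not advance the cursor
    category = unicodedata.category(ch)
    if category == "Cc":
        return -1  # other control characters: indeterminate (not printable)
    if category in ("Mn", "Me"):
        return 0   # combining marks: assumed zero-width
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2   # East Asian Wide or Full-width
    return 1       # all others


def approx_wcswidth(text):
    """Sum the widths of a string, or return -1 if any character is -1."""
    total = 0
    for ch in text:
        width = approx_wcwidth(ch)
        if width < 0:
            return -1
        total += width
    return total


def rjust_cells(text, columns=80):
    """Right-adjust text by terminal cells instead of by len()."""
    width = approx_wcswidth(text)
    if width < 0:
        return text  # width is indeterminate, so leave the text unpadded
    return " " * max(0, columns - width) + text


print(approx_wcswidth("コンニチハ"))        # 10 cells, although len() is 5
print(len(rjust_cells("コンニチハ", 80)))   # 75 characters filling 80 cells
```

Padding by cell count rather than by `len()` is what keeps the wide-character example from wrapping at the right margin of an 80-column terminal.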

bin/wcwidth-browser.py

Lines changed: 5 additions & 5 deletions
@@ -37,7 +37,7 @@
3737
import signal
3838

3939
# local
40-
from wcwidth import wcwidth, table_comb
40+
from wcwidth.wcwidth import _bisearch, wcwidth, COMBINING
4141

4242
# 3rd-party
4343
from blessed import Terminal
@@ -116,6 +116,7 @@ def __init__(self, width=2):
116116
self.characters = (unichr(idx)
117117
for idx in xrange(LIMIT_UCS)
118118
if wcwidth(unichr(idx)) == width
119+
and not _bisearch(idx, COMBINING)
119120
)
120121

121122
def __iter__(self):
@@ -152,13 +153,13 @@ def __init__(self, width=1):
152153
"""
153154
self.characters = []
154155
letters_o = (u'o' * width)
155-
for boundaries in table_comb.NONZERO_COMBINING:
156+
for boundaries in COMBINING:
156157
for val in [_val for _val in
157158
range(boundaries[0], boundaries[1] + 1)
158159
if _val <= LIMIT_UCS]:
159160
self.characters.append(letters_o[:1] +
160161
unichr(val) +
161-
letters_o[1:])
162+
letters_o[wcwidth(unichr(val))+1:])
162163
self.characters.reverse()
163164

164165
def __iter__(self):
@@ -647,8 +648,7 @@ def text_entry(self, ucs, name):
647648
delimiter = style.attr_minor(style.delimiter)
648649
if len(ucs) != 1:
649650
# determine display of combining characters
650-
val = ord(next((_ucs for _ucs in ucs
651-
if wcwidth(_ucs) == -1)))
651+
val = ord(ucs[1])
652652
# a combining character displayed of any fg color
653653
# will reset the foreground character of the cell
654654
# combined with (iTerm2, OSX).
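
The hunks above call `_bisearch(idx, COMBINING)` and iterate `COMBINING` as pairs of `boundaries[0]` and `boundaries[1]`, which suggests a sorted table of inclusive code point ranges. A range lookup of that kind might be sketched as below. This is an illustration under that assumption, written for Python 3, and not the module's own code: `bisearch_sketch` and `TOY_COMBINING` are hypothetical names, and the two ranges are a toy table rather than the generated one.

```python
def bisearch_sketch(ucs, table):
    """Return True if code point `ucs` lies in any (start, end) range.

    `table` is assumed to be sorted and non-overlapping, with inclusive
    bounds, so a binary search over the ranges is sufficient.
    """
    lo, hi = 0, len(table) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        start, end = table[mid]
        if ucs < start:
            hi = mid - 1   # target lies before this range
        elif ucs > end:
            lo = mid + 1   # target lies after this range
        else:
            return True    # start <= ucs <= end
    return False


# Toy table for illustration only (hypothetical, not the project's data).
TOY_COMBINING = (
    (0x0300, 0x036F),
    (0x0483, 0x0489),
)

print(bisearch_sketch(0x0301, TOY_COMBINING))    # True
print(bisearch_sketch(ord("a"), TOY_COMBINING))  # False
```

Storing ranges instead of individual code points keeps such a table small, and the binary search keeps each lookup logarithmic in the number of ranges.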
