Discussion:
[Tkinter-discuss] Inserting a unicode zero-width nonbreaking space into a Text widget, from Tkinter, on a Mac
Kenneth McDonald
2004-03-25 01:46:20 UTC
As the subject says, I am attempting to insert a particular Unicode
character (\ufeff) into a Text widget from Python. This is not quite
working correctly, and I'm not sure whether the problem is between
Python and Tcl, or whether OS X does not properly know the character.

This character is supposed to be a 0-width space, and Python's
unicodedata.name function confirms that FEFF is in fact the proper
Unicode code point. The character is constructed using a one-character
Python string:

zws = u"\ufeff"

And then passed to Text's insert command. Unfortunately, what shows
up on screen is a complex (perhaps Chinese) Asian ideograph.
"Python in a Nutshell" indicates that all communication between
Tkinter and Tk is in Unicode, so I had hoped this would transfer
correctly.

I'm not sure whether I need to do some conversion... could there be
a conflict between a straight Unicode (16-bit) representation
and a UTF-8 representation? Or do I need to change an OS setting
so that the OS and the internal representation are on
the same wavelength?
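The 16-bit versus UTF-8 question can be checked directly from Python: U+FEFF is a single code point, and UTF-8 and UTF-16 are just two byte serializations of it. A minimal sketch, assuming Python 3 (where the thread's u"\ufeff" is an ordinary str); the helper name describe is hypothetical:

```python
import unicodedata

def describe(ch):
    """Return the code point, Unicode name, category and two byte encodings of ch."""
    return {
        "codepoint": "U+%04X" % ord(ch),
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),  # "Cf" means a format (invisible) character
        "utf-8": ch.encode("utf-8"),            # three bytes
        "utf-16-be": ch.encode("utf-16-be"),    # one 16-bit unit, big-endian
    }

for key, value in describe("\ufeff").items():
    print(key, value)
```

Both byte strings decode back to the same one character, so a 16-bit versus UTF-8 mismatch would only matter if raw bytes, rather than a Unicode string, were being handed to Tk.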

Thanks,
Ken
Stewart Midwinter
2004-03-25 01:57:42 UTC
A zero-width space? Is that an oxymoron like "military intelligence"? I'm
curious, what would be the usual application for such a beast?
--
Stewart Midwinter
Calgary, Alberta
stewart 'at' midwinter 'dot' ca
Jeff Epler
2004-03-25 04:54:48 UTC
In *theory*,
import Tkinter
t = Tkinter.Text()  # any Text widget
t.insert(Tkinter.END, u"a\ufeffb")
should work just fine. However, I doubt Tk's text engine has advanced
enough text layout to respond appropriately to a zero-width
non-breaking space. (Heck, I just learned my Mozilla doesn't properly
render \u200b, a breaking zero-width space.)
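One way to test the "in theory" part without involving fonts at all is to hand the string to a bare Tcl interpreter and ask Tcl what it received. A minimal sketch, assuming Python 3 with tkinter installed; tkinter.Tcl() creates an interpreter without loading Tk, so no window or display is needed. The helper name tcl_roundtrip is hypothetical:

```python
import tkinter

def tcl_roundtrip(text):
    """Hand text to Tcl, then report what Tcl holds and what comes back."""
    tcl = tkinter.Tcl()          # Tcl-only interpreter; Tk (and a display) is not loaded
    tcl.call("set", "s", text)   # store the Python string in the Tcl variable s
    length = int(tcl.eval("string length $s"))
    # Ask Tcl itself whether index 1 is U+FEFF, using Tcl's own \u escape
    # (the raw string keeps Python from expanding \ufeff first).
    second_is_feff = tcl.eval(r"string equal [string index $s 1] \ufeff") == "1"
    back = tcl.eval("set s")     # read the value back into Python
    return length, second_is_feff, back

print(tcl_roundtrip("a\ufeffb"))
```

A length of 3 and a True flag suggest the Python-to-Tcl handoff is lossless, which would leave font selection and rendering as the place where a wrong glyph creeps in.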

Anyway, Tk has some complicated machinery to look through a number of
operating system fonts in various encodings to find a font where the
encoding of the character maps onto an existing glyph. When this fails,
it shows a placeholder (an empty rectangle, which is what I got in my
test on a Linux system) or an \xXXXX escape code. If it shows an
incorrect character, I'd be tempted to blame either a font with a glyph
at the position that u"\ufeff" (erroneously) encodes to, or a bad Tcl
encoding that (erroneously) encodes u'\ufeff' to the wrong thing. It
doesn't make things easier that Tcl has its own encoding machinery and
list of fonts to try.
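The first question in that search, whether a character can be expressed in a given font encoding at all, has a rough Python analogue. This sketch uses Python's own codecs, not Tcl's encoding tables, and the candidate list is illustrative rather than Tk's actual list; the helper name encodable_in is hypothetical:

```python
def encodable_in(ch, encodings):
    """Map each codec name to True if ch can be encoded in it, else False."""
    result = {}
    for enc in encodings:
        try:
            ch.encode(enc)
            result[enc] = True
        except UnicodeEncodeError:
            result[enc] = False
    return result

# A few single- and multi-byte encodings a font might use (illustrative only).
candidates = ["latin-1", "cp1252", "mac_roman", "utf-8"]
print(encodable_in("\ufeff", candidates))
```

U+FEFF has no slot in the legacy 8-bit encodings here, so a font limited to one of them can never supply the glyph; Tk has to keep looking, and if nothing matches it falls back to the placeholder described above.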

Jeff
Justin Ezequiel
2004-11-03 12:50:38 UTC
Post by Jeff Epler
It doesn't make
things easier that Tcl has its own encoding machinery and list of fonts
to try.
Jeff
Can you explain more about this or can you point me to where I can read more
about this?
Jeff Epler
2004-11-04 04:18:00 UTC
Post by Justin Ezequiel
Post by Jeff Epler
It doesn't make
things easier that Tcl has its own encoding machinery and list of fonts
to try.
Can you explain more about this or can you point me to where I can read more
about this?
Well, there's the source:
http://cvs.sourceforge.net/viewcvs.py/tktoolkit/tk/generic/tkFont.c
http://cvs.sourceforge.net/viewcvs.py/tktoolkit/tk/unix/tkUnixFont.c
http://cvs.sourceforge.net/viewcvs.py/tktoolkit/tk/unix/tkUnixRFont.c
http://cvs.sourceforge.net/viewcvs.py/tktoolkit/tk/win/tkWinFont.c
http://cvs.sourceforge.net/viewcvs.py/tktoolkit/tk/macosx/tkMacOSXFont.c

The "list of fonts to try" I mentioned is in the generic tkFont.c.

I'm the most familiar with tkUnixFont, and I don't know how similar the
others are.

For each character, Tk searches for an X font from the same "family"
that has that character. Since X fonts can be in a variety of
encodings, this gets even more exciting. There are rules about what
fonts and encodings are preferred, but before giving up on finding a
particular character, Tk will try almost any font on the system.
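The per-character search described above can be modelled as a two-stage lookup: preferred fonts first, then anything else on the system, then a placeholder. This is a simplified sketch, not the algorithm in tkFont.c or tkUnixFont.c; real Tk also weighs encodings and other preference rules, and the font names and glyph sets below are made up:

```python
from typing import Dict, List, Optional, Set, Tuple

PLACEHOLDER = "\u25a1"  # WHITE SQUARE, standing in for Tk's empty rectangle

def pick_font(ch: str, family: List[str], system: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first font that has a glyph for ch, or None."""
    # Stage 1: preferred fonts, e.g. members of the requested family.
    for name in family:
        if ch in system.get(name, set()):
            return name
    # Stage 2: before giving up, try any other font on the system.
    for name, glyphs in system.items():
        if name not in family and ch in glyphs:
            return name
    return None

def layout(text: str, family: List[str],
           system: Dict[str, Set[str]]) -> List[Tuple[str, Optional[str]]]:
    """Pair every character with the font chosen for it, or with a placeholder."""
    runs = []
    for ch in text:
        font = pick_font(ch, family, system)
        runs.append((ch, font) if font is not None else (PLACEHOLDER, None))
    return runs

fonts = {"helvetica": set("ab"), "cjk-font": {"\u4e00"}}  # hypothetical fonts
print(layout("a\ufeffb\u4e00", ["helvetica"], fonts))
```

Here U+FEFF is in no font, so it becomes the placeholder, while the ideograph is found only in the second, system-wide stage.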

Jeff
Benjamin Riefenstahl
2004-03-25 12:55:57 UTC
Hi Kenneth,
Post by Kenneth McDonald
And then passed to Text's insert command. Unfortunately, what shows
up on screen is a complex (perhaps Chinese) asian ideograph.
Tk's current Mac OS X text rendering implementation is based on
QuickDraw, which is not Unicode-based and is very limited for anything
outside your current locale. The effect you observe is common with
that implementation.

If you can recompile, try the ATSU patch from
<http://sourceforge.net/tracker/?group_id=12997&atid=312997&func=detail&aid=638966>;
that should work for you. One of its biggest issues at the moment is
speed, which I am actively working on.


benny
Jeff Epler
2004-03-25 13:59:48 UTC
[I removed Cc: tcl-***@lists.sourceforge.net because I got a letter
about moderator approval the last time I sent a message. Apologies to
everyone who won't see this message]

Thanks for the info, Benjamin. I've just created a page on the Wiki
about Unicode, and included a reference to this post. If you'd like to
add more information about the situation on OS X, please drop by.
http://tkinter.unpy.net/wiki/UnicodeSupport

Jeff