On Sun, 2003-01-12 at 07:07, Tomasz Wegrzanowski wrote:
On Sun, Jan 12, 2003 at 06:48:08AM -0800, Toby Bartels wrote:
As for the forbidden numerical character entities
from &#128; to &#154;,
we can interpret them as if they came from Micro$oft (most likely)
and convert them to whatever they should be (by table).
(If any other forbidden numerical entities have common nonstandard uses,
then we can adopt those as well as long as they translate to good Unicode.)
They translate to Unicode 128-154.
Unicode 0-255 is identical with ISO-8859-1.
There are many other Unicode (and ISO-8859) characters that mean nothing,
so this is not a problem.
Well, it's a problem when people trying to write the euro symbol, the
French oe-ligature, or Slovene/Czech accented letters get mysterious
high control characters instead of the characters that they typed
legally in CP1252. Their browser shouldn't have given them the option,
since it was told to use ISO 8859-1, but that's not the users' fault.
I'd rather silently do input conversion from CP1252 to UTF-8 (thus
preserving those nasty Microsoft extensions as good Unicode characters),
and output conversion to ISO 8859-1 to keep with standards.
There's no legitimate use of ISO 8859-1's or Unicode's 128-154 range
that I know of except conceivably in terminal control. In plaintext on
the web, they're 100% useless, so if they show up it's safe to assume
they're really CP1252.
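
To make the proposal concrete, here is a rough sketch in Python of what I mean. It is only an illustration, not code from the wiki software: the function names (decode_form_input, fix_numeric_entities, encode_output) are made up for this example. It assumes form input arrives as raw bytes, treats anything in the 128-159 range as CP1252, leaves the five byte values that CP1252 does not define untouched, and falls back to numeric character references on output for anything ISO 8859-1 cannot hold.

```python
import re


def _cp1252_char(code: int) -> str:
    """Map a code in 0x80-0x9F to its CP1252 character, if CP1252 defines one."""
    try:
        return bytes([code]).decode("cp1252")
    except UnicodeDecodeError:
        # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in CP1252; keep as-is.
        return chr(code)


def decode_form_input(raw: bytes) -> str:
    """Decode submitted bytes as ISO 8859-1, then reread the C1 range as CP1252."""
    text = raw.decode("latin-1")  # never fails: every byte maps to a code point
    return re.sub("[\u0080-\u009f]", lambda m: _cp1252_char(ord(m.group())), text)


def fix_numeric_entities(text: str) -> str:
    """Rewrite forbidden decimal references such as &#128; by the same table."""
    def fix(m: re.Match) -> str:
        code = int(m.group(1))
        if 0x80 <= code <= 0x9F:
            return _cp1252_char(code)
        return m.group(0)  # legitimate references pass through unchanged
    return re.sub(r"&#(\d+);", fix, text)


def encode_output(text: str) -> bytes:
    """Encode for an ISO 8859-1 page; other characters become &#NNNN; references."""
    return text.encode("latin-1", errors="xmlcharrefreplace")
```

With this, a euro sign typed as byte 0x80 is stored as U+20AC and goes back out as a numeric reference, so the page stays valid ISO 8859-1 while the character survives.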
-- brion vibber (brion @ pobox.com)