"The Natural Philosopher" wrote
| Does any one else suspect that this post is utter bunk? UTF 8 is
| multibyte character sequences and its not necessarily compatible with
| HTML which uses straight, not curly, brackets.
|
I didn't say curly brackets. I said curly quotes. If you look
at a typical English-language webpage online you'll find it's
usually pure ASCII. When it's not, the only non-ASCII is typically
a few things like curly quotes or space characters encoded as
UTF-8. I'm guessing some editors insert those automatically,
since it's not an easy job to type an article with curly quotes when
straight quotes work just as well. What I'm saying is that most of
the Internet still doesn't need more than ASCII, or one of the ANSI
code pages in Europe.
HTML is just text. It started out as ASCII and ended up
with META tags to specify charset, so browsers could
accommodate non-English languages. But most of it was
English, and much of it still is. So it was 1 byte per character.
That worked fine for most situations; everything but DBCS
(double-byte character set) languages.
As the Internet expanded and computing became mainstream
around the world, we needed to adapt. ANSI was working for
most languages but not for Chinese, Japanese, etc. So, what
to do? It could go to UTF-16, but fixed 2-byte characters still
wouldn't cover all of Unicode (UTF-16 needs 4-byte surrogate pairs
for the rest), and it would require a radical shift to 2-byte
characters, breaking the Internet and breaking computing.
Text files on Windows still default to ANSI.
It could go to 4-byte characters. That would work, but it
would still break everything. Editors and browsers would need
to all be rewritten.
UTF-8 provided a smooth, easy solution. It accommodates
the millions of pages and files that are still essentially ASCII.
Unlike with UTF-16 or UTF-32, we don't have to pad every
ASCII character with null bytes in order to encode it.
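To make that concrete, here's a small Python sketch (my own
illustration; the sample string is arbitrary) that encodes the same
ASCII text three ways and compares the bytes:

```python
# Encode the same ASCII-only text in three Unicode encodings.
text = "HTML"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")  # little-endian, no byte order mark
utf32 = text.encode("utf-32-le")  # little-endian, no byte order mark

# For pure ASCII text the UTF-8 bytes are identical to the ASCII bytes.
print(utf8 == text.encode("ascii"))        # True
print(len(utf8), len(utf16), len(utf32))   # 4 8 16
print(utf16)                               # b'H\x00T\x00M\x00L\x00'
```

The UTF-16 version carries one null byte per character and the
UTF-32 version carries three, while the UTF-8 version is the same
file an ASCII-only tool would have written.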
UTF-8 allows plain ASCII to still be used unchanged (characters
above 127 in the ANSI code pages do have to be re-encoded). But it
also provides a way to fully support multi-byte characters only
where necessary. It's the one solution to support all languages
without changing the default of 1 character to 1 byte for ASCII text.
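A rough Python sketch of the "only where necessary" part. The
sample characters are just ones I picked (a Latin letter, an
accented letter, a curly quote, a Chinese character and an emoji),
written as escapes so the snippet itself stays pure ASCII:

```python
# How many bytes does UTF-8 spend on each character?
samples = ["A", "\u00e9", "\u2019", "\u4e2d", "\U0001F600"]

# Map each character to the length of its UTF-8 encoding.
widths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in widths.items():
    print(f"U+{ord(ch):04X} takes {n} byte(s) in UTF-8")
# U+0041 takes 1 byte(s) in UTF-8
# U+00E9 takes 2 byte(s) in UTF-8
# U+2019 takes 3 byte(s) in UTF-8
# U+4E2D takes 3 byte(s) in UTF-8
# U+1F600 takes 4 byte(s) in UTF-8
```

Only the characters outside ASCII cost extra bytes; everything else
stays at 1 byte each.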
| UTF8 is a layer above HTML. Its down to the browser and it's access to
| fonts to render it correctly and the server to specify that its in use.
|
UTF-8 is not a layer. It's a character encoding. The HTML is
plain text. The META content type tag specifies how that
text is encoded. However it's done, it's still plain text. Fonts
are a whole other kettle of fish.
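Here's a small Python sketch of why the declared charset matters.
It's only an illustration: the sample text is mine, and cp1252
stands in for whatever ANSI code page a browser might assume. The
bytes never change; only the interpretation does:

```python
# The bytes a server might send for "It's" written with a curly apostrophe.
raw = "It\u2019s".encode("utf-8")
print(raw)  # b'It\xe2\x80\x99s'

# Decoded with the charset the page was actually written in.
as_utf8 = raw.decode("utf-8")

# Decoded as if the page had been declared as Windows-1252 instead.
as_cp1252 = raw.decode("cp1252")

print(ascii(as_utf8))    # 'It\u2019s'  (one curly apostrophe)
print(ascii(as_cp1252))  # 'It\xe2\u20ac\u2122s'  (three junk characters)
```

That second result is the familiar garbage you see on pages where
the charset declaration and the real encoding don't match.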
--- SoupGate-Win32 v1.05
* Origin: Agency HUB, Dunedin - New Zealand | FidoUsenet Gateway (3:770/3)