| echo: | |
|---|---|
| to: | |
| from: | |
| date: | |
| subject: | Multi-byte character sets |
LE>> Frankly, I think we should just go to Unicode.

JdBP>> By Unicode I presume you mean UCS-2. That would mean a new PKT
JdBP>> file format, of course. It would also be highly inefficient for
JdBP>> text that was mostly ISO 8859-1, since every other byte would be
JdBP>> zero. (Although I do wonder how much of that would be eliminated
JdBP>> by ZIP, RAR, ARJ, ARC, and suchlike.)
JdBP>> UTF-8 would be better.

LE> I've been told there's a format where you give an "intro code" that
LE> IDs the character subset, (essentially that first byte) and then only
LE> have to use 16-byte chars for stuff that *isn't* in that set. Sort of
LE> a "condensed mode"

I very much doubt that what you describe exists. There would be no way to distinguish 8-bit characters from 16-bit characters.

As I said, UTF-8 would be better than UCS-2. Aside from the storage inefficiency and the problems with all of those zero bytes, there's the problem of endianism to consider with UCS-2, as well. UTF-8 doesn't suffer from any of these.

LE> Also, from what I've seen of Unicode, a message that was in full
LE> 16-bit format and mostly *ASCII* is where the high byte would be
LE> zero. The characters present in ISO 8859-1 that aren't present in
LE> ASCII are spread over *several* unicode "sets".

I don't know where you read that, but it's wrong. The ISO 8859-1 character set occupies positions 0 to 255 of the Unicode character set. It was deliberately designed this way. Because of this, messages written in Cyrillic will have non-zero high bytes, as will (parts of) messages that use line drawing and box drawing characters, but the majority of messages written in Western European languages will have every second byte set to zero if using UCS-2.

» JdeBP «

--- FleetStreet 1.22 NR
 * Origin: JdeBP's point, using Squish (2:257/609.3)
SEEN-BY: 201/0 100 200 209 300 329 400 407 411 505 600 203/600 204/450 700
SEEN-BY: 205/0 206/0 396/1 490/21 633/267 270
@PATH: 257/609 255/3 1 396/1 201/505 633/267
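The byte-level claims in the message above can be checked with a short Python sketch. This is an illustration only: the sample strings and the helper `ucs2_high_bytes` are made up for the example, and it assumes that Python's `utf-16-be` / `utf-16-le` codecs stand in for UCS-2, which holds for characters in the Basic Multilingual Plane.

```python
# Illustrative sketch only: sample strings and helper name are hypothetical.

def ucs2_high_bytes(text: str) -> bytes:
    """Return the high byte of each 16-bit unit of `text`, big-endian.

    Assumes every character is in the Basic Multilingual Plane, where
    UTF-16 and UCS-2 produce identical bytes.
    """
    encoded = text.encode("utf-16-be")
    return encoded[0::2]  # big-endian: the high byte comes first in each pair


western = "Café déjà vu"   # every character is in ISO 8859-1
cyrillic = "Привет"        # every character is in the Cyrillic block (U+04xx)

# ISO 8859-1 occupies code points 0..255, so each character's Unicode
# code point equals its Latin-1 byte value.
latin1_matches_unicode = [ord(c) for c in western] == list(western.encode("latin-1"))

# In UCS-2, Western European text has a zero high byte for every character,
# while Cyrillic text does not.
western_high = ucs2_high_bytes(western)
cyrillic_high = ucs2_high_bytes(cyrillic)

# Storage cost of the same Western European text in three encodings.
sizes = {
    "latin-1": len(western.encode("latin-1")),
    "ucs-2": len(western.encode("utf-16-be")),
    "utf-8": len(western.encode("utf-8")),
}

# Endianness: the same character yields two different UCS-2 byte streams,
# whereas UTF-8 has a single byte order.
big_endian = "A".encode("utf-16-be")
little_endian = "A".encode("utf-16-le")

if __name__ == "__main__":
    print(latin1_matches_unicode)
    print(western_high.hex(), cyrillic_high.hex())
    print(sizes)
    print(big_endian, little_endian)
```

For this sample, UCS-2 doubles the ISO 8859-1 size, while UTF-8 adds one byte for each accented character only.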
| SOURCE: echomail via fidonet.ozzmosis.com | |