echo: ftsc_public
to: Ozz Nixon
from: mark lewis
date: 2019-06-29 01:25:12
subject: not all is lost but far too much for far too long

On 2019 Jun 28 21:23:36, you wrote to Maurice Kinal:

 ON> FTN Header versus actual message body conveying Unicode.

 ON> When I telnet to a SQL server that speaks Unicode only, it always
 ON> returns the following characters (pascal): #239#187#191

that's a UTF8 BOM (Byte Order Mark)...

 ON> When I telnet to a web page that speaks Unicode, it too returns
 ON> #239#187#191 plus the  etc.

i'm sure you know what it is but if not, it is a magic number that may
appear at the start of a text stream... it signals at least one of several
things to a program processing the stream...

  1. the byte order, or endianness, of the stream
  2. that the text stream's encoding is unicode
  3. which unicode encoding the stream is using
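
as a rough illustration of the three points above, here is a minimal python
sketch of BOM sniffing... the table and the sniff_bom name are made up for
this example, they do not come from any FTN software... the BOM constants
are the ones shipped in python's standard codecs module...

```python
import codecs

# Known BOMs. UTF-32 LE must be tested before UTF-16 LE because the
# UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

# pascal #239#187#191 is the byte sequence EF BB BF
stream = bytes([239, 187, 191]) + b"hello"
print(sniff_bom(stream))
```

for UTF8 the mark carries no byte order information, it only works as a
signature... the endianness part only matters for the UTF16 and UTF32
forms...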

 ON> So... would it not stand true that systems that are posting UTF8 do
 ON> the same introduction on the message body?

they could but it is not required... it actually interferes with software
that handles UTF8 but does not expect non-ascii bytes at the beginning of a
stream...
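
a small python sketch of that interference, assuming a message body that
starts with the UTF8 BOM... the decode_both helper is hypothetical and only
here for the demonstration... "utf-8" and "utf-8-sig" are the standard
python codec names, the second one drops a leading BOM if there is one...

```python
def decode_both(raw: bytes):
    """Decode raw bytes with a BOM-unaware and a BOM-aware UTF-8 codec."""
    plain = raw.decode("utf-8")      # BOM stays in the text as U+FEFF
    aware = raw.decode("utf-8-sig")  # leading BOM, if any, is dropped
    return plain, aware

raw = b"\xef\xbb\xbf" + "hello".encode("utf-8")
plain, aware = decode_both(raw)

print(raw[0] > 127)               # first byte is not ascii
print(plain.startswith("hello"))  # the BOM-unaware text does not match
print(aware == "hello")           # the BOM-aware text does
print(len(plain), len(aware))     # one extra character in the first
```

software that compares or displays the first form sees an extra invisible
character up front, and software that only expects plain ascii sees three
high-bit bytes it may render as garbage...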

 ON> Then authors *know* it potentially has Unicode

see above... it does indicate that the stream is unicode... not potentially...

 ON> and leave it damn well alone, and also parse it based upon UTF8
 ON> instead of 8bit char...

it is an idea except that everyone else that uses plain ascii will be
saying, what's that garbage at the beginning of these messages?

 ON> This is how I am coding things here, just based upon NexusSQL,
 ON> PremierSQL, MS SQL, Apache and Nexus Web Service. I do not have access
 ON> to my Oracle box nor the MySQL 5 server to see if they do the same
 ON> during the initial connection negotiation(s).

it is probably a configuration option... apache shouldn't care as it just
sends whatever is in the file... i don't know about nexus...

 ON> A quick google: It's the utf8 byte order mark. Some editors save the
 ON> BOM inside the file (in order to be used as a header) which regularly
 ON> causes confusion because it is optional.

ahh, you found it :)

 ON> So, if we wanted to help enforce at a reader (or even tosser level)
 ON> how to handle, I would offer this up as a required BOM to the message
 ON> body that is UTF8.

tossers shouldn't be modifying message bodies anyway... that's in the
specs... the problem is how some coders interpreted "ignore"...
the funny thing is if they chose to ignore the problematic character by
stripping it, they actually added code to remove it... if they had selected
the other form of ignore and left it in the stream, their code would be
(slightly) smaller and faster... it is kinda funny on the one hand...

)\/(ark

Always Mount a Scratch Monkey
Do you manage your own servers? If you are not running an IDS/IPS yer doin'
it wrong...
... You know you're in YK when: you have to break your dog loose from the tree.
---
* Origin: (1:3634/12.73)
SEEN-BY: 1/19 120 14/6 16/0 18/0 200 103/705 120/544 123/0 25 50 115 130 131
SEEN-BY: 123/150 755 132/174 135/300 153/7715 154/10 203/0 221/0 229/426
SEEN-BY: 240/2100 5138 5832 5853 5890 261/1 38 275/100 280/464 5003 5006 5555
SEEN-BY: 292/854 310/31 320/119 219 322/0 396/45 423/120 633/267 280 640/1384
SEEN-BY: 712/620 848 770/1 2432/390 2452/250 2454/119 3634/0 12 15 24 27 50
SEEN-BY: 3634/119 5020/545
@PATH: 3634/12 320/219 240/5832 280/464 712/848 633/267

SOURCE: echomail via fidonet.ozzmosis.com
