echo: utf-8
to: August Abolins
from: Michiel van der Vlist
date: 2020-01-25 17:18:00
subject: UTF-8 nodelist report

Hello August,

On Friday January 24 2020 03:29, you wrote to me:

 MvdV>> File: dailyutf.024
 MvdV>>
 MvdV>> ,208,-=P=I=X=-,S?ve,Bj▀ÿýrn_Felten,46-31-960447,9600,CM,XA,V34
 MvdV>> 1 line found with an ill formed UTF-8 sequence.

 AA> When I look at that nodelist, the line above looks fine as:

 AA> ,208,-=P=I=X=-,Säve,Björn_Felten,46-31-960447,9600,CM,XA,V34

If you look at it with a viewer configured for Latin-1...

 AA> In other words, I see the a with the 2 dots in "Säve", and the o with
 AA> the 2 dots in "Björn" as they should be.

 AA> But the 2nd line following the Zone,2 line *does* look like a fail:

 AA> Zone,2,Eur_(024),B,Ward_Dossche,-Unpublished-,300,CM,MO,INA:many-glaci
 AA> er... ,2,Fidonews_Editor,Sweden,BjÃörn_Felten,46-31-960447,33600,CM,XA
 AA> ,V34,INA...

It looks OK if you view it with a viewer configured for UTF-8.
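
To make the difference concrete, here is a minimal Python sketch of what a viewer does with the same name under each setting. The byte strings are my assumption of what sits in the file for the two lines; they are built here for illustration, not read from dailyutf.024, and the helper name view() is made up.

```python
# Assumed raw bytes of the name field in the two lines (illustrative only).
latin1_bytes = "Björn".encode("latin-1")  # b'Bj\xf6rn', one byte for the o-umlaut
utf8_bytes = "Björn".encode("utf-8")      # b'Bj\xc3\xb6rn', two bytes for the o-umlaut


def view(raw: bytes, encoding: str) -> str:
    """Decode raw bytes the way a viewer configured for `encoding` would."""
    # errors="replace" yields U+FFFD wherever the bytes are invalid in that encoding.
    return raw.decode(encoding, errors="replace")


latin1_as_latin1 = view(latin1_bytes, "latin-1")  # 'Björn'
latin1_as_utf8 = view(latin1_bytes, "utf-8")      # 'Bj\ufffdrn' (replacement character)
utf8_as_utf8 = view(utf8_bytes, "utf-8")          # 'Björn'
utf8_as_latin1 = view(utf8_bytes, "latin-1")      # 'BjÃ¶rn' (two characters for one)
```

So a Latin-1 viewer makes the Latin-1 line look fine and the UTF-8 line look broken, and a UTF-8 viewer does the opposite.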

 AA> Seems to me that your program should be reporting *that* line.

Nope. Read on...

 AA> (I'll try to send a copy of this to you in email in case the jamnntp
 AA> breaks something in the chars conversion.)

Got it.

If you look at the regular nodelist, the ASCII only one, you will see two
question marks in the line for 2:203/208.

Let me explain.

1) For $reasons Björn does not participate in the UTF-8 nodelist project.

2) He nevertheless insists that his name is spelled "properly" in the nodelist.

3) He also insists that every sysop should be allowed to use the encoding of
his choice. Björn uses Latin-1 for his "ASCII only" segment.

4) For regions participating in the UTF-8 nodelist project the procedure is as 
follows:

4a) The RC sends two files to the ZC. One is the classic region segment,
containing ASCII only.

4b) The second file is UTFRxx.jjj. In this file UTF-8 characters are allowed.

4c) How the RC assembles these files is up to him/her, but one way is to use
MakeNL with "ALLOW8BIT 0" for the ASCII file and "ALLOW8BIT 1" for the UTF-8
version.

4d) The RC in question informs the ZC about the second file and the ZC
configures his nodelist processing to use the second file for the dailyutf.

5) For RCs not participating in the UTF-8 project, the RC sends just one file, 
the ASCII version.

6) For the Z2 nodelist, the ZC produces two versions.

6a) The ASCII only version, for which he uses the ASCII only segments from the
RCs, compiled with "ALLOW8BIT 0". If a non-ASCII character is encountered,
MakeNL converts it to a '?'.

6b) The UTF-8 version. For this version he uses the UTF-8 segments from those
RCs that participate in the project and the ASCII version from those that do
not participate. It is assembled with "ALLOW8BIT 1". Characters with bit 8 set
are passed on "as is". There are no checks and no conversion is done.

7) Because of the combination of 1), 2) and 3), the line for 2:203/208 has two
question marks in the ASCII nodelist, and in the UTF-8 list the line is flagged
by my nodelist checker as containing ill-formed UTF-8 sequences.

8) The line for 2:2/2 is not derived from a file submitted by the RC, but
originates directly from the ZC's system. Therefore this line contains proper
UTF-8.
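
Points 6a), 6b) and 7) can be sketched in a few lines of Python. This is neither MakeNL's code nor my checker; the function names are made up, and I am assuming that "ALLOW8BIT 0" turns each byte with bit 8 set into one '?'. The two sample lines are shortened and encoded by hand for the example, not taken from the file.

```python
def to_ascii_only(line: bytes) -> bytes:
    """Replace every byte with bit 8 set by '?' (assumed ALLOW8BIT 0 behaviour)."""
    return bytes(b if b < 0x80 else ord("?") for b in line)


def is_well_formed_utf8(line: bytes) -> bool:
    """Return True if the whole line decodes as strict UTF-8."""
    try:
        line.decode("utf-8")  # strict mode raises on any ill-formed sequence
    except UnicodeDecodeError:
        return False
    return True


def count_ill_formed(lines) -> int:
    """Count the lines containing at least one ill-formed UTF-8 sequence."""
    return sum(1 for line in lines if not is_well_formed_utf8(line))


# Hypothetical sample data: a Latin-1 encoded line (as for 2:203/208) and a
# UTF-8 encoded line (as for 2:2/2), both passed on "as is" with ALLOW8BIT 1.
latin1_line = ",208,-=P=I=X=-,Säve,Björn_Felten,46-31-960447,9600,CM,XA,V34".encode("latin-1")
utf8_line = ",2,Fidonews_Editor,Sweden,Björn_Felten,46-31-960447,33600,CM,XA,V34".encode("utf-8")

ascii_version = to_ascii_only(latin1_line)
# b',208,-=P=I=X=-,S?ve,Bj?rn_Felten,46-31-960447,9600,CM,XA,V34'

flagged = count_ill_formed([latin1_line, utf8_line])  # 1
```

The Latin-1 line yields two question marks in the ASCII version and is the one line flagged in the UTF-8 version, while the 2:2/2 style line passes the check.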

I hope this clarifies things...


Cheers, Michiel

--- GoldED+/W32-MSVC 1.1.5-b20170303
                                                             
* Origin: http://www.vlist.eu (2:280/5555)

SOURCE: echomail via QWK@pharcyde.org
