echo: linux
to: Holger Granholm
from: Maurice Kinal
date: 2019-03-05 22:17:00
subject: Character codes

Hola Holger!

 HG> OK, the code 218 128 162 that i interpreted as hyphen actually
 HG> is the longer 'dash'.

I am not sure what you mean, but using 218 (DA) as the leading byte means you 
are restricted to a 2 byte (16 bit) sequence and not the 3 byte (24 bit) 
sequence that is required for the euro sign in utf8.  The way the leading byte 
works is like this:

dec 218 = bin 11011010
                ^
The first zero shows that there are two leading ones, which means there is only 
one trailing byte following.  So that means 218 128 is taken as the character 
and the 162 is ignored (strictly speaking it is a stray byte).  A 24 bit 
sequence *must* start with a leading byte of at least 11100000, which is dec 
224 or E0.  For the utf8 euro character the leading byte is:

dec 226 = bin 11100010
                 ^
and as you can see the first zero yields three leading ones which is three 
bytes or 24 bits.
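
The leading byte rule above can be sketched in a few lines of Python 3.  This 
is only an illustration: the function name sequence_length is made up for this 
example, and it only counts leading ones, so it does not reject the few leading 
bytes (C0, C1, F5 and up) that utf8 disallows for other reasons.

```python
def sequence_length(lead: int) -> int:
    """Return the total byte count announced by a UTF-8 leading byte.

    Hypothetical helper for illustration: it counts the leading one bits,
    exactly as described in the text, and nothing more.
    """
    if not 0 <= lead <= 0xFF:
        raise ValueError("not a byte")
    ones = 0
    mask = 0x80
    # Walk from the top bit down until the first zero bit.
    while mask and lead & mask:
        ones += 1
        mask >>= 1
    if ones == 0:
        return 1  # 0xxxxxxx: plain ASCII, a single byte
    if ones == 1:
        raise ValueError("10xxxxxx is a trailing byte, not a leading byte")
    if ones > 4:
        raise ValueError("no UTF-8 leading byte has more than four leading ones")
    return ones


for lead in (218, 224, 226):
    print(lead, format(lead, "08b"), sequence_length(lead))
# 218 11011010 2
# 224 11100000 3
# 226 11100010 3
```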

For the record, 218 128 is U+0680, which we already know to be a 16 bit Arabic 
character.  Also for the record, all trailing bytes must be in the range of 
80 - BF, or dec 128 to dec 191, which both of your posted trailing bytes are, 
even though the leading byte can only use one of them.
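
As a quick check of the numbers above, here is a small Python 3 sketch.  The 
helper name is_trailing is invented for this example.  The euro bytes 226 130 
172 are the standard utf8 encoding of U+20AC and are included only for 
comparison; they were not part of the quoted bytes.  Note that Python's strict 
decoder reports the leftover 162 as an error rather than quietly ignoring it.

```python
data = bytes([218, 128, 162])  # the three bytes from the quoted message

# A two byte sequence is laid out as 110xxxxx 10yyyyyy; keep only x and y.
code_point = ((data[0] & 0x1F) << 6) | (data[1] & 0x3F)
print(f"218 128 -> U+{code_point:04X}")   # 218 128 -> U+0680


def is_trailing(byte: int) -> bool:
    """True if byte has the 10xxxxxx form, i.e. lies in 80 - BF."""
    return 0x80 <= byte <= 0xBF


print([is_trailing(b) for b in data[1:]])  # [True, True]

# Strict decoding stops at the byte that the leading byte 218 cannot use.
try:
    data.decode("utf-8")
    leftover = None
except UnicodeDecodeError as err:
    leftover = data[err.start]
print("leftover byte:", leftover)          # leftover byte: 162

# For comparison: a three byte sequence with leading byte 226.
euro = bytes([226, 130, 172]).decode("utf-8")
print(f"226 130 172 -> U+{ord(euro):04X}")  # 226 130 172 -> U+20AC
```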

 HG> God natt min vän

Thank you.  Buenas noches mi amigo.  :-)

La vida es buena,
Maurice

... Un Møøse una vez mordió a mi hermana ...
--- GNU bash, version 5.0.2(1)-release (aarch64-raspi3b+-linux-gnu)
                   
* Origin: Little Mikey's EuroPoint - Ladysmith BC, Canada (2:280/464.113)

SOURCE: echomail via QWK@dmine.net
