echo: qedit
to: ALL
from: DIETER KOESSL
date: 1997-09-01 05:44:00
subject: re: DOS vs WIN95 code pages (high_ascii  05:44:32 09/01/97

From: Dieter Koessl 
Hi,
> Is there any way of adjusting the DOS code page so that TSE and
> the WINDOWS programs agree on what they are seeing as high ascii
> characters?
Short answer: NO!
DOS and Windows use different character sets (code pages, aka fonts),
and there is no DOS code page which corresponds to the ANSI character
set Windows uses.
Dieter
PS: Here's a bit of explanation I wrote some time ago in response
to a similar request.
The ASCII character set (an ANSI standard, X3.4) is a 7-bit code, which
includes the ten digits, the basic lower- and upper-case letters, most
of the punctuation marks and the control characters (e.g. CR, LF and
TAB). It does not include any accented characters.
Control characters (#0..#31 and #127) are, by definition, non-printing
characters and as such have _no_ visual counterpart.
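The 7-bit layout described above can be made concrete with a small classifier. This is a minimal Python sketch; the function name `classify_byte` is hypothetical, and the "extended" label simply marks the 128..255 range that plain 7-bit ASCII leaves undefined.

```python
def classify_byte(b: int) -> str:
    """Classify a byte value according to the 7-bit ASCII layout."""
    if not 0 <= b <= 255:
        raise ValueError("not a byte value")
    if b <= 31 or b == 127:
        return "control"      # #0..#31 and #127: no visual counterpart in ASCII
    if b <= 126:
        return "printable"    # space, digits, letters, punctuation
    return "extended"         # 128..255: not defined by 7-bit ASCII at all


# TAB, LF, CR, 'A', DEL and one byte from the upper half
for b in (9, 10, 13, 65, 127, 130):
    print(b, classify_byte(b))
```

Everything in the "extended" range is exactly what the code pages discussed below disagree about.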
This all changed with the advent of the IBM PC, which was targeted at an
international market. IBM extended the ASCII character set to 8 bits and
filled the empty space with accented characters, block graphics and some
Greek characters. It also defined visual counterparts for the control
characters. This became what is now known as CP 437. Text-oriented
applications also usually depend heavily on this extended character set.
It quickly became clear, however, that the accented characters included
in the new extended character set didn't suffice for many languages, so
new code pages were invented, especially the "international" CP 850,
which sacrificed some block graphics and Greek characters for a more
complete set of accented characters. All these character sets or code
pages later collectively came to be known as _the_ OEM character set.
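The trade-off between CP 437 and CP 850 can be observed with Python's built-in `cp437` and `cp850` codecs. This minimal sketch (the helper `differing_bytes` is hypothetical) lists the upper-half byte values that the two code pages map to different characters. For example, byte 0x9B is a cent sign in CP 437 but "ø" in CP 850, while 0x80 is "Ç" in both.

```python
def differing_bytes(cp_a: str, cp_b: str) -> list[int]:
    """Return the byte values 128..255 that two code pages decode differently."""
    diffs = []
    for b in range(128, 256):
        raw = bytes([b])
        if raw.decode(cp_a) != raw.decode(cp_b):
            diffs.append(b)
    return diffs


diffs = differing_bytes("cp437", "cp850")
print(len(diffs), "of 128 upper-half bytes differ")
for b in diffs[:5]:
    raw = bytes([b])
    print(f"0x{b:02X}: CP 437 {raw.decode('cp437')!r}  CP 850 {raw.decode('cp850')!r}")
```

Both codecs define all 256 byte values, so the strict `decode` calls never fail here.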
Things changed again with the advent of Windows, which exclusively used
the new ANSI character set, an extension of the old 7-bit ASCII
character set. This new ANSI character set includes an extensive set of
accented characters and additional punctuation marks, but... it doesn't
include any block graphics or Greek characters.
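Assuming a US/Western European setup, where the Windows "ANSI" code page is Windows-1252 (Python codec `cp1252`) and the OEM code page is CP 437 (`cp437`), a quick check shows which characters exist in which set. The helper name `representable` is hypothetical.

```python
def representable(ch: str, codepage: str) -> bool:
    """True if the character has a code in the given single-byte code page."""
    try:
        ch.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


# accented letter, box-drawing corner, Greek capital sigma, curly quote
for ch in ("\u00e9", "\u2554", "\u03a3", "\u201c"):
    print(ch,
          "OEM 437:", representable(ch, "cp437"),
          "ANSI 1252:", representable(ch, "cp1252"))
```

The accented letter exists in both sets, the block graphics and Greek characters only in the OEM set, and the curly quotation mark only in the ANSI set.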
Now, if you use a GUI editor, e.g. Notepad, it will store the characters
you have typed using the ANSI character set. Text-oriented editors, on
the other hand, e.g. TSE, will use the OEM character set. This means
that, depending on which kind of editor you use, different numbers
(bytes) representing the same visual character will be stored in the
file on disk. It also means that the extended characters will be
interpreted as something entirely different if you open a file written
with the other kind of editor, e.g. if you open a file written with
Notepad in TSE. To summarize: what is stored on disk is only numbers in
the range 0..255, and what is displayed on screen depends on how your
program interprets these numbers, i.e. which character set it uses.
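A minimal sketch of that effect, assuming the ANSI code page is Windows-1252 and the OEM code page is CP 437 (other national settings would give other bytes). It stores the same word the way each kind of editor would, then reads each file back with the other editor's code page.

```python
text = "caf\u00e9"                   # "café"

ansi_bytes = text.encode("cp1252")   # what a GUI editor such as Notepad stores
oem_bytes = text.encode("cp437")     # what a text-mode editor such as TSE stores

print(ansi_bytes, oem_bytes)         # → b'caf\xe9' b'caf\x82'

# Opening each file with the "other kind" of editor:
print(ansi_bytes.decode("cp437"))    # ANSI file read as OEM → cafΘ
print(oem_bytes.decode("cp1252"))    # OEM file read as ANSI → caf‚ (a low quote)
```

The plain ASCII letters survive unchanged; only the byte for "é" is misread, because 0xE9 is Greek capital theta in CP 437 and 0x82 is a low single quotation mark in Windows-1252.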
Finally, the Windows clipboard can be used to store a lot of things,
including plain text. But this isn't so plain after all, because the
clipboard understands two kinds of text (you guessed it!): ANSI text and
OEM text. Now if you stuff in ANSI text, say via Notepad, and retrieve
OEM text, say via TSE, Windows will _automatically_ transform one
character set into the other. It does this as best it can and will fail
on certain characters, since each set includes characters which the
other doesn't. If Windows encounters such a character, it will produce
a block sign in ANSI or an underscore in OEM.
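Windows performs this conversion itself, with its own mapping tables. The sketch below only approximates the ANSI-to-OEM direction using Python's codecs, substituting an underscore for any character the OEM set lacks. The function name `ansi_to_oem` and the default code pages (Windows-1252 and CP 437) are assumptions for illustration.

```python
def ansi_to_oem(data: bytes, ansi: str = "cp1252", oem: str = "cp437",
                fallback: str = "_") -> bytes:
    """Approximate an ANSI -> OEM text conversion, one character at a time."""
    # Bytes undefined in the ANSI code page decode to U+FFFD, which has no
    # OEM code either and therefore also ends up as the fallback character.
    text = data.decode(ansi, errors="replace")
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(oem)          # character exists in the OEM set
        except UnicodeEncodeError:
            out += fallback.encode(oem)    # no OEM equivalent: substitute
    return bytes(out)


print(ansi_to_oem("caf\u00e9 \u00a9".encode("cp1252")))   # "café ©" → b'caf\x82 _'
```

"é" is moved to its CP 437 position (0x82), while the copyright sign, which CP 437 does not contain, becomes an underscore.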
---
---------------
* Origin: apana>>>>>fidonet [sawasdi.apana.org.au] (3:800/846.13)

SOURCE: echomail via exec-pc