echo: os2prog
to: Mike Ruskai
from: Pete Zieger
date: 1994-12-27 18:07:00
subject: UU algorithm

MR> Anyone have the algorithm for uuen/decoding files?  I'd like to 
 MR> write en/decoders in REXX.

Condensed from a doc file for a UUENCODE/DECODE program by 
 Richard Marks - 931 Sulgrave Lane - Bryn Mawr, PA 19010
------------------------------------------------------------------
UU-encoding is a way to code a file which may contain any characters into
a standard character set that can be reliably sent over diverse networks.

THE CHARACTER ENCODING:
The basic scheme is to break each group of 3 eight-bit characters (24 bits)
into 4 six-bit characters and then add 32 (the code for a space) to each
six-bit character, which maps it into a readily transmittable character.
Another way of phrasing this is to say that the encoded 6-bit characters
are mapped into the set:
`!"#$%&'()*+,-./0123456789:;<=>?@ABC...XYZ[\]^_
for transmission over communications lines.
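The 3-into-4 split can be sketched in Python as follows. This is a minimal illustration, not code from the program described here; the names `uu_char` and `encode_group` are hypothetical, and the back-quote for zero follows the character set shown above.

```python
def uu_char(value: int) -> str:
    """Map a 6-bit value (0..63) to its UU character by adding 32.
    Zero would give a space, so it is written as a back-quote instead."""
    return "`" if value == 0 else chr(value + 32)


def encode_group(group: bytes) -> str:
    """Split exactly 3 bytes (24 bits) into four 6-bit values and map each one."""
    b0, b1, b2 = group
    bits = (b0 << 16) | (b1 << 8) | b2                  # pack the 24 bits
    values = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(uu_char(v) for v in values)


print(encode_group(b"Cat"))  # 0V%T
```

"Cat" is 0x43 0x61 0x74, which splits into the six-bit values 16, 54, 5 and 52; adding 32 gives "0V%T".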

As some transmission mechanisms compress or remove spaces, spaces are
changed into back-quote characters (ASCII 96).  (A better scheme might be
to use a bias of 33 so that no space is created, but this is not done.)
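A decoder can accept both forms of zero by subtracting the bias and keeping only the low six bits: a space gives 32 - 32 = 0, and a back-quote gives 96 - 32 = 64, which masks to 0. A minimal sketch, with hypothetical names:

```python
def uu_value(ch: str) -> int:
    """Recover the 6-bit value of one UU character.
    Masking to 6 bits sends both a space and a back-quote to 0."""
    return (ord(ch) - 32) & 0x3F


def decode_group(chars: str) -> bytes:
    """Turn 4 UU characters back into the 3 original bytes."""
    bits = 0
    for ch in chars:
        bits = (bits << 6) | uu_value(ch)   # append 6 bits at a time
    return bits.to_bytes(3, "big")


print(decode_group("0V%T"))  # b'Cat'
```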

Another, newer but less popular, encoding method called XX-encoding uses
the set:  +-0123456789ABC...XYZabc...xyz
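XX-encoding keeps the same 3-into-4 bit split and only swaps the lookup table, so a six-bit value is used as an index into the 64-character set listed above. A minimal sketch; `XX_TABLE` and `xx_encode_group` are hypothetical names:

```python
import string

# The XX set in value order: '+' is 0, '-' is 1, then digits, upper case, lower case.
XX_TABLE = "+-" + string.digits + string.ascii_uppercase + string.ascii_lowercase


def xx_encode_group(group: bytes) -> str:
    """Encode exactly 3 bytes as 4 XX characters."""
    b0, b1, b2 = group
    bits = (b0 << 16) | (b1 << 8) | b2
    return "".join(XX_TABLE[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0))


print(xx_encode_group(b"Cat"))  # Eq3o
```

Because the table contains no space, XX output needs no back-quote substitution.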

In my opinion, XX-encoding is superior to UU-encoding because it uses more
"normal" characters that are less likely to get corrupted.  In fact, several
of the special characters in the UU set do not survive an EBCDIC-to-ASCII
translation intact.  Conversely, an advantage of the UU set is that it
does not use lower-case characters.  Nowadays both upper and lower case
are sent with no problems; perhaps in the communications dark ages lower
case was a problem.

COMPOSING A LINE OF ENCODED CHARACTERS:
A small number of eight-bit characters are encoded into a single line, and
a count of those original characters is put at the start of the line.
(Most lines in an encoded file carry 45 original characters, which become
60 encoded characters.  When you look at a UU-encoded file, note that most
lines start with the letter "M".  "M" is decimal 77, which, minus the 32
bias, is 45.)
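One way to build such a line in Python is shown below. It is a minimal sketch with a hypothetical `encode_line` name. It assumes the final group is padded with zero bytes when the chunk length is not a multiple of 3; the text above does not say how a short group is filled.

```python
def uu_char(value: int) -> str:
    # Bias of 32; a zero is written as a back-quote instead of a space.
    return "`" if value == 0 else chr(value + 32)


def encode_line(chunk: bytes) -> str:
    """Encode up to 45 bytes as one line: a count character, then the data.
    The count is the number of original bytes, not of encoded characters."""
    if len(chunk) > 45:
        raise ValueError("a line holds at most 45 original bytes")
    out = [uu_char(len(chunk))]
    # Assumption: pad the last partial group with zero bytes.
    padded = chunk + b"\x00" * (-len(chunk) % 3)
    for i in range(0, len(padded), 3):
        bits = (padded[i] << 16) | (padded[i + 1] << 8) | padded[i + 2]
        out.extend(uu_char((bits >> s) & 0x3F) for s in (18, 12, 6, 0))
    return "".join(out)


line = encode_line(bytes(range(45)))
print(line[0], len(line))  # M 61
```

A full line is therefore 1 count character plus 60 data characters.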

This encoder program puts a check character at the end of each line.  The
check is the sum of the six-bit values of all the encoded characters
(that is, before the bias of 32 is added), modulo 64.
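The check character could be computed as follows. This is a sketch only. It assumes the count character is left out of the sum and that the check value is mapped like any other six-bit value; the description above does not settle either point. `add_check` is a hypothetical name.

```python
def uu_char(value: int) -> str:
    return "`" if value == 0 else chr(value + 32)


def uu_value(ch: str) -> int:
    return (ord(ch) - 32) & 0x3F


def add_check(line: str) -> str:
    """Append a check character: the sum of the 6-bit values of the
    data characters (assumed to exclude the count), modulo 64."""
    total = sum(uu_value(ch) for ch in line[1:]) % 64
    return line + uu_char(total)


print(add_check("#0V%T"))  # #0V%T_
```

For "Cat" the values are 16 + 54 + 5 + 52 = 127, and 127 mod 64 = 63, which maps to "_".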

Note: Horton 9/1/87 UUENCODE has a bug in the line check algorithm; it uses 
the sum of the original, not the encoded characters.  

PACKAGING THE LINES INTO FILES:
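The lines of encoded data can be preceded by comments and by network
addressing information.  The encoded data is directly preceded by a line
starting with "begin", which in standard uuencode output is followed by the
file's permission mode and name (for example: begin 644 myfile.zip).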

The end of the encoded data is marked by an encoded line with a count of
zero (a single back-quote), followed by a line containing "end".
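Putting the pieces together, a whole file can be wrapped and unwrapped as sketched below. The `uuencode`/`uudecode` names and the default mode "644" are assumptions for illustration. The decoder skips anything before the "begin" line, reads only as many characters as the count calls for (so a trailing check character is ignored), and pads short lines with spaces in case trailing spaces were stripped in transit.

```python
def uu_char(value: int) -> str:
    return "`" if value == 0 else chr(value + 32)


def encode_line(chunk: bytes) -> str:
    padded = chunk + b"\x00" * (-len(chunk) % 3)   # assumed zero padding
    out = [uu_char(len(chunk))]
    for i in range(0, len(padded), 3):
        bits = (padded[i] << 16) | (padded[i + 1] << 8) | padded[i + 2]
        out.extend(uu_char((bits >> s) & 0x3F) for s in (18, 12, 6, 0))
    return "".join(out)


def uuencode(data: bytes, name: str, mode: str = "644") -> str:
    lines = [f"begin {mode} {name}"]
    lines += [encode_line(data[i:i + 45]) for i in range(0, len(data), 45)]
    lines += ["`", "end"]          # zero-count line, then "end"
    return "\n".join(lines) + "\n"


def uudecode(text: str) -> bytes:
    out = bytearray()
    lines = iter(text.splitlines())
    for line in lines:             # skip comments and addressing info
        if line.startswith("begin "):
            break
    for line in lines:
        if line.strip() == "end":
            break
        if not line:
            continue
        count = (ord(line[0]) - 32) & 0x3F
        n_chars = (count + 2) // 3 * 4
        body = line[1:1 + n_chars].ljust(n_chars, " ")   # restore stripped spaces
        chunk = bytearray()
        for i in range(0, n_chars, 4):
            bits = 0
            for ch in body[i:i + 4]:
                bits = (bits << 6) | ((ord(ch) - 32) & 0x3F)
            chunk += bits.to_bytes(3, "big")
        out += chunk[:count]       # drop the padding bytes
    return bytes(out)


print(uuencode(b"Cat", "cat.txt"), end="")
```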

SPLITTING UP LONG FILES:
Long files are broken into several sections before transmission.  This is
done because very large files are cumbersome to handle and because some
networks require files to be less than 64K bytes.
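A simple way to cut the encoded text into sections is to break only between lines, so that no encoded line is split in half. The text above does not say how sections are labeled or reassembled, so this hypothetical `split_sections` helper only does the cutting. Since the encoded text is plain ASCII, counting characters is the same as counting bytes.

```python
def split_sections(encoded: str, limit: int = 64 * 1024) -> list[str]:
    """Split encoded text into sections of at most `limit` bytes,
    breaking only at line boundaries."""
    sections, current, size = [], [], 0
    for line in encoded.splitlines(keepends=True):
        if current and size + len(line) > limit:
            sections.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        sections.append("".join(current))
    return sections
```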

TABLE LINES:
Some encoded files put the mapping used at the front of the encoded file,
just in front of the "begin" line.  The format for this is:
table
first 32 characters
second 32 characters

All this starts in column 1.
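Writing and reading those table lines might look like this. It is a minimal sketch; `UU_TABLE`, `table_lines` and `parse_table` are hypothetical names, and `UU_TABLE` is the UU set above in value order, with the back-quote standing for zero.

```python
# Value 0 is the back-quote; values 1..63 are ASCII 33..95 ('!' through '_').
UU_TABLE = "`" + "".join(chr(c) for c in range(33, 96))


def table_lines(alphabet: str) -> list[str]:
    """The keyword 'table', then the 64-character mapping as two 32-character lines."""
    if len(alphabet) != 64:
        raise ValueError("the mapping must have exactly 64 characters")
    return ["table", alphabet[:32], alphabet[32:]]


def parse_table(lines: list[str]) -> dict[str, int]:
    """Rebuild a character-to-value map from the 'table' line and the two that follow."""
    if lines[0] != "table":
        raise ValueError("expected a 'table' line")
    alphabet = lines[1] + lines[2]
    return {ch: value for value, ch in enumerate(alphabet)}


print("\n".join(table_lines(UU_TABLE)))
```

A decoder that finds a table before the "begin" line can use the resulting map in place of the fixed subtract-32 rule, which is how the same decoder could handle an XX-encoded file.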
------------------------------------------------------------------

Hope this helps!  
-Pete-  FidoNet: 1:2614/205  Internet: solution2bbs.com

... Windows 3.1 - It's still wishing it was an OS/too.

--- WILDMAIL!/WC v4.10 
* Origin: The Solution II BBS*Quakertown,Pa*(215)529-9501 (1:2614/205.0)
SEEN-BY: 12/2442 620/243 624/50 632/348 640/820 690/660 711/409 410 413 430
SEEN-BY: 711/807 808 809 934 942 949 712/353 515 713/888 800/1
@PATH: 2614/205 3615/50 229/2 12/2442 711/409 808 809 934

SOURCE: echomail via fidonet.ozzmosis.com
