Hi y'all;
Awhile back someone posted a text compression/decompression routine
that was simple and easy to implement. I have made some improvements
and cleaned it up so that you can better understand how it works.
The origional claim was that it packs/unpacks strings to about 52% of
their original size. This is the theoritical maximum comprression and
in actuallity will not occure. I got a typical compression figure of
more like 60%. Still, close enough to a 2:1 ratio to possible make it
worth while.
I'm not going to use this routine, myself. I am developing something
that should provide much better compression, but still lends itself to
file scanning. It will be optimized for text files, and may be
slightly lossy (like JPEGs) to compensate for mispellings and odd
punctuation.
Wish me luck....
Anyway, here is my contribution to the PowerBASIC echo for now.
======= cut this and name it COMPRESS.BAS - it's ready to run =========
'the first part is the main body for testing.
'The second part is the actual sub.
$DIM ALL DECLARE SUB PackTxt (text AS STRING)
'Notes:
' 1) The SUB adds CHR$(255) to a packed string to indicate that
' it is packed. (For automatic packing/unpacking)
' 2) The SUB checks if the values of the characters in the string are
' between 32 and 127 to reduce errors while packing.
' It ignores control codes and hi-ASCII.
'Usage is very simple. Take a string (A$, for example) and call PackTxt:
' A$ = "This is a test string for the text packer."
' PackTxt A$
'The string will be packed and CHR$(255) added to the beginning to indicate
'that it is packed. To unpack a string just call PackTxt again:
' PackTxt A$
DIM TLine AS STRING
DIM Test AS STRING
TLine = "PackTxt SUB begins here"+_
"This sub packs strings to about 52% of their original size. The sub"+_
" automatically checks if the string is packed or not and
npacks/packs"+_
" it accordingly. (neat, huh?)"+_
" Don't play around with the dictionary or you can lower the packing"+_
" matio (when I got it there was a single space missing and that"+_
" reduced the ratio by about 10%!)."+_
" NOTE: text$ may only contain ASCII characters with the values
2-127!"+_
" Usage is very simple. Take a string (A$, for example) and call
PackTxt:"+_
" A$ = This is a test string for the text packer."+_
" PackTxt A$"+_
" The string will be packed and CHR$(255) added to the beginning to
indicate"+_
" that it is packed. To unpack a string just call PackTxt again:"
'This is the test section of this program.
?:? LEN(TLine);"{"+TLine+"}"
Test = TLine
PackTxt TLine
?:?:? (1 - LEN(TLine) / LEN(Test))*100;" percent reduction."
?:? LEN(TLine);"{"+TLine+"}"
PackTxt TLine
?:? LEN(TLine);"{"+TLine+"}"
IF TLine Test THEN BEEP:? "Strings not the same!!!"
?:? "Done...."
'PackTxt SUB begins here
'This sub packs strings to about 52% of their original size. The sub
' automatically checks if the string is packed or not and unpacks/packs
' it accordingly. (neat, huh?)
' Don't play around with the dictionary or you can lower the packing
' ratio (when I got it there was a single space missing and that
' reduced the ratio by about 10%!).
' NOTE: text$ may only contain ASCII characters with the values 32-127!
SUB PackTxt (text AS STRING)
DIM Pair AS LOCAL STRING
DIM temp AS LOCAL STRING
DIM char AS STRING * 2
DIM v AS WORD
DIM x AS WORD
DIM cnt AS WORD
'The dictionary... it may look like junk, but it is pretty much the heart
' of the packer (so don't mess it up :) )
'old=" e as tinthouerhet anreesr d onn or o i y wo tontyo. neisarte"+_
' "ed, ctiy bat snd fal pensestvengitu talehaurllcousa mf dfoof "+_
' "siril hmeg om Icehironsasiossbedepe rli Tetel nicho lilprcactut"+_
' "Thpaeceachh wige ebuaisursulmawaotowtsmploI solyee Cunm rtieno S"+_
' "diwhs.rafincademe.irplk ury Pwoacos gams,duayavucColamowe Aoopu"
'This is a new string that renders slightly better compression using
'my mail reader packet database.
Pair=" e ts thinhe at d n eronanatre otey or. , en indr esti salto w"+_
"o neouedarstngis--itntvehaf l cof brimeio pliroealeg seco fasom"+_
" hicllde meta drahicacens rtaelchbeh : ussiilla Im di eurec Sfo"+_
"utpeun lpoma gotacaiwarthonactolrstr Ass19lyiggenots Cid "+CHR$(34)+"pr
nie"+_
"ncsoulniirad Temwee.wheyayeeThevloia ys."+CHR$(34)+" amowyoI vi
Pewplcimi M"
IF LEN(Pair) 320 THEN 'just to check...
PRINT "PACKING ERROR: Dictionary has the wrong size"
BEEP
EXIT SUB
END IF
'check if the string is packed
IF LEFT$(text, 1) = CHR$(255) THEN 'unpack this string
temp= RIGHT$(text, LEN(text) - 1) 'remove CHR$(255)
text = "" 're-init text$
'prepair the string
FOR x = 1 TO LEN(temp)
v = ASC(MID$(temp, x, 1))
IF v > 95 THEN 'we have an index to a 'pair'
text = text + MID$(Pair, (v - 96) * 2 + 1, 2)
ELSE 'all we have is the character-32
text = text + CHR$(ASC(MID$(temp, x, 1)) + 32)
END IF
NEXT
ELSE 'the string isn't packed so let's pack it...
IF LEN(text) < 4 THEN EXIT SUB 'no use with strings less than 4
hars.
'check if there are character values out of range
FOR x = 1 TO LEN(text)
IF (ASC(MID$(text,x,1)) XOR &h80) < &hA0 THEN EXIT SUB 'these
CANNOT be packed
NEXT
cnt = 1
DO
char = MID$(text, cnt,2)
x = 1 'read pointer in Dict$
'if a pair is found, it must be dividable by 2
'(it needs to be divided so it can be packed)
ReDo:
x = INSTR(x, Pair, char)
IF x AND ISFALSE(x AND 1) THEN
INCR x
GOTO ReDo
END IF
IF x THEN 'the 2 characters from text$ are in Dict$
'append the instr of the characters in Dict$ to temp$
'(chars shouldn't be compressed twice it's stored so that it's more than 95)
temp = temp + CHR$((x\2)+96)
INCR cnt
ELSE 'the chars aren't found, store the first char
'(note that it's less than 95 and that cnt is only moved up 1)
temp = temp + CHR$(ASC(MID$(text, cnt, 1)) - 32)
END IF
INCR cnt 'pointer in text$
LOOP WHILE cnt <= LEN(text)
'copy temp$ into text$ and add CHR$(255) to indicate a packed string
text = CHR$(255) + temp
END IF
END SUB
--- Maximus/2 3.01
---------------
* Origin: Madman BBS * Chico, California * 916-893-8079 * (1:119/88)
|