TIP: Click on subject to list as thread! ANSI
echo: public_domain
to: Kieran Haughey
from: Lewin Edwards
date: 1996-01-20 18:13:40
subject: Sorting

KH> I plan on getting a 250meg hdd for $120-145 brand new.. soon :)..

Considering a 1Gb drive is now only about $400, if "soon" isn't
"this evening", wait just a little longer, save, and buy
something bigger.

 PE>>> You don't have a choice.  How were you planning on sorting
 PE>>> something larger than the size of memory without having to
 PE>>> read it into memory from disk all the time?  There may be
 KH>>> Multiple passing??.. I was originally thinking of a method where you

Imagine you have a database with, say, 1000000000 words in it. You want to
get all the words into order. Your machine has only 4K of RAM, but lots of
hard disk space.

First, read the first letter of every word. Create a temporary file for
each start-letter you find. (I mean, create a temporary file for each
starting letter, but don't create a temporary file if there are no words to
go into it. For example, if there are no words in your original file which
start with "Q", don't create a temporary file for Q-words).

As you go down the main database, throw all the "A" words into
the "A" file, the "B" words into the "B"
file, and so on, until you reach the end. Then, delete the original file,
you don't need it any more. Note that you can do all of this in just one
pass over the master file.

Now you work on each letter-file. Depending on how big these are, you may
be able to read the entire small file into memory. Sorting these will be
correspondingly -FAST-. If those files are still too big to work with, you
look at the second letter. Hurl everything into sub-files again, so you
wind up with an "AA", "AB", "AC",
"AD"..... "ZZ" file, but again don't create empty
tempfiles. Delete the "A", "B", "C" ... files
again. Repeat the chopping process until you feel the individual files are
small enough to handle better with some other algorithm, or until the
subfiles contain just one element each.

This algorithm is obviously best implemented iteratively.

(The same principle can be applied to sorting numbers, flowers, or
autographed pictures of Victor Hugo, if you so desire).

-- Lewin A.R.W. Edwards [Team OS/2]  Tel 0419320415 * 0412809805 * 0414927056

--- GoldED 2.41+
* Origin: ZWSBBS +61-3-98276881/+61-3-98276277/Sysop Private Node (3:634/396)
SEEN-BY: 50/99 632/348 998 633/371 634/384 396 635/402 503 544 639/252
SEEN-BY: 640/230 690/718 711/401 410 413 430 808 809 934 713/888 800/1
SEEN-BY: 7877/2809
@PATH: 634/396 384 635/503 50/99 711/808 809 934

SOURCE: echomail via fidonet.ozzmosis.com

Email questions or comments to sysop@ipingthereforeiam.com
All parts of this website painstakingly hand-crafted in the U.S.A.!
IPTIA BBS/MUD/Terminal/Game Server List, © 2025 IPTIA Consulting™.