echo: z3_pascal
to: Bek Oberin
from: Frank Malcolm
date: 1995-08-21 20:35:02
subject: Randomizing strings?

Hi, Bek.

BO>  FM> Someone
BO>  FM> else posted an idea which I initially dismissed (plus there were bugs
BO>  FM> in his pseudo-code) but thinking about it, what's wrong with treating it
BO>  FM> as a file of byte, randomise into the whole file size, read backwards to
BO>  FM> crlf or BOF, grab from there to the terminating crlf as your quote.
BO>  FM> The only thing I can see against it is that longer quotes will have a
BO>  FM> greater chance of being selected (possibly a negligible effect), and
BO>  FM> it's heaps faster than any sequential thing and doesn't require a
BO>  FM> separate index file or a unique first record.

BO> I have a very large (43k-ish) file of quotes here, ranging in length
BO> from one to about twenty plus lines long.  Each quote is separated from the
BO> next by two crlf pairs.

BO> Because of the large difference in length, I think the method you just
BO> described wouldn't be really great for this file.  What I've been doing

OK, if the samples you put in your message to Ron are mixed with a few
one-liners, the one-liners would have a significantly smaller chance of
being selected using my suggestion above, because a quote's chance is
proportional to the number of bytes it occupies.
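
To put a number on that bias, here's a rough sketch of the random-offset idea. It's in Python rather than Pascal just to keep it short, and it works on the whole file held in memory (fine at 43k); a real version would seek into the file and read a small window. The names quote_at, random_quote and selection_odds are mine, and the separator is the two-crlf-pairs layout you described.

```python
import random

SEP = b"\r\n\r\n"  # assumption: quotes are separated by two crlf pairs


def quote_at(data, pos):
    """Return the quote that contains byte offset pos."""
    # Scan backwards for a separator that ends at or before pos.
    start = data.rfind(SEP, 0, pos)
    start = 0 if start < 0 else start + len(SEP)
    # Scan forwards for the separator that terminates this quote.
    end = data.find(SEP, start)
    if end < 0:
        end = len(data)
    return data[start:end]


def random_quote(data, rng=random):
    """Pick a random byte, then return the quote around it."""
    return quote_at(data, rng.randrange(len(data)))


def selection_odds(data):
    """Chance of each quote being picked, by trying every byte offset."""
    counts = {}
    for pos in range(len(data)):
        q = quote_at(data, pos)
        counts[q] = counts.get(q, 0) + 1
    return {q: n / len(data) for q, n in counts.items()}
```

Running selection_odds over the real file would show how lopsided the one-liners versus the twenty-liners actually are.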

BO> is reading through the file counting quotes, then generating a
BO> random number and re-reading through the file.  It takes just
BO> about a second to come up with a quote doing that which is
BO> acceptable at the moment, but hardly ideal.

BO> Got any ideas?

A few. :-) I'll assume your quotes file is relatively static, but
sometimes you add more quotes using, probably, a text editor or some
cut-and-paste process. IOW it's not totally fixed so an index file
generated once won't do.

Now, one thing I don't know is how many times the quote file is *used*,
cf how often it's updated. That's important, and may make an index
appropriate because the frequency of needing to update it (the index)
could be far less than needing to use it.

The possibilities I'd consider are...

a) Generate an index file. A quickie prog could do this every time you
updated the quotes file, but you'd have to remember to run it. If you
forgot, the result could be a mess so I'd include a couple of safety
checks. (more on this below)

b) Re-index every time the prog that presents the quotes is run. Doesn't
save anything unless you only do it when necessary. Also see below.

c) At least eliminate the first pass which you mention above, by
including a count of quotes in the first record. The tagline feature of
OLX which I use to read mail does this.

My preference is b), but there may be some reason why you don't want a
separate file so let me first describe how I think OLX works, ie
alternative c).

The first record contains just a number (probably a longint in Pascal)
which says how many quotes are in the file. You read that record, make
sure it does contain a longint, use it to generate a random number then
read through the file as you're doing now.
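
Something along these lines, as a sketch only. I'm guessing at the layout (a count on the first line, then quotes separated by a blank line), not describing what OLX really does, and pick_with_header is a made-up name. Python here for brevity.

```python
import random


def pick_with_header(path, rng=random):
    """Pick a quote using the count stored in the first record (one pass)."""
    with open(path, "r") as f:
        header = f.readline().strip()
        # Safety check: the first record must really be a positive number.
        if not header.isdigit() or int(header) < 1:
            raise ValueError("first record is not a quote count")
        target = rng.randrange(int(header))

        index = 0   # number of the quote currently being read
        lines = []  # lines of the quote currently being read
        for line in f:
            line = line.rstrip("\n")
            if line == "":
                if lines:  # a blank line ends the current quote
                    if index == target:
                        return "\n".join(lines)
                    index += 1
                    lines = []
            else:
                lines.append(line)
        if lines and index == target:  # last quote has no trailing blank line
            return "\n".join(lines)
    raise ValueError("header count is larger than the number of quotes")
```

The final raise is the "mess" case: the count in the header no longer matches the file, which is exactly what a timestamp check would catch earlier.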

Now I reckon that's all the checking OLX does, but I'd be inclined to
also record the file's date/timestamp in that record, so that when you
add some quotes later your prog knows it has to regenerate that record.

That in itself is an interesting exercise. I reckon you'd probably write
out a new first record with dummy values (say, xxxxx xx/xx/xx xx:xx:xx),
then copy the rest of the original file with readln/writeln, then open
the file again as an untyped file and overwrite the x's with the real
values.
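
A sketch of that placeholder-then-patch trick, in Python for brevity. The fixed field width of 10 and the name add_count_header are my own assumptions, and only the count is filled in here; the date/time fields would be patched the same way.

```python
WIDTH = 10  # assumption: the count lives in a fixed-width field


def add_count_header(src, dst):
    """Copy src to dst behind a placeholder first record, then patch it."""
    count = 0
    in_quote = False
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # Dummy first record; it gets overwritten once the count is known.
        fout.write(b"x" * WIDTH + b"\r\n")
        for line in fin:
            fout.write(line)
            if line.strip():
                if not in_quote:  # first line of a new quote
                    count += 1
                in_quote = True
            else:                 # a blank line ends the quote
                in_quote = False

    # Reopen for update and overwrite the x's in place. The field has the
    # same width as the placeholder, so nothing after it moves.
    with open(dst, "r+b") as f:
        f.write(str(count).zfill(WIDTH).encode("ascii"))
    return count
```

The fixed width is what makes the in-place overwrite safe: the real value can never be longer or shorter than the dummy one.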

If, OTOH, you can use an index file (alternative b)), then I guess it
would have the date/timestamp of the actual quote file, followed by a
lot of longints, each holding the byte offset of one quote in the quote
file. You'd then open the quote file as an untyped file, seek to an
offset picked at random from the index, and read from there until the
next double crlf.
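
Here's roughly what I mean, as a sketch in Python rather than Pascal. The index layout (an 8-byte timestamp followed by 4-byte offsets) and the names build_index, load_index and pick_indexed are assumptions of mine, not a worked-out design.

```python
import os
import random
import struct

SEP = b"\r\n\r\n"  # assumption: quotes are separated by two crlf pairs


def build_index(quotes_path, index_path):
    """Scan the quote file once and write timestamp + offsets to the index."""
    with open(quotes_path, "rb") as f:
        data = f.read()
    offsets = []
    pos = 0
    while pos < len(data):
        end = data.find(SEP, pos)
        if end < 0:
            end = len(data)
        if data[pos:end].strip():  # ignore empty chunks
            offsets.append(pos)
        pos = end + len(SEP)
    stamp = int(os.path.getmtime(quotes_path))
    with open(index_path, "wb") as f:
        f.write(struct.pack("<q", stamp))
        f.write(struct.pack("<%dl" % len(offsets), *offsets))
    return offsets


def load_index(quotes_path, index_path):
    """Return the offsets, rebuilding the index if it is missing or stale."""
    stamp = int(os.path.getmtime(quotes_path))
    try:
        with open(index_path, "rb") as f:
            raw = f.read()
    except FileNotFoundError:
        raw = b""
    usable = len(raw) >= 8 and (len(raw) - 8) % 4 == 0
    if usable and struct.unpack_from("<q", raw)[0] == stamp:
        n = (len(raw) - 8) // 4
        return list(struct.unpack_from("<%dl" % n, raw, 8))
    return build_index(quotes_path, index_path)


def pick_indexed(quotes_path, index_path, rng=random):
    """Seek straight to a random quote and read up to the next double crlf."""
    start = rng.choice(load_index(quotes_path, index_path))
    buf = b""
    with open(quotes_path, "rb") as f:
        f.seek(start)
        while SEP not in buf:
            block = f.read(512)
            if not block:
                break
            buf += block
    return buf.split(SEP, 1)[0].strip().decode("latin-1")
```

One caveat: the timestamp is compared in whole seconds, so an edit made within the same second as the last index build would go unnoticed. Comparing the file size as well would be a cheap extra check.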

In both cases you save time in the (presumably more common) case that
the quote file hasn't been changed, and when it has you only take about
as long as you do now.


Now I've just re-read all that and I'm not sure I've really explained
well what I'm thinking of. Tell me the bits that don't make sense. Or do
you want to see some code?

Regards, FIM.

 * * No person ever became wicked all at once.
@EOT:

---
* Origin: Pedants Inc. (3:711/934.24)
SEEN-BY: 633/267 270
@PATH: 711/809 808 50/99 635/503 633/371 252 267

SOURCE: echomail via fidonet.ozzmosis.com
