(Continued from previous message)
up a 386SX program. But that doesn't give you much benefit on
non-386SX computers.
Since there are so few 386SXs left, you could aim for a generic 386
and figure the cached 386s would just run it faster, because everybody
KNOWS a cache always makes a program run faster. Wrong. There are many
cases where cache thrashing can cause a cached computer to run slower
than a non-cached system. And a good solution for a non-cached system
can very well turn out not to run well on a cached one. You can often
change your code to allow for caches, but then that impacts non-cached
systems.
(In fact, I've even written a program that runs faster with my L2
cache turned OFF than with it on. The problem is inherent in the way
some caches are designed, combined with the fact that a large FFT is
going to be far larger than the cache, so every access turns into a
cache check, a miss, and then a main memory access. There is no
locality of data and no data re-use, so there is no chance for the
cache to be effective.)
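To make that concrete, here's a minimal sketch (not the FFT program
itself; the buffer size, the stride, and the 256K L2 figure are just
assumptions for illustration) of the kind of access pattern that
defeats a cache: a working set far bigger than the L2, walked with a
stride large enough that nothing gets re-used before it's evicted.

#include <stdio.h>
#include <stdlib.h>

#define N      (1024L * 1024L)  /* 1M doubles = 8MB, far over a 256K L2 */
#define STRIDE 4096L            /* 32K apart: each access is a new line */

int main(void)
{
    double *buf = malloc(N * sizeof(double));
    double sum = 0.0;
    long i, j;

    if (buf == NULL)
        return 1;

    for (i = 0; i < N; i++)
        buf[i] = (double)i;

    /* Strided passes over the whole buffer, roughly the access pattern
       of the butterfly stages in a big FFT: by the time a pass comes
       back around, everything it touched before has been evicted, so
       every read is a cache check, a miss, then a main memory fetch. */
    for (j = 0; j < STRIDE; j++)
        for (i = j; i < N; i += STRIDE)
            sum += buf[i];

    printf("sum = %f\n", sum);
    free(buf);
    return 0;
}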
Or you could just forget about 386 computers. There aren't many left.
Probably fewer than 2% of all PCs left in the world are 386. So, you
could aim for the 486. Well, which 486? Intel 486SX, Intel 486DX,
Intel 486DX/66, Cyrix 486/66, AMD 486/66, etc.? They all behave
differently. A program optimized to the way a Cyrix behaves could take
25% longer to run on an Intel 486/66, and the reverse is also true.
They have different cycle times, and some have better FPU performance,
too. So, should you avoid the FPU, even though using it would speed up
the program on 80%+ of the computers but slow it down on the other
20%?
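For what it's worth, the usual way of 'avoiding the FPU' is fixed
point arithmetic. Here's a rough sketch using a 16.16 format (the
format and the names are just for illustration, not from any
particular program). It's fast on an FPU-less 486SX, while on a chip
with a strong FPU the plain floating point version may well be
quicker.

#include <stdio.h>

typedef long fixed;                          /* 16.16 fixed point      */
#define TO_FIXED(x)  ((fixed)((x) * 65536.0))
#define TO_DOUBLE(f) ((double)(f) / 65536.0)

/* Multiply two 16.16 numbers: widen to 64 bits (a GNU extension on
   older compilers), then drop the extra 16 fraction bits. */
static fixed fixed_mul(fixed a, fixed b)
{
    return (fixed)(((long long)a * b) >> 16);
}

int main(void)
{
    fixed a = TO_FIXED(3.25);
    fixed b = TO_FIXED(1.5);

    printf("%f\n", TO_DOUBLE(fixed_mul(a, b)));  /* prints 4.875000 */
    return 0;
}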
Or you could optimize for a Pentium. The code could still be generic
386, so it would at least run on the rare 386SX, but still take
advantage of the way a Pentium+ behaves. And even many Pentium
optimizations will help on 486 computers, too. Of course, you have to
have a Pentium and a compiler that can optimize for it. DJGPP can't.
Of course, if it's a portable program, then there is no telling what
type of computer it'll be run on. Everything from a 68000 to a Sparc
to a Power PC, to who knows what else. And they all behave differently.
Once you get beyond general algorithmic improvements, there are very,
very few optimizations that hold across multiple platforms and
CPUs. Every optimization of any sort has a price, and that price will
be paid by somebody else running the program on a platform different
from yours.
Once you do all the algorithmic improvements you know of, you _have_ to
depend on the compiler, because the compiler is what translates your
HLL program into an executable. And you don't have any way of knowing
in advance what your program is going to be compiled with, nor, if
it's a portable program, on what platform or CPU.
(I should also point out that with video programming, the only really
general rule you can follow is to avoid video memory, and minimize the
number of accesses when you do have to use it. Everything else will
require specific knowledge of the chipset, accelerator, etc.)
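As an aside, the usual way of living within that rule is double
buffering: draw everything into a buffer in system RAM, then copy it
to the screen in one shot. A quick sketch for plain VGA mode 13h under
DJGPP might look like this (the 0xA0000 address and dosmemput are
specific to plain VGA and DJGPP; another compiler or an accelerated
card would need its own version).

#include <stdio.h>
#include <string.h>
#include <dpmi.h>
#include <sys/movedata.h>

static unsigned char frame[320 * 200];   /* off-screen buffer in RAM */

static void set_mode(int mode)
{
    __dpmi_regs r;

    memset(&r, 0, sizeof r);
    r.x.ax = mode;                       /* INT 10h, AH=00h: set mode */
    __dpmi_int(0x10, &r);
}

int main(void)
{
    int x, y;

    set_mode(0x13);                      /* 320x200, 256 colors */

    /* Draw into system RAM; read and write it as much as you like. */
    for (y = 0; y < 200; y++)
        for (x = 0; x < 320; x++)
            frame[y * 320 + x] = (unsigned char)(x ^ y);

    /* The only video memory access: one block copy for the frame. */
    dosmemput(frame, sizeof frame, 0xA0000);

    getchar();                           /* wait for Enter */
    set_mode(0x03);                      /* back to text mode */
    return 0;
}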
HS> Obviously, it's no wonder I see so much sloppy code, if you actually
HS> took notice that not everyone has the latest machines to date you
There are several reasons poor code exists. 1) Too many people try to
optimize it so much it ends up unreadable, etc. 2) It may not be
_worth_ optimizing. (For example, if you've got 100 items and you are
going to sort them only once, then it isn't going to matter in the
slightest if you use a stupid bubble sort instead of a quick sort; a
quick sketch of that follows below. And if it only takes 10 seconds
even on an XT, then on most computers around, it'll be 'Good Enough'.)
3) There are a lot of beginners who don't know much about programming,
and know even less about optimization and what their compiler really
does.
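Here's the kind of throwaway bubble sort I mean in 2), purely as an
illustration: for 100 items sorted once, it finishes faster than you
could ever measure, so a cleverer sort would buy you nothing.

#include <stdio.h>

/* A 'stupid bubble sort': O(n^2), but for n = 100, sorted once, it's
   done in a blink even on very old hardware. */
static void bubble_sort(int *a, int n)
{
    int i, j, tmp;

    for (i = 0; i < n - 1; i++)
        for (j = 0; j < n - 1 - i; j++)
            if (a[j] > a[j + 1]) {
                tmp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = tmp;
            }
}

int main(void)
{
    int items[100];
    int i;

    for (i = 0; i < 100; i++)            /* fill with a scrambled order */
        items[i] = (i * 37) % 100;

    bubble_sort(items, 100);

    for (i = 0; i < 100; i++)
        printf("%d ", items[i]);
    printf("\n");
    return 0;
}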
HS> Obviously, it's no wonder I see so much sloppy code, if you actually
HS> took notice that not everyone has the latest machines to date you
HS> might consider trying to make the fastest code possible, not relying
HS> on the compiler to do it.
On the contrary!! I only have a 486 computer. Three quarters of all
the other PCs today have at least a 586 (Pentium) in them. And it was
only two years or so ago that my home machine went from a 25MHz 386SX
to a 486. And before that, I used to regularly use an 8-bit micro
running at 1.8MHz. As a result, I am very much aware that you have to
consider slower machines, ranging from 286 computers all the way up to
Pentium Pro+ machines. And clock rates from 16MHz to 200MHz. And
16-bit and 32-bit computers. And x86 and 68k machines. And machines
with 640K to 64 meg. And DOS to Unix to OS/2 to Win to Linux, etc.
That's been the point of what I've been trying to tell you.... There is
a limit to what you can optimize, because there is a good chance that
other optimizations will be harmful instead of helpful on many
platforms. If you are distributing source, the best you can do is
make generic algorithmic improvements. If you are distributing an
executable, you can make it nice and generic, or aim at one nice fat
target, or aim at what most people are likely to have, say a 486 or
586; and since most 486s are better than the generic 486, it's usually
safe to make 586-type optimizations without making it 586-only code.
You can probably hit 80% performance on 90% of the PCs with that last
approach, but since there is such a wide variety of PCs, you're
unlikely to push both percentages higher at the same time. Or, you can
tune it for a fairly specific type of platform and make that platform
a strongly suggested requirement for running the program.
Sometimes you can 'spend memory' to gain performance, but that can cause
problems for computers with less memory. You can use virtual memory,
(Continued to next message)
--- QScan/PCB v1.19b / 01-0162
* Origin: Jackalope Junction 501-785-5381 Ft Smith AR (1:3822/1)