1) Although on your computer and compiler pointers are faster than
direct indexing, they may not be on other people's computers and
compilers. It will depend on the compiler optimizations available, the
register assignments the compiler makes, and the individual CPU itself.
2) You can become so involved in the low-level details of your
particular implementation that you miss a better approach entirely
(like switching to moving words instead of bytes, or hand-optimizing a
bubble sort instead of simply switching to a quicksort).
3) You can become so involved in the 'low level' stuff that you miss
the obvious. This includes things like using memcpy() (which is
likely to be optimized anyway), or reducing the number of times
something needs to be done rather than speeding up the code that does
it (when testing whether a number is prime, you only need to try
divisors up to the square root of the number, not all the way up; see
the first sketch after this list). Or, continuing #2 above, maybe
your data doesn't even need to be sorted in the first place.
4) If you aren't careful, an optimization that works fine on one
computer can be fatal on another (such as misaligned accesses on
non-x86 machines; the word-at-a-time copy is an example where that can
happen, as the second sketch after this list shows).
5) It's very easy to waste your time optimizing something that doesn't
need to be optimized, or where simply using a standard library function
is already better than what you have.
6) When I look at my data showing the results for different buffer
sizes, the cases where the data is small enough to fit entirely into
the L2 cache, and then the L1 cache, write-back vs. write-through, and
so on, I can see that much of the variation in the results comes from
data latency rather than code efficiency.
7) And when you finally do get down to lower-level optimizations, you
first have to ask yourself 1) what platform, 2) what CPU, and 3) what
version of that CPU.
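
Here is a minimal sketch of the square-root point from #3, in plain C.
The function name and the test values are mine, just for illustration;
the only point is that stopping at the square root does the same job
with far fewer divisions.

#include <stdio.h>

/* Sketch of the square-root idea: a number n can only have a divisor
 * greater than its square root if it also has one below it, so trial
 * division can stop once d * d > n. */
static int is_prime(unsigned long n)
{
    unsigned long d;

    if (n < 2)
        return 0;
    for (d = 2; d * d <= n; d++)      /* stop at the square root */
        if (n % d == 0)
            return 0;                 /* found a divisor */
    return 1;
}

int main(void)
{
    printf("101 -> %d\n", is_prime(101));   /* prints 1, prime */
    printf("100 -> %d\n", is_prime(100));   /* prints 0, not prime */
    return 0;
}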
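
And here is a sketch of the word-at-a-time copy from #4, to show where
the alignment hazard comes in. The routine name is mine; in real code
you would just call memcpy() and let the library worry about it.

#include <stddef.h>

/* Word-at-a-time copy, written the naive way.  If dst or src is not
 * aligned on a long boundary, the loads and stores below are
 * misaligned: a 486 just runs them (a bit slower), but a 68000 or a
 * SPARC raises an alignment fault instead. */
static void word_copy(void *dst, const void *src, size_t nbytes)
{
    long *d = (long *) dst;
    const long *s = (const long *) src;
    size_t nwords = nbytes / sizeof(long);
    size_t i;

    for (i = 0; i < nwords; i++)            /* whole words first */
        d[i] = s[i];

    {                                       /* then the leftover bytes */
        char *dc = (char *) dst + nwords * sizeof(long);
        const char *sc = (const char *) src + nwords * sizeof(long);

        for (i = 0; i < nbytes % sizeof(long); i++)
            dc[i] = sc[i];
    }
}

int main(void)
{
    char src[13] = "hello, world";
    char dst[16];

    word_copy(dst, src, sizeof(src));       /* aligned: works anywhere  */
    word_copy(dst + 1, src, sizeof(src));   /* misaligned dst: fine on
                                               x86, can trap elsewhere  */
    return 0;
}
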
Wanting your code to run fast is an admirable trait in a programmer.
It's just that optimizing these days is simply not an absolute. You
can't depend on cycle counting, because the counts vary from processor
to processor, and with newer processors there are so many conditions
and notes in the programmer's manual about what can affect those
counts that there is no easy or sure way to know what the code will do
in every situation.
What is an improvement in one situation can be harmful in another.
This is especially true when you consider 1) all the compilers that
are around and the different types and quality of code they generate,
2) the wide variety of x86 clones and the different levels of
performance they have, and 3) the wide variety of CPUs (68k, SPARC,
x86, PowerPC, etc.) that the code may be run on. It just isn't simple
anymore. If you limit the code to the x86 line, you can make some
more improvements. If you further limit it to some particular version
of an x86 CPU, you can make even more. Otherwise you have to aim
somewhere towards the middle.
That's why I keep saying you are better off making algorithmic
improvements (a small sketch of what I mean follows below). With
everything else, there are _NO_ absolutes. It all depends very heavily
on the compiler, the CPU, the cache, and so on, and all of those vary
from person to person. You could even give me an executable compiled
and tuned for your 486 system, and it would still run differently on
mine.
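
To show what I mean by an algorithmic improvement, here is the
bubble-sort point from #2 done the easy way: just call the library's
qsort(). The comparison function and the sample numbers are mine,
purely for illustration; the point is that no amount of cycle counting
on the bubble sort would catch up with simply changing the algorithm
(or letting the library do it for you).

#include <stdio.h>
#include <stdlib.h>

/* Comparison callback for qsort(); written this way instead of
 * "return x - y;" to avoid overflow on extreme values. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

int main(void)
{
    int data[] = { 42, 7, 19, 3, 88, 1 };
    size_t n = sizeof(data) / sizeof(data[0]);
    size_t i;

    qsort(data, n, sizeof(data[0]), cmp_int);

    for (i = 0; i < n; i++)
        printf("%d ", data[i]);       /* prints: 1 3 7 19 42 88 */
    printf("\n");
    return 0;
}
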
--- QScan/PCB v1.19b / 01-0162
---------------
* Origin: Jackalope Junction 501-785-5381 Ft Smith AR (1:3822/1)