echo: c_plusplus
to: HERMAN SCHONFELD
from: CAREY BLOODWORTH
date: 1997-04-22 21:28:00
subject: DJGPP OPTIMIZATIONS 1/3

I'm in the process of moving, so I haven't had time to make this reply
shorter.  It's considerably longer than it should be, and I repeat
myself.  But I don't have any spare time to edit it.
HS>Funny? I don't see how improving ones code can considered funny.
Because 'improvements' are often only improvements on one particular
platform.  If you took that same program to a different computer (say
from x86 to 68k) or to a similar computer with a different CPU (say 386
to a Pentium Pro), then all those careful optimizations you made before
could actually turn out to be ANTI-optimizations on the second platform,
causing the program to run significantly slower.
You can tune a program nice and tight for your particular system, and
then discover, when it's run on a different system, that the performance
sucks big time.  You may be getting maybe half or less of the
performance the program would have had on that processor if you hadn't
made the processor-specific optimizations at all.
The same can be true of compiler switches.  Some may work better for one
program than they do for another.  Some may be more effective on one
compiler or processor than another.  If you are distributing an
executable, you either aim 'towards the middle' or use the switches that
aim for the more popular platforms (486+).
HS>My comment merely said "Maybe some people don't rely on compilers compiling
HS>that fastest code possible".
HS>Lets see now, where did I say that a compiler doesn't optimize?
HS>Does a compiler change a *256 to a <<8?
HS>Obviously not, so why not optimize it yourself?
Because whether something like that is optimal depends HEAVILY on the
particular platform you are running on.  On one platform, doing *256 may
be faster, but on another, doing <<8 may be faster.  A multiply by 256
may take 2-3 cycles, but a shift by 8 might take 8 or more (1 cycle for
each shift).  It depends on the particular processor.
Things like that are very, very platform dependent.
And many compilers _do_ make that optimization, WHEN and ONLY when it
is appropriate for the CPU they are compiling for.  You obviously don't
have much experience with optimizers, or you wouldn't have used that as
an example.  Things like that are called Strength Reduction and have
been around since the early days of compilers and optimization.
Throughout C's life, certainly.  It's very rare for even a 15-year-old
compiler for any language to not do simple optimizations like that.
Both Turbo C 3.0 and GNU C will make that change.  Of course, as an
example of an optimization that can be inappropriate: on my CPU, doing
the shift can actually be _slower_ than if the compiler had just done a
multiply by 256.
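To make that concrete, here's a rough sketch (the function names are
just made up for illustration); both versions return the same thing, but
only the first one leaves the choice of multiply versus shift up to the
compiler:

    /* Write the multiply plainly and let the optimizer decide.   */
    /* A compiler that does strength reduction will turn the *256 */
    /* into a shift only when the shift is actually cheaper on    */
    /* the target CPU.                                            */
    unsigned long scale_up(unsigned long x)
    {
        return x * 256;     /* clear and portable                 */
    }

    unsigned long scale_up_by_hand(unsigned long x)
    {
        return x << 8;      /* same result, but you've made the   */
                            /* choice for every CPU yourself      */
    }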
HS>Ofcourse, everyone depends on the compiler to compile code. But sometimes you
HS>have to make an optimizations by hand because the compiler can't do it.
But only to an extent.  What you don't seem to realize is that a
program that is 'optimal' on a 386 class computer could be running at
only a quarter of the speed it might otherwise get on a cached 386, 486,
or Pentium, etc.  Now, true, that will benefit those people running a 386,
but there aren't many of them left.  It's usually better to just aim
towards the middle, which these days is a 486.  (Also, since most 486s
are higher-performance 486s, Pentium-type optimizations can usually
help.)
Is using pointers more efficient than using array indexing?  Naturally
it depends on the particular code, but it also depends on how smart the
compiler is and on the architecture of the target CPU.  For some CPUs,
array indexing is faster, while for others, it's faster to deal with
pointers all the time.
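For example (just a quick sketch, nothing more); which of these two sums
an array faster is up to the compiler and the CPU, not to which one
'looks' faster:

    /* Sum an array by indexing. */
    long sum_indexed(int *a, int n)
    {
        long total = 0;
        int i;
        for (i = 0; i < n; i++)
            total += a[i];
        return total;
    }

    /* Sum the same array by walking a pointer. */
    long sum_pointer(int *a, int n)
    {
        long total = 0;
        int *p = a;
        int *end = a + n;
        while (p < end)
            total += *p++;
        return total;
    }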
If you are writing a portable program, then you have no control over
what compiler everybody else will be using.  You will also have no
control over what CPU, or what version of that CPU, they will be using.
You can make generic algorithmic improvements (bubble sort to a quick
sort, etc.), but beyond that, most optimizations are very, very dependent
on the particular platform you are aiming towards.  And on other
platforms, your tinkering can do more harm than good because it may
_prevent_ the compiler from optimizing it appropriately for the new
platform.
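By way of example (a sketch only, with made-up names), this is the kind
of change that pays off on every platform, because it fixes the
algorithm instead of second-guessing the compiler:

    #include <stdlib.h>

    /* Comparison function for qsort(): sorts ints in ascending order. */
    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a;
        int y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* Replace a hand-rolled bubble sort with the library's qsort(). */
    void sort_ints(int *a, size_t n)
    {
        qsort(a, n, sizeof(int), cmp_int);
    }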
And of course, your saying above that you should optimize it by hand
begs the question: optimize it for what platform?
Maybe an 8086 XT?  That's the lowest platform.  Of course, fewer than 1
out of 10,000 PC computers are an XT.
Same with a 286.
You could do a 386.  Of course, then you have to decide which 386.  The
386SX, the 386DX or a cached 386DX.  386SXs can outrun their memory when
running 32 bit code, and their memory is often organized so that
sequential accesses (like program execution) are done with minimal wait
states, but random accesses (like accessing variables) are done with
full wait states.  Avoiding 32 bit code and data can significantly speed
(Continued to next message)
--- QScan/PCB v1.19b / 01-0162
* Origin: Jackalope Junction 501-785-5381 Ft Smith AR (1:3822/1)
