echo: c_plusplus
to: CAREY BLOODWORTH
from: HERMAN SCHONFELD
date: 1997-04-05 21:16:00
subject: DJGPP optimizations

HS>Does anyone know the best combination of DJGPP optimization directives to
HS>compile a real-time texture mapper?
HS>I currently use :-
HS>gcc     -O3 -fexpensive-optimizations -fkeep-all-inlined-functions
HS>            -fthread-loops -funroll-all-loops
CB>About the only three really useful optimizations are the -O3, -m486, and
CB>-ffast-math.
CB>Everything else comes with conditions where they can generate massive
CB>code bloat, and even slow down the program (because of cache misses,
CB>etc.)
CB>For example, although a loop has loop overhead, unrolling the loop too
CB>much (or all the way) will cause far more cache misses than what you
CB>might gain by removing the loop overhead.  The more complex the loop,
CB>the less benefit.  As a general rule, unrolling a loop 2-4 times will
CB>get the best result.  That is just a general rule.  There are
CB>exceptions.
CB>Same thing with inlining large functions and so on.  They can cause loss
CB>of cache locality and massive code bloat.  Once an inline function gets
CB>beyond a half dozen lines or so, function call overhead is minimal, plus
CB>since the function is now always at a fixed location, it might be
CB>cached, whereas before, each new occurrence would have to be read from
CB>slow main memory.
CB>The point is, there are conditions to most optimizations.  You shouldn't
CB>do them expecting 'the more optimization switches the better'.
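To make the inlining trade-off quoted above concrete, here is a minimal sketch.
The 16.16 fixed-point helper and both function names are hypothetical, not taken
from anyone's actual texture mapper. The point is only that a one-line helper is
a natural inline candidate, while a longer routine is usually better kept as a
single out-of-line copy.

```cpp
#include <cassert>

// Hypothetical 16.16 fixed-point multiply. The body is a single expression,
// so call overhead would be a large share of its cost: a good inline candidate.
// Note: right-shifting a negative value is implementation-defined before C++20.
inline int fixmul(int a, int b) {
    return static_cast<int>((static_cast<long long>(a) * b) >> 16);
}

// Hypothetical longer routine. Left out of line so that every caller shares
// one copy of the loop instead of duplicating it at each call site.
int interpolate_span(int u0, int u1, int steps, int* out) {
    assert(steps > 0 && out != nullptr);
    const int one = 1 << 16;  // 1.0 in 16.16 fixed point
    for (int i = 0; i < steps; ++i) {
        // t runs from 0.0 up to (but not including) 1.0 in 16.16 fixed point
        int t = static_cast<int>((static_cast<long long>(i) << 16) / steps);
        out[i] = fixmul(u0, one - t) + fixmul(u1, t);
    }
    return steps;
}
```

Whether the compiler really inlines fixmul depends on the optimization level and
flags in use, so the generated code is worth checking rather than assuming.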
I am aware of the effects of caching. But I have tested my programs, and 
they seem to execute a lot faster with unrolled loops. See the report below.
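For reference, this is roughly what the 2-4x unrolling mentioned above looks
like when written by hand. It is only a sketch: sum_rolled and sum_unrolled4 are
made-up names, and the compiler's own unrolling under -funroll-all-loops may
make different choices.

```cpp
#include <cassert>
#include <cstddef>

// Plain loop: one add plus one compare-and-branch per element.
long sum_rolled(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// Unrolled by 4: one compare-and-branch per four elements, at the cost of
// a larger loop body plus a cleanup loop for the leftover elements.
long sum_unrolled4(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    long total = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }
    for (; i < n; ++i) {  // handle the remaining 0 to 3 elements
        total += data[i];
    }
    return total;
}
```

Both functions return the same result. Which one runs faster depends on the
loop body and the cache, so it has to be measured on the target machine.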
CB>GCC is not the best at optimization.  Plus, it only generates 486 code,
CB>not 586+ code.  Plus, the 486 code tweaking it does is for a _generic_
CB>486.  What that means is that it aims roughly down the middle of the
CB>field.  But there are a lot of different 486s, and for some of them, the
CB>code GCC generates with -m486 can be less than optimal.
I don't see how using Pentium opcodes would improve performance. There aren't 
that many of them, and I have certainly never used one. They perform rare 
tasks, and I doubt a compiler would ever need them.
report :-
-------------------------------------------------------------------
   OPTIMIZATIONS                     |  Average Frames Per Second |
------------------------------------------------------------------|
none                                 |           130              |
-O3                                  |           220              |
-O3 -fexpensive-optimizations        |                            |
-fthread-loops -funroll-all-loops    |           321              |
                                     |                            |
-O3 -fexpensive-optimizations        |                            |
-fthread-loops -funroll-all-loops    |                            |
-m486 -fomit-frame-pointer           |           402              |
-------------------------------------------------------------------
The maximum FPS I could get with Watcom was in the 380s.
If you ask me, DJGPP is good at optimizing.
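Working the report's numbers through as ratios makes the comparison easier to
read. The sketch below only divides the frame rates from the table. speedup and
the constant names are hypothetical, chosen here for illustration.

```cpp
#include <cassert>

// Ratio of a measured frame rate to a baseline frame rate.
double speedup(double fps, double baseline_fps) {
    assert(baseline_fps > 0.0);
    return fps / baseline_fps;
}

// Average frames per second, copied from the report above.
const double FPS_NONE   = 130.0;  // no optimization flags
const double FPS_O3     = 220.0;  // -O3
const double FPS_UNROLL = 321.0;  // -O3 plus the expensive/thread/unroll flags
const double FPS_FULL   = 402.0;  // all of the above plus -m486 -fomit-frame-pointer
```

speedup(FPS_FULL, FPS_NONE) comes to about 3.1, and speedup(FPS_FULL, FPS_O3)
to about 1.8, so in this test the extra switches gave a sizeable gain over
plain -O3.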
goodbye
... How come pizza gets to your house faster than the police?
--- Ezycom V1.48g0 01fd016b
---------------
* Origin: Fox's Lair BBS Bris Aus +61-7-38033908 V34+ Node 2 (3:640/238)

SOURCE: echomail via exec-pc
