HS>Does anyone know the best combination of DJGPP optimization directives to
HS>compile a real-time texture mapper?
HS>I currently use :-
HS>gcc -O3 -fexpensive-optimizations -fkeep-all-inlined-functions
HS> -fthread-loops -funroll-all-loops
About the only three really useful optimization switches are -O3, -m486,
and -ffast-math.
Everything else comes with conditions. Under the wrong conditions those
switches can generate massive code bloat and even slow the program down
(because of cache misses, etc.).
For example, a loop does have loop overhead, but unrolling it too much
(or all the way) will cause far more instruction-cache misses than the
loop overhead you save. The more complex the loop body, the smaller the
benefit. As a general rule, unrolling a loop 2-4 times gets the best
result. That is just a general rule; there are exceptions.
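A rough sketch of a moderate 4x unroll, written by hand instead of leaving it to -funroll-all-loops. It is the inner loop of a hypothetical affine texture mapper; the function name draw_span, the 16.16 fixed-point coordinates and the single 256-texel texture row are all assumptions made for illustration. The second loop cleans up the 0-3 pixels left over when the span length is not a multiple of 4.

```c
#include <assert.h>

/* Hypothetical affine texture-mapper span loop (illustration only).
   dst     : destination pixels for one scanline span
   texture : one 256-texel row of an 8-bit texture
   u, du   : texture coordinate and per-pixel step, 16.16 fixed point
   The main loop is unrolled 4 times; the cleanup loop handles the
   0-3 pixels left over when n is not a multiple of 4. */
void draw_span(unsigned char *dst, const unsigned char *texture,
               long u, long du, int n)
{
    int i = 0;

    assert(n >= 0);

    /* Unrolled body: one loop test and branch per 4 pixels. */
    for (; i + 4 <= n; i += 4) {
        dst[i]     = texture[(u >> 16) & 255]; u += du;
        dst[i + 1] = texture[(u >> 16) & 255]; u += du;
        dst[i + 2] = texture[(u >> 16) & 255]; u += du;
        dst[i + 3] = texture[(u >> 16) & 255]; u += du;
    }

    /* Cleanup loop: remaining pixels, one at a time. */
    for (; i < n; i++) {
        dst[i] = texture[(u >> 16) & 255];
        u += du;
    }
}
```

Whether this actually beats the plain loop depends on the CPU and its cache, so time both versions on the target machine.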
Same thing with inlining large functions and so on. It can cause loss of
cache locality and massive code bloat. Once a function grows beyond half
a dozen lines or so, the call overhead is minimal compared with the work
the function does. And when it is not inlined, the function sits at one
fixed location and is likely to stay in the cache, whereas each inlined
copy is separate code that may have to be read from slow main memory.
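A minimal sketch of that trade-off, using the C99 `static inline` spelling (older GNU C spells it `__inline__`). The names shade and light_span are hypothetical: shade is tiny, so inlining it removes a call that would cost about as much as its body, while light_span stands in for a larger routine that is left as an ordinary function so every caller shares one copy of its code.

```c
#include <assert.h>

/* Hypothetical tiny helper: the call overhead would be comparable
   to the work done, so it is a good inlining candidate. */
static inline unsigned char shade(unsigned char texel, unsigned char light)
{
    return (unsigned char)((texel * light) >> 8);
}

/* Hypothetical span routine, standing in for a larger function.
   It is not declared inline, so one copy of its code is shared by
   all callers instead of being duplicated at each call site. */
void light_span(unsigned char *dst, const unsigned char *src,
                const unsigned char *light, int n)
{
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        dst[i] = shade(src[i], light[i]);
}
```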
The point is, most optimizations come with conditions. Don't pile them on
expecting 'the more optimization switches, the better'.
HS>Also, can someone give me the best directives to compile for a :-
HS>Pentium Pro
HS>Pentium
HS>486 512 cache
HS>486 256 cache
Same switches.
GCC is not the best at optimization. It also generates only 486 code, not
586+ code, and the 486 tuning it does is for a _generic_ 486: it aims
roughly down the middle of the field. But there are a lot of different
486s, and for some of them the code GCC generates with -m486 can be less
than optimal.
--- QScan/PCB v1.19b / 01-0162
---------------
* Origin: Jackalope Junction 501-785-5381 Ft Smith AR (1:3822/1)