HS>Does anyone know the best combination of DJGPP optimization directives
HS>to
HS>compile a real-time texture mapper?
HS>I currently use :-
HS>gcc -O3 -fexpensive-optimizations -fkeep-all-inlined-functions
HS> -fthread-loops -funroll-all-loops
CB>About the only three really useful optimizations are -O3, -m486, and
CB>-ffast-math.
CB>Everything else comes with conditions under which it can generate
CB>massive code bloat and even slow the program down (because of cache
CB>misses, etc.)
CB>For example, although a loop has loop overhead, unrolling the loop too
CB>much (or all the way) will cause far more cache misses than you
CB>might gain by removing the loop overhead. The more complex the loop,
CB>the less benefit. As a general rule, unrolling a loop 2-4 times will
CB>get the best result. That is just a general rule. There are
CB>exceptions.
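To make the unrolling trade-off concrete, here is a minimal sketch in C of a span copy written as a plain loop and then unrolled four times by hand. The function names are made up for illustration and do not come from anyone's texture mapper; -funroll-all-loops asks the compiler to attempt this kind of transformation itself.

```c
#include <stddef.h>

/* Plain loop: one texel copied and one compare-and-branch per iteration. */
void copy_span(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Unrolled by 4: one compare-and-branch per four texels, at the cost of
   a loop body roughly four times larger in the instruction cache. */
void copy_span_unrolled4(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    /* Remainder loop handles the last 0 to 3 texels. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Both versions produce the same output. Which one is faster depends on how big the real loop body is and how much of the rest of the inner loop still fits in cache, which is why it has to be measured.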
CB>Same thing with inlining large functions and so on. They can cause loss
CB>of cache locality and massive code bloat. Once an inline function gets
CB>beyond a half dozen lines or so, function call overhead is minimal by
CB>comparison, plus since the function is now always at a fixed location,
CB>it might be cached, whereas before, each new occurrence would have to
CB>be read from slow main memory.
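The inlining point can be sketched the same way. This is only an illustration under my own assumptions: the 16.16 fixed-point type and both function names are hypothetical. The tiny helper is a reasonable inline candidate because the call would cost more than the work; the second function stands in for a larger routine that is better left as one ordinary copy in memory.

```c
#include <stdint.h>

typedef int32_t fixed;   /* assumed 16.16 fixed-point format */

/* Tiny helper: a multiply and a shift. Inlining it removes a call that
   would cost more than the arithmetic itself. */
static inline fixed fixed_mul(fixed a, fixed b)
{
    return (fixed)(((int64_t)a * b) >> 16);
}

/* Stand-in for a larger routine: kept as an ordinary function so there
   is a single copy of its code, which can stay in the cache between calls
   instead of being duplicated at every call site. */
void scale_span(fixed *dst, const fixed *src, int n, fixed scale)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] = fixed_mul(src[i], scale);
}
```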
CB>The point is, there are conditions to most optimizations. You shouldn't
CB>do them expecting 'the more optimization switches the better'.
I am aware of the effects of caching. But I have tested my programs, and
they execute a lot faster with unrolled loops. See the report below.
CB>GCC is not the best at optimization. Plus, it only generates 486 code,
CB>not 586+ code. Plus, the 486 code tweaking it does is for a _generic_
CB>486. What that means is that it aims roughly down the middle of the
CB>field. But there are a lot of different 486s, and for some of them, the
CB>code GCC generates with -m486 can be less than optimal.
I don't see how using Pentium opcodes would improve performance. There aren't
that many of them, and I have certainly never used one. They perform rare
tasks, so I doubt a compiler would ever need them.
report :-
-------------------------------------------------------------------
OPTIMIZATIONS                        | Average Frames Per Second  |
-------------------------------------------------------------------
none                                 | 130                        |
-O3                                  | 220                        |
-O3 -fexpensive-optimizations        |                            |
  -fthread-loops -funroll-all-loops  | 321                        |
                                     |                            |
-O3 -fexpensive-optimizations        |                            |
  -fthread-loops -funroll-all-loops  |                            |
  -m486 -fomit-frame-pointer         | 402                        |
-------------------------------------------------------------------
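For anyone wanting to repeat this kind of comparison, here is a rough sketch in C of how an average frames-per-second figure can be computed with the standard clock() function. It is not the harness behind the numbers above; the function names and the render_frame callback are placeholders for illustration.

```c
#include <time.h>

/* Average frames per second from a frame count and elapsed clock ticks.
   Returns 0.0 if no measurable time has passed. */
double average_fps(unsigned long frames, clock_t elapsed)
{
    if (elapsed <= 0)
        return 0.0;
    return (double)frames * CLOCKS_PER_SEC / (double)elapsed;
}

/* Render a fixed number of frames and report the average rate.
   render_frame is a hypothetical callback that draws one frame. */
double time_frames(void (*render_frame)(void), unsigned long frames)
{
    unsigned long i;
    clock_t start = clock();

    for (i = 0; i < frames; i++)
        render_frame();

    return average_fps(frames, clock() - start);
}
```

Timing several hundred frames per run and rebuilding with each set of switches keeps the comparison fair, since clock() has coarse resolution on some systems.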
The maximum FPS I could get with Watcom was in the 380s.
If you ask me, DJGPP is good at optimizing.
goodbye
... How come pizza gets to your house faster than the police?
--- Ezycom V1.48g0 01fd016b
---------------
* Origin: Fox's Lair BBS Bris Aus +61-7-38033908 V34+ Node 2 (3:640/238)