DM>x+(y<<6)+(y<<8): 182
DM>x+y*320: 169
DM>Consistently your shift is slower by about 6%. This is
DM>with standard optimizations and debugging. No fancy stuff
DM>here. Putting optimization to -O3 obviously destroys the
DM>entire thing. :-)
DG> Hmmm, weird. Destroys? How?
As in, when you put on optimizations, the compiler "recognizes" that the code
in the loop is "useless" and eliminates it. :-)
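To make that concrete, here is a rough sketch (NOT the benchmark that produced the numbers above, which isn't shown) of one way to stop the optimizer from throwing the loop away: declare the buffer volatile, so every store has to really happen. The buffer is a plain array standing in for real video memory, and fill_shift, fill_mul and time_fill are names made up for this illustration.

```c
#include <time.h>

#define WIDTH  320
#define HEIGHT 200

// Hypothetical stand-in for the real frame buffer. 'volatile' tells the
// compiler that each store is observable, so the loops cannot be removed.
volatile unsigned char video_buffer[WIDTH * HEIGHT];

// Fill the whole screen using the shift form of the offset.
void fill_shift(unsigned char colour)
{
    for (int y = 0; y < HEIGHT; y++)
        for (int x = 0; x < WIDTH; x++)
            video_buffer[x + (y << 6) + (y << 8)] = colour;
}

// Fill the whole screen using the multiply form of the offset.
void fill_mul(unsigned char colour)
{
    for (int y = 0; y < HEIGHT; y++)
        for (int x = 0; x < WIDTH; x++)
            video_buffer[x + WIDTH * y] = colour;
}

// Returns the CPU time, in clock() ticks, spent on 'passes' full fills.
clock_t time_fill(void (*fill)(unsigned char), int passes)
{
    clock_t start = clock();
    for (int i = 0; i < passes; i++)
        fill((unsigned char)i);
    return clock() - start;
}
```

Calling time_fill(fill_shift, 1000) and time_fill(fill_mul, 1000) and printing both results gives two figures you can compare at any optimization level. The volatile stores have a cost of their own, and the numbers will depend on the compiler and CPU, so treat it as a starting point only.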
DM> DG> On mine, a 120 MHz 686, under Windows 95, the
DM> DG> shifting method was faster
DM> DG> in every instance.
DM>Is this a Cyrix 686?
DG> Let me check.
DG> ...Two reboots later...
DG> Darn non-system disk...
:-)
DG> Yep.
Ok, I believe that Cyrix has slightly different cache optimizations than
"normal" Intel which would result in your "skewed" results.
DG> I guess really, so long as you don't need lots of frames per
DG> second (i.e., not more than 10 or so), it doesn't really matter what
DG> method you use. They are both about the same. And given the speed of
DG> today's processors, and the compilers, it doesn't matter in terms of how
DG> smooth the video is (I mean, shifting or multiplying won't affect
DG> it). You should just not plot too many pixels. If you plot too many
DG> pixels then you should change the algorithm.
Therefore, what we come down to is algorithm and readability, and not actual
speed. So if there's no difference between (y<<6)+(y<<8) and y*320, why use
the less readable version? Perhaps if you were going for a placement in the
IOCCC, we could understand it. :-)
inline void Plot(int x, int y, char colour)
{
    // since 320*y is the same as (y<<6)+(y<<8), we can do:
    video_buffer[x + (y<<6) + (y<<8)] = colour;
}
vs
inline void Plot(int x, int y, char colour)
{
    video_buffer[x + 320*y] = colour;
}
The speed is the same. The former, however, is likely more code (i.e.,
bigger executable) and DEFINITELY is less readable. With two points in its
favour, we should opt for the second one.
inline void Plot(int x, int y, char colour)
{
    //video_buffer[x + 320*y] = colour;
    // the following does the same, but our benchmark shows it to be three
    // times faster:
    video_buffer[x + (y<<6) + (y<<8)] = colour;
}
If the comments were true, this would be an acceptable version. (However,
our benchmarks have shown this to be false, so obviously we wouldn't do it
here.)
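The "does the same" part of those comments IS true, since 320 = 64 + 256 = 2^6 + 2^8. If you'd rather check than trust the arithmetic, a brute-force sketch like this one covers every pixel. It assumes a 320x200 screen (the height isn't stated above), and count_mismatches is a made-up name:

```c
#define WIDTH  320
#define HEIGHT 200   // assumed screen height, not given in the thread

// Counts the (x, y) pairs where the shift form and the multiply form
// of the offset disagree. Zero means the two are interchangeable.
int count_mismatches(void)
{
    int mismatches = 0;
    for (int y = 0; y < HEIGHT; y++)
        for (int x = 0; x < WIDTH; x++)
            if (x + (y << 6) + (y << 8) != x + WIDTH * y)
                mismatches++;
    return mismatches;
}
```

So the only question left for the shift version is speed, which is exactly what the benchmark doesn't support.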
--- Maximus/2 3.01
---------------
* Origin: Tanktalus' Tower BBS (PVT) (1:342/708)