Flemming Sondergaard wrote in a message to Tom Torfs:
TT> AFAIK, REP MOVSW is still the fastest way to move blocks from memory to
TT> memory (even faster than DMA). REP MOVSD is still faster, of course
TT> (provided you have a 32-bit databus).
FS> I once compared REP MOVSD to this method:
FS> loop_000:
FS> mov eax,ds:[si]
FS> add si,4
FS> mov es:[di],eax
FS> add di,4
FS> dec cx
FS> jnz loop_000
FS> And REP MOVSD _was_ the slowest of the two.
Sounds unlikely. Just adding up the clock cycles of the separate instructions
shows that the long method has to be a lot slower, and that's before counting
the extra instruction fetches the CPU has to do on every pass through the
loop. You were moving to and from 4-byte-aligned addresses, weren't you?
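
To put rough numbers on the clock-counting argument, here is a small sketch
in C that just does the addition. The clock counts are my assumption of
486-class figures (cache hits, aligned data, no pairing), quoted from memory,
so check them against the manual for your CPU. The names loop_clocks and
rep_clocks are made up for this example.

```c
#include <assert.h>

/* ASSUMED 486-class clock counts (cache hits, aligned data).
   Quoted from memory, not measured: verify against your CPU manual. */
enum {
    CLK_MOV_LOAD  = 1,   /* mov eax,ds:[si]               */
    CLK_ADD       = 1,   /* add si,4  and  add di,4       */
    CLK_MOV_STORE = 1,   /* mov es:[di],eax               */
    CLK_DEC       = 1,   /* dec cx                        */
    CLK_JNZ_TAKEN = 3,   /* jnz loop_000, branch taken    */
    CLK_JNZ_FALL  = 1,   /* jnz loop_000, last pass       */
    CLK_REP_SETUP = 12,  /* rep movsd, fixed overhead     */
    CLK_REP_ITER  = 3    /* rep movsd, per dword moved    */
};

/* Estimated clocks for the explicit loop moving n dwords (n >= 1). */
static unsigned long loop_clocks(unsigned long n)
{
    unsigned long body = CLK_MOV_LOAD + CLK_ADD + CLK_MOV_STORE
                       + CLK_ADD + CLK_DEC;

    assert(n >= 1);
    /* The branch is taken on every pass except the last one. */
    return n * body + (n - 1) * CLK_JNZ_TAKEN + CLK_JNZ_FALL;
}

/* Estimated clocks for REP MOVSD moving n dwords. */
static unsigned long rep_clocks(unsigned long n)
{
    return CLK_REP_SETUP + n * CLK_REP_ITER;
}
```

With these assumed figures, a 64 KB block (16384 dwords) comes out at
loop_clocks(16384) = 131070 clocks against rep_clocks(16384) = 49164, and
that still ignores the extra instruction fetches in the loop.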
greetz,
Tom
tomtorfs@mail.dma.be
--- timEd/2 1.10+
---------------
* Origin: 80X86 BBS 32-15-24.62.32 V.34/V.FC (24h/24h) (2:292/516)