subject: movsb
Hi, Paul.
PE> PE> One other thing that hasn't been taken into account here is that
PE> PE> dword accesses are meant to be faster if they're dword aligned.
PE> PE> There is no logic in here to access stuff on a dword boundary,
PE> PE> and do something else for the odd <= 3 bytes either end. Do
PE> PE> you think that would make a difference?
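For anyone following along, the idea in the quote above (byte copies for the odd <= 3 bytes at either end, dword copies in between) might look roughly like this in C. This is only a sketch: copy_dword_aligned is a made-up name, it is not PDPCLIB's code, and it assumes the low bits of a pointer converted to uintptr_t reflect the address alignment, which holds on the usual flat-address platforms.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from PDPCLIB): copy n bytes so that the
   4-byte stores land on dword-aligned destination addresses. */
void *copy_dword_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: at most 3 single bytes until d is on a 4-byte boundary. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: whole dwords. Going through a temporary with memcpy keeps
       this legal C even if the source is not aligned the same way. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, s, 4);
        memcpy(d, &w, 4);
        d += 4;
        s += 4;
        n -= 4;
    }

    /* Tail: the remaining 0..3 bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Only the destination is aligned here. If source and destination sit at different offsets within a dword, one side will always be misaligned, and which side is better to align is something you would have to measure.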
PE> Jesus fucking Christ! I've been doing extensive work and test
PE> on PDPCLIB today, and hours on one particular problem. PDPCLIB
PE> is absolutely creaming the opposition, except for ONE spot. And
PE> that was when I had a buffer of 4096, and was reading 1024 bytes
PE> at a time. When that happened, the opposition was getting in at
PE> 5% - 10% improvement in ELAPSED time of my whole program (reading
PE> from a RAM disk). This is straight reading from disk into
PE> an internal buffer, and then copying from memory to memory, a
PE> fixed length, unlike the previous problem.
PE> Anyway, after much trying and trying to get my code to go faster,
PE> I eventually tried, just on the offchance, I didn't really think
PE> it would make a difference, I tried padding my memory buffer to
PE> a doubleword boundary, and voila, problem solved!!! God only
PE> knows what the REAL % difference is of the REP MOVSD or MOVD or
PE> whatever it is!
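Padding a buffer to a doubleword boundary, as described in the quote above, can be sketched in C by over-allocating by 3 bytes and rounding the start address up. The name align_up_dword is hypothetical, and the sketch again assumes that the low two bits of the converted pointer tell you the alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round p up to the next 4-byte boundary.
   Returns p unchanged when it is already dword aligned. */
unsigned char *align_up_dword(unsigned char *p)
{
    uintptr_t addr = (uintptr_t)p;
    size_t pad = (size_t)((4u - (addr & 3u)) & 3u);   /* 0..3 bytes */
    return p + pad;
}
```

A caller would declare something like `static unsigned char raw[4096 + 3];` and then use `align_up_dword(raw)` as the start of the real 4096-byte buffer, so the worst-case 3 bytes of padding still leave room for the whole buffer.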
It sure does make a difference. Here's another program. The minimum time
for Test 2 (aligned buffer) was 22.41 seconds for 50,000 iterations; the
minimum for Test 3 (unaligned buffer) was 31.96 seconds, roughly 43%
longer. This was running under Windows; I couldn't be bothered exiting
to DOS to run it there.
Regards, FIM.
program PaulTst2;

uses Dos; { for GetTime }

const LoopMax = 50000;

var AlignedBuffer: array [1..4096] of char;
    Filler: byte; { intended to push UnalignedBuffer off the dword boundary }
    UnalignedBuffer: array [1..4096] of char;

var LoopCount: longint;
    H1, M1, S1, Hund1: word;
    H2, M2, S2, Hund2: word;
    Elapsed: longint;
    SPA, SPU: pointer;
    Test: byte;

begin
  WriteLn;
  SPA := Addr (AlignedBuffer);
  SPU := Addr (UnalignedBuffer);
  Test := 3; { Test 3 is the one left enabled below }
  GetTime (H1, M1, S1, Hund1);
  for LoopCount := 1 to LoopMax do
  begin
    { Comment out all tests except the one you're testing.
      You can easily do this by adding or removing the closing
      brace at the end of the line describing the test. Also
      set the 'Test' variable, so the output makes sense. }

    { Test 1 - nothing in the loop at all
    asm
    end; {}

    { Test 2 - copy aligned buffer to itself
    asm
      lds si,spa
      les di,spa
      mov cx,4096 / 4
    @L1:
      db $66; mov ax,[si]   (* $66 prefix makes this a 32-bit load into EAX *)
      db $66; mov [di],ax   (* and this a 32-bit store from EAX *)
      add si,4
      add di,4
      loop @L1
    end; {}

    { Test 3 - copy unaligned buffer to itself }
    asm
      lds si,spu
      les di,spu
      mov cx,4096 / 4
    @L1:
      db $66; mov ax,[si]   (* $66 prefix makes this a 32-bit load into EAX *)
      db $66; mov [di],ax   (* and this a 32-bit store from EAX *)
      add si,4
      add di,4
      loop @L1
    end; {}
  end;
  GetTime (H2, M2, S2, Hund2);
  Elapsed := (H2 * 360000 + M2 * 6000 + S2 * 100 + Hund2) -
             (H1 * 360000 + M1 * 6000 + S1 * 100 + Hund1);
  WriteLn ('Test ', Test, ': ', Elapsed / 100:0:2, ' seconds for ',
           LoopMax, ' iterations');
end.
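The Elapsed arithmetic in the program converts each GetTime reading to hundredths of a second and subtracts. The same calculation in C might look like the sketch below. Both function names are hypothetical, and the midnight wrap-around is an addition that PaulTst2 does not have; it assumes at most one midnight passed during the run.

```c
/* Hypothetical helper: one clock reading expressed in hundredths of a
   second, using the same weights as PaulTst2. */
long to_hundredths(long h, long m, long s, long hund)
{
    return h * 360000L + m * 6000L + s * 100L + hund;
}

/* Hypothetical helper: difference between two readings in hundredths. */
long elapsed_hundredths(long start, long stop)
{
    long diff = stop - start;
    if (diff < 0) {
        /* Assumption: the clock wrapped past midnight exactly once. */
        diff += 24L * 360000L;
    }
    return diff;
}
```

Dividing the result by 100 gives seconds, as in the WriteLn call in the program.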
* * God made the first garden, Cain the first city.
@EOT:
---
 * Origin: Pedants Inc. (3:711/934.24)
SEEN-BY: 690/718 711/809 934
SOURCE: echomail via fidonet.ozzmosis.com