echo: public_domain
to: Paul Edwards
from: Frank Malcolm
date: 1996-01-09 10:07:20
subject: movsb

Hi, Paul.

PE> PE> One other thing that hasn't been taken into account here is that
PE> PE> dword accesses are meant to be faster if they're dword aligned.

PE> PE> There is no logic in here to access stuff on a dword boundary,
PE> PE> and do something else for the odd <= 3 bytes either end.  Do
PE> PE> you think that would make a difference?
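
A sketch of the scheme Paul describes, in C rather than Pascal. It uses byte moves until the destination reaches a dword boundary, dword moves for the middle, then byte moves for the 0 to 3 bytes left over. The name copy_dword_aligned is hypothetical. Aligning the destination rather than the source is also an assumption: when the two pointers are misaligned by different amounts, only one of them can be brought onto a boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes, moving the dword-aligned middle
   of the destination 4 bytes at a time and the odd 0..3 bytes at
   either end one byte at a time. Assumes dst and src do not overlap. */
void *copy_dword_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: single bytes until d sits on a 4-byte boundary. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: whole dwords. The source may still be misaligned, so go
       through memcpy (compilers lower this to one 4-byte move) rather
       than casting s to a uint32_t pointer. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += 4;
        s += 4;
        n -= 4;
    }

    /* Tail: whatever 0..3 bytes remain. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```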

PE> Jesus fucking Christ!  I've been doing extensive work and testing
PE> on PDPCLIB today, and spent hours on one particular problem.
PE> PDPCLIB is absolutely creaming the opposition, except in ONE spot:
PE> when I had a buffer of 4096 and was reading 1024 bytes at a time.
PE> In that case the opposition was getting a 5% - 10% improvement in
PE> the ELAPSED time of my whole program (reading from a RAM disk).
PE> This is straight reading from disk into an internal buffer, and
PE> then copying from memory to memory, a fixed length, unlike the
PE> previous problem.
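
The path Paul describes (an internal buffer refilled from disk, then copied out in fixed-size pieces) might look roughly like this in C. struct breader and breader_read are hypothetical names, not PDPCLIB's actual code. The memcpy out of r->buf is the memory-to-memory copy whose speed turned out to depend on alignment.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define IBUF_SIZE 4096  /* internal buffer size, as in the 4096-byte case */

/* Hypothetical buffered reader. */
struct breader {
    FILE *fp;
    unsigned char buf[IBUF_SIZE];
    size_t pos;  /* index of the next unread byte in buf */
    size_t len;  /* number of valid bytes currently in buf */
};

/* Read up to n bytes into out. Refill the internal buffer from the
   file when it runs dry, then copy memory to memory. Returns the
   number of bytes delivered (less than n only at EOF or on error). */
size_t breader_read(struct breader *r, void *out, size_t n)
{
    unsigned char *dst = out;
    size_t done = 0;

    while (done < n) {
        if (r->pos == r->len) {
            r->len = fread(r->buf, 1, IBUF_SIZE, r->fp);
            r->pos = 0;
            if (r->len == 0)
                break;  /* EOF or read error */
        }
        size_t chunk = r->len - r->pos;
        if (chunk > n - done)
            chunk = n - done;
        memcpy(dst + done, r->buf + r->pos, chunk);
        r->pos += chunk;
        done += chunk;
    }
    return done;
}
```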

PE> Anyway, after much trying to get my code to go faster, I
PE> eventually tried, just on the off chance (I didn't really think
PE> it would make a difference), padding my memory buffer to a
PE> doubleword boundary, and voila, problem solved!!!  God only
PE> knows what the REAL % difference is of the REP MOVSD or MOVD or
PE> whatever it is!
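
Padding a buffer onto a doubleword boundary comes down to rounding an offset or address up to the next multiple of 4. Here is a minimal C sketch, assuming the storage has at least 3 spare bytes at the front. The helper names round_up_dword and align_dword are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of 4. Because 4 is a power of two,
   adding 3 and clearing the low two bits does it. */
size_t round_up_dword(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Return the first dword-aligned address at or after p. The caller
   must have allocated up to 3 bytes of padding in front of the data. */
unsigned char *align_dword(unsigned char *p)
{
    size_t pad = (size_t)((4u - ((uintptr_t)p & 3u)) & 3u);
    return p + pad;
}
```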

It sure does make a difference. Here's another program that shows it.
The minimum time for Test 2 (aligned) was 22.41 seconds for 50,000
iterations; the minimum for Test 3 (unaligned) was 31.96 seconds. This
was running under Windows; I couldn't be bothered exiting to DOS to
run it.
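
Comparing the two minimums gives only a rough figure, since the runs were made under Windows. The unaligned copy took about 1.43 times as long: roughly 448 versus 639 microseconds per 4096-byte iteration, or about 9.1 versus 6.4 million bytes per second. The C below simply redoes that arithmetic from the quoted figures. The function and constant names are illustrative only.

```c
/* Figures from the message: 50,000 iterations of a 4096-byte copy. */
static const double ITERATIONS = 50000.0;
static const double BYTES_PER_ITERATION = 4096.0;

/* How many times longer the unaligned run took than the aligned one. */
double slowdown(double aligned_s, double unaligned_s)
{
    return unaligned_s / aligned_s;
}

/* Microseconds spent on one 4096-byte copy. */
double us_per_iteration(double total_s)
{
    return total_s / ITERATIONS * 1e6;
}

/* Millions of bytes copied per second (each byte read once, written once). */
double mbytes_per_second(double total_s)
{
    return ITERATIONS * BYTES_PER_ITERATION / total_s / 1e6;
}
```

For example, slowdown(22.41, 31.96) gives about 1.426.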

Regards, FIM.

program PaulTst2;
uses Dos; { for GetTime }
const LoopMax = 50000;
var AlignedBuffer: array [1..4096] of char;   { assumed to start on a dword boundary }
    Filler: byte;                             { pushes UnalignedBuffer off a dword boundary }
    UnalignedBuffer: array [1..4096] of char;

var LoopCount: longint;
    H1, M1, S1, Hund1: word;
    H2, M2, S2, Hund2: word;
    Elapsed: longint;
    SPA, SPU: pointer;
    Test: byte;
begin
WriteLn;
SPA := Addr (AlignedBuffer);
SPU := Addr (UnalignedBuffer);
Test := 3; { number of the test left enabled below }
GetTime (H1, M1, S1, Hund1);
for LoopCount := 1 to LoopMax do
  begin
  { Comment out all tests except the one you're testing.
    You can easily do this by adding or removing the closing
    brace at the end of the line describing the test. Also
    set the 'Test' variable, so the output makes sense. }

  { Test 1 - nothing in the loop at all
  asm
  end; {}

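  { Test 2 - copy aligned buffer to itself
  asm
  push ds                (* LDS below overwrites DS, so save it *)
  lds si,spa             (* DS:SI -> AlignedBuffer *)
  les di,spa             (* ES:DI -> AlignedBuffer *)
  mov cx,4096 / 4        (* number of dwords to copy *)
  @L1:
  db $66; mov ax,[si]    (* $66 prefix makes this MOV EAX,[SI] *)
  db $66; mov [di],ax    (* MOV [DI],EAX via DS, which equals ES here *)
  add si,4
  add di,4
  loop @L1
  pop ds                 (* restore Turbo Pascal's data segment *)
  end; {}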

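  { Test 3 - copy unaligned buffer to itself }
  asm
  push ds                (* LDS below overwrites DS, so save it *)
  lds si,spu             (* DS:SI -> UnalignedBuffer *)
  les di,spu             (* ES:DI -> UnalignedBuffer *)
  mov cx,4096 / 4        (* number of dwords to copy *)
  @L1:
  db $66; mov ax,[si]    (* $66 prefix makes this MOV EAX,[SI] *)
  db $66; mov [di],ax    (* MOV [DI],EAX via DS, which equals ES here *)
  add si,4
  add di,4
  loop @L1
  pop ds                 (* restore Turbo Pascal's data segment *)
  end; {}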

  end;
GetTime (H2, M2, S2, Hund2);
Elapsed := (H2 * 360000 + M2 * 6000 + S2 * 100 + Hund2) -
           (H1 * 360000 + M1 * 6000 + S1 * 100 + Hund1);
WriteLn ('Test ', Test, ': ', Elapsed / 100:0:2, ' seconds for ',
         LoopMax, ' iterations');
end.
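
For anyone without Turbo Pascal to hand, here is a rough C analogue of Tests 2 and 3. It is a sketch under several assumptions. It copies between two separate buffers rather than a buffer onto itself, because a C compiler may delete a load followed by a store of the same bytes. It uses clock() instead of GetTime, and an offset of 1 stands in for the Filler byte. The names time_copy, copy_dwords and copy_fn are hypothetical. On current x86 processors the misalignment penalty is often much smaller than on 1996 hardware, so don't expect Frank's ratio.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE 4096

/* Dword-aligned storage with 4 spare bytes so an offset copy still fits. */
static _Alignas(4) unsigned char src_store[BUF_SIZE + 4];
static _Alignas(4) unsigned char dst_store[BUF_SIZE + 4];

/* Copy BUF_SIZE bytes four at a time, like the MOV EAX loop above. */
static void copy_dwords(unsigned char *dst, const unsigned char *src)
{
    for (size_t i = 0; i < BUF_SIZE; i += 4) {
        uint32_t w;
        memcpy(&w, src + i, sizeof w);
        memcpy(dst + i, &w, sizeof w);
    }
}

/* Called through a volatile pointer so the compiler cannot inline the
   copy and then collapse the identical, repeated iterations. */
static void (*volatile copy_fn)(unsigned char *, const unsigned char *) = copy_dwords;

/* Time `loops` copies with both buffers shifted by `offset` bytes
   (0 = dword aligned, 1 = mimics the Filler byte). Returns CPU seconds. */
double time_copy(size_t offset, long loops)
{
    unsigned char *dst = dst_store + offset;
    const unsigned char *src = src_store + offset;
    clock_t start = clock();
    for (long n = 0; n < loops; n++)
        copy_fn(dst, src);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_copy(0, 50000) and time_copy(1, 50000) gives the aligned and unaligned cases. The numbers depend heavily on the CPU and the compiler.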

 * * God made the first garden, Cain the first city.

---
* Origin: Pedants Inc. (3:711/934.24)
SEEN-BY: 690/718 711/809 934

SOURCE: echomail via fidonet.ozzmosis.com
