PE> One other thing that hasn't been taken into account here is that
PE> dword accesses are meant to be faster if they're dword aligned.
PE> There is no logic in here to access stuff on a dword boundary,
PE> and do something else for the odd <= 3 bytes either end. Do
PE> you think that would make a difference?
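A rough sketch of the idea in the quote, for anyone who wants to see it spelled out. This is NOT the PDPCLIB code; the name aligned_copy is made up for illustration, and it assumes a "dword" is 4 bytes. It copies the odd leading bytes one at a time until the destination sits on a dword boundary, then moves 4 bytes at a time, then mops up the <= 3 bytes left over.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not taken from PDPCLIB. */
void *aligned_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    /* Head: byte copies until the destination is dword aligned. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: 4 bytes per step. The fixed-size memcpy calls let the
       compiler emit plain dword moves without breaking aliasing or
       alignment rules on the source side. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, s, 4);
        memcpy(d, &w, 4);
        d += 4;
        s += 4;
        n -= 4;
    }

    /* Tail: whatever is left, at most 3 bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Note that this only lines up the destination. If source and destination differ in their low two address bits, the reads stay unaligned whatever you do, so how much this buys you probably depends on the data.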
Jesus fucking Christ! I've been doing extensive work and testing
on PDPCLIB today, and spent hours on one particular problem. PDPCLIB
is absolutely creaming the opposition, except for ONE spot. And
that was when I had a buffer of 4096 bytes, and was reading 1024
bytes at a time. When that happened, the opposition was coming in
5% - 10% faster in ELAPSED time for my whole program (reading
from a RAM disk). This is straight reading from disk into
an internal buffer, and then copying a fixed length from memory
to memory, unlike the previous problem.
Anyway, after much trying to get my code to go faster, I eventually
tried, just on the off chance (I didn't really think it would make
a difference), padding my memory buffer to a doubleword boundary,
and voila, problem solved!!! If that was worth 5% - 10% of the
whole program, God only knows what the REAL % difference is for
the REP MOVSD (or whatever the instruction is) on its own!
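For illustration, one way of padding a buffer to a doubleword boundary might look like the sketch below. It is an assumption about the general technique, not the actual PDPCLIB change: the names iobuf, iobuf_init, iobuf_free and align_up are invented, and the 4096 is just the buffer size mentioned above. The trick is to allocate up to 3 bytes of slack and round the working pointer up to the next multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BUF_SIZE    4096  /* buffer size from the test described above */
#define DWORD_ALIGN 4     /* assumed doubleword size in bytes */

/* Round p up to the next multiple of align (a power of two).
   The caller must own at least align - 1 bytes beyond p. */
unsigned char *align_up(unsigned char *p, size_t align)
{
    uintptr_t a = (uintptr_t)p;

    assert(align != 0 && (align & (align - 1)) == 0);
    return p + ((align - (a & (align - 1))) & (align - 1));
}

/* Hypothetical I/O buffer: raw is what gets freed, data is what
   the read and copy code actually uses. */
struct iobuf {
    unsigned char *raw;
    unsigned char *data;
};

int iobuf_init(struct iobuf *b)
{
    /* Over-allocate by the worst-case padding. */
    b->raw = malloc(BUF_SIZE + DWORD_ALIGN - 1);
    if (b->raw == NULL) {
        return -1;
    }
    b->data = align_up(b->raw, DWORD_ALIGN);
    return 0;
}

void iobuf_free(struct iobuf *b)
{
    free(b->raw);
    b->raw = NULL;
    b->data = NULL;
}
```

Many current malloc implementations already hand back memory aligned to at least 4 bytes, in which case the padding works out to zero. The rounding matters most when the buffer is carved out of a bigger block at an odd offset, or when the allocator makes no such promise.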
BFN. Paul.
---
* Origin: X (3:711/934.9)