| echo: | |
|---|---|
| to: | |
| from: | |
| date: | |
| subject: | memmove |
Here's a message from a while back, in the internet newsgroup
comp.lang.asm.x86. The code demonstrates how to align to a doubleword
boundary before using REP MOVSD (the code as posted aligns the destination,
EDI, when both are not already on dword boundaries).
This was a good summary and conclusion to a looong thread with much
disinformation and bickering.
─ [211] Area COMP.LANG.ASM.X86 (3:635/727.1) ───
Msg : 200 of 500 -190 +202
From : qed{at}xenon.chromatic.com 3:633/243.100 Fri 28 Jul 95 05:35
To : All Sun 30 Jul 95 11:31
Subj : Re: rep movsw problem
────────────────────────────────────────────────
{at}MSGID: 3:633/243.100 000077ba
{at}REPLYTO 3:633/243.100 UUCP
{at}REPLYADDR qed{at}xenon.chromatic.com
{at}PID GIGO unreg at foxy vsn 0.99.950303
From: Paul Hsieh
{at}Newsgroups: comp.lang.asm.x86,rec.games.programmer
Subject: Re: rep movsw problem
{at}Date: 27 Jul 1995 18:35:44 GMT
Organization: Zocalo Engineering - Berkeley, California, USA
{at}Lines: 62
{at}Message-Id:
{at}References:
{at}Nntp-Posting-Host: paulh.chromatic.com
{at}Mime-Version: 1.0
{at}Content-Type: text/plain; charset=us-ascii
{at}Content-Transfer-Encoding: 7bit
{at}X-Mailer: Mozilla 1.1N (Windows; I; 32bit)
{at}Xref: melbourne.DIALix.oz.au comp.lang.asm.x86:3037 rec.games.programmer:14155
My GOD! Everyone! This is an extremely trivial question. Let me
just state a few things:
(1) The original code as posted was *CORRECT*! Except that s/he
misspelled "movsw" as "mosvw". His/her problem must be related
to something else.
(2) RCL is *NOT* a synonym or acronym for SHL. Any substitution of
ROR, or RCL or SAR or ROL or anything of that nature will cause
the originally posted program to have unexpected behavior (dependent
on the unstated value of the carry flag upon entry to the routine.)
(3) SHR and SHL *DO* set the carry flag with the value of the bit that
fell off the end.
(4) The value of CX after a rep movs operation has completed is guaranteed
to be 0. Hence "adc cx,cx" becomes equivalent to "movzx cx,cf" (if you
get my meaning.)
(5) If CX=0 rep movs will copy 0 bytes (not 65536.)
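The code the thread was arguing about is not quoted in this message. Assuming it was the familiar `shr cx,1` / `rep movsw` / `adc cx,cx` / `rep movsb` sequence, points (3) to (5) can be modeled in C roughly as follows. This is only a sketch: the function name and the `cf` variable standing in for the carry flag are hypothetical, not from the post.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the 16-bit idiom (assumed, not quoted in the post):
 *     shr cx,1 / rep movsw / adc cx,cx / rep movsb
 * The variable cf stands in for the carry flag. */
void copy_words_then_odd_byte(uint8_t *dst, const uint8_t *src, uint16_t cx)
{
    unsigned cf = cx & 1u;          /* shr cx,1: the bit shifted out lands in CF (point 3) */
    cx >>= 1;                       /* cx now counts words */

    while (cx != 0) {               /* rep movsw: copies nothing when cx == 0 (point 5) */
        memcpy(dst, src, 2);
        dst += 2;
        src += 2;
        cx--;
    }
    assert(cx == 0);                /* cx is guaranteed to be 0 after the rep (point 4) */

    cx = (uint16_t)(cx + cx + cf);  /* adc cx,cx: 0 + 0 + CF, so cx becomes CF */

    while (cx != 0) {               /* rep movsb: copies the odd trailing byte, if any */
        *dst++ = *src++;
        cx--;
    }
}
```

Calling it with a count of 5 copies two words and then one byte; a count of 0 leaves the destination untouched. The model relies on `rep movsw` leaving the flags alone, which is why CF still holds the shifted-out bit when the `adc` runs.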
What I want to know is, how can such a simple question cause so much net
traffic? From the posts I saw, it looked like the net was batting a
little under 50% there.
My final comment (since I have not actually contributed anything that you
people shouldn't have already known) is that the 32 bit version of that
code which I use is as follows:
void rep_movsb(char * Src, char * Dest, unsigned int Len);
#pragma aux rep_movsb = \
" mov eax, ecx " \
" lea ecx, [edi+3] " \
" not ecx " \
" and ecx, 3 " \
" sub eax,ecx " \
" jle short LEndBytes " \
" rep movsb " \
" mov ecx, eax " \
" and eax, 3 " \
" shr ecx, 2 " \
" rep movsd " \
"LEndBytes: add ecx, eax " \
" rep movsb " \
parm [ESI] [EDI] [ECX] \
modify [EAX ECX ESI EDI];
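For readers who do not have the register-level picture in their heads, here is a portable C sketch of what the routine above appears to do: copy up to three bytes to bring the destination onto a dword boundary, copy dwords, then copy the leftover bytes. The function names are hypothetical, and like `rep movs` with the direction flag clear it only copies forward, so this sketch assumes non-overlapping buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes needed to bring p up to the next 4-byte boundary (0..3).
 * Mirrors: lea ecx,[edi+3] / not ecx / and ecx,3, since ~(p+3) == -p-4. */
unsigned bytes_to_dword_align(uintptr_t p)
{
    return (unsigned)(~(p + 3u) & 3u);
}

/* Hypothetical portable sketch of the routine's three phases.
 * Argument order follows the post's prototype: source first, then destination. */
void copy_dest_aligned(const unsigned char *src, unsigned char *dst, size_t len)
{
    size_t head = bytes_to_dword_align((uintptr_t)dst);

    if (len <= head) {              /* jle LEndBytes: too short for a dword phase */
        while (len--)
            *dst++ = *src++;
        return;
    }
    len -= head;                    /* sub eax,ecx */
    while (head--)                  /* first rep movsb: align the destination */
        *dst++ = *src++;

    for (size_t n = len >> 2; n != 0; n--) {  /* rep movsd */
        memcpy(dst, src, 4);        /* 4-byte move; the source may still be unaligned */
        dst += 4;
        src += 4;
    }
    for (size_t n = len & 3; n != 0; n--)     /* final rep movsb: 0..3 tail bytes */
        *dst++ = *src++;
}
```

In the assembly, the short-copy path needs no separate loop: at `LEndBytes` ECX still holds the alignment count and EAX holds the length minus that count, so `add ecx, eax` restores the full length. On the long path ECX is 0 after `rep movsd`, so the same `add` yields just the tail count.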
Please excuse the WATCOM C/C++ style inline but I don't imagine anyone
will have difficulty with it. It's not nearly as elegant, but it's
breakneck fast. I posted this a while back hoping somebody might see some
improvements, but nobody has yet come forward. I submit this as another
opportunity. Otherwise I'll just have to live with its performance. :)
(I'll give you guys something to start you going: not ecx is generally a
slower command than xor ecx,FFFFFFFFh on pentiums ... )
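The substitution hinted at here changes only the instruction, not the result. A small C sketch of the two forms follows (the function names are hypothetical). On x86 the one semantic difference is that `not` leaves the flags untouched while `xor` rewrites them, which should not matter in the routine above because the following `and ecx, 3` sets the flags again anyway.

```c
#include <stdint.h>

/* not ecx: flips every bit; on x86 this leaves the flags untouched. */
uint32_t via_not(uint32_t x)
{
    return ~x;
}

/* xor ecx,0FFFFFFFFh: produces the same value; on x86 it also rewrites the flags. */
uint32_t via_xor(uint32_t x)
{
    return x ^ UINT32_C(0xFFFFFFFF);
}
```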
--
Paul Hsieh
qed{at}xenon.chromatic.com
What I say and what my company says is not always the same
---
 * Origin: Jelly-Bean software development, Melbourne AUST. (3:635/727.1)
SEEN-BY: 50/99 632/103 348 998 633/371 634/384 635/402 503 544 727 638/102
SEEN-BY: 639/252 640/230 690/718 711/401 410 413 430 808 809 934 713/888
SEEN-BY: 800/1 7877/2809
@PATH: 635/727 632/348 635/503 50/99 711/808 809 934
| SOURCE: echomail via fidonet.ozzmosis.com | |