echo: rberrypi
to: RICHARD KETTLEWELL
from: THE NATURAL PHILOSOPHER
date: 2018-07-11 15:36:00
subject: Re: SIXTYFORTH?

On 11/07/18 14:01, Richard Kettlewell wrote:
> Gareth's Downstairs Computer
>  writes:
>> With oodles (Soon to be tera-oodles?) of RAM available
>> on the RPi3, and in future releases likely to be even more,
>> is there any point to continuing to cater for bytes, halfwords
>> and words, when everything, including CHAR, can be a 64 bit
>> quantity?
>
> Yes, performance.
>
OK. I'll bite, how would performance be affected?

I've watched int go from 16 to 64 bits and stuff just got faster..:-)

How is 'load a byte' DONE on a 64 bit processor other than load
[aligned] and shift/mask?

How does a compiler treat
char *p,c;
int i;
for (i=0;i<327;i++)
 {
 c=*p++;
 echo(c);
 }

Is it fetching *p as a 64 bit chunk and manipulating it, or is it
retrieving the same 64 bits of memory over and over and taking a
different byte each time? Or is it cached and cache aware? Or does the
processor itself have some magic whereby repeated accesses through a
pointer incrementing a byte at a time are dealt with differently for
64 bit aligned and non aligned addresses?

I honestly would like to know...

Looking at ARM, it appears that address registers can be aligned on 8 bit
lines.

Is the retrieval of a byte slower in 'unaligned on 64 bit' boundaries?

One deals with hardware so rarely these days.

Stackexchange found this:

"Here's what the Intel x86/x64 Reference Manual says about alignments:

     4.1.1 Alignment of Words, Doublewords, Quadwords, and Double Quadwords

     Words, doublewords, and quadwords do not need to be aligned in
memory on natural boundaries. The natural boundaries for words, double
words, and quadwords are even-numbered addresses, addresses evenly
divisible by four, and addresses evenly divisible by eight,
respectively. However, to improve the performance of programs, data
structures (especially stacks) should be aligned on natural boundaries
whenever possible. The reason for this is that the processor requires
two memory accesses to make an unaligned memory access; aligned accesses
require only one memory access. A word or doubleword operand that
crosses a 4-byte boundary or a quadword operand that crosses an 8-byte
boundary is considered unaligned and requires two separate memory bus
cycles for access.

     Some instructions that operate on double quadwords require memory
operands to be aligned on a natural boundary. These instructions
generate a general-protection exception (#GP) if an unaligned operand is
specified. A natural boundary for a double quadword is any address
evenly divisible by 16. Other instructions that operate on double
quadwords permit unaligned access (without generating a
general-protection exception). However, additional memory bus cycles are
required to access unaligned data from memory.

Don't forget, reference manuals are the ultimate source of information
of the responsible developer and engineer, so if you're dealing with
something well documented such as Intel CPUs, just look up what the
reference manual says about the issue."

So that implies that whilst you can get 64 bit chunks aligned on any
address, it pays not to.
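
The rule in the quoted passage can be written down as a small check. This is only a sketch of the arithmetic: the function names are made up for illustration, and the boundary size is passed in as a parameter rather than taken from any particular processor.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if addr is a multiple of size. size must be a power of two
   (2, 4, 8, 16 for word, doubleword, quadword, double quadword). */
static bool is_naturally_aligned(uintptr_t addr, size_t size)
{
    return (addr & (size - 1)) == 0;
}

/* True if an access of `size` bytes starting at addr touches two
   different `boundary`-byte blocks, i.e. the case the manual says
   needs two memory accesses. */
static bool crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
    uintptr_t first_block = addr / boundary;
    uintptr_t last_block = (addr + size - 1) / boundary;
    return first_block != last_block;
}
```

With these, a quadword at 0x1000 is aligned and stays inside one 8-byte block, while the same quadword at 0x1004 is unaligned and straddles two blocks.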

For ARM
4.2.2. ARMv6 extensions

ARMv6 adds unaligned word and halfword load and store data access
support. When enabled, one or more memory accesses are used to generate
the required transfer of adjacent bytes transparently, apart from a
potentially greater access time where the transaction crosses a
word-boundary.

The memory management specification defines a programmable mechanism to
enable unaligned access support. This is controlled and programmed using
the CP15 register c1 U bit, bit 22.

Non word-aligned load and store multiple, double, semaphore,
synchronization, and coprocessor accesses always signal Data Abort with
an Alignment fault status code when the U bit is set.

Strict alignment checking is also supported in ARMv6, under control of
the CP15 register c1 A bit, [bit 1], and signals a Data Abort with an
Alignment fault status code if a 16-bit access is not halfword aligned
or a single 32-bit load/store transfer is not word aligned.

ARMv6 alignment fault detection is a mandatory function associated with
address generation rather than optionally supported in external memory
management hardware.

So unaligned 64 bit accesses are slower.
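
From the C side, one common way to stay out of trouble with either behaviour (slower access or an alignment fault) is to copy the bytes into a properly aligned variable with memcpy instead of dereferencing a misaligned pointer. A minimal sketch; the helper name load_u64_unaligned is invented here, and what instructions the compiler actually emits for it depends on the target.

```c
#include <stdint.h>
#include <string.h>

/* Read a 64 bit value from an address that may not be 8-byte aligned.
   memcpy has no alignment requirement on its arguments, so this is
   well defined even where a direct *(uint64_t *)p would not be. */
static uint64_t load_u64_unaligned(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```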


What I haven't found out is what a processor does with byte access.

Is it a case that e.g. it fetches 64 bits and uses the lower 3 address
bits to index into the 64 bit quantity and shift it?

And does it repeat the memory access to get the next 8 bits or not?

It seems that addresses are always byte addresses as far as code is
concerned, so 64 bit computers must 'lose' the 3 LSBs when doing bus
accesses and sort the rest out in microcode.
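
To make that guess concrete, here it is as C. It is a model of the idea only, not a description of what any particular processor does internally: memory is treated as an array of 64 bit words, byte 0 is assumed to sit in the least significant lane, and load_byte_via_word is a made-up name.

```c
#include <stdint.h>

/* Fetch the 64 bit word containing byte_addr, then use the low 3
   address bits to pick one byte out of it by shifting and masking. */
static uint8_t load_byte_via_word(const uint64_t *words, uint64_t byte_addr)
{
    uint64_t word = words[byte_addr >> 3];     /* drop the 3 LSBs */
    unsigned lane = (unsigned)(byte_addr & 7); /* which of the 8 bytes */
    return (uint8_t)((word >> (lane * 8)) & 0xFF);
}
```

In this model, byte addresses 0 to 7 all read the same word and differ only in the shift, and address 8 is the first one to need the next word.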


--
In a Time of Universal Deceit, Telling the Truth Is a Revolutionary Act.

- George Orwell

--- SoupGate-Win32 v1.05
* Origin: Agency HUB, Dunedin - New Zealand | FidoUsenet Gateway (3:770/3)

SOURCE: echomail via QWK@docsplace.org
