echo: rberrypi
to: THE NATURAL PHILOSOPHER
from: MICHAEL J. MAHON
date: 2018-07-11 16:51:00
subject: Re: SIXTYFORTH?

The Natural Philosopher  wrote:
> On 11/07/18 14:01, Richard Kettlewell wrote:
>> Gareth's Downstairs Computer
>>  writes:
>>> With oodles (Soon to be tera-oodles?) of RAM available
>>> on the RPi3, and in future releases likely to be even more,
>>> is there any point to continuing to cater for bytes, halfwords
>>> and words, when everything, including CHAR, can be a 64 bit
>>> quantity?
>>
>> Yes, performance.
>>
> OK. I'll bite, how would performance be affected?
>
> I've watched int go from 16 to 64 bits and stuff just got faster..:-)
>
> How is 'load a byte' DONE on a 64 bit processor other than load
> [aligned] and shift/mask?
>
> How does a compiler treat
> char *p,c;
> int i;
> for (i=0;i<327;i++)
>  {
>  c=*p++;
>  echo(c);
> }
>
> Is it fetching *p as a 64 bit chunk and manipulating it, or is it
> retrieving the same 64 bits of memory over and over and taking a
> different bit? Or is it cached and cache aware? Or does the processor
> itself have some magic whereby repeated calls to a pointer incrementing
> a byte at a time are dealt with differently for 64 bit aligned and non
> aligned addresses?
>
> I honestly would like to know...
>
> Looking at ARM it appears that address registers can be aligned on 8 bit
> lines.
>
> Is the retrieval of a byte slower in 'unaligned on 64 bit'  boundaries?
>
> One deals with hardware so rarely these days
>
> Stack Exchange found this:
>
>
>
> "Here's what the Intel x86/x64 Reference Manual says about alignments:
>
>     4.1.1 Alignment of Words, Doublewords, Quadwords, and Double Quadwords
>
>     Words, doublewords, and quadwords do not need to be aligned in
> memory on natural boundaries. The natural boundaries for words, double
> words, and quadwords are even-numbered addresses, addresses evenly
> divisible by four, and addresses evenly divisible by eight,
> respectively. However, to improve the performance of programs, data
> structures (especially stacks) should be aligned on natural boundaries
> whenever possible. The reason for this is that the processor requires
> two memory accesses to make an unaligned memory access; aligned accesses
> require only one memory access. A word or doubleword operand that
> crosses a 4-byte boundary or a quadword operand that crosses an 8-byte
> boundary is considered unaligned and requires two separate memory bus
> cycles for access.
>
>     Some instructions that operate on double quadwords require memory
> operands to be aligned on a natural boundary. These instructions
> generate a general-protection exception (#GP) if an unaligned operand is
> specified. A natural boundary for a double quadword is any address
> evenly divisible by 16. Other instructions that operate on double
> quadwords permit unaligned access (without generating a
> general-protection exception). However, additional memory bus cycles are
> required to access unaligned data from memory.
>
> Don't forget, reference manuals are the ultimate source of information
> of the responsible developer and engineer, so if you're dealing with
> something well documented such as Intel CPUs, just look up what the
> reference manual says about the issue."
>
> So that implies that whilst you can get 64 bit chunks aligned on any
> address, it pays not to.
>
> For ARM
> 4.2.2. ARMv6 extensions
>
> ARMv6 adds unaligned word and halfword load and store data access
> support. When enabled, one or more memory accesses are used to generate
> the required transfer of adjacent bytes transparently, apart from a
> potentially greater access time where the transaction crosses a
> word-boundary.
>
> The memory management specification defines a programmable mechanism to
> enable unaligned access support. This is controlled and programmed using
> the CP15 register c1 U bit, bit 22.
>
> Non word-aligned load and store multiple, double, semaphore,
> synchronization, and coprocessor accesses always signal Data Abort with
> an Alignment fault status code when the U bit is set.
>
> Strict alignment checking is also supported in ARMv6, under control of
> the CP15 register c1 A bit, [bit 1], and signals a Data Abort with an
> Alignment fault status code if a 16-bit access is not halfword aligned
> or a single 32-bit load/store transfer is not word aligned.
>
> ARMv6 alignment fault detection is a mandatory function associated with
> address generation rather than optionally supported in external memory
> management hardware.
>
> So 64 bit unaligned accesses are slower.
>
>
> What I haven't found out is what a processor does with byte access.
>
> Is it a case that e.g. it fetches 64 bits and uses the lower 3 address
> bits to index into the 64 bit quantity and shift it?
>
> And does it repeat the memory access to get the next 8 bits or not?
>
> It seems that addresses are always byte addresses as far as code is
> concerned, so 64 bit computers must 'lose' the 3 LSBs when doing bus
> accesses and sort the rest out in microcode.
>
>
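The "fetch 64 bits, then use the low 3 address bits to pick a byte" idea in the question can be pictured as a small software model. This is only a sketch: the function name load_byte_model is made up for illustration, it assumes little-endian byte numbering within the word, and it says nothing about how any particular core wires up its load path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: memory is an array of aligned 64-bit words and
   byte_addr is a byte address into it. Little-endian lanes assumed. */
static uint8_t load_byte_model(const uint64_t *words, size_t nwords,
                               size_t byte_addr)
{
    size_t   index = byte_addr >> 3;            /* drop the 3 LSBs: which word */
    unsigned lane  = (unsigned)(byte_addr & 7); /* the 3 LSBs: which byte lane */

    assert(index < nwords);                     /* stay inside the model memory */
    return (uint8_t)(words[index] >> (8 * lane)); /* shift down, keep low 8 bits */
}
```

In this model, byte addresses 0 through 7 all read the same word and differ only in the shift amount; address 8 moves on to the next word.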

Virtually all modern processors have multi-level caches.

A reference to an address not yet in cache will miss at every level,
causing a main memory access that transfers a cache line of data to the
last-level (largest, slowest) cache, after which each level closer to the
processor receives its own (typically smaller) line of data containing the
referenced word(s).  This continues until the smallest, fastest level 1
cache is loaded with the referenced word, which is usually bypassed
directly to the processor’s register file (which can be thought of as the
ultimate cache, managed by the compiler).
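
To put a number on that for the byte-at-a-time loop in the question, here is a minimal sketch that counts how many distinct cache lines a run of consecutive byte reads touches. The name lines_touched and the line_size parameter are illustrative assumptions; 64 bytes is a common line size but not a universal one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Distinct cache lines touched by reading `count` consecutive bytes
   starting at byte address `start`, for an assumed `line_size`. */
static size_t lines_touched(uintptr_t start, size_t count, size_t line_size)
{
    assert(line_size > 0);
    if (count == 0)
        return 0;

    uintptr_t first = start / line_size;               /* line of first byte */
    uintptr_t last  = (start + count - 1) / line_size; /* line of last byte  */
    return (size_t)(last - first + 1);
}
```

Under the 64-byte assumption, lines_touched(0, 327, 64) is 6, so the 327 byte reads can miss at most 6 times; every other read finds its byte in a line that is already cached.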

Every doubling of data size effectively halves the capacity of all caches
and data memory, because each holds half as many items, so the performance
cost is considerable for any program that stresses any level of cache.
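
The arithmetic is simple enough to sketch. The helper name elements_that_fit and the 32 KiB cache size used below are assumptions picked for illustration, not figures for any particular chip.

```c
#include <assert.h>
#include <stddef.h>

/* How many elements of `elem_size` bytes fit in `cache_bytes` of cache. */
static size_t elements_that_fit(size_t cache_bytes, size_t elem_size)
{
    assert(elem_size > 0);
    return cache_bytes / elem_size;
}
```

With a hypothetical 32 KiB cache, elements_that_fit(32768, 1) is 32768 one-byte chars, while elements_that_fit(32768, 8) is only 4096 if every char is widened to 64 bits.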

And don’t expect Moore’s “Law” to save you. We are past the point of
increasing clock frequency; now all the density improvements just deliver
more cores on a chip, so unless you love parallel algorithms, you’re out of
luck. ;-(

--
-michael - NadaNet 3.1 and AppleCrate II:  http://michaeljmahon.com

--- SoupGate-Win32 v1.05
* Origin: Agency HUB, Dunedin - New Zealand | FidoUsenet Gateway (3:770/3)

