echo: rberrypi
to: DENNIS LEE BIEBER
from: MICHAEL J. MAHON
date: 2018-04-18 18:03:00
subject: Re: Hopefully one last qu

Dennis Lee Bieber  wrote:
> On Wed, 18 Apr 2018 09:46:30 +0100, Gareth's Downstairs Computer
>  declaimed the following:
>
>> In the days of DEC minicomputers, one had a
>> Programmer's Card which showed all the instructions
>> in their binary form. (I've still a couple for
>> the PDP11 in the archive somewhere)
>>
>> Is there such a summary available for the 64 bit
>> instruction set of the Pi3's A53 processors?
>>
>  Other than a few pages in the spec-sheet -- unlikely...
>
>  In a way, modern processors aren't meant to be programmed at the
> assembly level. One pretty much needs the backend of a compiler
> optimization to get anything usable -- with the result that only the
> compiler authors tend to need to know the target instruction set, and they
> may be working from a machine-readable specification to automate that step
> too.
>
>  It doesn't help that the ARM instruction set has become something that
> makes the Xerox Sigma-6 mainframe from my college look like a RISC
> processor! I mean, look at this ARM example
>
> ADD W0, W1, W2, LSL #3
>
> One destination register, TWO source registers, and a shift operation to be
> applied to the value of the second source register! The only thing making
> that remotely RISC is that none of the source/destination are memory
> addresses -- and the ARM requires separate load/store operations to touch
> memory
>


You are looking in the wrong place for simplicity.

“Reduced Instruction Set Computing” is an unfortunate misnomer, because it
focuses on the instruction set, not the “Reduced Complexity” of the
hardware that implements it.

Virtually all reduced complexity machines have data manipulation
instructions based on a register file with two read ports and one write
port, thus supporting three-register instructions which can be executed in
a single cycle. The data path includes an ALU and a barrel shifter, and the
instruction encoding typically allows anything useful that can be done with
them.
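
As a rough illustration of that data path, here is a minimal Python sketch of
what the quoted ADD W0, W1, W2, LSL #3 computes. The function name
add_shifted and the sample register values are made up for the example; it
models only the arithmetic, not the encoding or the timing.

```python
MASK32 = 0xFFFFFFFF  # W registers are 32 bits wide


def add_shifted(rn: int, rm: int, shift: int) -> int:
    """Model ADD Wd, Wn, Wm, LSL #shift (hypothetical helper, arithmetic only)."""
    shifted = (rm << shift) & MASK32  # shifter stage on the second operand
    return (rn + shifted) & MASK32    # ALU stage; result wraps at 32 bits


# Two register reads (W1, W2) and one register write (W0).
w1, w2 = 0x1000, 5
w0 = add_shifted(w1, w2, 3)
print(hex(w0))
```

Both stages work only on register values, which is why the shift adds so
little to the instruction.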

All data references are done with load and store instructions, permitting
compilers to schedule longer latency instructions well ahead of references
to the data values.

Since byte addressing is standard and the data path is 32 or 64 bits wide,
it is extremely common to scale an index by a small power of 2 when forming
an address, independent of the barrel shifter. Since only shifts of, say, 1
to 3 bits are common, these shifts are usually performed by a multiplexer on
the side of the ALU which does not contain the complementer, thus balancing
the delay of the complementer.
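
The scaling in question is just array indexing on a byte-addressed machine.
A minimal sketch, assuming a hypothetical helper element_address and an
arbitrary base address chosen for the example:

```python
def element_address(base: int, index: int, log2_size: int) -> int:
    """Byte address of element `index` when elements are 2**log2_size bytes."""
    return base + (index << log2_size)  # shift replaces a multiply by the size


base = 0x2000  # arbitrary example base address
for log2_size in (1, 2, 3):  # halfword, word, doubleword elements
    print(1 << log2_size, hex(element_address(base, 3, log2_size)))
```

Shifts of 1, 2 and 3 cover 2-, 4- and 8-byte elements, which is why such a
small multiplexer is enough for the common case.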

The result is relatively simple hardware which is capable of many simple
operations in a single-cycle instruction.

The simplicity of the data path is matched by the simplicity of the
cache/memory interface, which allows only accesses aligned with natural
data unit boundaries. This permits any memory transfer to be done in a
single cache/memory access.
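
A small sketch of why natural alignment gives that guarantee. The 64-byte
line size and the helper names is_aligned and crosses_line are assumptions
made for the example, not taken from any particular processor manual.

```python
LINE_BYTES = 64  # assumed cache line size, for illustration only


def is_aligned(addr: int, size: int) -> bool:
    """True if addr is a multiple of size (size must be a power of 2)."""
    return addr & (size - 1) == 0


def crosses_line(addr: int, size: int, line: int = LINE_BYTES) -> bool:
    """True if bytes addr .. addr+size-1 fall in two different cache lines."""
    return addr // line != (addr + size - 1) // line


# An aligned doubleword, then a word that straddles a line boundary.
print(is_aligned(0x1008, 8), crosses_line(0x1008, 8))
print(is_aligned(0x103E, 4), crosses_line(0x103E, 4))
```

For power-of-2 sizes no larger than the line, an aligned access can never
straddle two lines, so one cache access always suffices.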

Of course, all high-performance machines now dispatch multiple instructions
per cycle and are almost universally MP-capable, so functional units and
register file ports are multiplied and caches are much cleverer, but even
these machines are much simpler than they would be if they permitted, for
example, unaligned word references (as anything other than an exception).

It would have been much less confusing if Berkeley had chosen the name
“Reduced Complexity Computer” for the flagship implementation, but that’s
history.

--
-michael - NadaNet 3.1 and AppleCrate II:  http://michaeljmahon.com
