Gordon Henderson wrote:
> And now I'm thinking - does the Piv4 (and v3) have a 64-bit data bus, or
> does it need 2 memory cycles at 32-bit each to read a 64-bit value. I'd
> like to think it has a 64-bit (or wider) databus, but I really don't know.
All Pis still have a 32 bit wide external DRAM interface, so that aspect
doesn't change, although it goes from LPDDR2 to LPDDR4 on the Pi 4 (so
higher memory bandwidth).
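
A rough sketch of what "higher memory bandwidth" means in numbers. It takes the LPDDR4-3200 rating given for the Pi 4 further down and assumes the usual peak formula: bytes per transfer times transfers per second. This is a theoretical ceiling. Refresh, bank conflicts and setup latency all pull real throughput below it.

```python
def peak_bandwidth_gb_per_s(bus_width_bits: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes per second).

    Each transfer moves one bus width of data. DDR-style "-3200" speed
    grades are already quoted in mega-transfers per second (MT/s).
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9

# Pi 4: 32-bit external interface, LPDDR4-3200 (3200 MT/s)
pi4 = peak_bandwidth_gb_per_s(32, 3200)
print(f"Pi 4 theoretical peak: {pi4:.1f} GB/s")
```

The bus width stays at 32 bits on every model, so any bandwidth gain has to come from the transfer rate.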
The DRAM feeds the last-level cache, which is shared with the GPU. I'm not
sure if the details are public, but as far as the CPU is concerned:
(cut and paste from Arm's descriptions)
Pi 2 v1.0: Cortex A7
Fixed [L2 cache] line length of 64 bytes [512 bits]
The L2 memory system interfaces with an AMBA AXI Coherency Extension (ACE)
interconnect on a 128-bit wide bus.
Pi 3: Cortex A53
512 bit wide fetch path from L2
A single 128-bit wide master interface to external memory
Pi 4: Cortex A72
Fixed [L2 cache] line length of 64 bytes [512 bits]
Configurable 128-bit wide ACE or 128-bit wide CHI interface [to external
memory]
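
A quick sketch of the width arithmetic behind those figures. It uses the 64-byte L2 line quoted above for the A7 and A72 and counts how many transfers one line needs on the 128-bit interface versus the 32-bit DRAM. It is plain division, and it ignores how the interconnect and DRAM controller actually split up a line fill.

```python
CACHE_LINE_BITS = 64 * 8  # 64-byte L2 line, as quoted for the A7 and A72

def transfers_needed(payload_bits: int, bus_width_bits: int) -> int:
    """Number of bus-width transfers needed to move payload_bits (rounded up)."""
    return -(-payload_bits // bus_width_bits)  # ceiling division

# One cache line over the 128-bit L2 master interface vs the 32-bit DRAM pins
for name, width in [("128-bit L2 master interface", 128), ("32-bit external DRAM", 32)]:
    print(f"{name}: {transfers_needed(CACHE_LINE_BITS, width)} transfers")
```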
The description of the Pi 1's ARM1176 is confusing, as it appears the L2 has
four separate downstream ports, three 64 bit and one 32 bit. However, I
think the best you could do is 64 bit fetches.
So no change across the Pi families, possibly excepting the Pi 1. As far as
the CPU core is concerned, you can make 32, 64 or 128 bit accesses from cache
in a single cycle. The bottom of the L2 can make 128 bit wide requests to
DDR memory, each of which becomes a burst of four back-to-back 32 bit
transfers. The Pi 4 has LPDDR4-3200 (3200 million transfers per second),
which means those 4 transfers take:
4 * (1 / (3200 * 10^6)) s = 1.25 ns
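
The same calculation as a small function. This is a sketch that only counts time on the data pins at the quoted transfer rate. It deliberately leaves out the DRAM setup latency, which is discussed next.

```python
def burst_time_ns(request_bits: int, bus_width_bits: int, mega_transfers_per_s: float) -> float:
    """Time to stream one request across the DRAM data pins, excluding setup latency."""
    transfers = -(-request_bits // bus_width_bits)  # ceiling division
    seconds_per_transfer = 1 / (mega_transfers_per_s * 1e6)
    return transfers * seconds_per_transfer * 1e9  # convert seconds to ns

# 128-bit request from the L2 over the 32-bit LPDDR4-3200 interface
print(f"{burst_time_ns(128, 32, 3200):.2f} ns")
```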
Generally the setup time for a DRAM access is the expensive part, so once
the request is in, the extra cycles to fetch multiple words don't cost much
more.
(I can't seem to find the data sheet for the Pi 4's DRAM part to see what
the initial latency is)
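
To show why longer bursts are nearly free, here is a toy model: a fixed setup cost plus 0.3125 ns per 32-bit transfer at 3200 MT/s. The setup figure is a made-up placeholder, since the real initial latency isn't known here. Only the trend matters: the per-byte cost falls as the burst gets longer.

```python
# HYPOTHETICAL: the Pi 4 DRAM's initial latency isn't known here,
# so this is a placeholder value, not a datasheet figure.
HYPOTHETICAL_SETUP_NS = 50.0
TRANSFER_NS = 1e9 / (3200 * 1e6)  # one transfer at 3200 MT/s = 0.3125 ns
BYTES_PER_TRANSFER = 4            # 32-bit external interface

def access_time_ns(transfers: int, setup_ns: float = HYPOTHETICAL_SETUP_NS) -> float:
    """Toy model: fixed setup cost, then TRANSFER_NS per transfer."""
    return setup_ns + transfers * TRANSFER_NS

for transfers in (1, 4, 16):
    total = access_time_ns(transfers)
    per_byte = total / (transfers * BYTES_PER_TRANSFER)
    print(f"{transfers:2d} transfers: {total:6.2f} ns total, {per_byte:5.2f} ns/byte")
```

Swap in the real setup latency once it turns up. The extra transfers will still be small next to it.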
> This is something that's making my retro implementation of BCPL on a
> 65816 somewhat slower than it might be - the '816 is touted as a 16 bit
> CPU, but it's still locked into an 8-bit data bus.
But you'll presumably win by not burning lots of cycles on 32 bit
arithmetic, although the lack of registers will likely be a pain.
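
A back-of-envelope byte count of where that win comes from. It assumes 32-bit BCPL words, as implied above, and a memory-to-memory add c = a + b coded as CLC followed by LDA/ADC/STA with absolute addressing (3 bytes each) per accumulator-width chunk. It counts bytes crossing the 8-bit bus, not exact cycles. The 16-bit accumulator (65816 with the M flag clear) halves the add steps and the instruction bytes fetched, but the operand bytes are the same either way.

```python
# Byte-count sketch for c = a + b on 32-bit values on a 65xx-style CPU with an
# 8-bit data bus. Assumes CLC, then LDA abs / ADC abs / STA abs per chunk,
# each of those three instructions being 3 bytes long.

def add_cost(word_bits: int, acc_bits: int) -> dict:
    chunks = word_bits // acc_bits       # add-with-carry steps
    instr_bytes = 1 + chunks * 3 * 3     # CLC + 3 instructions x 3 bytes each
    data_bytes = 3 * (word_bits // 8)    # read a, read b, write c
    return {"chunks": chunks, "instr_bytes": instr_bytes, "data_bytes": data_bytes}

for label, acc in [("8-bit accumulator (6502, or '816 with M=1)", 8),
                   ("16-bit accumulator ('816 with M=0)", 16)]:
    print(label, add_cost(32, acc))
```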
Theo
* Origin: Agency HUB, Dunedin - New Zealand | FidoUsenet Gateway (3:770/3)