On 11/07/18 18:48, Richard Kettlewell wrote:
> The Natural Philosopher writes:
>> On 11/07/18 14:01, Richard Kettlewell wrote:
>>> writes:
>>>> With oodles (Soon to be tera-oodles?) of RAM available
>>>> on the RPi3, and in future releases likely to be even more,
>>>> is there any point to continuing to cater for bytes, halfwords
>>>> and words, when everything, including CHAR, can be a 64 bit
>>>> quantity?
>>>
>>> Yes, performance.
>>
>> OK. I'll bite, how would performance be affected?
>>
>> I've watched int go from 16 to 64 bits and stuff just got faster... :-)
>>
>> How is 'load a byte' DONE on a 64-bit processor other than load
>> [aligned] and shift/mask?
>>
>> How does a compiler treat
>> char *p, c;
>> int i;
>> for (i = 0; i < 327; i++)
>> {
>>     c = *p++;
>>     echo(c);
>> }
>>
>> Is it fetching *p as a 64-bit chunk and manipulating it, or is it
>> retrieving the same 64 bits of memory over and over and picking out
>> a different byte each time? Or is it cached and cache-aware? Or does
>> the processor itself have some magic whereby repeated reads through a
>> pointer incrementing a byte at a time are dealt with differently for
>> 64-bit-aligned and unaligned addresses?
>
> Main memory is _very slow_ compared to the CPU - the latency of a single
> read could be 100 CPU cycles or more, during which time your CPU could,
> at worst, be completely idle. (https://gist.github.com/jboner/2841832
> gives 2012 numbers but the Pi isn’t exactly bleeding edge hardware so
> that doesn’t seem inappropriate...)
>
> Since, as you’ve noticed, our computers have got substantially faster
> since the 1980s, there must be something addressing this problem, and
> you’re right that it involves caching.
>
> The effect of a memory read, even if only a single byte is requested, is
> to fill (depending on the technology) up to 64 bytes in the cache[1]. So
> a subsequent read (of any size) at a nearby address will be much faster
> than the initial read.
>
> [1] in fact there are usually several levels of cache
>
> In the current world, where each ASCII character is represented by 1
> byte, that means that when processing a nontrivial amount of data, you
> only need to pay that 100+-cycle cost once every 64 characters - so you
> could run as fast as 1.5 cycles per character. If each character was 8
> bytes instead then your best case is 12.5 cycles per character.
>
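
Just to make that arithmetic concrete, here is a minimal sketch in plain
C (nothing measured - the 64-byte line and ~100-cycle miss are the
figures quoted above, and the results are rough best cases):

#include <stdint.h>
#include <stdio.h>

#define CACHE_LINE  64    /* bytes per line, typical for ARM Cortex-A cores */
#define MISS_CYCLES 100   /* rough cost of a read that goes to main memory  */

int main(void)
{
    /* One miss pulls in a whole 64-byte line, so with 1-byte characters
       the miss cost is shared by 64 of them; with 8-byte "characters"
       only 8 share it.  (The byte load itself is a single LDRB on 64-bit
       ARM - no shift/mask dance - so the line fill is where the time goes.) */
    printf("1-byte chars: ~%.1f miss cycles per char\n",
           (double)MISS_CYCLES / (CACHE_LINE / sizeof(char)));     /* ~1.6  */
    printf("8-byte chars: ~%.1f miss cycles per char\n",
           (double)MISS_CYCLES / (CACHE_LINE / sizeof(uint64_t))); /* ~12.5 */
    return 0;
}
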
> That’s one effect. Another is that the cache is relatively small (for
> instance the Pi 3 has a 32KB L1 cache). If you make each character 8
> times as big as it needs to be then the effect is (roughly speaking) to
> divide the effectiveness of the cache by the same factor.
>
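
And the cache-occupancy point in the same spirit - a 32 KB L1 (the Pi 3
figure above) simply holds eight times fewer characters if each one is
eight bytes wide (again just arithmetic, nothing measured):

#include <stdint.h>
#include <stdio.h>

#define L1_SIZE (32 * 1024)   /* Pi 3 L1 data cache size, per the figure above */

int main(void)
{
    /* How many characters can be resident in L1 at once, per representation. */
    printf("1-byte chars resident in L1: %zu\n", L1_SIZE / sizeof(char));     /* 32768 */
    printf("8-byte chars resident in L1: %zu\n", L1_SIZE / sizeof(uint64_t)); /*  4096 */
    return 0;
}
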
> The exact size of these effects will depend on what kind of data you’re
> dealing with (there’s more to life than ASCII) and what you’re doing
> with it (if you’re doing 100s of cycles per character of work anyway
> then a bit of extra latency is neither here nor there, though the cache
> occupancy effects may well still be significant).
>
> Elsewhere:
>
> | Not really an issue, for, if you're chasing execution time on
> | a 1GHz processor, then get yourself a 2GHz processor.
>
> Won’t help. The speed of the CPU is not the problem.
>
Thx. I'd forgotten how slow memory is...
--
“A leader is best When people barely know he exists. Of a good leader,
who talks little, When his work is done, his aim fulfilled, They will
say, ‘We did this ourselves.’”
― Lao Tzu, Tao Te Ching