echo: rberrypi
to: PANCHO
from: THE NATURAL PHILOSOPHER
date: 2021-01-05 10:51:00
subject: Re: AI and decompilation?

On 04/01/2021 23:00, Pancho wrote:
> On 04/01/2021 22:50, Dan Espen wrote:
>> Pancho  writes:
>>
>>> On 04/01/2021 17:51, gareth evans wrote:
>>>> On 04/01/2021 13:08, Pancho wrote:
>>>>> On 04/01/2021 11:00, gareth evans wrote:
>>>>>> Thinking back to my first job, nearly 50 years ago now,
>>>>>> when I had to dis-assemble DEC's paper tape BASIC
>>>>>> interpreter in order to enhance it, I guess that
>>>>>> dis-assemblers and decompilers must now be ten-a-penny,
>>>>>> especially for programs running under Windows where
>>>>>> the structure of Windows programs is well-known with
>>>>>> an assumption that C was the source language?
>>>>>>
>>>>>> But I wonder if Artificial Intelligence could, after
>>>>>> being fed with numerous instruction sets, take a
>>>>>> block of binary, and analyse its source without
>>>>>> any prior knowledge of the instruction set?
>>>>>>
>>>>>> I am particularly interested in the Binary Blob
>>>>>> provided for Raspberry Pi computers, with a view to
>>>>>> getting detailed knowledge of the video processors
>>>>>> employed therein.
>>>>>>
>>>>> I think a lot of the problem is defining the question.
>>>>>
>>>>> What do you want it to do?
>>>>>
>>>> I don't want it to do anything. I want to play at a low level
>>>> with the thing ... large oaks from little acorns grow.
>>>>
>>>
>>> Play with what thing? What is an instruction set, what is the Binary
>>> Blob? Why do you need an AI?
>>>
>>> Most compilers leave fingerprints on executables; you don't need an AI
>>> to detect them. I remember decompiling in the early '80s, but complex
>>> modern code can often be a challenge to naively reverse-engineer a
>>> high-level understanding from, even if you do have the source code. Take
>>> away sensible variable and function names and you are stuffed.
>>
>> I've had more than one experience in putting those meaningful variable
>> names right back.  It's actually pretty easy, a somewhat rote process.
>> Find the read input instruction.  Since you know the layout of the input
>> record, you now have labels to many of the references to that input
>> area.
>>
>> I think you can work out how to proceed.
>>
>>
> Without the source how do you know any meaningful variable names in the
> first place?
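A minimal sketch of Dan's input-record approach, in Python: once you have found the read-input instruction and know which buffer it fills, every address the code touches inside that buffer can be labelled from the known record layout. The buffer address (0x2000), the field names and the layout below are hypothetical, made up purely for illustration.

```python
from bisect import bisect_right

# Hypothetical input-record layout, as you might know it from the file
# specification: (field name, byte offset, length in bytes).
RECORD_LAYOUT = [
    ("CUST_NO", 0, 6),
    ("CUST_NAME", 6, 30),
    ("BALANCE", 36, 8),
]


def label_for(address, buffer_base, layout=RECORD_LAYOUT):
    """Return a symbolic label for an address inside the input buffer,
    or None if the address falls outside every field of the record."""
    offset = address - buffer_base
    starts = [start for _, start, _ in layout]
    i = bisect_right(starts, offset) - 1   # last field starting at or before offset
    if i < 0:
        return None                        # before the start of the record
    name, start, length = layout[i]
    if offset >= start + length:
        return None                        # past the end of that field
    delta = offset - start
    return name if delta == 0 else f"{name}+{delta}"


# Addresses seen in the disassembly after the read-input instruction,
# assuming (hypothetically) that the read fills a buffer at 0x2000.
BUFFER_BASE = 0x2000
for addr in (0x2000, 0x2006, 0x2024, 0x2026, 0x3000):
    print(hex(addr), label_for(addr, BUFFER_BASE))
```

Each labelled reference then gives you a meaningful name for whatever register or variable it is loaded into, which is the "rote process" Dan describes.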

Well, you have hints from what the code does. Let's say you have code
that loads data from two stack-based memory locations, adds them
together and uses the result to access what is clearly an array. That
gives a strong hint that the original variables were integers, and
that the index is simply a temporary way to get a value into that
array, so you call it 'i' or 'arrayIndex' pro tem...

Then, once you have an idea what data that array holds, you can
rename it and its index to something more meaningful.
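A toy sketch of that heuristic, assuming a made-up, simplified instruction format (the `Insn` opcodes, register names, stack offsets and the array address 0x4000 are all hypothetical). It tracks which stack slots feed each register; when a register is used as an array subscript, it names that register `arrayIndex` and marks the stack slots summed into it as probable integers. A second step renames things once you know what the array holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """One instruction of a made-up, simplified disassembly."""
    op: str      # "load_stack", "add" or "load_indexed"
    dst: str     # destination register
    srcs: tuple  # operands; their meaning depends on op


def infer_names(code):
    """Guess provisional names and types from how values are used."""
    sources = {}  # register -> set of stack offsets that fed into it
    names = {}    # location ("sp+8", "r3", "0x4000") -> provisional name
    types = {}    # location -> inferred type
    for insn in code:
        if insn.op == "load_stack":
            (offset,) = insn.srcs
            sources[insn.dst] = {offset}
        elif insn.op == "add":
            a, b = insn.srcs
            sources[insn.dst] = sources.get(a, set()) | sources.get(b, set())
        elif insn.op == "load_indexed":
            base, index_reg = insn.srcs
            names[f"0x{base:x}"] = f"array_{base:x}"
            # The register used as a subscript is a temporary index...
            names[index_reg] = "arrayIndex"
            # ...and the stack slots summed into it are probably integers.
            for offset in sorted(sources.get(index_reg, set())):
                loc = f"sp+{offset}"
                types[loc] = "int"
                names.setdefault(loc, f"var_{offset}")
    return names, types


def rename(names, updates):
    """Replace provisional names once the data's meaning is understood."""
    return {loc: updates.get(name, name) for loc, name in names.items()}


code = [
    Insn("load_stack", "r1", (8,)),
    Insn("load_stack", "r2", (12,)),
    Insn("add", "r3", ("r1", "r2")),
    Insn("load_indexed", "r4", (0x4000, "r3")),
]
names, types = infer_names(code)
print(names, types)
# Later, having worked out (hypothetically) that the array holds monthly totals:
print(rename(names, {"array_4000": "monthlyTotals", "arrayIndex": "month"}))
```

Real decompilers do far more thorough data-flow analysis than this, but the principle, naming things by how they are used and refining the names later, is the same.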

The whole process is actually covered in philosophy: It is the problem
of induction. How do you work back from results to causes?

Given that the answer to Life, the Universe and Everything was '42',
what in fact was the question? (40+2)? (6x7)?

There are an infinite number of expressions that give that answer, and
an infinite number that don't.
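A quick way to see how fast the candidate "questions" pile up: even restricted to a single `+`, `-` or `*` between two positive integers, the number of expressions equal to 42 keeps growing as you allow larger operands (the subtractions alone grow without limit). The operand ranges here are arbitrary choices for illustration.

```python
import operator
from itertools import product

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def expressions_giving(target, operands=range(1, 51)):
    """All expressions 'a op b' with a and b drawn from operands that equal target."""
    return [
        f"{a}{sym}{b}"
        for a, b in product(operands, repeat=2)
        for sym, fn in OPS.items()
        if fn(a, b) == target
    ]


for limit in (50, 100, 1000):
    found = expressions_giving(42, range(1, limit + 1))
    print(limit, len(found), found[:3])
```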

This is where Karl Popper's philosophy of science steps in. Instead of
assuming there is One True Reason why science works, namely that
scientists are in the business of discovering the Truth, he pointed out
that just because stuff worked (and 6x7 does indeed give 42), that was
no reason to suppose that some other, completely different construct
might not work equally well, and that had indeed happened with
relativity and Newtonian gravity.

The Problem of Induction is that many theories can give the same
predicted result. Sherlock Holmes is a sham. The Dog That Didn't Bark in
the Night didn't bark, allegedly, because it knew the thief. Why? It
might have been abducted by aliens, drugged, actually out hunting
rabbits or in a soundproof box; or the Russians did it using a robot;
or it was just too plumb wore out with old age to care.

The truth is not provable. All we have is stuff that works. Given
running machine code, there are an infinite number of source codes that
might have produced it, and an infinite number that did not.
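A small illustration using CPython bytecode as a stand-in for machine code (an analogy, not a decompiler). It relies on CPython's compile-time constant folding, which is an implementation detail: `40 + 2` and `6 * 7` compile to the same instruction bytes, and renaming a parameter does not change the instructions either. Unlike stripped machine code, CPython still keeps the variable names alongside, in `co_varnames`.

```python
def answer_a():
    return 40 + 2


def answer_b():
    return 6 * 7


def double(n):
    return n * 2


def twice(value):
    return value * 2


# Constant folding: two different sources, identical instruction bytes.
print(answer_a.__code__.co_code == answer_b.__code__.co_code)
# Renaming a parameter leaves the instructions unchanged too...
print(double.__code__.co_code == twice.__code__.co_code)
# ...though CPython keeps the names on the side, which stripped
# machine code would not.
print(double.__code__.co_varnames, twice.__code__.co_varnames)
```

Given only the instruction bytes, nothing tells you whether the author wrote `40 + 2`, `6 * 7` or just `42`.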

We aren't there, ultimately, to reproduce *the* exact source, but to
arrive at *an* editable source that we can use.
Like science and religion, it doesn't have to be true to be useful,
and like science and religion, its ultimate content will forever be
truth-undecidable.
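In practice, "stuff that works" can be checked Popper-fashion: run the original routine and your reconstruction on many random inputs and look for a disagreement. A single mismatch falsifies the reconstruction; passing every trial proves nothing, it merely fails to falsify. The sketch below uses hypothetical Python functions, with `original` standing in for the compiled code you can only observe from outside.

```python
import random


def original(xs):
    """Stand-in for the compiled routine we only observe from outside."""
    total = 0
    for x in xs:
        if x % 2 == 0:
            total += x * x
    return total


def reconstructed(values):
    """An editable reconstruction: different shape, same behaviour (we hope)."""
    return sum(v * v for v in values if v % 2 == 0)


def wrong(values):
    """A plausible-looking but incorrect reconstruction (forgets the filter)."""
    return sum(v * v for v in values)


def behaves_alike(f, g, trials=1000, seed=1):
    """Compare f and g on random integer lists.

    Returns (False, counterexample) on the first mismatch, else (True, None).
    """
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        if f(xs) != g(xs):
            return False, xs  # falsified
    return True, None         # not proof, just no counterexample found


print(behaves_alike(original, reconstructed))
print(behaves_alike(original, wrong))
```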

--
"First, find out who are the people you can not criticise. They are your
oppressors."
      - George Orwell

--- SoupGate-Win32 v1.05
* Origin: Agency HUB, Dunedin - New Zealand | FidoUsenet Gateway (3:770/3)

SOURCE: echomail via QWK@docsplace.org