On 05/01/2021 10:28, The Natural Philosopher wrote:
> Yes. I am certain that certain compilers and certain languages leave a
> fingerprint. Always THAT register, used to do THAT job, always that
> particular sequence of assembly to mimic that high level construct.
They certainly do. I wrote !ARMalyser to analyse RISC OS executables and
to aid the conversion from the old 26 bit ARM mode to modern AArch32. It
was very obvious whether Norcroft C, GCC or handwritten assembly had been
used by looking at any chunk of the code, not just the obvious file headers.
> I think it is up to a limited point entirely possible to make an AI that
> could replace machine code with editable and compilable source code.
> But there will always be the Problem Of Induction. Many many possible
> constructs in source using an infinite number of random variable and
> function names, could compile to the same object code. And there is no
> way to reinstate the comments either, so it becomes an exercise
> ultimately in hand editing and reinstating the comments manually -
> almost as big a job as writing from scratch.
I was not attempting to turn the executable into a high level language,
but to give the user as much help understanding the assembler code as
possible, to aid the conversion.
At the lowest level, that means identifying what is code and what is
data: easy in well defined executable formats produced by compilers, but
hard in handwritten assembler, which had often used every trick in the
book to squeeze out performance on an 8MHz ARM2 with 512KB of RAM.
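
One common way to separate code from data is to trace control flow from a
known entry point and treat only the words reached as code. The following
is a minimal sketch of that idea, not ARMalyser's actual code; the function
name and the choice to follow only B/BL and stop at MOV(S) pc, lr are
assumptions for illustration. Real handwritten assembler (computed jumps,
inline data after BL, and so on) needs many more heuristics.

```python
import struct

def trace_code(image, base, entry):
    """Return the set of word addresses reachable as code from `entry`.

    Hypothetical helper: follows only B/BL and treats MOV pc, lr and
    MOVS pc, lr as the end of a straight-line run.
    """
    code = set()
    work = [entry]
    end = base + len(image) - len(image) % 4
    while work:
        addr = work.pop()
        while base <= addr < end and addr % 4 == 0 and addr not in code:
            code.add(addr)
            (word,) = struct.unpack_from("<I", image, addr - base)
            cond = word >> 28
            if (word & 0x0E000000) == 0x0A000000:      # B or BL
                offset = word & 0x00FFFFFF
                if offset & 0x00800000:                # sign-extend 24 bits
                    offset -= 0x01000000
                work.append(addr + 8 + (offset << 2))  # pc reads 8 ahead
                is_link = bool(word & 0x01000000)
                if cond == 0xE and not is_link:
                    break                              # B always: no fall-through
            elif cond == 0xE and (word & 0x0FEFFFFF) == 0x01A0F00E:
                break                                  # MOV pc, lr / MOVS pc, lr
            addr += 4
    return code

# Tiny made-up image: a BL over one data word to a two-instruction routine.
demo = struct.pack("<5I",
                   0xEB000001,   # &8000 BL   &800C
                   0xE1A0F00E,   # &8004 MOV  pc, lr
                   0x6C6C6548,   # &8008 data ("Hell")
                   0xE3A00000,   # &800C MOV  r0, #0
                   0xE1B0F00E)   # &8010 MOVS pc, lr
reached = trace_code(demo, 0x8000, 0x8000)
```

Anything not in `reached` is only "not proven to be code"; deciding that it
is really data is where the guesswork starts.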
The next step was using knowledge of the Standard C Library functions
and SWI APIs to annotate the registers passed to and returned from the
APIs and, where those registers contain static addresses, the data blocks
they point to.
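
As a rough illustration of the SWI annotation step, here is a small sketch
that turns a SWI instruction word into a comment. The table is a tiny,
hand-picked subset of RISC OS SWIs, and the names `SWI_INFO` and
`annotate_swi` are made up for this example; it is not how ARMalyser itself
stores or looks up this information.

```python
# Illustrative subset of RISC OS SWI numbers with their entry registers.
SWI_INFO = {
    0x00: ("OS_WriteC", "R0 = character to write"),
    0x02: ("OS_Write0", "R0 -> zero-terminated string"),
    0x03: ("OS_NewLine", "no entry registers"),
    0x11: ("OS_Exit", "R0 -> error block, R1 = 'ABEX', R2 = return code"),
}
X_BIT = 0x20000  # bit 17 selects the error-returning "X" form

def annotate_swi(word):
    """Return a comment string for an ARM SWI instruction word, else None."""
    if (word & 0x0F000000) != 0x0F000000:
        return None                      # bits 27-24 are not 1111: not a SWI
    number = word & 0x00FFFFFF
    info = SWI_INFO.get(number & ~X_BIT)
    if info is None:
        return "SWI &%X (not in this table)" % number
    name, regs = info
    if number & X_BIT:
        name = "X" + name
    return "%s: %s" % (name, regs)

comment = annotate_swi(0xEF000002)       # SWI OS_Write0
```

Once R0 is known to point at a zero-terminated string for OS_Write0, a
static address loaded into R0 just before the call can be marked as string
data rather than code.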
All code and data addresses are identified and converted into labels.
This allows the code to be modified with additional instructions that
recreate the flag preserving behaviour of the 26 bit code (in the few
cases where it is actually necessary), and data to be added to make the
larger 32 bit file headers.
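
The reason labels matter is that a branch encodes a relative offset, so
inserting a single instruction breaks every branch that crosses it unless
the assembler recalculates the offsets. A minimal sketch of the relabelling
idea, under the simplifying assumption that every word is inspected as a
possible B/BL (a real tool would only do this for words already classified
as code); the `L` plus hex offset label scheme and the function names are
invented for the example, and everything that is not a branch is simply
emitted as a DCD word:

```python
import struct

CONDS = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
         "HI", "LS", "GE", "LT", "GT", "LE", "", "NV"]

def branch_target(addr, word):
    """Return the destination of a B/BL at `addr`, or None if not a branch."""
    if (word & 0x0E000000) != 0x0A000000:
        return None
    offset = word & 0x00FFFFFF
    if offset & 0x00800000:              # sign-extend the 24-bit offset
        offset -= 0x01000000
    return addr + 8 + (offset << 2)

def relabel(image, base):
    """Emit assembler-style lines with branch targets turned into labels."""
    count = len(image) // 4
    words = struct.unpack("<%dI" % count, image[:count * 4])
    labels = {}
    for i, w in enumerate(words):
        t = branch_target(base + 4 * i, w)
        if t is not None and base <= t < base + 4 * count:
            labels[t] = "L%04X" % (t - base)
    lines = []
    for i, w in enumerate(words):
        addr = base + 4 * i
        t = branch_target(addr, w)
        if t in labels:
            op = "BL" if w & 0x01000000 else "B"
            text = "%s%s %s" % (op, CONDS[w >> 28], labels[t])
        else:
            text = "DCD &%08X" % w       # left as a raw word in this sketch
        lines.append("%-8s%s" % (labels.get(addr, ""), text))
    return lines

listing = relabel(struct.pack("<3I", 0xEA000000, 0xE3A00000, 0xE1A0F00E),
                  0x8000)
```

With the branch written as `B L0008` rather than a fixed offset, an extra
instruction can be inserted before the label and the reassembled branch
still lands in the right place.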
ARMalyser outputs in the standard Object Assembler syntax so it can be
reassembled to produce an identical executable, and subsequently
modified. It can also add syntax colouring in various formats, such as
XML and HTML/CSS, for viewing.
If you were in marketing you could say the code which does this is 'AI',
but it's really a huge chunk of tangled heuristics, which works well most
of the time, but occasionally misidentifies code or data. It's a bit
too eager to identify code, due to the tricks assembler programmers
used. If I ripped all that out and only worked on compiler generated
executables, it would be a lot more reliable.
---druck