echo: z3_pascal
to: Bob Lawrence
from: Frank Malcolm
date: 1996-05-09 11:31:04
subject: StringToInteger

Hi, Bob.

BL>  CG>  BL> function StrInt(s: string): integer;
BL>  CG>  BL> var
BL>  CG>  BL> i, size: integer;
BL>  CG>  BL> begin
BL>  CG>  BL>    i := 1; size := 0;
BL>  CG>  BL>    while s[i] in ['0'..'9'] do begin
BL>  CG>  BL>       size := size * 10 + (Ord(s[i]) - 48);
BL>  CG>  BL>       inc(i);
BL>  CG>  BL>    end;
BL>  CG>  BL>    StrInt := size;
BL>  CG>  BL> end;

BL>  CG> You might save a smidgen of time passing a pointer into the
BL>  CG> function rather than the string itself

BL>  FM> Colin is right here, it would save time and it would be a
BL>  FM> smidgin. :-)

BL>   Actually, it nearly halved it. I was amazed.

The speed increase would have come because the string had to be copied
to the stack, which (at least in TP6, as I recall) took a couple of
function calls plus the copying itself. The pointer is, I think, passed
in DX:AX, and even if it goes on the stack it's only 4 bytes. I thought
of that later but didn't go back and revise my comment.

BL>  FM> Instead of a pointer, you could just declare the parameter
BL>  FM> const.

BL>   Eh? What's the point of converting a constant to an integer? I'd
BL> already know the answer.

You miss the point. And I don't know if BP7 allows parameters to be
declared as const anyway. Look it up in Delphi to see what I mean.
Basically, if you declare a parameter const the string isn't copied onto
the stack; a pointer is passed instead (transparently to you). It can do
this because the compiler can check that you're not modifying that
string within the called function.

If BP7 doesn't allow const parameters use the pointer anyway.
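
A minimal sketch of the two options, for comparison. The names PStr, StrIntPtr and StrIntConst are made up for illustration, and the sketch assumes a compiler that accepts const parameters. Both versions also include the end-of-string check discussed further down, which the original loop lacks.

```pascal
program ConstVsPointer;

type
  PStr = ^string;  { hypothetical helper type, not from the thread }

{ Pointer version: the caller passes the address of the string. }
function StrIntPtr(p: PStr): integer;
var
  i, size: integer;
begin
  i := 1;
  size := 0;
  while (i <= Length(p^)) and (p^[i] in ['0'..'9']) do
  begin
    size := size * 10 + (Ord(p^[i]) - Ord('0'));
    Inc(i);
  end;
  StrIntPtr := size;
end;

{ const version: same loop, ordinary string syntax at the call site. }
function StrIntConst(const s: string): integer;
var
  i, size: integer;
begin
  i := 1;
  size := 0;
  while (i <= Length(s)) and (s[i] in ['0'..'9']) do
  begin
    size := size * 10 + (Ord(s[i]) - Ord('0'));
    Inc(i);
  end;
  StrIntConst := size;
end;

var
  s: string;
begin
  s := '1234abc';
  WriteLn(StrIntPtr(@s));
  WriteLn(StrIntConst(s));
end.
```

The const form keeps the call as StrIntConst(s), while the pointer form needs the @ at every call and a dereference inside the function.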

BL>  CG> I think a set might use a fixed number of bytes regardless of
BL>  CG> the number of elements you use. perhaps you should look into
BL>  CG> other methods of testing each value...
BL>  FM> He's right here, too - but it's probably worse than that. The
BL>  FM> constant set is probably stored in the minimum number of bytes,
BL>  FM> but then expanded to the maximum 32 bytes before being passed
BL>  FM> to the set membership routines - which itself is a function
BL>  FM> call. Use...

BL>  while (s[i] >= '0') and (s[i] <= '9') do

BL>   That's how I started, and it's slower than the set. In fact, a set
BL> was what I tried last... and it's the fastest of all. I declared a
BL> constant set and it's slower! God only knows why. Borland must do
BL> something fiendishly clever with their sets. I love 'em.

OK, the generated code must have changed from TP6. That would certainly
be possible, as the set handling code was fairly atrocious there. Looks
like I'll have to check out the code from BP7, as I'm still surprised
that it's faster than the 2 tests. I'll let you know. Unfortunately I
think my disassembler only works on TP6 & TPW1.5 TPUs, so I'll have to
use the debugger. What was the time difference in your tests?

BL>  FM> This by itself will give you an amazing speed improvement, if
BL>  FM> the code in BP7/Delphi which you're using is anything like that
BL>  FM> which was generated in TP6, the last one I looked at in detail.

BL>   Delphi has a StrToInt function. I'm in BP7.

You'd have the same problem. I suspected that StrToInt just called Val
and I was right. Here's the complete code for StrToInt from the RTL
source...

function StrToInt(const S: string): Longint;
var
  E: Integer;
begin
  Val(S, Result, E);
  if E <> 0 then ConvertError(FmtLoadStr(SInvalidInteger, [S]));
end;
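
Since StrToInt is only a wrapper, calling Val directly looks roughly like the sketch below. The program name and the test strings are made up for illustration; Val itself is the standard procedure, and its third argument comes back as 0 on success or as the position of the first offending character.

```pascal
program ValDemo;

var
  n: longint;
  code: integer;
begin
  { A clean number: code is 0 and n holds the value. }
  Val('1234', n, code);
  if code = 0 then
    WriteLn('ok: ', n)
  else
    WriteLn('error at position ', code);

  { A bad character: code is the 1-based index of that character. }
  Val('12x4', n, code);
  if code = 0 then
    WriteLn('ok: ', n)
  else
    WriteLn('error at position ', code);
end.
```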

BL>  CG> You might also need something in there to tell it when it's
BL>  CG> reached the end of the string.

BL>  FM> And he's right here, too - add...

BL>  FM> and (i <= length (s))

BL>   I've structured my string with the numbers upfront to reduce the
BL> loop, and if it doesn't find any numbers it stops anyway.

That's not the point we're trying to cover here, it's the opposite -
what if your string contained *all* digits, say '1234'? You'd have a
subtle bug which might not appear in your testing. The chances are (246
in 256) that the string would be followed in memory by a non-digit and
you'd be right, mate. One day (or 10 times out of 256) it would be
followed by a digit and you'd end up with for example 12349 as your
answer.
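
The overrun can be reproduced on purpose. This is a sketch only: Str255, Unbounded and Bounded are illustrative names, and the trick of poking the length byte s[0] assumes classic Turbo Pascal style short strings. The parameters are declared var purely to force pass by reference, so the stray byte after the string is the one planted in the demo.

```pascal
program OverrunDemo;
{$R-}  { range checks off, so the read past the length is not trapped }

type
  Str255 = string[255];

{ The original loop: stops only when it meets a non-digit. }
function Unbounded(var s: Str255): integer;
var
  i, size: integer;
begin
  i := 1;
  size := 0;
  while s[i] in ['0'..'9'] do
  begin
    size := size * 10 + (Ord(s[i]) - Ord('0'));
    Inc(i);
  end;
  Unbounded := size;
end;

{ Same loop with the end-of-string test added. }
function Bounded(var s: Str255): integer;
var
  i, size: integer;
begin
  i := 1;
  size := 0;
  while (i <= Length(s)) and (s[i] in ['0'..'9']) do
  begin
    size := size * 10 + (Ord(s[i]) - Ord('0'));
    Inc(i);
  end;
  Bounded := size;
end;

var
  s: Str255;
begin
  FillChar(s, SizeOf(s), 0);  { known contents after the text }
  s := '12349';
  s[0] := Chr(4);             { length is now 4, but the '9' is still in memory }
  WriteLn(Unbounded(s));      { 12349 }
  WriteLn(Bounded(s));        { 1234 }
end.
```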

BL> If I were writing a proper StrToInt I'd have to stop the integer
BL> overflowing too.

BL>  FM> ... to that while statement - you *might* get a string with all
BL>  FM> numbers. Oh, and for safety you should be checking that you
BL>  FM> don't overflow 32767.

BL>   See?

Smartarse. :-)

I think you should include it, it won't be a very expensive test if you
do it as in my example - even though that's not perfect as I commented,
only allowing numbers up to 32759.
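
The earlier example isn't quoted in this message, so what follows is a reconstruction under assumptions rather than the original code: one compare before each multiply, refusing to go on once the running total exceeds 3275. That is cheap, and it explains the 32759 ceiling, since 3275 * 10 + 9 = 32759. StrIntChecked, MaxBeforeMul and the ok flag are illustrative names.

```pascal
program OverflowGuard;

const
  MaxBeforeMul = 3275;  { 3275 * 10 + 9 = 32759, safely below 32767 }

{ Returns the value parsed so far; ok is false if the guard tripped. }
function StrIntChecked(const s: string; var ok: boolean): integer;
var
  i, size: integer;
begin
  i := 1;
  size := 0;
  ok := true;
  while ok and (i <= Length(s)) and (s[i] in ['0'..'9']) do
  begin
    if size > MaxBeforeMul then
      ok := false          { another digit might overflow: give up }
    else
    begin
      size := size * 10 + (Ord(s[i]) - Ord('0'));
      Inc(i);
    end;
  end;
  StrIntChecked := size;
end;

var
  ok: boolean;
  n: integer;
begin
  n := StrIntChecked('32759', ok);
  WriteLn(n, ' ', ok);
  n := StrIntChecked('32760', ok);  { valid 16-bit value, but rejected }
  WriteLn(n, ' ', ok);
end.
```

The price of the single compare is that 32760 to 32767 are turned away even though they would fit.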

BL>  FM> Finally, don't declare Size, and use Result := Result + etc,
BL>  FM> Result := 0 before that and omit the final assignment.

BL>   Result? BP7 doesn't have one. This isn't Delphi, you know... this is
BL> the agricultural side of Pascal.

OK, I didn't know that. Actually BP7 does have it, but it's only
available from within BASM.

BL>  FM> A late-breaking thought - remove the overflow test and put it
BL>  FM> in an exception block.

BL>   ROFL! If I were using Delphi, I'd use StrToInt()! I keep telling you
BL> Pascal sucks but you don't listen!

StrToInt wouldn't help you, see above. Pascal doesn't suck, it evolves.
I bet all that stuff will be in BP8 - August?

Actually I wasn't sure at the time whether you were using BP7 or Delphi
on this one, so I didn't bother coding the variant using exceptions.

BL>  FM> Another one - have you already stripped whitespace *before* the
BL>  FM> number? If not you might want to put another for loop before
BL>  FM> that one, with a Break if s[i] isn't space or tab (and maybe
BL>  FM> even Cr & LF).

BL>   If I wanted something that slow, I could use an abacus.

Somehow you've got to get rid of whitespace before your numbers. If
that's already been done in the calling routine (or by the nature of the
data wouldn't happen anyway) fine, you obviously don't have to do it
here. You've now said above that you've "structured my string with the
numbers upfront to reduce the loop".

Now I just went and had a look at the (ASM) source for Val, and it
appears at a quick glance to step over leading spaces for you anyway.
But not other forms of whitespace.
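
If the caller hasn't already stripped it, skipping space, tab, CR and LF could look like this sketch. FirstNonBlank is an illustrative name, and a while loop is used here in place of the for loop with Break suggested earlier; the effect is the same.

```pascal
program SkipBlanks;

{ Index of the first character that is not space, tab, CR or LF. }
{ Returns Length(s) + 1 if the string is all whitespace.         }
function FirstNonBlank(const s: string): integer;
var
  i: integer;
begin
  i := 1;
  while (i <= Length(s)) and (s[i] in [' ', #9, #13, #10]) do
    Inc(i);
  FirstNonBlank := i;
end;

begin
  { Two spaces and a tab, then the digits: the digits start at 4. }
  WriteLn(FirstNonBlank('  ' + #9 + '42'));
end.
```

The digit loop would then start from that index instead of from 1.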

BL>  FM> And if speed is *that* important, do it in BASM. :-)

BL>   Yair... I'll get around to that eventually, as soon as I can write a
BL> program that doesn't FORMAT C: every time I use pointers.

Why? What's wrong with that?

It's a piece of piss. I might even do it for you if I'm bored. After
I've checked out the BP7 generated code for sets.

You could start with the source from the RTL (which I just went and
looked at), which however is complicated by handling leading + & -, and
allowing for hex values (leading $ after a possible + or -) but you
could strip these things out. It's also complicated because it returns a
longint not an integer, so instead of just doing "* 10" they do a series
of shifts and rotates through AX & DX.
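
The arithmetic behind the shift trick, as a sketch in Pascal rather than the RTL's assembler: x * 10 = x * 8 + x * 2, so two shifts and an add replace the multiply. Times10 is an illustrative name, and this shows only the identity, not the register-level code in Val.

```pascal
program ShiftMul;

{ Multiply by ten using shifts: (x shl 3) is x * 8, (x shl 1) is x * 2. }
function Times10(x: longint): longint;
begin
  Times10 := (x shl 3) + (x shl 1);
end;

begin
  WriteLn(Times10(1234));
end.
```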

Regards, fIM.

 * * Why isn't phonetically spelled that way?
@EOT:

---
* Origin: Pedants Inc. (3:711/934.24)
SEEN-BY: 633/267 270
@PATH: 711/934 809 808 50/99 635/544 727 633/267

SOURCE: echomail via fidonet.ozzmosis.com
