echo: rberrypi
to: MARTIN@MYDOMAIN.INVALID
from: DAN CROSS
date: 2020-03-19 19:50:00
subject: Re: Regexes and C

In article ,
Martin Gregorie   wrote:
>On Thu, 19 Mar 2020 15:19:36 +0000, Dan Cross wrote:
>> What do you mean "doesn't provide any way to anchor a regex to either
>> end of a string"?  That's what the `^` and `$` metacharacters in the
>> regex are for, and they're fully supported by the library.
>>
>Just that:
>
>My original regex was
>
>"[a-zA-Z0-9][.a-zA-Z0-9_-]*@[a-zA-Z0-9][a-zA-Z0-9.]*[a-zA-Z0-9]*"
>
>and matched a string containing "a bc@d.e", so I changed it to
>
>"^[a-zA-Z0-9][.a-zA-Z0-9_-]*@[a-zA-Z0-9][a-zA-Z0-9.]*[a-zA-Z0-9]*$"
>
>and it *still* matched that string.

Hmm.  Not on my system:

: gaja; cat re.c
#include <sys/types.h>

#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

const char *RE =
"^[a-zA-Z0-9][.a-zA-Z0-9_-]*@[a-zA-Z0-9][a-zA-Z0-9.]*[a-zA-Z0-9]*$";

int
main(int argc, char *argv[])
{
        regex_t re;

        int err = regcomp(&re, RE, 0);
        if (err != 0) {
                char errbuf[128];
                regerror(err, NULL, errbuf, sizeof(errbuf));
                fprintf(stderr, "regcomp failed: %s\n", errbuf);
                return EXIT_FAILURE;
        }
        for (int i = 1; i < argc; i++)
                if (regexec(&re, argv[i], 0, NULL, 0) == 0)
                        printf("The string %s matches\n", argv[i]);

        regfree(&re);

        return EXIT_SUCCESS;
}
: gaja; make re
cc -O2 -pipe    -o re re.c
: gaja; ./re 'a bc@d.e'
: gaja; ./re 'abc@d.e'
The string abc@d.e matches
: gaja;

Note that 'a bc@d.e' did NOT match.

>So I reread regex(7) and this time noticed:
>
>'^' (matching the null string at the beginning of a line),
>'$' (matching the null string at the end of a line)
>
>Which, by its discussion of lines, seems to imply that regcomp/regexec
>thinks strings, i.e. shell parameters, are somehow different from strings
>that have been filled by reading lines from a file.

My system includes this in regex(3), when discussing newlines:

     REG_NEWLINE     Compile for newline-sensitive matching.  By default,
                     newline is a completely ordinary character with no
                     special meaning in either REs or strings.  With this
                     flag, `[^' bracket expressions and `.' never match
                     newline, a `^' anchor matches the null string after any
                     newline in the string in addition to its normal function,
                     and the `$' anchor matches the null string before any
                     newline in the string in addition to its normal function.

That is, by default a newline is treated like any other character, and
'^' and '$' anchor only to the beginning and end of the whole string.

>> Could you clarify what you mean?  '$' will match the empty string at the
>> end of a line, '^' matches the empty string at the beginning of a line.
>
>Exactly so. But they don't match the ends of a string that was passed in
>as a command-line parameter.

Are you sure you're matching against the string you think you are?
In particular, are you sure the string your program is matching
against actually contains a space?

>> As far as other libraries, if you can link against C++ code, the RE2
>> library is very nice.
>
>I tried getting into C++ years ago when it first became common (think
>Borland C++) and hated it, found Bjarne Stroustrup's C++ far below the
>standard set by K&R and finally gave it up when I found all too much C++
>code was in fact just ANSI C with // comment delimiters.

You don't have to program in C++ to use RE2.  You just need to be able
to link your program against a library that is written in C++.

>> You'd want something that covers the POSIX interfaces.
>
>Quite possibly, though I'm constantly surprised by how useful and
>relevant it still is. This is about the first time it hasn't come up with
>the goods, though that says at least as much about how stable the C
>standard library's APIs are.
>
>Would you care to recommend a POSIX book that's as good as the SVR4 one
>was in its time?

I think the latest edition of "Advanced Programming in the UNIX
Environment" is quite good.  It has been kept up to date since the
untimely death of W. Richard Stevens.  I don't recall whether it
covers regular expressions, though.

It's been many years since I have used a book for that kind of thing,
so I'm afraid my recommendations for specific texts are dated.  :-(

 - Dan C.

--- SoupGate-Win32 v1.05
* Origin: Agency HUB, Dunedin - New Zealand | FidoUsenet Gateway (3:770/3)

SOURCE: echomail via QWK@docsplace.org
