In article ,
Martin Gregorie wrote:
>On Thu, 19 Mar 2020 15:19:36 +0000, Dan Cross wrote:
>> What do you mean "doesn't provide any way to anchor a regex to either
>> end of a string"? That's what the `^` and `$` metacharacters in the
>> regex are for, and they're fully supported by the library.
>>
>Just that:
>
>My original regex was
>
>"[a-zA-Z0-9][.a-zA-Z0-9_-]*@[a-zA-Z0-9][a-zA-Z0-9.]*[a-zA-Z0-9]*"
>
>and matched a string containing "a bc@d.e", so I changed it to
>
>"^[a-zA-Z0-9][.a-zA-Z0-9_-]*@[a-zA-Z0-9][a-zA-Z0-9.]*[a-zA-Z0-9]*$"
>
>and it *still* matched that string.
Hmm. Not on my system:
: gaja; cat re.c
#include <sys/types.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

const char *RE =
    "^[a-zA-Z0-9][.a-zA-Z0-9_-]*@[a-zA-Z0-9][a-zA-Z0-9.]*[a-zA-Z0-9]*$";

int
main(int argc, char *argv[])
{
	regex_t re;

	/* cflags of 0: POSIX basic RE, newline is an ordinary character. */
	int err = regcomp(&re, RE, 0);
	if (err != 0) {
		char errbuf[128];
		regerror(err, NULL, errbuf, sizeof(errbuf));
		fprintf(stderr, "regcomp failed: %s\n", errbuf);
		return EXIT_FAILURE;
	}
	/* Report every command-line argument that the RE matches. */
	for (int i = 1; i < argc; i++)
		if (regexec(&re, argv[i], 0, NULL, 0) == 0)
			printf("The string %s matches\n", argv[i]);
	regfree(&re);
	return EXIT_SUCCESS;
}
: gaja; make re
cc -O2 -pipe -o re re.c
: gaja; ./re 'a bc@d.e'
: gaja; ./re 'abc@d.e'
The string abc@d.e matches
: gaja;
Note that 'a bc@d.e' did NOT match.
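For what it's worth, the unanchored version is expected to match
that string: regexec() searches for a match anywhere in the subject,
and "bc@d.e" is a perfectly good match. Here is a rough sketch that
reports which part of the string matched. The helper name match_span
is made up for illustration; regcomp, regexec, regmatch_t and regfree
are the standard POSIX interfaces.

```c
#include <sys/types.h>
#include <regex.h>

/*
 * Hypothetical helper: search `s` for `pattern` (a POSIX basic RE) and
 * store the byte offsets of the match in *start and *end.
 * Returns 1 on a match, 0 on no match, -1 if the pattern does not compile.
 */
int
match_span(const char *pattern, const char *s, int *start, int *end)
{
	regex_t re;
	regmatch_t m[1];
	int rc;

	if (regcomp(&re, pattern, 0) != 0)
		return -1;
	/* nmatch of 1: only ask for the span of the overall match. */
	rc = regexec(&re, s, 1, m, 0);
	regfree(&re);
	if (rc != 0)
		return 0;
	*start = (int)m[0].rm_so;
	*end = (int)m[0].rm_eo;
	return 1;
}
```

With the unanchored RE and "a bc@d.e" this should report offsets 2
through 8, i.e. the "bc@d.e" part. With the `^` and `$` added, the
same string should not match at all.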
>So I reread regex(7) and this time noticed:
>
>'^' (matching the null string at the beginning of a line),
>'$' (matching the null string at the end of a line)
>
>Which, by its discussion of lines, seems to imply that regcomp/regexec
>thinks strings, i.e. shell parameters are somehow different from strings
>that have been filled by reading lines from a file.
My system includes this in regex(3), when discussing newlines:
REG_NEWLINE Compile for newline-sensitive matching. By default,
newline is a completely ordinary character with no
special meaning in either REs or strings. With this
flag, `[^' bracket expressions and `.' never match
newline, a `^' anchor matches the null string after any
newline in the string in addition to its normal function,
and the `$' anchor matches the null string before any
newline in the string in addition to its normal function.
That is, by default a newline is treated like any other character.
>> Could you clarify what you mean? '$' will match the empty string at the
>> end of a line, '^' matches the empty string at the beginning of a line.
>
>Exactly so. But they don't match the ends of a string that was passed in
>as a command-line parameter.
Are you sure you're matching against the string you think you are?
In particular, are you sure the string your program is matching
against actually contains a space?
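One way to check is to dump the bytes of the string right before it
is handed to regexec(). A minimal sketch follows; the helper name
hex_dump is hypothetical and this is just one way to do it.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the bytes of `s` into `out` as two-digit
 * hex values separated by single spaces, so that a blank (20) or any
 * other easily overlooked byte becomes visible.
 * Returns 0 on success, -1 if `out` is too small.
 */
int
hex_dump(const char *s, char *out, size_t outlen)
{
	size_t len = strlen(s);
	size_t used = 0;

	if (outlen == 0)
		return -1;
	out[0] = '\0';
	for (size_t i = 0; i < len; i++) {
		int n = snprintf(out + used, outlen - used, "%s%02x",
		    i == 0 ? "" : " ", (unsigned char)s[i]);
		/* snprintf reports the length it wanted; check it fit. */
		if (n < 0 || (size_t)n >= outlen - used)
			return -1;
		used += (size_t)n;
	}
	return 0;
}
```

If the argument really is "a bc@d.e", the dump should show a 20
between the 61 and the 62. If the shell split the argument at the
space, the program would instead see "a" and "bc@d.e" as two
separate strings, and the second one would match.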
>> As far as other libraries, if you can link against C++ code, the RE2
>> library is very nice.
>
>I tried getting into C++ years ago when it first became common (think
>Borland C++) and hated it, found Bjarne Stroustrup's C++ far below the
>standard set by K&R and finally gave it up when I found all too much C++
>code was in fact just ANSI C with // comment delimiters.
You don't have to program in C++ to use RE2. You just need to be
able to link against a library that is written in C++.
>> You'd want something that covers the POSIX interfaces.
>
>Quite possibly, though I'm constantly surprised by how useful and
>relevant it still is. This is about the first time it hasn't come up with
>the goods, though that says at least as much about how stable the C
>standard library's APIs are.
>
>Would you care to recommend a POSIX book that's as good as the SVR4 one
>was in its time?
I think the latest edition of "Advanced Programming in the UNIX
Environment" is quite good. It has been kept up to date since the
unfortunately premature death of W. Richard Stevens. I don't recall
whether it covers regular expressions, though.
It's been many years since I have used a book for that kind of thing,
so I'm afraid my recommendations for specific texts are dated. :-(
- Dan C.