echo: rberrypi
to: MARTIN GREGORIE
from: A. DUMAS
date: 2020-03-19 14:29:00
subject: Re: Regexes and C

On 19/03/2020 14:18, Martin Gregorie wrote:
> I spent more time than I should have yesterday trying to understand
> regcomp(), regexec() and regerror() well enough to validate a string
> containing an e-mail address, to make sure that its structure is
> correct and that neither the username nor the domain contains characters
> it shouldn't.
>
> The upshot was that I couldn't do it because I could not write a regex
> that would detect spaces in the address because apparently regcomp
> doesn't provide any way to anchor a regex to either end of a string, so I
> ended up with a negated regex that detects invalid characters in the
> string and hasn't a clue whether it's syntactically correct:
>
> [^.a-zA-Z0-9@_-]
>
> This does the trick, but no thanks to the man pages regex(3), which
> describes the C functions, and regex(7), which describes the regex syntax.
> Both are poorly formatted, hard to read, and seem to have omitted useful
> information, such as the inability to specify anchor points in strings
> that DO NOT contain newlines.
>
> So, can any of you do better, i.e. write a regex that CAN validate the
> syntax of an e-mail address in terms of its structure and the set of
> permitted characters on the username and domain parts (the permitted
> character sets are not the same)?

More or less impossible. For example, you apparently didn't realise that
+ is a valid character in the part before the @. Also, domains (and
usernames) can contain non-ASCII characters encoded as UTF-8. The best
way is to try to deliver a message and check for a reply.
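
For what it's worth, as far as I know regcomp() does treat ^ and $ as
anchors to the start and end of the whole string by default. REG_NEWLINE
only makes them match around embedded newlines as well. Below is a rough
sketch of an anchored check. The function name looks_like_address() and
the pattern are my own assumptions, not anything from the man pages. The
pattern is deliberately loose: it allows + in the local part, insists on
at least one dot in the domain, and knows nothing about UTF-8 or the full
address grammar, so a delivery test is still the only real proof.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (name and pattern are illustrative assumptions).
 * Returns 1 if addr loosely looks like local@domain.tld, 0 if it does
 * not, and -1 if the pattern fails to compile.
 */
static int looks_like_address(const char *addr)
{
    /*
     * ^ and $ anchor the match to both ends of the string, so a space
     * or other stray character anywhere in addr makes the match fail.
     * A hyphen placed last inside a bracket expression is literal.
     */
    static const char *pattern =
        "^[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)+$";
    regex_t re;
    int rc;

    /* REG_EXTENDED enables + and ( ); REG_NOSUB skips match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, addr, 0, NULL, 0);
    regfree(&re);

    return rc == 0 ? 1 : 0;
}
```

Calling looks_like_address("user+tag@example.org") should give 1, while
"user name@example.org" or " user@example.org" should give 0 because the
anchors force the whole string to fit the pattern.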

--- SoupGate-Win32 v1.05
* Origin: Agency HUB, Dunedin - New Zealand | FidoUsenet Gateway (3:770/3)

SOURCE: echomail via QWK@docsplace.org
