In article ,
I R A Darth Aggie wrote:
>On Thu, 19 Mar 2020 13:18:58 -0000 (UTC),
>Martin Gregorie , in
> wrote:
>
>> So, can any of you do better, i.e. write a regex that CAN validate the
>> syntax of an e-mail address in terms of its structure and the set of
>> permitted characters on the username and domain parts (the permitted
>> character sets are not the same).
>
>You're already in a state of regex sin. There are far too many
>exceptions to the rules with respect to an email address. The "+" is a
>sendmail construct, and has been replicated in postfix and possibly
>(likely?) present in other MTAs.
It may have originated in sendmail, but it's firmly enshrined in the
standards - which originated way before MS, Google and even AOL started
to bastardise the standard and create their own 'standard'. Start with
RFC822
https://tools.ietf.org/html/rfc822
published in 1982 and work forwards to its replacements/updates.
(It isn't easy reading, but you need to note that it specifies the
characters that can't be used rather than the ones that can, so +, {}, ~
and whatever else you want are valid characters in an email address.
See section 3.3 and look for 'atom'.)
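
For what it's worth, here is a rough Python sketch of that "list what's
excluded" approach, built from the 'atom' and 'specials' definitions in
RFC822 section 3.3. The names (SPECIALS, ATOM, looks_like_rfc822_addr) are
mine, and it is deliberately incomplete: it assumes plain dot-separated
atoms on both sides of the "@" and ignores quoted strings, domain literals,
comments and everything the later RFCs changed, so treat it as an
illustration and not a validator.

```python
import re

# RFC 822 section 3.3 "specials": characters that may NOT appear in an atom.
SPECIALS = '()<>@,;:\\".[]'

# atom = 1*<any CHAR except specials, SPACE and CTLs>
# Printable ASCII 33..126 already excludes SPACE (32) and the CTLs (0-31, 127).
ATOM_CHARS = "".join(chr(c) for c in range(33, 127) if chr(c) not in SPECIALS)
ATOM = "[" + re.escape(ATOM_CHARS) + "]+"

# Simplified addr-spec: dot-separated atoms either side of "@".
# Assumption: quoted-string local parts and domain-literals are left out.
ADDR_SPEC = re.compile(r"{a}(?:\.{a})*@{a}(?:\.{a})*".format(a=ATOM))


def looks_like_rfc822_addr(text):
    """Return True if text matches the simplified atom-based pattern."""
    return ADDR_SPEC.fullmatch(text) is not None
```

With that, looks_like_rfc822_addr("user+tag@example.com") and
looks_like_rfc822_addr("a{b}~c@example.org") both come back True, while
anything containing one of the specials outside its proper place (say
"bad(addr@example.com") is rejected.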
>This is a thorny problem, and has been with us ever since someone put
>a webform asking for an email address on the web, and thought sanity
>checking the address was a good idea. In theory, a great idea, but in
>practice it will drive you to drink.
Or drive the poor user (i.e. us) to throw their drink, bottle included, down
the throats of the people who didn't even know standards existed, let alone
use them.
-Gordon
--- SoupGate-Win32 v1.05
* Origin: Agency HUB, Dunedin - New Zealand | FidoUsenet Gateway (3:770/3)