echo: rberrypi
to: MARTIN@MYDOMAIN.INVALID
from: ELI THE BEARDED
date: 2020-03-19 21:26:00
subject: Re: Regexes and C

In comp.sys.raspberry-pi, Martin Gregorie   wrote:
> I'm dealing with bog standard e-mails which can have been sent from
> almost any hardware using almost any software and at the immediate point
> of interest, are being passed between processes written in Python, C
> and bash. My immediate concern is to sanitise sender addresses being
> passed through a bash script, which is the only piece of the puzzle
> written by myself apart, of course, from the sanitiser.

It's doable, but hard. My email address for this post is real and works.
But I selected it because it looks wrong to bad email address sanitizers.
Many years ago I used "#@..." instead, but I found out it was breaking
UUCP because the addresses were being passed unescaped to sh which saw
it as a comment. This was late 1990s, long after UUCP had mostly gone
away. My goal with the address is to thwart spammers, not legit users,
so I switched to the current style. Globbing rules mean that it would
only be risky with a file that matches the domain in the working
directory, which is fortunately very unlikely.
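
To make the "#" problem concrete, here is a minimal sketch in Python.
It uses the standard shlex module, which only approximates how sh
splits a command line, and the command name and addresses are made up
for illustration.

```python
import shlex

# Hypothetical address in the old "#@..." style described above.
addr = "#@example.invalid"

# Unescaped: with comment handling switched on, everything from the
# "#" onward is discarded, much as sh would treat it as a comment.
unescaped = shlex.split("mail " + addr, comments=True)

# Quoted first: shlex.quote() wraps the address in single quotes,
# so it survives the split as one intact argument.
quoted = shlex.split("mail " + shlex.quote(addr), comments=True)

print(unescaped)
print(quoted)
```

The first split loses the address entirely; the second keeps it as a
single argument.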

For proof of concept I created a  address once,
again real and deliverable. It was live for a year or so. In general,
the local part of an email address has very few hard and fast rules.
You'll find more rules in what commercial email providers are willing to
let you send as than in what the software needs to support.

You are best off if you can avoid ever letting email addresses be
interpreted by the shell. Stick them in "files" (streams for pipes
count as files) and have mail programs parse them out of the stream.
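
As a rough sketch of that idea in Python: the address goes to a child
process on its standard input, and the child is started from an
argument list, so no shell ever parses it. The hand_off() helper and
the addresses are hypothetical, and the child here is just a Python
one-liner standing in for a real mail program.

```python
import subprocess
import sys

def hand_off(address: str) -> str:
    """Send an address to a child process over stdin, not via a shell."""
    # Stand-in for a real mail program: reads one line, echoes it back.
    child = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.readline())"]
    # An argument list (no shell=True) means no shell is involved, and
    # the address itself travels only in the input stream.
    result = subprocess.run(child, input=address + "\n",
                            capture_output=True, text=True, check=True)
    return result.stdout.rstrip("\n")

# Characters that sh would glob, comment out, or run are plain data here.
print(hand_off("*@example.invalid"))
print(hand_off("#@example.invalid; echo oops"))
```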

It's probably mentioned else-thread and I just haven't seen it yet, but
RFC 822 comments in header lines (822 is the message format; 821 is
SMTP) are basically regexp-proof. The simple cases can be handled, but
not the full complexity. The full complexity is basically only used by
people being deliberately difficult, so you don't run into it often.
The part you can't handle with regexp, at least in a single pass, is
the rule that parentheses in nested comments must balance.
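
For what it's worth, a depth counter handles the balancing that a
single regexp can't. The following Python is a minimal sketch, not a
full RFC 822 parser: strip_comments() is a made-up helper that removes
parenthesized comments, leaves quoted strings alone, and honours
backslash escapes.

```python
def strip_comments(value: str) -> str:
    """Remove (possibly nested) parenthesized comments from a header value."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a "quoted string", parentheses are literal
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # A backslash escapes the next character; keep the pair
            # only when we are outside any comment.
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes:
            out.append(ch)
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError("unbalanced ')'")
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    if depth:
        raise ValueError("unclosed comment")
    return "".join(out)

print(strip_comments("user(a (nested) comment)@example.invalid"))
```

The counter is the piece a classic regexp has no way to express: it
has to remember how many parentheses are still open.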

Elijah
------
years ago used to maintain the "+ addressing" FAQ

--- SoupGate-Win32 v1.05
* Origin: Agency HUB, Dunedin - New Zealand | FidoUsenet Gateway (3:770/3)

SOURCE: echomail via QWK@docsplace.org
