I know I already creamed you guys comprehensively last time
we discussed SOT/EOT, but I thought I'd post this anyway, just
to rub it in. BTW Andrew, the least you could do is admit that
you just made up that "can't guarantee that my system conforms
to all of the SOT/EOT spec" rubbish.
SOT/EOT rationale - originally written 1995-05-25 by Paul Edwards
This document is released to the public domain.
FSC-XXXX.000, available for FREQ from 3:711/934, contains the
spec for the SOT/EOT kludges. But since that is a spec, it
does not contain a lot of the background information that explains
why these kludges were introduced.
The problem that SOT/EOT attempts to solve
++++++++++++++++++++++++++++++++++++++++++
The fundamental problem is that there is not a clear distinction
between what is user-text and what is control information. When
FTS-1 was created, there was indeed a clear separation, with a
fixed header (control information), kludge lines (control
information clearly identifiable by "^A" at the beginning of a
line), and the rest being user-text. However, this was completely
stuffed up by FTS-4 and its introduction of control lines that
were not distinguishable from user-text; they were just plonked
right in the middle of it. It is far, far too late to remedy
that gross amateurism, but we can endeavour to help heal the
wounds. The control lines that introduced the problem are
"AREA", "SEENBY", "* Origin" and "---". Also, some people have
seen fit to exacerbate the problem with control lines such as
"# Origin".
Signs of sickness
+++++++++++++++++
This lack of a clear distinction between control information and
user text manifests itself whenever you try to use these control
words as part of your user text. E.g. Golded will hide a line
where it recognizes an AREA kludge, even if it is in the middle
of some text. When quoting a message from someone else, the
tearline and origin line are included as part of the quote, and
the tearline is changed from "---" to "-+-" in an attempt to stop
OTHER software from truncating the message on the first "---"
that is found!!! You will notice that if you type "+++" and then
quote that, golded will not touch it. What's so special about
"---" that they don't want the user to enter it? After all, it's
just NORMAL ASCII CHARACTERS! "---" could be the start of some
Ada or SQL comments, especially if you are posting in an Ada or
SQL echo!
None of this is golded's problem of course, it is merely making
an effort to help software, because software has so much trouble
trying to tell whether a "---" is a bit of user text or if it is
a control line, because there is NO CLEAR DISTINCTION. Which is
in stark contrast to kludge lines which start with a control
character, that can never be confused with NORMAL ASCII TEXT.
It is also in stark contrast to lines that start "+++", because
there is no confusion over whether that is user-text or a control
line, because there is no control line defined that starts "+++"!
The end user should never have to know what control lines are used
by the transport mechanism - to require them to do so is quite
obviously a sign of shoddy design. Which is why it is distressing
to see the following comment from the author of MM707.ZIP, a
cooking program! Some poor guy writes a program to deal with
cooking, and he finds that he has to learn about the FTS specs
in order to make his program "fido friendly"! What a disgrace!
"A new export format has been added which eliminates the
hyphens in the header and footer; this provides compatibility
with BBS systems that could not use hyphens."
I think we should all hang our heads in shame that we make authors
of recipe programs have to learn about mail standards because we
designed a system that is so shoddy that the poor guy can't even
transmit hyphens in peace. SHEESH!
The current algorithms
++++++++++++++++++++++
To determine the rogue kludge lines in a message, the best
algorithm we can come up with is this:
AREA - look on the very first line, ie don't skip any ^A kludges
to look for an AREA line, just look at the VERY first character,
and if you find "AREA", then you've got an AREA kludge. You hope
and pray that this is not a netmail message with no ^A kludges at
all, because if it IS a netmail message with no ^A kludges, then
you will be inspecting USER TEXT to see if the message starts
"AREA". So you hope and pray that the user hasn't entered some
text that starts off with "AREA". Someone in NET_DEV reported
that that's exactly what happened on his system. Regardless, any
algorithm that is in any way dependent on what a user enters, is
a very amateur algorithm. What we really need is a kludge, ANY
kludge, that will stop the algorithm from having to look at the
user text. It would be feasible to say that an INTL kludge should
always be generated, and this would block off that problem.
^ASOT also does the job.
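
The AREA check described above can be sketched roughly as follows. This is only an illustration in Python, not code from any real mailprocessor: the name find_area, the "AREA:TAG" spelling of the line and the "^ASOT:" form of the kludge are all assumptions made for the example.

```python
SOH = "\x01"  # the ^A control character that starts a real kludge line


def find_area(lines):
    """Return the echo tag if the VERY first line is an AREA line, else None.

    Assumes the "AREA:TAG" spelling; only lines[0] is ever inspected.
    """
    if lines and lines[0].startswith("AREA:"):
        return lines[0][len("AREA:"):].strip()
    return None


# A genuine echomail message.
echomail = ["AREA:NET_DEV", "Hello all"]

# Netmail with no ^A kludges at all: the first line is USER TEXT.
netmail = ["AREA: 51 is a boring topic", "but here goes anyway"]

# The same netmail with a (hypothetical) ^ASOT: line in front of the text.
netmail_sot = [SOH + "SOT:"] + netmail

print(find_area(echomail))     # NET_DEV
print(find_area(netmail))      # 51 is a boring topic  (a false positive)
print(find_area(netmail_sot))  # None
```

The second call is the "hope and pray" failure: the user's own sentence is taken as an echo tag. Any ^A kludge on the first line, whether INTL or SOT, is enough to stop that.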
SEENBY - start searching from the bottom of the message. Go back
one line at a time until you get a SEENBY. Then go back one line
at a time until you get a line that DOESN'T start with SEENBY.
Stop searching backwards, and start reading the SEENBY lines going
down. This assumes that the SEENBYs are in a single block. No
FTS spec guarantees this. The SOT/EOT spec makes this compulsory,
but on a message without SOT/EOT, all you can do is hope and pray
that the SEENBY lines are in one block. Also you have to hope and
pray that when you were searching back looking for a non-SEENBY
line, that there was actually going to be a line there that would
terminate the search. The origin line may have terminated the
search. But some people say the origin line is not compulsory, so
you can't be sure that it will be present. Some people say that
the tearline is also not compulsory, so you can't guarantee that
that will be there either. Then you hit the user-text, and once
again you have to hope and pray that the user hasn't seen fit to
type "SEENBY" as the last line of text in their message (or more
likely, they have imported some text containing SEENBY lines).
EOT is designed to tighten this loop up, by guaranteeing such an
algorithm will not start inspecting user-text looking for SEENBY
lines.
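
A rough Python sketch of the backwards SEENBY search just described. The name find_seenby_block, the "SEEN-BY:" spelling used as the default marker and the "^AEOT:" form of the kludge are assumptions for illustration only.

```python
SOH = "\x01"  # the ^A control character


def find_seenby_block(lines, marker="SEEN-BY:"):
    """Return (start, end) slice indices of the trailing SEENBY block, or None."""
    i = len(lines) - 1
    # Go back one line at a time until we get a SEENBY line.
    while i >= 0 and not lines[i].startswith(marker):
        i -= 1
    if i < 0:
        return None
    end = i + 1
    # Keep going back until we get a line that DOESN'T start with SEENBY.
    while i >= 0 and lines[i].startswith(marker):
        i -= 1
    return (i + 1, end)


# No tearline, no origin line, and the user pasted a SEENBY line as the
# last line of their text. Only the final line is genuine.
hopeful = [
    "Look what I found in an old packet:",
    "SEEN-BY: 1/2 3",
    "SEEN-BY: 711/934",
]

# The same message with an EOT line closing off the user text.
bracketed = hopeful[:2] + [SOH + "EOT:"] + hopeful[2:]

print(find_seenby_block(hopeful))    # (1, 3): swallows the user's line
print(find_seenby_block(bracketed))  # (3, 4): stops dead at the EOT
```

Nothing in the loop knows about EOT. It helps simply by being a line that is guaranteed to be there and guaranteed not to start with SEENBY.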
Origin line - possibly a mandatory control line, possibly not,
different people interpret FTS-4 in different ways. I personally
read it as compulsory. Regardless, the SOT/EOT spec makes it
compulsory with no ambiguity. The best algorithm we have here is
to search for the LAST occurrence of "* Origin". Although this
means that you will be searching user-text for these characters,
you really don't have much choice. You *could* search from the
bottom up, but you don't know what textual control lines people
have invented (e.g. "# Origin") so you do not know when you have
exited the control line section anyway, so you achieve the same
thing by searching forward anyway. What this means is that if
someone is not generating an origin line, and a user imports
some text containing an origin line, the imported line will be
taken as the real one. The user didn't know any better:
practically all message exporters include the tear/origin lines
when they export the message (because it's too difficult to strip
them in the first place), so it's only reasonable that the user
will not know that these will upset software. It's happened
to me a few times on my system, and even though there was a second
origin line (the real one), unfortunately squish (the mailprocessor
I use) doesn't look for the last origin line, so I get the wrong
address reported for that message. With a message that has EOT,
you can find an origin line by ignoring all text up to the EOT,
because that all belongs to the user. After that, there will only
be the one origin line [if the SOT/EOT spec hadn't specifically
disallowed an origin-less message, you would be able to tell that
a message didn't have an origin line if there was no origin line
after EOT].
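
The two ways of finding the origin line can be put side by side. This is a minimal Python sketch: the function names, the "^AEOT:" spelling and the imported address 1:234/5 are hypothetical, and leading blanks before "* Origin:" are tolerated as an assumption.

```python
SOH = "\x01"  # the ^A control character


def is_origin(line):
    # Tolerate leading blanks before "* Origin:" (an assumption).
    return line.lstrip().startswith("* Origin:")


def last_origin(lines):
    """Hope and pray: index of the LAST origin-looking line, or None."""
    found = None
    for i, line in enumerate(lines):
        if is_origin(line):
            found = i
    return found


def origin_after_eot(lines):
    """Index of the first origin line after EOT; None if no EOT or no origin."""
    for i, line in enumerate(lines):
        if line.startswith(SOH + "EOT"):
            for j in range(i + 1, len(lines)):
                if is_origin(lines[j]):
                    return j
            return None
    return None


imported = " * Origin: Some Other BBS (1:234/5)"  # pasted in by the user
real = " * Origin: P9 - Ten Minute Limit (3:711/934.9)"

# The system generated no origin line, so the only one is in the user text.
no_real = ["Fred wrote this last week:", imported, "What do you think?"]
# The same text bracketed by EOT, with a real tearline and origin after it.
with_eot = no_real + [SOH + "EOT:", "---", real]

print(last_origin(no_real))        # 1: the imported line wins
print(origin_after_eot(with_eot))  # 5: the real origin line
print(origin_after_eot(no_real))   # None: no EOT, back to hope and pray
```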
tearline - best algorithm we have is the same as origin line,
except that this piece of text is far more common. I would have
liked to use them instead of "+++++++" to do underlining in this
message! SOT/EOT doesn't make tearline compulsory, although it
hints that FTS-4 may make it compulsory (personally I read it as
being compulsory, but many disagree). Regardless, with EOT in
place, if you don't find "---" after the EOT, you know that this
message does not contain a tearline. You will never be confused
by someone in an Ada conference posting a snippet of code, which
happens to end with a comment line (Ada comments begin "--" and
"----- don't forget to compile with +x5 -----" is a valid Ada
comment. Also it is a valid SQL comment). Unfortunately, without
SOT/EOT, when trying to find a tearline, you just have to hope and
pray that the originating system generated a tearline, so that the
last one you find is indeed a genuine tearline.
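
The Ada comment case can be shown directly. The sketch below is Python written for illustration, with find_tearline and the "^AEOT:" spelling as assumptions: it only looks after EOT when there is one, and otherwise falls back to the last line starting with "---".

```python
SOH = "\x01"  # the ^A control character


def find_tearline(lines):
    """Return the index of the tearline, or None if there isn't one.

    With an EOT line present, only the lines after it are considered.
    Without one, fall back to the last line that starts with "---".
    """
    start = 0
    for i, line in enumerate(lines):
        if line.startswith(SOH + "EOT"):
            start = i + 1
            break
    found = None
    for i in range(start, len(lines)):
        if lines[i].startswith("---"):
            found = i
    return found


ada = [
    "procedure Hello is",
    "begin",
    "   null;",
    "end Hello;",
    "----- don't forget to compile with +x5 -----",
]
origin = " * Origin: P9 - Ten Minute Limit (3:711/934.9)"

# The originating system generated no tearline in any of these but the last.
print(find_tearline(ada + [origin]))                      # 4: the Ada comment
print(find_tearline(ada + [SOH + "EOT:", origin]))        # None: no tearline
print(find_tearline(ada + [SOH + "EOT:", "---", origin])) # 6: the real one
```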
How SOT/EOT helps existing software
+++++++++++++++++++++++++++++++++++
The most important role that SOT/EOT serves is to bracket the user
text. Most existing mailprocessors, looking for an AREA line, will
not have to inspect the user text to look for the AREA line, as
the SOT stops them from getting that far. Also, existing
mailprocessors looking for the SEENBY lines by searching backwards,
will be sure to have something that stops them inspecting user
text.
How SOT/EOT helps new software
++++++++++++++++++++++++++++++
Finding control information is dead simple. Except for some very
rare exceptions that are not important to mailprocessing, all the
control information is either before SOT or after EOT. It greatly
simplifies the algorithms needed to pick out the relevant control
information. However, you still need your old algorithms for
messages that do not have SOT/EOT, but at least now you (or at
least the sender of the message) has a choice about whether to
allow robust algorithms or rely on the trusty "hope and pray"
algorithms - the choice is theirs. If they find that half of their
message disappears because they started a line with "---", that's
their choice for going the "hope and pray" path.
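
For new software the whole job reduces to one split. A minimal sketch in Python, assuming the kludges appear as lines beginning "^ASOT" and "^AEOT"; split_message is a made-up name, and returning None is just one way of saying "fall back to the old algorithms".

```python
SOH = "\x01"  # the ^A control character


def split_message(lines):
    """Return (head, user_text, tail), or None if SOT/EOT are missing.

    head is everything before SOT, tail is everything after EOT, and
    user_text is never inspected for control lines.
    """
    sot = next((i for i, l in enumerate(lines)
                if l.startswith(SOH + "SOT")), None)
    if sot is None:
        return None
    eot = next((i for i in range(len(lines) - 1, sot, -1)
                if lines[i].startswith(SOH + "EOT")), None)
    if eot is None:
        return None
    return lines[:sot], lines[sot + 1:eot], lines[eot + 1:]


msg = [
    "AREA:ADA",
    SOH + "SOT:",
    "procedure Hello is",
    "----- don't forget to compile with +x5 -----",
    SOH + "EOT:",
    "---",
    " * Origin: P9 - Ten Minute Limit (3:711/934.9)",
    "SEEN-BY: 711/934",
]

head, user_text, tail = split_message(msg)
print(head)       # the AREA line
print(user_text)  # two lines, hyphens and all
print(tail)       # tearline, origin line, SEENBY

# A message without the kludges: the caller needs the old algorithms.
print(split_message(["AREA:ADA", "hello", "---"]))  # None
```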
There are also a variety of other things that the SOT/EOT spec
includes, that aren't part of the mainstay of the spec, but
were included because it was a convenient place to put them.
This includes making sure that x'8d' is not used for soft-CRs
(there is no use for soft-CRs), which instead allows people
whose national characters include x'8d' to use fidonet technology.
x'8d' is a character in the Russian alphabet, just like x'41' is to
the English speaking world using ASCII.
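
To see what soft-CR handling does to such text, here is a small Python illustration. It assumes the text is in code page 866, where x'8d' is the Cyrillic capital letter "Н"; the function name strip_soft_cr is made up for the example.

```python
def strip_soft_cr(body: bytes) -> bytes:
    """Treat x'8d' as a soft CR and silently drop it from the text."""
    return body.replace(b"\x8d", b"")


word = "Нет"                 # Russian for "no"
raw = word.encode("cp866")   # the bytes that would travel in the message

print(hex(raw[0]))                         # 0x8d
print(strip_soft_cr(raw).decode("cp866"))  # ет: the first letter is gone
```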
Another way that SOT/EOT comes in really useful is when people
post a C program (or whatever) in the echo. The editor I use
(MSQ305.ZIP) only displays user-text in the user-text display
area (and the presence of EOT means that the control information
is stripped out with 100% accuracy instead of "hope and pray"
accuracy). When I do a "save to disk", it only saves the user-text
portion (ie not tearline and other control rubbish), and I can go
and directly compile my program. Considering that it was compilable
when the other person sent it to me, it should not be such an
amazing fact that MSQ305 will manage to do C program in -> C
program out, but alas, this is the exception, not the norm in
fidonet today!
Why *some* old fido dogs don't like new tricks
++++++++++++++++++++++++++++++++++++++++++++++
As the inventors of the original "hope and pray" abortion of
kludges without ^A prefixes, some people don't like having the
gaping hole in the transport mechanism being pointed out to
them. Presumably they think it makes them look silly for not
having realised for the last x years that telling users not to
enter "---" lest it interfere with mailprocessing software is
the sign of a very poorly designed transport mechanism, not the
sign of stupid users. Besides, if they didn't think of it, it
can't be a very good idea, can it? I really don't know what
goes through their little minds. Personally I am trying to
make the current transport mechanism more robust, while they
are pushing the virtues of "hope and pray". As they say, "it
takes all sorts".
@EOT:
---
* Origin: P9 - Ten Minute Limit (3:711/934.9)