| echo: | |
|---|---|
| to: | |
| from: | |
| date: | |
| subject: | Re: Character encodings, transfer encodings, etc |
From: "Rich"
There are multiple ways. The two most common are called quoted-printable
and base64.
In quoted-printable, a byte can be represented as =XX, where XX are the
two hex digits of the byte's value. Because '=' is the escape character,
a literal '=' is itself written as =3D.
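As a rough illustration of the escaping rule above, here is a small Python sketch using the standard library's quopri module. The sample text is made up for the example; it only shows that the '=' and the non-ASCII bytes come out as =XX triplets and that decoding restores the original bytes.

```python
import quopri

# Hypothetical sample text, turned into bytes with UTF-8 first.
original = "caf\u00e9 = 1".encode("utf-8")

# Quoted-printable works on bytes: each unsafe byte becomes =XX (hex),
# and the literal '=' becomes =3D.
encoded = quopri.encodestring(original)

# Decoding reverses the escaping and gives back the same bytes.
decoded = quopri.decodestring(encoded)

print(encoded)
print(decoded == original)
```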
In base64, the byte sequence is divided into three-byte groups, each of
which is subdivided into four six-bit units. Each six-bit unit is mapped
to a printable ASCII character.
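The grouping described above can be sketched by hand in Python and checked against the standard library's base64 module. The function name encode_group is made up for this example, and the sketch handles only a full three-byte group (no padding).

```python
import base64
import string

# The standard base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_group(three_bytes):
    """Encode exactly three bytes as four base64 characters (hypothetical helper)."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch only handles a full three-byte group")
    # Treat the three bytes as one 24-bit number.
    n = int.from_bytes(three_bytes, "big")
    # Cut the 24 bits into four six-bit units, most significant first.
    units = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Map each six-bit unit to its printable ASCII character.
    return "".join(ALPHABET[u] for u in units)

print(encode_group(b"Man"))
print(base64.b64encode(b"Man").decode("ascii"))
```

Input whose length is not a multiple of three is padded with '=' by real encoders, which this sketch leaves out.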
Note that I refer to bytes, not characters, for the source. This is
because the transfer encoding is applied after any character encoding
such as UTF-8. For example, with UTF-8 a single character is represented
by one to four bytes. In quoted-printable this becomes one to 12 ASCII
characters, since each byte can take up to three. There are many
character encodings in use for many reasons. Quoted-printable and base64
are usually selected based on which results in a smaller size overall. At
least that is the criterion used by the clients I have seen.
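To make the byte counts concrete, here is a Python sketch that first applies UTF-8 and then quoted-printable to a few sample characters, and then picks whichever transfer encoding gives the shorter output. The helper names qp_length and smaller_encoding are invented for this example, and the size rule is only the heuristic described above, not something taken from a specific mail client. It also ignores the line wrapping that real base64 bodies get.

```python
import base64
import quopri

def qp_length(ch):
    """ASCII characters needed for one character after UTF-8 + quoted-printable."""
    return len(quopri.encodestring(ch.encode("utf-8")))

def smaller_encoding(data):
    """Hypothetical helper: name the transfer encoding with the shorter output."""
    qp = quopri.encodestring(data)
    b64 = base64.b64encode(data)
    return "quoted-printable" if len(qp) <= len(b64) else "base64"

# One sample character for each UTF-8 length from one to four bytes.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(ascii(ch), len(ch.encode("utf-8")), qp_length(ch))

# Mostly-ASCII text favors quoted-printable; mostly non-ASCII favors base64.
print(smaller_encoding(b"hello world"))
print(smaller_encoding("日本語のテキスト".encode("utf-8")))
```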
George's example is slightly different from what I describe above.
Headers like the subject use a different mechanism than the message body
to identify the encoding and, unlike the body, allow mixing and matching
in some ways. His example uses a character encoding of "ascii" and a
transfer encoding of base64. What bothered him is that the encoded form
is used when it wasn't necessary, and presumably some tool he is using
doesn't understand this 12-year-old standard.
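For anyone who wants to see the header mechanism in action, here is a Python sketch that decodes the three encoded words from the quoted Subject header (written here without this message's own quoted-printable escaping) using the standard library's email.header.decode_header. The helper name decode_subject is made up for the example.

```python
from email.header import decode_header

# The Subject value from the quoted message, unfolded onto one line.
raw_subject = (
    "=?ascii?B?W1NQQU1dICBPbmxpbmUgUGF5bWVu?= "
    "=?ascii?B?dHMgYW5kIG91ciBzZWN1cmUgc2l0?= "
    "=?ascii?B?ZSE=?="
)

def decode_subject(raw):
    """Hypothetical helper: decode all encoded words and join them into one string."""
    parts = []
    for data, charset in decode_header(raw):
        if isinstance(data, bytes):
            # Each encoded word names its own charset; fall back to ASCII.
            parts.append(data.decode(charset or "ascii"))
        else:
            parts.append(data)
    return "".join(parts)

print(decode_subject(raw_subject))
```

Each =?charset?B?...?= word carries its own charset and encoding flag (B for base64, Q for a quoted-printable variant), which is the mixing and matching mentioned above.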
Rich
"Ellen K." wrote in message news:485ml1121hsq9se5hg0l4d2tsci5c5vc6b{at}4ax.com...
Just curious, how are characters that require more than 7 bits encoded
into 7-bit?
On Thu, 20 Oct 2005 17:21:58 -0700, "Rich" wrote in message
:
> Email content is any encoding you want. The example you give is valid
> even if silly. It's not a security issue in any case.
>
> BTW, email is not 7-bit, though it is encouraged to be encoded as such
> because that provides better compatibility. There is a standard for
> checking for 8-bit compatibility. See http://www.ietf.org/rfc/rfc1652.txt.
> It's not necessary since anything can be encoded as 7-bit. It can be more
> efficient.
>
>Rich
>
> "Geo." wrote in message news:4357ff5e$1{at}w3.nls.net...
> Ok I don't understand so maybe someone can give me a rational explanation
> of this.
>
> Why would an email program accept
>
> Subject: =?ascii?B?W1NQQU1dICBPbmxpbmUgUGF5bWVu?=
> =?ascii?B?dHMgYW5kIG91ciBzZWN1cmUgc2l0?= =?ascii?B?ZSE=?=
>
> and decode it to
>
> [SPAM] Online Payments and our secure site!
>
> This just boggles the mind, I mean if you were trying to create a secure
> application wouldn't you restrict to a least common denominator instead of
> allowing everything? Email is 7-bit ASCII, not Unicode, correct? Is this
> somehow needed to allow a Unicode subject line where the RFCs don't allow it?
>
> Geo.
* Origin: Barktopia BBS Site http://HarborWebs.com:8081 (1:379/45)SEEN-BY: 633/267 270 5030/786 @PATH: 379/45 1 106/2000 633/267 |
|
| SOURCE: echomail via fidonet.ozzmosis.com | |