| echo: | |
|---|---|
| subject: | Re: Character encodings, transfer encodings, etc |
From: Ellen K.
Thanks for the explanation. :)
One picture (OK, one example in this case) being worth the
proverbial 1000 words, a lower-case n with a tilde (ñ, if it gets reproduced
correctly by the time people read this) is ascii 241, i.e. it uses the
high bit of an 8-bit byte. How is it expressed in quoted-printable and
how is it expressed in base64?
On Sat, 22 Oct 2005 22:31:34 -0700, "Rich" wrote:
> There are multiple ways. The two most common are called quoted-printable
> and base64.
>
> In quoted-printable, a byte can be represented as =XX, where XX are the
> hex digits for the byte. Because '=' is the escape character, a literal
> '=' is itself expressed as =3D.
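
To make the =XX rule concrete for the byte in the question above, here is a minimal sketch using Python's standard `quopri` module. It assumes, as the question does, that the n with a tilde is the single byte 241 (hex F1, its ISO-8859-1 code); the helper name `qp_encode` is made up for this illustration.

```python
import quopri

def qp_encode(data: bytes) -> str:
    """Quoted-printable encode raw bytes and return the ASCII result."""
    return quopri.encodestring(data).decode("ascii")

n_tilde = bytes([241])        # the single byte 0xF1 (assumed ISO-8859-1 n-tilde)
print(qp_encode(n_tilde))     # =F1
print(qp_encode(b"a=b"))      # a=3Db  (the escape character itself is escaped)
```

Plain printable ASCII passes through unchanged, which is why quoted-printable stays readable for mostly-English text.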
>
> In base64 the byte sequence is divided into three-byte groups, each of
> which is subdivided into four six-bit units. The six-bit units are mapped
> to printable ASCII characters.
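
The grouping described here can be sketched by hand and checked against the standard library. This is an illustrative re-implementation, not how any particular mail client does it; `b64_manual` and `ALPHABET` are names invented for the example, and the "=" padding for a short final group is standard base64 behaviour that the paragraph above does not spell out.

```python
import base64
import string

# The 64 printable characters used by standard base64, in index order.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64_manual(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        pad = 3 - len(group)                      # 0, 1 or 2 missing bytes
        n = int.from_bytes(group + b"\x00" * pad, "big")
        # Split the 24 bits into four six-bit units, most significant first.
        units = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        chars = [ALPHABET[u] for u in units]
        if pad:
            chars[-pad:] = "=" * pad              # mark the missing bytes
        out.append("".join(chars))
    return "".join(out)

print(b64_manual(bytes([241])))                   # 8Q==
print(base64.b64encode(bytes([241])).decode())    # 8Q==
```

So the single byte 241 from the question becomes the four characters `8Q==` in base64.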
>
> Note that I refer to bytes, not characters, for the source. This is because
> the transfer encoding is applied after any character encoding like UTF-8.
> For example, with UTF-8 a single character is represented by one to
> four bytes. In quoted-printable this becomes one to 12 ASCII
> characters. There are many character encodings in use
> for many reasons. Quoted-printable and base64 are usually selected based on
> which results in a smaller size overall. At least that is the criterion
> used by the clients I have seen.
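
A rough sketch of both points, assuming Python's standard `quopri` and `base64` modules: first the character encoding (UTF-8) is applied, then the transfer encoding, and the size-based choice is simply a length comparison. The function names are hypothetical, and real clients may weigh other factors than size.

```python
import base64
import quopri

def qp_len(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, quoted-printable character count) for one character."""
    raw = ch.encode("utf-8")                 # character encoding first
    return len(raw), len(quopri.encodestring(raw))  # then transfer encoding

# n, n-tilde, euro sign, and an emoji: 1, 2, 3 and 4 UTF-8 bytes respectively.
for ch in ("n", "\u00f1", "\u20ac", "\U0001F600"):
    print(repr(ch), qp_len(ch))

def smaller_transfer_encoding(body: bytes) -> str:
    """Pick whichever encoding yields fewer characters (an assumed heuristic)."""
    qp = len(quopri.encodestring(body))
    b64 = len(base64.encodebytes(body))
    return "quoted-printable" if qp <= b64 else "base64"

print(smaller_transfer_encoding(b"hello world"))      # mostly ASCII
print(smaller_transfer_encoding(bytes([241]) * 30))   # mostly 8-bit bytes
```

Mostly-ASCII text favours quoted-printable because few bytes need escaping; text dominated by 8-bit bytes favours base64, whose overhead is a fixed four characters per three bytes.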
>
> George's example is slightly different from what I describe above. Headers
> like the subject use a different mechanism to identify the encoding than the
> message body and, unlike the body, allow mixing and matching in some ways.
> His example uses a character encoding of "ascii" and a transfer
> encoding of base64. What bothered him is that the encoded form is used
> when it wasn't necessary, and presumably some tool he is using doesn't
> understand this 12-year-old standard.
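
As an illustration of that header mechanism, the Subject line quoted further down the thread can be decoded with a few lines of Python. This is a deliberately simplified sketch: it handles only base64 ("B") encoded words of the form `=?charset?B?data?=`, ignores any unencoded text between them, and the names `ENCODED_WORD` and `decode_b_words` are invented here. Python's `email.header` module covers the general case.

```python
import base64
import re

# =?charset?B?base64-data?=  (only the "B" transfer encoding is handled here)
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[bB]\?([^?]*)\?=")

def decode_b_words(header_value: str) -> str:
    """Decode and concatenate every base64 encoded word in a header value."""
    parts = []
    for charset, data in ENCODED_WORD.findall(header_value):
        parts.append(base64.b64decode(data).decode(charset))
    return "".join(parts)

subject = (
    "=?ascii?B?W1NQQU1dICBPbmxpbmUgUGF5bWVu?=\n"
    " =?ascii?B?dHMgYW5kIG91ciBzZWN1cmUgc2l0?= =?ascii?B?ZSE=?="
)
print(decode_b_words(subject))
```

Each encoded word names its own charset and transfer encoding, which is what allows the mixing and matching within a single header.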
>
>Rich
>
> "Ellen K." wrote in message
> news:485ml1121hsq9se5hg0l4d2tsci5c5vc6b{at}4ax.com...
> Just curious, how are characters that require more than 7 bits encoded
> into 7-bit?
>
> On Thu, 20 Oct 2005 17:21:58 -0700, "Rich" wrote:
>
> > Email content can use any encoding you want. The example you give is valid
> > even if silly. It's not a security issue in any case.
> >
> > BTW, email is not 7-bit, though it is encouraged to be encoded as such
> > because that provides better compatibility. There is a standard for
> > checking for 8-bit compatibility. See http://www.ietf.org/rfc/rfc1652.txt
> > Sending 8-bit is not necessary, since anything can be encoded as 7-bit,
> > but it can be more efficient.
> >
> >Rich
> >
> > "Geo." wrote in message
> > news:4357ff5e$1{at}w3.nls.net...
> > Ok I don't understand so maybe someone can give me a rational explanation
> > of this.
> >
> > Why would an email program accept
> >
> > Subject: =?ascii?B?W1NQQU1dICBPbmxpbmUgUGF5bWVu?=
> > =?ascii?B?dHMgYW5kIG91ciBzZWN1cmUgc2l0?= =?ascii?B?ZSE=?=
> >
> > and decode it to
> >
> > [SPAM] Online Payments and our secure site!
> >
> > This just boggles the mind, I mean if you were trying to create a secure
> > application wouldn't you restrict to a least common denominator instead of
> > allowing everything? Email is 7-bit ASCII, not Unicode, correct? Is this
> > somehow needed to allow a Unicode subject line where the RFCs don't allow it?
> >
> > Geo.
--- BBBS/NT v4.01 Flag-5
 * Origin: Barktopia BBS Site http://HarborWebs.com:8081 (1:379/45) |
|
| SOURCE: echomail via fidonet.ozzmosis.com | |