echo: nthelp
to: Ellen K.
from: Rich
date: 2005-10-22 22:31:34
subject: Re: Character encodings, transfer encodings, etc

From: "Rich" 

This is a multi-part message in MIME format.

------=_NextPart_000_0464_01C5D758.5B873A50
Content-Type: text/plain;
        charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

   There are multiple ways to do it.  The two most common transfer encodings
are quoted-printable and base64.

   In quoted-printable, a byte can be represented as =XX, where XX is the
byte's value in two hex digits.  Because '=' is the escape character, a
literal '=' is itself written as =3D.
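To make the escaping rule concrete, here is a minimal Python sketch.  `qp_escape` is a hypothetical helper, not a library function; it deliberately ignores the 76-character line limit and soft line breaks that real quoted-printable also needs, and it escapes every byte outside printable ASCII (spaces included, which is still valid).  The standard `quopri` module is used only to confirm that the output decodes back to the original bytes.

```python
import quopri


def qp_escape(data: bytes) -> str:
    """Simplified quoted-printable encoder (hypothetical helper).

    Printable ASCII bytes other than '=' pass through unchanged; every
    other byte, including '=' itself, becomes '=' plus two uppercase hex
    digits.  Line-length limits and soft line breaks are not handled.
    """
    out = []
    for byte in data:
        if 33 <= byte <= 126 and byte != ord("="):
            out.append(chr(byte))          # safe printable character
        else:
            out.append(f"={byte:02X}")     # escape as =XX
    return "".join(out)


if __name__ == "__main__":
    encoded = qp_escape(b"price=10\xa3")   # 0xA3 is the pound sign in ISO-8859-1
    print(encoded)                         # price=3D10=A3
    # The stdlib decoder accepts this output and restores the original bytes.
    print(quopri.decodestring(encoded.encode("ascii")))
```

Note how the literal '=' comes out as =3D, exactly the case described above.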

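   In base64, the byte sequence is divided into three-byte (24-bit) groups,
each of which is split into four six-bit units.  Each six-bit unit is mapped
to one of 64 printable ASCII characters (A-Z, a-z, 0-9, '+' and '/'), with
'=' used as padding when the last group is short.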
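The grouping can be written out by hand in a few lines of Python.  This is an illustrative sketch only (`b64_sketch` is a hypothetical name); in practice you would call the standard `base64.b64encode`, which the example compares against.

```python
import base64

# The 64-character alphabet of standard base64.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_sketch(data: bytes) -> str:
    """Encode bytes as base64 by hand (illustration only)."""
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        missing = 3 - len(group)                 # 0, 1 or 2 bytes short at the end
        n = int.from_bytes(group + b"\x00" * missing, "big")   # one 24-bit value
        # Split the 24 bits into four 6-bit units, most significant first.
        units = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        chars = [ALPHABET[u] for u in units]
        if missing:
            chars[-missing:] = "=" * missing     # '=' padding marks absent bytes
        out.append("".join(chars))
    return "".join(out)


if __name__ == "__main__":
    for sample in (b"Man", b"Ma", b"M"):
        # hand-rolled result next to the stdlib result: TWFu, TWE=, TQ==
        print(sample, b64_sketch(sample), base64.b64encode(sample).decode("ascii"))
```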

   Note that I refer to bytes, not characters, as the input.  That is because
the transfer encoding is applied after any character encoding such as UTF-8.
For example, UTF-8 represents a single character with one to four bytes; in
quoted-printable those become one to 12 ASCII characters, since each escaped
byte takes three (an '=' plus two hex digits).  There are many character
encodings in use, for many reasons.  The choice between quoted-printable and
base64 is usually based on which gives the smaller result overall.  At least,
that is the criterion used by the clients I have seen.
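A short Python sketch of both points, using only the standard `quopri` and `base64` modules: the character encoding (UTF-8) turns a character into bytes, then the transfer encoding turns those bytes into ASCII.  `utf8_then_qp` and `pick_transfer_encoding` are hypothetical helpers; the second one implements the "whichever is smaller" rule described above, and real clients may well use other heuristics.

```python
import base64
import quopri


def utf8_then_qp(text: str) -> bytes:
    """Character encoding first (UTF-8), then transfer encoding (QP)."""
    return quopri.encodestring(text.encode("utf-8"))


def pick_transfer_encoding(body: bytes) -> tuple[str, bytes]:
    """Hypothetical helper: return whichever transfer encoding is shorter."""
    qp = quopri.encodestring(body)
    b64 = base64.encodebytes(body)   # base64 with a newline every 76 output chars
    if len(qp) <= len(b64):
        return "quoted-printable", qp
    return "base64", b64


if __name__ == "__main__":
    # 1, 2, 3 and 4 UTF-8 bytes become 1, 6, 9 and 12 ASCII characters.
    for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
        encoded = utf8_then_qp(ch)
        print(ascii(ch), len(ch.encode("utf-8")), encoded.decode("ascii"), len(encoded))

    print(pick_transfer_encoding(b"Mostly plain ASCII text.\n")[0])   # quoted-printable
    print(pick_transfer_encoding(bytes(range(128, 256)))[0])          # base64
```

Mostly-ASCII text favours quoted-printable because most bytes pass through unchanged; binary or heavily 8-bit data favours base64, which costs a fixed four characters per three bytes.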

   George's example is slightly different from what I describe above.
Headers like the subject use a different mechanism than the message body to
identify the encoding (MIME "encoded words" of the form
=?charset?encoding?text?=), and unlike the body they allow mixing and
matching in some ways: one header can contain several encoded words, even in
different charsets, alongside plain text.  His example uses a character
encoding of "ascii" and a transfer encoding of base64 (the "B" in each
encoded word).  What bothered him is that the encoded form was used when it
wasn't necessary, and presumably some tool he is using doesn't understand
this 12-year-old standard.
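To check that George's subject is ordinary encoded-word usage (the syntax is defined today in RFC 2047), here is a Python sketch that decodes it with the standard `email.header` module.  `b_encoded_word` is a hypothetical helper that builds one "B" encoded word by hand, just to show that the encoding step is plain base64 over the charset's bytes.  The whitespace between adjacent encoded words is not part of the decoded text.

```python
import base64
from email.header import decode_header, make_header

# George's Subject value, unfolded onto one line.
SUBJECT = ("=?ascii?B?W1NQQU1dICBPbmxpbmUgUGF5bWVu?= "
           "=?ascii?B?dHMgYW5kIG91ciBzZWN1cmUgc2l0?= "
           "=?ascii?B?ZSE=?=")


def b_encoded_word(text: str, charset: str = "ascii") -> str:
    """Hypothetical helper: wrap text as one 'B' (base64) encoded word."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


def decode_subject(raw: str) -> str:
    """Decode encoded words with the standard library."""
    return str(make_header(decode_header(raw)))


if __name__ == "__main__":
    print(decode_subject(SUBJECT))   # [SPAM]  Online Payments and our secure site!
    # Re-encoding the first 21 characters reproduces the first encoded word.
    print(b_encoded_word("[SPAM]  Online Paymen"))
```

Since every character in that subject is plain ASCII, the subject could just as well have been sent unencoded; the encoded form is legal but unnecessary.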

Rich

  "Ellen K." wrote in message
  news:485ml1121hsq9se5hg0l4d2tsci5c5vc6b{at}4ax.com...
  Just curious, how are characters that require more than 7 bits encoded
  into 7-bit?

  On Thu, 20 Oct 2005 17:21:58 -0700, "Rich" wrote in message
  <43583435{at}w3.nls.net>:

  >   Email content can use any encoding you want.  The example you give is
  >   valid, even if silly.  It's not a security issue in any case.
  >
  >   BTW, email is not restricted to 7-bit, though it is encouraged to be
  >   encoded as such because that provides better compatibility.  There is a
  >   standard SMTP extension (8BITMIME) for checking whether a server accepts
  >   8-bit data.  See http://www.ietf.org/rfc/rfc1652.txt.  Using it is not
  >   necessary, since anything can be encoded as 7-bit, but it can be more
  >   efficient.
  >
  >Rich
  >
  >  "Geo." wrote in message news:4357ff5e$1{at}w3.nls.net...
  >  OK, I don't understand, so maybe someone can give me a rational
  >  explanation of this.
  >
  >  Why would an email program accept
  >
  >  Subject: =?ascii?B?W1NQQU1dICBPbmxpbmUgUGF5bWVu?=
  >  =?ascii?B?dHMgYW5kIG91ciBzZWN1cmUgc2l0?= =?ascii?B?ZSE=?=
  >
  >  and decode it to
  >
  >   [SPAM]  Online Payments and our secure site!
  >
  >  This just boggles the mind.  I mean, if you were trying to create a
  >  secure application, wouldn't you restrict it to the lowest common
  >  denominator instead of allowing everything?  Email is 7-bit ASCII, not
  >  Unicode, correct?  Is this somehow needed to allow a Unicode subject line
  >  where the RFCs don't allow it?
  >
  >  Geo. <confused and trying not to read conspiracy into it>
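Rich's earlier point about RFC 1652 (the SMTP 8BITMIME extension) can be sketched with Python's `smtplib`: after EHLO, `SMTP.has_extn()` reports whether the server advertised an extension.  `mail_options_for` is a hypothetical helper, and `FakeServer` is only an offline stand-in for a real connection so the sketch runs without a mail server.

```python
import smtplib


def mail_options_for(server: smtplib.SMTP, body: bytes) -> list[str]:
    """Hypothetical helper: choose MAIL FROM options for `body`.

    Assumes server.ehlo() has already been called, so has_extn() can
    consult the extensions the server advertised.
    """
    if all(b < 128 for b in body):
        return []                          # pure 7-bit data needs no negotiation
    if server.has_extn("8bitmime"):
        return ["BODY=8BITMIME"]           # send the 8-bit bytes as they are
    raise ValueError("no 8BITMIME: re-encode the body as quoted-printable or base64")


class FakeServer:
    """Offline stand-in exposing only has_extn(), for demonstration."""

    def __init__(self, extensions):
        self.extensions = {e.lower() for e in extensions}

    def has_extn(self, opt):
        return opt.lower() in self.extensions


if __name__ == "__main__":
    print(mail_options_for(FakeServer(["8BITMIME"]), "caf\u00e9".encode("latin-1")))
```

With a real `smtplib.SMTP` connection, the returned list would be passed as the `mail_options` argument of `sendmail()` or `send_message()`.  When the extension is missing, falling back to a 7-bit transfer encoding always works, which is why negotiating 8BITMIME is optional.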

------=_NextPart_000_0464_01C5D758.5B873A50--

--- BBBS/NT v4.01 Flag-5
* Origin: Barktopia BBS Site http://HarborWebs.com:8081 (1:379/45)
SEEN-BY: 633/267 270 5030/786
@PATH: 379/45 1 106/2000 633/267

SOURCE: echomail via fidonet.ozzmosis.com
