[Yanel-development] Yanel does not specify the charset in the
Content-Type header
Andreas Wuest
awuest at student.ethz.ch
Mon Dec 4 19:09:29 CET 2006
Hi
On 4.12.2006 18:32, Andreas Wuest wrote:
> I just realised that Yanel does not set the charset in the Content-Type
> header.
>
> This is crucial for editing though, or even web browsing, if the
> character set is not specified in the document itself.
On a side note, the HTTP/1.1 specification requires the content body to
be encoded using ISO-8859-1 if the charset is not specified in the
Content-Type header (if the content is a subtype of type text).
Furthermore, the W3C recommends the following procedure for user agents
to detect the charset encoding used (from
http://www.w3.org/TR/REC-html40/charset.html):
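To make the HTTP/1.1 default concrete, here is a minimal sketch of how a
client could derive the effective charset from a Content-Type value. The
class and method names are made up for illustration and are not part of
Yanel or Yulup, and the parsing is deliberately simplified (it does not
handle a ";" inside a quoted parameter value).

```java
import java.util.Locale;

final class ContentTypeCharset {
    // Default for "text" subtypes under HTTP/1.1 when no charset is given.
    static final String HTTP_TEXT_DEFAULT = "ISO-8859-1";

    /** Returns the value of the charset parameter, or null if there is none. */
    static String charsetParam(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim();
            if (name.equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    /**
     * Returns the explicit charset if present, the HTTP/1.1 default for
     * text subtypes otherwise, and null for non-text types without a charset.
     */
    static String effectiveCharset(String contentType) {
        String explicit = charsetParam(contentType);
        if (explicit != null) {
            return explicit;
        }
        if (contentType != null
                && contentType.trim().toLowerCase(Locale.ROOT).startsWith("text/")) {
            return HTTP_TEXT_DEFAULT;
        }
        return null;
    }
}
```

With this sketch, a response labelled only "text/html" resolves to
ISO-8859-1, which is exactly the case that goes wrong when the body is
in fact UTF-8.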
"To sum up, conforming user agents must observe the following priorities
when determining a document's character encoding (from highest priority
to lowest):
1. An HTTP "charset" parameter in a "Content-Type" field.
2. A META declaration with "http-equiv" set to "Content-Type" and a
value set for "charset".
3. The charset attribute set on an element that designates an
external resource.
In addition to this list of priorities, the user agent may use
heuristics and user settings. For example, many user agents use a
heuristic to distinguish the various encodings used for Japanese text.
Also, user agents typically have a user-definable, local default
character encoding which they apply in the absence of other indicators."
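The priority list quoted above can be sketched as a simple "first
declared value wins" lookup. This is only an illustration: the class and
parameter names are hypothetical, the three candidate values are assumed
to have been extracted elsewhere, and the optional heuristics mentioned
in the quote are left out.

```java
final class CharsetPriority {
    /**
     * Picks the charset according to the quoted priorities:
     * 1. HTTP "charset" parameter, 2. META declaration,
     * 3. charset attribute on the referencing element.
     * Falls back to the local default if none of them is declared.
     */
    static String resolve(String httpCharset, String metaCharset,
                          String linkCharset, String localDefault) {
        String[] candidates = { httpCharset, metaCharset, linkCharset };
        for (String candidate : candidates) {
            if (candidate != null && !candidate.trim().isEmpty()) {
                return candidate.trim();
            }
        }
        return localDefault;
    }
}
```

For example, resolve(null, "ISO-8859-1", null, "UTF-8") returns
"ISO-8859-1", because the META declaration is the highest-priority
source that is actually present.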
This means that Yanel should produce the charset parameter. Of course,
this requires Yanel to know which encoding was used for a certain
document in the first place. I see no other way than to define meta-data
containing a charset property (which of course must be checked and
enforced by any code which allows the creation of new and the
modification of existing documents).
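As a rough sketch of what I have in mind, assuming the meta-data is
available as a simple map with a "charset" key (the key name, the map
representation, and the class below are all hypothetical and not
existing Yanel API), the header value could be built like this:

```java
import java.nio.charset.Charset;
import java.util.Map;

final class ContentTypeBuilder {
    // Hypothetical name of the meta-data property holding the encoding.
    static final String CHARSET_PROPERTY = "charset";

    /** Returns the canonical charset name, or throws if it is missing or unknown. */
    static String validatedCharset(Map<String, String> metaData) {
        String name = metaData.get(CHARSET_PROPERTY);
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("meta-data lacks a charset property");
        }
        name = name.trim();
        // Charset.isSupported itself throws for syntactically illegal names.
        if (!Charset.isSupported(name)) {
            throw new IllegalArgumentException("unsupported charset: " + name);
        }
        return Charset.forName(name).name();
    }

    /** Builds a Content-Type value such as "text/html; charset=UTF-8". */
    static String contentType(String mimeType, Map<String, String> metaData) {
        return mimeType + "; charset=" + validatedCharset(metaData);
    }
}
```

The same validation could be run by any code that creates or modifies a
document, so that a missing or unknown charset is rejected before the
document is stored rather than discovered when it is served.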
Note that for now, Yulup is not conformant and simply uses UTF-8 in case
no charset parameter is defined (Yulup does not look at META
declarations or the "encoding" attribute of the XML declaration).
This was done because most documents served by Yanel are in fact UTF-8,
but as described above, lack the charset parameter.
--
Kind regards,
Andi