Appendix E. Charsets

Table E-1 lists the suggested charset(s) for a number of languages. Charsets are used by servlets that generate multilingual output; they determine which character encoding a servlet's PrintWriter is to use. By default, the PrintWriter uses the ISO-8859-1 (Latin-1) charset, appropriate for most Western European languages. To specify an alternate charset, the charset value must be passed to the setContentType() method before the servlet retrieves its PrintWriter. For example:

res.setContentType("text/html; charset=Shift_JIS");  // A Japanese charset
PrintWriter out = res.getWriter();  // Writes Shift_JIS Japanese

Note that not every web browser supports every charset or has the fonts needed to display every character, although at a minimum all clients support ISO-8859-1. The UTF-8 charset can represent every Unicode character, so it can be considered a viable alternative for any language.
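To make the effect of the charset concrete, the following minimal sketch encodes the same text through a PrintWriter under two different charsets and compares the resulting bytes. It runs outside a servlet container: a ByteArrayOutputStream stands in for the response stream, and the class name CharsetBytes and its encode() helper are hypothetical names chosen for illustration. It also assumes a JDK that provides java.nio.charset.Charset (1.4 or later).

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical helper, not part of the Servlet API.
class CharsetBytes {

    // Writes text through a PrintWriter that uses the named charset,
    // much as a servlet's PrintWriter encodes output for the client,
    // and returns the bytes that were produced.
    static byte[] encode(String text, String charsetName) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(
            new OutputStreamWriter(buf, Charset.forName(charsetName)));
        out.print(text);
        out.flush();  // push buffered characters through the encoder
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u65e5\u672c";  // two Japanese characters

        // ISO-8859-1 cannot represent these characters, so the writer
        // substitutes a '?' byte for each one.
        System.out.println(encode(text, "ISO-8859-1").length);  // prints 2

        // UTF-8 represents each of these characters in three bytes.
        System.out.println(encode(text, "UTF-8").length);       // prints 6
    }
}
```

The silent substitution under ISO-8859-1 is why the charset has to be chosen to suit the language being written: the writer does not report characters it cannot encode.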

Table E-1. Suggested Charsets

Language	Language Code	Suggested Charsets
Albanian	sq	ISO-8859-2
Arabic	ar	ISO-8859-6
Bulgarian	bg	ISO-8859-5
Byelorussian	be	ISO-8859-5
Catalan (Spanish)	ca	ISO-8859-1
Chinese (Simplified/Mainland)	zh	GB2312
Chinese (Traditional/Taiwan)	zh (country TW)	Big5
Croatian	hr	ISO-8859-2
Czech	cs	ISO-8859-2
Danish	da	ISO-8859-1
Dutch	nl	ISO-8859-1
English	en	ISO-8859-1
Estonian	et	ISO-8859-1
Finnish	fi	ISO-8859-1
French	fr	ISO-8859-1
German	de	ISO-8859-1
Greek	el	ISO-8859-7
Hebrew	he (formerly iw)	ISO-8859-8
Hungarian	hu	ISO-8859-2
Icelandic	is	ISO-8859-1
Italian	it	ISO-8859-1
Japanese	ja	Shift_JIS, ISO-2022-JP, EUC-JP[1]
Korean	ko	EUC-KR[2]
Latvian, Lettish	lv	ISO-8859-2
Lithuanian	lt	ISO-8859-2
Macedonian	mk	ISO-8859-5
Norwegian	no	ISO-8859-1
Polish	pl	ISO-8859-2
Portuguese	pt	ISO-8859-1
Romanian	ro	ISO-8859-2
Russian	ru	ISO-8859-5, KOI8-R
Serbian	sr	ISO-8859-5, KOI8-R
Serbo-Croatian	sh	ISO-8859-5, ISO-8859-2, KOI8-R
Slovak	sk	ISO-8859-2
Slovenian	sl	ISO-8859-2
Spanish	es	ISO-8859-1
Swedish	sv	ISO-8859-1
Turkish	tr	ISO-8859-9
Ukrainian	uk	ISO-8859-5, KOI8-R
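A servlet that chooses its charset at runtime might keep part of this table in a lookup keyed by language code. The sketch below is one possible arrangement under stated assumptions: the class SuggestedCharsets and its methods are hypothetical, only a handful of rows are included, only the first suggested charset is used where several are listed, and UTF-8 is used as the fallback for any language not in the map.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup built from a few rows of Table E-1.
class SuggestedCharsets {

    private static final Map<String, String> BY_LANGUAGE = new HashMap<>();
    static {
        BY_LANGUAGE.put("en", "ISO-8859-1");
        BY_LANGUAGE.put("cs", "ISO-8859-2");
        BY_LANGUAGE.put("ru", "ISO-8859-5");
        BY_LANGUAGE.put("el", "ISO-8859-7");
        BY_LANGUAGE.put("he", "ISO-8859-8");
        BY_LANGUAGE.put("iw", "ISO-8859-8");  // former code for Hebrew
        BY_LANGUAGE.put("ja", "Shift_JIS");
        BY_LANGUAGE.put("ko", "EUC-KR");
        BY_LANGUAGE.put("zh", "GB2312");      // Simplified Chinese
    }

    // Returns the suggested charset for a locale, or UTF-8 if the
    // language is not in the (partial) map above.
    static String charsetFor(Locale locale) {
        String language = locale.getLanguage();
        // Traditional Chinese is distinguished by country, not language.
        if ("zh".equals(language) && "TW".equals(locale.getCountry())) {
            return "Big5";
        }
        String charset = BY_LANGUAGE.get(language);
        return charset != null ? charset : "UTF-8";
    }

    // Builds a value suitable for passing to setContentType().
    static String contentType(Locale locale) {
        return "text/html; charset=" + charsetFor(locale);
    }

    public static void main(String[] args) {
        System.out.println(contentType(Locale.JAPANESE));
        System.out.println(contentType(Locale.TAIWAN));
    }
}
```

In a servlet, the returned string would be passed to res.setContentType() before the call to res.getWriter().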

[1] First supported in JDK 1.1.6. Earlier versions of the JDK know the EUC-JP character set by the name EUCJIS, so for portability you can set the character set to EUC-JP and manually construct an EUCJIS PrintWriter.

[2] First supported in JDK 1.1.6. Earlier versions of the JDK know the EUC-KR character set by the name KSC_5601, so for portability you can set the character set to EUC-KR and manually construct a KSC_5601 PrintWriter.
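The manual construction mentioned in these notes can be sketched as follows. The idea is to advertise the standard name in the content type and then build the PrintWriter from whichever name the running JDK recognizes. The class PortableWriter and its open() method are hypothetical, and a ByteArrayOutputStream stands in for the stream a servlet would obtain from res.getOutputStream(). The sketch checks support with java.nio.charset.Charset, which assumes JDK 1.4 or later; on the older JDKs these notes describe, the equivalent approach would be to catch the UnsupportedEncodingException thrown by the OutputStreamWriter constructor that takes an encoding name.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical helper, not part of the Servlet API.
class PortableWriter {

    // Returns a PrintWriter that encodes with the first candidate
    // charset name this JVM supports, trying the names in order.
    static PrintWriter open(OutputStream out, String... names) {
        for (String name : names) {
            if (Charset.isSupported(name)) {
                return new PrintWriter(
                    new OutputStreamWriter(out, Charset.forName(name)), true);
            }
        }
        throw new IllegalArgumentException(
            "None of the candidate charsets is supported");
    }

    public static void main(String[] args) {
        // In a servlet this would follow
        //   res.setContentType("text/html; charset=EUC-JP");
        // and the stream would come from res.getOutputStream().
        ByteArrayOutputStream body = new ByteArrayOutputStream();

        // Try the standard name first, then the older JDK name.
        // Throws if this JVM supports neither.
        PrintWriter out = open(body, "EUC-JP", "EUCJIS");
        out.print("\u65e5\u672c");
        out.flush();
        System.out.println(body.size() + " bytes written");
    }
}
```

The same pattern applies to Korean, with "EUC-KR" as the advertised name and "KSC_5601" as the older alternative.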

