Abstract
This article will help you write foreign-language help for HTML Help (.chm) using the FAR HTML authoring software.
Quick Overview
- Shouldn't Unicode make this easy today?
HTML Help v1.0 was released in 1997. It is old and not Unicode enabled, so all project files (.hhp, .hhc, .hhk) and HTML topic files (.htm, .html) need to be saved as ANSI. If the HTML is encoded as Unicode (UTF-8, or UTF-16, which Windows calls "Unicode"), non-English characters won't be handled correctly in the HH navigation (TOC, Index, Search). The embedded browser (the content area on the right of the help viewer) will, however, display the topic text fine, since it is a Unicode-enabled control.
- What is ANSI Encoding?
ANSI characters are encoded in 8-bit bytes according to a Windows code page: one byte per character for most languages, and one or two bytes per character for East Asian languages such as Japanese. Old ANSI applications rely on the "Region & Language" setting in the Windows Control Panel to select the language. If you set this to Japanese, then ANSI applications designed to handle Japanese characters will work. If you set it to Arabic, then ANSI applications that understand Arabic will work, and so on. See also http://helpware.net/FAR/help/Unicode2.htm
- What does this mean for HTML Help?
To correctly compile and display, say, Japanese help, you will need to find a Japanese Windows PC or change the PC's Region settings to Japanese. You can't mix foreign languages in one help file (although you can display English text alongside any foreign-language text).
- Right-To-Left text
HTML Help also supports right-to-left text. You don't have to do anything special for this to work (other than following the steps below). Note that the Help window itself does not support placing the navigation pane (containing TOC/Index/Search) on the right side.
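The code-page dependence described in this overview can be seen in a short Python sketch. It is only an illustration: Python's `cp932` and `cp1252` codecs stand in for the Japanese and Western ANSI code pages that Windows would select from the Region settings, and the sample string is an arbitrary choice.

```python
# Illustration only: the same bytes mean different text under different
# ANSI code pages. cp932 is the Windows Shift-JIS code page; cp1252 is Western.
text = "日本語"                       # arbitrary sample Japanese string

ansi_bytes = text.encode("cp932")     # what an ANSI (Shift-JIS) file contains
utf8_bytes = text.encode("utf-8")     # what a UTF-8 file would contain instead

# Read with the matching code page: the text comes back intact.
as_japanese = ansi_bytes.decode("cp932")

# Read with the wrong code page (as on a PC whose region is not Japanese):
# the bytes are reinterpreted as unrelated Western characters.
as_western = ansi_bytes.decode("cp1252", errors="replace")

print(len(ansi_bytes), len(utf8_bytes))          # byte counts differ
print(as_japanese == text, as_western == text)   # only the first matches
```

The bytes on disk never change; only the code page used to interpret them does, which is why the Region setting matters for ANSI tools such as the HH compiler and viewer.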
Example: Creating Japanese help
Region Settings
For HH to compile and display Japanese text correctly you need to set "Windows Control Panel > Region Settings" to Japanese.
You will be asked to restart Windows. Under Windows 7:
- Control Panel->Regional Options
- On the "Languages" page make sure you check the box "Install files for East Asian languages".
- On the "Advanced" page select "Japanese" from the language dropdown list.
- Reboot, recompile, re-run the test case
Windows 2K:
- Control Panel->Regional Options
- At the bottom, where you can check the different languages to support, click on Set Default...
- Select Japanese from the list
- Reboot, recompile, re-run the test case
Fonts
For a non-Japanese PC you will need to install the Japanese fonts. Under Windows 7, install the Japanese Language Pack using Windows Update. Note: Language packs are only available for Windows 7 Ultimate and Windows 7 Enterprise. See also: http://support.microsoft.com/kb/972813
Here are some screenshots from Windows 7. Windows XP is different.
Under "Optional" you will find the language packs. Here my Japanese Language Pack is already installed, so it is no longer listed (but it is now listed in Control Panel Uninstall).
Help Project Settings
Open the FAR HTML > HH Project Editor and set both the Language and the Char Set to Japanese (ShiftJIS).
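As a rough check that the language setting was saved, the project file can also be inspected outside FAR. The sketch below assumes the .hhp file stores the language as a `Language=` line in its `[OPTIONS]` section, starting with a hexadecimal Windows locale ID (0x411 is the locale ID for Japanese). The sample project text and the function name `hhp_language_id` are made up for illustration.

```python
def hhp_language_id(hhp_text):
    """Return the locale ID from the [OPTIONS] Language= line, or None."""
    section = None
    for raw in hhp_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].upper()          # track the current section
        elif section == "OPTIONS" and line.lower().startswith("language="):
            value = line.split("=", 1)[1].split()
            if value:
                return int(value[0], 16)          # e.g. "0x411" -> 1041
    return None

# Hypothetical project file content, for illustration only.
sample_hhp = """[OPTIONS]
Compiled file=help.chm
Language=0x411 Japanese
Title=Sample

[FILES]
index.htm
"""

JAPANESE_LCID = 0x411
print(hhp_language_id(sample_hhp) == JAPANESE_LCID)
```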
Project TOC / Index Settings
Similarly, open the TOC (.hhc) and Index (.hhk) files for the project (using the FAR TOC & Index editors) and set their Char Set. For languages such as Russian, you may also need to set the TOC Font property. Example: Try Verdana, Arial or Times Roman.
HTML Topic Files
Set the Char Set in your topic files as well. This will allow the browser to properly display the foreign-language characters from your ANSI-encoded HTML files.
To do this insert the appropriate Meta statement (see example below) into the head section of each HTML file. This can be done very quickly using FAR find and replace. Use "shift_jis" for Japanese. For other codes check the HH Workshop online help (see Appendix 1 below).
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=shift_jis">
</head>
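If you prefer to script this step rather than use FAR's find and replace, the idea can be sketched in Python as follows. This is a minimal sketch, not FAR's own method: the function name `add_charset_meta` is hypothetical, and it assumes each topic already has a `<head>` tag and that a simple regular expression is good enough for your files.

```python
import re

def add_charset_meta(html, charset):
    """Insert a Content-Type META tag right after <head> if none exists."""
    if re.search(r"<meta[^>]+charset\s*=", html, re.IGNORECASE):
        return html                       # a charset is already declared
    meta = ('<META HTTP-EQUIV="Content-Type" '
            'CONTENT="text/html; charset=%s">' % charset)
    # Match <head> or <head ...> but not <header>; insert directly after it.
    return re.sub(r"(<head(?:\s[^>]*)?>)",
                  lambda m: m.group(1) + "\n" + meta,
                  html, count=1, flags=re.IGNORECASE)

page = "<html><head><title>Topic</title></head><body></body></html>"
result = add_charset_meta(page, "shift_jis")
print(result)
```

Placing the tag immediately after `<head>` means it precedes the title, so the title is read with the declared character set too. Running the function a second time leaves the file unchanged.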
ANSI Files
Again, remember that all text files -- HH project files (.hhp, .hhc, .hhk etc.) as well as topic files (.html, .htm) -- need to be ANSI encoded. You can do this one file at a time using the Windows Notepad Save As dialog. A better way is to use FAR's Encoding command to convert multiple files in a single sweep.
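For readers who want to see what such a conversion involves, here is a minimal Python sketch. It is not how FAR implements its Encoding command. It assumes the source file is UTF-8 (with or without a byte-order mark) and that Python's `cp932` codec is an acceptable stand-in for the Japanese ANSI code page. The function name `convert_utf8_to_ansi` is hypothetical.

```python
import tempfile
from pathlib import Path

def convert_utf8_to_ansi(path, codepage):
    """Rewrite a UTF-8 text file in the given ANSI code page, in place."""
    path = Path(path)
    # "utf-8-sig" also strips a leading byte-order mark if one is present.
    text = path.read_bytes().decode("utf-8-sig")
    # Strict encoding raises UnicodeEncodeError if a character does not
    # exist in the target code page, instead of silently losing it.
    path.write_bytes(text.encode(codepage))

# Demonstration on a temporary file, so no real project file is touched.
with tempfile.TemporaryDirectory() as folder:
    topic = Path(folder) / "topic.htm"
    topic.write_bytes("\ufeff<p>日本語</p>".encode("utf-8"))
    convert_utf8_to_ansi(topic, "cp932")
    converted = topic.read_bytes()

print(converted)
```

Working on raw bytes avoids any newline translation. Keep a backup before converting real files, since the conversion overwrites them.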
Other Languages
The above can be applied to help in any language.
Here are some codes to help.
| Language | HTML Meta statement |
| --- | --- |
| Arabic | <meta http-equiv="Content-Type" content="text/html; charset=windows-1256"> |
| Hebrew | <meta http-equiv="Content-Type" content="text/html; charset=windows-1255"> |
| Japanese | <meta http-equiv="Content-Type" content="text/html; charset=shift_jis"> |
For other Codes see Appendix 1 below.
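Before compiling, it can be useful to confirm that a topic's text actually fits the character set you chose, since one ANSI character set cannot hold two unrelated foreign languages. The following Python sketch shows one way to test that. The helper name `fits_charset` and the sample strings are illustrative assumptions, and Python's codecs are used as stand-ins for the Windows code pages.

```python
# HTML charset names taken from the table above.
CHARSETS = {
    "Arabic": "windows-1256",
    "Hebrew": "windows-1255",
    "Japanese": "shift_jis",
}

def fits_charset(text, charset):
    """Return True if every character of text exists in the charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

hebrew = "שלום"
japanese = "日本語"

print(fits_charset(hebrew, CHARSETS["Hebrew"]))     # Hebrew text, Hebrew charset
print(fits_charset(japanese, CHARSETS["Hebrew"]))   # Japanese text, Hebrew charset
print(fits_charset("Hello", CHARSETS["Japanese"]))  # English fits any of them
```

Plain English text passes for every entry because all of these character sets include the ASCII range, which matches the earlier point that English can sit alongside any one foreign language.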
Appendix 1
The HH Workshop Help lists all the Char Set codes for you. Here is a copy and paste of that page...
Character Set Recognition
Internet Explorer uses the character set specified for a document to determine how to translate the bytes in the document into characters on the screen or paper. By default, Internet Explorer uses the character set specified in the HTTP content type returned by the server to determine this translation. If this parameter is not given, Internet Explorer uses the character set specified by the META element in the document. It uses the user's preferences if no META element is given.
You can use the META element to explicitly set the character set for a document. In this case, you set the HTTP-EQUIV= attribute to "Content-Type" and specify a character set identifier in the CONTENT= attribute. For example, the following META element identifies Windows-1251 as the character set for the document.
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=Windows-1251">
As long as you place the META element before the BODY element, it affects the whole document, including the TITLE element. For clarity it should appear as the first element after HEAD so that all readers know the encoding before the first displayable character is parsed.
Note that the META element applies to the document containing it. This means, for example, that a compound document (a document consisting of two or more documents in a set of frames) can use different character sets in different frames.
| Windows Codepage # | Display name | Preferred ID on SAVE | Aliases in Internet Explorer 4 |
| --- | --- | --- | --- |
| 1252 (See Note 1) | Western | iso-8859-1 except when 128-159 is used, use "Windows-1252" | iso-8859-1 |
| 28592 | Central European (ISO) | iso-8859-2 | iso8859-2, iso-8859-2, iso_8859-2, latin2, iso_8859-2:1987, iso-ir-101, l2, csISOLatin2 |
| 1250 | Central European (Windows) | Windows-1250 | Windows-1250, x-cp1250 |
| 1251 | Cyrillic (Windows) | Windows-1251 | Windows-1251, x-cp1251 |
| 1253 | Greek (Windows) | Windows-1253 | Windows-1253 |
| 1254 | Turkish (Windows) | Windows-1254 | Windows-1254 |
| 932 | Japanese (Shift-JIS) | shift_jis | shift_jis, x-sjis, ms_Kanji, csShiftJIS, x-ms-cp932 |
| 51932 | Japanese (EUC) | x-euc-jp | Extended_UNIX_Code_Packed_Format_for_Japanese, csEUCPkdFmtJapanese, x-euc-jp, x-euc |
| 50220 | Japanese (JIS) | iso-2022-jp | csISO2022JP, iso-2022-jp |
| 1257 | Baltic (Windows) | Windows-1257 | windows-1257 |
| 950 | Traditional Chinese (BIG5) | big5 | big5, csbig5, x-x-big5 |
| 936 | Simplified Chinese (GB2312) | gb2312 | GB_2312-80, iso-ir-58, chinese, csISO58GB231280, csGB2312, gb2312 |
| 20866 | Cyrillic (KOI8-R) | koi8-r | csKOI8R, koi8-r |
| 949 (See Note 2) | Korean (KSC5601) | ks_c_5601 | euc-kr |
| 1255 (logical) (See Note 3) | Hebrew (ISO-logical) | Windows-1255 | iso-8859-8i |
| 1255 (visual) | Hebrew (ISO-Visual) | iso-8859-8 | ISO-8859-8 Visual, ISO-8859-8, ISO_8859-8, visual |
| 862 | Hebrew (DOS) | dos-862 | dos-862 |
| 1256 | Arabic (Windows) | Windows-1256 | Windows-1256 |
| 720 | Arabic (DOS) | dos-720 | dos-720 |
| 874 | Thai | Windows-874 | Windows-874 |
| 1258 | Vietnamese | Windows-1258 | Windows-1258 |
| 65001 | Unicode UTF-8 | UTF-8 | UTF-8, unicode-1-1-utf-8, unicode-2-0-utf-8 |
| 65000 | Unicode UTF-7 | UNICODE-1-1-UTF-7 | utf-7, UNICODE-1-1-UTF-7, csUnicode11UTF7 |
| 50225 | Korean (ISO) | ISO-2022-KR | ISO-2022-KR, csISO2022KR |
| 52936 (See Note 4) | Simplified Chinese (HZ) | HZ-GB-2312 | HZ-GB-2312 |
| 28594 | Baltic (ISO) | iso-8859-4 | ISO_8859-4:1988, iso-ir-110, ISO_8859-4, ISO-8859-4, latin4, l4, csISOLatin4 |
| 28595 | Cyrillic (ISO) | iso_8859-5 | ISO_8859-5:1988, iso-ir-144, ISO_8859-5, ISO-8859-5, cyrillic, csISOLatinCyrillic, csISOLatin5 |
| 28597 | Greek (ISO) | iso-8859-7 | ISO_8859-7:1987, iso-ir-126, ISO_8859-7, ISO-8859-7, ELOT_928, ECMA-118, greek, greek8, csISOLatinGreek |
| 28599 | Turkish (ISO) | iso-8859-9 | ISO_8859-9:1989, iso-ir-148, ISO_8859-9, ISO-8859-9, latin5, l5, csISOLatin5 |
Notes: Source documents
Note 1: us-ascii, ascii, iso8859-1, iso_8859-1, iso-8859-1, ANSI_X3.4-1968, iso-ir-6, ANSI_X3.4-1986, ISO_646.irv:1991, ISO646-US, us, IBM367, cp367, csASCII, latin1, iso_8859-1:1987, iso-ir-100, ibm819, cp819, Windows-1252, x-ansi
Note 2: ks_c_5601, ks_c_5601-1987, korean, csKSC56011987, euc-kr
Note 3: ISO_8859-8:1988, iso-ir-138, hebrew, csISOLatinHebrew, Windows-1255, ISO_8859-8i, ISO_8859-8e, ISO-8859-8i, ISO-8859-8e, logical
Note 4: http://www.internic.net/rfc/rfc1843.txt