Replace western text with Japanese text?
Posted: Sat Dec 11, 2004 3:33 pm
by UkSeo
Hello,
I need to run mass search-and-replace operations in which Western text in HTML files is replaced by Japanese text, and later by Cyrillic, Baltic, etc.
This obviously touches on the encoding setting in the WildEdit Replace panel. The problem is that whatever setting I choose, it reports either:
Character conversion: Illegal input sequence/combination of input units, or:
Character conversion: Unmappable input sequence
With some files, however, it did work when I chose either Shift_JIS or UTF-8 as the encoding option.
Any help and ideas much appreciated!
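The two messages quoted above appear to describe two different failures: bytes that are not legal in the encoding chosen for reading, and characters that have no representation in the target encoding. The sketch below reproduces both situations with Python's built-in codecs. It is only an analogy for illustration, not WildEdit's actual conversion code, and the helper names are hypothetical.

```python
def can_decode(data: bytes, encoding: str) -> bool:
    """Return True if the bytes form a legal sequence in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of the text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sjis_bytes = "ナビゲ".encode("shift_jis")

# Shift_JIS bytes read as UTF-8: an illegal input sequence.
print(can_decode(sjis_bytes, "utf-8"))
# The same bytes read with the matching encoding decode cleanly.
print(can_decode(sjis_bytes, "shift_jis"))
# Japanese text written to a Western code page: unmappable characters.
print(can_encode("ナビゲ", "cp1252"))
```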
Posted: Sun Dec 12, 2004 3:43 pm
by bbadmin
Hello,
WildEdit reads and writes files using the single specified character encoding, so you will need to convert your files to UTF-8, if you want to replace English text with Japanese, etc. You must do that prior to running WildEdit.
Don't forget to change the http encoding in the HTML files as well. For example, change:
Code:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
to
Code:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
in the head section.
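With many files, the charset change could be scripted rather than made by hand. The following is a minimal sketch, assuming the meta tag is written exactly in the attribute order shown above. The function name and the regular expression are illustrative only, and tags written differently would be left untouched.

```python
import re

# Matches only the attribute order shown in the post; an assumption of this sketch.
_META_CHARSET = re.compile(
    r'(<meta\s+http-equiv="Content-Type"\s+content="text/html;\s*charset=)[^"]+(")',
    re.IGNORECASE,
)


def set_meta_charset(html_text: str, charset: str = "utf-8") -> str:
    """Replace the declared charset in a Content-Type meta tag."""
    return _META_CHARSET.sub(
        lambda m: m.group(1) + charset + m.group(2), html_text
    )


old = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
print(set_meta_charset(old))
```

Note that this only changes the declaration. The file's bytes still have to be converted to the declared encoding separately.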
Keith MacDonald
Helios Software Solutions
Posted: Sun Dec 12, 2004 11:43 pm
by UkSeo
Dear Keith,
thanks for your reply. I'm not really sure what to make of it though.
Some of my files are shown by TextPad as Unicode, some as ANSI. The problem with WildEdit occurs with both types.
What I obviously can do is use numeric character references like &#12490 ;&#12499 ;&#12466 ; (spaces added to prevent rendering); search & replace with WildEdit and other tools then works just fine. The problem with that approach is that the source files can no longer be read by humans. So what I need is real Japanese text in the source files, like ナビゲ
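Going from decimal character references (the &#NNNNN; form) back to literal characters is mechanical. Here is a minimal sketch, assuming only decimal references are used. It leaves ASCII references such as &#60; alone so that markup-significant characters are not introduced by accident. The function name is hypothetical.

```python
import re

_DECIMAL_REF = re.compile(r"&#(\d+);")


def expand_non_ascii_refs(text: str) -> str:
    """Turn decimal references above ASCII into literal characters."""
    def replace(match):
        code_point = int(match.group(1))
        if 127 < code_point <= 0x10FFFF:
            return chr(code_point)
        return match.group(0)  # keep ASCII and out-of-range references as-is

    return _DECIMAL_REF.sub(replace, text)


print(expand_non_ascii_refs("&#12490;&#12499;&#12466; &#60;b&#62;"))
```

The result is a Python string. It still has to be saved in an encoding that can hold Japanese, such as UTF-8 or Shift_JIS.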
I am aware that the problem might very well be on my side and may not be related to WildEdit specifically. I would still appreciate ideas and input.
Thanks again!
Posted: Mon Dec 13, 2004 11:30 am
by bbadmin
Unicode files work for me. There is only a problem with files containing character encodings that do not match the selected encoding. For Unicode files created on MS Windows, that should be UTF-16LE.
Note that unlike WildEdit, TextPad works internally in a specified code page, so it will not be able to open Unicode files containing characters from more than one code page.
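One way to check which Unicode flavour a file uses before picking an encoding is to look at its byte order mark (BOM). The sketch below is only a rough check: a file without a BOM returns None and would need another method. The returned strings are plain labels chosen for this example, not necessarily the names in WildEdit's drop-down.

```python
import codecs

# UTF-32 is tested before UTF-16 because the UTF-32LE BOM starts with the UTF-16LE BOM.
_BOMS = (
    ("UTF-32LE", codecs.BOM_UTF32_LE),
    ("UTF-32BE", codecs.BOM_UTF32_BE),
    ("UTF-8", codecs.BOM_UTF8),
    ("UTF-16LE", codecs.BOM_UTF16_LE),
    ("UTF-16BE", codecs.BOM_UTF16_BE),
)


def sniff_bom(data: bytes):
    """Return an encoding label if the data starts with a known BOM, else None."""
    for label, bom in _BOMS:
        if data.startswith(bom):
            return label
    return None


print(sniff_bom(b"\xff\xfeA\x00"))  # BOM FF FE followed by "A" in UTF-16LE
```

To use it on a file, read the first few bytes in binary mode and pass them to sniff_bom.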
Keith MacDonald
Helios Software Solutions
Posted: Mon Dec 13, 2004 4:15 pm
by UkSeo
>it will not be able to open Unicode files containing characters from different code pages.
That might be the explanation.
I have tested creating Unicode files with Japanese text and you are right: I can work with them in WildEdit without problems.
So when I have files authored as ANSI files containing Japanese text and save those files as Unicode / UTF-8 files, the Japanese text still does not render in WildEdit?
So I'll have to find a way to convert my files to Unicode. Is there an easy way to go about that?
I am, however, not sure whether this is really safe for all browsers; older IE versions, for example, seem to support Unicode less than perfectly.
Btw, I have found a way around the problem for now: I do the Japanese replacements I need in one file manually in TextPad, then open that file as the trial file in WildEdit. The Japanese text renders like this: X?g?b?N?z??????A????? (a lot of strange characters with accents etc.)
I then copy and paste those replacements from the trial file into the replace field and let it run. When I open the changed files in TextPad, I have perfect Japanese text, ANSI encoded. Certainly not the best method, though.
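One plausible reading of why this workaround holds up: the files contain Shift_JIS bytes, and a tool that reads and writes them through a single-byte Western code page passes those bytes through unchanged, so they look garbled but survive. The sketch below illustrates that idea, assuming Shift_JIS as the real encoding and windows-1252 (cp1252 in Python) as the assumed one. Neither is confirmed by the thread.

```python
def as_seen_in_wrong_code_page(text: str, real: str = "shift_jis",
                               assumed: str = "cp1252") -> str:
    """Show how text stored in one encoding looks when read in another."""
    return text.encode(real).decode(assumed)


def recover(garbled: str, real: str = "shift_jis",
            assumed: str = "cp1252") -> str:
    """Undo the misreading by going back to the bytes and decoding correctly."""
    return garbled.encode(assumed).decode(real)


garbled = as_seen_in_wrong_code_page("ナビゲ")
print(garbled)           # accented-looking characters, not Japanese
print(recover(garbled))  # the original text again
```

The round trip only works while every byte has a mapping in the assumed code page. A few byte values are undefined in cp1252, and text that hits one of them raises an error here instead of passing through.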
Posted: Tue Dec 14, 2004 12:43 pm
by bbadmin
I've uploaded a command line tool that can be used to convert the encoding of files. It can be downloaded from here:
www.textpad.com/download/wildedit/uconv.zip (346KB)
Extract the contents of the zip file into your WildEdit installation folder, which already contains some DLLs it requires. To run it, start a command prompt, then type:
Code:
CD folder-containing-your-files
"C:\Program Files\WildEdit\uconv" -f windows-1252 -t UTF-8 -o newfile.html oldfile.html
The supported from (-f) and to (-t) encodings are any of those on the drop-down combobox in WildEdit. (Note that case is significant.) Just run uconv with parameter "-h" to get a listing of its other options.
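For readers without uconv at hand, the same conversion can be sketched in Python. This is not part of WildEdit or uconv, just an assumed equivalent of the windows-1252 to UTF-8 example above. Python calls the source encoding cp1252, and the function name is hypothetical.

```python
from pathlib import Path


def convert_encoding(src, dst, from_enc: str = "cp1252",
                     to_enc: str = "utf-8") -> None:
    """Re-encode a file, working on raw bytes so line endings are untouched."""
    text = Path(src).read_bytes().decode(from_enc)
    Path(dst).write_bytes(text.encode(to_enc))
```

A call such as convert_encoding("oldfile.html", "newfile.html") would mirror the uconv command. A decoding error means the file is not actually in the assumed source encoding.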
If you have a lot of files to convert, set up all the uconv commands in a batch file, then run that. To get a directory listing of the relevant files in TextPad, choose Run from the Tools menu, and set up the command as follows:
Code:
Command: DIR
Parameters: /b *.html
Initial folder: folder-containing-your-files
DOS Command: checked
Capture output: checked
Copy and paste the relevant lines from the Command Results window to your batch file, then edit it to insert all the uconv commands.
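The batch file described above could also be generated by a short script instead of by hand. Here is a minimal sketch, assuming uconv sits in the installation path used in the earlier example and that converted copies go into a subfolder named "converted". That subfolder is an assumption of this example and must already exist when the batch file runs.

```python
from pathlib import Path

# Path taken from the example command earlier in the thread.
UCONV = r"C:\Program Files\WildEdit\uconv"


def uconv_command(name: str, from_enc: str = "windows-1252",
                  to_enc: str = "UTF-8", out_dir: str = "converted") -> str:
    """Build one uconv command line for a single file."""
    return f'"{UCONV}" -f {from_enc} -t {to_enc} -o "{out_dir}\\{name}" "{name}"'


def batch_lines(folder, pattern: str = "*.html") -> list:
    """Build one command per matching file in the folder, sorted by name."""
    names = sorted(p.name for p in Path(folder).glob(pattern))
    return [uconv_command(n) for n in names]


print(uconv_command("oldfile.html"))
```

Writing "\n".join(batch_lines(folder)) to a .bat file in that folder gives the same result as the copy-and-paste route.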
Keith MacDonald
Helios Software Solutions
Posted: Tue Dec 14, 2004 1:46 pm
by MudGuard
I just downloaded the tool (I hope that was ok) and tried it.
I got a message box saying
The Dynamic Link Library icuuc28.dll was not found in the path [followed by the path variable of my system]
OK, I searched for it and found it in the WildEdit folder. So I moved uconv there.
Now it is missing icuin28.dll
I did not find it on my system here ...
Posted: Tue Dec 14, 2004 3:02 pm
by bbadmin
Andreas,
Thanks for reporting the missing DLL. It was in another folder on my search path, so I did not notice that it was required. I have uploaded uconv.zip again, with icuin28.dll, so you (and anyone else) are welcome to use it.
Keith MacDonald
Helios Software Solutions
Posted: Tue Dec 14, 2004 4:09 pm
by MudGuard
Thanks - now it works!
except that
uconv -s -h
still prints the help message although it clearly says that -s suppresses messages
8)
(too much wine in me, getting silly)
Posted: Tue Dec 14, 2004 4:54 pm
by UkSeo
Fantastic, many thanks! I'm going to try it later today, as I'm knee-deep in Portuguese right now.