UTF-8 encoding

Ideas for new features

Moderators: AmigoJack, bbadmin, helios, Bob Hansen, MudGuard

jeanflash
Posts: 3
Joined: Mon Oct 25, 2004 3:01 pm

UTF-8 encoding

Post by jeanflash »

When you click Save As and choose the UTF-8 option in the code section, TextPad does encode special characters as UTF-8, but it doesn't prefix the UTF-8 stream with the character U+FEFF (ZERO WIDTH NO-BREAK SPACE), also known as the byte-order mark (BOM).
Some programs, such as Flash (and maybe others), then read the file as a standard ASCII file rather than as a UTF-8 stream.

I read this page for reference before posting this suggestion:
http://www.cl.cam.ac.uk/~mgk25/unicode.html#ucsutf

It says, and I quote:
A good encoding converter will also offer options for adding or removing the BOM:

* Unconditionally prefix the output text with U+FEFF.
* Prefix the output text with U+FEFF unless it is already there.
* Remove the first character if it is U+FEFF.
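For anyone who wants to do this outside the editor, the three options quoted above can be sketched in a few lines of Python. This is only an illustration working on raw bytes; the function names are made up for this example and are not taken from TextPad or from the quoted page.

```python
import codecs

# The UTF-8 encoding of U+FEFF is the three bytes EF BB BF.
BOM = codecs.BOM_UTF8


def add_bom_always(data: bytes) -> bytes:
    """Unconditionally prefix the output with U+FEFF."""
    return BOM + data


def add_bom_if_missing(data: bytes) -> bytes:
    """Prefix the output with U+FEFF unless it is already there."""
    return data if data.startswith(BOM) else BOM + data


def strip_bom(data: bytes) -> bytes:
    """Remove the first character if it is U+FEFF."""
    return data[len(BOM):] if data.startswith(BOM) else data
```

Applying `add_bom_if_missing` twice gives the same result as applying it once, which is why it is usually the safer choice when a file may already have been processed.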


I hope this helps, and congratulations on the editor. It is really a good one, my favorite actually, so I hope this bug can be fixed! ;)
Thanks.
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Did you consider searching the help for "BOM"?

Configure | Preferences | Document Classes | <YourDocumentClass>

[X] Write Unicode and UTF-8 BOM
jeanflash
Posts: 3
Joined: Mon Oct 25, 2004 3:01 pm

Post by jeanflash »

No, I didn't check :? , because I thought it would be an option in the save window. Sorry, and thanks for your advice.
jeanflash
Posts: 3
Joined: Mon Oct 25, 2004 3:01 pm

Post by jeanflash »

But something still doesn't seem logical to me. The document class is based on the file extension, so if I check the BOM box in the class preferences, all my text files will be prefixed with the BOM, and that is really not how it should be, don't you think? Or should I make a new document class for files with a *.utf extension? That is not an ideal solution either: a text file encoded in UTF-8 is still a text file, isn't it?
Kim Steinhaug
Posts: 1
Joined: Mon Jul 18, 2005 9:27 am

Post by Kim Steinhaug »

This is in fact very annoying; I experience it myself. When I save a file as UTF-8 and open it again, it is opened as ANSI or DOS.

Why can't this be detected? It means I have to use the open dialog manually each time I need to edit one of these files. I work with a lot of JavaScript, and many of the files are UTF-8 at the moment, but not all, and I think it's a bad idea to use the document class approach to make all files UTF-8 by default.
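To show what such detection could look like, here is a rough sketch in Python. It is a heuristic only and says nothing about how TextPad actually chooses an encoding: it checks for a BOM first, then tests whether the bytes are valid UTF-8. The function name and the returned labels are invented for this example.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Heuristic guess at a file's encoding; labels are illustrative only."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"          # a BOM is present, so UTF-8 is certain
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "ansi"               # not valid UTF-8: fall back to a legacy code page
    # Pure ASCII is valid in both UTF-8 and the common ANSI code pages,
    # so the two cannot be told apart and it does not matter which is used.
    return "ascii" if data.isascii() else "utf-8"
```

Text in a single-byte code page that contains accented characters is rarely valid UTF-8 by accident, so the check works well in practice, but without a BOM it remains a guess.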
neurotwit
Posts: 2
Joined: Wed May 12, 2004 1:40 am

Visual Studio 2005 Solution Files

Post by neurotwit »

Just a heads up: VS2005 won't recognize solution files saved without the BOM, so people who like to edit their solution files with TextPad are going to run into this issue.
MudGuard
Posts: 1295
Joined: Sun Mar 02, 2003 10:15 pm
Location: Munich, Germany

Post by MudGuard »

jeanflash wrote: because the document class is based on file extension
Wrong.
Whether a file belongs to a document class or not is not determined by the extension.
It belongs to the alphabetically last document class that has a filename pattern matching the filename.

It is no problem to have, e.g., bla.ext in a different document class than blubb.ext: just give the full file name as the pattern. That most patterns have the form *.ext does not mean that bla.*, bla*.ext, bla.ext, ... are not allowed.
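The matching rule described above can be illustrated with a small Python sketch. The class names and patterns are hypothetical, the wildcard matching is borrowed from Python's `fnmatch` module rather than from TextPad, and the case-insensitive comparison is an assumption.

```python
from fnmatch import fnmatchcase


def pick_document_class(filename, classes):
    """Return the alphabetically last class with a pattern matching filename."""
    name = filename.lower()
    matches = [
        cls for cls, patterns in classes.items()
        if any(fnmatchcase(name, p.lower()) for p in patterns)
    ]
    # Assumption: "alphabetically last" is compared case-insensitively.
    return max(matches, key=str.lower) if matches else None


# Hypothetical document classes: one general, one for selected UTF-8 files.
classes = {
    "Text": ["*.txt"],
    "UTF-8 text": ["bla.txt", "notes-*.txt"],
}
```

With these classes, bla.txt matches both but lands in "UTF-8 text" because that name sorts last, while blubb.txt only matches "Text". The BOM option could then be enabled for the second class alone.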