UTF-8 encoding

Posted: Mon Oct 25, 2004 3:25 pm
by jeanflash
When you click Save As and choose the UTF-8 option in the code section, TextPad does encode special characters in UTF-8 format, BUT it doesn't prefix the UTF-8 stream with the character U+FEFF (ZERO WIDTH NO-BREAK SPACE), also known as the Byte-Order Mark (BOM).
Some programs, like Flash (and maybe others...), then won't read the file as a UTF-8 stream but as a standard ASCII file.

I read this page as a reference before posting this suggestion:
http://www.cl.cam.ac.uk/~mgk25/unicode.html#ucsutf

It mentions that, and I quote:
A good encoding converter will also offer options for adding or removing the BOM:

* Unconditionally prefix the output text with U+FEFF.
* Prefix the output text with U+FEFF unless it is already there.
* Remove the first character if it is U+FEFF.
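For illustration, the three options quoted above could be sketched in Python as below. This is only a minimal sketch working on raw bytes; the function names are made up for the example and have nothing to do with how TextPad implements its BOM setting.

```python
import codecs

# The UTF-8 encoding of U+FEFF: the three bytes EF BB BF.
BOM = codecs.BOM_UTF8


def add_bom_always(data: bytes) -> bytes:
    """Unconditionally prefix the output with the BOM."""
    return BOM + data


def add_bom_if_missing(data: bytes) -> bytes:
    """Prefix the output with the BOM unless it is already there."""
    return data if data.startswith(BOM) else BOM + data


def strip_bom(data: bytes) -> bytes:
    """Remove the leading BOM if there is one."""
    return data[len(BOM):] if data.startswith(BOM) else data
```

Note that the first option can produce a doubled BOM when the input already starts with one, which is why the second option exists.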


I hope this helps, and congratulations on the editor, it is really a good one, my favorite actually... so I hope this bug can be fixed! ;)
Thanks.

Posted: Mon Oct 25, 2004 3:33 pm
by ben_josephs
Did you consider searching the help for "BOM"?

Configure | Preferences | Document Classes | <YourDocumentClass>

[X] Write Unicode and UTF-8 BOM

Posted: Mon Oct 25, 2004 6:38 pm
by jeanflash
No, I didn't check :? , because I thought it would be an option to tick in the Save window. Sorry, and thanks for your advice.

Posted: Mon Oct 25, 2004 6:53 pm
by jeanflash
But something still doesn't seem logical to me: the document class is based on the file extension, so if I check the UTF BOM box in the class preferences, all my text files will be prefixed with the BOM, and that's really not how it should be, don't you think? Or I could make a new document class that matches a *.utf file extension, but that's not an ideal solution; a text file encoded in UTF-8 is still a text file, isn't it?

Posted: Mon Jul 18, 2005 9:32 am
by Kim Steinhaug
This is in fact very annoying; I experience it myself. When I save a file as UTF-8 and then open it again, it is opened as ANSI or DOS.

Why can't this be detected? It means I have to manually use the Open dialog each time I need to edit one of these files. I work with a lot of JavaScript, and many of the files are UTF-8 at the moment, but not all, so I think it's a bad idea to use the document class approach and make all files UTF-8 by default.
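As a rough idea of what detection without a BOM could look like, here is a minimal Python sketch. It is a heuristic only and an assumption for the sake of the example, not a description of what TextPad does; `guess_encoding` and the labels it returns are made up here.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Heuristically label the encoding of a byte string."""
    # A leading BOM is the only explicit signal.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        # Strict decoding fails on byte sequences that are not valid UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume some single-byte code page (ANSI or DOS).
        return "ansi"
    # Pure 7-bit data is identical in ASCII, ANSI and UTF-8.
    if all(byte < 0x80 for byte in data):
        return "ascii"
    return "utf-8"
```

The weak spot is the last case: text in a single-byte code page can occasionally happen to be valid UTF-8 as well, so without a BOM the result is a guess, not a certainty.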

Visual Studio 2005 Solution Files

Posted: Tue Jan 17, 2006 5:41 pm
by neurotwit
Just a heads-up: VS2005 won't recognize solution files saved without the BOM, so people who like to edit their solution files with TextPad are going to run into this issue.

Posted: Tue Jan 17, 2006 8:15 pm
by MudGuard
jeanflash wrote: because the document class is based on file extension
Wrong.
Whether a file belongs to a document class or not is not determined by the extension.
It belongs to the alphabetically last document class that has a filename pattern matching the filename.

It is no problem to have e.g. bla.ext in a different document class than blubb.ext - just give the full file name as the pattern. Just because most patterns have the form *.ext does not mean that bla.*, bla*.ext, bla.ext ... are not allowed.
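The rule described above (the alphabetically last class with a matching pattern wins) can be modelled with a short Python sketch. The class names, the patterns and the `pick_class` helper are hypothetical, and the case-insensitive matching and sorting are assumptions; TextPad's exact comparison rules may differ.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical document classes and their filename patterns.
CLASSES = {
    "Text": ["*.txt"],
    "Utf8Text": ["bla.txt", "*.utf"],
}


def pick_class(filename: str, classes: dict) -> Optional[str]:
    """Return the alphabetically last class with a pattern matching filename."""
    name = filename.lower()
    matching = [
        cls
        for cls, patterns in classes.items()
        if any(fnmatchcase(name, pattern.lower()) for pattern in patterns)
    ]
    # max() with a lowercase key picks the alphabetically last class name.
    return max(matching, key=str.lower) if matching else None
```

With these example classes, `pick_class("bla.txt", CLASSES)` gives `"Utf8Text"` while `pick_class("blubb.txt", CLASSES)` gives `"Text"`, so a single file can get the BOM setting without affecting every other *.txt file.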