Puzzling behaviour - 'BOM' characters?

terrypin
Posts: 172
Joined: Wed Jul 11, 2007 7:50 am

Post by terrypin »

I'm using the 32 bit 4.7.3 version of TextPad.

I'm baffled why certain spurious characters are appearing in my files. I don't fully understand this topic, but I believe they are 'BOM' (byte order mark) characters. Having had obscure problems in this area before, I have the Write Unicode and UTF-8 BOM setting switched OFF:

Image

Here is the first section of a text file opened in TextPad (created by a freeware tool called Directory Lister, http://download.cnet...4-10397036.html), also showing how it looks in a hex editor:

Image

I then deleted the first two lines and resaved. As you see, the edited file now has those spurious characters at the beginning, spoiling the first line for further processing:

Image

Can anyone suggest what's causing this please?

--
Terry, East Grinstead, UK
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Are *.txt files in your Text class?

What is the Write Unicode and UTF-8 BOM setting in that class?
terrypin
Posts: 172
Joined: Wed Jul 11, 2007 7:50 am

Post by terrypin »

Yes, and that setting was disabled there too. In fact it seems it was disabled for all document classes.

But meanwhile I tried enabling the BOM setting and the result rather adds to the puzzle:

Image

I now have to try recalling what problem enabling it caused a year or two ago, prompting its disabling! I'd have hoped all this obscure BOM and Unicode/UTF stuff would work without my involvement, because it's largely a black art to me.


--
Terry, East Grinstead, UK
terrypin
Posts: 172
Joined: Wed Jul 11, 2007 7:50 am

Post by terrypin »

Bizarre!

I have just stepped through these tests again ... and TextPad is now working as expected! IOW, with the BOM option checkmarked I get the odd characters, with it unmarked I don't.

It's as if the setting had somehow got itself reversed.

For those that appreciate the details:

1. I saved the list from Directory Lister (DL).

2. I examined the hex; no spurious characters, ruling out the possibility that DL was the problem.

3. I opened it in Notepad and saved it with a new name. The hex of that was fine.

4. I opened the original in TextPad. Write Unicode and UTF-8 BOM was disabled. I saved it with a new name. The hex of that was fine.

5. I opened the original in TextPad. I enabled Write Unicode and UTF-8 BOM and I saved with a new name. The hex of that showed the spurious characters.

So, until it happens again (and spotting it will be the challenge), my dilemma is resolved: I'll leave it disabled.
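
For anyone who wants to repeat the hex check from the steps above without a hex editor, here is a minimal Python sketch. The names `detect_bom` and `detect_bom_in_file` are hypothetical helpers, not part of TextPad or Directory Lister; the sketch only compares the first bytes of a file against the standard BOM signatures.

```python
import codecs

# Check longer signatures first: the UTF-32LE BOM begins with the
# same two bytes as the UTF-16LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def detect_bom(data: bytes):
    """Return the encoding name of a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def detect_bom_in_file(path):
    """Read the first four bytes of a file and report any BOM found."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

Running `detect_bom_in_file` on each saved copy should show which save added a BOM, assuming the spurious characters really are one.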

--
Terry, East Grinstead, UK
sosimple
Posts: 30
Joined: Sat May 16, 2009 6:54 am

Post by sosimple »

I had a similar problem previously.

I would export a section of the registry, edit the file in TextPad and save it. When I then tried to import the edited file back into the registry using the registry editor, the import failed with a complaint that the file was not in the proper format.

It turned out TextPad was writing the file in Unicode (probably UTF-8) and was including the BOM characters. I don't recall what the "Write Unicode and UTF-8 BOM" setting was at that time.

The solution for me was to use "File-save-as" and change the "Encoding" to "ANSI" (DOS would probably have been OK too) in the "File-save-as" dialog box. I believe it usually defaulted to "UTF-8", but I couldn't say that was always the case.

For me, this is a better solution than disabling the BOM setting, because "Unicode" files that already contain BOM characters can still be read, edited and written correctly.
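
Outside TextPad, the "save as ANSI" step can be sketched in Python roughly as follows. This is only an illustration: the function name `to_ansi_bytes` is hypothetical, and it assumes that "ANSI" means Windows code page 1252, which depends on the system locale.

```python
def to_ansi_bytes(data: bytes, ansi_codec: str = "cp1252") -> bytes:
    """Decode UTF-8 bytes and re-encode them in an 'ANSI' code page.

    The 'utf-8-sig' codec drops a leading UTF-8 BOM if one is present
    and otherwise behaves like plain UTF-8.
    Assumption: cp1252 is the target code page.
    """
    text = data.decode("utf-8-sig")
    # Raises UnicodeEncodeError if a character has no cp1252 equivalent.
    return text.encode(ansi_codec)
```

Note that encoding fails with an error, rather than silently losing data, when the text contains characters the code page cannot represent.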
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Windows registry files are Unicode, encoded in UTF-16LE (what TextPad calls simply Unicode) with BOM.
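
As a rough illustration of what "UTF-16LE with BOM" means at the byte level, here is a minimal Python sketch. The helper name `encode_utf16le_with_bom` is hypothetical; it simply prepends the two BOM bytes FF FE to the little-endian encoding of the text.

```python
import codecs


def encode_utf16le_with_bom(text: str) -> bytes:
    """Encode text as UTF-16LE and prepend the BOM bytes FF FE."""
    # The 'utf-16-le' codec never writes a BOM itself, so add it explicitly.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
```

Decoding such bytes with Python's generic `utf-16` codec reads the BOM to pick the byte order and strips it from the result.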