Problems with line terminations with all document types.

ineuw
Posts: 191
Joined: Sun Mar 18, 2007 3:23 pm

Post by ineuw »

I set my default document properties to save as PC and UTF-8, and it applies to all document types.

When I open a new document, the document properties show these settings, but when I copy and paste a page of text from a Wikipedia page (edit view), the document properties (Alt-Enter) change to ANSI-1252 and PC. This is obvious when I paste text with characters like "éó".

How can I correct this?
TextPad 8.16.0 64bit in English and TextPad 9.1.0 64bit in French, on two separate Windows installations
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

The subject of your post refers to problems with line terminations, but the body of the post describes problems with the character encoding.

Either way, I can't reproduce this problem with TextPad 8.1.1 on Windows 10.

Are you checking the properties before or after you save the new file?
ineuw
Posts: 191
Joined: Sun Mar 18, 2007 3:23 pm

Post by ineuw »

The properties are checked before and after saving a new file.

I was under the impression that page encoding and line termination are related. After testing, I see that I was wrong. The line termination issue affected my work when I edited the same documents in Linux (on a dual-boot desktop). I resolved this by setting TextPad's line termination to Unix, since this does not affect my work in Windows.

However, I do have a problem with the page encoding. My default encoding is always UTF-8 with Unix line termination, but when I save an accented word like "Elisée", on reopening the same document the word changes to "ElisÃ©e" and the document encoding changes to 1252 (ANSI - Latin 1). I don't know what I am doing wrong.
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

You wrote:
    The properties are checked before and after saving a new file.
and

    when I save an accented word like "Elisée", on reopening the same document ...

There is some ambiguity here.

The Unicode value of the character é is 0x00E9. The UTF-8 encoding of this value is the byte sequence 0xC3, 0xA9. The Windows Latin-1 (code page 1252) decoding of these bytes is the character sequence Ã©, which is what you are seeing.
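
The round trip described above can be reproduced in a few lines of Python. This is only an illustration of the byte arithmetic, not of anything TextPad does internally; "cp1252" is Python's name for the Windows Latin-1 code page.

```python
# The character é, written as an escape so the source file's own
# encoding cannot affect the result.
ch = "\u00e9"

# Encoding as UTF-8 produces the two bytes 0xC3 0xA9.
utf8_bytes = ch.encode("utf-8")
print(utf8_bytes.hex(" "))

# Decoding those same two bytes as Windows-1252 yields two characters:
# 0xC3 is "Ã" (U+00C3) and 0xA9 is "©" (U+00A9).
misread = utf8_bytes.decode("cp1252")
print(misread)

# Decoding them as UTF-8 recovers the original character.
print(utf8_bytes.decode("utf-8"))
```

So the bytes on disk are correct UTF-8; only the interpretation applied on reopening is wrong.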

In the absence of an explicit indication of the encoding of your text the editor must examine it and make a guess. If the text contains only a small proportion of non-ASCII characters the editor might conclude that the text is encoded in Windows Latin-1. That is what is happening here.
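
To make the idea of a proportion-based guess concrete, here is a toy detector in Python. It is purely hypothetical: the function name `guess_encoding`, the 1% threshold, and the logic itself are assumptions made for illustration, and TextPad's real heuristic is not documented in this thread.

```python
def guess_encoding(data: bytes, min_non_ascii_ratio: float = 0.01) -> str:
    """Toy guess between UTF-8 and Windows-1252 (hypothetical logic)."""
    if not data:
        return "utf-8"
    # Count bytes outside the 7-bit ASCII range.
    non_ascii = sum(1 for b in data if b >= 0x80)
    # Too few non-ASCII bytes: this toy detector falls back to the
    # legacy code page, mimicking the behaviour described above.
    if non_ascii / len(data) < min_non_ascii_ratio:
        return "cp1252"
    # Enough evidence: accept UTF-8 only if the bytes are valid UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"


# One accented word in a long ASCII text: guessed as cp1252.
sparse = ("plain ascii text " * 50 + "Elis\u00e9e").encode("utf-8")
print(guess_encoding(sparse))

# Text made up mostly of accented characters: guessed as utf-8.
dense = "\u00e9\u00f3\u00e9\u00f3".encode("utf-8")
print(guess_encoding(dense))
```

Under these assumptions, the same correctly encoded file is classified differently depending only on how many accented characters it contains.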

To solve this you could do one of these things:

    Increase the proportion of non-ASCII characters.
    But this is something you might have no control over.

    Include a byte order mark (BOM: Unicode 0xFEFF; UTF-8 0xEF, 0xBB, 0xBF) at the beginning of your document:
        File | Save As...
            Encoding: UTF-8        [X] UNICODE BOM
    But not all text-handling software is happy with a BOM at the beginning of the text.

    Save your session in a workspace and open the file by opening the workspace.
    This is probably the best solution.
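
For the BOM option, this short Python sketch shows what the extra three bytes look like and why some software objects to them. It uses Python's standard "utf-8-sig" codec, which writes and strips the BOM; it says nothing about how TextPad itself writes the file.

```python
import codecs

text = "Elis\u00e9e"  # "Elisée"

with_bom = text.encode("utf-8-sig")   # UTF-8 with a leading BOM
without_bom = text.encode("utf-8")    # plain UTF-8

# The BOM is exactly the three bytes EF BB BF, followed by the same
# UTF-8 bytes as the plain encoding.
print(with_bom[:3].hex(" "))
print(with_bom[3:] == without_bom)

# A BOM-aware reader strips the mark and returns the original text.
bom_aware = with_bom.decode("utf-8-sig")
print(repr(bom_aware))

# A reader that decodes as plain UTF-8 keeps the mark as an invisible
# U+FEFF character at the start of the text, which can upset tools
# that do not expect it.
bom_unaware = with_bom.decode("utf-8")
print(repr(bom_unaware))
```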

Edit: Corrected typo.
Last edited by ben_josephs on Tue May 21, 2019 5:16 pm, edited 1 time in total.
ineuw
Posts: 191
Joined: Sun Mar 18, 2007 3:23 pm

Post by ineuw »

ben_josephs, can't thank you enough for this explanation. It's very clear and concise.
ineuw
Posts: 191
Joined: Sun Mar 18, 2007 3:23 pm

Post by ineuw »

Addendum: Your explanation is confirmed. With a single non-ASCII character in a document, the encoding changes to 1252 (ANSI - Latin 1).

In another TextPad document, which contained a number of non-ASCII characters, the encoding remained as set in the preferences: UTF-8.