Diacritics disappear

Mike Olds
Posts: 226
Joined: Wed Sep 30, 2009 3:27 pm

Diacritics disappear

Post by Mike Olds »

Hello,

Still another problem!

Windows 7, 64 bit, TP 8.5.

Working on a .txt file (but this is also happening on the same material in an .htm file). Very long: 52763 lines.

A few characters with diacriticals and the m-dash, ṃ ṭ ā —

and not consistently; that is, only a very few cases
appear with the �.
When corrected, saved, closed, and reopened, they are still �


Here is one example:

meaning "two," e.g. yuga�� vā nāvaṃ two boats

What belongs there is the ṃ (a single character, not two letters). All these missing characters are being indicated by two of these �, sometimes three, sometimes four.

I retyped the whole word, and also retyped the whole line and it is still happening.

I have tried different methods for inputting the characters: via the TP tool, and through copy and paste from a set of characters. Same thing. Additionally, this file has thousands of these characters and they have no problem. Originally the file had 18 cases, this morning only one. But with that one corrected and the file closed and reopened, a new case appeared in a different word.

Yesterday I managed to upload a file without errors to the web and checked it there. No errors overnight, but the source file on my PC has the above.

I am wondering if there is some problem with saving a very large file. But I think this is not that large compared to others.

EDIT: Just now I tried downloading the on-line file with no errors back to my desktop.
The error that is in the original did not appear, but another one did!
Upāsana<sup>1</sup> (neuter) [from up��sati]
here the missing character is the ā

This was the first error I found this morning in the desktop source file. When corrected there it stuck, but the 'yugam' error appeared.

In other words the problem seems to be happening in specific words.

EDIT 2: I tried something else: I made a copy of the file and reduced its size deleting lines above and below the problem, corrected the problem, saved the file, closed the file, opened the file and the problem was solved.

The initial conclusion is that the problem relates to file size. ... but why would it occur only in specific words? And, of course, what can I do about it?

Any help for this maddening problem will be much appreciated.
Thank you, and
Best Wishes,
Obo
http://buddhadust.net/
check out the What's New? Oblog:
http://buddhadust.net/dhammatalk/dhamma ... ts.new.htm
AmigoJack
Posts: 533
Joined: Sun Oct 30, 2016 4:28 pm
Location: グリーン ヒル ゾーン

Post by AmigoJack »

MOST LIKELY both you and TextPad just assume the encoding of that text/html file. But neither of you made sure, right?

To be on the safe side:
  1. start TextPad
  2. select File > Open or press CTRL+O to load an existing file
  3. in the Open File dialog make sure in the combobox "Encoding" an appropriate entry is chosen
  4. when the file is loaded, look again if there are problems
Otherwise (i.e. just double-clicking your text/html files in the Explorer, or right-clicking on them and choosing "open in TextPad") the encoding is guessed, which is not bulletproof - it can go wrong from time to time.

When neither you nor the file itself DEFINES which encoding is used, this may happen. But when you open files and EXPLICITLY SAY in which encoding they should be read, everything should work as expected.
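To illustrate the difference between guessing and stating the encoding, here is a minimal Python sketch. It has nothing to do with TextPad's internals; the helper name `read_text_strict` and the sample word are made up for the example. The same `encoding` and `errors` parameters exist on Python's built-in `open()`.

```python
def read_text_strict(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes with an explicitly chosen encoding and fail loudly on bad input."""
    # errors="strict" raises UnicodeDecodeError instead of silently inserting U+FFFD.
    return raw.decode(encoding, errors="strict")


# Sample bytes as they would be stored in a UTF-8 file.
sample = "nāvaṃ".encode("utf-8")

# Stating the right encoding gives back the original text.
print(read_text_strict(sample, "utf-8"))

# A wrong guess (here Windows-1252) produces mojibake and replacement characters.
print(sample.decode("cp1252", errors="replace"))
```

If the strict call raises an error, the bytes on disk are not valid in the encoding you named, which is worth knowing before blaming the editor.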

Post by Mike Olds »

Hello Amigo,

Apologies for the late response. I was not getting notifications about responses and changed my email and found myself inactivated and was only activated just now.

I tried your solution but the problem still exists.

Post by AmigoJack »

Quote:
I was not getting notifications about responses
This has happened multiple times - I even suggested to just visit this board regularly instead of relying on notifications and I fear you still have no system for keeping track of what you posted where. It could be easy, tho.

Quote:
changed my email and found myself inactivated
This is how the board works: you have to confirm that change, and the instruction should have told you so. It will also make sure you haven't mistyped the address.

Quote:
I tried your solution but the problem still exists.
I don't believe that - if you tried that with a file that is already not stored correctly then (of course) TextPad can't magically fix it. Again: wherever you got your file from, save it anew to be sure it has the correct encoding (UTF-8 hopefully). It can be as easy as creating a new file with TextPad, copying the text from anywhere, pasting it into the new file, then saving it in UTF-8 encoding (not "default", select it explicitly).

Then try again what I suggested. It would be far easier if we could tell you "then open the file as binary, make a screenshot of the relevant position in there and attach it here" but I fear we will never get you there...
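For anyone who would rather script the binary check than take a screenshot, here is a rough Python sketch. It looks for the UTF-8 bytes of the replacement character (EF BF BD) and prints a little hex context around each hit. The function name `find_stored_replacements` and the sample strings are invented for the example.

```python
REPLACEMENT_BYTES = "\ufffd".encode("utf-8")  # b'\xef\xbf\xbd'


def find_stored_replacements(data: bytes, context: int = 8):
    """Return (offset, hex context) for every U+FFFD that is really stored in the bytes."""
    hits = []
    pos = data.find(REPLACEMENT_BYTES)
    while pos != -1:
        start = max(0, pos - context)
        end = pos + len(REPLACEMENT_BYTES) + context
        hits.append((pos, data[start:end].hex(" ")))
        pos = data.find(REPLACEMENT_BYTES, pos + 1)
    return hits


# In practice: data = open(path, "rb").read()
clean = "yugaṃ vā nāvaṃ".encode("utf-8")
damaged = "yuga\ufffd\ufffd vā nāvaṃ".encode("utf-8")

print(find_stored_replacements(clean))
for offset, hexdump in find_stored_replacements(damaged):
    print(f"0x{offset:X}: {hexdump}")
```

An empty result means the file on disk is fine and the � only appears when the file is displayed; hits mean the damage has already been saved into the file.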
Last edited by AmigoJack on Mon Jan 18, 2021 4:31 pm, edited 1 time in total.
bbadmin
Site Admin
Posts: 879
Joined: Mon Feb 17, 2003 8:54 pm

Post by bbadmin »

With regard to this problem, can you please send a sample file that demonstrates it to "support@textpad.com"?

Thanks.

Post by Mike Olds »

Hello,

I just sent you the file (5.5 MB!) and a screenshot of one example.

Post by Mike Olds »

Hello Amigo,

Quote:
I was not getting notifications about responses
This has happened multiple times - I even suggested to just visit this board regularly instead of relying on notifications and I fear you still have no system for keeping track of what you posted where. It could be easy, tho.

I was not getting responses apparently because ATT was putting them into the spam or trash, which is why I decided to change the e-mail address. I am getting responses just fine now. It is no difficult thing for me to keep track of what I have posted where, as this is virtually the only forum where I am a member and post. Further, the problem was that I was sometimes getting responses, so the expectation was that I would get a response, which was the reason I did not check the previous times. Further than that, there are a number of posts of mine which have got no responses at all, so that no response is also expected.

Quote:
changed my email and found myself inactivated
This is how the board works: you have to confirm that change, and the instruction should have told you so. It will also make sure you haven't mistyped the address.

Well it was a surprise to me. But I was not complaining, I was explaining why it took me a week to get back to you.

Quote:
I tried your solution but the problem still exists.
I don't believe that - if you tried that with a file that is already not stored correctly then (of course) TextPad can't magically fix it. Again: wherever you got your file from, save it anew to be sure it has the correct encoding (UTF-8 hopefully). It can be as easy as creating a new file with TextPad, copying the text from anywhere, pasting it into the new file, then saving it in UTF-8 encoding (not "default", select it explicitly).

Well you can believe what you want. What I see is that the problem is still there. What I know is that I tried your method and did all the variations suggested and a bunch more and the problem is still there.

Post by Mike Olds »

I have tried two more things:

I retrieved the source file from the on-line version of the .htm file. With all instances of TP closed, I opened a new instance, pasted in the source code, saved the file as UNIX UTF-8 with BOM (the file has the proper headings and has no HTML errors per HTML Validator), and searched for the �. No instances.
Then I closed TP, reopened it, loaded the file, searched for the � and found one instance. I corrected the instance, searched for the � again and found no instance. Closed/opened, and another instance of the � shows up in a different place.

The second thing I tried was taking this very same file and opening it in Notepad++. Searching for the �, no instance was found. Closed/opened, still no instance.

Post by AmigoJack »

Quote:
Then I closed TP and reopened it and loaded the file
And you are sure that you first started TextPad (no need to close it before) without loading the file at the same time? That you really pressed CTRL+O or chose File > Open to select an existing file, choosing "UTF-8" as encoding (and not relying on the default entry "default")?

Because it sounds otherwise.

Post by Mike Olds »

Hello Amigo,

Yes I am sure, and because I am never sure of anything any more I just did it twice again just now.

Once on the .htm file, which has the proper headings and passed HTML Validator and has the BOM as well

and

Once with the .txt file which opened with the error.

Post by AmigoJack »

Quote:
Once on the .htm file, which has the proper headings and passed HTML Validator
This is irrelevant: TextPad does not care if the file contains HTML or anything else. As a result, if you write "UTF-8" into an HTML tag's attribute it has nothing to do with how TextPad treats it.

Quote:
has the BOM as well
Even that is in rare cases not enough. Stick to the "encoding" combobox in the "open file" dialog and select "UTF-8":

Image
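As a side note on the BOM itself: whether a file really starts with the UTF-8 byte order mark (EF BB BF) can be checked outside any editor. The following is only a small Python sketch; `has_utf8_bom` is a made-up helper name, and the check says nothing about how TextPad decides on an encoding.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes begin with the UTF-8 byte order mark EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


# In practice: data = open(path, "rb").read()
with_bom = codecs.BOM_UTF8 + "nāvaṃ".encode("utf-8")
without_bom = "nāvaṃ".encode("utf-8")

print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))

# The "utf-8-sig" codec strips a leading BOM while decoding.
print(with_bom.decode("utf-8-sig"))
```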

Solved

Post by Mike Olds »

This problem was taken up by TextPad support and has been solved by them.

Post by AmigoJack »

Quote:
With regard to this problem can you please send a sample file

Quote:
This problem was taken up by TextPad support and has been solved by them
Can I please have more details on this? I'd like to know if Mike has indeed found a bug, or if it was an unlucky combination of issues and TextPad acted as expected. Or how to reproduce it easily.

Post by Mike Olds »

Hello Amigo,

I am assuming the problem was solved with the updated 8.5.1:

When opening files, 3-byte UTF-8 characters that straddled multiples of 4KB may have been replaced with "?".
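That release note describes a well-known class of bug, and it can be imitated in a few lines of Python. This is only a sketch of the general mechanism, not TextPad's actual code: it assumes the file is read in 4096-byte blocks and that each block is decoded on its own. The function names are invented for the example.

```python
import codecs

CHUNK = 4096  # assumed block size, taken from the "multiples of 4KB" wording


def decode_chunks_naively(data: bytes, chunk: int = CHUNK) -> str:
    """Decode every block independently; a character split across two blocks is lost."""
    return "".join(
        data[i:i + chunk].decode("utf-8", errors="replace")
        for i in range(0, len(data), chunk)
    )


def decode_chunks_incrementally(data: bytes, chunk: int = CHUNK) -> str:
    """Decode block by block, carrying an unfinished character over to the next block."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    parts = [decoder.decode(data[i:i + chunk]) for i in range(0, len(data), chunk)]
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# 4094 ASCII bytes, then the 3-byte ṃ (E1 B9 83): two bytes fall before
# the 4096-byte boundary and one byte after it.
text = "a" * 4094 + "ṃ" + " two boats"
data = text.encode("utf-8")

naive = decode_chunks_naively(data)
print("ṃ" in naive, "\ufffd" in naive)
print(decode_chunks_incrementally(data) == text)
```

In the naive version the ṃ is gone and one or more � stand in its place, and how many appear depends on where the split falls inside the character. That would also fit the observation that fixing one spot and saving makes the problem move: every edit shifts the following bytes, so a different character ends up on a boundary.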

You can view/download the files from this link:

http://buddhadust.net/dhammatalk/dhamma ... O.01.12.21

Or to open the files directly:

http://buddhadust.net/backmatter/glosso ... ed/ped.htm

http://buddhadust.net/backmatter/glosso ... d.utf8.txt

The problem does not show up when the file is simply downloaded; you need to save the file, close TP, and open it again and then search for �

and you need to still be on TP 8.5, not upgraded.

And, who knows? This problem may have been unique to Windows 7 64-bit.

Post by AmigoJack »

If I interpret all that correctly:
  • downloaded your TXT file
  • chose "save as"
  • closed TextPad, then opened that new file
  • "straddled multiples of 4KB" refers to byte positions in the file, and 4 KiB is meant (4096, not 4000)
  • one such position is the character Ä�, followed by riyapakkha (cf <i>sub voce</i> in the logical line 1592 at byte positions 0x23FFE thru 0x24000
...then I'm unable to reproduce any problem with TextPad 8.4.2. I read the changelog before asking for details because it was so vague, and right now I'm still not entirely sure. But it seems like ONLY TextPad 8.5 is affected.
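For finding such positions without a hex editor, something like the following Python sketch could be used. It assumes the file is valid UTF-8 and that the relevant block size is 4096 bytes; `straddling_characters` is a hypothetical helper name, not part of any tool mentioned here.

```python
def straddling_characters(data: bytes, block: int = 4096):
    """Return (start offset, character) for UTF-8 characters that cross a block boundary."""
    results = []
    for boundary in range(block, len(data), block):
        # A continuation byte (10xxxxxx) right at the boundary means the
        # character it belongs to started in the previous block.
        if data[boundary] & 0xC0 != 0x80:
            continue
        start = boundary - 1
        while start > 0 and data[start] & 0xC0 == 0x80:
            start -= 1
        end = boundary
        while end < len(data) and data[end] & 0xC0 == 0x80:
            end += 1
        results.append((start, data[start:end].decode("utf-8")))
    return results


# In practice: data = open(path, "rb").read()
sample = ("a" * 4094 + "ṃ" + "b" * 4094 + "ā" + "c").encode("utf-8")
for offset, char in straddling_characters(sample):
    print(f"0x{offset:X}: {char}")
```

In the sample, the 3-byte ṃ starts two bytes before the first boundary, the same pattern as the 0x23FFE thru 0x24000 position above. The 2-byte ā is listed as well because it also crosses a boundary, although the release note only mentions 3-byte characters.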