Converting Website construction to UNICODE

General questions about using TextPad

Mike Olds
Posts: 226
Joined: Wed Sep 30, 2009 3:27 pm

Post by Mike Olds »

Greetings,

I am having a devil of a time figuring out how to convert my site to Unicode.

By Unicode I mean: being able to see diacriticals rendered in the font used in the TextPad window (rather than &#000; numeric references); being able to input diacriticals by typing something like ALT000, or an IAST Unicode reference such as &#000;, and having them appear correctly in the TextPad window; being able to view the diacriticals in my web browser using 'View in Web Browser' (via an internal Apache web server); and finally, seeing the correct diacriticals in the browser window on the installed web page.

I am using TP 8.1.2 (64-bit) on a Windows 7 Ultimate PC.

Currently I use a custom font (specifying charset="windows-1252") and call out the special characters with "<span class="mozp"></span>". The files are then converted to IAST Unicode prior to uploading to the site:
http://obo.genaud.net

The advantage of having the diacriticals appear in the source htm files is obvious (I need to read the copy as I go along!), but so far I have not figured out how to manage it.

I have taken sample files and tried setting the charset to "UTF-8" and leaving it as "windows-1252", and I have tried saving both as UTF-8 and with the default encoding, ANSI.
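Part of the confusion here is that the charset declared in the page and the encoding the file is actually saved in are two separate things. The following minimal Python sketch, assuming "ANSI" means Windows code page 1252 (the usual ANSI code page on Western Windows systems), checks whether text can be stored in that code page at all and shows the UTF-8 bytes it becomes instead. Characters such as ā, ɱ and ṃ have no place in windows-1252, so a file saved as ANSI cannot hold them as literal characters. The function names are illustrative only.

```python
# Sketch only: "ANSI" is assumed to mean Windows code page 1252,
# the usual ANSI code page on English/Western European Windows installs.

def fits_in_codepage(text: str, codepage: str = "cp1252") -> bool:
    """Return True if every character of `text` can be stored in `codepage`."""
    try:
        text.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated hex pairs."""
    return text.encode("utf-8").hex(" ")


if __name__ == "__main__":
    for sample in ["café", "Sāvatthiyaɱ", "ṃ"]:
        print(f"{sample!r}: fits in cp1252={fits_in_codepage(sample)}, "
              f"UTF-8 bytes={utf8_hex(sample)}")
```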

1. If I set charset="utf-8", save as UTF-8, and copy a selection of text with diacriticals from the browser window showing my site:
a. I see the correct diacriticals in the TextPad window.
b. I am unable to input diacriticals in any way I have tried.
c. With View in Web Browser I get gibberish: Ekaɱ samayaɱ Bhagav� S�vatthiyaɱ
d. After uploading to the online site: the same gibberish.

2. Same situation using charset="windows-1252".

3. If I set charset="utf-8", save as UTF-8, and copy a selection of text with diacriticals from the page source of my site:
a. I see only the numeric character IDs in the TextPad window: Eka&#625; samaya&#625; Bhagav&#257; S&#257;vatthiya&#625; viharati<br />
instead of:
Ekaɱ samayaɱ Bhagavā Sāvatthiyaɱ viharati

b. I am able to input diacriticals only by copying and pasting the entire numeric ID.
c. With View in Web Browser, the diacriticals are displayed properly.
d. After uploading to the online site, the diacriticals are likewise displayed properly.
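The gibberish in 1c, and the fact that case 3 works, fit a common pattern: UTF-8 bytes being decoded as windows-1252 somewhere along the way (for example because the server or browser assumes that encoding), while numeric references are plain ASCII and survive any such misreading. A small Python sketch that reproduces the effect follows. Python's codec and browsers treat a few undefined bytes differently, so the exact symbols may not match the browser output quoted above.

```python
def misread_utf8_as_cp1252(text: str) -> str:
    """Simulate a UTF-8 file being decoded as Windows-1252.

    Python's cp1252 codec has no mapping for bytes 0x81, 0x8D, 0x8F, 0x90
    and 0x9D; errors="replace" turns those into U+FFFD (the question mark
    in a diamond). Browsers handle those bytes slightly differently.
    """
    return text.encode("utf-8").decode("cp1252", errors="replace")


if __name__ == "__main__":
    # ɱ (UTF-8 C9 B1) comes out as two Latin-1 looking characters.
    print(misread_utf8_as_cp1252("Ekaɱ samayaɱ Bhagavā"))
```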

I have a macro keypad to input diacriticals. If I were able to see the diacriticals correctly in Textpad, and have them display properly in a web browser, I would not be concerned about how I went about inputting the characters.

I apologize for the way I present this problem. I am completely confused by this issue. Please feel free to ask for more information.

Any help you can provide with this issue will be appreciated.
Thank you, and
Best Wishes,
Obo
http://buddhadust.net/
check out the What's New? Oblog:
http://buddhadust.net/dhammatalk/dhamma ... ts.new.htm
AmigoJack
Posts: 515
Joined: Sun Oct 30, 2016 4:28 pm
Location: グリーン ヒル ゾーン

Post by AmigoJack »

Let's separate all the problems you're throwing into one bowl:
  1. TextPad 8 is fully capable of Unicode: it can read, save, and display it. How text is displayed depends on which fonts are used, not on TextPad specifically: if you use e.g. "Courier New", don't be surprised if some characters are not rendered, while a font with broader coverage such as "Tahoma" will display more of them. Read Emoji support.
  2. "Entering" text that you don't find on your keyboard is not a problem: either change your keyboard layout or copy it from the internet. This is not specific to TextPad. Read How to add special characters.
  3. I have no clue why you have problems copying from a/your website or vice versa. Instead of linking only to your homepage, you could have linked to a specific page that you left online. This is not a TextPad problem either. Let's first try a minimal, reproducible example:
    1. Create a new file with this content:

      <!DOCTYPE html>
      <html><head>
       <meta charset="UTF-8">
       <title>グリーン</title>
      </head><body>
       Paṇṇāsaɱ, 😗😘😙😚😛, ᄍᄎᄏᄐᄑ
      </body></html>
    2. Save it with the encoding "UTF-8".
    3. Upload it to your server.
    4. Browse to it and publish its URI here.
    It should display fine or at least give me/us the chance to analyze something.
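For anyone who prefers to generate such a test file from a script, so that there is no doubt about the bytes on disk, a minimal Python sketch follows. It uses a trimmed version of the page above; the function name is hypothetical.

```python
# A trimmed version of the test page (UTF-8 declared in <meta>).
TEST_PAGE = """<!DOCTYPE html>
<html><head>
 <meta charset="UTF-8">
 <title>グリーン</title>
</head><body>
 Paṇṇāsaɱ
</body></html>
"""


def build_test_page() -> bytes:
    """Encode the page as UTF-8 without a BOM, matching the meta declaration."""
    return TEST_PAGE.encode("utf-8")


if __name__ == "__main__":
    data = build_test_page()
    print(len(data), "bytes; first bytes:", data[:15])
```

To write it out, something like `Path("utf8-test.html").write_bytes(build_test_page())` (with `from pathlib import Path`; the file name is a hypothetical example). Writing raw bytes avoids any newline or encoding translation along the way.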

Post by Mike Olds »

Thanks Jack,

I appreciate your reply, and it looks like you have solved a couple of things. Right now I have no time to deal with this, so I will get back to it later. I just want to let you know I am paying attention to your help.

best,
mo

Post by Mike Olds »

Hello Jack,

The first problem is that I do not have access to my own website, so I cannot post what you requested in the way you requested. Here, I hope, are the images, posted on Google Photos:

The first image shows the source .html document, saved as UTF-8 with BOM; the second shows it as served by an internal web server.

https://photos.google.com/photo/AF1QipP ... hk9BHASLUC

https://photos.google.com/photo/AF1QipP ... qSNYcZwm2S

As you can see, the .html file shows the characters you provided correctly; served through the server, it shows gibberish.

I was using Courier New, and if I can actually make this transition to Unicode I will change that to Tahoma per your suggestion.


EDIT:
Here are two more screenshots with a different setup, showing the second half of the problem: characters entered by numeric ID display correctly in the browser, but do not show as glyphs in TextPad:

https://photos.google.com/photo/AF1QipO ... RgQMiK5hfI

https://photos.google.com/photo/AF1QipO ... j_R4In4J86

The first is a source-code page, using a snippet from another site that uses Unicode, showing the numeric character IDs. The second shows the result through my server. So when entered by numeric ID the characters show up properly on the web page, but I cannot read the source page.

How do I go about entering the code for ṃ so that it appears as an m with underdot in my TextPad window?

SECOND EDIT:

Re input: is there a monospaced Unicode font available for use in TextPad? And why is it that when I copy the web output from a page whose source code had only numeric character IDs (as in the sample), it appears with the correct diacriticals when pasted into a new TextPad document? What font is being used there that I can also use?

THIRD EDIT:
Text copied from the served web page and pasted into a TextPad document saved as UTF-8 comes up just fine in TextPad, but ... see next:
1. 1. 1
Evaṃ me sutaṃ ekaṃ samayaṃ bhagavā sāvatthiyaṃ viharati jetavane anāthapiṇḍikassa ārāme tatra kho bhagavā bhikkhū āmantesi bhikkhavoti. Bhadanteti kho te bhikkhū bhagavato paccassosuṃ. Bhagavā etadavoca.

Nāhaṃ bhikkhave aññaṃ ekarūpampi samanupassāmi, yaṃ evaṃ purisassa cittaṃ pariyādāya tiṭṭhati. Yathayidaṃ bhikkhave itthirūpaṃ. Itthirūpaṃ bhikkhave purisassa cittaṃ pariyādāya tiṭṭhatīti.

1. 1. 2
Nāhaṃ bhikkhave aññaṃ ekasaddampi samanupassāmi, yaṃ evaṃ purisassa cittaṃ pariyādāya tiṭṭhati yathayidaṃ bhikkhave itthisaddo. Itthisaddo bhikkhave purisassa cittaṃ pariyādāya tiṭṭhatīti.

Using View in Web Browser from TextPad, the above comes up like this:

1. 1. 1 Evaṃ me sutaṃ ekaṃ samayaṃ bhagav� s�vatthiyaṃ viharati jetavane an�thapiṇ�ikassa �r�me tatra kho bhagav� bhikkhū �mantesi bhikkhavoti. Bhadanteti kho te bhikkhū bhagavato paccassosuṃ. Bhagav� etadavoca. N�haṃ bhikkhave aññaṃ ekarūpampi samanupass�mi, yaṃ evaṃ purisassa cittaṃ pariy�d�ya tiṭṭhati. Yathayidaṃ bhikkhave itthirūpaṃ. Itthirūpaṃ bhikkhave purisassa cittaṃ pariy�d�ya tiṭṭhatīti. 1. 1. 2 N�haṃ bhikkhave aññaṃ ekasaddampi samanupass�mi, yaṃ evaṃ purisassa cittaṃ pariy�d�ya tiṭṭhati yathayidaṃ bhikkhave itthisaddo. Itthisaddo bhikkhave purisassa cittaṃ pariy�d�ya tiṭṭhatīti.

Post by AmigoJack »

Mike Olds wrote: I cannot post the images you request as you requested
I don't want pictures, I want plain documents (aka the "website"). Also, I'd have to log in to Google to view your pictures, which I'm unable to do. Plus, pictures can't be analyzed, only viewed. If you have no access, then reply when you do.
Mike Olds wrote: enter number Id
Stop that. Either you use HTML entities all along and don't have to care about text encodings (such as UTF-8), or you use a text encoding and then don't need entities anyway. Mixing both makes no sense.
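If existing pages are full of numeric references and the goal is readable source, one option is to convert them once into literal characters, save the files as UTF-8, and declare <meta charset="UTF-8">. A minimal Python sketch of that conversion, assuming the file has first been read with whatever encoding it was saved in; the function name is hypothetical and this is not a TextPad feature. Named entities and references to markup characters are deliberately left alone so the HTML structure does not change.

```python
import re

# Decimal (&#625;) and hexadecimal (&#x271;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Characters that must stay escaped so the HTML markup is not altered.
KEEP_ESCAPED = set("<>&\"'")


def decode_numeric_refs(html: str) -> str:
    """Replace numeric character references with the literal characters.

    Named entities such as &amp; are not touched; references to markup
    characters (<, >, &, quotes) or to invalid code points are left as-is.
    """
    def replace(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code == 0 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        char = chr(code)
        return match.group(0) if char in KEEP_ESCAPED else char

    return NUMERIC_REF.sub(replace, html)


if __name__ == "__main__":
    print(decode_numeric_refs("Eka&#625; samaya&#625; Bhagav&#257; S&#257;vatthiya&#625;"))
```

Back up the files before converting. Afterwards they have to be saved as UTF-8, since ANSI cannot hold most of the converted characters.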
Mike Olds wrote: Is there a monospaced Unicode font available for use on Textpad?
Fonts are provided by your operating system, not by TextPad. Any font you find and install in Windows will be available to TextPad.
Mike Olds wrote: the web output from a page where the source-code had only character number ids
The "source" is what every internet browser shows via "view source"; anything else (the "output") is that source being interpreted. Do you understand that? Your HTML entities are displayed as the characters they stand for. Display versus source. The source is what you edit in TextPad.
Mike Olds wrote: it appears with the correct diacriticals when pasted into a new Textpad document. What font is being used there that I can also use?
I have no clue which context you mean here. TextPad uses different fonts depending on which document class you are currently using: Configure > Preferences > Document Classes > Default > Font may be what you are looking for.
Mike Olds wrote: View in Web Browser from Textpad the above comes up like this
Do that after steps 3.1 and 3.2 of my instructions.

Post by Mike Olds »

Thanks Jack,

I see that I am trying your patience here. I am afraid I am unable to work around the obstacles. I understand your reluctance to use Google; it was a last resort. You can see my website by using the link in my signature, but I am not sure what that will gain you, as it does not reflect what I do locally.

To clarify: I work on a desktop PC with an internal web server. I write and code using TextPad, with a custom-designed font to input diacriticals. I then upload this work to a second web server I have here at home to see what the final product will look like (and to keep the final work separate from the working files). Periodically I send an SD card with the whole web to the site admin at obo.genaud.net. He runs a script on my work to produce the content of the obo site.

Thank you for your effort. I will let this issue go and stay with my old method, while digging around to see if I can learn more about what I need to understand to make this work.

To forestall the frequent suggestion to change editors: this I cannot do. I have been using TP for a very long time and see nothing equal to its clip libraries for convenience in editing content and coding.

Post by AmigoJack »

Mike Olds wrote: You can see my website by using the link in my signature
I already did; otherwise the example in my first reply wouldn't have included Paṇṇāsaɱ.
Mike Olds wrote: a desktop PC with an internal web server
That's way too vague: it could be misconfigured, e.g. sending wrong HTTP headers that make clients (read: the internet browser) handle the document wrongly (i.e. not as UTF-8). I'm unable to analyze that, hence my suggestion to do it on what I can access: your website.
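One way to see what a server actually claims is to compare the charset in the HTTP Content-Type header with the <meta> declaration in the document. Under the HTML standard's encoding rules a BOM wins over both, and the HTTP header wins over <meta>, so a server that sends, say, charset=ISO-8859-1 would override a correct <meta charset="UTF-8">. A minimal Python sketch follows; the function names and the example URL are hypothetical. On Apache, the AddDefaultCharset directive is one place where such a default charset may be set.

```python
import re
from email.message import Message
from urllib.request import urlopen

# Looks for charset=... inside a <meta> tag (covers both HTML5 and http-equiv forms).
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def header_charset(content_type: str):
    """Return the charset parameter of a Content-Type value (lowercased), or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(body: bytes):
    """Return the charset declared by a <meta> tag in the first 1024 bytes, or None."""
    match = META_CHARSET.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def report(url: str) -> None:
    """Print what the server and the document each say about the encoding."""
    with urlopen(url) as response:
        content_type = response.headers.get("Content-Type", "")
        body = response.read()
    print("Content-Type header:", content_type or "(missing)")
    print("charset from header:", header_charset(content_type) if content_type else None)
    print("charset from <meta>:", meta_charset(body))
    print("starts with UTF-8 BOM:", body.startswith(b"\xef\xbb\xbf"))
```

For example, `report("http://localhost/utf8-test.html")` run against the local Apache server (hypothetical URL).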
Mike Olds wrote: use a custom designed font to input diacriticals
This sounds like it creates more problems than it solves. Use well-known fonts such as Tahoma or Courier New, as both can handle lots of characters with diacritics and beyond.

Post by Mike Olds »

Hello Jack

Custom font inconvenient: of course, this is why I am interested in setting up Unicode. It goes back 20-some years, to when inputting diacriticals was a big problem and a custom font was the best solution.

No offense, but I don't see myself allowing a stranger to dig into my PC setup. I could send files. I have checked my conf file and it looks like it should have no problems with Unicode. I also copied a unicode page from another site, saved it utf-8 with Arial Unicode font and it displays properly on both my desktop PC (via Apache) and my home server.

So far I could say I had set up use of Unicode in TextPad (I could enter the numerical character code, no problem), but the issue is the display in the working source document.

The point I don't seem to be able to state well is that I need to be able to read the text in a file I am working on: that is that what I see in the file is the glyph with the accent, not it's number id. I cannot read a page full of numbers. The mystery to me is that when I copy a page with diacriticals it displays them properly in textpad. This tells me that I can display the glyphs properly, but when I go to view the file through a browser, it gets messed up.

Post by AmigoJack »

Mike Olds wrote: dig into my PC setup
copied a unicode page from another site, saved it utf-8 with Arial Unicode font
enter the numerical character code
not it's number id
when I copy a page
All those phrases indicate you're either missing basics or you misunderstood something essential.

Publishing one of your test documents to your website in no way means I dig through your computer. Copying a "page" is the wrong term: you either want to copy pure text, or you want to save the HTML document being displayed in your internet browser as a file (HTML format, obviously). Saving a document in an encoding (such as UTF-8) is independent of fonts. "Numerical character codes" is not the right term either: these are HTML numeric character references (entities) for Unicode code points. "It's" versus "its".

Start off with this very page: save it as an HTML file (hit CTRL+S) and view it in TextPad; all non-Latin letters should show up fine. Similarly, just publish this saved document through any of your internal HTTP servers and check the outcome; if it looks wrong, then it's the server's fault.
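To check how a saved file is actually stored, independently of the font TextPad displays it in, a byte-level look is enough. A heuristic Python sketch (names hypothetical): it cannot prove that a file is windows-1252, only that it is not valid UTF-8.

```python
import sys
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def describe_encoding(data: bytes) -> str:
    """Classify raw file bytes by how they can be decoded."""
    has_bom = data.startswith(UTF8_BOM)
    body = data[len(UTF8_BOM):] if has_bom else data
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8 (perhaps ANSI / windows-1252)"
    if has_bom:
        return "UTF-8 with BOM"
    if body.isascii():
        return "pure ASCII (same bytes in UTF-8 and ANSI)"
    return "UTF-8 (no BOM)"


if __name__ == "__main__":
    # Usage: python check_encoding.py page.htm [more files...]  (script name is hypothetical)
    for name in sys.argv[1:]:
        print(name, "->", describe_encoding(Path(name).read_bytes()))
```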

Post by Mike Olds »

Hello Jack,
AmigoJack wrote: All those phrases indicate you're either missing basics or you misunderstood something essential.
Haha. No doubt.

It's and its is a long story: I was a victim of an early-grade experiment in scan reading, which has only begun to be corrected by computer spell-checkers.

Publishing a Unicode HTML document with many diacriticals is what I just described doing. On both the server on my PC and my other server, the diacriticals come up just fine.

Again, the problem is not that part. It's the viewing of the diacriticals in the textpad document that is the issue.

I have no problem inputting &#625; in an HTML document and seeing the 'mg' 'ɱ' character in the published web page, even without saving as UTF-8 or specifying it in the meta-data. What I want is to see the 'mg' in the .htm 'pure text' file I am working on in TextPad.

If I publish the page with the &#625; (set up for UTF-8 all around), it shows up properly in the browser. If I copy the character from the browser and paste it into the same document, I also see it correctly in the pure-text HTML file, but when published it displays as gibberish (like "É±") instead of "ɱ". Neither way do I get what I need.

I have tried changing the save-as parameters, changing the font, changing the default character encoding, changing the meta-data. Nothing gives me what I want.

I am feeling very guilty about using your time like this. Please know that I really appreciate the effort. We just don't seem to be connecting on the issue bothering me. Please feel free to abandon the effort. No problem.

Solution

Post by Mike Olds »

Greetings, and thanks for your help.

The problem boiled down to two things:

1. Windows apparently has several ways of inputting Unicode. In my case, to get A+macron, I must type: ALT(plus the numpad +)(no 'x') 100(no semicolon).
Ā.

The next problem is that for some reason this does not display correctly in Internet Explorer. It shows up fine in Firefox and Chrome.
In I.E., this appears as: Ā

So the problem needs to be corrected in I.E. (or my version thereof).

Again thanks for the effort.

EDIT: So long, I.E. I.E. 11 will properly display the Unicode characters only if 'Encoding' is manually set to 'UTF-8' each time. Setting 'Auto-Select' apparently does not work properly (according to a Google search).

Follow-up issues

Post by Mike Olds »

Greetings,

I am now able to input Unicode characters into TP and see the proper glyph.

This is by no means a straightforward issue. Windows sometimes accepts ALT (keyboard +) 00a0 /ALT, sometimes does not, and sometimes substitutes another character altogether. For the most part, all characters after the ALT+ must be input from the regular keyboard. Sometimes a character that cannot be entered as above can be entered by pressing ALT u (keyboard +) 00a0 /ALT. Sometimes it is necessary to use the IAST Unicode.

I am using Courier New as my TP default font and it has all the glyphs I require.

However, the entry codes of numerous characters conflict with built-in (but removable) shortcuts, namely:

The shortcut that brings up the 'Edit' menu,
and the shortcut that brings up the 'Configure' menu.

However! Removing the shortcuts under Configure > Preferences does not stop the menus from popping up.

For example, Â, when input into TP, brings up the Configure menu. The code is ALT u (numpad +) 00c2 /ALT.
Presumably the ALT and the C are the conflict, but I have no shortcut to the Configure menu!



Any suggestions?

Re: Solution

Post by AmigoJack »

Mike Olds wrote: Windows
Which version? They differ a lot.
Mike Olds wrote: to get A+macron, I must type: ALT(plus the numpad +)(no 'x') 100(no semicolon)
Of course neither 'x' nor semicolon: that's what HTML wants, not everything else as well. And you forgot the '+', and checking whether your registry settings are correct.
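The registry setting referred to is, as commonly documented, a string value EnableHexNumpad set to "1" under HKEY_CURRENT_USER\Control Panel\Input Method, taking effect after logging off and on again; treat this as an assumption to verify on your Windows version. A Python sketch that reads that value and spells out the key sequence for a character (hex code point, no 'x', no semicolon); the function names are hypothetical.

```python
import sys


def hex_numpad_sequence(char: str) -> str:
    """Describe the Alt + numpad-plus hex entry for one character.

    Assumes hex numpad entry is enabled in the registry (see below).
    The code point is typed in hex without "x", "U+" or ";".
    """
    if len(char) != 1:
        raise ValueError("expected a single character")
    return f"hold Alt, press numpad +, type {ord(char):X}, release Alt"


def hex_numpad_enabled():
    """Read HKCU\\Control Panel\\Input Method\\EnableHexNumpad.

    Returns True/False on Windows, or None on other platforms.
    """
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard-library module

    try:
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, r"Control Panel\Input Method") as key:
            value, _value_type = winreg.QueryValueEx(key, "EnableHexNumpad")
    except OSError:  # key or value missing
        return False
    return str(value).strip() == "1"


if __name__ == "__main__":
    print("EnableHexNumpad:", hex_numpad_enabled())
    print("Ā:", hex_numpad_sequence("Ā"))
```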
Mike Olds wrote: I.E. I.E. 11 will properly display the unicode characters if 'Encoding' is manually set each time to 'utf-8'
Make sure to save your document in UTF-8 encoding with a BOM.
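In current browsers a UTF-8 BOM takes precedence over both the HTTP header and the <meta> declaration when determining a page's encoding; that Internet Explorer 11 honours it the same way is assumed here, not verified. TextPad can save UTF-8 with a BOM, as used earlier in this thread. To produce the same bytes from a script, a minimal Python sketch (names hypothetical):

```python
UTF8_BOM = b"\xef\xbb\xbf"


def utf8_with_bom(text: str) -> bytes:
    """Encode text as UTF-8 preceded by a byte order mark (EF BB BF).

    Python's "utf-8-sig" codec adds the BOM when encoding and strips it
    again when decoding.
    """
    return text.encode("utf-8-sig")


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 byte order mark."""
    return data.startswith(UTF8_BOM)
```

For example, `Path("page.htm").write_bytes(utf8_with_bom(html_text))`, where `html_text` is the page content as a Python string and `Path` comes from `pathlib`.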