UTF-8 problems with RTL languages
Posted: Sat Apr 16, 2022 4:29 pm
I'm trying to prepare property files for internationalising a Java application, and I've noticed some issues with Textpad's UTF-8 support for right-to-left languages (Arabic, Hebrew).
The issues I'm seeing are:
1) A sequence of words is displayed left-to-right rather than right-to-left, even though the characters within each word are correctly displayed right-to-left.
2) Selecting the leftmost character in a word and copying it actually results in the first (rightmost) character being copied.
3) Cursor movement is erratic: pressing the right arrow moves the caret from left to right, but it sometimes ends up in the middle of a character. I thought this might be because, when moving right over the leftmost character, the caret moves by the width of the first (rightmost) character in the word. However, the line length is also wrong. Consider this line:
background\ colour = צבע רקע
Textpad displays this line with the two Hebrew words in the reverse of the correct order shown here. When you press End to go to the end of the line, the caret ends up about one character width short of the end. Some characters seem to have a width of zero when you move the caret across them.
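For comparison, the display order the Unicode Bidirectional Algorithm expects for the example line can be computed with the JDK's java.text.Bidi class, which is handy here because the property files are for a Java application. This is a minimal sketch under stated assumptions and is not how Textpad or Notepad++ work internally. It assumes a single line whose paragraph direction comes from the first strong character (here the Latin "b", so left-to-right). It treats each UTF-16 char separately, which is fine for Hebrew and Arabic letters but not for surrogate pairs. The class and method names BidiOrderDemo, visualToLogical and toVisual are hypothetical.

```java
import java.text.Bidi;

// Hypothetical demo class: not part of Textpad or of the original post.
public class BidiOrderDemo {

    /**
     * Returns, for each visual (left-to-right screen) position in a single line,
     * the logical (storage-order) index of the char displayed there.
     * Assumes one line of BMP text; the paragraph direction is taken from the
     * first strong character, defaulting to left-to-right.
     */
    static int[] visualToLogical(String line) {
        Bidi bidi = new Bidi(line, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        int n = line.length();
        byte[] levels = new byte[n];
        Integer[] indices = new Integer[n];
        for (int i = 0; i < n; i++) {
            levels[i] = (byte) bidi.getLevelAt(i); // embedding level: even = LTR, odd = RTL
            indices[i] = i;
        }
        // Reorder the chars into visual order according to their levels
        // (runs at odd levels end up reversed).
        Bidi.reorderVisually(levels, 0, indices, 0, n);
        int[] order = new int[n];
        for (int v = 0; v < n; v++) {
            order[v] = indices[v];
        }
        return order;
    }

    /** Builds the line as it should appear on screen, read left to right (no glyph mirroring). */
    static String toVisual(String line) {
        StringBuilder sb = new StringBuilder(line.length());
        for (int logical : visualToLogical(line)) {
            sb.append(line.charAt(logical));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The example line from the post, with the Hebrew written as Unicode
        // escapes so the .java file's own encoding does not matter.
        String line = "background\\ colour = \u05E6\u05D1\u05E2 \u05E8\u05E7\u05E2";
        int[] order = visualToLogical(line);

        System.out.println("visual order: " + toVisual(line));

        // In this example the Hebrew run is at the end of the line, so it occupies
        // the same index range in both logical and visual order.
        int hebrewStart = line.indexOf('\u05E6');
        for (int v = hebrewStart; v < order.length; v++) {
            System.out.printf("visual %d <- logical %d (U+%04X)%n",
                    v, order[v], (int) line.charAt(order[v]));
        }
    }
}
```

Running it should show the Hebrew run reversed as a whole. The first word (U+05E6 U+05D1 U+05E2) ends up on the right and the second word on the left, which is the word order that the first problem above says Textpad gets wrong. The index mapping also shows that the leftmost glyph of that first word is its last logical character (visual position 25 holds logical index 23). That is the character which copying the leftmost glyph should give.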
I know Textpad is coming rather late to the party with Unicode, so although I prefer Textpad overall, I still end up needing Notepad++ when dealing with multi-alphabetic text, since Notepad++ handles it correctly (and also has text direction commands to display a file as LTR or RTL). This is rather a pity, and I hope that some more effort can be put into Unicode support to fix these problems.