
some characters cannot be converted to code page 1252

Posted: Fri May 01, 2020 8:31 pm
by jschwartz13@att.net
I keep getting this dialog box popping up.
How do I use a regex to find the characters,
or to replace them?
[\x00-\x09\x0B-\x0C\x0E-\x1F]+
did not find the characters.
I am running TextPad 8.3.

Posted: Sat May 02, 2020 5:27 am
by MudGuard
Code page 1252 has defined characters for all 256 code points from 0 to 255.

You seem to have characters outside this range, i.e. characters with a code point beyond 255.

Your search only looks for characters in range 0-31 ...
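
To make the mismatch concrete, here is a small Python sketch. Python's `re` module stands in for TextPad's regex engine, which is an assumption, and the sample is part of the sentence pasted later in this thread. The original pattern only covers control characters below U+0020, so it cannot match a typographic apostrophe.

```python
import re

# Pattern from the first post: control characters U+0000-U+001F,
# excluding line feed (U+000A) and carriage return (U+000D).
control_chars = re.compile(r"[\x00-\x09\x0B-\x0C\x0E-\x1F]+")

sample = "But they won\u2019t."  # \u2019 is the curly apostrophe

print(control_chars.search(sample))  # search result for the sample
print(hex(ord("\u2019")))            # code point of the apostrophe
```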

Posted: Sat May 02, 2020 7:50 am
by jschwartz13@att.net
Please share a regex that will help me find the characters that are
not in code page 1252, so that I can stop seeing this error.

Posted: Sat May 02, 2020 4:29 pm
by ben_josephs
[\x{0100}-\x{FFFF}]
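
For readers who want to check a file outside TextPad, the same range can be tried in Python. This is only a sketch: Python's `re` module does not accept the `\x{...}` syntax, so the class is written with `\u` escapes, and the function name `find_above_latin1` is made up for illustration.

```python
import re

# Same range as [\x{0100}-\x{FFFF}], written in Python's escape syntax.
ABOVE_LATIN1 = re.compile(r"[\u0100-\uFFFF]")


def find_above_latin1(text):
    """Return (line, column, character, code point) for every match."""
    hits = []
    # Split on "\n" only, so that Unicode line separators are not swallowed.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for m in ABOVE_LATIN1.finditer(line):
            ch = m.group()
            hits.append((line_no, m.start() + 1, ch, f"U+{ord(ch):04X}"))
    return hits


for line_no, column, _ch, code_point in find_above_latin1(
    "plain line\nBut they won\u2019t."
):
    print(f"line {line_no}, column {column}: {code_point}")
```

Line and column numbers are 1-based here, to match what an editor's status bar usually shows.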

Posted: Sat May 02, 2020 7:07 pm
by jschwartz13@att.net
Thank you!
That is strange.
When I copy and paste from the webpage, TextPad
does not convert the apostrophes, quotation marks
and such.
That is why I get the error when I try to save.
How would I fix it?
Should I select a different code page?
The pasted-in text is:
You want things to make sense. But they won’t.
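
One way to deal with such pasted text is to swap the typographic punctuation for plain ASCII before saving. The sketch below does this in Python rather than in TextPad. The mapping is a small, hand-picked assumption covering common cases, not a complete list, and whether it makes the TextPad warning go away is untested.

```python
# Hand-picked replacements for punctuation that web pages often use.
ASCII_PUNCTUATION = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201C": '"',    # left double quotation mark
    "\u201D": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def to_ascii_punctuation(text):
    """Replace common typographic punctuation with ASCII equivalents."""
    return text.translate(ASCII_PUNCTUATION)


print(to_ascii_punctuation("You want things to make sense. But they won\u2019t."))
```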

Posted: Sat May 02, 2020 7:51 pm
by jschwartz13@att.net
I changed the 'convert tabs to spaces' setting, and now the error
went away. Hmmmmmm.

Warning: Some characters cannot be converted to code page

Posted: Wed Aug 05, 2020 9:00 pm
by jschwartz13@att.net
Warning: Some characters cannot be converted to code page 1252
I keep getting this TextPad warning when saving, but cannot find the characters to replace.
Please help.
[\x{0100}-\x{FFFF}]
This search does not find any results.

Posted: Wed Aug 05, 2020 10:20 pm
by ben_josephs
Did you select Regular expression when you searched?

Posted: Wed Aug 05, 2020 10:57 pm
by jschwartz13@att.net
yes

Posted: Thu Aug 06, 2020 8:45 am
by AmigoJack
jschwartz13@att.net wrote: when i copy paste from the webpage
Is that website also available to us? If yes: which part should we copy so we might be able to reproduce what you experience?

Alternatively save the file in UTF-8 encoding, upload it somewhere, then provide the download link here.

ben_josephs wrote: [\x{0100}-\x{FFFF}]
Won't that skip emojis and other code points from U+10000 upwards? How about searching for

[^\x00-\xFF]
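
The difference between the two classes can be illustrated in Python, where strings are sequences of code points. This is an analogy only: TextPad's engine may treat characters beyond U+FFFF differently.

```python
import re

bmp_only = re.compile(r"[\u0100-\uFFFF]")  # ben_josephs' range
negated = re.compile(r"[^\x00-\xFF]")      # the negated class suggested above

emoji = "\U0001F600"  # GRINNING FACE, a code point above U+FFFF
bullet = "\u2022"     # BULLET, inside the Basic Multilingual Plane

# For each character, show whether each pattern finds it.
for name, ch in (("emoji", emoji), ("bullet", bullet)):
    print(name, bool(bmp_only.search(ch)), bool(negated.search(ch)))
```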

Posted: Thu Aug 06, 2020 12:16 pm
by ben_josephs
That's a good point about code points ≥ U+10000, which I don't normally have cause to deal with. TextPad doesn't behave well with them, unfortunately, and I don't have the time to investigate this further.

Posted: Fri Aug 07, 2020 2:25 pm
by jschwartz13@att.net
You guys are soooo smart.
I will try [^\x00-\xFF].

Posted: Mon Sep 28, 2020 4:16 pm
by jschwartz13@att.net
The find phrase did not find these non-code-page-1252 chars.
Any ideas on a search phrase to find these pesky chars?
The tabs and bullets get found, but the plus plus does not get found.

• performing core mathematical and statistical model development/validation;
• using Python, C++, R, SAS, and SQL or other programming languages and mathematical/statistical packages;
• contributing code to analytics libraries;

Posted: Mon Sep 28, 2020 6:46 pm
by ben_josephs
I am confused. Each of these regexes:
[^\x00-\xFF]
[\x{0100}-\x{FFFF}]
matches a Unicode BULLET (U+2022), whose code point lies above U+00FF (even though Windows-1252 does encode it, as byte 0x95), but not a CHARACTER TABULATION (U+0009) or a PLUS SIGN (U+002B), both of which are Windows-1252 characters. Is that not what you require?

Precisely which characters are not matched by these regexes that you would like to be matched by some other regex?
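
One caveat may be worth checking here. In Python's `cp1252` codec, bytes 0x80-0x9F stand for characters whose Unicode code points lie above U+00FF, including the bullet and the curly apostrophe, so a code-point range is not quite the same test as "cannot be saved as code page 1252". The sketch below asks the codec directly. The helper name `not_in_cp1252` is hypothetical, and TextPad's own conversion may differ in details.

```python
def not_in_cp1252(text):
    """Return the distinct characters of text that cp1252 cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


sample = "\u2022 using Python, C++, R\twon\u2019t \u0100 \U0001F600"

# Code points the codec rejects, in order of first appearance.
print([f"U+{ord(ch):04X}" for ch in not_in_cp1252(sample)])

# The bullet and the curly apostrophe each encode to a single byte.
print("\u2022".encode("cp1252"), "\u2019".encode("cp1252"))
```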

Posted: Wed Sep 30, 2020 2:42 pm
by jschwartz13@att.net
I am not good at binary, so I will try to add images of what is not being found.
Basically, I am trying to find all non-1252 chars and replace them with blanks, so that I can stop seeing these warnings every time I save the file.
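
Outside TextPad, the "replace with blanks" step could be sketched in Python with a custom codec error handler. The handler name `blank_space` and the function `blank_out_non_cp1252` are invented for this example, and Python's `cp1252` codec is assumed to be a fair stand-in for TextPad's conversion.

```python
import codecs


def _blank_handler(err):
    """Codec error handler: one space per character that cannot be encoded."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    return (" " * (err.end - err.start), err.end)


# "blank_space" is an arbitrary handler name chosen for this sketch.
codecs.register_error("blank_space", _blank_handler)


def blank_out_non_cp1252(text):
    """Replace every character that cp1252 cannot encode with a space."""
    return text.encode("cp1252", errors="blank_space").decode("cp1252")


print(blank_out_non_cp1252("C++ \u0100 and R"))
```

Characters that the code page can represent, such as the bullet, pass through unchanged, so the text keeps its length and layout.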

How can I post 3 images to this forum thread?










