some characters cannot be converted to code page 1252

General questions about using TextPad

jschwartz13@att.net
Posts: 11
Joined: Tue Mar 31, 2020 5:33 pm

Post by jschwartz13@att.net »

I keep getting this dialog box popping up.
How do I use a regex to find the characters, or to replace them?
[\x00-\x09\x0B-\x0C\x0E-\x1F]+
did not find the characters.
I am running TextPad 8.3.
MudGuard
Posts: 1295
Joined: Sun Mar 02, 2003 10:15 pm
Location: Munich, Germany

Post by MudGuard »

Code page 1252 is a single-byte encoding, so it can represent at most 256 different characters.

You seem to have characters outside that set, most likely characters with code points beyond 255.

Your search only looks for characters in range 0-31 ...
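
To illustrate the point, here is a minimal Python sketch. Python is not part of the original thread, `describe_non_ascii` is a hypothetical helper name, and the sketch assumes Python's `re` reads this control-character class the same way TextPad's engine does. It shows that the pattern from the first post cannot match a curly apostrophe, and it lists the code point of each non-ASCII character so you can see what is actually in the text.

```python
import re

# The control-character pattern from the first post (range 0-31, minus TAB/LF/CR).
CONTROL_CHARS = re.compile(r"[\x00-\x09\x0B-\x0C\x0E-\x1F]+")


def describe_non_ascii(text):
    """Return (character, 'U+XXXX') pairs for every non-ASCII character."""
    return [(ch, f"U+{ord(ch):04X}") for ch in text if ord(ch) > 0x7F]


# Sample line with a curly apostrophe (U+2019), as pasted from a web page.
sample = "You want things to make sense. But they won\u2019t."

print(CONTROL_CHARS.search(sample))  # no control characters in the sample
print(describe_non_ascii(sample))    # the curly apostrophe and its code point
```

Listing the code points first makes it easier to choose a search range that covers the characters really present.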

Post by jschwartz13@att.net »

Please share a regex that will help me find the characters that are
not in code page 1252, so that I can stop seeing this error.
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

[\x{0100}-\x{FFFF}]
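
For anyone who wants to run the same search outside TextPad, here is a rough Python equivalent. It is only a sketch: Python's `re` has no `\x{...}` escape, so `\uXXXX` is used instead, and `locate` is a hypothetical helper name. It reports the line and column of each match, which is roughly what Find does in the editor.

```python
import re

# Python's `re` has no \x{...} escape; \uXXXX is the closest equivalent.
BMP_ABOVE_LATIN1 = re.compile(r"[\u0100-\uFFFF]")


def locate(pattern, text):
    """Yield (line_number, column, character) for each match, both 1-based."""
    # Split on "\n" only, so unusual separators such as U+2028 stay searchable.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in pattern.finditer(line):
            yield line_no, match.start() + 1, match.group()


sample = "Python, C++\n\u2022 bullet and won\u2019t\n"
for line_no, column, ch in locate(BMP_ABOVE_LATIN1, sample):
    print(f"line {line_no}, column {column}: {ch!r} (U+{ord(ch):04X})")
```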

Post by jschwartz13@att.net »

Thank you!
That is strange.
When I copy and paste from the webpage, TextPad
does not convert the apostrophes, quotation marks
and such.
That is why I get the error when I try to save.
How would I fix it?
Should I select a different code page?
The pasted-in text is:
You want things to make sense. But they won’t.

Post by jschwartz13@att.net »

I changed the 'convert tabs to spaces' setting and now the error
went away. Hmm.

Warning: Some characters cannot be converted to code page

Post by jschwartz13@att.net »

Warning: Some characters cannot be converted to code page 1252
I keep getting this TextPad warning when saving, but cannot find the characters to replace.
Please help.
[\x{0100}-\x{FFFF}]
This search does not find any results.

Post by ben_josephs »

Did you select Regular expression when you searched?

Post by jschwartz13@att.net »

yes
AmigoJack
Posts: 550
Joined: Sun Oct 30, 2016 4:28 pm
Location: グリーン ヒル ゾーン

Post by AmigoJack »

jschwartz13@att.net wrote: when i copy paste from the webpage
Is that website also available to us? If yes: which part should we copy so we might be able to reproduce what you experience?

Alternatively save the file in UTF-8 encoding, upload it somewhere, then provide the download link here.

ben_josephs wrote: [\x{0100}-\x{FFFF}]
Won't that skip emojis and other code points beyond U+FFFF? How about searching for

[^\x00-\xFF]
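
The difference between the two patterns can be sketched in Python. This shows how Python's `re` treats them; TextPad's engine may handle code points beyond U+FFFF differently, so take it as an illustration of the idea only.

```python
import re

BMP_ONLY = re.compile(r"[\u0100-\uFFFF]")    # stops at U+FFFF
NOT_LATIN1 = re.compile(r"[^\x00-\xFF]")     # everything above U+00FF

# e-acute (U+00E9), bullet (U+2022) and an emoji (U+1F600).
text = "caf\u00e9 \u2022 \U0001F600"

bmp_hits = BMP_ONLY.findall(text)      # bullet only
all_hits = NOT_LATIN1.findall(text)    # bullet and emoji

print([f"U+{ord(ch):04X}" for ch in bmp_hits])
print([f"U+{ord(ch):04X}" for ch in all_hits])
```

Neither pattern matches the e-acute, because U+00E9 lies inside the range 0x00 to 0xFF.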

Post by ben_josephs »

That's a good point about code points ≥ U+10000, which I don't normally have cause to deal with. TextPad doesn't behave well with them. Unfortunately, I don't have the time to investigate this further.

Post by jschwartz13@att.net »

You guys are so smart.
I will try [^\x00-\xFF].

Post by jschwartz13@att.net »

The search expression did not find these non-1252 characters.
Any ideas on a search expression to find these pesky characters?
The tabs and bullets get found, but the plus plus does not get found.

• performing core mathematical and statistical model development/validation;
• using Python, C++, R, SAS, and SQL or other programming languages and mathematical/statistical packages;
• contributing code to analytics libraries;

Post by ben_josephs »

I am confused. Each of these regexes:
[^\x00-\xFF]
[\x{0100}-\x{FFFF}]
matches a Unicode BULLET (U+2022), which is not a Windows-1252 character, but not a CHARACTER TABULATION (U+0009) or a PLUS SIGN (U+002B), both of which are Windows-1252 characters. Is that not what you require?

Precisely which characters are not matched by these regexes that you would like to be matched by some other regex?
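
One way to answer that question precisely is to ask a codec instead of guessing a code-point range. The sketch below assumes Python's `cp1252` codec matches what TextPad means by code page 1252, which I have not verified; `unencodable` is a hypothetical helper name. One caveat it makes visible: in Python's codec, BULLET (U+2022) and RIGHT SINGLE QUOTATION MARK (U+2019) do encode, to bytes 0x95 and 0x92, even though both regexes above match them. A code-point range is therefore only an approximation of "not in 1252".

```python
def unencodable(text, encoding="cp1252"):
    """Return (index, character, 'U+XXXX') for characters the codec rejects."""
    bad = []
    for index, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append((index, ch, f"U+{ord(ch):04X}"))
    return bad


# Tab, plus signs, bullet, curly apostrophe, right arrow and an emoji.
sample = "C++\t\u2022 won\u2019t \u2192 \U0001F600"

for index, ch, code_point in unencodable(sample):
    print(f"index {index}: {ch!r} {code_point}")
```

With this sample only the arrow and the emoji are reported; the tab, the plus signs, the bullet and the apostrophe all encode under Python's `cp1252`.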

Post by jschwartz13@att.net »

I am not good at binary, so I will try to add images of what is not being found.
Basically, I am trying to find all non-1252 characters and replace them with blanks, so that I can stop seeing these warnings every time I save the file.
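
Outside TextPad, that goal could be sketched in Python roughly as follows. This is only an illustration under the assumption that Python's `cp1252` codec is a fair stand-in for TextPad's code page 1252; `blank_out` is a hypothetical helper name, not a TextPad feature.

```python
def blank_out(text, encoding="cp1252"):
    """Replace every character the codec cannot encode with a single space."""

    def keep_or_blank(ch):
        try:
            ch.encode(encoding)
            return ch
        except UnicodeEncodeError:
            return " "

    return "".join(keep_or_blank(ch) for ch in text)


# Right arrow (U+2192) is not encodable; bullet (U+2022) is.
cleaned = blank_out("a \u2192 b \u2022")
print(repr(cleaned))

# The cleaned text now encodes without raising an error.
encoded = cleaned.encode("cp1252")
```

Replacing with a space keeps the text length and column positions unchanged; using an empty string instead would delete the characters outright.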

How can I post 3 images to this forum thread?