Finding Non-ASCII characters

redcairo
Posts: 39
Joined: Fri May 06, 2011 6:34 am

Post by redcairo »

Hi. Search did not turn up anything helpful on this topic.

I work with a system that chokes on any file containing a non-ASCII character in the text, an MS Word "smartquote" being one example. But these are fiendishly difficult to "see" plainly in a lot of text.

I'm looking for a regular expression that will basically "find" any character beyond the standard keyboard characters, so I can find whatever might be buried in some files and throwing the error.

Would be SUPER appreciative if anyone could give me a clue about how to go about this. I've worked hard on regex stuff but so far it's always been on things that were regular chars.

RC (PJ)
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Is this what you need:
[\x80-\xFF]
?
redcairo
Posts: 39
Joined: Fri May 06, 2011 6:34 am

Post by redcairo »

Thank you! I was just coming back here to paste in:

[^\x00-\x7F]

Which I found elsewhere and believed was the answer. I note they're a bit diff...
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

[\x80-\xFF] means every character in the range hex 80 (128) to FF (255).
[^\x00-\x7F] means every character not in the range hex 00 (0) to 7F (127).

They are equivalent if the text consists entirely of 8-bit characters. Yours is better because it works with characters of arbitrary width, including those with codes above hex FF.
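
For anyone who wants to see the difference outside TextPad, here is a minimal sketch using Python's re module. This is an assumption-laden illustration, not TextPad's own regex engine, so behaviour there may differ in detail. The helper name find_non_ascii and the sample text are made up for the example. It reports the line, column, and code point of each non-ASCII character, then shows that the two classes disagree once a character lies above hex FF.

```python
import re

# The two character classes discussed in this thread.
HIGH_8BIT = re.compile(r"[\x80-\xFF]")   # code points 128..255 only
NOT_ASCII = re.compile(r"[^\x00-\x7F]")  # anything outside 0..127


def find_non_ascii(text):
    """Return (line, column, character, code point) for each non-ASCII character.

    Lines and columns are 1-based. Hypothetical helper, for illustration only.
    """
    hits = []
    # Split on "\n" only, so unusual Unicode line separators are still reported.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for m in NOT_ASCII.finditer(line):
            ch = m.group()
            hits.append((line_no, m.start() + 1, ch, f"U+{ord(ch):04X}"))
    return hits


# Made-up sample: Word-style smart quotes (U+201C, U+201D) and an e-acute (U+00E9).
sample = "plain line\nWord \u201csmart quotes\u201d here\ncaf\u00e9\n"

for hit in find_non_ascii(sample):
    print(hit)

# The 8-bit range misses the smart quotes, because their code points exceed hex FF.
print(HIGH_8BIT.findall(sample))
# The negated ASCII range catches all three characters.
print(NOT_ASCII.findall(sample))
```

In this sketch the smart quotes are only caught by the negated class, which matches the point above about characters of arbitrary width.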