Cleaning hi-ascii out of a database

Paul

Cleaning hi-ascii out of a database

Post by Paul »

I regularly have to parse new editions of a database for use in my system, and one problem I have to solve is getting rid of any characters higher than decimal 126 (hex 7E), because my system chokes when it sees them. This doesn't need to be fancy; just replacing every such character with a tilde would be fine.

Try as I might, though, I haven't figured out a way to scan through the whole database and find these offending characters. I thought maybe [\x73-\x255] would work, but it doesn't seem to. Any ideas?
Andreas

Re: Cleaning hi-ascii out of a database

Post by Andreas »

[\x7e-\xff]

You got mixed up between hex and dec.
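For anyone who ends up doing this outside TextPad, here is a minimal Python sketch of the same idea, not a TextPad feature. It assumes the database can be read as raw bytes in a single-byte encoding; the names `HIGH_BYTES` and `clean_high_ascii` are made up for illustration. The range starts at \x7f rather than \x7e because 0x7E is the tilde itself, which is still allowed.

```python
import re

# Decimal 126 is hex 7E and decimal 255 is hex FF, so "everything
# above 126" is the byte range 0x7F-0xFF.
HIGH_BYTES = re.compile(rb"[\x7f-\xff]")


def clean_high_ascii(data: bytes, replacement: bytes = b"~") -> bytes:
    """Replace every byte above 0x7E with the replacement (a tilde by default)."""
    return HIGH_BYTES.sub(replacement, data)


sample = b"caf\xe9 na\xefve"
print(clean_high_ascii(sample))  # b'caf~ na~ve'
```

Working on bytes avoids guessing the file's encoding. If the data is actually multi-byte (UTF-8, say), each offending character would turn into more than one tilde.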
Paul

Re: Cleaning hi-ascii out of a database

Post by Paul »

You're right. I wrote it wrong in the message. The way you wrote it, though, is the way I tried and the way that doesn't work. So it doesn't change anything with respect to my problem or my real question.
Ed Orchard

Re: Cleaning hi-ascii out of a database

Post by Ed Orchard »

From the TextPad help: [:token:] will search for "Any of the characters defined on the Syntax page for the document class, or in the syntax definition file if syntax highlighting is enabled for the document class."

However, this does not seem to work; maybe I'm not setting it up right. But if all the punctuation characters in use were added to the Syntax page, then a search on [^[:alnum:][:cntrl:][:token:]] should do it.
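The same "everything that is not on the allowed list" idea can be sketched in Python, as a rough illustration only. Python's re module has no POSIX classes such as [:alnum:], so the allowed set is spelled out by hand. Using `string.punctuation` as a stand-in for the tokens on the Syntax page is an assumption, and `ALLOWED`, `NOT_ALLOWED`, and `find_offenders` are hypothetical names.

```python
import re
import string

# Allowed characters: ASCII letters and digits, the space, the control
# characters 0x00-0x1F, and a punctuation list standing in for the
# tokens that would be defined on the Syntax page (an assumption).
CONTROL_CHARS = "".join(chr(i) for i in range(32))
ALLOWED = (string.ascii_letters + string.digits + " "
           + CONTROL_CHARS + string.punctuation)

# Negated class: matches any single character NOT in the allowed set.
NOT_ALLOWED = re.compile("[^" + re.escape(ALLOWED) + "]")


def find_offenders(text: str) -> list:
    """Return (position, character) pairs for every disallowed character."""
    return [(m.start(), m.group()) for m in NOT_ALLOWED.finditer(text)]


print(find_offenders("caf\xe9 ok"))  # [(3, 'é')]
```

Listing positions first makes it easy to check what would be hit before replacing anything.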