Cleaning hi-ascii out of a database

Paul · Post by **Paul** » Sun Nov 11, 2001 12:05 pm

I regularly have to parse new editions of a database for use in my system, and one problem I have to solve is getting rid of any characters higher than decimal 126 (hex 7E), because my system chokes when it sees them. This doesn't need to be fancy; just replacing each one with a tilde would be fine.

Try as I might, though, I haven't figured out a way to scan through the whole database and find these offending characters. I thought maybe [\x73-\x255] would work, but it doesn't seem to. Any ideas?

Andreas · Post by **Andreas** » Mon Nov 12, 2001 2:02 pm

[\x7e-\xff]

You got mixed up between hex and decimal.
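
For anyone doing this outside TextPad, here is a minimal sketch of the same hex range in Python. It assumes the database can be read as raw bytes; the function name `clean_high_ascii` is made up for illustration. The range starts at `\x7f` because the original question asks for characters higher than 126; starting at `\x7e` would also match the tilde itself, which is harmless when the replacement is a tilde.

```python
import re

# Bytes above decimal 126 (hex 7E), i.e. 0x7F through 0xFF.
HIGH_BYTES = re.compile(rb"[\x7f-\xff]")

def clean_high_ascii(data: bytes, replacement: bytes = b"~") -> bytes:
    """Replace every byte above 0x7E with the replacement byte (hypothetical helper)."""
    # A function is used as the replacement so that the replacement bytes
    # are inserted literally, without backslash-escape processing.
    return HIGH_BYTES.sub(lambda _match: replacement, data)
```

Reading the file with `open(path, "rb")` and writing the result back with `"wb"` should keep everything else byte-for-byte unchanged.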

Paul · Post by **Paul** » Mon Nov 12, 2001 3:43 pm

You're right. I wrote it wrong in the message. The way you wrote it, though, is the way I tried and the way that doesn't work. So it doesn't change anything with respect to my problem or my real question.

Ed Orchard · Post by **Ed Orchard** » Tue Nov 13, 2001 8:01 am

From the TextPad help: [:token:] will search for "Any of the characters defined on the Syntax page for the document class, or in the syntax definition file if syntax highlighting is enabled for the document class."

However, this does not seem to work; maybe I'm not setting it up right. But if all the punctuation characters in use were added to the Syntax page, then a search on [^[:alnum:][:cntrl:][:token:]] should do it.
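
The negated-class idea (match anything that is not on the allowed list) can also be sketched in Python, as a way to locate the offending characters before changing anything. This is only an illustration under the assumption that "allowed" means bytes 0x00 through 0x7E; the name `find_offenders` is hypothetical.

```python
import re

# Negated class: any byte that is NOT in 0x00-0x7E, which covers the ASCII
# control characters, letters, digits, and punctuation up to the tilde.
NOT_ALLOWED = re.compile(rb"[^\x00-\x7e]")

def find_offenders(data: bytes):
    """Return (line_number, column, byte_value) tuples, 1-based, for each disallowed byte."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for match in NOT_ALLOWED.finditer(line):
            # Indexing a bytes object yields the integer value of that byte.
            hits.append((line_no, match.start() + 1, line[match.start()]))
    return hits
```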
