Find values not matching a pattern

encleadus · Post by **encleadus** » Wed Jul 25, 2007 5:20 pm

Hello,
I am trying to find values that do not match a particular pattern. My data is in the format:

123P45678
107N94861
132P59304

where:
the first character should always be a 1,
the second character should always be a 0, 1, 2, or 3,
the third character should be any number 0-9,
the fourth character should be a capital P or a capital N,
the last five characters should each be any number 0-9.

So I came up with:
^[1]{1}[0-3]{1}[0-9]{1}[PN][0-9]{5}
which seems to match all of my data in the current format. What I can't figure out is how to match any data that doesn't fit this pattern, e.g.

S23P45678
107Z94861
132P59B04

I tried putting a caret (^) inside the brackets, e.g. [^1], [^0-9], but this doesn't find the incorrect values above. Any ideas?

Cheers,
Justin

BenjaminB · Post by **BenjaminB** » Mon Jul 30, 2007 12:53 pm

What do you want to do with the lines not matching? If you just want to delete them, you could try this:

Use your RegEx to "Mark All"
Search->Invert all Bookmarks
Edit->Delete->Bookmarked Lines

I don't know of other ways to match lines that don't match. Sounds kind of contrary to the concept of Regular Expressions.
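
Outside the editor, the same "mark the matches, then invert" idea can be sketched in a few lines of Python. This is only an illustration, assuming Python's `re` module rather than the editor's regex engine; the `non_matching` helper and the sample list are made up for the example.

```python
import re

# Pattern from the first post, with the redundant {1} quantifiers dropped.
PATTERN = re.compile(r"^1[0-3][0-9][PN][0-9]{5}")

def non_matching(lines):
    """Return the lines that do NOT start with a valid code (hypothetical helper)."""
    return [line for line in lines if not PATTERN.match(line)]

sample = ["123P45678", "107N94861", "S23P45678", "107Z94861", "132P59B04"]
print(non_matching(sample))  # → ['S23P45678', '107Z94861', '132P59B04']
```

Here the inversion is done by the surrounding code (`not PATTERN.match(...)`), so the regex itself stays the simple positive pattern.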

encleadus · Post by **encleadus** » Mon Jul 30, 2007 4:33 pm

I wanted to identify any data that didn't match the pattern as a validation step. If there was any data that didn't match this pattern, I could identify it and correct the problem.

I hadn't used bookmarks before, but that seems to work. Instead of Edit->Delete->Bookmarked Lines, I just did Edit->Copy Other->Bookmarked Lines into a new document to see if there were any lines that didn't match the pattern.

I guess I was wondering if it is possible to write something that would match anything besides the original pattern of the data. Original regex:
^[1]{1}[0-3]{1}[0-9]{1}[PN][0-9]{5}

So something crazy like:
^[2-9A-z]{1}[4-9A-z]{1}[A-z]{1}[A-M|O|Q-Z][A-z]{5}

Thanks for the bookmark tip though!

Kaizyn · Post by **Kaizyn** » Tue Jul 31, 2007 8:08 pm

encleadus, are you looking for a pattern that behaves like this one?

^[^1][^0-3][^0-9][^PN][^0-9]{5}

(Also, if you're only matching one occurrence of something, there's no need for the {1} following that part of the pattern.)

ben_josephs · Post by **ben_josephs** » Tue Jul 31, 2007 9:36 pm

That matches only those lines that are wrong in every position. Encleadus wants a regex that matches lines that are wrong in any position.
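
A quick check in Python illustrates the difference. Treat it as a sketch: it assumes Python's `re` module rather than the editor's engine, and the all-wrong line `X9AQBCDEF` is invented for the example.

```python
import re

# Kaizyn's pattern: every position is negated, so every position must be wrong.
ALL_WRONG = re.compile(r"^[^1][^0-3][^0-9][^PN][^0-9]{5}")

bad_lines = ["S23P45678", "107Z94861", "132P59B04"]  # each wrong in one position
flagged = [line for line in bad_lines if ALL_WRONG.match(line)]
print(flagged)  # → []

# A hypothetical line that is wrong in all nine positions does match.
print(bool(ALL_WRONG.match("X9AQBCDEF")))  # → True
```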

This matches an entire line that is wrong in the first position or the second position or the third position...

Find what: ^([^1]|.[^0-3]|..[^0-9]|...[^PN]|....[^0-9]|.....[^0-9]|......[^0-9]|.......[^0-9]|........[^0-9]).*

[X] Match case
[X] Regular expression

This assumes you are using POSIX regular expression syntax:

Configure | Preferences | Editor

[X] Use POSIX regular expression syntax

In WildEdit it's much simpler:

Find what: ^(?!1[0-3][0-9][PN][0-9]{5}).+

[X] Regular expression
[X] Match case

Options
[X] '.' does not match a newline character
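
For comparison, both patterns can be tried side by side in Python. This is a sketch under the assumption that Python's `re` module is an acceptable stand-in for the two tools; the truncated line `123P4` is a hypothetical extra test value, not part of the original data.

```python
import re

# Both patterns copied from the post above; Python's re accepts them unchanged.
ANY_WRONG = re.compile(
    r"^([^1]|.[^0-3]|..[^0-9]|...[^PN]|....[^0-9]|.....[^0-9]"
    r"|......[^0-9]|.......[^0-9]|........[^0-9]).*"
)
NOT_VALID = re.compile(r"^(?!1[0-3][0-9][PN][0-9]{5}).+")

lines = ["123P45678", "S23P45678", "107Z94861", "132P59B04", "123P4"]

by_alternation = [s for s in lines if ANY_WRONG.match(s)]
by_lookahead = [s for s in lines if NOT_VALID.match(s)]

print(by_alternation)  # → ['S23P45678', '107Z94861', '132P59B04']
print(by_lookahead)    # → ['S23P45678', '107Z94861', '132P59B04', '123P4']
```

At least in Python, the two are not quite equivalent: the alternation needs an actual wrong character to match, so a line that is merely too short slips through, while the negative lookahead flags any non-empty line that does not start with a valid code.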

encleadus · Post by **encleadus** » Wed Aug 01, 2007 3:00 pm

@ben_josephs: Thanks so much, that works great! I was missing using the | for alternate matching. It seems so simple now looking at it.

Anything but 1 as the first character, or anything but 0-3 as second or anything but 0-9 as third or ... and so on.
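
That "wrong in position 1, or position 2, or ..." reading also means the alternation can be generated from a list of the allowed characters per position. A minimal Python sketch, assuming Python's `re` module; the names `allowed`, `alternatives`, and `pattern` are made up for the example.

```python
import re

# Allowed characters per position, as described in the first post.
allowed = ["1", "0-3", "0-9", "PN"] + ["0-9"] * 5

# Position i is wrong: i arbitrary characters, then one outside the allowed class.
alternatives = ["." * i + "[^" + cls + "]" for i, cls in enumerate(allowed)]
pattern = "^(" + "|".join(alternatives) + ").*"

print(pattern)
print(bool(re.match(pattern, "107Z94861")))  # → True
print(bool(re.match(pattern, "107N94861")))  # → False
```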

@Kaizyn: That did work, but only when the values were all wrong in all parts of the field. Thanks for the {1} tip though, I'm still learning regular expressions.

Cheers,
Justin