Find values not matching a pattern

encleadus
Posts: 5
Joined: Wed Jul 25, 2007 5:02 pm

Post by encleadus »

Hello,
I am trying to find values that do not match a particular pattern. My data is in the format:

123P45678
107N94861
132P59304

where:
the first character should always be a 1,
the second character should always be a 0, 1, 2, or 3,
the third character should be any number 0-9,
the fourth character should be a capital P or a capital N,
the last five characters should each be any number 0-9.

So I came up with:
^[1]{1}[0-3]{1}[0-9]{1}[PN][0-9]{5}
which seems to match all of my data in the current format. What I can't figure out is how to match any data that doesn't fit this pattern, for example:

S23P45678
107Z94861
132P59B04

I tried putting the caret (^) inside the brackets, e.g. [^1], [^0-9], but this doesn't find the incorrect values above. Any ideas?

Cheers,
Justin
BenjaminB
Posts: 10
Joined: Tue Sep 12, 2006 3:09 pm

Post by BenjaminB »

What do you want to do with the lines not matching? If you just want to delete them, you could try this:

Use your RegEx to "Mark All"
Search->Invert all Bookmarks
Edit->Delete->Bookmarked Lines

I don't know of other ways to match lines that don't match. Sounds kind of contrary to the concept of Regular Expressions. ;-)
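
For anyone doing the same check outside TextPad, the "mark all, then invert" idea can be sketched in a few lines of Python. This is only an illustration, not a TextPad feature: the function name `non_matching_lines` is made up here, and the sample values are the ones from the original post.

```python
import re

# The pattern from the original post, with the redundant {1} quantifiers dropped.
VALID = re.compile(r"^1[0-3][0-9][PN][0-9]{5}")

def non_matching_lines(lines):
    """Return the lines the pattern does NOT match (the 'inverted bookmarks')."""
    return [line for line in lines if not VALID.match(line)]

# Sample values taken from the original post.
sample = ["123P45678", "107N94861", "132P59304",
          "S23P45678", "107Z94861", "132P59B04"]

for line in non_matching_lines(sample):
    print(line)
```

The regex itself stays positive (it describes a valid value); the inversion happens in the surrounding logic, just as it does with the bookmarks.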
encleadus
Posts: 5
Joined: Wed Jul 25, 2007 5:02 pm

Post by encleadus »

I wanted to identify any data that didn't match the pattern as a validation step. If there was any data that didn't match this pattern, I could identify it and correct the problem.

I hadn't used bookmarks before, but that seems to work. Instead of Edit->Delete->Bookmarked Lines, I just did Edit->Copy Other->Bookmarked Lines into a new document to see if there were any lines that didn't match the pattern.

I guess I was wondering if it is possible to write something that would match anything besides the original pattern of the data. Original regex:
^[1]{1}[0-3]{1}[0-9]{1}[PN][0-9]{5}

So something crazy like:
^[2-9A-z]{1}[4-9A-z]{1}[A-z]{1}[A-M|O|Q-Z][A-z]{5}

Thanks for the bookmark tip though!
Kaizyn
Posts: 6
Joined: Tue Jul 31, 2007 7:50 pm

Post by Kaizyn »

encleadus, are you looking for a pattern that behaves like this one?

^[^1][^0-3][^0-9][^PN][^0-9]{5}

(Also, if you're only matching one occurrence of something, there's no need for the {1} following that part of the pattern.)
ben_josephs
Posts: 2459
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

That matches only those lines that are wrong in every position. Encleadus wants a regex that matches lines that are wrong in any position.
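
A quick way to see the difference is to run the all-negated pattern over the bad values from the first post. The sketch below uses Python's `re` module rather than TextPad, and the value `X9AQabcde` is a hypothetical one made up to be wrong in every position.

```python
import re

# Every character class of the valid pattern negated individually.
ALL_WRONG = re.compile(r"^[^1][^0-3][^0-9][^PN][^0-9]{5}")

bad_values = ["S23P45678", "107Z94861", "132P59B04"]  # from the original post
every_position_wrong = "X9AQabcde"                     # hypothetical value

matches = [v for v in bad_values + [every_position_wrong] if ALL_WRONG.match(v)]
print(matches)
```

Only the hypothetical value is reported. Each of the three real bad values has at least one correct character, so one of the negated classes fails and the whole match fails with it.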

This matches an entire line that is wrong in the first position or the second position or the third position...
Find what: ^([^1]|.[^0-3]|..[^0-9]|...[^PN]|....[^0-9]|.....[^0-9]|......[^0-9]|.......[^0-9]|........[^0-9]).*

[X] Match case
[X] Regular expression

This assumes you are using POSIX regular expression syntax:
Configure | Preferences | Editor

[X] Use POSIX regular expression syntax

In WildEdit it's much simpler:
Find what: ^(?!1[0-3][0-9][PN][0-9]{5}).+

[X] Regular expression
[X] Match case

Options
[X] '.' does not match a newline character
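
Both expressions can be tried outside TextPad and WildEdit as well. The following is a minimal sketch using Python's `re` module, which accepts the same syntax for these two patterns. The helper name `flagged` is made up for this example, and the sample values come from the first post.

```python
import re

# Alternation pattern from the post above: one alternative per position.
ANY_POSITION_WRONG = re.compile(
    r"^([^1]|.[^0-3]|..[^0-9]|...[^PN]|....[^0-9]|.....[^0-9]"
    r"|......[^0-9]|.......[^0-9]|........[^0-9]).*"
)

# Negative-lookahead pattern from the post above.
NOT_VALID = re.compile(r"^(?!1[0-3][0-9][PN][0-9]{5}).+")

sample = ["123P45678", "107N94861", "132P59304",
          "S23P45678", "107Z94861", "132P59B04"]

def flagged(pattern, values):
    """Return the values that the given 'invalid line' pattern matches."""
    return [v for v in values if pattern.match(v)]

print(flagged(ANY_POSITION_WRONG, sample))
print(flagged(NOT_VALID, sample))
```

On this sample both patterns flag the same three bad values. In this Python sketch they differ on truncated input: a hypothetical short value such as `123P4` is flagged by the lookahead version but not by the alternation, because none of the characters that are present is wrong.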
encleadus
Posts: 5
Joined: Wed Jul 25, 2007 5:02 pm

Post by encleadus »

@ben_josephs: Thanks so much, that works great! What I was missing was using | for alternation. It seems so simple now looking at it :D

Anything but 1 as the first character, or anything but 0-3 as second or anything but 0-9 as third or ... and so on.
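
That reading also suggests how the long alternation could be generated instead of typed by hand. Here is a rough sketch in Python, where the list `ALLOWED` and the function `build_any_position_wrong` are names invented for this example:

```python
# Allowed character class for each of the nine positions, per the original post.
ALLOWED = ["1", "0-3", "0-9", "PN"] + ["0-9"] * 5

def build_any_position_wrong(allowed):
    """Build '^(alt0|alt1|...).*', where alternative i is i dots
    followed by the negated class for position i."""
    alternatives = ["." * i + "[^" + cls + "]" for i, cls in enumerate(allowed)]
    return "^(" + "|".join(alternatives) + ").*"

pattern = build_any_position_wrong(ALLOWED)
print(pattern)
```

The printed string is the same expression as the "Find what" pattern given above, so it can be pasted straight into the Find dialog.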

@Kaizyn: That did work, but only when the values were all wrong in all parts of the field. Thanks for the {1} tip though, I'm still learning regular expressions.

Cheers,
Justin