I would like to search for lines in a text file that look like this...
word1"NN<STR1><STR2><STR3>"transword1$en"topic*transword2$et
... but the strings (STR#) in the angle brackets cannot be DEFAULT, COMB or NO.
So a follow-up question is: how do I exclude a string from a regular expression?
Thx,
/Tommy
How do I exclude a string from an RE search?
-
Ed
Re: How do I exclude a string from an RE search?
1. It depends on what you want to do with the lines when you find them, but the regex
\(<DEFAULT>\|<COMB>\|<NO>\)
will find lines where one or more of these words exist. You can then "mark all", and from the edit->"cut other"->"marked lines" menu you can remove them; alternatively, search->"invert all bookmarks" will set you up to remove the other lines.
This may or may not be useful for your purposes. But if you wanted to make changes that affect only the selected lines, you could:
prefix each line with an incrementing number by replacing ^ with \i(10000)
cut the lines using the above procedure and paste them into a new file
edit the lines to do what you want
paste them back
sort the lines on the first 5 characters (numeric)
remove the index numbers by replacing ^[0-9]\{5\} with nothing
The 10000 and 5 above assume files of up to 90000 lines.
2. "how do I exclude a string from a regular expression?" - I don't know.
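If doing this outside the editor is an option, the number-cut-edit-sort round trip from point 1 can be sketched in Python. This is only an illustration under assumptions: the sample lines are invented, `split_lines` and `merge_lines` are hypothetical helper names, and Python's `re` module writes the alternation with unescaped `|` rather than the `\|` used in the editor.

```python
import re

# Python spelling of the alternation from point 1 (no backslashes needed).
EXCLUDED = re.compile(r"<DEFAULT>|<COMB>|<NO>")

def split_lines(lines):
    """Return (marked, unmarked) lists of (line_number, text) pairs."""
    marked, unmarked = [], []
    for number, text in enumerate(lines, start=1):
        target = marked if EXCLUDED.search(text) else unmarked
        target.append((number, text))
    return marked, unmarked

def merge_lines(marked, unmarked):
    """Restore the original order by sorting on the stored line number."""
    return [text for _, text in sorted(marked + unmarked)]

# Invented sample lines, loosely following the format in the question.
sample = [
    'word1"NN<DEFAULT>"transword1$en',
    'word2"NN<MAMMAL>"transword2$en',
    'word3"NN<COMB><NO>"transword3$en',
]

marked, unmarked = split_lines(sample)
# Stand-in for whatever edit is wanted on the marked lines only.
edited = [(number, text + "  <-- edited") for number, text in marked]
result = merge_lines(edited, unmarked)
for line in result:
    print(line)
```

Keeping the line number next to each line plays the same role as the \i(10000) prefix: it lets the edited lines be sorted back into their original positions.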
-
Tommy Svensson
Re: How do I exclude a string from an RE search?
Hi,
Sorry, actually str# can be COMB, NO or DEFAULT, as long as there is at least one str# that is not COMB, NO or DEFAULT. See what I mean?
This is not OK:
word"NN<DEFAULT><NO>"word1"word2"word3
These are OK:
word"NN<MAMMAL>"word1"word2"word3
word"NN<DEFAULT><VEHICLES>"word1"word2"word3
Now, the problem is that the str#s that are not DEFAULT, COMB or NO have more than a thousand possible values, which is why I can't list them all in the regexp... I need to find all lines in my text files that are OK entries.
What I would like is an RE that goes something like this (pseudo):
Find all lines with one or more occurrences of "<str>", where str may be DEFAULT | COMB | NO only if the line also has a <str> that is NOT equal to DEFAULT | COMB | NO.
Thx,
/Tommy
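For readers with a regex engine that supports lookahead, the pseudo-condition above can be sketched with a negative lookahead. The example uses Python's `re` module; whether the editor's own regex engine accepts `(?!...)` is not established in this thread, and `is_ok` is a hypothetical helper name.

```python
import re

# A tag whose content is NOT exactly DEFAULT, COMB or NO.
# (?!...) is a negative lookahead: the match is rejected when the tag body
# is one of the three words immediately followed by ">".
OTHER_TAG = re.compile(r"<(?!(?:DEFAULT|COMB|NO)>)[^<>]+>")

def is_ok(line):
    """True if the line has at least one tag other than DEFAULT, COMB or NO."""
    return OTHER_TAG.search(line) is not None

# The three example lines from the post above.
examples = [
    'word"NN<DEFAULT><NO>"word1"word2"word3',
    'word"NN<MAMMAL>"word1"word2"word3',
    'word"NN<DEFAULT><VEHICLES>"word1"word2"word3',
]

for line in examples:
    print("OK    " if is_ok(line) else "not OK", line)
```

Because the lookahead requires the closing `>` right after the word, a tag such as `<NOUN>` still counts as an ordinary tag rather than being mistaken for `<NO>`.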
-
Ed
Re: How do I exclude a string from an RE search?
Convert <DEFAULT> to an unused character (say ¬), and do the same for <COMB> and <NO> (using different characters, of course, so each can be converted back).
Then a search for
<[^>]+>
and "Mark all" will mark all lines that are OK, because the only angle-bracket tags left are the ones that are not DEFAULT, COMB or NO.
Then convert ¬ back to <DEFAULT>, etc.
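The same placeholder idea can be sketched in Python, for anyone who would rather script it. This is a sketch under assumptions: the three placeholder characters are arbitrary choices that are assumed not to occur anywhere in the file, and `mask` and `ok_lines` are hypothetical helper names.

```python
import re

# One distinct placeholder per tag; assumed to be unused in the file.
PLACEHOLDERS = {"<DEFAULT>": "\u00ac", "<COMB>": "\u00a6", "<NO>": "\u00a7"}

# Same search as above: any remaining tag in angle brackets.
ANY_TAG = re.compile(r"<[^>]+>")

def mask(line):
    """Return a copy of the line with the three tags replaced by placeholders."""
    for tag, char in PLACEHOLDERS.items():
        line = line.replace(tag, char)
    return line

def ok_lines(lines):
    """Keep the lines that still contain a tag after masking."""
    return [line for line in lines if ANY_TAG.search(mask(line))]

lines = [
    'word"NN<DEFAULT><NO>"word1"word2"word3',
    'word"NN<MAMMAL>"word1"word2"word3',
    'word"NN<DEFAULT><VEHICLES>"word1"word2"word3',
]

for line in ok_lines(lines):
    print(line)
```

Since the search runs on a masked copy and the original line is what gets returned, there is no convert-back step in this version.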