
Multiple regular expressions

Posted: Thu May 15, 2003 8:38 pm
by George W. Bush
Hello,

I have been using TextPad for a few days now. I have been trying to use regular expressions to find certain instances of words.

The following example illustrates what I am trying to do.

Assume the following log file:

-------------------------------------------------------

http://www.xml.cnn.com/test.html
http://www.run.cnn.com/space.htm
http://www.cnn.com/its/page1.html
http://www.cnn.com/its/page2.shtml
http://www.cnn.com/its/page3.htm
http://www.antiwar.jp/its/page_2.htm
http://www.its.cnn.com/test.shtml
http://www.its.cnn.com/cubic.htm
http://www.cnn.com/~wwwccs/test.shtml
http://www.cnn.com/~wwwother/test.shtml
http://www.cnn.com/super/test.html
http://www.cnn.com/xml/space.htm

-------------------------------------------------------

I am trying to search for only some of these lines. What the lines I want have in common is:

- they contain www.its or .com/its or .com/~wwwccs

so that only the following will be displayed in a new search window:

http://www.cnn.com/its/page1.html
http://www.cnn.com/its/page2.shtml
http://www.cnn.com/its/page3.htm
http://www.its.cnn.com/test.shtml
http://www.its.cnn.com/cubic.htm
http://www.cnn.com/~wwwccs/test.shtml

I can search with a regular expression for one item at a time, but not for all three at once. I have a log file with this kind of structure from which I would like to extract information.

I would appreciate any assistance. Thank you.

Posted: Thu May 15, 2003 9:19 pm
by George W. Bush
OK, so I finally found what was going on. I needed to escape the "or" operator, the pipe (|). Here are the details in case you need a search like the one I initially wanted.

Hope this will be of help to someone:

assume:

http://www.cnn.com/its/page1.html
http://www.cnn.com/its/page2.shtml
http://www.cnn.com/its/page3.htm
http://www.antiwar.jp/its/page_2.htm
http://www.its.cnn.com/test.shtml
http://www.its.cnn.com/cubic.htm
http://www.its.cnn.uk/test.htm
http://www.cnn.com/~wwwccs/test.shtml
http://www.othersite.com/test.htm
http://www.another.com/test.htm
http://www.unusualsite.com/test.html
http://www.whatisthis.com/space.htm

----------------------------------------------------

to find all instances of:

.its.cnn.com
.com/its
.com/~wwwccs

use the following regular expression:

\(\.its\.cnn\.com\)\|\(\.com/its\)\|\(\.com/~wwwccs\)

----------------------------------------------------

the results will be:

Searching for: \(\.its\.cnn\.com\)\|\(\.com/its\)\|\(\.com/~wwwccs\)
test.txt(1): http://www.cnn.com/its/page1.html
test.txt(2): http://www.cnn.com/its/page2.shtml
test.txt(3): http://www.cnn.com/its/page3.htm
test.txt(5): http://www.its.cnn.com/test.shtml
test.txt(6): http://www.its.cnn.com/cubic.htm
test.txt(8): http://www.cnn.com/~wwwccs/test.shtml
Found 6 occurrence(s) in 1 file(s)
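
For anyone who wants to run the same filter outside TextPad, here is a minimal sketch in Python. It is only an illustration under a few assumptions: the names LOG_LINES, PATTERN and matching_lines are made up for this example, and the sample lines are copied from the list above. Note that Python's re module uses the pipe as the "or" operator without a backslash, so the expression looks different from the escaped TextPad form.

```python
import re

# Sample log lines copied from the post above (hypothetical stand-in for test.txt).
LOG_LINES = [
    "http://www.cnn.com/its/page1.html",
    "http://www.cnn.com/its/page2.shtml",
    "http://www.cnn.com/its/page3.htm",
    "http://www.antiwar.jp/its/page_2.htm",
    "http://www.its.cnn.com/test.shtml",
    "http://www.its.cnn.com/cubic.htm",
    "http://www.its.cnn.uk/test.htm",
    "http://www.cnn.com/~wwwccs/test.shtml",
    "http://www.othersite.com/test.htm",
    "http://www.another.com/test.htm",
    "http://www.unusualsite.com/test.html",
    "http://www.whatisthis.com/space.htm",
]

# In Python's re syntax the pipe is the alternation operator as written;
# only the literal dots need a backslash.
PATTERN = re.compile(r"\.its\.cnn\.com|\.com/its|\.com/~wwwccs")


def matching_lines(lines):
    """Return (line_number, line) pairs for lines containing any alternative."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


if __name__ == "__main__":
    hits = matching_lines(LOG_LINES)
    for number, line in hits:
        print(f"test.txt({number}): {line}")
    print(f"Found {len(hits)} occurrence(s)")
```

The three alternatives are kept in a single pattern so each line is scanned once, which mirrors the one-pass search in the post.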