Find x in log file, find url, eliminate all lines with that

Posted: Tue Apr 16, 2019 6:56 pm
by Mike Olds
Greetings,

I am trying to clean up my log files for analysis by eliminating crackers who slip past the various methods I already use to filter out entries (robots, etc.), and I would like to create a regex that says:

Find any line containing x, identify the url from that line, and eliminate all other lines with that url.

This is orders of magnitude beyond my knowledge of regex, and I would appreciate any help offered.

Thanks in advance!

Posted: Wed Apr 17, 2019 5:19 am
by MudGuard
In my opinion, this cannot be done using one regex.

I'd probably use a Perl script with roughly this algorithm:

Code:

open the log file for reading
for each line
  if it contains x
    find the url in this line and remember it
close the log file
open a copy of the file for writing
open the log file for reading again
for each line
  if it does not contain any remembered url
    write the line to the copy
delete log file
rename copy to original name
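
The algorithm above could be sketched roughly as follows. This is only an illustration in Python rather than Perl (my substitution), and it reads the log twice so that lines appearing before the first match are also caught. The function name `drop_flagged`, the marker string, and the URL pattern `https?://\S+` are all assumptions; adjust the pattern to whatever "url" means in your log format.

```python
import re


def drop_flagged(lines, marker, key_pattern):
    """Return the lines that share no URL with any line containing marker."""
    key_re = re.compile(key_pattern)

    # Pass 1: remember the URL found on every line that contains the marker.
    flagged = set()
    for line in lines:
        if marker in line:
            match = key_re.search(line)
            if match:
                flagged.add(match.group(0))

    # Pass 2: keep only the lines that mention none of the remembered URLs.
    # Note: this is a plain substring test, so a remembered URL also
    # matches longer URLs that start with it.
    return [line for line in lines if not any(url in line for url in flagged)]


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real log.
    sample = [
        "10.0.0.1 GET http://a.example/ index.html",
        "10.0.0.2 GET http://b.example/ wp-login.php",
        "10.0.0.2 GET http://b.example/ index.html",
    ]
    for kept in drop_flagged(sample, "wp-login.php", r"https?://\S+"):
        print(kept)
```

With the sample data, only the first line survives. A line that contains the marker but no recognizable URL is left in place, since there is nothing to remember from it.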
Or, if you want to do it without a script:

Code:

load the file into TextPad,
search for x,
select the url in the first found line,
search for the url using Mark All,
then delete the bookmarked lines

Posted: Wed Apr 17, 2019 12:18 pm
by Mike Olds
EDIT 2: A simpler way to do this is to sort first on the IP# (first col), then do the search, and then delete all the other lines from the same IP#, which are now adjacent and easy to identify.

EDIT: Thank you again, MudGuard. I am successfully using the two-stroke regex routine.

Best,
mo


Thanks, MudGuard. Scripts are out for me, but I will give your regex suggestions a try and report back.
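
For anyone who can run a script, the IP-based idea from EDIT 2 could be sketched as below. This is a rough Python illustration, not something from the thread: it assumes the IP# is the first whitespace-separated column, and the names `first_column` and `drop_flagged_ips` are made up for the example. Because the flagged IPs are collected in a set, no sort is needed first.

```python
def first_column(line):
    """Return the first whitespace-separated field, or '' for a blank line."""
    parts = line.split(None, 1)
    return parts[0] if parts else ""


def drop_flagged_ips(lines, marker):
    """Drop every line from an IP that has at least one line containing marker."""
    # Collect the IP (assumed to be column 1) of each line containing the marker.
    flagged = {first_column(line) for line in lines if marker in line}
    flagged.discard("")  # never treat blank lines as a flagged IP

    # Keep only the lines whose IP was never flagged.
    return [line for line in lines if first_column(line) not in flagged]
```

For example, calling `drop_flagged_ips(lines, "wp-login.php")` would remove every request from any IP that asked for `wp-login.php` at least once, including the requests made before it did so.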