Remove numbers, letters and commas left of a xml tag

djp6
Posts: 3
Joined: Thu Mar 29, 2012 8:50 am

Post by djp6 »

Hi

I'm new to reg exp and I searched for hours but am more confused now than before!
I have an xml file with thousands of lines, many containing examples like:-

<b203>e-Study Guide for: Western Humanities, Complete by Roy Matthews, ISBN 9780073376622</b203>
I need to remove the unique ISBN number, plus the word ISBN, the space and the comma, from all of the title tags (<b203></b203>). This is what I need to remove:-

, ISBN 978??????????

leaving only, in this example:-

<b203>e-Study Guide for: Western Humanities, Complete by Roy Matthews</b203>
Each ISBN is different (as are the title and author), but I thought a reg exp that removes the 20 characters (numbers, letters, commas and whitespace) to the left of </b203> would be the easiest way. I can't find the reg exp for this, though.
Can anyone help please?

Thanks in advance
Dave
ak47wong
Posts: 703
Joined: Tue Aug 12, 2003 9:37 am
Location: Sydney, Australia

Post by ak47wong »

First, enable POSIX regular expression syntax in Configure > Preferences > Editor.

This will delete all the ISBN numbers in the document regardless of what tag they're in:

Find what: ,_ISBN_[0-9]{13} (replace the underscores with spaces)
Replace with: [nothing]

Select Regular expression and click Replace All.

If you need to restrict the deletion to <b203> tags, do this:

Find what: ,_ISBN_[0-9]{13}(</b203>) (replace the underscores with spaces)
Replace with: \1
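
For anyone who wants to try the same two patterns outside TextPad, here is a minimal Python sketch. It assumes Python's `re` module rather than TextPad's POSIX syntax, and the function names are only illustrative; the sample line is the title from the original post.

```python
import re

# ", ISBN " followed by exactly 13 digits, anywhere in the text.
ISBN_ANYWHERE = re.compile(r", ISBN [0-9]{13}")

# Same pattern, but only when it sits directly before </b203>.
# The closing tag is captured as group 1 so the replacement can put it back.
ISBN_IN_B203 = re.compile(r", ISBN [0-9]{13}(</b203>)")


def strip_isbn_anywhere(text: str) -> str:
    """Delete every ', ISBN <13 digits>' regardless of the surrounding tag."""
    return ISBN_ANYWHERE.sub("", text)


def strip_isbn_in_b203(text: str) -> str:
    """Delete the ISBN suffix only when it ends a <b203> element."""
    return ISBN_IN_B203.sub(r"\1", text)


line = (
    "<b203>e-Study Guide for: Western Humanities, "
    "Complete by Roy Matthews, ISBN 9780073376622</b203>"
)
print(strip_isbn_in_b203(line))
```

The `\1` in the replacement stands for whatever the parentheses matched, so the closing tag survives while the ISBN text in front of it is dropped.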

Or, you can do it the way you described and delete the 20 characters before the end tag:

Find what: .{20}(</b203>)
Replace with: \1
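
One caveat with the fixed-length version: `.{20}` matches any 20 characters, so a title that has no ISBN suffix would lose its last 20 characters as well. A quick Python sketch of the difference (again using Python's `re` as a stand-in for TextPad; the function name and both sample lines are made up for illustration):

```python
import re

# Any 20 characters directly before </b203>; the tag is kept via group 1.
LAST_20 = re.compile(r".{20}(</b203>)")


def strip_last_20(text: str) -> str:
    """Delete the 20 characters in front of </b203>, whatever they are."""
    return LAST_20.sub(r"\1", text)


with_isbn = "<b203>Western Humanities, ISBN 9780073376622</b203>"
without_isbn = "<b203>Western Humanities, Complete by Roy Matthews</b203>"

print(strip_last_20(with_isbn))     # ISBN suffix removed
print(strip_last_20(without_isbn))  # title text is cut instead
```

If every <b203> line in the file really ends with an ISBN this makes no difference, but the pattern that spells out `, ISBN [0-9]{13}` leaves other lines untouched.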
djp6
Posts: 3
Joined: Thu Mar 29, 2012 8:50 am

Post by djp6 »

Many thanks ak47wong, that worked perfectly. I used the second option as it seemed safer, and I was interested in the \1 replacement.
Great, thanks again.