RegEx replace

General questions about using TextPad

13Qiao
Posts: 14
Joined: Thu May 19, 2005 8:15 pm
Location: Toronto, ON

RegEx replace

Post by 13Qiao »

I have the following text.

... (More lines with similar format)
commandRef ENCBOMEditAll 2' has been deleted.
commandRef ENCBOMCopyFrom 3' has been deleted.
commandRef AEFSeparator 4' has been deleted.
... (More lines with similar format)
commandRef ENCEBOMTableEditAll 3' has been added.
commandRef ENCBOMCopyFrom 4' has been added.
commandRef AEFSeparator 5' has been added.
... (More lines with similar format)

I want to remove the "deleted-added" pairs from the text. For example, I want to remove these two lines: (The number in the line is their order which I dont care)

commandRef ENCBOMCopyFrom 3' has been deleted.
commandRef ENCBOMCopyFrom 4' has been added.

How can I do that with regex replace?

Thanks,
13Qiao
Posts: 14
Joined: Thu May 19, 2005 8:15 pm
Location: Toronto, ON

Post by 13Qiao »

I can find the "deleted" line and "added" line.

commandRef \(.*\) [0-9]*' has been deleted.\n
<what should be here?>
commandRef \1 [0-9]*' has been added.\n
The real problem is: how do I represent the multiple lines between these two lines?
Bob Hansen
Posts: 1516
Joined: Sun Mar 02, 2003 8:15 pm
Location: Salem, NH

Post by Bob Hansen »

Don't have a complete solution yet, but here is an overview approach that allows a manual delete process for lines with matching numbers:

1. Reformat the lines so the numbers are at the front of the line.
2. Sort the lines by the first 10 chars to combine the matching numbers.
3. Remove the duplicate lines where the first 10 chars are duplicated.
4. Reformat the lines to the original format.


Start with:
commandRef ENCBOMEditAll 2' has been deleted.
commandRef ENCBOMCopyFrom 3' has been deleted.
commandRef AEFSeparator 4' has been deleted.
commandRef ENCEBOMTableEditAll 3' has been added.
commandRef ENCBOMCopyFrom 4' has been added.
commandRef AEFSeparator 5' has been added.

Step 1:
Search for: ^(.* )([0-9]+' has.*\.)
Replace with: \2\1

Result:
2' has been deleted.commandRef ENCBOMEditAll
3' has been deleted.commandRef ENCBOMCopyFrom
4' has been deleted.commandRef AEFSeparator
3' has been added.commandRef ENCEBOMTableEditAll
4' has been added.commandRef ENCBOMCopyFrom
5' has been added.commandRef AEFSeparator

Step 2:
Tools/Sort/
From 1, Length 10
Ascending.

Result:
2' has been deleted.commandRef ENCBOMEditAll
3' has been added.commandRef ENCEBOMTableEditAll
3' has been deleted.commandRef ENCBOMCopyFrom
4' has been added.commandRef ENCBOMCopyFrom
4' has been deleted.commandRef AEFSeparator
5' has been added.commandRef AEFSeparator

Step 3:
The lines with matching numbers can now be deleted manually, but a way to do this automatically is still needed.
Result:
2' has been deleted.commandRef ENCBOMEditAll
5' has been added.commandRef AEFSeparator

Step 4:
Search for: ^(.*d\.)(.*)
Replace with: \2\1
Result:
commandRef ENCBOMEditAll 2' has been deleted.
commandRef AEFSeparator 5' has been added.
Last edited by Bob Hansen on Thu Oct 23, 2008 10:39 pm, edited 1 time in total.
Hope this was helpful.............good luck,
Bob
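For readers who want to try the outline above outside TextPad, here is a minimal Python sketch of the four steps. It is only an illustration under stated assumptions: Python's re module stands in for TextPad's engine, a plain string sort stands in for Tools/Sort, the Step 1 pattern has a space before the digits so that multi-digit numbers stay together, and the automatic Step 3 (dropping every line whose leading number occurs more than once) is one hypothetical way to do what the outline leaves as a manual step. Like the outline, it matches lines by number.

```python
import re
from collections import Counter

lines = [
    "commandRef ENCBOMEditAll 2' has been deleted.",
    "commandRef ENCBOMCopyFrom 3' has been deleted.",
    "commandRef AEFSeparator 4' has been deleted.",
    "commandRef ENCEBOMTableEditAll 3' has been added.",
    "commandRef ENCBOMCopyFrom 4' has been added.",
    "commandRef AEFSeparator 5' has been added.",
]

# Step 1: move "<number>' has been ... ." to the front of each line.
# Assumption: the space before the digits keeps numbers like 10 in one group.
to_front = re.compile(r"^(.* )([0-9]+' has.*\.)$")
moved = [to_front.sub(r"\2\1", line) for line in lines]

# Step 2: sort so that lines with the same leading number end up adjacent.
# Assumption: a plain string sort approximates the TextPad sort.
ordered = sorted(moved)

# Step 3 (hypothetical automation): drop lines whose number is not unique.
def leading_number(line):
    return line.split("'", 1)[0]

counts = Counter(leading_number(line) for line in ordered)
unique = [line for line in ordered if counts[leading_number(line)] == 1]

# Step 4: move the "<number>' has been ... ." part back to the end.
to_back = re.compile(r"^(.*d\.)(.*)$")
restored = [to_back.sub(r"\2\1", line) for line in unique]

for line in restored:
    print(line)
```

Note that a string sort puts 10 before 2, so real data with multi-digit numbers would need a numeric sort key.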
13Qiao
Posts: 14
Joined: Thu May 19, 2005 8:15 pm
Location: Toronto, ON

need a solution to do automatically

Post by 13Qiao »

Thanks Bob, but I need an automatic solution, not manual deletion. If I were doing it manually, I would just highlight those lines, sort them, and delete the pairs by hand.
There are other kinds of lines in the text as well, like:
<Beginning text>
...
====== 'menu' 'APPRouteContentActionsToolBar' ======
commandRef APPRouteContentUploadExternalFile 2' has been deleted.
menuRef APPRouteContentSummaryCreateNew 2' has been added.
====== 'menu' 'APPDiscussionDocumentSummaryActionsToolBar' ======
commandRef APPDocumentCreateNew 1' has been deleted.
menuRef APPContentSummaryCreateNew 1' has been added.
====== 'menu' 'APPFileSummaryActionsToolBar' ======
commandRef APPSeparator 3' has been deleted.
commandRef APPCommonDocumentDownloadActionLink 4' has been deleted.
commandRef APPCommonDocumentCheckOutActionLink 5' has been deleted.
commandRef APPVersionDelete 6' has been deleted.
commandRef APPFileDelete 7' has been deleted.
commandRef APPCDMTOVCConversionFilesActionConnect 3' has been added.
commandRef APPCDMTOVCConversionFilesActionCheckIn 4' has been added.
commandRef APPCDMTOVCConversionFilesActionCopy 5' has been added.
commandRef APPSeparator 6' has been added.
commandRef APPCommonDocumentDownloadActionLink 7' has been added.
commandRef APPCommonDocumentCheckOutActionLink 8' has been added.
commandRef APPVersionDelete 9' has been added.
commandRef APPFileDelete 10' has been added.
...
<Ending text>
I am looking for a way to represent the multiple lines between two regexes, as I mentioned in my second post.
Thanks,
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

This sort of thing has been discussed many times before. There is no way to automate it fully in TextPad.

Unfortunately, TextPad's aged regex engine is incapable of matching text containing an arbitrary number of newlines. Also, back-references such as \1 in TextPad's regexes can't refer back over a newline. That is, you can't have a newline between a captured subexpression and a reference back to it.
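To illustrate the limitation by contrast, here is a small sketch in Python, whose re module (not TextPad's engine) can match across any number of newlines and lets a back-reference refer to a group captured on an earlier line. The sample text is taken from the question; the pattern itself is only an assumption about how the missing middle part could be written in such an engine.

```python
import re

text = (
    "commandRef ENCBOMEditAll 2' has been deleted.\n"
    "commandRef ENCBOMCopyFrom 3' has been deleted.\n"
    "commandRef AEFSeparator 4' has been deleted.\n"
    "commandRef ENCEBOMTableEditAll 3' has been added.\n"
    "commandRef ENCBOMCopyFrom 4' has been added.\n"
)

pair = re.compile(
    r"^commandRef (\S+) [0-9]+' has been deleted\.\n"  # the "deleted" line
    r"((?:.*\n)*?)"                                    # any lines in between
    r"commandRef \1 [0-9]+' has been added\.\n",       # same name, "added"
    re.MULTILINE,
)

match = pair.search(text)
print(match.group(1))  # the name captured on the "deleted" line
print(match.group(2))  # the lines between the two halves of the pair
```

Here `(?:.*\n)*?` stands for "as few whole lines as possible", and `\1` is resolved two lines after the group was captured.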

I don't believe that reordering the lines as Bob suggests will work in this case. I understand from "The number in the line is their order which I dont care" that it's the numbers you don't care about; I presume that you want the lines left in their original order. You could insert line numbers at the beginnings of the lines, sort the lines by the text after commandRef, do something very tricky (I'm not sure what, but it would involve removing and restoring newlines), sort the lines by the line numbers you added earlier, and remove those line numbers. But this would be absurdly complicated.

Alternatively, you could try WildEdit (http://www.textpad.com/products/wildedit/), which uses a far more powerful regex engine.

It would be far easier to do this with a script. TextPad doesn't support scripts, so you would have to use a suitable scripting language, such as Perl, Python, Ruby or Tcl.
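As a rough idea of what such a script might look like, here is a minimal Python sketch. It rests on several assumptions that are not stated in the thread: every relevant line has exactly the format shown in the samples, a pair means a "deleted" line followed later by an "added" line with the same commandRef/menuRef kind and the same name, and pairs are only looked for within one run of such lines (any other line, such as a ====== header, ends the run). The function name remove_pairs is made up for the example.

```python
import re

# Assumed line format, based on the samples in the thread.
LINE = re.compile(
    r"^(command|menu)Ref (\S+) [0-9]+' has been (deleted|added)\.$"
)

def remove_pairs(lines):
    """Return lines without matching deleted/added pairs, order preserved."""
    keep = [True] * len(lines)
    pending = {}  # (kind, name) -> index of a not yet matched "deleted" line
    for i, line in enumerate(lines):
        m = LINE.match(line)
        if m is None:
            pending.clear()  # any other line ends the current run
            continue
        kind, name, action = m.groups()
        key = (kind, name)
        if action == "deleted":
            pending[key] = i
        elif key in pending:
            keep[pending.pop(key)] = False  # drop the "deleted" line
            keep[i] = False                 # drop the matching "added" line
    return [line for line, wanted in zip(lines, keep) if wanted]

sample = [
    "====== 'menu' 'APPFileSummaryActionsToolBar' ======",
    "commandRef APPSeparator 3' has been deleted.",
    "commandRef APPFileDelete 7' has been deleted.",
    "commandRef APPCDMTOVCConversionFilesActionConnect 3' has been added.",
    "commandRef APPSeparator 6' has been added.",
    "commandRef APPFileDelete 10' has been added.",
]

for line in remove_pairs(sample):
    print(line)
```

The remaining lines stay in their original order, and the numbers are ignored when pairing.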
13Qiao
Posts: 14
Joined: Thu May 19, 2005 8:15 pm
Location: Toronto, ON

I got it working by myself

Post by 13Qiao »

Hi Ben,

Your notes reminded me that I could put all the commandRef/menuRef lines on a single line, and I got it working.
First, remove the \n between consecutive commandRef/menuRef lines.

replace:
\([commandmenu]*\)Ref\(.*\)\n\([commandmenu]*\)Ref
with:
\1Ref\2\3Ref
Then, remove the "deleted-added" pairs.

replace:
\([commandmenu]*\)Ref \([[:alnum:]_]*\) [0-9]*' has been deleted.\(.*\)\1Ref \2 [0-9]*' has been added.
with:
\3
Last, add the \n back.

replace:
\([commandmenu]*\)Ref\(.*\)\.\([commandmenu]*\)Ref
with:
\1Ref\2\.\n\3Ref
Thanks for the help.
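The same join, remove, split idea can be sketched in Python for anyone who wants to test it outside TextPad. This is an approximation, not a translation of the exact TextPad expressions: it uses Python's re syntax, writes (command|menu) instead of the character class, uses lookarounds so that each step works in one pass, and repeats the removal step until nothing changes. The sample text and variable names are only for illustration.

```python
import re

text = (
    "====== 'menu' 'APPFileSummaryActionsToolBar' ======\n"
    "commandRef APPSeparator 3' has been deleted.\n"
    "commandRef APPFileDelete 7' has been deleted.\n"
    "commandRef APPCDMTOVCConversionFilesActionConnect 3' has been added.\n"
    "commandRef APPCDMTOVCConversionFilesActionCheckIn 4' has been added.\n"
    "commandRef APPSeparator 6' has been added.\n"
    "commandRef APPFileDelete 10' has been added.\n"
)

# First: remove the newline between consecutive commandRef/menuRef lines.
joined = re.sub(r"(?<=\.)\n(?=(?:command|menu)Ref )", "", text)

# Then: remove "deleted-added" pairs, keeping whatever sat between them.
PAIR = re.compile(
    r"(command|menu)Ref (\w+) [0-9]+' has been deleted\."
    r"(.*?)"
    r"\1Ref \2 [0-9]+' has been added\."
)
reduced = joined
while True:
    shorter = PAIR.sub(r"\3", reduced)
    if shorter == reduced:  # nothing left to remove
        break
    reduced = shorter

# Last: put the newlines back in front of each remaining entry.
result = re.sub(r"\.(?=(?:command|menu)Ref )", ".\n", reduced)
print(result, end="")
```

Because `.` does not match a newline here, pairs are only removed within one joined line, that is, within one block of entries.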
ben_josephs
Posts: 2461
Joined: Sun Mar 02, 2003 9:22 pm

Post by ben_josephs »

Ah, I see. I slightly misunderstood your problem.

You will help yourself by using POSIX syntax:
Configure | Preferences | Editor

[X] Use POSIX regular expression syntax
Then, for example, your regex
\([commandmenu]*\)Ref\(.*\)\n\([commandmenu]*\)Ref
becomes
([commandmenu]*)Ref(.*)\n([commandmenu]*)Ref
which is somewhat clearer.

But
[commandmenu]*
is incorrect. It means: any number (possibly zero) of the characters c, o, m, m, a, n, d, m, e, n or u (that is, the characters c, o, m, a, n, d, e or u). I think you mean
command|menu
which means either command or menu.
So the whole becomes
(command|menu)Ref(.*)\n(command|menu)Ref
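The difference between the character class and the alternation is easy to check with a short test. The sketch below uses Python's re module rather than TextPad, and the test strings other than commandRef and menuRef are made up for the purpose.

```python
import re

char_class = re.compile(r"^[commandmenu]*Ref")   # any mix of c,o,m,a,n,d,e,u
alternation = re.compile(r"^(command|menu)Ref")  # exactly "command" or "menu"

candidates = ["commandRef", "menuRef", "Ref", "nomadRef", "demoRef"]

class_hits = [s for s in candidates if char_class.match(s)]
alt_hits = [s for s in candidates if alternation.match(s)]

print(class_hits)  # every candidate matches the character class
print(alt_hits)    # only commandRef and menuRef match the alternation
```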
13Qiao
Posts: 14
Joined: Thu May 19, 2005 8:15 pm
Location: Toronto, ON

Post by 13Qiao »

I know exactly what [commandmenu]* means; I just didn't know I could use command|menu. Silly me. Thanks for helping me with this.
BTW, I always use the default syntax, but it seems POSIX syntax is simpler than default syntax. I will try POSIX from now on.