RegEx replace

13Qiao · Post by **13Qiao** » Thu Oct 23, 2008 2:25 pm

I have the following text.

... (More lines with similar format)
commandRef ENCBOMEditAll 2' has been deleted.
commandRef ENCBOMCopyFrom 3' has been deleted.
commandRef AEFSeparator 4' has been deleted.
... (More lines with similar format)
commandRef ENCEBOMTableEditAll 3' has been added.
commandRef ENCBOMCopyFrom 4' has been added.
commandRef AEFSeparator 5' has been added.
... (More lines with similar format)

I want to remove the "deleted-added" pairs from the text. For example, I want to remove these two lines (the number in the line is their order, which I don't care about):

commandRef ENCBOMCopyFrom 3' has been deleted.
commandRef ENCBOMCopyFrom 4' has been added.

How can I do that with regex replace?

Thanks,

13Qiao · Post by **13Qiao** » Thu Oct 23, 2008 2:52 pm

I can find the "deleted" line and "added" line.

Code:

commandRef \(.*\) [0-9]*' has been deleted.\n
<what should be here?>
commandRef \1 [0-9]*' has been added.\n

The real problem is: how do I represent the multiple lines between these 2 lines?
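For comparison, in a regex engine that can match across newlines and back-reference over them, the gap can be written as a lazy group of whole lines. This is a minimal sketch using Python's standard `re` module, not something TextPad's engine accepts; the function name `remove_pairs` is made up for illustration, and the sample text is the one from the first post.

```python
import re

text = (
    "commandRef ENCBOMEditAll 2' has been deleted.\n"
    "commandRef ENCBOMCopyFrom 3' has been deleted.\n"
    "commandRef AEFSeparator 4' has been deleted.\n"
    "commandRef ENCEBOMTableEditAll 3' has been added.\n"
    "commandRef ENCBOMCopyFrom 4' has been added.\n"
    "commandRef AEFSeparator 5' has been added.\n"
)

pair = re.compile(
    r"^commandRef (\S+) [0-9]+' has been deleted\.\n"  # "deleted" line, name captured as \1
    r"((?:.*\n)*?)"                                     # as few whole lines as possible in between
    r"commandRef \1 [0-9]+' has been added\.\n",        # "added" line with the same name
    re.MULTILINE,
)

def remove_pairs(s):
    # One substitution pass cannot see a pair that lies inside another
    # pair's span, so repeat until nothing more is replaced.
    while True:
        s, count = pair.subn(r"\2", s)  # keep only the lines in between
        if count == 0:
            return s

print(remove_pairs(text), end="")
```

On the sample this leaves only the ENCBOMEditAll and ENCEBOMTableEditAll lines, since both the ENCBOMCopyFrom and the AEFSeparator pairs are removed.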

Bob Hansen · Post by **Bob Hansen** » Thu Oct 23, 2008 3:31 pm

Don't have a complete solution yet, but here is an overview approach that allows a manual delete process for lines with matching numbers:

1. Reformat the lines so the numbers are at the front of the line.
2. Sort the lines by the first 10 chars to combine the matching numbers.
3. Remove the duplicate lines where the first 10 chars are duplicated.
4. Reformat the lines to original format

Start with:
commandRef ENCBOMEditAll 2' has been deleted.
commandRef ENCBOMCopyFrom 3' has been deleted.
commandRef AEFSeparator 4' has been deleted.
commandRef ENCEBOMTableEditAll 3' has been added.
commandRef ENCBOMCopyFrom 4' has been added.
commandRef AEFSeparator 5' has been added.

Step 1:
Search for: ^(.*)([0-9]+' has.*\.)
Replace with: \2\1
Result:
2' has been deleted.commandRef ENCBOMEditAll
3' has been deleted.commandRef ENCBOMCopyFrom
4' has been deleted.commandRef AEFSeparator
3' has been added.commandRef ENCEBOMTableEditAll
4' has been added.commandRef ENCBOMCopyFrom
5' has been added.commandRef AEFSeparator

Step 2:
Tools/Sort/
From 1, Length 10
Ascending.
Result:
2' has been deleted.commandRef ENCBOMEditAll
3' has been added.commandRef ENCEBOMTableEditAll
3' has been deleted.commandRef ENCBOMCopyFrom
4' has been added.commandRef ENCBOMCopyFrom
4' has been deleted.commandRef AEFSeparator
5' has been added.commandRef AEFSeparator

Step 3:
Can delete the lines with matching numbers manually now, but need a solution to do this automatically.
Result:
2' has been deleted.commandRef ENCBOMEditAll
5' has been added.commandRef AEFSeparator

Step 4:
Search for: ^(.*d\.)(.*)
Replace with: \2\1
Result:
commandRef ENCBOMEditAll 2' has been deleted.
commandRef AEFSeparator 5' has been added.

13Qiao · Post by **13Qiao** » Thu Oct 23, 2008 4:05 pm

Thanks Bob, but I need an automatic solution, not a manual delete. If I were to do it manually, I would just highlight those lines, sort them, and delete them by hand.
There are other lines in the text as well, like

<Beginning text>
...
====== 'menu' 'APPRouteContentActionsToolBar' ======
commandRef APPRouteContentUploadExternalFile 2' has been deleted.
menuRef APPRouteContentSummaryCreateNew 2' has been added.
====== 'menu' 'APPDiscussionDocumentSummaryActionsToolBar' ======
commandRef APPDocumentCreateNew 1' has been deleted.
menuRef APPContentSummaryCreateNew 1' has been added.
====== 'menu' 'APPFileSummaryActionsToolBar' ======
commandRef APPSeparator 3' has been deleted.
commandRef APPCommonDocumentDownloadActionLink 4' has been deleted.
commandRef APPCommonDocumentCheckOutActionLink 5' has been deleted.
commandRef APPVersionDelete 6' has been deleted.
commandRef APPFileDelete 7' has been deleted.
commandRef APPCDMTOVCConversionFilesActionConnect 3' has been added.
commandRef APPCDMTOVCConversionFilesActionCheckIn 4' has been added.
commandRef APPCDMTOVCConversionFilesActionCopy 5' has been added.
commandRef APPSeparator 6' has been added.
commandRef APPCommonDocumentDownloadActionLink 7' has been added.
commandRef APPCommonDocumentCheckOutActionLink 8' has been added.
commandRef APPVersionDelete 9' has been added.
commandRef APPFileDelete 10' has been added.
...
<Ending text>

I am looking for a way to represent multiple lines between two regexes, as I mentioned in my second post.
Thanks,

ben_josephs · Post by **ben_josephs** » Fri Oct 24, 2008 9:52 am

This sort of thing has been discussed many times before. There is no way to automate it fully in TextPad.

Unfortunately, TextPad's aged regex engine is incapable of matching text containing an arbitrary number of newlines. Also, back-references such as \1 in TextPad's regexes can't refer back over a newline. That is, you can't have a newline between a captured subexpression and a reference back to it.

I don't believe that reordering the lines as Bob suggests will work in this case. I understand from "The number in the line is their order which I dont care" that it's the numbers you don't care about; I presume that you want the lines left in their original order. You could insert line numbers at the beginnings of the lines, sort the lines by the text after commandRef, do something very tricky (I'm not sure what, but it would involve removing and restoring newlines), sort the lines by the line numbers you added earlier, and remove those line numbers. But this would be absurdly complicated.

Alternatively, you could try WildEdit (http://www.textpad.com/products/wildedit/), which uses a far more powerful regex engine.

It would be far easier to do this with a script. TextPad doesn't support scripts, so you would have to use a suitable scripting language, such as Perl, Python, Ruby or Tcl.
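As an illustration of the script route, here is a minimal Python sketch that works line by line instead of with one multi-line regex. It rests on assumptions not stated in the thread: a pair means a "deleted" line followed later by an "added" line with the same commandRef/menuRef kind and the same name, and pairs never span two `======` menu blocks. The function name `remove_pairs` is hypothetical.

```python
import re
from collections import defaultdict

# One record line, e.g.: commandRef APPSeparator 3' has been deleted.
LINE = re.compile(r"^(command|menu)Ref (\S+) [0-9]+' has been (deleted|added)\.$")

def remove_pairs(lines):
    """Return the lines with every deleted/added pair removed, order preserved."""
    pending = defaultdict(list)  # (kind, name) -> indexes of unmatched "deleted" lines
    drop = set()                 # indexes of lines to leave out
    for i, line in enumerate(lines):
        if line.startswith("======"):
            pending.clear()      # assumption: a new menu block starts a fresh scope
            continue
        m = LINE.match(line)
        if not m:
            continue             # unrelated text is kept untouched
        kind, name, action = m.groups()
        if action == "deleted":
            pending[(kind, name)].append(i)
        elif pending[(kind, name)]:
            drop.add(pending[(kind, name)].pop(0))  # the earlier "deleted" line
            drop.add(i)                             # and this "added" line
    return [line for i, line in enumerate(lines) if i not in drop]
```

It could be applied to a file's contents split with `str.splitlines()`, with the result joined back with newlines. Because it tracks indexes rather than sorting, the remaining lines stay in their original order.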

13Qiao · Post by **13Qiao** » Fri Oct 24, 2008 2:19 pm

Hi Ben,

Your notes reminded me to put all the commandRef/menuRef entries on one single line, and I got it working.
First, remove the \n between consecutive commandRef/menuRef lines.

Code:

replace:
\([commandmenu]*\)Ref\(.*\)\n\([commandmenu]*\)Ref
with:
\1Ref\2\3Ref

Then, remove "deleted-added" pair.

Code:

replace:
\([commandmenu]*\)Ref \([[:alnum:]_]*\) [0-9]*' has been deleted.\(.*\)\1Ref \2 [0-9]*' has been added.
with
\3

Last, add \n back.

Code:

replace:
\([commandmenu]*\)Ref\(.*\)\.\([commandmenu]*\)Ref
with
\1Ref\2\.\n\3Ref
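The same join, remove, split idea can be sketched outside TextPad. The version below is an approximation in Python's `re` syntax, not a transcription of the expressions above: it uses `(command|menu)` for the prefix, lookaheads so that a single pass joins every line, and a loop for the removal step because nested pairs need more than one pass. The function name `remove_pairs_joined` is made up for illustration.

```python
import re

REF = r"(?:command|menu)Ref "

def remove_pairs_joined(text):
    # Step 1: drop the newline between two consecutive Ref lines.
    joined = re.sub(r"\.\n(?=" + REF + ")", ".", text)

    # Step 2: remove each deleted/added pair, keeping what lies between (\3).
    # "." does not match a newline here, so a pair cannot span a line that
    # was left unjoined, such as a "======" header.
    pair = re.compile(
        r"(command|menu)Ref (\w+) [0-9]+' has been deleted\."
        r"(.*?)"
        r"\1Ref \2 [0-9]+' has been added\."
    )
    while True:
        joined, count = pair.subn(r"\3", joined)
        if count == 0:
            break

    # Step 3: put the newlines back in front of every Ref entry.
    return re.sub(r"\.(?=" + REF + ")", ".\n", joined)
```

Joining only consecutive Ref lines has a useful side effect: header lines stay on their own lines, so pairs are only matched within one menu block.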

Thanks for the help.

ben_josephs · Post by **ben_josephs** » Fri Oct 24, 2008 3:02 pm

Ah, I see. I slightly misunderstood your problem.

You will help yourself by using Posix syntax:

Configure | Preferences | Editor

[X] Use POSIX regular expression syntax

Then, for example, your regex
\([commandmenu]*\)Ref\(.*\)\n\([commandmenu]*\)Ref
becomes
([commandmenu]*)Ref(.*)\n([commandmenu]*)Ref
which is somewhat clearer.

But
[commandmenu]*
is incorrect. It means: any number (possibly zero) of the characters c, o, m, m, a, n, d, m, e, n or u (that is, the characters c, o, m, a, n, d, e or u). I think you mean
command|menu
which means either command or menu.
So the whole becomes
(command|menu)Ref(.*)\n(command|menu)Ref
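The difference between the character class and the alternation is easy to check in any engine that supports both. A small sketch using Python's `re` module; the test strings `nomadRef` and `Ref` are invented examples, not taken from the data in this thread.

```python
import re

char_class = re.compile(r"([commandmenu]*)Ref")   # any run of the letters c, o, m, a, n, d, e, u
alternation = re.compile(r"(command|menu)Ref")    # exactly "command" or "menu"

def accepts(pattern, s):
    # True if the whole string matches the pattern.
    return pattern.fullmatch(s) is not None

for s in ["commandRef", "menuRef", "nomadRef", "Ref"]:
    print(s, accepts(char_class, s), accepts(alternation, s))
```

Both patterns accept `commandRef` and `menuRef`, but only the character class also accepts `nomadRef` and a bare `Ref`, which is why it works on this data while being looser than intended.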

13Qiao · Post by **13Qiao** » Fri Oct 24, 2008 3:57 pm

I know exactly what [commandmenu]* means; I just didn't know I could use command|menu. Silly me. Thanks for helping me with this.
BTW, I have always used the default syntax, but it seems POSIX syntax is simpler. I will try POSIX from now on.
